path: root/llvm/test/CodeGen/ARM
Commit message | Author | Age | Files | Lines
...
* [ARM][NEON] Improve vector popcnt lowering with PADDL (PR39281) | Simon Pilgrim | 2018-10-15 | 1 | -137/+17
      As I suggested on PR39281, this patch uses PADDL pairwise addition to widen from the vXi8 CTPOP result to the target vector type.

      This is a blocker for moving more x86 code to generic vector CTPOP expansion (PR32655 + D53258) - ARM's vXi64 CTPOP currently expands, which would generate a vXi64 MUL, but ARM's custom lowering expands the general MUL case and vectors aren't well handled in LegalizeDAG - improving the CTPOP lowering was a lot easier than fixing the MUL lowering for this one case...

      Differential Revision: https://reviews.llvm.org/D53257

      llvm-svn: 344512
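      A minimal C sketch (hypothetical, not part of the commit) of the kind of source this lowering affects:

        #include <stdint.h>

        /* When the loop is vectorised for NEON, the per-lane popcount becomes
           a vXi64 CTPOP node, which this change widens from the vcnt.8 (vXi8)
           result using pairwise vpaddl additions instead of expanding it. */
        void popcount64_array(const uint64_t *in, uint64_t *out, int n) {
          for (int i = 0; i < n; i++)
            out[i] = (uint64_t)__builtin_popcountll(in[i]);
        }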
* [ARM] Regenerate cttz tests | Simon Pilgrim | 2018-10-14 | 1 | -136/+283
      Improve codegen view as part of PR32655.

      llvm-svn: 344479
* [ARM] Regenerate popcnt tests | Simon Pilgrim | 2018-10-13 | 1 | -54/+257
      Improve codegen view as part of PR32655.

      llvm-svn: 344465
* [ARM] Account for implicit IT when calculating inline asm size | Peter Smith | 2018-10-08 | 1 | -0/+47
      When deciding if it is safe to optimize a conditional branch to a CBZ or CBNZ, the offsets of the BasicBlocks from the start of the function are estimated. For inline assembly, the generic getInlineAsmLength() function is used to get a worst case estimate of the inline assembly by multiplying the number of instructions by the max instruction size of 4 bytes. This unfortunately doesn't take into account the generation of Thumb implicit IT instructions. In edge cases, such as when all the instructions in the block are 4 bytes in size and there is an implicit IT, the size is underestimated. This can cause an out of range CBZ or CBNZ to be generated.

      The patch takes a conservative approach and assumes that every instruction in the inline assembly block may have an implicit IT.

      Fixes PR31805

      Differential Revision: https://reviews.llvm.org/D52834

      llvm-svn: 343960
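      A hypothetical sketch (my own example, not the reproducer from the patch) of an inline asm block whose size was undercounted:

        /* With an assembler that inserts implicit IT blocks (e.g. when built
           with -mimplicit-it=always), the conditional "addeq" below is
           assembled behind an extra IT instruction, so the block is larger
           than the 4 bytes per listed instruction previously assumed. */
        void bump_if_equal(void) {
          __asm__ volatile("addeq r0, r0, #1" ::: "r0", "cc");
        }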
* [ARM] Fix correctness checks in promoteToConstantPool. | Eli Friedman | 2018-09-28 | 2 | -16/+23
      Correctly check for relocations in the constant to promote. And don't allow promoting a constant multiple times.

      This partially fixes https://bugs.llvm.org//show_bug.cgi?id=32780 ; it's not a complete fix because we also need to prevent ARMConstantIslands from cloning the constant.

      (-arm-promote-constant is currently off by default, and it stays off with this patch. I'll look into turning it on again when all the known issues are fixed.)

      Differential Revision: https://reviews.llvm.org/D51472

      llvm-svn: 343361
* [ARM] Use preferred alignment for constants in promoteToConstantPool. | Eli Friedman | 2018-09-28 | 1 | -0/+11
      This mostly affects IR generated by non-clang frontends because clang generally sets the alignment of globals explicitly.

      Fixes https://bugs.llvm.org//show_bug.cgi?id=32394 .

      (-arm-promote-constant is currently off by default, and it stays off with this patch. I'll look into turning it on again when all the known issues are fixed.)

      Differential Revision: https://reviews.llvm.org/D51469

      llvm-svn: 343359
* [ARM] Allow execute only code on Cortex-m23 | David Spickett | 2018-09-28 | 3 | -0/+4
      The NoMovt feature prevents the use of MOVW/MOVT instructions on Cortex-M23 for performance reasons. These instructions are required for execute only code so NoMovt should be disabled when that option is enabled.

      Differential Revision: https://reviews.llvm.org/D52551

      llvm-svn: 343302
* [CodeGen] Always print register ties in MI::dump() | Francis Visoiu Mistrih | 2018-09-26 | 2 | -2/+2
      It was the case when calling MO::dump(), but MI::dump() was still depending on hasComplexRegisterTies().

      The MIR output is not affected.

      llvm-svn: 343107
* [ARM] Fix for PR39060 | Sam Parker | 2018-09-26 | 3 | -9/+182
      When calculating whether a value can safely overflow for use by an icmp, we weren't checking that the value couldn't wrap around. To do this we need the icmp to be using a constant, as well as the incoming add or sub.

      bugzilla report: https://bugs.llvm.org/show_bug.cgi?id=39060

      Differential Revision: https://reviews.llvm.org/D52463

      llvm-svn: 343092
* [CodeGen] Enable tail calls for functions with NonNull attributes. | David Green | 2018-09-26 | 1 | -0/+12
      Adding NonNull as attributes to returned pointers has the unfortunate side effect of disabling tail calls. This patch ignores the NonNull attribute when we decide whether to tail merge, in the same way that we ignore the NoAlias attribute, as it has no effect on the call sequence.

      Differential Revision: https://reviews.llvm.org/D52238

      llvm-svn: 343091
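      A minimal C sketch (hypothetical names, not from the commit) of the situation described:

        /* The callee's returned pointer is marked nonnull; that attribute
           previously disabled the tail call in caller(), and is now ignored
           for that decision in the same way as noalias. */
        __attribute__((returns_nonnull)) char *make_name(void);

        char *caller(void) {
          return make_name();   /* eligible to be emitted as a tail call */
        }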
* Revert r342870 "[ARM] bottom-top mul support ARMParallelDSP" | Hans Wennborg | 2018-09-26 | 3 | -558/+0
      This broke Chromium's Android build (https://crbug.com/889390) and the polly-aosp buildbot (http://lab.llvm.org:8011/builders/aosp-O3-polly-before-vectorizer-unprofitable).

      > Originally committed in rL342210 but was reverted in rL342260 because
      > it was causing issues in vectorized code, because I had forgotten to
      > ensure that we're operating on scalar values.
      >
      > Original commit message:
      >
      > On failing to find sequences that can be converted into dual macs,
      > try to find sequential 16-bit loads that are used by muls which we
      > can then use smultb, smulbt, smultt with a wide load.
      >
      > Differential Revision: https://reviews.llvm.org/D51983

      llvm-svn: 343082
* [ARM] Do not fuse VADD and VMUL on the Cortex-M4 and Cortex-M33 | Sjoerd Meijer | 2018-09-24 | 1 | -0/+19
      A sequence of VMUL and VADD instructions always gives the same or better performance than a fused VMLA instruction on the Cortex-M4 and Cortex-M33. Executing the VMUL and VADD back-to-back requires the same cycles, but having separate instructions allows scheduling to avoid the hazard between these 2 instructions.

      Differential Revision: https://reviews.llvm.org/D52289

      llvm-svn: 342874
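      A small C sketch (hypothetical example, not from the commit) of a multiply-accumulate affected by this choice:

        /* On Cortex-M4 and Cortex-M33 this is now kept as separate vmul.f32
           and vadd.f32 instructions rather than a fused vmla.f32, which costs
           the same cycles but lets the scheduler hide the hazard between the
           multiply and the add. */
        float mac(float acc, float a, float b) {
          return acc + a * b;
        }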
* [ARM][ARMLoadStoreOptimizer] | Luke Cheeseman | 2018-09-24 | 1 | -0/+40
      - The load store optimizer is currently merging multiple loads/stores into VLDM/VSTM with more than 16 doubleword registers
      - This is an UNPREDICTABLE instruction and shouldn't be done
      - It looks like the Limit for how many registers included in a merge got dropped at some point so I am reintroducing it in this patch
      - This fixes https://bugs.llvm.org/show_bug.cgi?id=38389

      Differential Revision: https://reviews.llvm.org/D52085

      llvm-svn: 342872
* [ARM] bottom-top mul support ARMParallelDSP | Sam Parker | 2018-09-24 | 3 | -0/+558
      Originally committed in rL342210 but was reverted in rL342260 because it was causing issues in vectorized code, because I had forgotten to ensure that we're operating on scalar values.

      Original commit message:

      On failing to find sequences that can be converted into dual macs, try to find sequential 16-bit loads that are used by muls which we can then use smultb, smulbt, smultt with a wide load.

      Differential Revision: https://reviews.llvm.org/D51983

      llvm-svn: 342870
* [ARM] Fix unwind information for floating point registers | Oliver Stannard | 2018-09-19 | 1 | -0/+15
      Fixes the unwind information generated for floating-point registers. Previously, all padding registers were assumed to be four bytes wide. Now, the width of the register is used to specify the amount of padding.

      Patch by Jackson Woodruff!

      Differential revision: https://reviews.llvm.org/D51494

      llvm-svn: 342545
* [TargetLowering] Android has sincos functions | John Brawn | 2018-09-18 | 1 | -0/+18
      Since Android API version 9 the Android libm has had the sincos functions, so they should be recognised as libcalls and sincos optimisation should be applied.

      Differential Revision: https://reviews.llvm.org/D52025

      llvm-svn: 342471
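      A minimal C sketch (hypothetical, not from the commit) of code that benefits:

        #include <math.h>

        /* With Android libm (API level 9 or later), the sin and cos calls on
           the same argument can now be combined into a single sincos()
           libcall instead of two separate calls. */
        void polar_to_cartesian(double r, double theta, double *x, double *y) {
          *x = r * cos(theta);
          *y = r * sin(theta);
        }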
* Revert "[ARM] Cleanup ARM CGP isSupportedValue" | Volodymyr Sapsai | 2018-09-18 | 1 | -34/+0
      This reverts r342395 as it caused error

      > Argument value type does not match pointer operand type!
      > %0 = atomicrmw volatile xchg i8* %_Value1, i32 1 monotonic, !dbg !25
      > i8in function atomic_flag_test_and_set
      > fatal error: error in backend: Broken function found, compilation aborted!

      on bot http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/

      More details are available at https://reviews.llvm.org/D52080

      llvm-svn: 342431
* [ARM] Cleanup ARM CGP isSupportedValue | Sam Parker | 2018-09-17 | 1 | -0/+34
      isSupportedValue explicitly checked and accepted many types of value, primarily for debugging reasons. Remove most of these checks and do a bit of refactoring now that the pass is more stable. This also enables ZExts to be sources, but this has very little practical benefit at the moment: extend instructions will still be introduced.

      Differential Revision: https://reviews.llvm.org/D52080

      llvm-svn: 342395
* [ARM] Disallow icmp with negative imm and overflow | Sam Parker | 2018-09-17 | 1 | -0/+22
      We allow overflowing instructions if they're decreasing and only used by an unsigned compare. Add the extra condition that the icmp cannot be using a negative immediate.

      Differential Revision: https://reviews.llvm.org/D52102

      llvm-svn: 342392
* Revert r342210 "[ARM] bottom-top mul support in ARMParallelDSP" | Reid Kleckner | 2018-09-14 | 2 | -460/+0
      It causes assertion failures while building Skia for Android in Chromium: https://ci.chromium.org/buildbot/chromium.clang/ToTAndroid/4550

      Reduction forthcoming.

      llvm-svn: 342260
* [ARM] bottom-top mul support in ARMParallelDSP | Sam Parker | 2018-09-14 | 2 | -0/+460
      On failing to find sequences that can be converted into dual macs, try to find sequential 16-bit loads that are used by muls which we can then use smultb, smulbt, smultt with a wide load.

      Differential Revision: https://reviews.llvm.org/D51983

      llvm-svn: 342210
* [ARM] Allow truncs as sources in ARM CGP | Sam Parker | 2018-09-13 | 1 | -1/+0
      We previously only allowed truncs as sinks, but now allow them as sources too. We do this by checking that the result type is the narrow type that we're trying to optimise for.

      Differential Revision: https://reviews.llvm.org/D51978

      llvm-svn: 342141
* [ARM] Fix FixConst for ARMCodeGenPrepare | Sam Parker | 2018-09-13 | 1 | -1/+10
      Part of FixConsts wrongly assumes either an 8- or 16-bit constant, which can result in the wrong constants being generated during promotion.

      Differential Revision: https://reviews.llvm.org/D52032

      llvm-svn: 342140
* ARM: align loops to 4 bytes on Cortex-M3 and Cortex-M4. | Tim Northover | 2018-09-13 | 1 | -0/+49
      The Technical Reference Manuals for these two CPUs state that branching to an unaligned 32-bit instruction incurs an extra pipeline reload penalty. That's bad.

      This also enables the optimization at -Os since it costs on average one byte per loop in return for 1 cycle per iteration, which is pretty good going.

      llvm-svn: 342127
* [ARM] Tighten f64<->f16 conversion requirements | Diogo N. Sampaio | 2018-09-12 | 2 | -0/+29
      Fix missing Requires fields.

      Patch by Bernard Ogden (bogden)

      Reviewers: SjoerdMeijer, javed.absar, t.p.northover

      Reviewed By: t.p.northover

      Differential Revision: https://reviews.llvm.org/D51631

      llvm-svn: 342061
* [ARM] Exchange MAC operands in ARMParallelDSP | Sam Parker | 2018-09-12 | 3 | -0/+737
      SMLAD and SMLALD instructions also come in the form of SMLADX and SMLALDX which perform an exchange on their second operand. To support this, more of the loads in the MAC candidates are compared for sequential access and a boolean value has been added to BinOpChain.

      AddMACCandidate has been refactored into a small pattern matching state machine to reduce the amount of duplicated code, but also to enable the matching to be more flexible. CreateParallelMACPairs now iterates through all the candidates to find parallel ones.

      Differential Revision: https://reviews.llvm.org/D51424

      llvm-svn: 342033
* [ARM] Allow bitcasts in ARMCodeGenPrepare | Sam Parker | 2018-09-12 | 1 | -0/+44
      Allow bitcasts in the use-def chains, treating them as sources.

      Differential Revision: https://reviews.llvm.org/D50758

      llvm-svn: 342032
* [ARM] Add smlald support in ARMParallelDSP | Sam Parker | 2018-09-11 | 3 | -0/+364
      Search from i64 reducing phis, as well as i32, to allow the generation of smlald instructions.

      Differential Revision: https://reviews.llvm.org/D51101

      llvm-svn: 341941
* ARM: fix Thumb2 CodeGen for ldrex with folded frame-index. | Tim Northover | 2018-09-07 | 2 | -0/+121
      Because t2LDREX (& t2STREX) were marked as AddrModeNone, but did allow a FrameIndex operand, rewriteT2FrameIndex asserted. This gives them a proper addressing-mode and tells the rewriter about it so that encodable offsets are exploited and others are rejected.

      Should fix PR38828.

      llvm-svn: 341642
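      A rough C sketch (hypothetical, not the original reproducer) of code that hits this path:

        #include <stdatomic.h>

        /* An atomic read-modify-write on a stack slot produces LDREX/STREX
           whose address operand is a frame index; rewriting that frame index
           is what previously asserted. */
        int probe(void) {
          _Atomic int local = 0;
          return atomic_fetch_add(&local, 1);
        }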
* The initial .text section generated in object files was missing the | Eric Christopher | 2018-09-06 | 1 | -0/+3
      SHF_ARM_PURECODE flag when being built with the -mexecute-only flag. All code sections of an ELF must have the flag set for the final .text section to be execute-only, otherwise the flag gets removed.

      A HasData flag is added to MCSection to aid in the determination that the section is empty. A virtual setTargetSectionFlags is added to MCELFObjectTargetWriter to allow subclasses to set target specific section flags to be added to sections which we then use in the ARM backend to set SHF_ARM_PURECODE.

      Patch by Ivan Lozano!

      Reviewed By: echristo

      Differential Revision: https://reviews.llvm.org/D48792

      llvm-svn: 341593
* [DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x)) | Sanjay Patel | 2018-09-05 | 1 | -0/+92
      This was proposed as an IR transform in D49306, but it was not clearly justifiable as a canonicalization. Here, we only do the transform when the target tells us that sqrt can be lowered with inline code.

      This is the basic case. Some potential enhancements are in the TODO comments:
      1. Generalize the transform for other exponents (allow more than 2 sqrt calcs if that's really cheaper).
      2. If we have less fast-math-flags, generate code to avoid -0.0 and/or INF.
      3. Allow the transform when optimizing/minimizing size (might require a target hook to get that right).

      Note that by default, x86 converts single-precision sqrt calcs into sqrt reciprocal estimate with refinement. That codegen is controlled by CPU attributes and can be manually overridden. We have plenty of test coverage for that already, so I didn't bother to include extra testing for that here. AArch uses its full-precision ops in all cases (not sure if that's the intended behavior or not, but that should also be covered by existing tests).

      Differential Revision: https://reviews.llvm.org/D51630

      llvm-svn: 341481
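      A minimal C sketch (hypothetical, not from the commit) of the pattern being transformed:

        #include <math.h>

        /* Built with fast-math (so the call carries the necessary flags),
           powf(x, 0.25f) can now be emitted as sqrtf(sqrtf(x)) on targets
           that lower sqrt with inline code. */
        float quartic_root(float x) {
          return powf(x, 0.25f);
        }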
* [MinGW] [ARM] Add stubs for potential automatic dllimported variables | Martin Storsjo | 2018-08-31 | 3 | -4/+85
      The runtime pseudo relocations can't handle the ARM format embedded addresses in movw/movt pairs. By using stubs, the potentially dllimported addresses can be touched up by the runtime pseudo relocation framework.

      Differential Revision: https://reviews.llvm.org/D51450

      llvm-svn: 341176
* [CodeGen] emit inline asm clobber list warnings for reserved (cont) | Ties Stuij | 2018-08-30 | 1 | -0/+27
      Summary:
      This is a continuation of https://reviews.llvm.org/D49727. Below the original text, current changes in the comments:

      Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results.

      For example:

        extern int bar(int[]);

        int foo(int i) {
          int a[i]; // VLA
          asm volatile( "mov r7, #1" : : : "r7" );
          return 1 + bar(a);
        }

      Compiled for thumb, this gives:

        $ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb
        ...
        foo:
        .fnstart
        @ %bb.0:                                @ %entry
                .save   {r4, r5, r6, r7, lr}
                push    {r4, r5, r6, r7, lr}
                .setfp  r7, sp, #12
                add     r7, sp, #12
                .pad    #4
                sub     sp, #4
                movs    r1, #7
                add.w   r0, r1, r0, lsl #2
                bic     r0, r0, #7
                sub.w   r0, sp, r0
                mov     sp, r0
                @APP
                mov.w   r7, #1
                @NO_APP
                bl      bar
                adds    r0, #1
                sub.w   r4, r7, #12
                mov     sp, r4
                pop     {r4, r5, r6, r7, pc}
        ...

      r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block.

      This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing list (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now.

      The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter. If we find a reserved register, we print a warning:

        repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm]
              "mov r7, #1"
              ^

      Reviewers: efriedma, olista01, javed.absar

      Reviewed By: efriedma

      Subscribers: eraman, kristof.beyls, llvm-commits

      Differential Revision: https://reviews.llvm.org/D51165

      llvm-svn: 341062
* Fix "Q" and "R" inline assembly template modifiers for big-endian Arm | Florian Hahn | 2018-08-30 | 1 | -0/+10
      Consider the endianness of the target when printing register names. This is in line with the documentation at http://llvm.org/docs/LangRef.html#asm-template-argument-modifiers

      Patch by Jackson Woodruff <jackson.woodruff@arm.com>

      Reviewers: t.p.northover, echristo, javed.absar, efriedma

      Reviewed By: efriedma

      Differential Revision: https://reviews.llvm.org/D49778

      llvm-svn: 341052
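      A small C sketch (hypothetical, not from the commit) of inline assembly that uses these modifiers:

        /* %Q and %R name the registers holding the least and most significant
           32-bit halves of a 64-bit operand, so the register printed for each
           must take the target's endianness into account. */
        unsigned long long swap_words(unsigned long long x) {
          unsigned long long r;
          __asm__("mov %Q0, %R1\n\tmov %R0, %Q1" : "=&r"(r) : "r"(x));
          return r;
        }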
* [GlobalMerge] Fix GlobalMerge on bss external global variables. | Huihui Zhang | 2018-08-30 | 1 | -0/+65
      Summary: Global variables that are external and zero initialized are supposed to be merged with global variables in the bss section rather than the data section.

      Reviewers: efriedma, rengolin, t.p.northover, javed.absar, asl, john.brawn, pcc

      Reviewed By: efriedma

      Subscribers: dmgreen, llvm-commits

      Differential Revision: https://reviews.llvm.org/D51379

      llvm-svn: 341008
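      A minimal C sketch (hypothetical, not from the commit) of the distinction involved:

        /* The zero-initialised globals below are placed in .bss and should
           only be merged with other .bss globals, never with the initialised
           .data global. */
        int counter_a;       /* .bss */
        int counter_b;       /* .bss */
        int threshold = 42;  /* .data */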
* [ARM] Lower llvm.ctlz.i32 to a libcall when clz is not available. | Eli Friedman | 2018-08-22 | 1 | -3/+5
      The inline sequence is very long (about 70 bytes on Thumb1), so it's not really a good idea to inline it, especially when optimizing for size.

      Differential Revision: https://reviews.llvm.org/D47917

      llvm-svn: 340458
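      A minimal C sketch (hypothetical, not from the commit) of code affected by this:

        /* On a core without the CLZ instruction, counting leading zeros now
           becomes a runtime library call (e.g. __clzsi2) instead of a long
           inline bit-twiddling sequence. */
        int leading_zeros(unsigned x) {
          return __builtin_clz(x);   /* result undefined for x == 0 */
        }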
* [ARM] Avoid injecting constant islands in movw+movt pairs on Windows | Martin Storsjo | 2018-08-22 | 1 | -0/+902
      On Windows, movw+movt pairs with relocations are handled with a single relocation that covers them both. Therefore we can't inject anything between these instructions, otherwise the relocation (which in LLVM only is treated as the movw instruction's relocation, while the movt instruction's relocation is dropped) will end up bogus.

      These instructions are bundled up until right before the constant islands pass, making this effectively the only place that can split them apart.

      Differential Revision: https://reviews.llvm.org/D51032

      llvm-svn: 340451
* [ARM] Handle all-ones mask explicitly in targetShrinkDemandedConstant. | Eli Friedman | 2018-08-22 | 1 | -0/+35
      This avoids a potential infinite loop setting and unsetting bits in the mask.

      Reduced from a failure on the polly-aosp bot.

      Differential Revision: https://reviews.llvm.org/D51066

      llvm-svn: 340446
* [ARM] Rotated operand patterns for *xtb16 | Sam Parker | 2018-08-22 | 1 | -0/+143
      Add intrinsic isel patterns for sxtb16, sxtab16, uxtb16 and uxtab16 so that they can perform a ror.

      Differential Revision: https://reviews.llvm.org/D51034

      llvm-svn: 340405
* [DAGCombiner] Reduce load widths of shifted masks | Sam Parker | 2018-08-21 | 1 | -25/+514
      During combining, ReduceLoadWidth is used to combine AND nodes that mask loads into narrow loads. This patch allows the mask to be a shifted constant. This results in a narrow load which is then left shifted to compensate for the new offset.

      Differential Revision: https://reviews.llvm.org/D50432

      llvm-svn: 340261
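      A minimal C sketch (hypothetical, not from the commit) of the pattern this combine targets:

        #include <stdint.h>

        /* The mask 0xff00 is a shifted byte mask, so the AND of the wide load
           can now become a narrow load at an adjusted offset followed by a
           left shift, instead of a full-width load plus mask. */
        uint32_t second_byte(const uint32_t *p) {
          return *p & 0xff00u;
        }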
* [SelectionDAG] Improve the legalisation lowering of UMULO. | Eli Friedman | 2018-08-16 | 2 | -0/+279
      There is no way in the universe that doing a full-width division in software will be faster than doing overflowing multiplication in software in the first place, especially given that this same full-width multiplication needs to be done anyway.

      This patch replaces the previous implementation with a direct lowering into an overflowing multiplication algorithm based on half-width operations.

      Correctness of the algorithm was verified by exhaustively checking the output of this algorithm for overflowing multiplication of 16 bit integers against an obviously correct widening multiplication. Barring any oversights introduced by porting the algorithm to DAG, confidence in correctness of this algorithm is extremely high.

      Following table shows the change in both t = runtime and s = space. The change is expressed as a multiplier of original, so anything under 1 is “better” and anything above 1 is worse.

      +-------+-----------+-----------+-------------+-------------+
      | Arch  | u64*u64 t | u64*u64 s | u128*u128 t | u128*u128 s |
      +-------+-----------+-----------+-------------+-------------+
      | X64   |     -     |     -     |     ~0.5    |    ~0.64    |
      | i686  |    ~0.5   |  ~0.6666  |    ~0.05    |     ~0.9    |
      | armv7 |     -     |   ~0.75   |      -      |     ~1.4    |
      +-------+-----------+-----------+-------------+-------------+

      Performance numbers have been collected by running overflowing multiplication in a loop under `perf` on two x86_64 (one Intel Haswell, other AMD Ryzen) based machines. Size numbers have been collected by looking at the size of function containing an overflowing multiply in a loop.

      All in all, it can be seen that both performance and size has improved except in the case of armv7 where code size has regressed for 128-bit multiply. u128*u128 overflowing multiply on 32-bit platforms seem to benefit from this change a lot, taking only 5% of the time compared to original algorithm to calculate the same thing.

      The final benefit of this change is that LLVM is now capable of lowering the overflowing unsigned multiply for integers of any bit-width as long as the target is capable of lowering regular multiplication for the same bit-width. Previously, 128-bit overflowing multiply was the widest possible.

      Patch by Simonas Kazlauskas!

      Differential Revision: https://reviews.llvm.org/D50310

      llvm-svn: 339922
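      A minimal C sketch (hypothetical, not from the commit) of an overflowing multiply that benefits on 32-bit ARM:

        #include <stdbool.h>
        #include <stdint.h>

        /* A 64-bit overflowing multiply on a 32-bit target now lowers to a
           half-width multiplication algorithm rather than checking the result
           with a full-width software division. */
        bool mul_would_overflow(uint64_t a, uint64_t b, uint64_t *out) {
          return __builtin_mul_overflow(a, b, out);
        }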
* [ARM] Ignore GEPs in ARMCodeGenPrepare | Sam Parker | 2018-08-16 | 1 | -0/+51
      While searching through the use-def tree, ignore GetElementPtrInst instructions because they don't need promoting and neither do their indices. Otherwise, the wide indices prevent the transformation from happening.

      Differential Revision: https://reviews.llvm.org/D50762

      llvm-svn: 339871
* [ARM] Allow zext in ARMCodeGenPrepare | Sam Parker | 2018-08-16 | 2 | -13/+18
      Treat zext instructions as roots, like we do for truncs.

      Differential Revision: https://reviews.llvm.org/D50759

      llvm-svn: 339868
* [ARM] Allow signed icmps in ARMCodeGenPrepare | Sam Parker | 2018-08-16 | 2 | -1/+100
      Originally committed in r339755 which was reverted in r339806 due to an asan issue. The issue was caused by my assumption that operands to a CallInst mapped to the FunctionType Params. CallInsts are now handled by iterating over their ArgOperands instead of Operands.

      Original Message:

      Treat signed icmps as 'sinks', allowing them to be in the use-def tree, enabling more promotions to be performed. As a sink, any promoted incoming values need to be truncated before being used by the signed icmp.

      Differential Revision: https://reviews.llvm.org/D50067

      llvm-svn: 339858
* Revert "[ARM] Allow signed icmps in ARMCodeGenPrepare" | Vitaly Buka | 2018-08-15 | 1 | -100/+0
      use-after-poison in check-llvm under asan

      This reverts commit r339755.

      llvm-svn: 339806
* [ARM] TypeSize lower bound for ARMCodeGenPrepare | Sam Parker | 2018-08-15 | 1 | -0/+24
      We only try to promote types which are smaller than 16 bits, but we also need to check that the type is not less than 8 bits.

      Differential Revision: https://reviews.llvm.org/D50769

      llvm-svn: 339770
* [ARM] Allow signed icmps in ARMCodeGenPrepare | Sam Parker | 2018-08-15 | 1 | -0/+100
      Treat signed icmps as 'sinks', allowing them to be in the use-def tree, enabling more promotions to be performed. As a sink, any promoted incoming values need to be truncated before being used by the signed icmp.

      Differential Revision: https://reviews.llvm.org/D50067

      llvm-svn: 339755
* [ARM] Allow pointer values in ARMCodeGenPrepare | Sam Parker | 2018-08-15 | 3 | -178/+297
      Add pointers to the list of allowed types, but don't try to promote them. Also fixed a bug with the promotion of undef values, so a new value is now created instead of mutating in place. We also now only promote if there's an instruction in the use-def chains other than the icmp, sinks and sources.

      Differential Revision: https://reviews.llvm.org/D50054

      llvm-svn: 339754
* [ARM] Added FP16 VREV Vector Intrinsic CodeGen support | Luke Geeson | 2018-08-13 | 1 | -13/+11
      llvm-svn: 339546
* [ARM] Adjust AND immediates to make them cheaper to select. | Eli Friedman | 2018-08-10 | 5 | -14/+10
      LLVM normally prefers to minimize the number of bits set in an AND immediate, but that doesn't always match the available ARM instructions. In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer a two-instruction sequence movs+ands or movs+bics.

      Some potential improvements outlined in ARMTargetLowering::targetShrinkDemandedConstant, but seems to work pretty well already.

      The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX instruction due to a larger-than-expected mask. (It's orthogonal, in some sense, but as far as I can tell it's either impossible or nearly impossible to reproduce the bug without this change.)

      According to my testing, this seems to consistently improve codesize by a small amount by forming bic more often for ISD::AND with an immediate.

      Differential Revision: https://reviews.llvm.org/D50030

      llvm-svn: 339472
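      A minimal C sketch (hypothetical, not from the commit) of masks affected by this preference:

        /* In Thumb1 the first mask is cheaper as uxtb, and the second as a
           movs+bics pair, even though neither choice minimises the number of
           bits set in the AND immediate. */
        unsigned low_byte(unsigned x)       { return x & 0xffu;  }
        unsigned clear_low_bits(unsigned x) { return x & ~0x1fu; }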