summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/ARM
Commit message (Collapse)AuthorAgeFilesLines
...
* [SelectionDAG] Add fcmp UNDEF handling to SelectionDAG::FoldSetCCSimon Pilgrim2019-04-053-100/+112
| | | | | | | | | | Second half of PR40800, this patch adds DAG undef handling to fcmp instructions to match the behavior in llvm::ConstantFoldCompareInstruction, this permits constant folding of vector comparisons where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts etc.). This involves a lot of tweaking to reduced tests as bugpoint loves to reduce fcmp arguments to undef........ Differential Revision: https://reviews.llvm.org/D60006 llvm-svn: 357765
* [SelectionDAG] Compute known bits of CopyFromRegPiotr Sobczak2019-04-051-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if the virtual reg used has one def only. This can be particularly useful when calling isBaseWithConstantOffset() with the ISD::CopyFromReg argument, as more optimizations may get enabled in the result. Also add a missing truncation on X86, found by testing of this patch. Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa Reviewers: bogner, craig.topper, RKSimon Reviewed By: RKSimon Subscribers: lebedev.ri, nemanjai, jvesely, nhaehnle, javed.absar, jsji, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59535 llvm-svn: 357745
* [ARM GlobalISel] Support DBG_VALUEDiana Picus2019-04-042-0/+113
| | | | | | Make sure we can map and select DBG_VALUE. llvm-svn: 357681
* Revert r357452 - 'SimplifyCFG SinkCommonCodeFromPredecessors: Also sink ↵David L. Jones2019-04-041-2/+2
| | | | | | | | | | function calls without used results (PR41259)' This revision causes tests to fail under ASAN. Since the cause of the failures is not clear (could be ASAN, could be a Clang bug, could be a bug in this revision), the safest course of action seems to be to revert while investigating. llvm-svn: 357667
* [DAGCombiner] loosen restrictions for moving shuffles after vector binopSanjay Patel2019-04-032-10/+10
| | | | | | | | | | | | There are 3 changes to make this correspond to the same transform in instcombine: 1. Remove the legality check - we can't create anything less legal than we started with. 2. Ease the use restriction, so we only bail out if both operands have >1 use. 3. Ease the use restriction for binops with a repeated operand (eg, mul x, x). As discussed in D60150, there's a scalarization opportunity that will be made easier by allowing this transform more generally. llvm-svn: 357580
* SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without ↵Hans Wennborg2019-04-021-2/+2
| | | | | | | | | | | | | | | | used results (PR41259) The code was previously checking that candidates for sinking had exactly one use or were a store instruction (which can't have uses). This meant we could sink call instructions only if they had a use. That limitation seemed a bit arbitrary, so this patch changes it to "instruction has zero or one use" which seems more natural and removes the need to special-case stores. Differential revision: https://reviews.llvm.org/D59936 llvm-svn: 357452
* [ARM] Optimize expressions like "return x != 0;" for Thumb1.Eli Friedman2019-04-021-2/+46
| | | | | | | | | | | | | There's an existing optimization for x != C, but somehow it was missing a special case for 0. While I'm here, also cleaned up the code/comments a bit: the second value produced by the MERGE_VALUES was actually dead, since a CMOV only produces one result. Differential Revision: https://reviews.llvm.org/D59616 llvm-svn: 357437
* [ARM] Don't try to create "push {r12, lr}" in Thumb1 at -Oz.Eli Friedman2019-04-011-22/+26
| | | | | | | | | | | | It's a little tricky to make this issue show up because prologue/epilogue emission normally likes to push at least two registers... but it doesn't when lr is force-spilled due to function length. Not sure if that really makes sense, but I decided not to touch it for now. Differential Revision: https://reviews.llvm.org/D59385 llvm-svn: 357436
* [ARM] Regenerate execute-only float comparison testsSimon Pilgrim2019-03-291-25/+63
| | | | | | Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) llvm-svn: 357293
* [DAGCombine] Prune unnused nodes.Nirav Dave2019-03-292-8/+5
| | | | | | | | | | | | | | | | | | | Summary: Nodes that have no uses are eventually pruned when they are selected from the worklist. Record nodes newly added to the worklist or DAG and perform pruning after every combine attempt. Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight Reviewed By: jyknight Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58070 llvm-svn: 357283
* [ARM] Regenerate vector comparison testsSimon Pilgrim2019-03-293-152/+544
| | | | | | Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) llvm-svn: 357281
* [ARM GlobalISel] Run regbankselect test for Thumb. NFCIDiana Picus2019-03-281-1/+2
| | | | | | | This should just work, since ARM mode and Thumb2 mode are at the same level of support now and should map the same to GPR and FPR. llvm-svn: 357159
* [ARM GlobalISel] Fix G_STORE with s1Diana Picus2019-03-282-7/+44
| | | | | | | G_STORE for 1-bit values uses a STRBi12, which stores the whole byte. Zero out the undefined bits before writing. llvm-svn: 357154
* [ARM GlobalISel] Fix selection of G_SELECTDiana Picus2019-03-283-6/+6
| | | | | | | | | | | G_SELECT uses a 1-bit scalar for the condition, and is currently implemented with a plain CMPri against 0. This means that values such as 0x1110 are interpreted as true, when instead the higher bits should be treated as undefined and therefore ignored. Replace the CMPri with a TSTri against 0x1, which performs an implicit AND, yielding the expected result. llvm-svn: 357153
* Revert r356996 "[DAG] Avoid smart constructor-based dangling nodes."Nirav Dave2019-03-273-3/+5
| | | | | | | This patch appears to trigger very large compile time increases in halide builds. llvm-svn: 357116
* [ARM] Don't confuse the scheduler for very large VLDMDIA etc.Eli Friedman2019-03-271-0/+31
| | | | | | | | | | | | | ARMBaseInstrInfo::getNumLDMAddresses is making bad assumptions about the memory operands of load and store-multiple operations. This doesn't really fix the problem properly, but it's enough to prevent crashing, at least. Fixes https://bugs.llvm.org/show_bug.cgi?id=41231 . Differential Revision: https://reviews.llvm.org/D59834 llvm-svn: 357109
* [DAG] Avoid smart constructor-based dangling nodes.Nirav Dave2019-03-263-5/+3
| | | | | | | | | | | | | | | Various SelectionDAG non-combine operations (e.g. the getNode smart constructor and legalization) may leave dangling nodes by applying optimizations or not fully pruning unused result values. This can result in nodes that are never added to the worklist and therefore can not be pruned. Add a node inserter as the current node deleter to make sure such nodes have the chance of being pruned. Many minor changes, mostly positive. llvm-svn: 356996
* [ARM GlobalISel] 64-bit memops should be alignedDiana Picus2019-03-252-31/+89
| | | | | | | | | | We currently use only VLDR/VSTR for all 64-bit loads/stores, so the memory operands must be word-aligned. Mark aligned operations as legal and narrow non-aligned ones to 32 bits. While we're here, also mark non-power-of-2 loads/stores as unsupported. llvm-svn: 356872
* [ARM] Don't form "ands" when it isn't scheduled correctly.Eli Friedman2019-03-221-0/+54
| | | | | | | | | | | | In r322972/r323136, the iteration here was changed to catch cases at the beginning of a basic block... but we accidentally deleted an important safety check. Restore that check to the way it was. Fixes https://bugs.llvm.org/show_bug.cgi?id=41116 Differential Revision: https://reviews.llvm.org/D59680 llvm-svn: 356809
* [AArch64, ARM] Add support for Exynos M5Evandro Menezes2019-03-221-0/+28
| | | | | | Add Exynos M5 support and test cases. llvm-svn: 356793
* [ARM] Eliminate redundant "mov rN, sp" instructions in Thumb1.Eli Friedman2019-03-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This takes sequences like "mov r4, sp; str r0, [r4]", and optimizes them to something like "str r0, [sp]". For regular stack variables, this optimization was already implemented: we lower loads and stores using frame indexes, which are expanded later. However, when constructing a call frame for a call with more than four arguments, the existing optimization doesn't apply. We need to use stores which are actually relative to the current value of sp, and don't have an associated frame index. This patch adds a special case to handle that construct. At the DAG level, this is an ISD::STORE where the address is a CopyFromReg from SP (plus a small constant offset). This applies only to Thumb1: in Thumb2 or ARM mode, a regular store instruction can access SP directly, so the COPY gets eliminated by existing code. The change to ARMDAGToDAGISel::SelectThumbAddrModeSP is a related cleanup: we shouldn't pretend that it can select anything other than frame indexes. Differential Revision: https://reviews.llvm.org/D59568 llvm-svn: 356601
* RegAllocFast: Remove early selection loop, the spill calculation will report ↵Matt Arsenault2019-03-192-1036/+1036
| | | | | | | | | | | | | | | | cost 0 anyway for free regs The 2nd loop calculates spill costs but reports free registers as cost 0 anyway, so there is little benefit from having a separate early loop. Surprisingly this is not NFC, as many register are marked regDisabled so the first loop often picks up later registers unnecessarily instead of the first one available in the allocation order... Patch by Matthias Braun llvm-svn: 356499
* [ARM] Add MachineVerifier logic for some Thumb1 instructions.Eli Friedman2019-03-151-0/+22
| | | | | | | | | | | tMOVr and tPUSH/tPOP/tPOP_RET have register constraints which can't be expressed in TableGen, so check them explicitly. I've unfortunately run into issues with both of these recently; hopefully this saves some time for someone else in the future. Differential Revision: https://reviews.llvm.org/D59383 llvm-svn: 356303
* [ARM] Remove EarlyCSE from backendSam Parker2019-03-152-7/+2
| | | | | | | | | | | There is an issue with early CSE hitting an assert, so temporarily remove the pass from the Arm backend. Bug: https://bugs.llvm.org/show_bug.cgi?id=41081 Differential Revision: https://reviews.llvm.org/D59410 llvm-svn: 356259
* [ARM] Remove icmp undef from reduced testsSimon Pilgrim2019-03-152-10/+13
| | | | | | | | Pre-commit for D59363 (Add icmp UNDEF handling to SelectionDAG::FoldSetCC) Approved by @efriedma (Eli Friedman) llvm-svn: 356252
* [ARM][ParallelDSP] Disable for big-endianSam Parker2019-03-155-0/+11
| | | | | | | | | Bail early when we don't have a preheader and also if the target is big endian because it's written with only little endian in mind! Differential Revision: https://reviews.llvm.org/D59368 llvm-svn: 356243
* [NFC][ARM] Update testSam Parker2019-03-141-5/+5
| | | | | | Change some regex to handle commutable instructions. llvm-svn: 356159
* ARM: Add ImmArg to intrinsicsMatt Arsenault2019-03-142-26/+0
| | | | | | | | | | I found these by asserting in clang for any GCCBuiltin that doesn't require mangling and requires a constant for the builtin. This means that intrinsics are missing which don't use GCCBuiltin, don't have builtins defined in clang, or were missing the constant annotation in the builtin definition. llvm-svn: 356144
* [ARM][ParallelDSP] Enable multiple uses of loadsSam Parker2019-03-143-0/+469
| | | | | | | | | | | | | | | | When choosing whether a pair of loads can be combined into a single wide load, we check that the load only has a sext user and that sext also only has one user. But this can prevent the transformation in the cases when parallel macs use the same loaded data multiple times. To enable this, we need to fix up any other uses after creating the wide load: generating a trunc and a shift + trunc pair to recreate the narrow values. We also need to keep a record of which loads have already been widened. Differential Revision: https://reviews.llvm.org/D59215 llvm-svn: 356132
* [ARM] Run ARMParallelDSP in the IRPasses phaseSam Parker2019-03-144-69/+279
| | | | | | | | | Run EarlyCSE before ParallelDSP and do this in the backend IR opt phase. Differential Revision: https://reviews.llvm.org/D59257 llvm-svn: 356130
* [DAGCombiner] If a TokenFactor would be merged into its user, consider the ↵Nirav Dave2019-03-136-24/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | user later. Summary: A number of optimizations are inhibited by single-use TokenFactors not being merged into the TokenFactor using it. This makes we consider if we can do the merge immediately. Most tests changes here are due to the change in visitation causing minor reorderings and associated reassociation of paired memory operations. CodeGen tests with non-reordering changes: X86/aligned-variadic.ll -- memory-based add folded into stored leaq value. X86/constant-combiners.ll -- Optimizes out overlap between stores. X86/pr40631_deadstore_elision -- folds constant byte store into preceding quad word constant store. Reviewers: RKSimon, craig.topper, spatel, efriedma, courbet Reviewed By: courbet Subscribers: dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, eraman, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59260 llvm-svn: 356068
* [SDAG] Expand pow2 mulo using shiftsNikita Popov2019-03-121-10/+11
| | | | | | | | | | | Expand MULO with constant power of two operand into a shift. The overflow is checked with (x << shift) >> shift == x, where the right shift will be logical for umulo and arithmetic for smulo (with exception for multiplications by signed_min). Differential Revision: https://reviews.llvm.org/D59041 llvm-svn: 355937
* [ARM][NFC] Delete original smlad testsSam Parker2019-03-1219-2106/+0
| | | | | | Because I don't understand svn. llvm-svn: 355908
* [ARM][NFC] Move smlad testsSam Parker2019-03-1219-0/+2106
| | | | | | Created a test/CodeGen/ARM/ParallelDSP folder. llvm-svn: 355907
* [ARM] Use non-constant operand in umulo-32.ll; NFCNikita Popov2019-03-091-24/+9
| | | | | | | | | | Currently the store+load is folded and both operands of the umulo end up being constants. To avoid this getting folded away entirely, make sure at least one operand is non-constant. Also remove some allocas which don't seem relevant to the test. llvm-svn: 355776
* [ARM] Generate test checks for umulo-32.ll; NFCNikita Popov2019-03-091-9/+45
| | | | | | | The second test case is going to be changed by D59041, so generate full baseline checks. llvm-svn: 355775
* [LSR] Attempt to increase the accuracy of LSR's setup costDavid Green2019-03-071-0/+100
| | | | | | | | | | | | | | | | | | | | | | | | | | In some loops, we end up generating loop induction variables that look like: {(-1 * (zext i16 (%i0 * %i1) to i32))<nsw>,+,1} As opposed to the simpler: {(zext i16 (%i0 * %i1) to i32),+,-1} i.e we count up from -limit to 0, not the simpler counting down from limit to 0. This is because the scores, as LSR calculates them, are the same and the second is filtered in place of the first. We end up with a redundant SUB from 0 in the code. This patch tries to make the calculation of the setup cost a little more thoroughly, recursing into the scev members to better approximate the setup required. The cost function for comparing LSR costs is: return std::tie(C1.NumRegs, C1.AddRecCost, C1.NumIVMuls, C1.NumBaseAdds, C1.ScaleCost, C1.ImmCost, C1.SetupCost) < std::tie(C2.NumRegs, C2.AddRecCost, C2.NumIVMuls, C2.NumBaseAdds, C2.ScaleCost, C2.ImmCost, C2.SetupCost); So this will only alter results if none of the other variables turn out to be different. Differential Revision: https://reviews.llvm.org/D58770 llvm-svn: 355597
* [ARM] Fix select_cc lowering for fp16Oliver Stannard2019-03-051-0/+676
| | | | | | | | | | | When lowering a select_cc node where the true and false values are of type f16, we can't use a general conditional move because the FP16 instructions do not support conditional execution. Instead, we must ensure that the condition code is one of the four supported by the VSEL instruction. Differential revision: https://reviews.llvm.org/D58813 llvm-svn: 355385
* [ARM] Fix selection of VLDR.16 instruction with imm offsetOliver Stannard2019-03-041-0/+105
| | | | | | | | | | | | The isScaledConstantInRange function takes upper and lower bounds which are checked after dividing by the scale, so the bounds checks for half, single and double precision should all be the same. Previously, we had wrong bounds checks for half precision, so selected an immediate the instructions can't actually represent. Differential revision: https://reviews.llvm.org/D58822 llvm-svn: 355305
* [ARM] Fix FP16 stack loads/stores for Thumb2 with frame pointerOliver Stannard2019-03-011-0/+22
| | | | | | | | | | The new addressing mode added for the v8.2A FP16 instructions uses bit 8 of the immediate to encode the sign of the offset, like the other FP loads/stores, so need to be treated the same way. Differential revision: https://reviews.llvm.org/D58816 llvm-svn: 355201
* [ARM] Consider undefined-on-NaN conditions in checkVSELConstraintsOliver Stannard2019-03-011-34/+251
| | | | | | | | | | This function was not checking for the condition code variants which are undefined if either input is NaN, so we were missing selection of the VSEL instruction in some cases when using -fno-honor-nans or -ffast-math. Differential revision: https://reviews.llvm.org/D58812 llvm-svn: 355199
* [ARM GlobalISel] Support G_CTLZ for Thumb2Diana Picus2019-03-012-0/+34
| | | | | | Same as ARM mode but with different opcode. llvm-svn: 355191
* [ARM GlobalISel] Check target flags in test. NFCIDiana Picus2019-03-012-12/+12
| | | | | | | | There was a time when we couldn't dump target-specific flags such as arm-sbrel etc, so the tests didn't check for them. We can now be more specific in our tests. llvm-svn: 355189
* [ARM GlobalISel] Support global variables for Thumb2Diana Picus2019-02-288-26/+660
| | | | | | | | | | | | | | | | | | | Add the same level of support as for ARM mode (i.e. still no TLS support). In most cases, it is sufficient to replace the opcodes with the t2-equivalent, but there are some idiosyncrasies that I decided to preserve because I don't understand the full implications: * For ARM we use LDRi12 to load from constant pools, but for Thumb we use t2LDRpci (I'm not sure if the ideal would be to use t2LDRi12 for Thumb as well, or to use LDRcp for ARM). * For Thumb we don't have an equivalent for MOV|LDRLIT_ga_pcrel_ldr, so we have to generate MOV|LDRLIT_ga_pcrel plus a load from GOT. The tests are in separate files because they're hard enough to read even without doubling the number of checks. llvm-svn: 355077
* [ARM] Add Cortex-M35PLuke Cheeseman2019-02-262-0/+28
| | | | | | | | | | - Add LLVM backend support for Cortex-M35P - Documentation can be found at https://developer.arm.com/products/processors/cortex-m/cortex-m35p Differentail Revision: https://reviews.llvm.org/D57763 llvm-svn: 354868
* [ARM] Add some more missing T1 opcodes for the peephole optimisierDavid Green2019-02-251-1/+0
| | | | | | | | | | | This adds a few extra Thumb1 opcodes to improve the peephole opimisers ability to remove redundant cmp instructions. tADC and tSBC require a small fixup to prevent MOVS being moved past the instruction, giving the wrong flags. Differential Revision: https://reviews.llvm.org/D58281 llvm-svn: 354791
* Fixed typos in tests: s/CHEKC/CHECK/Dmitri Gribenko2019-02-252-2/+2
| | | | | | | | | | | | Reviewers: ilya-biryukov Subscribers: nemanjai, javed.absar, jsji, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D58611 llvm-svn: 354785
* Fixed typos in tests: s/CEHCK/CHECK/Dmitri Gribenko2019-02-252-2/+2
| | | | | | | | | | | | Reviewers: ilya-biryukov Subscribers: sanjoy, sdardis, javed.absar, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58608 llvm-svn: 354781
* [ARM] Make fullfp16 instructions not conditionalisable.Simon Tatham2019-02-251-0/+100
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | More or less all the instructions defined in the v8.2a full-fp16 extension are defined as UNPREDICTABLE if you put them in an IT block (Thumb) or use with any condition other than AL (ARM). LLVM didn't know that, and was happy to conditionalise them. In order to force these instructions to count as not predicable, I had to make a small Tablegen change. The code generation back end mostly decides if an instruction was predicable by looking for something it can identify as a predicate operand; there's an isPredicable bit flag that overrides that check in the positive direction, but nothing that overrides it in the negative direction. (I considered the alternative approach of actually removing the predicate operand from those instructions, but thought that it would be more painful overall for instructions differing only in data type to have different shapes of operand list. This way, the only code that has to notice the difference is the if-converter.) So I've added an isUnpredicable bit alongside isPredicable, and set that bit on the right subset of FP16 instructions, and also on the VSEL, VMAXNM/VMINNM and VRINT[ANPM] families which should be unpredicable for all data types. I've included a couple of representative regression tests, both of which previously caused an fp16 instruction to be conditionalised in ARM state and (with -arm-no-restrict-it) to be put in an IT block in Thumb. Reviewers: SjoerdMeijer, t.p.northover, efriedma Reviewed By: efriedma Subscribers: jdoerfert, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57823 llvm-svn: 354768
* [ARM] Add some missing thumb1 opcodes to enable peephole optimisation of CMPsDavid Green2019-02-221-2/+1
| | | | | | | | | | | | This adds a number of missing Thumb1 opcodes so that the peephole optimiser can remove redundant CMP instructions. Reapplying this after the first attempt broke non-thumb1 code as the t2ADDri instruction can be used with frame indices. In thumb1 we use tADDframe. Differential Revision: https://reviews.llvm.org/D57833 llvm-svn: 354667
OpenPOWER on IntegriCloud