path: root/llvm/test/CodeGen/ARM
Commit message / Author / Age / Files / Lines
...
* Elide argument copies during instruction selection (Reid Kleckner, 2017-03-01, 1 file, -0/+61)

  Summary:
  Avoids tons of prologue boilerplate when arguments are passed in memory and left in memory. This can happen in a debug build or in a release build when an argument alloca is escaped. This will dramatically affect the code size of x86 debug builds, because X86 fast isel doesn't handle arguments passed in memory at all. It only handles the x86_64 case of up to 6 basic register parameters.

  This is implemented by analyzing the entry block before ISel to identify copy elision candidates. A copy elision candidate is an argument that is used to fully initialize an alloca before any other possibly escaping uses of that alloca. If an argument is a copy elision candidate, we set a flag on the InputArg. If the target generates loads from a fixed stack object that matches the size and alignment requirements of the alloca, the SelectionDAG builder will delete the stack object created for the alloca and replace it with the fixed stack object. The load is left behind to satisfy any remaining uses of the argument value. The store is now dead and is therefore elided. The fixed stack object is also marked as mutable, as it may now be modified by the user, and it would be invalid to rematerialize the initial load from it.

  Supersedes D28388
  Fixes PR26328

  Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans
  Subscribers: igorb, llvm-commits
  Differential Revision: https://reviews.llvm.org/D29668
  llvm-svn: 296683
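  As a hedged illustration (the function below is hypothetical, not from the commit), the IR shape of a copy elision candidate: the argument's only use before any escape is a store that fully initializes its alloca, so the incoming argument's fixed stack object can stand in for the alloca and the store becomes dead:

    define void @f(i32 %x) {
    entry:
      ; %x fully initializes %x.addr before any escaping use, so the
      ; alloca can be replaced by %x's fixed incoming-argument stack
      ; object, making this store dead.
      %x.addr = alloca i32, align 4
      store i32 %x, i32* %x.addr, align 4
      call void @escape(i32* %x.addr)
      ret void
    }

    declare void @escape(i32*)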
* [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine (Artur Pilipenko, 2017-03-01, 3 files, -26/+9)

  Resubmit r295336 after the bug with non-zero offset patterns on BE targets is fixed (r296336).

  Support {a|s}ext, {a|z|s}ext load nodes as part of load combine patterns.

  Reviewed By: filcab
  Differential Revision: https://reviews.llvm.org/D29591
  llvm-svn: 296651
* [ARM] GlobalISel: Lower call params that need extensions (Diana Picus, 2017-03-01, 1 file, -17/+43)

  Lower i1, i8 and i16 call parameters by extending them before storing them on the stack. Also make sure we encode the correct, extended size in the corresponding memory operand, and that we compute the correct stack size in the end.

  The latter is a bit more complicated because we used to compute the stack size in the getStackAddress method, based on the Size and Offset of the parameters. However, if the last parameter is sign extended, we'd be using the wrong, non-extended size, and we'd end up with a smaller stack than we need to hold the extended value. Instead of hacking this up based on the value of Size in getStackAddress, we move our stack size handling logic to assignArg, where we have access to the CCState which knows everything we could possibly want to know about the stack. This way we don't need to duplicate any knowledge or resort to any ugly hacks.

  On the same occasion, update the IRTranslator test to check the sizes of the stores everywhere, not just for sign extended parameters.

  llvm-svn: 296631
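  As a concrete (hypothetical) example of the kind of call this handles: under AAPCS the fifth argument below does not fit in r0-r3, so it goes on the stack and must be sign-extended to 32 bits first, with the memory operand recording the extended 4-byte size:

    declare void @g(i32, i32, i32, i32, i8 signext)

    define void @caller(i8 %b) {
    entry:
      ; r0-r3 take the first four values; %b is sign-extended to i32
      ; before being stored to its stack slot.
      call void @g(i32 0, i32 1, i32 2, i32 3, i8 signext %b)
      ret void
    }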
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled (Nirav Dave, 2017-02-28, 6 files, -102/+81)

  Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

  * Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates the non-interfering loads/stores from the store-merging logic.

  When merging stores, we search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output CodeGen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which require more expensive constant generation).

  Some minor peephole optimizations deal with improved SubDAG shapes (listed below).

  Additional Minor Changes:
  1. Finishes removing unused AliasLoad code.
  2. Unifies the chain aggregation in the merged stores across code paths.
  3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits.
  4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
  5. Removes chain dependencies of memory operations on CopyFromReg nodes, as these are captured by data dependence.
  6. Forwards load-store values through TokenFactors containing {CopyToReg,CopyFromReg} values.
  7. Adds a peephole to convert a buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll).
  8. Restricts store merging for the ARM target to 32-bit, as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.

  This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes, as memory operations are now reorderable, improving load-store forwarding.

  One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll. Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values, we must create another local store. A similar transformation happens before SelectionDAG as well.

  Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
  llvm-svn: 296476
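  A minimal sketch (hypothetical, not one of the commit's tests) of the consecutive-store shape the merge-candidate search looks for; once the chain analysis proves the four byte stores independent, they can be merged into one 32-bit store:

    define void @merge(i8* %p) {
    entry:
      %p1 = getelementptr i8, i8* %p, i32 1
      %p2 = getelementptr i8, i8* %p, i32 2
      %p3 = getelementptr i8, i8* %p, i32 3
      ; Four adjacent byte stores of constants: a merge candidate once
      ; parallelism through the chain subgraph is established.
      store i8 0, i8* %p
      store i8 0, i8* %p1
      store i8 0, i8* %p2
      store i8 0, i8* %p3
      ret void
    }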
* [ARM] GlobalISel: Lower i32 and fp call parameters on the stack (Diana Picus, 2017-02-28, 1 file, -9/+46)

  Lower i32, float and double parameters that need to live on the stack. This boils down to creating some G_GEPs starting from the stack pointer and storing the values there. During the process we also keep track of the stack size and use the final value in the ADJCALLSTACKDOWN/UP instructions.

  We currently assert for smaller types, since they usually require extensions. They will be handled in a separate patch.

  llvm-svn: 296473
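  A small (hypothetical) example of a call whose trailing arguments must live on the stack, assuming the soft-float AAPCS used by these tests: the first four i32 values occupy r0-r3, and the lowering described above emits G_GEPs off the stack pointer plus stores for the remaining two arguments:

    declare void @h(i32, i32, i32, i32, i32, float)

    define void @caller() {
    entry:
      ; i32 5 and float 1.0 are stored relative to sp via G_GEP/G_STORE.
      call void @h(i32 1, i32 2, i32 3, i32 4, i32 5, float 1.0)
      ret void
    }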
* [ARM] GlobalISel: Select 32-bit G_CONSTANT (Diana Picus, 2017-02-28, 1 file, -0/+19)

  Put it into a register by means of a MOVi.

  llvm-svn: 296471
* [ARM] GlobalISel: Add mapping for G_CONSTANT (Diana Picus, 2017-02-28, 1 file, -0/+18)

  Like G_FRAME_INDEX, G_CONSTANT has one register operand and one non-register operand.

  llvm-svn: 296469
* [ARM] GlobalISel: Legalize 32-bit constants (Diana Picus, 2017-02-28, 1 file, -0/+20)

  llvm-svn: 296468
* [ARM] GlobalISel: Select G_GEP (Diana Picus, 2017-02-28, 1 file, -0/+29)

  At this point, G_GEP is just an add, so we treat it exactly like a G_ADD.

  llvm-svn: 296462
* [ARM] GlobalISel: Add reg bank mapping for G_GEP (Diana Picus, 2017-02-28, 1 file, -0/+27)

  This should be the same as the mapping for G_ADD etc.

  llvm-svn: 296455
* [ARM] GlobalISel: Legalize G_GEP with 32-bit offsets (Diana Picus, 2017-02-28, 1 file, -0/+27)

  At the moment we're only interested in GEPs for putting call parameters on the stack, so we'll stick to 32-bit offsets.

  llvm-svn: 296452
* [CGP] Split some critical edges coming out of indirect branches (Michael Kuperstein, 2017-02-28, 1 file, -0/+1)

  Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the *direct* branches. See the sketch below.

  This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit ~100 defs of registers containing constants in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about 100 constants to stack.

  The end result is that a clang-compiled python interpreter can be about ~2.5x slower on a simple python reduction loop than a gcc-compiled interpreter.

  Differential Revision: https://reviews.llvm.org/D29916
  llvm-svn: 296416
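  A minimal (hypothetical) IR shape with the kind of edge this targets: the indirectbr edge into %op2 is critical because %op2 also has a direct-branch predecessor, so the pass splits %op2 itself and retargets only the direct branch from %entry:

    define i32 @dispatch(i8* %target, i1 %c) {
    entry:
      br i1 %c, label %op2, label %indirect
    indirect:
      ; Splitting the edge indirect -> %op2 directly would require
      ; rewriting the memory %target was loaded from, which we can't do.
      indirectbr i8* %target, [label %op1, label %op2]
    op1:
      ret i32 1
    op2:
      ret i32 2
    }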
* [ARM] don't transform an add(ext Cond), C to select unless there's a setcc of the condition (Sanjay Patel, 2017-02-27, 2 files, -14/+16)

  The transform in question claims to be doing:

    // fold (add (select cc, 0, c), x) -> (select cc, x, (add, x, c))

  ...starting in PerformADDCombineWithOperands(), but it wasn't actually checking for a setcc node for the sext/zext patterns.

  This is exactly the opposite of a transform I'd like to add to DAGCombiner's foldSelectOfConstants(), so I was seeing infinite loops with my draft of a patch applied.

  The changes in select_const.ll look positive (fewer instructions). The change in arm-and-tst-peephole.ll is unrelated. We're changing the input IR in that test to preserve the intent of the test, but that's not affected by this code change.

  Differential Revision: https://reviews.llvm.org/D30355
  llvm-svn: 296389
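  To make the pattern concrete, a hypothetical IR input for the fold: an add of a zero-extended i1 and a constant. The combine now fires only when the i1 is produced by a setcc (a compare at the IR level):

    define i32 @add_zext(i1 %cond) {
    entry:
      ; add (zext i1 %cond), 1 --> select %cond, 2, 1
      ; ...but only if %cond comes from a compare.
      %ext = zext i1 %cond to i32
      %add = add i32 %ext, 1
      ret i32 %add
    }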
* [DAGCombine] Fix for a load combine bug with non-zero offset patterns on BE targets (Artur Pilipenko, 2017-02-27, 1 file, -0/+29)

  This pattern is essentially an i16 load from the p+1 address:

    %p1.i16 = bitcast i8* %p to i16*
    %p2.i8 = getelementptr i8, i8* %p, i64 2
    %v1 = load i16, i16* %p1.i16
    %v2.i8 = load i8, i8* %p2.i8
    %v2 = zext i8 %v2.i8 to i16
    %v1.shl = shl i16 %v1, 8
    %res = or i16 %v1.shl, %v2

  The current implementation would identify the %v1 load as the first byte load and would mistakenly emit an i16 load from the %p1.i16 address. This patch adds a check that the first byte is loaded from a zero offset of the first load address, so that this address can be used as the base address for the combined value. Otherwise just give up combining.

  llvm-svn: 296336
* Do full codegen for various tests. NFC (Amaury Sechet, 2017-02-27, 1 file, -13/+82)

  llvm-svn: 296305
* Revert "[CGP] Split some critical edges coming out of indirect branches"Daniel Jasper2017-02-261-1/+0
| | | | | | | This reverts commit r296149 as it leads to crashes when compiling for PPC. llvm-svn: 296295
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2017-02-266-81/+102
| | | | | | | | UseAA is enabled." This reverts commit r296252 until 256-bit operations are more efficiently generated in X86. llvm-svn: 296279
* The automatic CHECK: to CHECK-LABEL: conversion, back in 2013, had missed most labels in this test because they didn't end with a colon (Artyom Skrobov, 2017-02-25, 1 file, -7/+7)

  llvm-svn: 296254
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled (Nirav Dave, 2017-02-25, 6 files, -102/+81)

  Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

  * Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates the non-interfering loads/stores from the store-merging logic.

  When merging stores, we search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output CodeGen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which require more expensive constant generation).

  Some minor peephole optimizations deal with improved SubDAG shapes (listed below).

  Additional Minor Changes:
  1. Finishes removing unused AliasLoad code.
  2. Unifies the chain aggregation in the merged stores across code paths.
  3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits.
  4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
  5. Removes chain dependencies of memory operations on CopyFromReg nodes, as these are captured by data dependence.
  6. Forwards load-store values through TokenFactors containing {CopyToReg,CopyFromReg} values.
  7. Adds a peephole to convert a buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll).
  8. Restricts store merging for the ARM target to 32-bit, as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.

  This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes, as memory operations are now reorderable, improving load-store forwarding.

  One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll. Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values, we must create another local store. A similar transformation happens before SelectionDAG as well.

  Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
  llvm-svn: 296252
* [ARM] add tests for alternate forms of select-of-constants; NFC (Sanjay Patel, 2017-02-24, 1 file, -0/+33)

  llvm-svn: 296178
* [ARM] auto-generate complete checks; NFC (Sanjay Patel, 2017-02-24, 1 file, -8/+34)

  The affected test may change with a patch I'm looking at for DAGCombiner, so I want to make sure it's not a regression.

  llvm-svn: 296175
* [CGP] Split some critical edges coming out of indirect branches (Michael Kuperstein, 2017-02-24, 1 file, -0/+1)

  Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the *direct* branches.

  This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit ~100 defs of registers containing constants in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about 100 constants to stack.

  The end result is that a clang-compiled python interpreter can be about ~2.5x slower on a simple python reduction loop than a gcc-compiled interpreter.

  Differential Revision: https://reviews.llvm.org/D29916
  llvm-svn: 296149
* [DAGCombiner] add missing folds for scalar select of {-1,0,1} (Sanjay Patel, 2017-02-24, 1 file, -24/+11)

  The motivation for filling out these select-of-constants cases goes back to D24480, where we discussed removing an IR fold from add(zext) --> select. And that goes back to:
  https://reviews.llvm.org/rL75531
  https://reviews.llvm.org/rL159230

  The idea is that we should always canonicalize patterns like this to a select-of-constants in IR because that's the smallest IR and the best for value tracking. Note that we currently do the opposite in some cases (like the cases in *this* patch). Ie, the proposed folds in this patch already exist in InstCombine today:
  https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSelect.cpp#L1151

  As this patch shows, most targets generate better machine code for simple ext/add/not ops rather than a select of constants. So the follow-up steps to make this less of a patchwork of special-case folds and missing IR canonicalization:
  1. Have DAGCombiner convert any select of constants into ext/add/not ops.
  2. Have InstCombine canonicalize in the other direction (create more selects).

  Differential Revision: https://reviews.llvm.org/D30180
  llvm-svn: 296137
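  For concreteness, hedged examples (not taken from the commit's test) of scalar selects of constants from {-1,0,1} and the ext/not forms these folds produce:

    define i32 @sel_sext(i1 %cond) {
    entry:
      ; select %cond, -1, 0 --> sext i1 %cond to i32
      %sel = select i1 %cond, i32 -1, i32 0
      ret i32 %sel
    }

    define i32 @sel_not_zext(i1 %cond) {
    entry:
      ; select %cond, 0, 1 --> zext (xor i1 %cond, true) to i32
      %sel = select i1 %cond, i32 0, i32 1
      ret i32 %sel
    }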
* [ARM] GlobalISel: Select G_STORE (Diana Picus, 2017-02-24, 1 file, -0/+50)

  Same as selecting G_LOAD.

  llvm-svn: 296122
* Minor test fix (Diana Picus, 2017-02-24, 1 file, -2/+2)

  The test was using a size of 8 for loading/storing pointers. It should be 4.

  llvm-svn: 296120
* [ARM] GlobalISel: Add reg bank mappings for stores (Diana Picus, 2017-02-24, 1 file, -0/+43)

  Same as the ones for loads.

  llvm-svn: 296115
* [ARM] GlobalISel: Legalize stores (Diana Picus, 2017-02-24, 1 file, -0/+43)

  Allow the same types that we allow for loads.

  llvm-svn: 296108
* Revert "[ARM] GlobalISel: Legalize stores"Diana Picus2017-02-241-37/+0
| | | | | | This reverts commit r296103 because the test broke on one of the bots. Sorry! llvm-svn: 296104
* [ARM] GlobalISel: Legalize stores (Diana Picus, 2017-02-24, 1 file, -0/+37)

  Allow the same types that we allow for loads.

  llvm-svn: 296103
* Add some testcases for bitfields with illegal widths. (Eli Friedman, 2017-02-24, 1 file, -0/+234)

  clang will generate IR like this for input using packed bitfields; very simple semantically, but it's a bit tricky to actually generate good code.

  llvm-svn: 296080
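  A hedged sketch of the kind of illegal-width IR clang emits for packed bitfields (the C struct and names below are illustrative, not taken from the added test):

    ; struct __attribute__((packed)) S { unsigned a : 7; unsigned b : 11; };
    ; void set_b(struct S *s, unsigned v) { s->b = v; }
    define void @set_b(i24* %s, i32 %v) {
    entry:
      ; Read-modify-write through an illegal i24 covering the struct.
      %old = load i24, i24* %s
      %vt = trunc i32 %v to i24
      %shifted = shl i24 %vt, 7
      %b.new = and i24 %shifted, 262016   ; 0x3FF80: bits 7-17 hold field b
      %cleared = and i24 %old, -262017    ; ~0x3FF80: clear field b only
      %new = or i24 %cleared, %b.new
      store i24 %new, i24* %s
      ret void
    }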
* Revert r296060 to pacify bots. (Michael Kuperstein, 2017-02-24, 1 file, -1/+0)

  llvm-svn: 296064
* [CGP] Split some critical edges coming out of indirect branches (Michael Kuperstein, 2017-02-24, 1 file, -0/+1)

  Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the *direct* branches.

  This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit ~100 defs of registers containing constants in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about 100 constants to stack.

  The end result is that a clang-compiled python interpreter can be about ~2.5x slower on a simple python reduction loop than a gcc-compiled interpreter.

  Differential Revision: https://reviews.llvm.org/D29916
  llvm-svn: 296060
* ARM: make sure FastISel bails on f64 operations for Cortex-M4. (Tim Northover, 2017-02-23, 1 file, -0/+62)

  FastISel wasn't checking the isFPOnlySP subtarget feature before emitting double-precision operations, so it got completely invalid CodeGen for doubles on Cortex-M4F.

  The normal ISel testing wasn't spectacular either, so I added a second RUN line to improve that while I was in the area.

  llvm-svn: 296031
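  As a hedged illustration of the failure mode (not the commit's actual test): Cortex-M4F implements only single-precision VFP, so FastISel must bail on any f64 operation and let normal ISel lower it through library/soft-float code instead of emitting a nonexistent double-precision VADD:

    ; e.g. llc -mtriple=thumbv7em-none-eabi -mcpu=cortex-m4 -O0
    define double @add_f64(double %a, double %b) {
    entry:
      %sum = fadd double %a, %b
      ret double %sum
    }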
* [ARM] GlobalISel: Lower call returns (Diana Picus, 2017-02-23, 1 file, -20/+40)

  Introduce a common ValueHandler for call returns and formal arguments, and inherit two different versions for handling the differences (at the moment the only difference is the way physical registers are marked as used).

  llvm-svn: 295973
* [ARM] GlobalISel: Lower call parameters in regs (Diana Picus, 2017-02-23, 1 file, -0/+77)

  Add support for lowering calls with parameters that can fit into regs. Use the same ValueHandler that we used for function returns, but rename it to match its new, extended purpose.

  llvm-svn: 295971
* Fix assertion failure in ARMConstantIslandPass. (Kristof Beyls, 2017-02-23, 1 file, -9/+44)

  The ARMConstantIslandPass didn't have support for handling accesses to constant island objects through ARM::t2LDRBpci instructions. This adds support for that. This fixes PR31997.

  llvm-svn: 295964
* [DAGCombiner] revert r295336 (Bill Seurer, 2017-02-22, 3 files, -9/+26)

  r295336 causes a bootstrapped clang to fail for many compilations on powerpc BE. See http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/2315 for example.

  Reverting as per the developer's request.

  llvm-svn: 295849
* [ARM] Classification Improvements to ARM Sched-Models. NFCI. (Javed Absar, 2017-02-22, 1 file, -0/+175)

  This patch adds missing sched classes for Thumb2 instructions. This has been missing so far, and as a consequence, machine scheduler models for individual sub-targets have tended to be larger than they needed to be. These patches should help write schedulers better and faster in the future for ARM sub-targets.

  Reviewer: Diana Picus
  Differential Revision: https://reviews.llvm.org/D29953
  llvm-svn: 295811
* Fix PR31896. (Evgeniy Stepanov, 2017-02-21, 1 file, -0/+16)

  The address of an alias of a global with an offset is incorrectly lowered as the address of the global (i.e. ignoring the offset).

  llvm-svn: 295762
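  A minimal (hypothetical) IR shape exhibiting the bug: an alias defined at a non-zero offset into a global, whose address must be lowered with the offset included rather than as the bare global:

    @buf = global [8 x i8] zeroinitializer

    ; @mid points at byte 4 of @buf; taking its address must keep the +4.
    @mid = alias i8, getelementptr inbounds ([8 x i8], [8 x i8]* @buf, i64 0, i64 4)

    define i8* @addr() {
      ret i8* @mid
    }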
* [ARM] GlobalISel: Lower calls to void() functions (Diana Picus, 2017-02-21, 1 file, -0/+23)

  For now, we hardcode a BLX instruction, and generate an ADJCALLSTACKDOWN/UP pair with amount 0.

  llvm-svn: 295716
* [ARM] Add a div regression test for Cortex-M23 (Sanne Wouda, 2017-02-20, 1 file, -0/+67)

  Summary:
  This file was missed in the commit for Cortex-M23 and Cortex-M33 support. See https://reviews.llvm.org/D29073?id=85814 .

  Reviewers: rengolin, javed.absar, samparker
  Reviewed By: samparker
  Subscribers: llvm-commits, aemerson
  Differential Revision: https://reviews.llvm.org/D30162
  llvm-svn: 295655
* GlobalISel: verify that generic loads & stores have a mem operand. (Tim Northover, 2017-02-17, 3 files, -18/+18)

  The mem operand is used by GlobalISel to convey atomic constraints, so dropping it is invalid.

  llvm-svn: 295476
* [ARM] add tests for select-of-constants; NFC (Sanjay Patel, 2017-02-17, 1 file, -0/+257)

  llvm-svn: 295459
* [ARM] GlobalISel: Use Subtarget in Legalizer (Diana Picus, 2017-02-17, 1 file, -3/+5)

  Start using the Subtarget to make decisions about what's legal. In particular, we only mark floating point operations as legal if we have VFP2, which is something we should've done from the very start.

  llvm-svn: 295439
* [ARM] GlobalISel: Add end-to-end tests for double (Diana Picus, 2017-02-17, 1 file, -0/+21)

  Test some really basic functionality through the whole GlobalISel pipeline.

  llvm-svn: 295438
* [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine (Artur Pilipenko, 2017-02-16, 3 files, -26/+9)

  Resubmit -r295314 with PowerPC and AMDGPU tests updated.

  Support {a|s}ext, {a|z|s}ext load nodes as part of load combine patterns.

  Reviewed By: filcab
  Differential Revision: https://reviews.llvm.org/D29591
  llvm-svn: 295336
* [ARM] GlobalISel: Select floating point loads (Diana Picus, 2017-02-16, 1 file, -0/+56)

  llvm-svn: 295321
* Revert -r295314 "[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine" (Artur Pilipenko, 2017-02-16, 3 files, -9/+26)

  This change causes some of the AMDGPU and PowerPC tests to fail.

  llvm-svn: 295316
* [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine (Artur Pilipenko, 2017-02-16, 3 files, -26/+9)

  Support {a|s}ext, {a|z|s}ext load nodes as part of load combine patterns.

  Reviewed By: filcab
  Differential Revision: https://reviews.llvm.org/D29591
  llvm-svn: 295314
* [ARM] GlobalISel: Select G_SEQUENCE and G_EXTRACT (Diana Picus, 2017-02-16, 1 file, -0/+45)

  Since they're only used for passing around double precision floating point values into the general purpose registers, we'll lower them to VMOVDRR and VMOVRRD.

  llvm-svn: 295310
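  To illustrate where these opcodes show up (a sketch, assuming the soft-float ABI used by the ARM GlobalISel tests): a double argument arrives split across two GPRs, so the IRTranslator builds a G_SEQUENCE to assemble the f64 (selected to VMOVDRR) and a G_EXTRACT to split the returned f64 back into GPRs (selected to VMOVRRD):

    define double @pass_through(double %d) {
    entry:
      ; %d comes in as two 32-bit GPR halves; forming and splitting the
      ; 64-bit value maps to VMOVDRR and VMOVRRD respectively.
      ret double %d
    }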