summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combineArtur Pilipenko2017-03-011-8/+19
| | | | | | | | | | | | Resubmit r295336 after the bug with non-zero offset patterns on BE targets is fixed (r296336). Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 296651
* [DAGCombiner] use dyn_cast values in foldSelectOfConstants(); NFCSanjay Patel2017-02-281-6/+8
| | | | llvm-svn: 296502
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵Nirav Dave2017-02-281-369/+373
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | enabled. Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 296476
* Revert "DAG: Check if extract_vector_elt is legal or custom"Matt Arsenault2017-02-271-1/+1
| | | | | | | This reverts r295782. This could potentially result in some legalization loops and I avoided the need for this. llvm-svn: 296393
* [X86][SSE] Attempt to extract vector elements through target shufflesSimon Pilgrim2017-02-271-0/+15
| | | | | | | | | | DAGCombiner already supports peeking thorough shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered. This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB, there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where its worth it. Differential Revision: https://reviews.llvm.org/D30176 llvm-svn: 296381
* [DAGCombine] Fix for a load combine bug with non-zero offset patterns on BE ↵Artur Pilipenko2017-02-271-0/+4
| | | | | | | | | | | | | | | | | | targets This pattern is essentially a i16 load from p+1 address: %p1.i16 = bitcast i8* %p to i16* %p2.i8 = getelementptr i8, i8* %p, i64 2 %v1 = load i16, i16* %p1.i16 %v2.i8 = load i8, i8* %p2.i8 %v2 = zext i8 %v2.i8 to i16 %v1.shl = shl i16 %v1, 8 %res = or i16 %v1.shl, %v2 Current implementation would identify %v1 load as the first byte load and would mistakenly emit a i16 load from %p1.i16 address. This patch adds a check that the first byte is loaded from a non-zero offset of the first load address. This way this address can be used as the base address for the combined value. Otherwise just give up combining. llvm-svn: 296336
* [DAGCombine] NFC. MatchLoadCombine extract MemoryByteOffset lambda helperArtur Pilipenko2017-02-271-9/+13
| | | | | | This refactoring will simplify the upcoming change to fix the bug in folding patterns with non-zero offsets on BE targets. llvm-svn: 296332
* [DAGCombine] NFC. MatchLoadCombine remember the first byte provider, not the ↵Artur Pilipenko2017-02-271-3/+5
| | | | | | | | load node This refactoring will simplify the upcoming change to fix a bug in folding patterns with non-zero offsets on BE targets. llvm-svn: 296331
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2017-02-261-373/+369
| | | | | | | | UseAA is enabled." This reverts commit r296252 until 256-bit operations are more efficiently generated in X86. llvm-svn: 296279
* No need to copy the variable [NFC]Artyom Skrobov2017-02-251-2/+1
| | | | llvm-svn: 296259
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵Nirav Dave2017-02-251-369/+373
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | enabled. Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 296252
* [DAGCombiner] add missing folds for scalar select of {-1,0,1}Sanjay Patel2017-02-241-3/+32
| | | | | | | | | | | | | | | | | | | | | | | | The motivation for filling out these select-of-constants cases goes back to D24480, where we discussed removing an IR fold from add(zext) --> select. And that goes back to: https://reviews.llvm.org/rL75531 https://reviews.llvm.org/rL159230 The idea is that we should always canonicalize patterns like this to a select-of-constants in IR because that's the smallest IR and the best for value tracking. Note that we currently do the opposite in some cases (like the cases in *this* patch). Ie, the proposed folds in this patch already exist in InstCombine today: https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSelect.cpp#L1151 As this patch shows, most targets generate better machine code for simple ext/add/not ops rather than a select of constants. So the follow-up steps to make this less of a patchwork of special-case folds and missing IR canonicalization: 1. Have DAGCombiner convert any select of constants into ext/add/not ops. 2 Have InstCombine canonicalize in the other direction (create more selects). Differential Revision: https://reviews.llvm.org/D30180 llvm-svn: 296137
* [DAG] add convenience function to get -1 constant; NFCISanjay Patel2017-02-231-32/+15
| | | | llvm-svn: 296004
* [DAGCombiner] revert r295336Bill Seurer2017-02-221-19/+8
| | | | | | | | | | | r295336 causes a bootstrapped clang to fail for many compilations on powerpc BE. See http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/2315 for example. Reverting as per the developer's request. llvm-svn: 295849
* DAG: Check if extract_vector_elt is legal or customMatt Arsenault2017-02-211-1/+1
| | | | | | | Avoids test regressions in future AMDGPU commits when more vector types are custom lowered. llvm-svn: 295782
* Strip trailing whitespace.Simon Pilgrim2017-02-201-1/+1
| | | | llvm-svn: 295653
* [DAGCombiner] split i1 select-of-constants from non-i1 case; NFCISanjay Patel2017-02-171-9/+25
| | | | | | I can't find any tests of the non-i1 code path, so it may be unnecessary at this point. llvm-svn: 295463
* Fix signed/unsigned comparison warning.Simon Pilgrim2017-02-171-2/+2
| | | | llvm-svn: 295453
* [DAGCombine] Recognise any_extend_vector_inreg and truncation style shuffle ↵Simon Pilgrim2017-02-171-0/+125
| | | | | | | | | | | | | | masks During legalization we are often creating shuffles (via a build_vector scalarization stage) that are "any_extend_vector_inreg" style masks, and also other masks that are the equivalent of "truncate_vector_inreg" (if we had such a thing). This patch is an attempt to match these cases to help undo the effects of just leaving shuffle lowering to handle it - which typically means we lose track of the undefined elements of the shuffles resulting in an unnecessary extension+truncation stage for widened illegal types. The 2011-10-21-widen-cmp.ll regression will be fixed by making SIGN_EXTEND_VECTOR_IN_REG legal in SSE instead of lowering them to X86ISD::VSEXT (PR31712). Differential Revision: https://reviews.llvm.org/D29454 llvm-svn: 295451
* [DAGCombiner] improve readability; NFCISanjay Patel2017-02-171-44/+32
| | | | llvm-svn: 295447
* [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combineArtur Pilipenko2017-02-161-8/+19
| | | | | | | | | | | | Resubmit -r295314 with PowerPC and AMDGPU tests updated. Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 295336
* Rever -r295314 "[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in ↵Artur Pilipenko2017-02-161-19/+8
| | | | | | | | load combine" This change causes some of AMDGPU and PowerPC tests to fail. llvm-svn: 295316
* [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combineArtur Pilipenko2017-02-161-8/+19
| | | | | | | | | | Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 295314
* [DAG] Don't try to create an INSERT_SUBVECTOR with an illegal sourceMichael Kuperstein2017-02-151-1/+7
| | | | | | | | | | | | We currently can't legalize those, but we should really not be creating them in the first place, since legalization would probably look similar to the way we legalize CONCAT_VECTORS - basically replace the INSERT with a BUILD. This fixes PR311956. Differential Revision: https://reviews.llvm.org/D29961 llvm-svn: 295213
* [DAGCombiner] Teach DAG combine that inserting an extract_subvector result ↵Craig Topper2017-02-131-0/+6
| | | | | | into the same location of a an undef vector can just use the original input to the extract. llvm-svn: 294932
* [DAGCombiner] Remove the half vector width check for the combine of ↵Craig Topper2017-02-121-4/+3
| | | | | | | | EXTRACT_SUBVECTOR from an INSERT_SUBVECTOR. This gives more parallelism opportunities for AVX-512 when dealing with 128-bit extracts from 512-bit vectors. llvm-svn: 294930
* [DAGCombiner] Make the combine of INSERT_SUBVECTOR into a CONCAT_VECTOR more ↵Craig Topper2017-02-111-16/+9
| | | | | | generic to support larger concats. llvm-svn: 294875
* [DAGCombiner] Support non-zero offset in load combineArtur Pilipenko2017-02-091-3/+8
| | | | | | | | | | | | | | | Enable folding patterns which load the value from non-zero offset: i8 *a = ... i32 val = a[4] | (a[5] << 8) | (a[6] << 16) | (a[7] << 24) => i32 val = *((i32*)(a+4)) Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D29394 llvm-svn: 294582
* [DAGCombiner] NFC. Mark ByteProvider accessors as constArtur Pilipenko2017-02-081-2/+2
| | | | llvm-svn: 294494
* [DAGCombiner] Push truncate through adde when the carry isn't used.Amaury Sechet2017-02-081-0/+12
| | | | | | | | | | | | Summary: As per title. Reviewers: mkuper, spatel, bkramer, RKSimon, zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29528 llvm-svn: 294394
* Revert "[DAGCombiner] (add X, (adde Y, 0, Carry)) -> (adde X, Y, Carry)"Daniel Jasper2017-02-071-6/+0
| | | | | | | | | | | | | This reverts commit r294186. On an internal test, this triggers an out-of-memory error on PPC, presumably because there is another dagcombine that does the exact opposite triggering and endless loop consuming more and more memory. Chandler has started at creating a reduced test case and we'll attach it as soon as possible. llvm-svn: 294288
* [DAGCombiner] Support bswap as a part of load combine patternsArtur Pilipenko2017-02-061-0/+3
| | | | | | | | Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D29397 llvm-svn: 294201
* [DAGCombiner] Make DAGCombiner smarter about overflowAmaury Sechet2017-02-061-20/+9
| | | | | | | | | | | | Summary: Leverage it to transform addc into add. Reviewers: mkuper, spatel, RKSimon, zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29524 llvm-svn: 294187
* [DAGCombiner] (add X, (adde Y, 0, Carry)) -> (adde X, Y, Carry)Amaury Sechet2017-02-061-0/+6
| | | | | | | | | | | | Summary: This is extracted from D29443 . Reviewers: mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29564 llvm-svn: 294186
* [DAGCombiner] Leverage add's commutativityAmaury Sechet2017-02-051-6/+14
| | | | | | | | | | | | Summary: This avoid the need to duplicate all pattern and actually end up exposing some opportunity to optimize existing pattern that did not exists in both directions on an existing test case. Reviewers: mkuper, spatel, bkramer, RKSimon, zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29541 llvm-svn: 294125
* [DAGCombiner] Canonicalize the order of a chain of INSERT_SUBVECTORs.Craig Topper2017-02-041-4/+24
| | | | | | Based on similar code for INSERT_VECTOR_ELT. llvm-svn: 294110
* [DAGCombiner] Use DAG.getAnyExtOrTrunc to simplify some code. NFCCraig Topper2017-02-041-5/+1
| | | | llvm-svn: 294109
* [DAGCombiner] In visitINSERT_VECTOR_ELT, move check for BUILD_VECTOR being ↵Craig Topper2017-02-041-4/+4
| | | | | | legal below code that just canonicalizes INSERT_VECTOR_ELT without creating BUILD_VECTORS. llvm-svn: 294108
* Formatting in DAGCombiner. NFCAmaury Sechet2017-02-041-0/+2
| | | | llvm-svn: 294091
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2017-02-021-372/+368
| | | | | | | | | UseAA is enabled." This reverts commit r293893 which is miscompiling lua on ARM and bootstrapping for x86-windows. llvm-svn: 293915
* Use N0 instead of N->getOperand(0) in DagCombiner::visitAdd. NFCAmaury Sechet2017-02-021-1/+1
| | | | llvm-svn: 293903
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵Nirav Dave2017-02-021-368/+372
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | enabled. Recommiting after fixing X86 inc/dec chain bug. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 293893
* [DAGCombine] require UnsafeFPMath for re-association of additionNicolai Haehnle2017-01-311-6/+18
| | | | | | | | | | | | | | | | | | | Summary: The affected transforms all implicitly use associativity of addition, for which we usually require unsafe math to be enabled. The "Aggressive" flag is only meant to convey information about the performance of the fused ops relative to a fmul+fadd sequence. Fixes Bug 31626. Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD Subscribers: jholewinski, nemanjai, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D28675 llvm-svn: 293635
* [DAGCombiner] Use unsigned for a constant vector index instead of APInt.Craig Topper2017-01-291-2/+2
| | | | | | The type system requires that the number of vector elements should fit in 32-bits so this should be safe. llvm-svn: 293414
* [DAGCombiner] Remove unnecessary check on the size of the type of the index ↵Craig Topper2017-01-291-3/+1
| | | | | | | | of EXTRACT_SUBVECTOR. The type system already requires that the number of vector elements must fit in 32-bits so an index should as well. Even if the type of the index were larger all we care about is that the constant index can fit in 64-bits so that we can call getZExtValue. llvm-svn: 293413
* [DAGCombiner] Make sure index of EXTRACT_SUBVECTOR is a constant before ↵Craig Topper2017-01-291-9/+9
| | | | | | trying to use getConstantOperandVal. llvm-svn: 293412
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2017-01-261-395/+369
| | | | | | | | UseAA is enabled." This reverts commit r293184 which is failing in LTO builds llvm-svn: 293188
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵Nirav Dave2017-01-261-369/+395
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | enabled. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 293184
* [DAGCombiner] Fold extract_subvector of undef to undef. Fold away inserting ↵Craig Topper2017-01-261-0/+8
| | | | | | undef subvectors. llvm-svn: 293152
* Fix buildbot failures introduced by 293036Artur Pilipenko2017-01-251-2/+5
| | | | | | Fix unused variable, specify types explicitly to make VC compiler happy. llvm-svn: 293039
OpenPOWER on IntegriCloud