summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [DAGCombiner] reduce code duplication; NFCSanjay Patel2019-05-101-10/+8
| | | | llvm-svn: 360462
* [CodeGen] Add comment about FSUB <-> FNEG xformsCameron McInally2019-05-091-0/+4
| | | | | | Differential Revision: https://reviews.llvm.org/D61741 llvm-svn: 360366
* [DAGCombiner] Limit number of nodes explored as store candidates.Florian Hahn2019-05-091-2/+5
| | | | | | | | | | | | | | To find the candidates to merge stores we iterate over all nodes in a chain for each store, which leads to quadratic compile times for large basic blocks with a large number of stores. Reviewers: niravd, spatel, craig.topper Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D61511 llvm-svn: 360357
* [NFC] Add a static function to do the endian checkQingShan Zhang2019-05-081-15/+37
| | | | | | | | Add a new function to do the endian check, as I will commit another patch later, which will also need the endian check. Differential Revision: https://reviews.llvm.org/D61236 llvm-svn: 360226
* [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactorFlorian Hahn2019-05-071-2/+3
| | | | | | | | | | | | | | | | | | When simplifying TokenFactors, we potentially iterate over all operands of a large number of TokenFactors. This causes quadratic compile times in some cases and the large token factors cause additional scalability problems elsewhere. This patch adds some limits to the number of nodes explored for the cases mentioned above. Reviewers: niravd, spatel, craig.topper Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D61397 llvm-svn: 360171
* Fix pr33010, a 2 year old crashing regressionPhilip Reames2019-05-061-0/+4
| | | | | | | | The problem was that we were creating a CMOV64rr <TargetFrameIndex>, <TargetFrameIndex>. The entire point of a TFI is that address code is not generated, so there's no way to legalize/lower this. Instead, simply prevent it's creation. Arguably, we shouldn't be using *Target*FrameIndices in StatepointLowering at all, but that's a much deeper change. llvm-svn: 360090
* [SDAG][AArch64] Boolean and/or reduce to umax/min reduce (PR41635)Nikita Popov2019-05-061-0/+12
| | | | | | | | | | | This addresses one half of https://bugs.llvm.org/show_bug.cgi?id=41635 by combining a VECREDUCE_AND/OR into VECREDUCE_UMIN/UMAX (if latter is legal but former is not) for zero-or-all-ones boolean reductions (which are detected based on sign bits). Differential Revision: https://reviews.llvm.org/D61398 llvm-svn: 360054
* [DAGCombine] Remove repeated variables. NFCI.Simon Pilgrim2019-05-031-8/+3
| | | | llvm-svn: 359915
* [DAGCombiner] try repeated fdiv divisor transform before building estimate ↵Sanjay Patel2019-05-021-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | (2nd try) The original patch was committed at rL359398 and reverted at rL359695 because of infinite looping. This includes a fix to check for a vector splat of "1.0" to avoid the infinite loop. Original commit message: This was originally part of D61028, but it's an independent diff. If we try the repeated divisor reciprocal transform before producing an estimate sequence, then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5 vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the full-precision division is only 3 cycle throughput, so that's probably the better perf default option and avoids problems from x86's inaccurate estimates. The last 2 tests show that users still have the option to override the defaults by using the function attributes for reciprocal estimates, but those patterns are potentially made faster by converting the vector ops (including ymm ops) to scalar math. Differential Revision: https://reviews.llvm.org/D61149 llvm-svn: 359793
* Revert "[DAGCombiner] try repeated fdiv divisor transform before building ↵Sanjay Patel2019-05-011-3/+3
| | | | | | | | | estimate" This reverts commit fb9a5307a94e6f1f850e4d89f79103b123f16279 (rL359398) because it can cause an infinite loop due to opposing combines. llvm-svn: 359695
* [DAGCombiner] Do not generate ISD::ADDE node if adde is not legal for the ↵Zi Xuan Wu2019-04-301-1/+3
| | | | | | | | | | | | | | | target when combine ISD::TRUNC node Do not combine (trunc adde(X, Y, Carry)) into (adde trunc(X), trunc(Y), Carry), if adde is not legal for the target. Even it's at type-legalize phase. Because adde is special and will not be legalized at operation-legalize phase later. This fixes: PR40922 https://bugs.llvm.org/show_bug.cgi?id=40922 Differential Revision: https://reviews.llvm.org//D60854 llvm-svn: 359532
* [DAG] Refactor DAGCombiner::ReassociateOpsBjorn Pettersson2019-04-291-45/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Extract the logic for doing reassociations from DAGCombiner::reassociateOps into a helper function DAGCombiner::reassociateOpsCommutative, and use that helper to trigger reassociation on the original operand order, or the commuted operand order. Codegen is not identical since the operand order will be different when doing the reassociations for the commuted case. That causes some unfortunate churn in some test cases. Apart from that this should be NFC. Reviewers: spatel, craig.topper, tstellar Reviewed By: spatel Subscribers: dmgreen, dschuff, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61199 llvm-svn: 359476
* [DAGCombiner] try repeated fdiv divisor transform before building estimateSanjay Patel2019-04-281-3/+3
| | | | | | | | | | | | | | | | | | This was originally part of D61028, but it's an independent diff. If we try the repeated divisor reciprocal transform before producing an estimate sequence, then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5 vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the full-precision division is only 3 cycle throughput, so that's probably the better perf default option and avoids problems from x86's inaccurate estimates. The last 2 tests show that users still have the option to override the defaults by using the function attributes for reciprocal estimates, but those patterns are potentially made faster by converting the vector ops (including ymm ops) to scalar math. Differential Revision: https://reviews.llvm.org/D61149 llvm-svn: 359398
* [DAGCombine] Cleanup visitEXTRACT_SUBVECTOR. NFCI.Simon Pilgrim2019-04-261-10/+11
| | | | | | Use ArrayRef::slice, reduce some rather awkward long lines for legibility and run clang-format. llvm-svn: 359326
* [X86][SSE] Disable shouldFoldConstantShiftPairToMask for btver1/btver2 ↵Simon Pilgrim2019-04-261-0/+3
| | | | | | | | | | targets (PR40758) As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr with a shift+and pair as it requires an additional mask (with the extra constant pool, loading and register pressure costs). Differential Revision: https://reviews.llvm.org/D61068 llvm-svn: 359293
* [DAGCombiner] scale repeated FP divisor by splat factorSanjay Patel2019-04-241-3/+13
| | | | | | | | | | | | | | If we have a vector FP division with a splatted divisor, use the existing transform that converts 'x/y' into 'x * (1.0/y)' to allow more conversions. This can then potentially be converted into a scalar FP division by existing combines (rL358984) as seen in the tests here. That can be a potentially big perf difference if scalar fdiv has better timing (including avoiding possible frequency throttling for vector ops). Differential Revision: https://reviews.llvm.org/D61028 llvm-svn: 359147
* [DAGCombiner] generalize binop-of-splats scalarizationSanjay Patel2019-04-231-46/+38
| | | | | | | | | | | | | | If we only match build vectors, we can miss some patterns that use shuffles as seen in the affected tests. Note that the underlying calls within getSplatSourceVector() have the potential for compile-time explosion because of exponential recursion looking through binop opcodes, but currently the list of supported opcodes is very limited. Both of those problems should be addressed in follow-up patches. llvm-svn: 358984
* [DAGCombiner] Combine OR as ADD when no common bits are setBjorn Pettersson2019-04-231-16/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The DAGCombiner is rewriting (canonicalizing) an ISD::ADD with no common bits set in the operands as an ISD::OR node. This could sometimes result in "missing out" on some combines that normally are performed for ADD. To be more specific this could happen if we already have rewritten an ADD into OR, and later (after legalizations or combines) we expose patterns that could have been optimized if we had seen the OR as an ADD (e.g. reassociations based on ADD). To make the DAG combiner less sensitive to if ADD or OR is used for these "no common bits set" ADD/OR operations we now apply most of the ADD combines also to an OR operation, when value tracking indicates that the operands have no common bits set. Reviewers: spatel, RKSimon, craig.topper, kparzysz Reviewed By: spatel Subscribers: arsenm, rampitec, lebedev.ri, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59758 llvm-svn: 358965
* [DAGCombiner] make variable name less ambiguous; NFCSanjay Patel2019-04-221-4/+4
| | | | llvm-svn: 358886
* [DAGCombiner] prepare shuffle-of-splat to handle more patterns; NFCSanjay Patel2019-04-221-11/+16
| | | | llvm-svn: 358884
* [DAGCombine] Add SimplifyDemandedBits helper that handles demanded elts mask ↵Simon Pilgrim2019-04-171-4/+13
| | | | | | | | as well The other SimplifyDemandedBits helpers become wrappers to this new demanded elts variant. llvm-svn: 358585
* [TargetLowering] Rename preferShiftsToClearExtremeBits and ↵Simon Pilgrim2019-04-161-2/+2
| | | | | | | | | | | | shouldFoldShiftPairToMask (PR41359) As discussed on PR41359, this patch renames the pair of shift-mask target feature functions to make their purposes more obvious. shouldFoldShiftPairToMask -> shouldFoldConstantShiftPairToMask preferShiftsToClearExtremeBits -> shouldFoldMaskToVariableShiftPair llvm-svn: 358526
* [DAGCombiner] Add missing flag to addressing mode checkLuis Marques2019-04-161-0/+2
| | | | | | | | | | | | The checks in `canFoldInAddressingMode` tested for addressing modes that have a base register but didn't set the `HasBaseReg` flag to true (it's false by default). This patch fixes that. Although the omission of the flag was technically incorrect it had no known observable impact, so no tests were changed by this patch. Differential Revision: https://reviews.llvm.org/D60314 llvm-svn: 358502
* [DAGCombiner] narrow shuffle of concatenated vectorsSanjay Patel2019-04-121-0/+50
| | | | | | | | | | | | | | | | | | // shuffle (concat X, undef), (concat Y, undef), Mask --> // concat (shuffle X, Y, Mask0), (shuffle X, Y, Mask1) The ARM changes with 'vtrn' and narrowed 'vuzp' are improvements. The x86 changes look neutral or better. There's one test with an extra instruction, but that could be reversed for a subtarget with the right attributes. But by default, we want to avoid the 256-bit op when possible (in my motivating benchmark, a handful of ymm ops sprinkled into a sequence of xmm ops are triggering frequency throttling on Haswell resulting in significantly worse perf). Differential Revision: https://reviews.llvm.org/D60545 llvm-svn: 358291
* [DAGCombiner] refactor narrowing of extracted vector binop; NFCSanjay Patel2019-04-111-20/+19
| | | | | | | There's a TODO comment about handling patterns with insert_subvector, and we do want to match that. llvm-svn: 358187
* [DAGCombiner][x86] scalarize inserted vector FP opsSanjay Patel2019-04-111-0/+58
| | | | | | | | | | | | | | | | | | | | | | | // bo (build_vec ...undef, x, undef...), (build_vec ...undef, y, undef...) --> // build_vec ...undef, (bo x, y), undef... The lifetime of the nodes in these examples is different for variables versus constants, but they are all build vectors briefly, so I'm proposing to catch them in this form to handle all of the leading examples in the motivating test file. Before we have build vectors, we might have insert_vector_element. After that, we might have scalar_to_vector and constant pool loads. It's going to take more work to ensure that FP vector operands are getting simplified with undef elements, so this transform can apply more widely. In a non-loose FP environment, we are likely simplifying FP elements to NaN values rather than undefs. We also need to allow more opcodes down this path. Eg, we don't handle FP min/max flavors yet. Differential Revision: https://reviews.llvm.org/D60514 llvm-svn: 358172
* [DAGCombiner][X86][SystemZ] Canonicalize SSUBO with immediate RHS to SADDO ↵Craig Topper2019-04-091-0/+8
| | | | | | | | | | by negating the immediate. This lines up with what we do for regular subtract and it matches up better with X86 assumptions in isel patterns that add with immediate is more canonical than sub with immediate. Differential Revision: https://reviews.llvm.org/D60020 llvm-svn: 358027
* [DAGCombiner][x86] scalarize splatted vector FP opsSanjay Patel2019-04-051-2/+19
| | | | | | | | | | | | | | | There are a variety of vector patterns that may be profitably reduced to a scalar op when scalar ops are performed using a subset (typically, the first lane) of the vector register file. For x86, this is true for float/double ops and element 0 because insert/extract is just a sub-register rename. Other targets should likely enable the hook in a similar way. Differential Revision: https://reviews.llvm.org/D60150 llvm-svn: 357760
* [IR] Refactor attribute methods in Function class (NFC)Evandro Menezes2019-04-041-4/+4
| | | | | | | | Rename the functions that query the optimization kind attributes. Differential revision: https://reviews.llvm.org/D60287 llvm-svn: 357731
* [DAGCombiner] Rename variables Demanded -> DemandedBits/DemandedElts. NFCI.Simon Pilgrim2019-04-031-9/+10
| | | | | | Use consistent variable names down the SimplifyDemanded* call stack so debugging isn't such a annoyance. llvm-svn: 357602
* [DAGCombiner] loosen restrictions for moving shuffles after vector binopSanjay Patel2019-04-031-16/+19
| | | | | | | | | | | | There are 3 changes to make this correspond to the same transform in instcombine: 1. Remove the legality check - we can't create anything less legal than we started with. 2. Ease the use restriction, so we only bail out if both operands have >1 use. 3. Ease the use restriction for binops with a repeated operand (eg, mul x, x). As discussed in D60150, there's a scalarization opportunity that will be made easier by allowing this transform more generally. llvm-svn: 357580
* [DAGCombine] Don't use getZExtValue() until we know the constant is in range.Simon Pilgrim2019-04-031-2/+2
| | | | | | Noticed during prep for a patch for PR40758. llvm-svn: 357571
* Revert r357256 "[DAGCombine] Improve Lifetime node chains."Hans Wennborg2019-04-031-31/+0
| | | | | | | | | | | | | | | | | | | | | | | | As it caused a pathological compile-time regressionin V8, see PR41352. > Improve both start and end lifetime nodes chain dependencies. > > Reviewers: courbet > > Reviewed By: courbet > > Subscribers: hiraditya, llvm-commits > > Tags: #llvm > > Differential Revision: https://reviews.llvm.org/D59795 This also reverts the follow-up r357309: > [DAGCombiner] Rewrite ImproveLifetimeNodeChain to avoid DAG loop. > > Avoid EXPENSIVE_CHECK failure. NFCI. llvm-svn: 357563
* [DAGCombiner] reduce code duplication; NFCSanjay Patel2019-04-021-8/+8
| | | | llvm-svn: 357498
* [DAGCombiner] Rewrite ImproveLifetimeNodeChain to avoid DAG loop.Nirav Dave2019-03-291-8/+9
| | | | | | Avoid EXPENSIVE_CHECK failure. NFCI. llvm-svn: 357309
* [DAG] Avoid redundancy in StoreMerge TokenFactor generation.Nirav Dave2019-03-291-2/+2
| | | | | | | Avoid generating redundant TokenFactor when all merged stores have the same chain. llvm-svn: 357299
* [DAGCombine] Prune unnused nodes.Nirav Dave2019-03-291-15/+48
| | | | | | | | | | | | | | | | | | | Summary: Nodes that have no uses are eventually pruned when they are selected from the worklist. Record nodes newly added to the worklist or DAG and perform pruning after every combine attempt. Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight Reviewed By: jyknight Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58070 llvm-svn: 357283
* [DAG] Set up infrastructure to avoid smart constructor-based dangling nodesNirav Dave2019-03-291-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Various SelectionDAG non-combine operations (e.g. the getNode smart constructor and legalization) may leave dangling nodes by applying optimizations without fully pruning unused result values. This results in nodes that are never added to the worklist and therefore can not be pruned. Add a node inserter for the combiner to make sure such nodes have the chance of being pruned. This allows a number of additional peephole optimizations. Reviewers: efriedma, RKSimon, craig.topper, jyknight Reviewed By: jyknight Subscribers: msearles, jyknight, sdardis, nemanjai, javed.absar, hiraditya, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58068 llvm-svn: 357279
* [DAGCombiner] simplify shuffle of shuffleSanjay Patel2019-03-291-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | After investigating the examples from D59777 targeting an SSE4.1 machine, it looks like a very different problem due to how we map illegal types (256-bit in these cases). We're missing a shuffle simplification that maps elements of a vector back to a shuffled operand. We have a more general version of this transform in DAGCombiner::visitVECTOR_SHUFFLE(), but that generality means it is limited to patterns with a one-use constraint, and the examples here have 2 uses. We don't need any uses or legality limitations for a simplification (no new value is created). It looks like we miss this pattern in IR too. In one of the zext examples here, we have shuffle masks like this: Shuf0 = vector_shuffle<0,u,3,7,0,u,3,7> Shuf = vector_shuffle<4,u,6,7,u,u,u,u> ...so that's moving the high half of the 1st vector into the low half. But the high half of the 1st vector is already identical to the low half. Differential Revision: https://reviews.llvm.org/D59961 llvm-svn: 357258
* [DAGCombine] Improve Lifetime node chains.Nirav Dave2019-03-291-0/+30
| | | | | | | | | | | | | | | | Improve both start and end lifetime nodes chain dependencies. Reviewers: courbet Reviewed By: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59795 llvm-svn: 357256
* [DAGCombiner] fold sext into decrementSanjay Patel2019-03-291-0/+9
| | | | | | | | | | | | | | | | | | | | | | | This is a sibling to rL357178 that I noticed we'd hit if we chose an alternate transform in D59818. %z = zext i8 %x to i32 %dec = add i32 %z, -1 %r = sext i32 %dec to i64 => %z2 = zext i8 %x to i64 %r = add i64 %z2, -1 https://rise4fun.com/Alive/kPP The x86 vector diffs show a slight regression, so there's a chance that we should limit this and the previous transform to scalars. But given that we allowed vectors before, I'm matching that behavior here. We should change both transforms together if that's the right thing to do. llvm-svn: 357254
* [DAGCombiner] fold sext into negationSanjay Patel2019-03-281-0/+10
| | | | | | | | | | | | | | As noted in D59818: %z = zext i8 %x to i32 %neg = sub i32 0, %z %r = sext i32 %neg to i64 => %z2 = zext i8 %x to i64 %r = sub i64 0, %z2 https://rise4fun.com/Alive/KzSR llvm-svn: 357178
* [DAGCombiner] Fold truncate(build_vector(x,y)) -> ↵Simon Pilgrim2019-03-281-1/+15
| | | | | | | | | | | | build_vector(truncate(x),truncate(y)) If scalar truncates are free, attempt to pre-truncate build_vectors source operands. Only attempt to do this before legalization as we often end up with truncations/extensions during build_vector lowering. Differential Revision: https://reviews.llvm.org/D59654 llvm-svn: 357161
* [DAGCombiner] Teach TokenFactor pruning to peek through lifetime nodesNirav Dave2019-03-271-0/+2
| | | | | | | | | | | | | | Summary: Lifetime nodes were inhibiting TokenFactor simplification inhibiting chain-based optimizations. Reviewers: courbet, jyknight Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59897 llvm-svn: 357121
* Revert r356996 "[DAG] Avoid smart constructor-based dangling nodes."Nirav Dave2019-03-271-12/+0
| | | | | | | This patch appears to trigger very large compile time increases in halide builds. llvm-svn: 357116
* [DAGCombiner] Unify Lifetime and memory Op aliasing.Nirav Dave2019-03-271-69/+91
| | | | | | | | | | | | | | | | | | | Rework BaseIndexOffset and isAlias to fully work with lifetime nodes and fold in lifetime alias analysis. This is mostly NFC. Reviewers: courbet Reviewed By: courbet Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59794 llvm-svn: 357070
* [DAGCombine] Refactor GatherAllAliases. NFCI.Nirav Dave2019-03-271-65/+66
| | | | llvm-svn: 357069
* [DAGCombiner] Don't allow addcarry if the carry producer is illegal.Jonas Paulsson2019-03-271-0/+4
| | | | | | | | | | | | | | | getAsCarry() checks that the input argument is a carry-producing node before allowing a transformation to addcarry. This patch adds a check to make sure that the carry-producing node is legal. If it is not, it may not remain in a form that is manageable by the target backend. The test case caused a compilation failure during instruction selection for this reason on SystemZ. Patch by Ulrich Weigand. Review: Sanjay Patel https://reviews.llvm.org/D59822 llvm-svn: 357052
* [DAG] Avoid smart constructor-based dangling nodes.Nirav Dave2019-03-261-0/+12
| | | | | | | | | | | | | | | Various SelectionDAG non-combine operations (e.g. the getNode smart constructor and legalization) may leave dangling nodes by applying optimizations or not fully pruning unused result values. This can result in nodes that are never added to the worklist and therefore can not be pruned. Add a node inserter as the current node deleter to make sure such nodes have the chance of being pruned. Many minor changes, mostly positive. llvm-svn: 356996
* [TargetLowering] Add SimplifyDemandedBits support for ISD::INSERT_VECTOR_ELTSimon Pilgrim2019-03-261-3/+7
| | | | | | | | | | | | This helps us relax the extension of a lot of scalar elements before they are inserted into a vector. Its exposes an issue in DAGCombiner::convertBuildVecZextToZext as some/all the zero-extensions may be relaxed to ANY_EXTEND, so we need to handle that case to avoid a couple of AVX2 VPMOVZX test regressions. Once this is in it should be easier to fix a number of remaining failures to fold loads into VBROADCAST nodes. Differential Revision: https://reviews.llvm.org/D59484 llvm-svn: 356989
OpenPOWER on IntegriCloud