summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* [DAGCombiner] widen zext of popcount based on target supportSanjay Patel2019-10-251-0/+12
| | | | | | | | | | | | | | | | zext (ctpop X) --> ctpop (zext X) This is a prerequisite step for canonicalizing in the other direction (narrow the popcount) in IR - PR43688: https://bugs.llvm.org/show_bug.cgi?id=43688 I'm not sure if any other targets are affected, but I found a missing fold for PPC, so added tests based on that. The reason we widen all the way to 64-bit in these tests is because the initial DAG looks something like this: t5: i8 = ctpop t4 t6: i32 = zero_extend t5 <-- created based on IR, but unused node? t7: i64 = zero_extend t5 Differential Revision: https://reviews.llvm.org/D69127
* Fix cppcheck shadow variable warning. NFCI.Simon Pilgrim2019-10-241-6/+6
|
* [AArch64][SVE] Add SPLAT_VECTOR ISD NodeGraham Hunter2019-10-181-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | Adds a new ISD node to replicate a scalar value across all elements of a vector. This is needed for scalable vectors, since BUILD_VECTOR cannot be used. Fixes up default type legalization for scalable vectors after the new MVT type ranges were introduced. At present I only use this node for scalable vectors. A DAGCombine has been added to transform a BUILD_VECTOR into a SPLAT_VECTOR if all elements are the same, but only if the default operation action of Expand has been overridden by the target. I've only added result promotion legalization for scalable vector i8/i16/i32/i64 types in AArch64 for now. Reviewers: t.p.northover, javed.absar, greened, cameron.mcinally, jmolloy Reviewed By: jmolloy Differential Revision: https://reviews.llvm.org/D47775 llvm-svn: 375222
* [DAGCombine][ARM] Enable extending masked loadsSam Parker2019-10-171-0/+39
| | | | | | | | | | | Add generic DAG combine for extending masked loads. Allow us to generate sext/zext masked loads which can access v4i8, v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively. Differential Revision: https://reviews.llvm.org/D68337 llvm-svn: 375085
* [X86] Make memcmp() use PTEST if possible and also enable AVX1David Zarzycki2019-10-151-1/+3
| | | | llvm-svn: 374922
* [DAGCombiner] fold select-of-constants based on sign-bit testSanjay Patel2019-10-151-0/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | Examples: i32 X > -1 ? C1 : -1 --> (X >>s 31) | C1 i8 X < 0 ? C1 : 0 --> (X >>s 7) & C1 This is a small generalization of a fold requested in PR43650: https://bugs.llvm.org/show_bug.cgi?id=43650 The sign-bit of the condition operand can be used as a mask for the true operand: https://rise4fun.com/Alive/paT Note that we already handle some of the patterns (isNegative + scalar) because there's an over-specialized, yet over-reaching fold for that in foldSelectCCToShiftAnd(). It doesn't use any TLI hooks, so I can't easily rip out that code even though we're duplicating part of it here. This fold is guarded by TLI.convertSelectOfConstantsToMath(), so it should not cause problems for targets that prefer select over shift. Also worth noting: I thought we could generalize this further to include the case where the true operand of the select is not constant, but Alive says that may allow poison to pass through where it does not in the original select form of the code. Differential Revision: https://reviews.llvm.org/D68949 llvm-svn: 374902
* [DAGCombiner] fold vselect-of-constants to shiftSanjay Patel2019-10-111-0/+9
| | | | | | | | | | The diffs suggest that we are missing some more basic analysis/transforms, but this keeps the vector path in sync with the scalar (rL374397). This is again a preliminary step for introducing the reverse transform in IR as proposed in D63382. llvm-svn: 374555
* [DAGCombiner] fold select-of-constants to shiftSanjay Patel2019-10-101-3/+12
| | | | | | | | | | | | | | | | | | This reverses the scalar canonicalization proposed in D63382. Pre: isPowerOf2(C1) %r = select i1 %cond, i32 C1, i32 0 => %z = zext i1 %cond to i32 %r = shl i32 %z, log2(C1) https://rise4fun.com/Alive/Z50 x86 already tries to fold this pattern, but it isn't done uniformly, so we still see a diff. AArch64 probably should enable the TLI hook to benefit too, but that's a follow-on. llvm-svn: 374397
* [DAGCombiner] reduce code duplication; NFCSanjay Patel2019-10-101-2/+4
| | | | llvm-svn: 374370
* [DAGCombine] Match more patterns for half word bswapAmaury Sechet2019-10-101-29/+29
| | | | | | | | | | | | | | Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68250 llvm-svn: 374340
* Conservatively add volatility and atomic checks in a few placesPhilip Reames2019-10-091-1/+9
| | | | | | | | | | | | As background, starting in D66309, I'm working on support unordered atomics analogous to volatile flags on normal LoadSDNode/StoreSDNodes for X86. As part of that, I spent some time going through usages of LoadSDNode and StoreSDNode looking for cases where we might have missed a volatility check or need an atomic check. I couldn't find any cases that clearly miscompile - i.e. no test cases - but a couple of pieces in code loop suspicious though I can't figure out how to exercise them. This patch adds defensive checks and asserts in the places my manual audit found. If anyone has any ideas on how to either a) disprove any of the checks, or b) hit the bug they might be fixing, I welcome suggestions. Differential Revision: https://reviews.llvm.org/D68419 llvm-svn: 374261
* [X86][AVX] Access a scalar float/double as a free extract from a broadcast ↵Simon Pilgrim2019-10-061-0/+5
| | | | | | | | | | | | | | | | load (PR43217) If a fp scalar is loaded and then used as both a scalar and a vector broadcast, perform the load as a broadcast and then extract the scalar for 'free' from the 0th element. This involved switching the order of the X86ISD::BROADCAST combines so we only convert to X86ISD::BROADCAST_LOAD once all other canonicalizations have been attempted. Adds a DAGCombinerInfo::recursivelyDeleteUnusedNodes wrapper. Fixes PR43217 Differential Revision: https://reviews.llvm.org/D68544 llvm-svn: 373871
* Revert [DAGCombine] Match more patterns for half word bswapSanjay Patel2019-10-061-29/+29
| | | | | | | | This reverts r373850 (git commit 25ba49824d2d4f2347b4a7cb1623600a76ce9433) This patch appears to cause multiple codegen regression test failures - http://lab.llvm.org:8011/builders/clang-cmake-armv7-quick/builds/10680 llvm-svn: 373853
* [DAGCombine] Match more patterns for half word bswapAmaury Sechet2019-10-061-29/+29
| | | | | | | | | | | | | | Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68250 llvm-svn: 373850
* [DAGCombiner] add operation legality checks before creating shift ops (PR43542)Sanjay Patel2019-10-031-1/+6
| | | | | | | | | | | | | | As discussed on llvm-dev and: https://bugs.llvm.org/show_bug.cgi?id=43542 ...we have transforms that assume shift operations are legal and transforms to use them are profitable, but that may not hold for simple targets. In this case, the MSP430 target custom lowers shifts by repeating (many) simpler/fixed ops. That can be avoided by keeping this code as setcc/select. Differential Revision: https://reviews.llvm.org/D68397 llvm-svn: 373666
* [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook ↵Simon Pilgrim2019-10-011-276/+32
| | | | | | | | | | | | | | | | (PR42863) This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks. The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing it work with target specific combines and opcodes (e.g. X86's FMA variants). Unlike the SimplifyDemandedBits, we can't just handle target nodes through a Target callback, we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations has to duplicate some checks (recursion depth etc.). Partial reversion of rL372756 - I've identified the infinite loop issue inside the X86 override but haven't fixed it yet so I've only (re)committed the common TargetLowering refactoring part of the patch. Differential Revision: https://reviews.llvm.org/D67557 llvm-svn: 373343
* [DAGCombiner] Clang format MatchRotate. NFCAmaury Sechet2019-09-301-4/+6
| | | | llvm-svn: 373269
* [DAGCombiner] Update MatchRotate so that it returns an SDValue. NFCAmaury Sechet2019-09-301-22/+21
| | | | llvm-svn: 373260
* [TargetLowering] Make allowsMemoryAccess methode virtual.Thomas Raoux2019-09-261-1/+1
| | | | | | | | | | | Rename old function to explicitly show that it cares only about alignment. The new allowsMemoryAccess call the function related to alignment by default and can be overridden by target to inform whether the memory access is legal or not. Differential Revision: https://reviews.llvm.org/D67121 llvm-svn: 372935
* [DAGCombiner] add one-use restriction to vector transform with cheap extractSanjay Patel2019-09-251-1/+1
| | | | | | | | | | We might be able to do better on the example in the test, but in general, we should not scalarize a splatted vector binop if there are other uses of the binop. Otherwise, we can end up with code as we had - a scalar op that is redundant with a vector op. llvm-svn: 372886
* Revert r372333: [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression ↵Ilya Biryukov2019-09-241-32/+276
| | | | | | | | | | to a target hook (PR42863) Reason: this caused severe compile time regressions in JAX. See email thread of original revision on llvm-commits for details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190923/697042.html llvm-svn: 372756
* [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook ↵Simon Pilgrim2019-09-191-276/+32
| | | | | | | | | | | | | | | | (PR42863) This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks and includes a demonstration X86 implementation. The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing it work with target specific combines and opcodes (e.g. X86's FMA variants). Unlike the SimplifyDemandedBits, we can't just handle target nodes through a Target callback, we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations has to duplicate some checks (recursion depth etc.). I've only begun to replace X86's FNEG handling here, handling FMADDSUB/FMSUBADD negation and some low impact codegen changes (some FMA negatation propagation). We can build on this in future patches. Differential Revision: https://reviews.llvm.org/D67557 llvm-svn: 372333
* [DAGCombiner] Add node to the worklist in topological order in ↵Amaury Sechet2019-09-191-3/+3
| | | | | | | | | | | | | | | | scalarizeExtractedVectorLoad Summary: As per title. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66661 llvm-svn: 372327
* [DAG] Add SelectionDAG::MaxRecursionDepth constantSimon Pilgrim2019-09-191-2/+3
| | | | | | | | | | As commented on D67557 we have a lot of uses of depth checks all using magic numbers. This patch adds the SelectionDAG::MaxRecursionDepth constant and moves over some general cases to use this explicitly. Differential Revision: https://reviews.llvm.org/D67711 llvm-svn: 372315
* [DAGCombine][ARM][X86] (sub Carry, X) -> (addcarry (sub 0, X), 0, Carry) foldRoman Lebedev2019-09-181-0/+12
| | | | | | | | | | | | | | | | | | | Summary: `DAGCombiner::visitADDLikeCommutative()` already has a sibling fold: `(add X, Carry) -> (addcarry X, 0, Carry)` This fold, as suggested by @efriedma, helps recover from //some// of the regressions of D62266 Reviewers: efriedma, deadalnix Subscribers: javed.absar, kristof.beyls, llvm-commits, efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D62392 llvm-svn: 372259
* [SDAG] Update generic code to conservatively check for isAtomic in addition ↵Philip Reames2019-09-121-46/+74
| | | | | | | | | | to isVolatile This is the first sweep of generic code to add isAtomic bailouts where appropriate. The intention here is to have the switch from AtomicSDNode to LoadSDNode/StoreSDNode be close to NFC; that is, I'm not looking to allow additional optimizations at this time. That will come later. See D66309 for context. Differential Revision: https://reviews.llvm.org/D66318 llvm-svn: 371786
* [DAGCombiner][X86] Pass the CmpOpVT to reduceSelectOfFPConstantLoads so X86 ↵Craig Topper2019-09-121-1/+1
| | | | | | | | | | | can exclude fp128 compares. The X86 decision assumes the compare will produce a result in an XMM register, but that can't happen for an fp128 compare since those go to a libcall the returns an i32. Pass the VT so X86 can check the type. llvm-svn: 371775
* [DAGCombine] visitFDIV - Use isCheaperToUseNegatedFPOps helper for (fdiv ↵Simon Pilgrim2019-09-121-15/+5
| | | | | | | | (fneg X), (fneg Y)) -> (fdiv X, Y). NFCI. Minor cleanup to use equivalent helper code. llvm-svn: 371724
* [DAGCombiner] Improve division estimation of floating points.Qiu Chaofan2019-09-121-11/+33
| | | | | | | | | | | | | Current implementation of estimating divisions loses precision since it estimates reciprocal first and does multiplication. This patch is to re-order arithmetic operations in the last iteration in DAGCombiner to improve the accuracy. Reviewed By: Sanjay Patel, Jinsong Ji Differential Revision: https://reviews.llvm.org/D66050 llvm-svn: 371713
* [SelectionDAG] Remove ISD::FP_ROUND_INREGCraig Topper2019-09-091-18/+0
| | | | | | | | | | | | I don't think anything in tree creates this node. So all of this code appears to be dead. Code coverage agrees http://lab.llvm.org:8080/coverage/coverage-reports/llvm/coverage/Users/buildslave/jenkins/workspace/clang-stage2-coverage-R/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp.html Differential Revision: https://reviews.llvm.org/D67312 llvm-svn: 371431
* [DAGCombiner][X86][ARM] Teach visitMULO to fold multiplies with 0 to 0 and ↵Craig Topper2019-09-081-3/+19
| | | | | | | | | no carry. I modified the ARM test to use two inputs instead of 0 so the test hopefully still tests what was intended. llvm-svn: 371344
* [Intrinsic] Add the llvm.umul.fix.sat intrinsicBjorn Pettersson2019-09-071-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Add an intrinsic that takes 2 unsigned integers with the scale of them provided as the third argument and performs fixed point multiplication on them. The result is saturated and clamped between the largest and smallest representable values of the first 2 operands. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Patch by: leonardchan, bjope Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel Reviewed By: leonardchan Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57836 llvm-svn: 371308
* [DAGCombiner] try to form test+set out of shift+mask patternsSanjay Patel2019-09-021-0/+57
| | | | | | | | | | | | | | | | | | | | | The motivating bugs are: https://bugs.llvm.org/show_bug.cgi?id=41340 https://bugs.llvm.org/show_bug.cgi?id=42697 As discussed there, we could view this as a failure of IR canonicalization, but then we would need to implement a backend fixup with target overrides to get this right in all cases. Instead, we can just view this as a codegen opportunity. It's not even clear for x86 exactly when we should favor test+set; some CPUs have better theoretical throughput for the ALU ops than bt/test. This patch is made more complicated than I expected because there's an early DAGCombine for 'and' that can change types of the intermediate ops via trunc+anyext. Differential Revision: https://reviews.llvm.org/D66687 llvm-svn: 370668
* [DAGCombiner] improve throughput of shift+logic+shiftSanjay Patel2019-09-011-0/+74
| | | | | | | | | | | | | | | | | | | | | | | | The motivating case for this is a long way from here: https://bugs.llvm.org/show_bug.cgi?id=43146 ...but I think this is where we have to start. We need to canonicalize/optimize sequences of shift and logic to ease pattern matching for things like bswap and improve perf in general. But without the artificial limit of '!LegalTypes' (early combining), there are a lot of test diffs, and not all are good. In the minimal tests added for this proposal, x86 should have better throughput in all cases. AArch64 is neutral for scalar tests because it can fold shifts into bitwise logic ops. There are 3 shift opcodes and 3 logic opcodes for a total of 9 possible patterns: https://rise4fun.com/Alive/VlI https://rise4fun.com/Alive/n1m https://rise4fun.com/Alive/1Vn Differential Revision: https://reviews.llvm.org/D67021 llvm-svn: 370617
* [DAGCombiner] clean up code in visitShiftByConstant()Sanjay Patel2019-08-311-25/+20
| | | | | | | This is not quite NFC because the SDLoc propagation is changed, but there are no regression test diffs from that. llvm-svn: 370587
* [DAGCombiner] Match (add X, X) as (shl X, 1) when detecting rotate.Amaury Sechet2019-08-311-4/+20
| | | | | | | | | | | | | | Summary: The combiner transforms (shl X, 1) into (add X, X). Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66882 llvm-svn: 370578
* [DAGCombiner] Don't create illegal narrow storesJames Molloy2019-08-311-2/+7
| | | | | | | | | | | | | Narrowing stores when the target doesn't support the narrow version forces the target to expand into a load-modify-store sequence, which is highly suboptimal. The information narrowing throws away (legality of the inverse transform) is hard to re-analyze. If the target doesn't support a store of the narrow type, don't narrow even in pre-legalize mode. No test as this is DAGCombiner and depends on target bits. llvm-svn: 370576
* [DAGCombine] ReduceLoadWidth - remove duplicate SDLoc. NFCI.Simon Pilgrim2019-08-301-3/+2
| | | | | | SDLoc(N0) and SDLoc(cast<LoadSDNode>(N0)) should be equivalent. llvm-svn: 370498
* [DAGCombine] visitVSELECT - remove equivalent getValueType() call. NFCI.Simon Pilgrim2019-08-301-1/+0
| | | | llvm-svn: 370489
* [DAGCombine] visitVSELECT - remove duplicate getOperand calls. NFCI.Simon Pilgrim2019-08-301-4/+3
| | | | llvm-svn: 370478
* [DAGCombine] visitVSELECT - use getShiftAmountTy for shift amounts.Simon Pilgrim2019-08-301-3/+3
| | | | llvm-svn: 370471
* [DAGCombine] visitMULHS - use getScalarValueSizeInBits() to make safe for ↵Simon Pilgrim2019-08-301-1/+1
| | | | | | | | vector types. This is hidden behind a (scalar-only) isOneConstant(N1) check at the moment, but once we get around to adding vector support we need to ensure we're dealing with the scalar bitwidth, not the total. llvm-svn: 370468
* [DAGCombine] visitMULHS/visitMULHU - isBuildVectorAllZeros doesn't mean node ↵Simon Pilgrim2019-08-301-8/+8
| | | | | | | | | | is all zeros Return a proper zero vector, just in case some elements are undef. Noticed by inspection after dealing with a similar issue in PR43159. llvm-svn: 370460
* [DAGCombine] Fix shadow variable warnings. NFCI.Simon Pilgrim2019-08-291-12/+12
| | | | llvm-svn: 370365
* Fix signed/unsigned comparison warning. NFCI.Simon Pilgrim2019-08-291-1/+2
| | | | llvm-svn: 370333
* [DAGCombiner] (insert_vector_elt (vector_shuffle X, Y), (extract_vector_elt ↵Amaury Sechet2019-08-291-4/+43
| | | | | | | | | | | | | | | | X, N), IdxC) -> (vector_shuffle X, Y) Summary: This is beneficial when the shuffle is only used once and end up being generated in a few places when some node is combined into a shuffle. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66718 llvm-svn: 370326
* [DAGCombine] Fix cppcheck shadow variable warning. NFCI.Simon Pilgrim2019-08-281-4/+4
| | | | | | We already have an outer Ops variable. llvm-svn: 370197
* [TargetLowering] Add buildLegalVectorShuffle facility to help build legal ↵Amaury Sechet2019-08-281-63/+42
| | | | | | | | | | | | | | | | shuffles Summary: There are at least 2 ways to express the same shuffle. Various pieces of code explicit check for both option, but other places do not when they would benefit from doing it. This patches refactor the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66804 llvm-svn: 370190
* [DAGCombine] Remove LoadedSlice::Cost default 'ForCodeSize' constructor ↵Simon Pilgrim2019-08-281-3/+3
| | | | | | | | arguments. NFCI. These were always being passed in and it allowed me to add the explicit tag to stop a cppcheck warning about 1 argument constructors. llvm-svn: 370189
* [DAGCombiner] cancel fnegs from multiplied operands of FMASanjay Patel2019-08-271-15/+29
| | | | | | | | | | | | | | | | | | (-X) * (-Y) + Z --> X * Y + Z This is a missing optimization that shows up as a potential regression in D66050, so we should solve it first. We appear to be partly missing this fold in IR as well. We do handle the simpler case already: (-X) * (-Y) --> X * Y And it might be beneficial to make the constraint less conservative (eg, if both operands are cheap, but not necessarily cheaper), but that causes infinite looping for the existing fmul transform. Differential Revision: https://reviews.llvm.org/D66755 llvm-svn: 370071
OpenPOWER on IntegriCloud