path: root/llvm/lib/CodeGen/SelectionDAG
Commit message | Author | Age | Files | Lines
* [SelectionDAG] Do minnum->minimum at legalization time instead of building time | Benjamin Kramer | 2019-07-01 | 2 | -16/+17
  The SDAGBuilder behavior stems from the days when we didn't have fast math flags available in SDAG. We do now, and doing the transformation in the legalizer has the advantage that it also works for vector types.
  llvm-svn: 364743
* [SelectionDAG] Use the memory VT instead of result VT for FoldingSet profiling in getMaskedLoad/getMaskedStore. | Craig Topper | 2019-06-30 | 1 | -3/+2
  This matches what is done by the Profile function. Otherwise CSE won't work properly.
  llvm-svn: 364717
* [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 3) | Roman Lebedev | 2019-06-27 | 1 | -0/+109
  Summary:
  I'm submitting a new revision since I don't understand how to reclaim/reopen/take over the existing one, D50222. There is no such action in "Add Action" menu...
  This implements an optimization described in Hacker's Delight 10-17: when `C` is constant, the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder. The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.
  This is a recommit. The original commit rL364563 was reverted in rL364568 because test-suite detected a miscompile: the new comparison constant 'Q' was being computed incorrectly (we divided by `D0` instead of `D`).
  Original patch D50222 by @hermord (Dmytro Shynkevych)
  Notes:
  - In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla. This seems to require an extra branch on overflow, so I refrained from implementing this for now.
  - An explicit check for when the `REM` can be reduced to just its LHS is included: the `X % C == 0` optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise. I hadn't managed to find a better way to not generate worse output in this case.
  - The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390.
  Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00
  Reviewed By: RKSimon, xbolva00
  Subscribers: dexonsmith, kristina, xbolva00, javed.absar, llvm-commits, hermord
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63391
  llvm-svn: 364600
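  As an illustration of the divisibility test described in this commit, here is a minimal standalone C++ sketch for 32-bit unsigned values. It is not the SelectionDAG code from the patch; the helper names (`inverseMod2_32`, `rotr32`, `isDivisibleBy`) are hypothetical. The recipe assumed is the Hacker's Delight one: write D = D0 * 2^K with D0 odd, multiply X by the inverse of D0 modulo 2^32, rotate right by K, and compare against Q = floor((2^32 - 1) / D). Note that Q divides by `D`, not `D0`, which is the mistake the commit message cites as the reason for the earlier revert.

```cpp
#include <cassert>
#include <cstdint>

// Multiplicative inverse of an odd value modulo 2^32 (Newton iteration).
// Hypothetical helper, not an LLVM API.
static uint32_t inverseMod2_32(uint32_t OddD) {
  uint32_t Inv = OddD;          // an odd value is its own inverse modulo 8
  for (int I = 0; I < 4; ++I)   // correct bits double each step: 3, 6, 12, 24, 48
    Inv *= 2u - OddD * Inv;
  return Inv;
}

// Rotate a 32-bit value right by K bits.
static uint32_t rotr32(uint32_t V, unsigned K) {
  K &= 31;
  return K == 0 ? V : (V >> K) | (V << (32 - K));
}

// Returns true iff X % D == 0, without computing a remainder of X.
bool isDivisibleBy(uint32_t X, uint32_t D) {
  assert(D != 0 && "divisor must be non-zero");
  unsigned K = 0;
  uint32_t D0 = D;
  while ((D0 & 1) == 0) {       // split D into D0 * 2^K with D0 odd
    D0 >>= 1;
    ++K;
  }
  uint32_t P = inverseMod2_32(D0);
  uint32_t Q = UINT32_MAX / D;  // divide by D, not by D0
  return rotr32(X * P, K) <= Q;
}
```

  In a compiler, `D`, `K`, `P` and `Q` are all constants, so the runtime cost is one multiply, one rotate and one compare.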
* Revert "[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2)" | Roman Lebedev | 2019-06-27 | 1 | -107/+0
  *Appears* to break test-suite on http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/23790
  FAIL: burg.execution_time
  FAIL: spiff.execution_time
  FAIL: employ.execution_time
  FAIL: llu.execution_time
  FAIL: gramschmidt.execution_time
  FAIL: fdtd-apml.execution_time
  This reverts commit r364563.
  llvm-svn: 364568
* [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2) | Roman Lebedev | 2019-06-27 | 1 | -0/+107
  Summary:
  I'm submitting a new revision since I don't understand how to reclaim/reopen/take over the existing one, D50222. There is no such action in "Add Action" menu...
  Original patch D50222 by @hermord (Dmytro Shynkevych)
  This implements an optimization described in Hacker's Delight 10-17: when `C` is constant, the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder. The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.
  Original patch author: @hermord (Dmytro Shynkevych)!
  Notes:
  - In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla. This seems to require an extra branch on overflow, so I refrained from implementing this for now.
  - An explicit check for when the `REM` can be reduced to just its LHS is included: the `X % C == 0` optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise. I hadn't managed to find a better way to not generate worse output in this case.
  - The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390.
  Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00
  Reviewed By: RKSimon, xbolva00
  Subscribers: xbolva00, javed.absar, llvm-commits, hermord
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63391
  llvm-svn: 364563
* [TargetLowering] SimplifyDemandedVectorElts - add shift/rotate support. | Simon Pilgrim | 2019-06-27 | 1 | -0/+18
  llvm-svn: 364548
* [TargetLowering] SimplifyDemandedBits - use DemandedElts to better identify partial splat shift amounts | Simon Pilgrim | 2019-06-27 | 1 | -11/+21
  llvm-svn: 364541
* [ISEL][X86] Tracking of registers that forward call arguments | Djordje Todorovic | 2019-06-27 | 1 | -2/+8
  While lowering calls, collect info about registers that forward arguments into following function frame. We store such info into the MachineFunction of the call. This is used very late when dumping DWARF info about call site parameters.
  ([9/13] Introduce the debug entry values.)
  Co-authored-by: Ananth Sowda <asowda@cisco.com>
  Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
  Co-authored-by: Ivan Baev <ibaev@cisco.com>
  Differential Revision: https://reviews.llvm.org/D60715
  llvm-svn: 364516
* [X86] X86DAGToDAGISel::matchBitExtract(): pattern b: truncation awareness | Roman Lebedev | 2019-06-26 | 1 | -0/+6
  Summary:
  (Not so) boringly identical to pattern a (D62786)
  Not yet sure how to deal with the last pattern c.
  Reviewers: RKSimon, craig.topper, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62793
  llvm-svn: 364418
* [DAGCombine] visitEXTRACT_SUBVECTOR - add TODO for extract_subvector(bitcast()) support | Simon Pilgrim | 2019-06-26 | 1 | -0/+1
  We support 'big to little' (e.g. extract_subvector(v16i8 bitcast(v2i64))) but not 'little to big' cases (e.g. extract_subvector(v2i64 bitcast(v16i8)))
  llvm-svn: 364405
* Teach the DAGCombine to fold this pattern (c1 and c2 are constants). | QingShan Zhang | 2019-06-26 | 1 | -2/+28
  // fold (sext (select cond, c1, c2)) -> (select cond, sext c1, sext c2)
  // fold (zext (select cond, c1, c2)) -> (select cond, zext c1, zext c2)
  // fold (aext (select cond, c1, c2)) -> (select cond, sext c1, sext c2)
  If the extension is an any_extend, sign extend the operands to keep their signedness, so that the other combine rules apply. The any_extend is handled as zero extend for constants.
  i.e.
  t1: i8 = select t0, Constant:i8<-1>, Constant:i8<0>
  t2: i64 = any_extend t1
  -->
  t3: i64 = select t0, Constant:i64<-1>, Constant:i64<0>
  -->
  t4: i64 = sign_extend_inreg t3
  Differential Revision: https://reviews.llvm.org/D63318
  llvm-svn: 364382
* [DAGCombine] combineRepeatedFPDivisors - recognize -1.0 / X as a reciprocal | Simon Pilgrim | 2019-06-25 | 1 | -2/+2
  Fixes issue identified by @nemanjai (Nemanja Ivanovic) in D62963 / rL363040 - infinite loop due to GetNegatedExpression fighting combineRepeatedFPDivisors resulting in fneg(fdiv(x,splat)) -> fneg(fmul(x,1.0/splat)) -> fmul(x,-1.0/splat) -> fmul(x,(-1.0 * 1.0)/splat) ......
  llvm-svn: 364326
* [SDAG] expand ctpop != 1 | Sanjay Patel | 2019-06-25 | 1 | -11/+11
  Change the generic ctpop expansion to more efficiently handle a check for not-a-power-of-two value:
  (ctpop x) != 1 --> (x == 0) || ((x & x-1) != 0)
  This is the inverted predicate sibling pattern that was added with: D63004
  This should have been done before I changed IR canonicalization to favor this form with: rL364246
  ...so if this requires revert/changing, the earlier commit may also need to be modified.
  llvm-svn: 364319
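  The expansion above can be checked with a small standalone C++ sketch. The function names are hypothetical and this is ordinary integer code, not the DAG expansion itself; `popcount32` is a deliberately simple reference used only to compare against.

```cpp
#include <cassert>
#include <cstdint>

// Reference population count (clears the lowest set bit until none remain).
static unsigned popcount32(uint32_t X) {
  unsigned N = 0;
  while (X != 0) {
    X &= X - 1;
    ++N;
  }
  return N;
}

// (ctpop x) != 1, written without a population count:
// either x has no bits set, or clearing its lowest set bit leaves something.
bool ctpopIsNotOne(uint32_t X) {
  return X == 0 || (X & (X - 1)) != 0;
}

// Compares the expanded form against the reference for one value.
bool expansionMatchesReference(uint32_t X) {
  return ctpopIsNotOne(X) == (popcount32(X) != 1);
}
```

  The `X == 0` term is needed because `X & (X - 1)` is also zero for zero, which has no bits set rather than exactly one.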
* [TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support | Simon Pilgrim | 2019-06-25 | 1 | -2/+18
  Add 'lowest' demanded elt -> bitcast fold to all *_EXTEND_VECTOR_INREG cases.
  Reapplies rL363856.
  llvm-svn: 364311
* [TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG | Simon Pilgrim | 2019-06-25 | 1 | -6/+4
  Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required.
  Matches what we already do for ZERO_EXTEND.
  Reapplies rL363850 but now with legality checks added at rL364290
  llvm-svn: 364303
* [SDAG] improve expansion of ctpop+setcc | Sanjay Patel | 2019-06-25 | 1 | -11/+14
  This should not cause any visible change in output, but it's more efficient because we were producing non-canonical 'sub x, 1' and 'setcc ugt x, 0'.
  As mentioned in the TODO, we should also be handling the inverse predicate.
  llvm-svn: 364302
* [TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG | Simon Pilgrim | 2019-06-25 | 1 | -6/+6
  Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero.
  Matches what we already do for SIGN_EXTEND.
  Reapplies rL363802 but now with legality checks added at rL364290
  llvm-svn: 364299
* [VectorLegalizer] ExpandANY_EXTEND_VECTOR_INREG/ExpandZERO_EXTEND_VECTOR_INREG - widen source vector | Simon Pilgrim | 2019-06-25 | 1 | -0/+26
  The *_EXTEND_VECTOR_INREG opcodes were relaxed back around rL346784 to support source vector widths that are smaller than the output - it looks like the legalizers were never updated to account for this.
  This patch inserts the smaller source vector into an undef vector of the same width of the result before performing the shuffle+bitcast to correctly handle this.
  Part of the yak shaving to solve the crashes from rL364264 and rL364272
  llvm-svn: 364295
* [TargetLowering] SimplifyDemandedBits - legal checks for SIGN/ZERO_EXTEND -> ZERO/ANY_EXTEND | Simon Pilgrim | 2019-06-25 | 1 | -6/+15
  As part of the fix for rL364264 + rL364272 - limit the *_EXTEND conversion to !TLO.LegalOperations || isOperationLegal cases.
  We'll improve X86 legality in future commits.
  llvm-svn: 364290
* [Codegen] TargetLowering::SimplifySetCC(): omit urem when possible | Roman Lebedev | 2019-06-25 | 1 | -0/+12
  Summary:
  This addresses the regression that is being exposed by D50222 in `test/CodeGen/X86/jump_sign.ll`
  The missing fold, at least partially, looks trivial: https://rise4fun.com/Alive/Zsln
  i.e. if we are comparing with zero, and comparing the `urem`-by-non-power-of-two, and the `urem` is of something that may at most have a single bit set (or no bits set at all), the `urem` is not needed.
  Reviewers: RKSimon, craig.topper, xbolva00, spatel
  Reviewed By: xbolva00, spatel
  Subscribers: xbolva00, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63390
  llvm-svn: 364286
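  The reasoning behind this fold can be sketched in standalone C++: a power of two is only divisible by other powers of two, so when X has at most one bit set and C is a non-zero non-power-of-two, `X % C == 0` is the same as `X == 0`. The function names below are hypothetical, and the check covers 32-bit values only; it is an illustration, not the SimplifySetCC code.

```cpp
#include <cassert>
#include <cstdint>

// True when C is non-zero and not a power of two (the fold's precondition).
static bool isNonZeroNonPowerOfTwo(uint32_t C) {
  return C != 0 && (C & (C - 1)) != 0;
}

// For every 32-bit X with at most one bit set, checks that
// (X % C == 0) agrees with (X == 0). C must be non-zero.
bool uremCanBeOmitted(uint32_t C) {
  assert(C != 0 && "divisor must be non-zero");
  // X with no bits set: both sides are true for any C.
  if ((0u % C == 0) != true)
    return false;
  // X with exactly one bit set: X != 0, so the remainder must be non-zero.
  for (unsigned Bit = 0; Bit < 32; ++Bit) {
    uint32_t X = uint32_t(1) << Bit;
    if ((X % C == 0) != (X == 0))
      return false;
  }
  return true;
}
```

  Calling `uremCanBeOmitted` with a power of two such as 4 returns false (4 % 4 == 0 although 4 != 0), which is why the fold is restricted to non-power-of-two divisors.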
* Revert r363802, r363850, and r363856 "[TargetLowering] SimplifyDemandedBits..." | Craig Topper | 2019-06-25 | 1 | -26/+20
  This reverts the following patches.
  "[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG"
  "[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG"
  "[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support"
  We can end up with an any_extend_vector_inreg with a 256 bit result type and a 128 bit input type. This is allowed by the ISD opcode, but the generic operation legalizer is only able to expand cases where the total vector width is the same.
  The X86 backend creates these mismatched cases for zext_vec_inreg/sext_vec_inreg. The SimplifyDemandedBits changes are allowing those nodes to become aext_vec_inreg. For the zext/sext cases, the X86 backend has Custom handling and never lets them get to the generic legalizer. We need to do the same for aext_vec_inreg.
  llvm-svn: 364264
* [CodeGen] Add missing vector type legalization for ctlz_zero_undef | Roland Froese | 2019-06-24 | 1 | -0/+2
  Widen vector result type for ctlz_zero_undef and cttz_zero_undef the same as ctlz and cttz.
  Differential Revision: https://reviews.llvm.org/D63463
  llvm-svn: 364221
* CodeGen: Introduce a class for registers | Matt Arsenault | 2019-06-24 | 2 | -2/+2
  Avoids using a plain unsigned for registers throughout codegen. Doesn't attempt to change every register use, just something a little more than the set needed to build after changing the return type of MachineOperand::getReg().
  llvm-svn: 364191
* [DAGCombine] visitMUL - allow shift by zero in MulByConstant. | Simon Pilgrim | 2019-06-24 | 1 | -6/+6
  This can occur under certain circumstances when undefs are created later on in the constant multipliers (e.g. in this case due to SimplifyDemandedVectorElts). It's better to let the shift by zero occur and perform any cleanup afterward.
  Fixes OSS Fuzz #15429
  llvm-svn: 364179
* [SelectionDAG] Remove the code that attempts to calculate the alignment for the second half of a split masked load/store. | Craig Topper | 2019-06-23 | 2 | -27/+4
  The code divides the alignment by 2 if the original alignment is equal to the original VT size. But this wouldn't be correct if the alignment was larger than the VT size.
  The memory operand object already takes care of calling MinAlign on the base alignment and the memory pointer offset. So we don't need any special code at all.
  llvm-svn: 364151
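  A small standalone C++ sketch of why the removed special case was unnecessary and sometimes wrong. `minAlign` below is a local stand-in assumed to compute what the commit calls MinAlign (the largest power of two dividing both the base alignment and the offset); `oldSecondHalfAlign` is a hypothetical paraphrase of the removed rule as the commit message describes it, not the deleted code.

```cpp
#include <cassert>
#include <cstdint>

// Largest power of two that divides both A and B (lowest set bit of A | B).
// Local stand-in for the MinAlign computation; at least one input must be non-zero.
uint64_t minAlign(uint64_t A, uint64_t B) {
  uint64_t Both = A | B;
  return Both & (~Both + 1);
}

// Paraphrase of the removed rule: halve the alignment only when it
// equals the size in bytes of the original (unsplit) vector type.
uint64_t oldSecondHalfAlign(uint64_t Align, uint64_t VTSizeInBytes) {
  return Align == VTSizeInBytes ? Align / 2 : Align;
}

// Alignment that can actually be guaranteed for the second half,
// which starts VTSizeInBytes / 2 bytes past the base pointer.
uint64_t secondHalfAlign(uint64_t Align, uint64_t VTSizeInBytes) {
  return minAlign(Align, VTSizeInBytes / 2);
}
```

  For a 32-byte vector with 64-byte alignment, the old rule keeps 64 for the second half, but the half at offset 16 is only guaranteed 16-byte alignment, which is what the offset-aware computation gives.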
* [DAGCombine] narrowExtractedVectorBinOp - pull out repeated getOpcode(). NFCI. | Simon Pilgrim | 2019-06-21 | 1 | -2/+2
  llvm-svn: 364076
* [DAGCombine] narrowInsertExtractVectorBinOp - reuse "extract from insert" detection code. | Simon Pilgrim | 2019-06-21 | 1 | -11/+15
  Move the "extract from insert detection code" into a lambda helper function.
  llvm-svn: 364059
* [DAGCombiner] Use getAPIntValue() instead of getZExtValue() where possible. | Simon Pilgrim | 2019-06-20 | 1 | -21/+20
  Better handling of out-of-i64-range values due to large integer types or from fuzz tests.
  llvm-svn: 363955
* [DAGCombiner][NFC] Remove unused var | Jordan Rupprecht | 2019-06-20 | 1 | -1/+0
  llvm-svn: 363954
* [DAGCombiner] Support (shl (zext (srl x, C)), C) -> (zext (shl (srl x, C), C)) non-uniform folds. | Simon Pilgrim | 2019-06-20 | 1 | -17/+19
  Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases.
  llvm-svn: 363929
* [DAGCombine] Add TODOs for some combines that should support non-uniform vectors | Simon Pilgrim | 2019-06-20 | 1 | -0/+15
  We tend to only test for scalar/scalar consts when really we could support non-uniform vectors using ISD::matchUnaryPredicate/matchBinaryPredicate etc.
  llvm-svn: 363924
* [DAGCombine] Reduce scope of ShAmtVal variable. NFCI. | Simon Pilgrim | 2019-06-20 | 1 | -2/+1
  Fixes cppcheck warning.
  Use the more capable getAPIntValue() instead of getZExtValue() as well since I'm here.
  llvm-svn: 363921
* [DAGCombine] Use ConstantSDNode::getAPIntValue() instead of getZExtValue(). | Simon Pilgrim | 2019-06-19 | 1 | -2/+2
  Use getAPIntValue() in a few more places. Most of the time getZExtValue() is fine, but occasionally there's fuzzed code or someone decides to create i65536 or something.....
  llvm-svn: 363887
* [TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support | Simon Pilgrim | 2019-06-19 | 1 | -11/+12
  Move 'lowest' demanded elt -> bitcast fold out of ZERO_EXTEND_VECTOR_INREG into ANY_EXTEND_VECTOR_INREG case.
  llvm-svn: 363856
* [TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG | Simon Pilgrim | 2019-06-19 | 1 | -3/+4
  Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required.
  Matches what we already do for ZERO_EXTEND.
  llvm-svn: 363850
* [TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG | Simon Pilgrim | 2019-06-19 | 1 | -6/+10
  Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero.
  Matches what we already do for SIGN_EXTEND.
  llvm-svn: 363802
* [DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) non-uniform folds. | Simon Pilgrim | 2019-06-19 | 1 | -16/+15
  Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases.
  llvm-svn: 363793
* [DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> 0 non-uniform folds. | Simon Pilgrim | 2019-06-19 | 2 | -11/+27
  Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases.
  This requires us to tweak matchBinaryPredicate to allow it to (optionally) handle constants with different type widths.
  llvm-svn: 363792
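  The scalar arithmetic behind the `(shl (ext (shl x, c1)), c2)` folds can be brute-forced in standalone C++ for an i8 value zero-extended to i16. This is only a sketch: the guard `c2 >= 16 - 8` (the outer shift discards every bit the narrow inner shift could have dropped) is an assumption about when the rewrite is valid and is not stated in the commit message, and the function name is hypothetical. The commit itself concerns applying the fold per vector lane with non-uniform constants, which this scalar check does not model.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively checks, for x : i8 zero-extended to i16, c1 in [0, 8) and
// c2 in [8, 16), that
//   (shl (zext (shl x, c1)), c2) == 0                        if c1 + c2 >= 16
//   (shl (zext (shl x, c1)), c2) == (shl (zext x), c1 + c2)  otherwise
bool shlExtShlFoldsHold() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned C1 = 0; C1 < 8; ++C1)
      for (unsigned C2 = 8; C2 < 16; ++C2) {
        // Inner shift in 8 bits: the cast truncates bits shifted past bit 7.
        uint8_t Inner = static_cast<uint8_t>(X << C1);
        // Zero-extend, then shift in 16 bits.
        uint16_t Original =
            static_cast<uint16_t>(static_cast<uint32_t>(Inner) << C2);
        // Folded form: constant zero, or a single shift of the extended value.
        uint16_t Folded = (C1 + C2 >= 16)
                              ? uint16_t(0)
                              : static_cast<uint16_t>(X << (C1 + C2));
        if (Original != Folded)
          return false;
      }
  return true;
}
```

  With a smaller `c2` the bits lost by the 8-bit inner shift would still be visible in the 16-bit result, so the single combined shift would no longer match.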
* [DAGCombiner] visitSHL - pull out repeated shift amount VT. NFCI. | Simon Pilgrim | 2019-06-19 | 1 | -6/+6
  llvm-svn: 363789
* [DAGCombine] Fix (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) comment. NFCI. | Simon Pilgrim | 2019-06-19 | 1 | -1/+2
  We pre-extend, not post.
  llvm-svn: 363787
* Rename ExpandISelPseudo->FinalizeISel, delay register reservation | Matt Arsenault | 2019-06-19 | 2 | -1/+30
  This allows targets to make more decisions about reserved registers after isel. For example, now it should be certain there are calls or stack objects in the frame or not, which could have been introduced by legalization.
  Patch by Matthias Braun
  llvm-svn: 363757
* [TargetLowering] SimplifyDemandedBits - Cleanup ANY_EXTEND handling | Simon Pilgrim | 2019-06-18 | 1 | -2/+8
  Match SIGN_EXTEND + ZERO_EXTEND handling - will be adding ANY_EXTEND_VECTOR_INREG support in a future patch.
  llvm-svn: 363716
* [TargetLowering] SimplifyDemandedBits - Merge ZERO_EXTEND+ZERO_EXTEND_VECTOR_INREG handling | Simon Pilgrim | 2019-06-18 | 1 | -24/+16
  Other than adding consistent demanded elts handling which was a trivial addition, the other differences in functionality will be added in later patches.
  llvm-svn: 363713
* [TargetLowering] SimplifyDemandedBits - Merge SIGN_EXTEND+SIGN_EXTEND_VECTOR_INREG handling | Simon Pilgrim | 2019-06-18 | 1 | -25/+17
  Other than adding consistent demanded elts handling which was a trivial addition, the other differences in functionality will be added in later patches.
  llvm-svn: 363710
* [TargetLowering] SimplifyDemandedVectorElts - support MUL and ANY_EXTEND_VECTOR_INREG | Simon Pilgrim | 2019-06-18 | 1 | -0/+9
  Also fold ANY_EXTEND_VECTOR_INREG -> BITCAST if we only need the bottom element.
  Fixes temporary regression introduced in rL363693.
  llvm-svn: 363694
* [SelectionDAG] Legalize vaargs that require vector splitting | Simon Pilgrim | 2019-06-18 | 2 | -0/+24
  This adds vector splitting for vaarg instructions during type legalization
  Committed on behalf of @luke (Luke Lau)
  Differential Revision: https://reviews.llvm.org/D60762
  llvm-svn: 363671
* [DAGCombiner] [CodeGenPrepare] More comprehensive GEP splitting | Luis Marques | 2019-06-17 | 1 | -3/+63
  Some GEPs were not being split, presumably because that split would just be undone by the DAGCombiner. Not performing those splits can prevent important optimizations, such as preventing the element indices / member offsets from being (partially) folded into load/store instruction immediates.
  This patch:
  - Makes the splits also occur in the cases where the base address and the GEP are in the same BB.
  - Ensures that the DAGCombiner doesn't reassociate them back again.
  Differential Revision: https://reviews.llvm.org/D60294
  llvm-svn: 363544
* [SelectionDAG] Fold insert_subvector(undef, extract_subvector(v, c), c) -> v in getNode | Simon Pilgrim | 2019-06-17 | 1 | -0/+6
  This is already done in DAGCombiner::visitINSERT_SUBVECTOR, but this helps a number of shuffles across different vector widths recognise when they come from the same source.
  llvm-svn: 363542
* adding more fmf propagation for selects plus updated tests | Michael Berg | 2019-06-15 | 2 | -20/+37
  llvm-svn: 363484
* Revert "adding more fmf propagation for selects plus tests" | Fangrui Song | 2019-06-15 | 2 | -37/+20
  This reverts rL363474.
  -debug-only=isel was added to some tests that don't specify `REQUIRES: asserts`. This causes failures on -DLLVM_ENABLE_ASSERTIONS=off builds.
  I chose to revert instead of fixing the tests because I'm not sure whether we should add `REQUIRES: asserts` to more tests.
  llvm-svn: 363482