summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
Commit message (Collapse)AuthorAgeFilesLines
* Reapply "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"Nikita Popov2019-01-151-4/+16
| | | | | | | | | | | | | Related to https://bugs.llvm.org/show_bug.cgi?id=40123. Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB, which produces much better code for X86. Reapplying with updated SLPVectorizer tests. Differential Revision: https://reviews.llvm.org/D56636 llvm-svn: 351219
* Revert "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"Nikita Popov2019-01-141-16/+4
| | | | | | | | | This reverts commit r351125. I missed test changes in an SLPVectorizer test, due to the cost model changes. Reverting for now. llvm-svn: 351129
* [CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectorsNikita Popov2019-01-141-4/+16
| | | | | | | | | | | Related to https://bugs.llvm.org/show_bug.cgi?id=40123. Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB, which produces much better code for X86. Differential Revision: https://reviews.llvm.org/D56636 llvm-svn: 351125
* [X86] Rename overly verbose method; NFCNikita Popov2019-01-131-2/+1
| | | | | | As suggested on D56636. llvm-svn: 351021
* Use getShiftAmountTy for shift amounts.Simon Pilgrim2019-01-121-1/+2
| | | | llvm-svn: 351005
* [X86][AARCH64] Improve ISD::ABS supportSimon Pilgrim2019-01-121-0/+20
| | | | | | | | This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types. Differential Revision: https://reviews.llvm.org/D56544 llvm-svn: 350998
* Remove check for single use in ShrinkDemandedConstantStanislav Mekhanoshin2019-01-091-3/+0
| | | | | | | | | | | | | | | This removes check for single use from general ShrinkDemandedConstant to the BE because of the AArch64 regression after D56289/rL350475. After several hours of experiments I did not come up with a testcase failing on any other targets if check is not performed. Moreover, direct call to ShrinkDemandedConstant is not really needed and superceed by SimplifyDemandedBits. Differential Revision: https://reviews.llvm.org/D56406 llvm-svn: 350684
* [TargetLowering][AMDGPU] Remove the SimplifyDemandedBits function that takes ↵Craig Topper2019-01-071-50/+0
| | | | | | | | | | | | | | a User and OpIdx. Stop using it in AMDGPU target for simplifyI24. As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when its not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions. Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node. This patch changes AMDGPU's simplifyI24, to use a combination of GetDemandedBits to handle the multiple use simplifications. And then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input. Differential Revision: https://reviews.llvm.org/D56087 llvm-svn: 350560
* Added single use check to ShrinkDemandedConstantStanislav Mekhanoshin2019-01-051-0/+3
| | | | | | | | | Fixes cvt_f32_ubyte combine. performCvtF32UByteNCombine() could shrink source node to demanded bits only even if there are other uses. Differential Revision: https://reviews.llvm.org/D56289 llvm-svn: 350475
* [SelectionDAG] Always use the version of computeKnownBits that returns a ↵Simon Pilgrim2018-12-211-5/+4
| | | | | | | | value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version. llvm-svn: 349907
* [TargetLowering] Fix propagation of undefs in zero extension ops (PR40091)Simon Pilgrim2018-12-191-0/+14
| | | | | | | | | | | | As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect as the output should guarantee to least have the new upper bits set to zero. SimplifyDemandedVectorElts is the worst offender (and its the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue. Thanks to @dmgreen for catching this. Differential Revision: https://reviews.llvm.org/D55883 llvm-svn: 349625
* [TargetLowering] Fallback from SimplifyDemandedVectorElts to ↵Simon Pilgrim2018-12-181-1/+8
| | | | | | | | SimplifyDemandedBits For opcodes not covered by SimplifyDemandedVectorElts, SimplifyDemandedBits might be able to help now that it supports demanded elts as well. llvm-svn: 349466
* NFC: remove unused variableJF Bastien2018-12-171-1/+0
| | | | | | D55768 removed its use. llvm-svn: 349377
* [TargetLowering] Add DemandedElts mask to SimplifyDemandedBits (PR40000)Simon Pilgrim2018-12-171-42/+120
| | | | | | | | | | This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen. I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done but I'd like to add these methodically with proper test coverage, at the same time the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well. Differential Revision: https://reviews.llvm.org/D55768 llvm-svn: 349374
* [TargetLowering] Add ISD::OR + ISD::XOR handling to SimplifyDemandedVectorEltsSimon Pilgrim2018-12-151-0/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D55600 llvm-svn: 349264
* [TargetLowering] Add ISD::ROTL/ROTR vector expansionSimon Pilgrim2018-12-131-0/+45
| | | | | | | | | | Move existing rotation expansion code into TargetLowering and set it up for vectors as well. Ideally this would share more of the funnel shift expansion, but we handle the shift amount modulo quite differently at the moment. Begun removing x86 vector rotate custom lowering to use the expansion. llvm-svn: 349025
* [TargetLowering] Add ISD::AND handling to SimplifyDemandedVectorEltsSimon Pilgrim2018-12-121-0/+16
| | | | | | | | If either of the operand elements are zero then we know the result element is going to be zero (even if the other element is undef). Differential Revision: https://reviews.llvm.org/D55558 llvm-svn: 348926
* [Intrinsic] Signed Fixed Point Multiplication IntrinsicLeonard Chan2018-12-121-5/+70
| | | | | | | | | | | | Add an intrinsic that takes 2 signed integers with the scale of them provided as the third argument and performs fixed point multiplication on them. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D54719 llvm-svn: 348912
* [TargetLowering] Add ISD::EXTRACT_VECTOR_ELT support to SimplifyDemandedBitsSimon Pilgrim2018-12-111-0/+19
| | | | | | | | Let SimplifyDemandedBits attempt to simplify all elements of a vector extraction. Part of PR39689. llvm-svn: 348839
* [TargetLowering] Add UNDEF folding to SimplifyDemandedVectorEltsSimon Pilgrim2018-12-101-1/+6
| | | | | | | | | | If all the demanded elements of the SimplifyDemandedVectorElts are known to be UNDEF, we can simplify to an ISD::UNDEF node. Zero constant folding will be handled in a future patch - its a little trickier as we often have bitcasted zero values. Differential Revision: https://reviews.llvm.org/D55511 llvm-svn: 348784
* [TargetLowering] Remove ISD::ANY_EXTEND/ANY_EXTEND_VECTOR_INREG opcodes from ↵Simon Pilgrim2018-12-051-2/+0
| | | | | | | | SimplifyDemandedVectorElts These have no test coverage and the KnownZero flags can't be guaranteed unlike SIGN/ZERO_EXTEND cases. llvm-svn: 348361
* [SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467)Simon Pilgrim2018-12-051-0/+48
| | | | | | | | | | This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions. Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code. Differential Revision: https://reviews.llvm.org/D54698 llvm-svn: 348353
* [TargetLowering] SimplifyDemandedVectorElts - don't alter DemandedElts maskSimon Pilgrim2018-12-051-2/+3
| | | | | | | | Fix potential issue with the ISD::INSERT_VECTOR_ELT case tweaking the DemandedElts mask instead of using a local copy - so later uses of the mask use the tweaked version..... Noticed while investigating adding zero/undef folding to SimplifyDemandedVectorElts and the altered DemandedElts mask was causing mismatches. llvm-svn: 348348
* [SelectionDAG] Redefine isGAPlusOffset in terms of unwrapAddress. NFCI.Nirav Dave2018-12-041-1/+4
| | | | llvm-svn: 348288
* [TargetLowering] expandFP_TO_UINT - avoid FPE due to out of range conversion ↵Simon Pilgrim2018-12-041-11/+30
| | | | | | | | | | | | | | (PR17686) PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction. The existing code converts both the inrange and outofrange cases using FP_TO_SINT and then selects the result, this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment. The X87 cases don't need the strict flag but generates much nicer code with it.... Differential Revision: https://reviews.llvm.org/D53794 llvm-svn: 348251
* [TargetLowering] Add SimplifyDemandedVectorElts support to EXTEND opcodesSimon Pilgrim2018-12-041-0/+17
| | | | | | | | Add support for ISD::*_EXTEND and ISD::*_EXTEND_VECTOR_INREG opcodes. The extra broadcast in trunc-subvector.ll will be fixed in an upcoming patch. llvm-svn: 348246
* [SelectionDAG] Improve SimplifyDemandedBits to SimplifyDemandedVectorElts ↵Simon Pilgrim2018-12-011-5/+3
| | | | | | | | | | | | simplification D52935 introduced the ability for SimplifyDemandedBits to call SimplifyDemandedVectorElts through BITCASTs if the demanded bit mask entirely covered the sub element. This patch relaxes this to demanding an element if we need any bit from it. Differential Revision: https://reviews.llvm.org/D54761 llvm-svn: 348073
* [TargetLowering] SimplifyDemandedBits - only reduce known bits for integer ↵Simon Pilgrim2018-11-211-1/+3
| | | | | | | | constants Avoids fuzzing crash found by Mikael Holmén. llvm-svn: 347393
* [DAGCombine] Add calls to SimplifyDemandedVectorElts from ↵Simon Pilgrim2018-11-201-1/+1
| | | | | | | | visitINSERT_SUBVECTOR (PR37989) This uncovered an off-by-one typo in SimplifyDemandedVectorElts's INSERT_SUBVECTOR handling as its bounds check was bailing on safe indices. llvm-svn: 347313
* [TargetLowering] Improve SimplifyDemandedVectorElts/SimplifyDemandedBits supportSimon Pilgrim2018-11-201-0/+17
| | | | | | | | | | For bitcast nodes from larger element types, add the ability for SimplifyDemandedVectorElts to call SimplifyDemandedBits by merging the elts mask to a bits mask. I've raised https://bugs.llvm.org/show_bug.cgi?id=39689 to deal with the few places where SimplifyDemandedBits's lack of vector handling is a problem. Differential Revision: https://reviews.llvm.org/D54679 llvm-svn: 347301
* [TargetLowering] expandFP_TO_UINT - improve fp16 supportSimon Pilgrim2018-11-191-10/+18
| | | | | | | | | | As discussed on D53794, for float types with ranges smaller than the destination integer type, then we should be able to just use a regular FP_TO_SINT opcode. I thought we'd need to provide MSA test cases for very small integer types as well (fp16 -> i8 etc.), but it turns out that promotion will kick in so they're unnecessary. Differential Revision: https://reviews.llvm.org/D54703 llvm-svn: 347251
* [TargetLowering] Cleanup more of the EXTEND demanded bits cases so that they ↵Simon Pilgrim2018-11-161-10/+11
| | | | | | | | match. NFCI. Use the same variable names etc. llvm-svn: 347045
* [TargetLowering] Begin generalizing TargetLowering::expandFP_TO_SINT ↵Simon Pilgrim2018-11-051-26/+26
| | | | | | | | support. NFCI. Prior to initial work to add vector expansion support, remove assumptions that we're working on scalar types. llvm-svn: 346139
* [LegalizeDAG] Add generic vector CTPOP expansion (PR32655)Simon Pilgrim2018-11-011-2/+13
| | | | | | | | This patch adds support for expanding vector CTPOP instructions and removes the x86 'bitmath' lowering which replicates the same expansion. Differential Revision: https://reviews.llvm.org/D53258 llvm-svn: 345869
* Check shouldReduceLoadWidth from SimplifySetCCStanislav Mekhanoshin2018-10-311-1/+2
| | | | | | | | | | | | SimplifySetCC could shrink a load without checking for profitability or legality of such shink with a target. Added checks to prevent shrinking of aligned scalar loads in AMDGPU below dword as scalar engine does not support it. Differential Revision: https://reviews.llvm.org/D53846 llvm-svn: 345778
* [Intrinsic] Signed and Unsigned Saturation Subtraction IntirnsicsLeonard Chan2018-10-291-16/+36
| | | | | | | | | | | | Add an intrinsic that takes 2 integers and perform saturation subtraction on them. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D53783 llvm-svn: 345512
* [TargetLowering] Move i64/vXi64 to f32/vXf32 UINT_TO_FP handling to ↵Simon Pilgrim2018-10-281-29/+72
| | | | | | TargetLowering::expandUINT_TO_FP. llvm-svn: 345478
* [VectorLegalizer] Enable TargetLowering::expandFP_TO_UINT support.Simon Pilgrim2018-10-281-0/+5
| | | | | | | | Add vector support to TargetLowering::expandFP_TO_UINT. This exposes an issue in X86TargetLowering::LowerVSELECT which was assuming that the select mask was the same width as the LHS/RHS ops - as long as the result is a sign splat we can easily sext/trunk this. llvm-svn: 345473
* [TargetLowering] Move LegalizeDAG FP_TO_UINT handling to ↵Simon Pilgrim2018-10-271-0/+31
| | | | | | | | TargetLowering::expandFP_TO_UINT. NFCI. First step towards fixing PR17686 and adding vector support. llvm-svn: 345452
* [TargetLowering] Improve vXi64 UINT_TO_FP vXf64 support (P38226)Simon Pilgrim2018-10-251-0/+42
| | | | | | | | | | | | As suggested on D52965, this patch moves the i64 to f64 UINT_TO_FP expansion code from LegalizeDAG into TargetLowering and makes it available to LegalizeVectorOps as well. Not only does this help perform X86 lowering as a true vectorization instead of (partially vectorized) scalar conversions, it avoids the HADDPD op from the scalar code which can be slow on most targets. The AVX512F does have the vcvtusi2sdq scalar operation but we don't unroll to use it as it seems to only help for the v2f64 case - otherwise the unrolling cost will certainly be too high. My feeling is that we should leave it to the vectorizers - and if it generates the vector UINT_TO_FP we should use it. Differential Revision: https://reviews.llvm.org/D53649 llvm-svn: 345256
* [TargetLowering] Add SimplifyDemandedBitsForTargetNode callbackSimon Pilgrim2018-10-241-0/+24
| | | | | | | | Add a SimplifyDemandedBitsForTargetNode callback to handle target nodes. Differential Revision: https://reviews.llvm.org/D53643 llvm-svn: 345179
* [LegalizeDAG] Share Vector/Scalar CTPOP ExpansionSimon Pilgrim2018-10-231-0/+49
| | | | | | | | As suggested on D53258, this patch move the CTPOP expansion code from SelectionDAGLegalize to TargetLowering to allow it to be reused by the VectorLegalizer. Proper vector support will be added by D53258. llvm-svn: 345066
* [LegalizeDAG] Share Vector/Scalar CTLZ ExpansionSimon Pilgrim2018-10-231-0/+53
| | | | | | | | As suggested on D53258, this patch shares common CTLZ expansion code between VectorLegalizer and SelectionDAGLegalize by putting it in TargetLowering. Extension to D53474 llvm-svn: 345060
* [LegalizeDAG] Share Vector/Scalar CTTZ ExpansionSimon Pilgrim2018-10-231-0/+55
| | | | | | | | | | As suggested on D53258, this patch demonstrates sharing common CTTZ expansion code between VectorLegalizer and SelectionDAGLegalize by putting it in TargetLowering. I intend to move CTLZ and (scalar) CTPOP over as well and then update D53258 accordingly. Differential Revision: https://reviews.llvm.org/D53474 llvm-svn: 345039
* [Intrinsic] Unigned Saturation Addition IntrinsicLeonard Chan2018-10-221-22/+27
| | | | | | | | | | | | Add an intrinsic that takes 2 integers and perform unsigned saturation addition on them. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D53340 llvm-svn: 344971
* DAG: Change behavior of fminnum/fmaxnum nodesMatt Arsenault2018-10-221-0/+29
| | | | | | | | | | | Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs. There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select which should be fixed in a future commit. llvm-svn: 344914
* [Intrinsic] Signed Saturation Addition IntrinsicLeonard Chan2018-10-161-0/+43
| | | | | | | | | | | Add an intrinsic that takes 2 integers and perform saturation addition on them. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D53053 llvm-svn: 344629
* [SelectionDAG] allow FP binops in SimplifyDemandedVectorEltsSanjay Patel2018-10-151-1/+6
| | | | | | | | | | | | This is intended to make the backend on par with functionality that was added to the IR version of SimplifyDemandedVectorElts in: rL343727 ...and the original motivation is that we need to improve demanded-vector-elements in several ways to avoid problems that would be exposed in D51553. Differential Revision: https://reviews.llvm.org/D52912 llvm-svn: 344541
* [TargetLowering] SimplifyDemandedBits - rename demanded mask args. NFCI.Simon Pilgrim2018-10-101-80/+89
| | | | | | Help stop bugs like rL343935 by making the 'original' DemandedBits arg more obviously not the mask that is actually used. llvm-svn: 344138
* [TargetLowering] SimplifyDemandedBits - pull out repeated getOperands. NFCI.Simon Pilgrim2018-10-101-119/+119
| | | | | | Part of a minor cleanup to make all the switch statements more consistent prior to improving vector support. llvm-svn: 344136
OpenPOWER on IntegriCloud