path: root/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
Commit message | Author | Age | Files | Lines
...
* [TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support | Simon Pilgrim | 2019-06-25 | 1 | -2/+18
  Add 'lowest' demanded elt -> bitcast fold to all *_EXTEND_VECTOR_INREG cases. Reapplies rL363856.
  llvm-svn: 364311
* [TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG | Simon Pilgrim | 2019-06-25 | 1 | -6/+4
  Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required. Matches what we already do for ZERO_EXTEND.
  Reapplies rL363850 but now with legality checks added at rL364290
  llvm-svn: 364303
* [SDAG] improve expansion of ctpop+setcc | Sanjay Patel | 2019-06-25 | 1 | -11/+14
  This should not cause any visible change in output, but it's more efficient because we were producing non-canonical 'sub x, 1' and 'setcc ugt x, 0'.
  As mentioned in the TODO, we should also be handling the inverse predicate.
  llvm-svn: 364302
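The identity behind a ctpop+setcc expansion can be sketched in plain C++. This is an illustration only, not the SelectionDAG code: the helper names are hypothetical, and the expanded form shown (`x != 0` together with `(x & (x - 1)) == 0`) is the well-known bit trick for "exactly one bit set", which is assumed here to correspond to the nodes the commit canonicalizes.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Reference form: count the set bits and compare against 1.
inline bool popcountIsOne(uint32_t x) {
  return std::bitset<32>(x).count() == 1;
}

// Expanded form: x is non-zero, and clearing its lowest set bit leaves zero.
// In DAG terms 'x - 1' would be written as 'add x, -1' and the zero test as
// 'setcc ne x, 0' (assumed canonical spellings of 'sub x, 1' / 'setcc ugt x, 0').
inline bool popcountIsOneExpanded(uint32_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// Hypothetical helper: compares both forms for every value below 'limit'.
inline bool formsAgreeBelow(uint32_t limit) {
  for (uint32_t x = 0; x < limit; ++x) {
    if (popcountIsOne(x) != popcountIsOneExpanded(x))
      return false;
  }
  return true;
}

// Small self-check that exercises the helpers above.
inline void checkCtpopExpansion() {
  assert(formsAgreeBelow(1u << 16));
  assert(popcountIsOneExpanded(0x80000000u));
}
```

The exhaustive loop only covers small values; the identity itself holds for any width because `x & (x - 1)` always clears exactly the lowest set bit.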
* [TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG | Simon Pilgrim | 2019-06-25 | 1 | -6/+6
  Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero. Matches what we already do for SIGN_EXTEND.
  Reapplies rL363802 but now with legality checks added at rL364290
  llvm-svn: 364299
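Why a sign extension can be relaxed in these two situations can be shown on a single scalar lane. The sketch below is a plain C++ model with hypothetical helper names, assuming an 8-bit source lane widened to 32 bits; it does not use any LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Widen an 8-bit lane to 32 bits by replicating the sign bit.
inline uint32_t signExtend8(uint8_t v) {
  return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int8_t>(v)));
}

// Widen an 8-bit lane to 32 bits by filling the upper bits with zeros.
inline uint32_t zeroExtend8(uint8_t v) { return v; }

// Checks both relaxations over every 8-bit input.
inline void checkExtendFolds() {
  for (unsigned i = 0; i < 256; ++i) {
    uint8_t v = static_cast<uint8_t>(i);
    // Extended bits not demanded: only the low 8 bits are observed, so any
    // extension (sign, zero, or arbitrary) gives the same observable result.
    assert((signExtend8(v) & 0xFFu) == (zeroExtend8(v) & 0xFFu));
    // Sign bit known zero: sign and zero extension produce identical values.
    if ((v & 0x80u) == 0) {
      assert(signExtend8(v) == zeroExtend8(v));
    }
  }
}
```

The first case corresponds to choosing an any-extend, the second to choosing a zero-extend.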
* [TargetLowering] SimplifyDemandedBits - legal checks for SIGN/ZERO_EXTEND -> ZERO/ANY_EXTEND | Simon Pilgrim | 2019-06-25 | 1 | -6/+15
  As part of the fix for rL364264 + rL364272 - limit the *_EXTEND conversion to !TLO.LegalOperations || isOperationLegal cases.
  We'll improve X86 legality in future commits.
  llvm-svn: 364290
* [Codegen] TargetLowering::SimplifySetCC(): omit urem when possible | Roman Lebedev | 2019-06-25 | 1 | -0/+12
  Summary: This addresses the regression that is being exposed by D50222 in `test/CodeGen/X86/jump_sign.ll`
  The missing fold, at least partially, looks trivial: https://rise4fun.com/Alive/Zsln
  i.e. if we are comparing with zero, and comparing the `urem`-by-non-power-of-two, and the `urem` is of something that may at most have a single bit set (or no bits set at all), the `urem` is not needed.
  Reviewers: RKSimon, craig.topper, xbolva00, spatel
  Reviewed By: xbolva00, spatel
  Subscribers: xbolva00, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63390
  llvm-svn: 364286
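The arithmetic behind this fold can be checked with a small brute-force sketch. It is a standalone C++ model with hypothetical helper names, not the SimplifySetCC code: a value with at most one bit set is either zero or a power of two, and a power of two is never divisible by a divisor that has an odd factor greater than one, so `(x urem c) == 0` reduces to `x == 0`.

```cpp
#include <cassert>
#include <cstdint>

// True when x has at most one bit set (zero counts).
inline bool hasAtMostOneBit(uint32_t x) { return (x & (x - 1)) == 0; }

// Original comparison: (x urem c) == 0. The divisor must be non-zero.
inline bool withUrem(uint32_t x, uint32_t c) {
  assert(c != 0);
  return x % c == 0;
}

// Folded comparison: x == 0.
inline bool withoutUrem(uint32_t x) { return x == 0; }

// Hypothetical helper: checks the fold for x == 0 and every single-bit x
// against every non-power-of-two divisor below 'limit'.
inline bool foldHoldsBelow(uint32_t limit) {
  for (uint32_t c = 3; c < limit; ++c) {
    if (hasAtMostOneBit(c))
      continue; // power-of-two divisors are outside the fold's precondition
    if (withUrem(0, c) != withoutUrem(0))
      return false;
    for (unsigned bit = 0; bit < 32; ++bit) {
      uint32_t x = uint32_t{1} << bit;
      if (withUrem(x, c) != withoutUrem(x))
        return false;
    }
  }
  return true;
}
```

A power-of-two divisor shows why the precondition matters: `8 % 4 == 0` holds although `8 != 0`.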
* Revert r363802, r363850, and r363856 "[TargetLowering] SimplifyDemandedBits..." | Craig Topper | 2019-06-25 | 1 | -26/+20
  This reverts the following patches.
  "[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG"
  "[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG"
  "[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support"
  We can end up with an any_extend_vector_inreg with a 256 bit result type and a 128 bit result type. This is allowed by the ISD opcode, but the generic operation legalizer is only able to expand cases where the total vector width is the same.
  The X86 backend creates these mismatched cases for zext_vec_inreg/sext_vec_inreg. The SimplifyDemandedBits changes are allowing those nodes to become aext_vec_inreg. For the zext/sext cases, the X86 backend has Custom handling and never lets them get to the generic legalizer. We need to do the same for aext_vec_inreg.
  llvm-svn: 364264
* [TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support | Simon Pilgrim | 2019-06-19 | 1 | -11/+12
  Move 'lowest' demanded elt -> bitcast fold out of ZERO_EXTEND_VECTOR_INREG into ANY_EXTEND_VECTOR_INREG case.
  llvm-svn: 363856
* [TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG | Simon Pilgrim | 2019-06-19 | 1 | -3/+4
  Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required. Matches what we already do for ZERO_EXTEND.
  llvm-svn: 363850
* [TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG | Simon Pilgrim | 2019-06-19 | 1 | -6/+10
  Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero. Matches what we already do for SIGN_EXTEND.
  llvm-svn: 363802
* [TargetLowering] SimplifyDemandedBits - Cleanup ANY_EXTEND handling | Simon Pilgrim | 2019-06-18 | 1 | -2/+8
  Match SIGN_EXTEND + ZERO_EXTEND handling - will be adding ANY_EXTEND_VECTOR_INREG support in a future patch.
  llvm-svn: 363716
* [TargetLowering] SimplifyDemandedBits - Merge ZERO_EXTEND+ZERO_EXTEND_VECTOR_INREG handling | Simon Pilgrim | 2019-06-18 | 1 | -24/+16
  Other than adding consistent demanded elts handling which was a trivial addition, the other differences in functionality will be added in later patches.
  llvm-svn: 363713
* [TargetLowering] SimplifyDemandedBits - Merge SIGN_EXTEND+SIGN_EXTEND_VECTOR_INREG handling | Simon Pilgrim | 2019-06-18 | 1 | -25/+17
  Other than adding consistent demanded elts handling which was a trivial addition, the other differences in functionality will be added in later patches.
  llvm-svn: 363710
* [TargetLowering] SimplifyDemandedVectorElts - support MUL and ANY_EXTEND_VECTOR_INREG | Simon Pilgrim | 2019-06-18 | 1 | -0/+9
  Also fold ANY_EXTEND_VECTOR_INREG -> BITCAST if we only need the bottom element.
  Fixes temporary regression introduced in rL363693.
  llvm-svn: 363694
* [TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123) | Simon Pilgrim | 2019-06-12 | 1 | -1/+2
  As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.
  This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.
  If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.
  Differential Revision: https://reviews.llvm.org/D63075
  llvm-svn: 363179
* [TargetLowering] Simplify (ctpop x) == 1 | David Bolvansky | 2019-06-09 | 1 | -1/+12
  Reviewers: craig.topper, spatel, RKSimon, bkramer
  Reviewed By: spatel
  Subscribers: javed.absar, lebedev.ri, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63004
  llvm-svn: 362912
* IR: make getParamByValType Just Work. NFC. | Tim Northover | 2019-06-05 | 1 | -1/+3
  Most parts of LLVM don't care whether the byval type is derived from an explicit Attribute or from the parameter's pointee type, so it makes sense for the main access function to just return the right value.
  The very few users who do care (only BitcodeReader so far) can find out how it's specified by accessing the Attribute directly.
  llvm-svn: 362642
* [TargetLowering] SimplifyDemandedBits - pull out shift value type. NFCI. | Simon Pilgrim | 2019-06-05 | 1 | -1/+2
  Will be used more in an upcoming patch.
  llvm-svn: 362595
* [TargetLowering] SimplifyDemandedBits - don't use OriginalDemanded variables in analysis. | Simon Pilgrim | 2019-06-02 | 1 | -5/+5
  These might have been replaced in multiple use cases.
  llvm-svn: 362322
* [TargetLowering] SimplifyDemandedVectorElts - use same arg names as SimplifyDemandedBits. NFCI. | Simon Pilgrim | 2019-06-02 | 1 | -4/+4
  Helps with debugging as we recurse between them.
  llvm-svn: 362321
* Reapply: IR: add optional type to 'byval' function parameters | Tim Northover | 2019-05-30 | 1 | -0/+1
  When we switch to opaque pointer types we will need some way to describe how many bytes a 'byval' parameter should occupy on the stack. This adds a (for now) optional extra type parameter.
  If present, the type must match the pointee type of the argument.
  The original commit did not remap byval types when linking modules, which broke LTO. This version fixes that.
  Note to front-end maintainers: if this causes test failures, it's probably because the "byval" attribute is printed after attributes without any parameter after this change.
  llvm-svn: 362128
* Revert "IR: add optional type to 'byval' function parameters" | Tim Northover | 2019-05-29 | 1 | -1/+0
  The IRLinker doesn't delve into the new byval attribute when mapping types, and this breaks LTO.
  llvm-svn: 362029
* IR: add optional type to 'byval' function parameters | Tim Northover | 2019-05-29 | 1 | -0/+1
  When we switch to opaque pointer types we will need some way to describe how many bytes a 'byval' parameter should occupy on the stack. This adds a (for now) optional extra type parameter.
  If present, the type must match the pointee type of the argument.
  Note to front-end maintainers: if this causes test failures, it's probably because the "byval" attribute is printed after attributes without any parameter after this change.
  llvm-svn: 362012
* [SelectionDAG] computeKnownBits - support constant pool values from target | Simon Pilgrim | 2019-05-24 | 1 | -0/+12
  This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets.
  computeKnownBits then uses this function to improve codegen, notably vector code after legalization.
  A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit.
  This required a couple of fixes:
  * SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR).
  * Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets.
  Differential Revision: https://reviews.llvm.org/D61887
  llvm-svn: 361620
* [TargetLowering] Extend bool args to inline-asm according to getBooleanType | Kees Cook | 2019-05-22 | 1 | -1/+10
  Summary: This extends Krzysztof Parzyszek's X86-specific solution (https://reviews.llvm.org/D60208) to the generic code pointed out by James Y Knight.
  Reviewers: kparzysz, craig.topper, nickdesaulniers
  Subscribers: efriedma, sdardis, nemanjai, javed.absar, eraman, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, srhines, void, nickdesaulniers, jyknight
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60224
  llvm-svn: 361404
* [TargetLowering] Add blank line (test commit) | Kees Cook | 2019-05-22 | 1 | -0/+1
  llvm-svn: 361403
* [Intrinsic] Signed Fixed Point Saturation Multiplication Intrinsic | Leonard Chan | 2019-05-21 | 1 | -11/+45
  Add an intrinsic that takes 2 signed integers with the scale of them provided as the third argument and performs fixed point multiplication on them. The result is saturated and clamped between the largest and smallest representable values of the first 2 operands.
  This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics.
  Differential Revision: https://reviews.llvm.org/D55720
  llvm-svn: 361289
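The behaviour described for this intrinsic can be modelled on 32-bit operands as follows. This is a rough sketch under stated assumptions, not the expansion added by the commit: the function name is hypothetical, the product is formed in a 64-bit intermediate, and the scaling step uses an arithmetic right shift, so it rounds toward negative infinity (the rounding choice is an assumption of this model).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical model of a signed, saturating fixed-point multiply.
// 'scale' is the number of fractional bits shared by both operands.
inline int32_t smulFixSat(int32_t a, int32_t b, unsigned scale) {
  assert(scale < 32);
  // The full product of two 32-bit values always fits in 64 bits.
  int64_t product = static_cast<int64_t>(a) * static_cast<int64_t>(b);
  // Drop the extra fractional bits. The shift is arithmetic on mainstream
  // compilers (and guaranteed to be so from C++20).
  int64_t scaled = product >> scale;
  // Saturate to the representable range of the operand type.
  const int64_t lo = std::numeric_limits<int32_t>::min();
  const int64_t hi = std::numeric_limits<int32_t>::max();
  return static_cast<int32_t>(std::max(lo, std::min(hi, scaled)));
}
```

With 16 fractional bits, 1.5 is `0x18000` and 2.0 is `0x20000`, and their product comes out as `0x30000` (3.0); products that do not fit are pinned to `INT32_MAX` or `INT32_MIN`.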
* Add TargetLoweringInfo hook for explicitly setting the ABI calling convention endianness | Dylan McKay | 2019-05-21 | 1 | -1/+1
  Summary: The endianness used in the calling convention does not always match the endianness of the target on all architectures, namely AVR.
  When an argument is too large to be legalised by the architecture and is split for the ABI, a new hook TargetLoweringInfo::shouldSplitFunctionArgumentsAsLittleEndian is queried to find the endianness that function arguments must be laid out in.
  This approach was recommended by Eli Friedman.
  Originally reported in https://github.com/avr-rust/rust/issues/129.
  Patch by Carl Peto.
  Reviewers: bogner, t.p.northover, RKSimon, niravd, efriedma
  Reviewed By: efriedma
  Subscribers: JDevlieghere, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62003
  llvm-svn: 361222
* [SDAG] Vector op legalization for overflow ops | Nikita Popov | 2019-05-20 | 1 | -0/+74
  Fixes issue reported by aemerson on D57348. Vector op legalization support is added for uaddo, usubo, saddo and ssubo (umulo and smulo were already supported). As usual, by extracting TargetLowering methods and calling them from vector op legalization.
  Vector op legalization doesn't really deal with multiple result nodes, so I'm explicitly performing a recursive legalization call on the result value that is not being legalized.
  There are some existing test changes because expansion happens earlier, so we don't get a DAG combiner run in between anymore.
  Differential Revision: https://reviews.llvm.org/D61692
  llvm-svn: 361166
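For readers unfamiliar with the four overflow opcodes named here, their semantics on one 32-bit lane can be sketched in scalar C++. Each operation yields two results, the wrapped value and an overflow flag. The struct and function names below are hypothetical, and the flag computations are the textbook identities, not necessarily the node sequence the legalizer emits.

```cpp
#include <cassert>
#include <cstdint>

struct UnsignedResult { uint32_t value; bool overflow; };
struct SignedResult { int32_t value; bool overflow; };

// uaddo: the wrapped sum is smaller than an operand exactly when a carry occurred.
inline UnsignedResult uaddo(uint32_t a, uint32_t b) {
  uint32_t sum = a + b;
  return {sum, sum < a};
}

// usubo: a borrow occurs exactly when the subtrahend is larger.
inline UnsignedResult usubo(uint32_t a, uint32_t b) {
  return {a - b, a < b};
}

// saddo: overflow when both operands have the same sign and the sum's sign differs.
inline SignedResult saddo(int32_t a, int32_t b) {
  uint32_t ua = static_cast<uint32_t>(a), ub = static_cast<uint32_t>(b);
  uint32_t sum = ua + ub; // wraps without undefined behaviour
  bool overflow = (((ua ^ sum) & (ub ^ sum)) >> 31) != 0;
  return {static_cast<int32_t>(sum), overflow};
}

// ssubo: overflow when the operands differ in sign and the result's sign
// differs from the minuend's.
inline SignedResult ssubo(int32_t a, int32_t b) {
  uint32_t ua = static_cast<uint32_t>(a), ub = static_cast<uint32_t>(b);
  uint32_t diff = ua - ub;
  bool overflow = (((ua ^ ub) & (ua ^ diff)) >> 31) != 0;
  return {static_cast<int32_t>(diff), overflow};
}

// Small self-check of the boundary cases.
inline void checkOverflowOps() {
  assert(uaddo(0xFFFFFFFFu, 1u).overflow);
  assert(saddo(INT32_MAX, 1).overflow);
  assert(ssubo(INT32_MIN, 1).overflow);
}
```

The two-result shape is what the commit message refers to when it says vector op legalization does not really deal with multiple result nodes.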
* [SDAG] fix unused variable warning and unneeded indirection; NFC | Sanjay Patel | 2019-05-14 | 1 | -2/+2
  llvm-svn: 360640
* [SDAG, x86] allow targets to override test for binop opcodes | Sanjay Patel | 2019-05-14 | 1 | -1/+2
  This follows the pattern of the existing isCommutativeBinOp().
  x86 shows improvements from vector narrowing for the min/max opcodes.
  llvm-svn: 360639
* [TargetLowering] Handle multi depth GEPs w/ inline asm constraints | Nick Desaulniers | 2019-05-13 | 1 | -38/+33
  Summary: X86TargetLowering::LowerAsmOperandForConstraint had better support than TargetLowering::LowerAsmOperandForConstraint for arbitrary depth getelementpointers for "i", "n", and "s" extended inline assembly constraints. Hoist its support from the derived class into the base class.
  Link: https://github.com/ClangBuiltLinux/linux/issues/469
  Reviewers: echristo, t.p.northover
  Reviewed By: t.p.northover
  Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai, javed.absar, eraman, hiraditya, jsji, llvm-commits, void, craig.topper, nathanchance, srhines
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61560
  llvm-svn: 360604
* [TargetLowering] Add SimplifyDemandedBits support for ZERO_EXTEND_VECTOR_INREG | Simon Pilgrim | 2019-05-13 | 1 | -0/+24
  More work for PR39709.
  llvm-svn: 360592
* TargetLowering::SimplifyDemandedBits - early-out for UNDEF ops. NFCI. | Simon Pilgrim | 2019-05-13 | 1 | -3/+5
  llvm-svn: 360579
* Recommit r358887 "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling" | Craig Topper | 2019-05-13 | 1 | -1/+25
  I've included a new fix in X86RegisterInfo to prevent PR41619 without reintroducing r359392. We might be able to improve that in the base class implementation of shouldRewriteCopySrc somehow. But this hopefully enables forward progress on SimplifyDemandedBits improvements for now.
  Original commit message:
  This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.
  The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGComb but it caused a lot of noise on other targets - some improvements, some regressions.
  The X86 changes are all definite wins.
  llvm-svn: 360552
* [DAG] Add SimplifyDemandedBits support for BITREVERSE | Simon Pilgrim | 2019-05-11 | 1 | -0/+10
  Pulled out of D58017 while I continue to investigate the BSWAP regression on PPC
  llvm-svn: 360534
* Revert r359392 and r358887 | Craig Topper | 2019-05-06 | 1 | -25/+1
  Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead"
  Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"
  Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list.
  Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead of moving to the integer domain (in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with respect to -O2 codegen which will always use a pxor.
  NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work there. I'll file a separate PR for that and add test cases.
  Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised if that bug can still be hit independent of that.
  This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again.
  llvm-svn: 360066
* [TargetLowering] SimplifySetCC - remove repeated variable. NFCI. | Simon Pilgrim | 2019-05-03 | 1 | -2/+1
  Also reduce scope of Temp variable.
  llvm-svn: 359911
* [TargetLowering] ShrinkDemandedConstant - reduce scope of TLO.DAG variable. NFCI. | Simon Pilgrim | 2019-05-03 | 1 | -3/+2
  Only ever used in one block
  llvm-svn: 359890
* [TargetLowering] expandUnalignedStore - cleanup EVT variables. NFCI. | Simon Pilgrim | 2019-05-03 | 1 | -23/+18
  Avoid duplicated EVTs and rename Store/Load VTs to avoid -Wshadow warnings.
  llvm-svn: 359877
* [TargetLowering] findOptimalMemOpLowering. NFCI. | Sjoerd Meijer | 2019-04-30 | 1 | -0/+101
  This was a local static function in SelectionDAG, which I've promoted to TargetLowering so that I can reuse it to estimate the cost of a memory operation in D59787.
  Differential Revision: https://reviews.llvm.org/D59766
  llvm-svn: 359543
* [TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling | Simon Pilgrim | 2019-04-22 | 1 | -1/+25
  This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.
  The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions.
  The X86 changes are all definite wins.
  Differential Revision: https://reviews.llvm.org/D60462
  llvm-svn: 358887
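The srl/and rewrite mentioned for AMDGPU is a plain bit identity, which the following scalar sketch checks. The function names are hypothetical and the model is 32-bit unsigned arithmetic with a shift amount below 32; it says nothing about how the backend matches the pattern.

```cpp
#include <cassert>
#include <cstdint>

// Source pattern: (srl (and x, c1 << c2), c2)
inline uint32_t maskThenShift(uint32_t x, uint32_t c1, unsigned c2) {
  assert(c2 < 32);
  return (x & (c1 << c2)) >> c2;
}

// Rewritten pattern: (and (srl x, c2), c1)
inline uint32_t shiftThenMask(uint32_t x, uint32_t c1, unsigned c2) {
  assert(c2 < 32);
  return (x >> c2) & c1;
}

// Hypothetical helper: compares both patterns for every shift amount.
inline bool patternsAgree(uint32_t x, uint32_t c1) {
  for (unsigned c2 = 0; c2 < 32; ++c2) {
    if (maskThenShift(x, c1, c2) != shiftThenMask(x, c1, c2))
      return false;
  }
  return true;
}
```

The two forms agree even when `c1 << c2` discards high bits of `c1`, because those bits would only be ANDed with the zeros that the logical right shift introduces. The second form is a shift followed by a mask, which is the shape of a bit-field extract.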
* [TargetLowering][X86] Teach SimplifyDemandedBits to use ShrinkDemandedOp on ISD::SHL nodes. | Craig Topper | 2019-04-12 | 1 | -0/+6
  If the upper bits of the SHL result aren't used, we might be able to use a narrower shift. For example, on X86 this can turn a 64-bit shift into a 32-bit shift, enabling a smaller encoding.
  Differential Revision: https://reviews.llvm.org/D60358
  llvm-svn: 358257
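The reason a left shift can be narrowed when its upper result bits are unused is that the low bits of a left shift depend only on the low bits of the input. A scalar sketch, with hypothetical helper names and the assumption that the shift amount is below the narrow width of 32:

```cpp
#include <cassert>
#include <cstdint>

// Shift in 64 bits, then keep only the low 32 bits of the result.
inline uint32_t wideShiftLow32(uint64_t x, unsigned amt) {
  assert(amt < 32);
  return static_cast<uint32_t>(x << amt);
}

// Truncate first, then shift in 32 bits.
inline uint32_t narrowShift(uint64_t x, unsigned amt) {
  assert(amt < 32);
  return static_cast<uint32_t>(x) << amt;
}

// Hypothetical helper: compares both forms for every in-range shift amount.
inline bool shiftsAgree(uint64_t x) {
  for (unsigned amt = 0; amt < 32; ++amt) {
    if (wideShiftLow32(x, amt) != narrowShift(x, amt))
      return false;
  }
  return true;
}
```

Bits shifted in from above bit 31 of the input can only land above bit 31 of the result, so they never affect the demanded low half.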
* [TargetLowering] SimplifyDemandedBits - add ISD::INSERT_SUBVECTOR support | Simon Pilgrim | 2019-04-09 | 1 | -0/+39
  llvm-svn: 358019
* [TargetLowering] SimplifyDemandedBits - Remove GetDemandedSrcMask lambda. NFCI. | Simon Pilgrim | 2019-04-09 | 1 | -28/+21
  An older version of this could return false but now that this always succeeds we can just inline and simplify it.
  llvm-svn: 357999
* [TargetLowering] SimplifyDemandedBits - call SimplifyDemandedBits in bitcast handling | Simon Pilgrim | 2019-04-09 | 1 | -6/+16
  When bitcasting from a source op to a larger bitwidth op, split the demanded bits and OR them on top of one another and demand those merged bits in the SimplifyDemandedBits call on the source op.
  llvm-svn: 357992
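The "split and OR" step can be illustrated with fixed widths. The sketch below assumes a 64-bit demanded mask on the wide side and 16-bit source elements, so four source-sized chunks are merged; the function name is hypothetical and the real code works on APInt values of arbitrary width.

```cpp
#include <cassert>
#include <cstdint>

// Merge the demanded bits of a 64-bit value into a mask for its 16-bit
// source elements: a source bit is demanded if the corresponding bit is
// demanded in any of the four 16-bit chunks.
inline uint16_t mergeDemandedBits(uint64_t demandedWide) {
  uint16_t merged = 0;
  for (unsigned chunk = 0; chunk < 4; ++chunk) {
    merged |= static_cast<uint16_t>(demandedWide >> (16 * chunk));
  }
  return merged;
}

// Small self-check: nothing demanded stays nothing, everything stays everything.
inline void checkMergeDemandedBits() {
  assert(mergeDemandedBits(0) == 0);
  assert(mergeDemandedBits(~0ull) == 0xFFFF);
}
```

The merged mask is conservative: it is applied to every source element, so a bit demanded in only one chunk is still treated as demanded in all of them.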
* [TargetLowering] SimplifyDemandedBits - use DemandedElts in bitcast handling | Simon Pilgrim | 2019-04-08 | 1 | -12/+13
  Be more selective in the SimplifyDemandedBits -> SimplifyDemandedVectorElts bitcast call based on the demanded elts.
  llvm-svn: 357942
* [DAG] Pull out ComputeNumSignBits call to make debugging easier. NFCI. | Simon Pilgrim | 2019-04-07 | 1 | -2/+2
  llvm-svn: 357861
* [TargetLowering] Add SimplifyDemandedBits support for ISD::INSERT_VECTOR_ELT | Simon Pilgrim | 2019-03-26 | 1 | -0/+38
  This helps us relax the extension of a lot of scalar elements before they are inserted into a vector.
  It exposes an issue in DAGCombiner::convertBuildVecZextToZext as some/all the zero-extensions may be relaxed to ANY_EXTEND, so we need to handle that case to avoid a couple of AVX2 VPMOVZX test regressions.
  Once this is in it should be easier to fix a number of remaining failures to fold loads into VBROADCAST nodes.
  Differential Revision: https://reviews.llvm.org/D59484
  llvm-svn: 356989
* [TargetLowering] SimplifyDemandedBits trunc(srl(x, C1)) - early out for out of range C1. NFCI. | Simon Pilgrim | 2019-03-22 | 1 | -19/+19
  llvm-svn: 356810