summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG
Commit message (Collapse)AuthorAgeFilesLines
...
* DAG combiner: fold (select, C, X, undef) -> XStanislav Mekhanoshin2018-11-161-0/+6
| | | | | | Differential Revision: https://reviews.llvm.org/D54646 llvm-svn: 347110
* [LegalizeVectorOps] After custom legalizing an extending load or a ↵Craig Topper2018-11-161-2/+10
| | | | | | | | | | truncating store, make sure the custom code is also legal. For example, on X86 we emit a sign_extend_vector_inreg from LowerLoad and without sse4.1 this node will need further legalization. Previously this sign_extend_vector_inreg was being custom lowered during DAG legalization instead of vector op legalization. Unfortunately, this doesn't seem to matter for the output of any existing lit tests. llvm-svn: 347094
* [TargetLowering] Cleanup more of the EXTEND demanded bits cases so that they ↵Simon Pilgrim2018-11-161-10/+11
| | | | | | | | match. NFCI. Use the same variable names etc. llvm-svn: 347045
* [DAGCombine] Fix non-deterministic debug outputSam Parker2018-11-161-4/+4
| | | | | | | | | | | PR37970 reported non-deterministic debug output, this was caused by iterating through a set and not a a vector. bugzilla: https://bugs.llvm.org/show_bug.cgi?id=37970 Differential Revision: https://reviews.llvm.org/D54570 llvm-svn: 347037
* [LegalizeVectorTypes] Teach WidenVecRes_Convert to turn ANY_EXTEND into ↵Craig Topper2018-11-161-0/+2
| | | | | | | | ANY_EXTEND_VECTOR_INREG when the input and output types need to be widened to the same width. If we don't do it here, DAGCombine will just end up creating it from the scalar any_extend+build_vector so might as well save a step. llvm-svn: 347034
* Fixed DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT i1 handlingStanislav Mekhanoshin2018-11-131-0/+9
| | | | | | | | | Legalizer used to request an ext load from i8 to i1 when promoting vector element type to i8. Fixed. Differential Revision: https://reviews.llvm.org/D54440 llvm-svn: 346795
* [SelectionDAG][X86] Relax restriction on the width of an input to ↵Craig Topper2018-11-131-3/+2
| | | | | | | | | | | | | | | | | | *_EXTEND_VECTOR_INREG. Use them and regular *_EXTEND to replace the X86 specific VSEXT/VZEXT opcodes Previously, the extend_vector_inreg opcode required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output. This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes. X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction. We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code. I believe DAG combine works correctly with the released restriction because it doesn't check the number of input elements. The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits. Differential Revision: https://reviews.llvm.org/D54346 llvm-svn: 346784
* [IR] Add a dedicated FNeg IR InstructionCameron McInally2018-11-132-0/+12
| | | | | | | | | | | The IEEE-754 Standard makes it clear that fneg(x) and fsub(-0.0, x) are two different operations. The former is a bitwise operation, while the latter is an arithmetic operation. This patch creates a dedicated FNeg IR Instruction to model that behavior. Differential Revision: https://reviews.llvm.org/D53877 llvm-svn: 346774
* [DAGCombiner] Enable tryToFoldExtendOfConstant to run after legalize vector opsCraig Topper2018-11-131-14/+7
| | | | | | | | | | It should be ok to create a new build_vector after legal operations so long as it doesn't cause an infinite loop in DAG combiner. Unfortunately, X86's custom constant folding in combineVSZext is hiding any test changes from this. But I'm trying to get to a point where that X86 specific code isn't necessary at all. Differential Revision: https://reviews.llvm.org/D54285 llvm-svn: 346728
* [DAGCombiner] Fix load-store forwarding of indexed loads.Nirav Dave2018-11-121-3/+17
| | | | | | | | | | | | | | | | Summary: Handle extra output from index loads in cases where we wish to forward a load value directly from a preceeding store. Fixes PR39571. Reviewers: peter.smith, rengolin Subscribers: javed.absar, hiraditya, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D54265 llvm-svn: 346654
* [DAGCombiner] Make tryToFoldExtendOfConstant return an SDValue instead of an ↵Craig Topper2018-11-101-14/+14
| | | | | | | | SDNode*. NFC Removes the need to call getNode internally and to recreate an SDValue after the call. llvm-svn: 346600
* [x86] allow vector load narrowing with multi-use valuesSanjay Patel2018-11-101-5/+7
| | | | | | | | | | | | | | | | | | | | | | This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs. Apart from 2-3 strange cases, these are all wins. I've structured this to be no-functional-change-intended for any target except for x86 because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those targets have existing regression tests (4, 4, 10 files respectively) that would be affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show any regression test diffs. The trade-off is deciding if an extra vector load is better than a single wide load + extract_subvector. For x86, this is almost always better (on paper at least) because we often can fold loads into subsequent ops and not increase the official instruction count. There's also some unknown -- but potentially large -- benefit from using narrower vector ops if wide ops are implemented with multiple uops and/or frequency throttling is avoided. Differential Revision: https://reviews.llvm.org/D54073 llvm-svn: 346595
* [SelectionDAG] Fix a -Wparentheses warning from gcc in an assert. NFCCraig Topper2018-11-091-2/+2
| | | | | | gcc wants parentheses around the logical OR since there is a logical AND for the string. llvm-svn: 346564
* [DAGCombiner][X86][Mips] Enable combineShuffleOfScalars to run between ↵Craig Topper2018-11-091-2/+5
| | | | | | | | | | | | vector op legalization and DAG legalization. Fix bad one use check in combineShuffleOfScalars It's possible for vector op legalization to generate a shuffle. If that happens we should give a chance for DAG combine to combine that with a build_vector input. I also fixed a bug in combineShuffleOfScalars that was considering the number of uses on a undef input to a shuffle. We don't care how many times undef is used. Differential Revision: https://reviews.llvm.org/D54283 llvm-svn: 346530
* Type safe version of MachinePassRegistrySerge Guelton2018-11-091-1/+2
| | | | | | | | | | | Previous version used type erasure through a `void* (*)()` pointer, which triggered gcc warning and implied a lot of reinterpret_cast. This version should make it harder to hit ourselves in the foot. Differential revision: https://reviews.llvm.org/D54203 llvm-svn: 346522
* [SelectionDAG] swap select_cc operands to enable foldingAlexandros Lamprineas2018-11-091-32/+34
| | | | | | | | | | | | | | | | | | The DAGCombiner tries to SimplifySelectCC as follows: select_cc(x, y, 16, 0, cc) -> shl(zext(set_cc(x, y, cc)), 4) It can't cope with the situation of reordered operands: select_cc(x, y, 0, 16, cc) In that case we just need to swap the operands and invert the Condition Code: select_cc(x, y, 16, 0, ~cc) Differential Revision: https://reviews.llvm.org/D53236 llvm-svn: 346484
* [SelectionDAG] Assert on the width of DemandedElts argument to ↵Craig Topper2018-11-081-2/+3
| | | | | | | | computeKnownBits for all vector typed operations not just build_vector. Fix AArch64 unit test that fails with the assertion added. llvm-svn: 346437
* [DAGCombine] Improve alias analysis for chain of independent stores.Nirav Dave2018-11-081-59/+116
| | | | | | | | | | | | | | | | | | | FindBetterNeighborChains simulateanously improves the chain dependencies of a chain of related stores avoiding the generation of extra token factors. For chains longer than the GatherAllAliasDepths, stores further down in the chain will necessarily fail, a potentially significant waste and preventing otherwise trivial parallelization. This patch directly parallelize the chains of stores before improving each store. This generally improves DAG-level parallelism. Reviewers: courbet, spatel, RKSimon, bogner, efriedma, craig.topper, rnk Subscribers: sdardis, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D53552 llvm-svn: 346432
* Add support for llvm.is.constant intrinsic (PR4898)James Y Knight2018-11-072-0/+15
| | | | | | | | | | | | | | | This adds the llvm-side support for post-inlining evaluation of the __builtin_constant_p GCC intrinsic. Also fixed SCCPSolver::visitCallSite to not blow up when seeing a call to a function where canConstantFoldTo returns true, and one of the arguments is a struct. Updated from patch initially by Janusz Sobczak. Differential Revision: https://reviews.llvm.org/D4276 llvm-svn: 346322
* [FPEnv] Add constrained CEIL/FLOOR/ROUND/TRUNC intrinsicsCameron McInally2018-11-056-0/+52
| | | | | | Differential Revision: https://reviews.llvm.org/D53411 llvm-svn: 346141
* [TargetLowering] Begin generalizing TargetLowering::expandFP_TO_SINT ↵Simon Pilgrim2018-11-051-26/+26
| | | | | | | | support. NFCI. Prior to initial work to add vector expansion support, remove assumptions that we're working on scalar types. llvm-svn: 346139
* [DAGCombiner] Use tryFoldToZero to simplify some code and make it work ↵Craig Topper2018-11-051-8/+2
| | | | | | | | correctly between LegalTypes and LegalOperations. The original code avoided creating a zero vector after type legalization, but if we're after type legalization the type we have is legal. The real hazard we need to avoid is creating a build vector after op legalization. tryFoldToZero takes care of checking for this. llvm-svn: 346119
* [DAGCombiner] Remove an unused argument from tryFoldToZero. NFCCraig Topper2018-11-051-4/+3
| | | | llvm-svn: 346118
* [DAGCombiner] Remove 'else' after return. NFCCraig Topper2018-11-041-8/+7
| | | | | | This makes this code consistent with the nearly identical code in visitZERO_EXTEND. llvm-svn: 346090
* [SelectionDAG] Remove special methods for creating *_EXTEND_VECTOR_INREG ↵Craig Topper2018-11-044-44/+22
| | | | | | | | | | nodes. Move asserts into getNode. These methods were just wrappers around getNode with additional asserts (identical and repeated 3 times). But getNode already has a switch that can be used to hold these asserts that allows them to be shared for all 3 opcodes. This also enables checking on the places that create these nodes without using the wrappers. The rest of the patch is just changing all callers to use getNode directly. llvm-svn: 346087
* [X86] Don't emit *_extend_vector_inreg nodes when both the input and output ↵Craig Topper2018-11-021-1/+1
| | | | | | | | | | | | | | | | types are legal with AVX1 We already have custom lowering for the AVX case in LegalizeVectorOps. So its better to keep the regular extend op around as long as possible. I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests which is why its included here. I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector. For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used used a pshufd to move the upper 64-bits to the lower 64-bits and then used a zero_extend_vector_inreg. I think the zero vector should require less execution resources and be smaller code size. Differential Revision: https://reviews.llvm.org/D54024 llvm-svn: 346043
* [DAGCombiner] Remove reduceBuildVecConvertToConvertBuildVec and rely on the ↵Simon Pilgrim2018-11-021-75/+0
| | | | | | | | | | | | | vectorizers instead (PR35732) reduceBuildVecConvertToConvertBuildVec vectorizes int2float in the DAGCombiner, which means that even if the LV/SLP has decided to keep scalar code using the cost models, this will override this. While there are cases where vectorization is necessary in the DAG (mainly due to legalization artefacts), I don't think this is the case here, we should assume that the vectorizers know what they are doing. Differential Revision: https://reviews.llvm.org/D53712 llvm-svn: 345964
* [COFF, ARM64] Implement Intrinsic.sponentry for AArch64Mandeep Singh Grang2018-11-013-0/+6
| | | | | | | | | | | | | | | | Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform. Patch by: Yin Ma (yinma@codeaurora.org) Reviewers: mgrang, ssijaric, eli.friedman, TomTan, mstorsjo, rnk, compnerd, efriedma Reviewed By: efriedma Subscribers: efriedma, javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D53996 llvm-svn: 345909
* [DAGCombiner] Make the isTruncateOf call from visitZERO_EXTEND work for ↵Craig Topper2018-11-011-16/+13
| | | | | | | | vectors. Remove FIXME. I'm having trouble creating a test case for the ISD::TRUNCATE part of this that shows any codegen differences. But I was able to test the setcc path which is what the test changes here cover. llvm-svn: 345908
* [LegalizeDAG] Add generic vector CTPOP expansion (PR32655)Simon Pilgrim2018-11-012-2/+26
| | | | | | | | This patch adds support for expanding vector CTPOP instructions and removes the x86 'bitmath' lowering which replicates the same expansion. Differential Revision: https://reviews.llvm.org/D53258 llvm-svn: 345869
* Revert "[COFF, ARM64] Implement Intrinsic.sponentry for AArch64"Mandeep Singh Grang2018-11-013-6/+0
| | | | | | This reverts commit 585b6667b4712e3c7f32401e929855b3313b4ff2. llvm-svn: 345863
* [DAGCombiner] make sure we have a whole-number extract before trying to ↵Sanjay Patel2018-11-011-1/+5
| | | | | | | | | | | | narrow a vector op (PR39511) The test causes a crash because we were trying to extract v4f32 to v3f32, and the narrowing factor was then 4/3 = 1 producing a bogus narrow type. This should fix: https://bugs.llvm.org/show_bug.cgi?id=39511 llvm-svn: 345842
* [COFF, ARM64] Implement Intrinsic.sponentry for AArch64Mandeep Singh Grang2018-10-313-0/+6
| | | | | | | | | | | | | | Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform. Reviewers: mgrang, TomTan, rnk, compnerd, mstorsjo, efriedma Reviewed By: efriedma Subscribers: majnemer, chrib, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D53673 llvm-svn: 345791
* Check shouldReduceLoadWidth from SimplifySetCCStanislav Mekhanoshin2018-10-311-1/+2
| | | | | | | | | | | | SimplifySetCC could shrink a load without checking for profitability or legality of such shink with a target. Added checks to prevent shrinking of aligned scalar loads in AMDGPU below dword as scalar engine does not support it. Differential Revision: https://reviews.llvm.org/D53846 llvm-svn: 345778
* [SelectionDAG] Handle constant range [0,1) in lowerRangeToAssertZExtScott Linder2018-10-311-1/+2
| | | | | | | | | lowerRangeToAssertZExt currently relies on something like EarlyCSE having eliminated the constant range [0,1). At -O0 this leads to an assert. Differential Revision: https://reviews.llvm.org/D53888 llvm-svn: 345770
* [SelectionDAGISel] Suppress a -Wunused-but-set-variable warning in release ↵Craig Topper2018-10-311-0/+1
| | | | | | builds. NFC llvm-svn: 345761
* Fix comment typo. NFCI.Simon Pilgrim2018-10-311-1/+1
| | | | llvm-svn: 345758
* [SelectionDAG] SelectionDAGLegalize::ExpandBITREVERSE - ensure we use ShiftTySimon Pilgrim2018-10-311-6/+6
| | | | | | We should be using the getShiftAmountTy value type for shift amounts. llvm-svn: 345756
* [DAGCombiner] Fold 0 div/rem X to 0David Bolvansky2018-10-311-2/+5
| | | | | | | | | | | | Reviewers: RKSimon, spatel, javed.absar, craig.topper, t.p.northover Reviewed By: RKSimon Subscribers: craig.topper, llvm-commits Differential Revision: https://reviews.llvm.org/D52504 llvm-svn: 345721
* [FPEnv] [FPEnv] Add constrained intrinsics for MAXNUM and MINNUMCameron McInally2018-10-306-0/+26
| | | | | | Differential Revision: https://reviews.llvm.org/D53216 llvm-svn: 345650
* [DAGCombiner] Fix for big endian in ForwardStoreValueToDirectLoadBjorn Pettersson2018-10-301-9/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Normalize the offset for endianess before checking if the store cover the load in ForwardStoreValueToDirectLoad. Without this we missed out on some optimizations for big endian targets. If for example having a 4 bytes store followed by a 1 byte load, loading the least significant byte from the store, the STCoversLD check would fail (see @test4 in test/CodeGen/AArch64/load-store-forwarding.ll). This patch also fixes a problem seen in an out-of-tree target. The target has i40 as a legal type, it is big endian, and the StoreSize for i40 is 48 bits. So when normalizing the offset for endianess we need to take the StoreSize into account (assuming that padding added when storing into a larger StoreSize always is added at the most significant end). Reviewers: niravd Reviewed By: niravd Subscribers: javed.absar, kristof.beyls, llvm-commits, uabelho Differential Revision: https://reviews.llvm.org/D53776 llvm-svn: 345636
* [DAG] Add const variants for BaseIndexOffset functions.Nirav Dave2018-10-301-3/+4
| | | | llvm-svn: 345623
* [DAGCombiner] narrow vector binops when extraction is cheapSanjay Patel2018-10-301-11/+30
| | | | | | | | | | | | | | | | | Narrowing vector binops came up in the demanded bits discussion in D52912. I don't think we're going to be able to do this transform in IR as a canonicalization because of the risk of creating unsupported widths for vector ops, but we already have a DAG TLI hook to allow what I was hoping for: isExtractSubvectorCheap(). This is currently enabled for x86, ARM, and AArch64 (although only x86 has existing regression test diffs). This is artificially limited to not look through bitcasts because there are so many test diffs already, but that's marked with a TODO and is a small follow-up. Differential Revision: https://reviews.llvm.org/D53784 llvm-svn: 345602
* [SelectionDAG] fix build warning for mismatched signs in compare; NFCSanjay Patel2018-10-301-1/+1
| | | | llvm-svn: 345598
* [SelectionDAG] Add FoldBUILD_VECTOR to simplify new BUILD_VECTOR nodesSimon Pilgrim2018-10-301-0/+58
| | | | | | | | | | Similar to FoldCONCAT_VECTORS, this patch adds FoldBUILD_VECTOR to simplify cases that can avoid the creation of the BUILD_VECTOR - if all the operands are UNDEF or if the BUILD_VECTOR simplifies to a copy. This exposed an assumption in some AMDGPU code that getBuildVector was guaranteed to be a BUILD_VECTOR node that I've tried to handle. Differential Revision: https://reviews.llvm.org/D53760 llvm-svn: 345578
* [DAGCombiner] Improve X div/rem Y fold if single bit element typeDavid Bolvansky2018-10-301-3/+4
| | | | | | | | | | | | | | Summary: Tests by @spatel, thanks Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: sdardis, atanasyan, llvm-commits, spatel Differential Revision: https://reviews.llvm.org/D52668 llvm-svn: 345575
* [LegalizeTypes] Teach PromoteIntRes_BITCAST to better handle a bitcast with ↵Craig Topper2018-10-301-0/+20
| | | | | | | | | | | | | | | | vector output type and a vector input type that needs to be widened Summary: Previously if we had a bitcast vector output type that needs promotion and a vector input type that needs widening we would just do a stack store and load to handle the conversion. We can do a little better if we can widen the bitcast to a legal vector type the same size as the widened input type. Then we can do the bitcast between this widened type and the widened input type. Afterwards we can extract_subvector back to the original output and any_extend that. Type legalization will then circle back and handle promotion of the extract_subvector and the any_extend will just be removed. This will avoid going through the stack and allows us to remove a custom version of this legalization from X86. Reviewers: efriedma, RKSimon Reviewed By: efriedma Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D53229 llvm-svn: 345567
* [Intrinsic] Signed and Unsigned Saturation Subtraction IntirnsicsLeonard Chan2018-10-298-31/+90
| | | | | | | | | | | | Add an intrinsic that takes 2 integers and perform saturation subtraction on them. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D53783 llvm-svn: 345512
* [SelectionDAG] Fix bad indentation. NFCCraig Topper2018-10-281-4/+4
| | | | llvm-svn: 345481
* [TargetLowering] Move i64/vXi64 to f32/vXf32 UINT_TO_FP handling to ↵Simon Pilgrim2018-10-282-56/+72
| | | | | | TargetLowering::expandUINT_TO_FP. llvm-svn: 345478
OpenPOWER on IntegriCloud