summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG
Commit message (Collapse)AuthorAgeFilesLines
* [LegalizeVectorTypes][X86][ARM][AArch64][PowerPC] Don't use ↵Craig Topper2018-11-261-7/+2
| | | | | | | | | | | | SplitVecOp_TruncateHelper for FP_TO_SINT/UINT. SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_SINT until AVX512. This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral. Differential Revision: https://reviews.llvm.org/D54906 llvm-svn: 347593
* [SelectionDAG] Teach BaseIndexOffset::match to unwrap the base after looking ↵Craig Topper2018-11-261-3/+3
| | | | | | | | | | through an add/or We might find a target specific node that needs to be unwrapped after we look through an add/or. Otherwise we get inconsistent results if one pointer is just X86WrapperRIP and the other is (add X86WrapperRIP, C) Differential Revision: https://reviews.llvm.org/D54818 llvm-svn: 347591
* [x86] limit transform for select-of-fp-constantsSanjay Patel2018-11-251-0/+3
| | | | | | | | | | | | This should likely be adjusted to limit this transform further, but these diffs should be clear wins. If we have blendv/conditional move, then we should assume those are cheap ops. The loads become independent of the compare, so those can be speculated before we need to use the values in the blend/mov. llvm-svn: 347526
* [SelectionDAG] move constant or splat functions to common locationSanjay Patel2018-11-252-39/+28
| | | | | | | | rL347502 moved the null sibling, so we should group all of these together. I'm not sure why these aren't methods of the SDValue class itself, but that's another patch if that's possible. llvm-svn: 347523
* [DAG] consolidate shift simplificationsSanjay Patel2018-11-232-74/+58
| | | | | | | | | | ...and use them to avoid creating obviously undef values as discussed in the post-commit thread for r347478. The diffs in vector div/rem show that we were missing real optimizations by creating bogus shift nodes. llvm-svn: 347502
* [LegalizeVectorTypes] Don't use SplitVecOp_TruncateHelper if we're heading ↵Craig Topper2018-11-231-0/+9
| | | | | | | | | | | | towards scalarizing the type. This code takes a truncate, fp_to_int, or int_to_fp with a legal result type and an input type that needs to be split and enlarges the elements in the result type before doing the split. Then inserts a follow up truncate or fp_round after concatenating the two halves back together. But if the input type of the original op is being split on its way to ultimately being scalarized we're just going to end up building a vector from scalars and then truncating or rounding it in the vector register. Seems kind of silly to enlarge the result element type of the operation only to end up with scalar code and then building a vector with large elements only to make the elements smaller again in the vector register. Seems better to just try to get away producing smaller result types in the scalarized code. The X86 test case that changes is a pretty contrived test case that exists because of a bug we used to have in our AVG matching code. I think the code is better now, but its not realistic anyway. llvm-svn: 347482
* [LegalizeVectorTypes] Have SplitVecOp_TruncateHelper fall back to ↵Craig Topper2018-11-221-1/+7
| | | | | | | | | | SplitVecOp_UnaryOp if splitting the output type would be a legal type. SplitVecOp_TruncateHelper tries to introduce a multilevel truncate to avoid scalarization. But if splitting the result type would still be a legal type we don't need to do that. The comment block at the top of the function implied that this was already implemented. I looked back through the history and it doesn't look to have ever been checked. llvm-svn: 347479
* [DAGCombiner] form 'not' ops ahead of shifts (PR39657)Sanjay Patel2018-11-221-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'), but that would not matter without this patch because DAGCombiner was reversing that transform. I think we need this transform in the backend regardless of what happens in IR to catch cases where the shift-xor is formed late from GEP or other ops. https://rise4fun.com/Alive/NC1 Name: shl Pre: (-1 << C2) == C1 %shl = shl i8 %x, C2 %r = xor i8 %shl, C1 => %not = xor i8 %x, -1 %r = shl i8 %not, C2 Name: shr Pre: (-1 u>> C2) == C1 %sh = lshr i8 %x, C2 %r = xor i8 %sh, C1 => %not = xor i8 %x, -1 %r = lshr i8 %not, C2 https://bugs.llvm.org/show_bug.cgi?id=39657 llvm-svn: 347478
* [DAGCombiner] refactor select-of-FP-constants transformSanjay Patel2018-11-211-53/+60
| | | | | | | | | | | | | | This transform needs to be limited. We are converting to a constant pool load very early, and we are turning loads that are independent of the select condition (and therefore speculatable) into a dependent non-speculatable load. We may also be transferring a condition code from an FP register to integer to create that dependent load. llvm-svn: 347424
* [DAGCombiner] reduce code duplication; NFCSanjay Patel2018-11-211-33/+30
| | | | llvm-svn: 347410
* [TargetLowering] SimplifyDemandedBits - only reduce known bits for integer ↵Simon Pilgrim2018-11-211-1/+3
| | | | | | | | constants Avoids fuzzing crash found by Mikael Holmén. llvm-svn: 347393
* Test commit: Delete trailing space in commentNikita Popov2018-11-211-1/+1
| | | | llvm-svn: 347385
* [DAGCombiner] look through bitcasts when trying to narrow vector binopsSanjay Patel2018-11-201-13/+14
| | | | | | | | | | | | | | | | | | | This is another step in vector narrowing - a follow-up to D53784 (and hoping to eventually squash potential regressions seen in D51553). The x86 test diffs are wins, but the AArch64 diff is probably not. That problem already exists independent of this patch (see PR39722), but it went unnoticed in the previous patch because there were no regression tests that showed the possibility. The x86 diff in i64-mem-copy.ll is close. Given the frequency throttling concerns with using wider vector ops, an extra extract to reduce vector width is the right trade-off at this level of codegen. Differential Revision: https://reviews.llvm.org/D54392 llvm-svn: 347356
* [DAGCombine] Add calls to SimplifyDemandedVectorElts from ↵Simon Pilgrim2018-11-202-1/+5
| | | | | | | | visitINSERT_SUBVECTOR (PR37989) This uncovered an off-by-one typo in SimplifyDemandedVectorElts's INSERT_SUBVECTOR handling as its bounds check was bailing on safe indices. llvm-svn: 347313
* [TargetLowering] Improve SimplifyDemandedVectorElts/SimplifyDemandedBits supportSimon Pilgrim2018-11-201-0/+17
| | | | | | | | | | For bitcast nodes from larger element types, add the ability for SimplifyDemandedVectorElts to call SimplifyDemandedBits by merging the elts mask to a bits mask. I've raised https://bugs.llvm.org/show_bug.cgi?id=39689 to deal with the few places where SimplifyDemandedBits's lack of vector handling is a problem. Differential Revision: https://reviews.llvm.org/D54679 llvm-svn: 347301
* [SelectionDAG] Compute known bits and num sign bits for live out vector ↵Craig Topper2018-11-202-4/+4
| | | | | | | | | | | | | | | | | | | registers. Use it to add AssertZExt/AssertSExt in the live in basic blocks Summary: We already support this for scalars, but it was explicitly disabled for vectors. In the updated test cases this allows us to see the upper bits are zero to use less multiply instructions to emulate a 64 bit multiply. This should help with this ispc issue that a coworker pointed me to https://github.com/ispc/ispc/issues/1362 Reviewers: spatel, efriedma, RKSimon, arsenm Reviewed By: spatel Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D54725 llvm-svn: 347287
* [DAGCombiner] reduce code duplication in visitXOR; NFCSanjay Patel2018-11-201-32/+29
| | | | llvm-svn: 347278
* Implement computeKnownBits for scalar_to_vectorStanislav Mekhanoshin2018-11-191-0/+13
| | | | | | Differential Revision: https://reviews.llvm.org/D54728 llvm-svn: 347274
* [DAGCombine] SimplifyNodeWithTwoResults - ensure same legalization for LO/HI ↵Simon Pilgrim2018-11-191-8/+6
| | | | | | | | | | operands (PR21207) Consistently use (!LegalOperations || isOperationLegalOrCustom) for all node pairs. Differential Revision: https://reviews.llvm.org/D53478 llvm-svn: 347255
* [TargetLowering] expandFP_TO_UINT - improve fp16 supportSimon Pilgrim2018-11-191-10/+18
| | | | | | | | | | As discussed on D53794, for float types with ranges smaller than the destination integer type, then we should be able to just use a regular FP_TO_SINT opcode. I thought we'd need to provide MSA test cases for very small integer types as well (fp16 -> i8 etc.), but it turns out that promotion will kick in so they're unnecessary. Differential Revision: https://reviews.llvm.org/D54703 llvm-svn: 347251
* [SelectionDAG] simplify vector select with undef operand(s)Sanjay Patel2018-11-192-5/+12
| | | | llvm-svn: 347227
* [SelectionDAG] simplify select FP with undef conditionSanjay Patel2018-11-191-1/+1
| | | | llvm-svn: 347212
* [SelectionDAG] add simplifySelect() to reduce code duplication; NFCSanjay Patel2018-11-192-36/+27
| | | | | | This should be extended to handle FP and vectors in follow-up patches. llvm-svn: 347210
* [DAG] add undef simplifications for select nodesSanjay Patel2018-11-182-14/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sadly, this duplicates (twice) the logic from InstSimplify. There might be some way to at least share the DAG versions of the code, but copying the folds seems to be the standard method to ensure that we don't miss these folds. Unlike in IR, we don't run DAGCombiner to fixpoint, so there's no way to ensure that we do these kinds of simplifications unless the code is repeated at node creation time and during combines. There were other tests that would become worthless with this improvement that I changed as pre-commits: rL347161 rL347164 rL347165 rL347166 rL347167 I'm not sure how to salvage the remaining tests (diffs in this patch). So the x86 tests verify that the new code is working as intended. The AMDGPU test is actually similar to my motivating case: we have some undef value that has survived to machine IR in an x86 test, and then it gets folded in some weird way, or we crash if we don't transfer the undef flag. But we would have been better off never getting to that point by doing these simplifications. This will lead back to PR32023 someday... https://bugs.llvm.org/show_bug.cgi?id=32023 llvm-svn: 347170
* [SelectionDAG] simplify code; NFCSanjay Patel2018-11-181-6/+5
| | | | llvm-svn: 347160
* Use llvm::copy. NFCFangrui Song2018-11-171-3/+3
| | | | llvm-svn: 347126
* DAG combiner: fold (select, C, X, undef) -> XStanislav Mekhanoshin2018-11-161-0/+6
| | | | | | Differential Revision: https://reviews.llvm.org/D54646 llvm-svn: 347110
* [LegalizeVectorOps] After custom legalizing an extending load or a ↵Craig Topper2018-11-161-2/+10
| | | | | | | | | | truncating store, make sure the custom code is also legal. For example, on X86 we emit a sign_extend_vector_inreg from LowerLoad and without sse4.1 this node will need further legalization. Previously this sign_extend_vector_inreg was being custom lowered during DAG legalization instead of vector op legalization. Unfortunately, this doesn't seem to matter for the output of any existing lit tests. llvm-svn: 347094
* [TargetLowering] Cleanup more of the EXTEND demanded bits cases so that they ↵Simon Pilgrim2018-11-161-10/+11
| | | | | | | | match. NFCI. Use the same variable names etc. llvm-svn: 347045
* [DAGCombine] Fix non-deterministic debug outputSam Parker2018-11-161-4/+4
| | | | | | | | | | | PR37970 reported non-deterministic debug output, this was caused by iterating through a set and not a a vector. bugzilla: https://bugs.llvm.org/show_bug.cgi?id=37970 Differential Revision: https://reviews.llvm.org/D54570 llvm-svn: 347037
* [LegalizeVectorTypes] Teach WidenVecRes_Convert to turn ANY_EXTEND into ↵Craig Topper2018-11-161-0/+2
| | | | | | | | ANY_EXTEND_VECTOR_INREG when the input and output types need to be widened to the same width. If we don't do it here, DAGCombine will just end up creating it from the scalar any_extend+build_vector so might as well save a step. llvm-svn: 347034
* Fixed DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT i1 handlingStanislav Mekhanoshin2018-11-131-0/+9
| | | | | | | | | Legalizer used to request an ext load from i8 to i1 when promoting vector element type to i8. Fixed. Differential Revision: https://reviews.llvm.org/D54440 llvm-svn: 346795
* [SelectionDAG][X86] Relax restriction on the width of an input to ↵Craig Topper2018-11-131-3/+2
| | | | | | | | | | | | | | | | | | *_EXTEND_VECTOR_INREG. Use them and regular *_EXTEND to replace the X86 specific VSEXT/VZEXT opcodes Previously, the extend_vector_inreg opcode required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output. This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes. X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction. We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code. I believe DAG combine works correctly with the released restriction because it doesn't check the number of input elements. The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits. Differential Revision: https://reviews.llvm.org/D54346 llvm-svn: 346784
* [IR] Add a dedicated FNeg IR InstructionCameron McInally2018-11-132-0/+12
| | | | | | | | | | | The IEEE-754 Standard makes it clear that fneg(x) and fsub(-0.0, x) are two different operations. The former is a bitwise operation, while the latter is an arithmetic operation. This patch creates a dedicated FNeg IR Instruction to model that behavior. Differential Revision: https://reviews.llvm.org/D53877 llvm-svn: 346774
* [DAGCombiner] Enable tryToFoldExtendOfConstant to run after legalize vector opsCraig Topper2018-11-131-14/+7
| | | | | | | | | | It should be ok to create a new build_vector after legal operations so long as it doesn't cause an infinite loop in DAG combiner. Unfortunately, X86's custom constant folding in combineVSZext is hiding any test changes from this. But I'm trying to get to a point where that X86 specific code isn't necessary at all. Differential Revision: https://reviews.llvm.org/D54285 llvm-svn: 346728
* [DAGCombiner] Fix load-store forwarding of indexed loads.Nirav Dave2018-11-121-3/+17
| | | | | | | | | | | | | | | | Summary: Handle extra output from index loads in cases where we wish to forward a load value directly from a preceeding store. Fixes PR39571. Reviewers: peter.smith, rengolin Subscribers: javed.absar, hiraditya, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D54265 llvm-svn: 346654
* [DAGCombiner] Make tryToFoldExtendOfConstant return an SDValue instead of an ↵Craig Topper2018-11-101-14/+14
| | | | | | | | SDNode*. NFC Removes the need to call getNode internally and to recreate an SDValue after the call. llvm-svn: 346600
* [x86] allow vector load narrowing with multi-use valuesSanjay Patel2018-11-101-5/+7
| | | | | | | | | | | | | | | | | | | | | | This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs. Apart from 2-3 strange cases, these are all wins. I've structured this to be no-functional-change-intended for any target except for x86 because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those targets have existing regression tests (4, 4, 10 files respectively) that would be affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show any regression test diffs. The trade-off is deciding if an extra vector load is better than a single wide load + extract_subvector. For x86, this is almost always better (on paper at least) because we often can fold loads into subsequent ops and not increase the official instruction count. There's also some unknown -- but potentially large -- benefit from using narrower vector ops if wide ops are implemented with multiple uops and/or frequency throttling is avoided. Differential Revision: https://reviews.llvm.org/D54073 llvm-svn: 346595
* [SelectionDAG] Fix a -Wparentheses warning from gcc in an assert. NFCCraig Topper2018-11-091-2/+2
| | | | | | gcc wants parentheses around the logical OR since there is a logical AND for the string. llvm-svn: 346564
* [DAGCombiner][X86][Mips] Enable combineShuffleOfScalars to run between ↵Craig Topper2018-11-091-2/+5
| | | | | | | | | | | | vector op legalization and DAG legalization. Fix bad one use check in combineShuffleOfScalars It's possible for vector op legalization to generate a shuffle. If that happens we should give a chance for DAG combine to combine that with a build_vector input. I also fixed a bug in combineShuffleOfScalars that was considering the number of uses on a undef input to a shuffle. We don't care how many times undef is used. Differential Revision: https://reviews.llvm.org/D54283 llvm-svn: 346530
* Type safe version of MachinePassRegistrySerge Guelton2018-11-091-1/+2
| | | | | | | | | | | Previous version used type erasure through a `void* (*)()` pointer, which triggered gcc warning and implied a lot of reinterpret_cast. This version should make it harder to hit ourselves in the foot. Differential revision: https://reviews.llvm.org/D54203 llvm-svn: 346522
* [SelectionDAG] swap select_cc operands to enable foldingAlexandros Lamprineas2018-11-091-32/+34
| | | | | | | | | | | | | | | | | | The DAGCombiner tries to SimplifySelectCC as follows: select_cc(x, y, 16, 0, cc) -> shl(zext(set_cc(x, y, cc)), 4) It can't cope with the situation of reordered operands: select_cc(x, y, 0, 16, cc) In that case we just need to swap the operands and invert the Condition Code: select_cc(x, y, 16, 0, ~cc) Differential Revision: https://reviews.llvm.org/D53236 llvm-svn: 346484
* [SelectionDAG] Assert on the width of DemandedElts argument to ↵Craig Topper2018-11-081-2/+3
| | | | | | | | computeKnownBits for all vector typed operations not just build_vector. Fix AArch64 unit test that fails with the assertion added. llvm-svn: 346437
* [DAGCombine] Improve alias analysis for chain of independent stores.Nirav Dave2018-11-081-59/+116
| | | | | | | | | | | | | | | | | | | FindBetterNeighborChains simulateanously improves the chain dependencies of a chain of related stores avoiding the generation of extra token factors. For chains longer than the GatherAllAliasDepths, stores further down in the chain will necessarily fail, a potentially significant waste and preventing otherwise trivial parallelization. This patch directly parallelize the chains of stores before improving each store. This generally improves DAG-level parallelism. Reviewers: courbet, spatel, RKSimon, bogner, efriedma, craig.topper, rnk Subscribers: sdardis, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D53552 llvm-svn: 346432
* Add support for llvm.is.constant intrinsic (PR4898)James Y Knight2018-11-072-0/+15
| | | | | | | | | | | | | | | This adds the llvm-side support for post-inlining evaluation of the __builtin_constant_p GCC intrinsic. Also fixed SCCPSolver::visitCallSite to not blow up when seeing a call to a function where canConstantFoldTo returns true, and one of the arguments is a struct. Updated from patch initially by Janusz Sobczak. Differential Revision: https://reviews.llvm.org/D4276 llvm-svn: 346322
* [FPEnv] Add constrained CEIL/FLOOR/ROUND/TRUNC intrinsicsCameron McInally2018-11-056-0/+52
| | | | | | Differential Revision: https://reviews.llvm.org/D53411 llvm-svn: 346141
* [TargetLowering] Begin generalizing TargetLowering::expandFP_TO_SINT ↵Simon Pilgrim2018-11-051-26/+26
| | | | | | | | support. NFCI. Prior to initial work to add vector expansion support, remove assumptions that we're working on scalar types. llvm-svn: 346139
* [DAGCombiner] Use tryFoldToZero to simplify some code and make it work ↵Craig Topper2018-11-051-8/+2
| | | | | | | | correctly between LegalTypes and LegalOperations. The original code avoided creating a zero vector after type legalization, but if we're after type legalization the type we have is legal. The real hazard we need to avoid is creating a build vector after op legalization. tryFoldToZero takes care of checking for this. llvm-svn: 346119
* [DAGCombiner] Remove an unused argument from tryFoldToZero. NFCCraig Topper2018-11-051-4/+3
| | | | llvm-svn: 346118
* [DAGCombiner] Remove 'else' after return. NFCCraig Topper2018-11-041-8/+7
| | | | | | This makes this code consistent with the nearly identical code in visitZERO_EXTEND. llvm-svn: 346090
OpenPOWER on IntegriCloud