summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [DAGCombiner] limit shuffle to extend transform (PR40146)Sanjay Patel2018-12-231-4/+5
| | | | | | | | | | It's dangerous to knowingly create an illegal vector type no matter what stage of combining we're in. This prevents the missed folding/scalarization seen in: https://bugs.llvm.org/show_bug.cgi?id=40146 llvm-svn: 350034
* [DAGCombiner] allow hoisting vector bitwise logic ahead of extendsSanjay Patel2018-12-231-6/+5
| | | | llvm-svn: 350032
* [DAGCombiner] allow narrowing of add followed by truncateSanjay Patel2018-12-221-2/+1
| | | | | | | | | | | | | | | trunc (add X, C ) --> add (trunc X), C' If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type. This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine). This change used to show regressions for x86, but those are gone after D55494. This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic) that does almost the same thing. Differential Revision: https://reviews.llvm.org/D55866 llvm-svn: 350006
* [DAGCombiner] simplify code leading to scalarizeExtractedVectorLoad; NFCSanjay Patel2018-12-211-6/+5
| | | | llvm-svn: 349958
* [SelectionDAG] Always use the version of computeKnownBits that returns a ↵Simon Pilgrim2018-12-211-10/+6
| | | | | | | | value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version. llvm-svn: 349907
* [ARM] Complete the Thumb1 shift+and->shift+shift transforms.Eli Friedman2018-12-201-1/+2
| | | | | | | | | | | | | | This saves materializing the immediate. The additional forms are less common (they don't usually show up for bitfield insert/extract), but they're still relevant. I had to add a new target hook to prevent DAGCombine from reversing the transform. That isn't the only possible way to solve the conflict, but it seems straightforward enough. Differential Revision: https://reviews.llvm.org/D55630 llvm-svn: 349857
* [DAGCombiner] Fix a place that was creating a SIGN_EXTEND with an extra operand.Craig Topper2018-12-201-1/+1
| | | | llvm-svn: 349726
* [SelectionDAG] Optional handling of UNDEF elements in matchBinaryPredicate ↵Simon Pilgrim2018-12-191-4/+4
| | | | | | | | | | | | | | (part 2 of 2) Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs. This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument. I've updated the (or (and X, c1), c2) -> (and (or X, c2), c1|c2) fold to demonstrate its use, which I believe is safe for undef cases. Differential Revision: https://reviews.llvm.org/D55822 llvm-svn: 349629
* [TargetLowering] Fix propagation of undefs in zero extension ops (PR40091)Simon Pilgrim2018-12-191-4/+9
| | | | | | | | | | | | As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect as the output should guarantee to least have the new upper bits set to zero. SimplifyDemandedVectorElts is the worst offender (and its the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue. Thanks to @dmgreen for catching this. Differential Revision: https://reviews.llvm.org/D55883 llvm-svn: 349625
* [DAGCombiner] allow hoisting vector bitwise logic ahead of truncatesSanjay Patel2018-12-161-5/+2
| | | | | | | | | | | | | | | | | | The transform performs a bitwise logic op in a wider type followed by truncate when both inputs are truncated from the same source type: logic_op (truncate x), (truncate y) --> truncate (logic_op x, y) There are a bunch of other checks that should prevent doing this when it might be harmful. We already do this transform for scalars in this spot. The vector limitation was shared with a check for the case when the operands are extended. I'm not sure if that limit is needed either, but that would be a separate patch. Differential Revision: https://reviews.llvm.org/D55448 llvm-svn: 349303
* [SelectionDAG] Add FSHL/FSHR support to computeKnownBitsSimon Pilgrim2018-12-161-2/+4
| | | | | | Also exposes an issue in DAGCombiner::visitFunnelShift where we were assuming the shift amount had the result type (after legalization it'll have the targets shift amount type). llvm-svn: 349298
* [DAGCombiner][X86] Prevent visitSIGN_EXTEND from returning N when (sext ↵Craig Topper2018-12-141-15/+18
| | | | | | | | | | | | | | | | | (setcc)) already has the target desired type for the setcc Summary: If the setcc already has the target desired type we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes those causes VsetCC to be CSEd to N0 and the getSExtOrTrunc will CSE to N. When we return N, the caller will think that meant we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node. To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled that entire function will be deleted anyway so hopefully this is only for the short term. Reviewers: RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55459 llvm-svn: 349137
* [DAGCombiner] clean up visitEXTRACT_VECTOR_ELTSanjay Patel2018-12-141-138/+129
| | | | | | | | | | | | | | | | | | This isn't quite NFC, but I don't know how to expose any outward diffs from these changes. Mostly, this was confusing because it used 'VT' to refer to the operand type rather the usual type of the input node. There's also a large block at the end that is dedicated solely to matching loads, but that wasn't obvious. This could probably be split up into separate functions to make it easier to see. It's still not clear to me when we make certain transforms because the legality and constant conditions are intertwined in a way that might be improved. llvm-svn: 349095
* [DAGCombiner] after simplifying demanded elements of vector operand of ↵Sanjay Patel2018-12-131-1/+6
| | | | | | | | | | | extract, revisit the extract; 2nd try This is a retry of rL349051 (reverted at rL349056). I changed the check for dead-ness from number of uses to an opcode test for DELETED_NODE based on existing similar code. Differential Revision: https://reviews.llvm.org/D55655 llvm-svn: 349058
* revert rL349051: [DAGCombiner] after simplifying demanded elements of vector ↵Sanjay Patel2018-12-131-6/+1
| | | | | | | | | operand of extract, revisit the extract This causes an address sanitizer bot failure: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/27187/steps/check-llvm%20asan/logs/stdio llvm-svn: 349056
* [DAGCombiner] after simplifying demanded elements of vector operand of ↵Sanjay Patel2018-12-131-1/+6
| | | | | | | | extract, revisit the extract Differential Revision: https://reviews.llvm.org/D55655 llvm-svn: 349051
* [DAGCombine] Moved X86 rotate_amount % bitwidth == 0 early out to DAGCombinerSimon Pilgrim2018-12-131-0/+7
| | | | | | Remove common code from custom lowering (code is still safe if somehow a zero value gets used). llvm-svn: 349028
* [DAGCombiner] Remove unnecessary recursive ↵Simon Pilgrim2018-12-101-6/+0
| | | | | | | | DAGCombiner::visitINSERT_SUBVECTOR call. As discussed on D55511, this caused an issue if the inner node deletes a node that the outer node depends upon. As it doesn't affect any lit-tests and I've only been able to expose this with the D55511 change I'm committing this now. llvm-svn: 348781
* [DAGCombiner] Use the result value type in visitCONCAT_VECTORSFrancis Visoiu Mistrih2018-12-101-1/+1
| | | | | | | | | | | | | | | This triggers an assert when combining concat_vectors of a bitcast of merge_values. With asserts disabled, it fails to select: fatal error: error in backend: Cannot select: 0x7ff19d000e90: i32 = any_extend 0x7ff19d000ae8 0x7ff19d000ae8: f64,ch = CopyFromReg 0x7ff19d000c20:1, Register:f64 %1 0x7ff19d000b50: f64 = Register %1 In function: d Differential Revision: https://reviews.llvm.org/D55507 llvm-svn: 348759
* [DAGCombiner] re-enable truncation of binopsSanjay Patel2018-12-081-12/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is effectively re-committing the changes from: rL347917 (D54640) rL348195 (D55126) ...which were effectively reverted here: rL348604 ...because the code had a bug that could induce infinite looping or eventual out-of-memory compilation. The bug was that this code did not guard against transforming opaque constants. More details are in the post-commit mailing list thread for r347917. A reduced test for that is included in the x86 bool-math.ll file. (I wasn't able to reduce a PPC backend test for this, but it was almost the same pattern.) Original commit message for r347917: The motivating case for this is shown in: https://bugs.llvm.org/show_bug.cgi?id=32023 and the corresponding rot16.ll regression tests. Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc sequences that don't get folded in IR. As the TODO comments suggest, there will be regressions if we extend this (for x86, we mostly seem to be missing LEA opportunities, but there are likely vector folds missing too). I think those should be considered existing bugs because this is the same transform that we do as an IR canonicalization in instcombine. We just need more tests to make those visible independent of this patch. llvm-svn: 348706
* [DAGCombiner] split trunc from extend in hoistLogicOpWithSameOpcodeHands; NFCSanjay Patel2018-12-071-33/+48
| | | | | | | This duplicates several shared checks, but we need to split this up to fix underlying bugs in smaller steps. llvm-svn: 348627
* [DAGCombiner] disable truncation of binops by defaultSanjay Patel2018-12-071-1/+7
| | | | | | | | | | As discussed in the post-commit thread of r347917, this transform is fighting with an existing transform causing an infinite loop or out-of-memory, so this is effectively reverting r347917 and its follow-up r348195 while we investigate the bug. llvm-svn: 348604
* [DAGCombiner] remove explicit calls to AddToWorkList; NFCISanjay Patel2018-12-071-6/+0
| | | | | | | | As noted in the post-commit thread for rL347917: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608936.html ...we don't need to repeat these calls because the combiner does it automatically. llvm-svn: 348597
* [DAGCombiner] use root SDLoc for all nodes created by logic foldSanjay Patel2018-12-071-1/+1
| | | | | | | | | | | If this is not a valid way to assign an SDLoc, then we get this wrong all over SDAG. I don't know enough about the SDAG to explain this. IIUC, theoretically, debug info is not supposed to affect codegen. But here it has clearly affected 3 different targets, and the x86 change is an actual improvement. llvm-svn: 348552
* [DAGCombiner] don't bother saving a SDLoc for a node that's dead; NFCISanjay Patel2018-12-061-1/+1
| | | | | | | | | | | | | | We shouldn't care about the debug location for a node that we're creating, but attaching the root of the pattern should be the best effort. (If this is not true, then we are doing it wrong all over the SDAG). This is no-functional-change-intended, and there are no regression test diffs...and that's what I expected. But there's a similar line above this diff, where those assumptions apparently do not hold. llvm-svn: 348550
* [DAGCombiner] more clean up in hoistLogicOpWithSameOpcodeHands(); NFCSanjay Patel2018-12-061-41/+34
| | | | | | This code can still misbehave. llvm-svn: 348547
* [DAGCombiner] don't group bswap with casts in logic hoisting foldSanjay Patel2018-12-061-6/+15
| | | | | | | | | | | | | | | | This was probably organized as it was because bswap is a unary op. But that's where the similarity to the other opcodes ends. We should not limit this transform to scalars, and we should not try it if either input has other uses. This is another step towards trying to clean this whole function up to prevent it from causing infinite loops and memory explosions. Earlier commits in this series: rL348501 rL348508 rL348518 llvm-svn: 348534
* [DAGCombiner] reduce indent; NFCSanjay Patel2018-12-061-38/+31
| | | | | | | | Unlike some of the folds in hoistLogicOpWithSameOpcodeHands() above this shuffle transform, this has the expected hasOneUse() checks in place. llvm-svn: 348523
* [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef.Andrea Di Biagio2018-12-061-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces a new DAGCombiner rule to simplify concat_vectors nodes: concat_vectors( bitcast (scalar_to_vector %A), UNDEF) --> bitcast (scalar_to_vector %A) This patch only partially addresses PR39257. In particular, it is enough to fix one of the two problematic cases mentioned in PR39257. However, it is not enough to fix the original test case posted by Craig; that particular case would probably require a more complicated approach (and knowledge about used bits). Before this patch, we used to generate the following code for function PR39257 (-mtriple=x86_64 , -mattr=+avx): vmovsd (%rdi), %xmm0 # xmm0 = mem[0],zero vxorps %xmm1, %xmm1, %xmm1 vblendps $3, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2,3] vmovaps %ymm0, (%rsi) vzeroupper retq Now we generate this: vmovsd (%rdi), %xmm0 # xmm0 = mem[0],zero vmovaps %ymm0, (%rsi) vzeroupper retq As a side note: that VZEROUPPER is completely redundant... I guess the vzeroupper insertion pass doesn't realize that the definition of %xmm0 from vmovsd is already zeroing the upper half of %ymm0. Note that on %-mcpu=btver2, we don't get that vzeroupper because pass vzeroupper insertion %pass is disabled. Differential Revision: https://reviews.llvm.org/D55274 llvm-svn: 348522
* [DAGCombiner] don't hoist logic op if operands have other uses, part 2Sanjay Patel2018-12-061-5/+7
| | | | | | | | | The PPC test with 2 extra uses seems clearly better by avoiding this transform. With 1 extra use, we also prevent an extra register move (although that might be an RA problem). The general rule should be to only make a change here if it is always profitable. The x86 diffs are all neutral. llvm-svn: 348518
* [DAGCombiner] don't hoist logic op if operands have other usesSanjay Patel2018-12-061-2/+6
| | | | | | | | | | | | | | | | | The AVX512 diffs are neutral, but the bswap test shows a clear overreach in hoistLogicOpWithSameOpcodeHands(). If we don't check for other uses, we can increase the instruction count. This could also fight with transforms trying to go in the opposite direction and possibly blow up/infinite loop. This might be enough to solve the bug noted here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html I did not add the hasOneUse() checks to all opcodes because I see a perf regression for at least one opcode. We may decide that's irrelevant in the face of potential compiler crashing, but I'll see if I can salvage that first. llvm-svn: 348508
* [DAGCombiner] refactor function that hoists bitwise logic; NFCISanjay Patel2018-12-061-56/+65
| | | | | | | | | Added FIXME and TODO comments for lack of safety checks. This function is a suspect in out-of-memory errors as discussed in the follow-up thread to r347917: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html llvm-svn: 348501
* DAGCombiner::visitINSERT_VECTOR_ELT - pull out repeated ↵Simon Pilgrim2018-12-061-3/+4
| | | | | | VT.getVectorNumElements(). NFCI. llvm-svn: 348494
* [DAGCombiner] don't try to extract a fraction of a vector binop and crash ↵Sanjay Patel2018-12-051-10/+14
| | | | | | | | | | | | | (PR39893) Because we're potentially peeking through a bitcast in this transform, we need to use overall bitwidths rather than number of elements to determine when it's safe to proceed. Should fix: https://bugs.llvm.org/show_bug.cgi?id=39893 llvm-svn: 348383
* [SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467)Simon Pilgrim2018-12-051-0/+36
| | | | | | | | | | This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions. Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code. Differential Revision: https://reviews.llvm.org/D54698 llvm-svn: 348353
* [TargetLowering] Add SimplifyDemandedVectorElts support to EXTEND opcodesSimon Pilgrim2018-12-041-0/+6
| | | | | | | | Add support for ISD::*_EXTEND and ISD::*_EXTEND_VECTOR_INREG opcodes. The extra broadcast in trunc-subvector.ll will be fixed in an upcoming patch. llvm-svn: 348246
* [DAGCombiner] narrow truncated vector binops when legalSanjay Patel2018-12-031-7/+11
| | | | | | | | | | | | | | | | This is the smallest vector enhancement I could find to D54640. Here, we're allowing narrowing to only legal vector ops because we'll see regressions without that. All of the test diffs are wins from what I can tell. With AVX/AVX512, we can shrink ymm/zmm ops to xmm. x86 vector multiplies are the problem case that we're avoiding due to the patchwork ISA, and it's not clear to me if we can dance around those regressions using TLI hooks or if we need preliminary patches to plug those holes. Differential Revision: https://reviews.llvm.org/D55126 llvm-svn: 348195
* [X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb ↵Craig Topper2018-12-031-0/+20
| | | | | | | | | | | | | | | | | | | that truncates v8i16->v8i8. Summary: Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalizdd to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but Under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction. The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54836 llvm-svn: 348158
* [DAGCombiner] guard against an oversized shift crashSanjay Patel2018-12-021-9/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This change prevents the crash noted in the post-commit comments for rL347478 : http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181119/605166.html We can't guarantee that an oversized shift amount is folded away, so we have to check for it. Note that I committed an incomplete fix for that crash with: rL347502 But as discussed here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181126/605679.html ...we have to try harder. So I'm not sure how to expose the bug now (and apparently no fuzzers have found a way yet either). On the plus side, we have discovered that we're missing real optimizations by not simplifying nodes sooner, so the earlier fix still has value, and there's likely more value in extending that so we can simplify more opcodes and simplify when doing RAUW and/or putting nodes on the combiner worklist. Differential Revision: https://reviews.llvm.org/D54954 llvm-svn: 348089
* [DAGCombiner] narrow truncated binopsSanjay Patel2018-11-291-0/+22
| | | | | | | | | | | | | | | | | | | The motivating case for this is shown in: https://bugs.llvm.org/show_bug.cgi?id=32023 and the corresponding rot16.ll regression tests. Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc sequences that don't get folded in IR. As the TODO comments suggest, there will be regressions if we extend this (for x86, we mostly seem to be missing LEA opportunities, but there are likely vector folds missing too). I think those should be considered existing bugs because this is the same transform that we do as an IR canonicalization in instcombine. We just need more tests to make those visible independent of this patch. Differential Revision: https://reviews.llvm.org/D54640 llvm-svn: 347917
* [x86] limit transform for select-of-fp-constantsSanjay Patel2018-11-251-0/+3
| | | | | | | | | | | | This should likely be adjusted to limit this transform further, but these diffs should be clear wins. If we have blendv/conditional move, then we should assume those are cheap ops. The loads become independent of the compare, so those can be speculated before we need to use the values in the blend/mov. llvm-svn: 347526
* [SelectionDAG] move constant or splat functions to common locationSanjay Patel2018-11-251-37/+12
| | | | | | | | rL347502 moved the null sibling, so we should group all of these together. I'm not sure why these aren't methods of the SDValue class itself, but that's another patch if that's possible. llvm-svn: 347523
* [DAG] consolidate shift simplificationsSanjay Patel2018-11-231-67/+24
| | | | | | | | | | ...and use them to avoid creating obviously undef values as discussed in the post-commit thread for r347478. The diffs in vector div/rem show that we were missing real optimizations by creating bogus shift nodes. llvm-svn: 347502
* [DAGCombiner] form 'not' ops ahead of shifts (PR39657)Sanjay Patel2018-11-221-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'), but that would not matter without this patch because DAGCombiner was reversing that transform. I think we need this transform in the backend regardless of what happens in IR to catch cases where the shift-xor is formed late from GEP or other ops. https://rise4fun.com/Alive/NC1 Name: shl Pre: (-1 << C2) == C1 %shl = shl i8 %x, C2 %r = xor i8 %shl, C1 => %not = xor i8 %x, -1 %r = shl i8 %not, C2 Name: shr Pre: (-1 u>> C2) == C1 %sh = lshr i8 %x, C2 %r = xor i8 %sh, C1 => %not = xor i8 %x, -1 %r = lshr i8 %not, C2 https://bugs.llvm.org/show_bug.cgi?id=39657 llvm-svn: 347478
* [DAGCombiner] refactor select-of-FP-constants transformSanjay Patel2018-11-211-53/+60
| | | | | | | | | | | | | | This transform needs to be limited. We are converting to a constant pool load very early, and we are turning loads that are independent of the select condition (and therefore speculatable) into a dependent non-speculatable load. We may also be transferring a condition code from an FP register to integer to create that dependent load. llvm-svn: 347424
* [DAGCombiner] reduce code duplication; NFCSanjay Patel2018-11-211-33/+30
| | | | llvm-svn: 347410
* [DAGCombiner] look through bitcasts when trying to narrow vector binopsSanjay Patel2018-11-201-13/+14
| | | | | | | | | | | | | | | | | | | This is another step in vector narrowing - a follow-up to D53784 (and hoping to eventually squash potential regressions seen in D51553). The x86 test diffs are wins, but the AArch64 diff is probably not. That problem already exists independent of this patch (see PR39722), but it went unnoticed in the previous patch because there were no regression tests that showed the possibility. The x86 diff in i64-mem-copy.ll is close. Given the frequency throttling concerns with using wider vector ops, an extra extract to reduce vector width is the right trade-off at this level of codegen. Differential Revision: https://reviews.llvm.org/D54392 llvm-svn: 347356
* [DAGCombine] Add calls to SimplifyDemandedVectorElts from ↵Simon Pilgrim2018-11-201-0/+4
| | | | | | | | visitINSERT_SUBVECTOR (PR37989) This uncovered an off-by-one typo in SimplifyDemandedVectorElts's INSERT_SUBVECTOR handling as its bounds check was bailing on safe indices. llvm-svn: 347313
* [DAGCombiner] reduce code duplication in visitXOR; NFCSanjay Patel2018-11-201-32/+29
| | | | llvm-svn: 347278
* [DAGCombine] SimplifyNodeWithTwoResults - ensure same legalization for LO/HI ↵Simon Pilgrim2018-11-191-8/+6
| | | | | | | | | | operands (PR21207) Consistently use (!LegalOperations || isOperationLegalOrCustom) for all node pairs. Differential Revision: https://reviews.llvm.org/D53478 llvm-svn: 347255
OpenPOWER on IntegriCloud