path: root/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Commit message | Author | Age | Files | Lines
* [NFC] TLI query with default(on) behavior wrt DAG combines for fmin/fmax ↵Michael Berg2019-01-281-3/+7
| | | | | | target control llvm-svn: 352396
* [DAGCombine] Enable more pre-indexed storesSam Parker2019-01-231-1/+7
| | | | | | | | | | | The current check in CombineToPreIndexedLoadStore is too conservative, preventing a pre-indexed store when the base pointer is a predecessor of the value being stored. Instead, we should check the pointer operand of the store. Differential Revision: https://reviews.llvm.org/D56719 llvm-svn: 351933
* [DAGCombiner] narrow vector binop with 2 insert subvector operandsSanjay Patel2019-01-221-1/+24
| | | | | | | | | | | | | | | | | | vecbo (insertsubv undef, X, Z), (insertsubv undef, Y, Z) --> insertsubv VecC, (vecbo X, Y), Z This is another step in generic vector narrowing. It's also a step towards more horizontal op formation specifically for x86 (although we still failed to match those in the affected tests). The scalarization cases are also not optimal (we should be scalarizing those), but it's still an improvement to use a narrower vector op when we know part of the result must be constant because both inputs are undef in some vector lanes. I think a similar match but checking for a constant operand might help some of the cases in D51553. Differential Revision: https://reviews.llvm.org/D56875 llvm-svn: 351825
* [DAGCombiner] fix crash when converting build vector to shuffleSanjay Patel2019-01-211-5/+11
| | | | | | | | | | The regression test is reduced from the example shown in D56281. This does raise a question as noted in the test file: do we want to handle this pattern? I don't have a motivating example for that on x86 yet, but it seems like we could have that pattern there too, so we could avoid the back-and-forth using a shuffle. llvm-svn: 351753
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* [SelectionDAG] Split very large token factors for chained stores to 64k chunks.Florian Hahn2019-01-181-1/+1
| | | | | | | | | | | | Similar to D55073. Without this change, the DAG combiner crashes on code with more than 64k of stores in a single basic block that form parallelizable chains. No test case, as it would be a very large IR file. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D56740 llvm-svn: 351571
* [DAGCombine] Fix ReduceLoadWidth for shifted offsetsSam Parker2019-01-161-12/+8
| | | | | | | | | | | | ReduceLoadWidth can trigger when a shifted mask is used, and this requires that the function return a shl node to correct for the offset. However, the way that this was implemented meant that the returned result could be an existing node, which would be incorrect. This fixes the method of inserting the new node and replacing uses. Differential Revision: https://reviews.llvm.org/D50432 llvm-svn: 351310
* [DAGCombiner] reduce buildvec of zexted extracted element to shuffleSanjay Patel2019-01-151-0/+75
| | | | | | | | | | | | | | | The motivating case for this is shown in the first regression test. We are transferring to scalar and back rather than just zero-extending with 'vpmovzxdq'. That's a special-case for a more general pattern as shown here. In all tests, we're avoiding the vector-scalar-vector moves in favor of vector ops. We aren't producing optimal shuffle code in some cases though, so the patch is limited to reduce regressions. Differential Revision: https://reviews.llvm.org/D56281 llvm-svn: 351198
* [DAGCombiner] Add (sub_sat x, x) -> 0 combineSimon Pilgrim2019-01-141-0/+4
| | | | llvm-svn: 351073
* [DAGCombiner] Enable sub saturation constant foldingSimon Pilgrim2019-01-141-1/+6
| | | | llvm-svn: 351072
* [DAGCombiner] Add add/sub saturation undef handlingSimon Pilgrim2019-01-141-0/+8
| | | | | | | | Match ConstantFolding.cpp: (add_sat x, undef) -> -1 (sub_sat x, undef) -> 0 llvm-svn: 351070
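One way to see why these two results are legal: undef may take any value, so the fold only needs a single choice of the undef operand that produces the stated constant. The sketch below models 8-bit saturating arithmetic in plain C++ (the helper names are hypothetical, not LLVM APIs) and checks one such choice per operation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit models of the saturating ops; not LLVM code.
// Operands promote to int, so the intermediate sums cannot overflow.
constexpr std::uint8_t uaddSat(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint8_t>(A > 255 - B ? 255 : A + B);
}

constexpr std::uint8_t usubSat(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint8_t>(A < B ? 0 : A - B);
}

constexpr std::int8_t saddSat(std::int8_t A, std::int8_t B) {
  return static_cast<std::int8_t>(
      A + B > 127 ? 127 : (A + B < -128 ? -128 : A + B));
}

// add_sat x, undef -> -1: choose undef = all-ones (unsigned) or ~x (signed).
static_assert(uaddSat(7, 255) == 255, "unsigned add saturates to all-ones");
static_assert(saddSat(7, ~7) == -1, "x + ~x is always -1, never overflows");

// sub_sat x, undef -> 0: choose undef = x.
static_assert(usubSat(7, 7) == 0, "x - x is 0");
```

Any single witness value is enough here; the checks do not claim that every value of the undef operand yields the folded constant.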
* [DAGCombiner] Enable add saturation constant foldingSimon Pilgrim2019-01-141-2/+3
| | | | llvm-svn: 351060
* [DAGCombiner] Add add saturation constant folding tests.Simon Pilgrim2019-01-141-2/+3
| | | | | | Exposes an issue with sadd_sat for computeOverflowKind, so I've disabled it for now. llvm-svn: 351057
* [DAGCombiner] If add_sat(x,y) can't overflow -> add(x,y)Simon Pilgrim2019-01-131-0/+4
| | | | | NOTE: We need more powerful signed overflow detection in computeOverflowKind llvm-svn: 351026
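The fold above relies on a simple property: when the addition is known not to overflow, the saturating and the ordinary wrapping add agree. A minimal sketch for the unsigned 8-bit case, with hypothetical helper names and an exhaustive check standing in for the overflow analysis that computeOverflowKind performs in the real combiner:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit models; not LLVM code.
constexpr bool uaddOverflows(std::uint8_t A, std::uint8_t B) {
  return A > 255 - B;
}

constexpr std::uint8_t uaddSat(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint8_t>(uaddOverflows(A, B) ? 255 : A + B);
}

constexpr std::uint8_t uaddWrap(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint8_t>(A + B);
}

// Checks every pair of 8-bit inputs: whenever the add cannot overflow,
// add_sat(x, y) and add(x, y) produce the same value.
inline bool satEqualsAddWhenNoOverflow() {
  for (unsigned A = 0; A < 256; ++A) {
    for (unsigned B = 0; B < 256; ++B) {
      const auto A8 = static_cast<std::uint8_t>(A);
      const auto B8 = static_cast<std::uint8_t>(B);
      if (!uaddOverflows(A8, B8) && uaddSat(A8, B8) != uaddWrap(A8, B8))
        return false;
    }
  }
  return true;
}
```

When overflow is possible the two differ, for example 200 + 100 saturates to 255 but wraps to 44, which is why the combine needs the overflow proof first.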
* Fix unused variable warning. NFCI.Simon Pilgrim2019-01-131-1/+0
| | | | llvm-svn: 351025
* [DAGCombiner] Some very basic add/sub saturation combines.Simon Pilgrim2019-01-131-0/+64
| | | | | | Handle combines with zero and constant canonicalization for adds. llvm-svn: 351024
* [DAGCombiner] fold insert_subvector of insert_subvectorSanjay Patel2019-01-121-0/+8
| | | | | | | | | | | | | | | | | | | This pattern: t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0> t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0> ...shows up in PR33758: https://bugs.llvm.org/show_bug.cgi?id=33758 ...although this patch doesn't make any difference to the final result on that yet. In the affected tests here, it looks like it just makes RA wiggle. But we might as well squash this to prevent it interfering with other pattern-matching. Differential Revision: https://reviews.llvm.org/D56604 llvm-svn: 351008
* [DAGCombiner] simplify code; NFCSanjay Patel2019-01-101-11/+11
| | | | llvm-svn: 350844
* [DAGCombiner][x86] scalarize binop followed by extractelementSanjay Patel2019-01-031-5/+44
| | | | | | | | | | | | | | | | | | | | As noted in PR39973 and D55558: https://bugs.llvm.org/show_bug.cgi?id=39973 ...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine: // extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index) We want to have this in the DAG too because as we can see in some of the test diffs (reductions), the pattern may not be visible in IR. Given that this is already an IR canonicalization, any backend that would prefer a vector op over a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's a realistic expectation though). The transform is limited with a TLI hook because there's an existing transform in CodeGenPrepare that tries to do the opposite transform. Differential Revision: https://reviews.llvm.org/D55722 llvm-svn: 350354
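The extelt/binop fold can be illustrated outside of LLVM with a small model. This is only a sketch: `Vec4` and both function names are made up for the example, and integer add stands in for an arbitrary binop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<int, 4>;

// Vector form: extelt (add X, Y), Index. All four lanes are computed,
// although only one of them is used.
inline int extractOfVectorAdd(const Vec4 &X, const Vec4 &Y, std::size_t Index) {
  Vec4 Sum{};
  for (std::size_t I = 0; I < Sum.size(); ++I)
    Sum[I] = X[I] + Y[I];
  return Sum[Index];
}

// Scalarized form: add (extelt X, Index), (extelt Y, Index).
// Only the demanded lane is computed.
inline int addOfExtracts(const Vec4 &X, const Vec4 &Y, std::size_t Index) {
  return X[Index] + Y[Index];
}
```

Both forms return the same value for any in-range index; whether the scalar form is actually cheaper depends on the target, which is what the TLI hook mentioned in the commit message decides.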
* [DAGCombiner] After performing the division by constant optimization for a ↵Craig Topper2019-01-021-2/+29
| | | | | | | | | | | | DIV or REM node, replace the users of the corresponding REM or DIV node if it exists. Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced. Improves the test case from PR38217. There may be additional opportunities after this. Differential Revision: https://reviews.llvm.org/D56145 llvm-svn: 350239
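As a rough illustration of why the two expansions should be shared: a division by a constant is typically expanded into a multiply and shift, and the remainder can then be derived from that same quotient. The sketch below uses the well-known constant for unsigned 32-bit division by 10; the function names are hypothetical and the code is not what the combiner emits.

```cpp
#include <cassert>
#include <cstdint>

// x / 10 without a divide: multiply by ceil(2^35 / 10) = 0xCCCCCCCD in
// 64-bit arithmetic, then shift right by 35.
constexpr std::uint32_t udiv10(std::uint32_t X) {
  return static_cast<std::uint32_t>(
      (static_cast<std::uint64_t>(X) * 0xCCCCCCCDull) >> 35);
}

// x % 10 reuses the expanded quotient: rem = x - (x / 10) * 10.
constexpr std::uint32_t urem10(std::uint32_t X) {
  return X - udiv10(X) * 10u;
}

static_assert(udiv10(4294967295u) == 429496729u, "quotient of UINT32_MAX");
static_assert(urem10(4294967295u) == 5u, "remainder of UINT32_MAX");
```

If the quotient sequence were expanded twice and each copy optimized against a different set of users, the two copies might no longer be identical, which is the CSE failure the commit message describes.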
* [DAGCombiner][X86][PowerPC] Teach visitSIGN_EXTEND_INREG to fold ↵Craig Topper2019-01-021-2/+5
| | | | | | | | | | (sext_in_reg (aext/sext x)) -> (sext x) when x has more than 1 sign bit and the sext_inreg is from one of them. If x has multiple sign bits then it doesn't matter which one we extend from so we can sext from x's msb instead. The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this. Differential Revision: https://reviews.llvm.org/D56156 llvm-svn: 350235
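A small worked example may help with the sign-bit argument. The sketch assumes an 8-bit x any-extended to 32 bits (so the upper 24 bits are arbitrary) and a hypothetical `sextInReg` helper that sign-extends from the low `Bits` bits; none of this is LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low `Bits` bits of V (1 <= Bits <= 32); higher bits of V
// are ignored, just as sext_in_reg ignores them.
constexpr std::int64_t sextInReg(std::uint32_t V, unsigned Bits) {
  const std::int64_t Low =
      static_cast<std::int64_t>(V & ((std::uint64_t{1} << Bits) - 1));
  const std::int64_t SignBit = std::int64_t{1} << (Bits - 1);
  return Low >= SignBit ? Low - (SignBit << 1) : Low;
}

// x = -3 as i8 is 0b11111101, which has 6 sign bits (bits 7..2 are equal).
// 0xABCD12FD models (aext x): low byte is x, upper bits are garbage.
static_assert(sextInReg(0xABCD12FDu, 8) == -3, "plain sext from x's msb");
static_assert(sextInReg(0xABCD12FDu, 7) == -3, "extending from bit 6 agrees");
static_assert(sextInReg(0xABCD12FDu, 3) == -3, "extending from bit 2 agrees");
static_assert(sextInReg(0xABCD12FDu, 2) != -3, "bit 1 is not a sign bit");
```

Extending from any bit that is a copy of the sign bit gives the same result as extending from the msb, so the pair can be replaced by a single sext of x.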
* [DAGCombiner] Add missing one use check on the shuffle in the ↵Craig Topper2018-12-311-1/+1
| | | | | | | | bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1) transform. Found while trying out some other changes so I don't really have a test case. llvm-svn: 350172
* [DAGCombiner] limit shuffle to extend transform (PR40146)Sanjay Patel2018-12-231-4/+5
| | | | | | | | | | It's dangerous to knowingly create an illegal vector type no matter what stage of combining we're in. This prevents the missed folding/scalarization seen in: https://bugs.llvm.org/show_bug.cgi?id=40146 llvm-svn: 350034
* [DAGCombiner] allow hoisting vector bitwise logic ahead of extendsSanjay Patel2018-12-231-6/+5
| | | | llvm-svn: 350032
* [DAGCombiner] allow narrowing of add followed by truncateSanjay Patel2018-12-221-2/+1
| | | | | | | | | | | | | | | trunc (add X, C ) --> add (trunc X), C' If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type. This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine). This change used to show regressions for x86, but those are gone after D55494. This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic) that does almost the same thing. Differential Revision: https://reviews.llvm.org/D55866 llvm-svn: 350006
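The narrowing is sound because the low bits of a sum depend only on the low bits of the operands. A minimal sketch with a 32-bit to 8-bit truncate (function names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// trunc (add X, C): add in the wide type, then drop the top 24 bits.
constexpr std::uint8_t truncOfWideAdd(std::uint32_t X, std::uint32_t C) {
  return static_cast<std::uint8_t>(X + C);
}

// add (trunc X), C': truncate both operands first, then add in 8 bits.
constexpr std::uint8_t narrowAdd(std::uint32_t X, std::uint32_t C) {
  return static_cast<std::uint8_t>(static_cast<std::uint8_t>(X) +
                                   static_cast<std::uint8_t>(C));
}

static_assert(truncOfWideAdd(0x12345678u, 0x000001FFu) == 0x77,
              "wide add then truncate");
static_assert(narrowAdd(0x12345678u, 0x000001FFu) == 0x77,
              "truncate then narrow add");
```

Here C' is simply C truncated to the destination width.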
* [DAGCombiner] simplify code leading to scalarizeExtractedVectorLoad; NFCSanjay Patel2018-12-211-6/+5
| | | | llvm-svn: 349958
* [SelectionDAG] Always use the version of computeKnownBits that returns a ↵Simon Pilgrim2018-12-211-10/+6
| | | | | | value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version. llvm-svn: 349907
* [ARM] Complete the Thumb1 shift+and->shift+shift transforms.Eli Friedman2018-12-201-1/+2
| | | | | | | | | | | | | | This saves materializing the immediate. The additional forms are less common (they don't usually show up for bitfield insert/extract), but they're still relevant. I had to add a new target hook to prevent DAGCombine from reversing the transform. That isn't the only possible way to solve the conflict, but it seems straightforward enough. Differential Revision: https://reviews.llvm.org/D55630 llvm-svn: 349857
* [DAGCombiner] Fix a place that was creating a SIGN_EXTEND with an extra operand.Craig Topper2018-12-201-1/+1
| | | | llvm-svn: 349726
* [SelectionDAG] Optional handling of UNDEF elements in matchBinaryPredicate ↵Simon Pilgrim2018-12-191-4/+4
| | | | | | | | | | | | | | (part 2 of 2) Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs. This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument. I've updated the (or (and X, c1), c2) -> (and (or X, c2), c1|c2) fold to demonstrate its use, which I believe is safe for undef cases. Differential Revision: https://reviews.llvm.org/D55822 llvm-svn: 349629
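The (or (and X, c1), c2) -> (and (or X, c2), c1|c2) identity follows from distributing OR over AND. Because every operation is bitwise, checking the eight single-bit combinations is enough to cover all widths. The sketch below only checks the identity for defined values (names are hypothetical); it does not model undef lanes.

```cpp
#include <cassert>

// Pattern before the fold: (X & C1) | C2.
constexpr unsigned beforeFold(unsigned X, unsigned C1, unsigned C2) {
  return (X & C1) | C2;
}

// Pattern after the fold: (X | C2) & (C1 | C2).
constexpr unsigned afterFold(unsigned X, unsigned C1, unsigned C2) {
  return (X | C2) & (C1 | C2);
}

// Truth-table check over single bits; bitwise ops act on each bit
// independently, so this covers every bit position of wider values.
inline bool foldHoldsPerBit() {
  for (unsigned X = 0; X < 2; ++X)
    for (unsigned C1 = 0; C1 < 2; ++C1)
      for (unsigned C2 = 0; C2 < 2; ++C2)
        if (beforeFold(X, C1, C2) != afterFold(X, C1, C2))
          return false;
  return true;
}

static_assert(beforeFold(0xF0F0u, 0x0FF0u, 0x000Fu) == 0x00FFu, "before");
static_assert(afterFold(0xF0F0u, 0x0FF0u, 0x000Fu) == 0x00FFu, "after");
```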
* [TargetLowering] Fix propagation of undefs in zero extension ops (PR40091)Simon Pilgrim2018-12-191-4/+9
| | | | | | | | | | As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect, as the output should be guaranteed to at least have the new upper bits set to zero. SimplifyDemandedVectorElts is the worst offender (and it's the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue. Thanks to @dmgreen for catching this. Differential Revision: https://reviews.llvm.org/D55883 llvm-svn: 349625
* [DAGCombiner] allow hoisting vector bitwise logic ahead of truncatesSanjay Patel2018-12-161-5/+2
| | | | | | | | | | | | | | | | | | The transform performs a bitwise logic op in a wider type followed by truncate when both inputs are truncated from the same source type: logic_op (truncate x), (truncate y) --> truncate (logic_op x, y) There are a bunch of other checks that should prevent doing this when it might be harmful. We already do this transform for scalars in this spot. The vector limitation was shared with a check for the case when the operands are extended. I'm not sure if that limit is needed either, but that would be a separate patch. Differential Revision: https://reviews.llvm.org/D55448 llvm-svn: 349303
* [SelectionDAG] Add FSHL/FSHR support to computeKnownBitsSimon Pilgrim2018-12-161-2/+4
| | | | | | Also exposes an issue in DAGCombiner::visitFunnelShift where we were assuming the shift amount had the result type (after legalization it'll have the target's shift amount type). llvm-svn: 349298
* [DAGCombiner][X86] Prevent visitSIGN_EXTEND from returning N when (sext ↵Craig Topper2018-12-141-15/+18
| | | | | | | | | | | | | | | (setcc)) already has the target desired type for the setcc Summary: If the setcc already has the target desired type we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes the VsetCC to be CSEd to N0 and the getSExtOrTrunc will CSE to N. When we return N, the caller will think that meant we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node. To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled that entire function will be deleted anyway so hopefully this is only for the short term. Reviewers: RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55459 llvm-svn: 349137
* [DAGCombiner] clean up visitEXTRACT_VECTOR_ELTSanjay Patel2018-12-141-138/+129
| | | | | | | | | | | | | | | | | | This isn't quite NFC, but I don't know how to expose any outward diffs from these changes. Mostly, this was confusing because it used 'VT' to refer to the operand type rather than the usual type of the input node. There's also a large block at the end that is dedicated solely to matching loads, but that wasn't obvious. This could probably be split up into separate functions to make it easier to see. It's still not clear to me when we make certain transforms because the legality and constant conditions are intertwined in a way that might be improved. llvm-svn: 349095
* [DAGCombiner] after simplifying demanded elements of vector operand of ↵Sanjay Patel2018-12-131-1/+6
| | | | | | | | | | | extract, revisit the extract; 2nd try This is a retry of rL349051 (reverted at rL349056). I changed the check for dead-ness from number of uses to an opcode test for DELETED_NODE based on existing similar code. Differential Revision: https://reviews.llvm.org/D55655 llvm-svn: 349058
* revert rL349051: [DAGCombiner] after simplifying demanded elements of vector ↵Sanjay Patel2018-12-131-6/+1
| | | | | | | | | operand of extract, revisit the extract This causes an address sanitizer bot failure: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/27187/steps/check-llvm%20asan/logs/stdio llvm-svn: 349056
* [DAGCombiner] after simplifying demanded elements of vector operand of ↵Sanjay Patel2018-12-131-1/+6
| | | | | | | | extract, revisit the extract Differential Revision: https://reviews.llvm.org/D55655 llvm-svn: 349051
* [DAGCombine] Moved X86 rotate_amount % bitwidth == 0 early out to DAGCombinerSimon Pilgrim2018-12-131-0/+7
| | | | | | Remove common code from custom lowering (code is still safe if somehow a zero value gets used). llvm-svn: 349028
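The early out rests on the fact that rotating by a multiple of the bit width returns the input unchanged. A minimal 32-bit sketch (the `rotl32` helper is hypothetical, not the combiner's code):

```cpp
#include <cassert>
#include <cstdint>

// Rotate left with the amount taken modulo the bit width. The early return
// mirrors the "rotate_amount % bitwidth == 0" check; in this C++ model it
// also avoids an undefined shift by 32.
constexpr std::uint32_t rotl32(std::uint32_t X, unsigned Amt) {
  return (Amt % 32 == 0) ? X
                         : (X << (Amt % 32)) | (X >> (32 - Amt % 32));
}

static_assert(rotl32(0xDEADBEEFu, 0) == 0xDEADBEEFu, "rotate by 0");
static_assert(rotl32(0xDEADBEEFu, 64) == 0xDEADBEEFu, "rotate by 2 * width");
static_assert(rotl32(0x80000001u, 1) == 0x00000003u, "ordinary rotate");
```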
* [DAGCombiner] Remove unnecessary recursive ↵Simon Pilgrim2018-12-101-6/+0
| | | | | | | | DAGCombiner::visitINSERT_SUBVECTOR call. As discussed on D55511, this caused an issue if the inner node deletes a node that the outer node depends upon. As it doesn't affect any lit-tests and I've only been able to expose this with the D55511 change I'm committing this now. llvm-svn: 348781
* [DAGCombiner] Use the result value type in visitCONCAT_VECTORSFrancis Visoiu Mistrih2018-12-101-1/+1
| | | | | | | | | | | | | | | This triggers an assert when combining concat_vectors of a bitcast of merge_values. With asserts disabled, it fails to select: fatal error: error in backend: Cannot select: 0x7ff19d000e90: i32 = any_extend 0x7ff19d000ae8 0x7ff19d000ae8: f64,ch = CopyFromReg 0x7ff19d000c20:1, Register:f64 %1 0x7ff19d000b50: f64 = Register %1 In function: d Differential Revision: https://reviews.llvm.org/D55507 llvm-svn: 348759
* [DAGCombiner] re-enable truncation of binopsSanjay Patel2018-12-081-12/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is effectively re-committing the changes from: rL347917 (D54640) rL348195 (D55126) ...which were effectively reverted here: rL348604 ...because the code had a bug that could induce infinite looping or eventual out-of-memory compilation. The bug was that this code did not guard against transforming opaque constants. More details are in the post-commit mailing list thread for r347917. A reduced test for that is included in the x86 bool-math.ll file. (I wasn't able to reduce a PPC backend test for this, but it was almost the same pattern.) Original commit message for r347917: The motivating case for this is shown in: https://bugs.llvm.org/show_bug.cgi?id=32023 and the corresponding rot16.ll regression tests. Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc sequences that don't get folded in IR. As the TODO comments suggest, there will be regressions if we extend this (for x86, we mostly seem to be missing LEA opportunities, but there are likely vector folds missing too). I think those should be considered existing bugs because this is the same transform that we do as an IR canonicalization in instcombine. We just need more tests to make those visible independent of this patch. llvm-svn: 348706
* [DAGCombiner] split trunc from extend in hoistLogicOpWithSameOpcodeHands; NFCSanjay Patel2018-12-071-33/+48
| | | | | | | This duplicates several shared checks, but we need to split this up to fix underlying bugs in smaller steps. llvm-svn: 348627
* [DAGCombiner] disable truncation of binops by defaultSanjay Patel2018-12-071-1/+7
| | | | | | | | | | As discussed in the post-commit thread of r347917, this transform is fighting with an existing transform causing an infinite loop or out-of-memory, so this is effectively reverting r347917 and its follow-up r348195 while we investigate the bug. llvm-svn: 348604
* [DAGCombiner] remove explicit calls to AddToWorkList; NFCISanjay Patel2018-12-071-6/+0
| | | | | | | | As noted in the post-commit thread for rL347917: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608936.html ...we don't need to repeat these calls because the combiner does it automatically. llvm-svn: 348597
* [DAGCombiner] use root SDLoc for all nodes created by logic foldSanjay Patel2018-12-071-1/+1
| | | | | | | | | | | If this is not a valid way to assign an SDLoc, then we get this wrong all over SDAG. I don't know enough about the SDAG to explain this. IIUC, theoretically, debug info is not supposed to affect codegen. But here it has clearly affected 3 different targets, and the x86 change is an actual improvement. llvm-svn: 348552
* [DAGCombiner] don't bother saving a SDLoc for a node that's dead; NFCISanjay Patel2018-12-061-1/+1
| | | | | | | | | | | | | | We shouldn't care about the debug location for a node that we're creating, but attaching the root of the pattern should be the best effort. (If this is not true, then we are doing it wrong all over the SDAG). This is no-functional-change-intended, and there are no regression test diffs...and that's what I expected. But there's a similar line above this diff, where those assumptions apparently do not hold. llvm-svn: 348550
* [DAGCombiner] more clean up in hoistLogicOpWithSameOpcodeHands(); NFCSanjay Patel2018-12-061-41/+34
| | | | | | This code can still misbehave. llvm-svn: 348547
* [DAGCombiner] don't group bswap with casts in logic hoisting foldSanjay Patel2018-12-061-6/+15
| | | | | | | | | | | | | | | | This was probably organized as it was because bswap is a unary op. But that's where the similarity to the other opcodes ends. We should not limit this transform to scalars, and we should not try it if either input has other uses. This is another step towards trying to clean this whole function up to prevent it from causing infinite loops and memory explosions. Earlier commits in this series: rL348501 rL348508 rL348518 llvm-svn: 348534
* [DAGCombiner] reduce indent; NFCSanjay Patel2018-12-061-38/+31
| | | | | | | | Unlike some of the folds in hoistLogicOpWithSameOpcodeHands() above this shuffle transform, this has the expected hasOneUse() checks in place. llvm-svn: 348523