summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* [DAGCombine] Improve Load-Store ForwardingNirav Dave2018-10-101-11/+134
| | | | | | | | | | | | | | | | | | Summary: Extend analysis forwarding loads from preceeding stores to work with extended loads and truncated stores to the same address so long as the load is fully subsumed by the store. Hexagon's swp-epilog-phis.ll and swp-memrefs-epilog1.ll test are deleted as they've no longer seem to be relevant. Reviewers: RKSimon, rnk, kparzysz, javed.absar Subscribers: sdardis, nemanjai, hiraditya, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D49200 llvm-svn: 344142
* [DAGCombiner] Expand combining of FP logical ops to sign-setting FP opsNemanja Ivanovic2018-10-091-3/+14
| | | | | | | | | | | | | | | | | We already do the following combines: (bitcast int (and (bitcast fp X to int), 0x7fff...) to fp) -> fabs X (bitcast int (xor (bitcast fp X to int), 0x8000...) to fp) -> fneg X When the target has "bit preserving fp logic". This patch just extends it to also combine: (bitcast int (or (bitcast fp X to int), 0x8000...) to fp) -> fneg (fabs X) As some targets have fnabs and even those that don't can efficiently lower both the fabs and the fneg. Differential revision: https://reviews.llvm.org/D44548 llvm-svn: 344093
* [DAGCombiner] simplify code for fmul with constant fold; NFCISanjay Patel2018-10-081-18/+8
| | | | llvm-svn: 343997
* [DAGCombiner] allow undef elts in vector fadd matchingSanjay Patel2018-10-071-1/+1
| | | | llvm-svn: 343945
* [DAGCombiner] allow undefs when matching vector splats for fmul foldsSanjay Patel2018-10-071-2/+2
| | | | llvm-svn: 343942
* [DAGCombiner] allow undef elts in vector fabs/fneg matchingSanjay Patel2018-10-071-1/+1
| | | | | | | | This change is proposed as a part of D44548, but we need this independently to avoid regressions from improved undef propagation in SimplifyDemandedVectorElts(). llvm-svn: 343940
* [DAGCombiner] shorten code for bitcast+fabs fold; NFCSanjay Patel2018-10-071-5/+2
| | | | llvm-svn: 343939
* [SelectionDAG] allow undefs when matching splat constantsSanjay Patel2018-10-051-4/+5
| | | | | | | | | And use that to transform fsub with zero constant operands. The integer part isn't used yet, but it is proposed for use in D44548, so adding both enhancements here makes that patch simpler. llvm-svn: 343865
* DAGCombiner: StoreMerging: Fix bad index calculating when adjusting ↵Matthias Braun2018-10-011-17/+8
| | | | | | | | | | | | | | | | mismatching vector types This fixes a case of bad index calculation when merging mismatching vector types. This changes the existing code to just use the existing extract_{subvector|element} and a bitcast (instead of bitcast first and then newly created extract_xxx) so we don't need to adjust any indices in the first place. rdar://44584718 Differential Revision: https://reviews.llvm.org/D52681 llvm-svn: 343493
* [DAG] Don't perform SINT_TO_FP<->UINT_TO_FP custom conversion after legalizationSimon Pilgrim2018-09-301-4/+4
| | | | | | | | The SINT_TO_FP<->UINT_TO_FP combines for non-negative integers should only occur for legal ops once LegalOperations = true No test case to hand, noticed when investigating PR38226 + PR38970 llvm-svn: 343405
* [DAGCombiner] [NFC] Improve X div/rem 1 foldDavid Bolvansky2018-09-281-8/+5
| | | | | | | | | | | | Reviewers: spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52661 llvm-svn: 343349
* llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...)Fangrui Song2018-09-271-6/+4
| | | | | | | | | | | | Summary: The convenience wrapper in STLExtras is available since rL342102. Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D52573 llvm-svn: 343163
* [DAGCombiner] Remove unnecessary check for visitSDIVLike/visitUDIVLike ↵Craig Topper2018-09-251-2/+1
| | | | | | | | returning a UDIVREM or SDIVREM node. This shouldn't be possible and is a leftover from when we used to recursively call combine here. llvm-svn: 343049
* [DAGCombine] Improve Predecessor check in SimplifySelectOps. NFCI.Nirav Dave2018-09-251-4/+36
| | | | | | | Reuse search space bookkeeping across multiple predecessor checks qdone to avoid redundancy. This should cut search cost by ~4x. llvm-svn: 342984
* [DAGCombine] Share predecessor bookkeeping in CombineToPostIndexedLoadStore. ↵Nirav Dave2018-09-251-2/+9
| | | | | | NFCI. llvm-svn: 342983
* [DAGCombine] Don't fold dependent loads across SELECT_CC.Nirav Dave2018-09-251-4/+5
| | | | | | | | | | | | | | | | | | DAGCombine will try to fold two loads that feed a SELECT or SELECT_CC after the select, resulting in a select of an address and a single load after. If either of the loads depend on the other, this is not legal as it could introduce cycles. However, it only checked this if the opcode was a SELECT, and not for a SELECT_CC. Unfortunately, the only reproducer I have for this is for our downstream target. I've tried getting it to trigger on an upstream one but haven't been successful. Patch thanks to Bevin Hansson. llvm-svn: 342980
* [DAGCombiner] use UADDO to optimize saturated unsigned addSanjay Patel2018-09-241-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a preliminary step towards solving PR14613: https://bugs.llvm.org/show_bug.cgi?id=14613 If we have an 'add' instruction that sets flags, we can use that to eliminate an explicit compare instruction or some other instruction (cmn) that sets flags for use in the later select. As shown in the unchanged tests that use 'icmp ugt %x, %a', we're effectively reversing an IR icmp canonicalization that replaces a variable operand with a constant: https://rise4fun.com/Alive/V1Q But we're not using 'uaddo' in those cases via DAG transforms. This happens in CGP after D8889 without checking target lowering to see if the op is supported. So AArch already shows 'uaddo' codegen for the i8/i16/i32/i64 test variants with "using_cmp_sum" in the title. That's the pattern that CGP matches as an unsigned saturated add and converts to uaddo without checking target capabilities. This patch is gated by isOperationLegalOrCustom(ISD::UADDO, VT), so we see only see AArch diffs for i32/i64 in the tests with "using_cmp_notval" in the title (unlike x86 which sees improvements for all sizes because all sizes are 'custom'). But the AArch code (like x86) looks better when translated to 'uaddo' in all cases. So someone that is involved with AArch may want to set i8/i16 to 'custom' for UADDO, so this patch will fire on those tests. Another possibility given the existing behavior: we could remove the legal-or-custom check altogether because we're assuming that a UADDO sequence is canonical/optimal before we ever reach here. But that seems like a bug to me. If the target doesn't have an add-with-flags op, then it's not likely that we'll get optimal DAG combining using a UADDO node. This is similar justification for why we don't canonicalize IR to the overflow math intrinsic sibling (llvm.uadd.with.overflow) for UADDO in the first place. Differential Revision: https://reviews.llvm.org/D51929 llvm-svn: 342886
* Remove debug printf leftover from r342397Hans Wennborg2018-09-241-2/+0
| | | | llvm-svn: 342863
* [DAGCombiner] Remove some dead code from ConstantFoldBITCASTofBUILD_VECTORCraig Topper2018-09-241-9/+2
| | | | | | This code handled SCALAR_TO_VECTOR being returned by the recursion, but the code that used to return SCALAR_TO_VECTOR was removed in 2015. llvm-svn: 342856
* [DAGCombiner] Clarify a comment. NFCCraig Topper2018-09-231-2/+4
| | | | | | This comment was misleading about why we were restricting to before legalize types. The reason given would only apply to before legalize ops. But there is a before legalize types reason that should also be listed. llvm-svn: 342851
* [DAGCombiner][x86] extend decompose of integer multiply into shift/add with ↵Sanjay Patel2018-09-231-6/+13
| | | | | | | | | | | | | | | | | negation This is an alternative to https://reviews.llvm.org/D37896. We can't decompose multiplies generically without a target hook to tell us when it's profitable. ARM and AArch64 may be able to remove some existing code that overlaps with this transform. This extends D52195 and may resolve PR34474: https://bugs.llvm.org/show_bug.cgi?id=34474 (still an open question about transforming legal vector multiplies, but we could open another bug report for those) llvm-svn: 342844
* [DAGCombiner] Simplify some code in visitBITCAST. NFCICraig Topper2018-09-221-9/+3
| | | | llvm-svn: 342826
* [DAGCombiner] Rewrite r331896 in a different way to address a FIXME. NFCICraig Topper2018-09-221-11/+14
| | | | llvm-svn: 342809
* [SelectionDAG] replace duplicated peekThroughBitcast helper functions; NFCISanjay Patel2018-09-201-38/+19
| | | | | | | | | | | | | | x86 had 2 versions of peekThroughBitcast. DAGCombiner had 1. Plus, it had a 1-off implementation for the one-use variant. Move the x86 versions of the code to SelectionDAG, so we don't have different copies of the code. No functional change intended. I'm putting this next to isBitwiseNot() because I am planning to use it in there. Another option is next to the helpers in the ISD namespace (eg, ISD::isConstantSplatVector()). But if there's no good reason for those to be there, I'd prefer to pull other helpers over to SelectionDAG in follow-up steps. Differential Revision: https://reviews.llvm.org/D52285 llvm-svn: 342669
* [SelectionDAG] allow vector types with isBitwiseNot()Sanjay Patel2018-09-191-3/+1
| | | | | | | The test diff in not-and-simplify.ll is from a use in SimplifyDemandedBits, and the test diff in add.ll is from a DAGCombiner transform. llvm-svn: 342594
* [DAGCombiner][x86] add transform/hook to decompose integer multiply into ↵Sanjay Patel2018-09-191-0/+26
| | | | | | | | | | | | | | | | | | | | | shift/add This is an alternative to D37896. I don't see a way to decompose multiplies generically without a target hook to tell us when it's profitable. ARM and AArch64 may be able to remove some duplicate code that overlaps with this transform. As a first step, we're only getting the most clear wins on the vector examples requested in PR34474: https://bugs.llvm.org/show_bug.cgi?id=34474 As noted in the code comment, it's likely that the x86 constraints are tighter than necessary, but it may not always be a win to replace a pmullw/pmulld. Differential Revision: https://reviews.llvm.org/D52195 llvm-svn: 342554
* Revert "Revert r342183 "[DAGCombine] Fix crash when store merging created an ↵Amara Emerson2018-09-171-1/+10
| | | | | | | | extract_subvector with invalid index."" Fixed the assertion failure. llvm-svn: 342397
* [DAGCombiner] try to convert pow(x, 1/3) to cbrt(x)Sanjay Patel2018-09-161-1/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | This is a follow-up suggested in D51630 and originally proposed as an IR transform in D49040. Copying the motivational statement by @evandro from that patch: "This transformation helps some benchmarks in SPEC CPU2000 and CPU2006, such as 188.ammp, 447.dealII, 453.povray, and especially 300.twolf, as well as some proprietary benchmarks. Otherwise, no regressions on x86-64 or A64." I'm proposing to add only the minimum support for a DAG node here. Since we don't have an LLVM IR intrinsic for cbrt, and there are no other DAG ways to create a FCBRT node yet, I don't think we need to worry about DAG builder, legalization, a strict variant, etc. We should be able to expand as needed when adding more functionality/transforms. For reference, these are transform suggestions currently listed in SimplifyLibCalls.cpp: // * cbrt(expN(X)) -> expN(x/3) // * cbrt(sqrt(x)) -> pow(x,1/6) // * cbrt(cbrt(x)) -> pow(x,1/9) Also, given that we bail out on long double for now, there should not be any logical differences between platforms (unless there's some platform out there that has pow() but not cbrt()). Differential Revision: https://reviews.llvm.org/D51753 llvm-svn: 342348
* Revert r342183 "[DAGCombine] Fix crash when store merging created an ↵Reid Kleckner2018-09-141-8/+1
| | | | | | | | | extract_subvector with invalid index." Causes 'isVector() && "Invalid vector type!"' assertion when building Skia in Chrome. llvm-svn: 342265
* [DAGCombine] Fix crash when store merging created an extract_subvector with ↵Amara Emerson2018-09-131-1/+8
| | | | | | | | invalid index. Differential Revision: https://reviews.llvm.org/D51831 llvm-svn: 342183
* [DAGCombiner] improve formatting for select+setcc code; NFCSanjay Patel2018-09-121-17/+15
| | | | llvm-svn: 342095
* [DAGCombiner] foldBitcastedFPLogic - Add basic vector supportSimon Pilgrim2018-09-071-8/+8
| | | | | | | | Add support for bitcasts from float type to an integer type of the same element bitwidth. There maybe cases where we need to support different widths (e.g. as SSE __m128i is treated as v2i64) - but I haven't seen cases of this in the wild yet. llvm-svn: 341652
* [DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x))Sanjay Patel2018-09-051-0/+41
| | | | | | | | | | | | | | | | | | | | | This was proposed as an IR transform in D49306, but it was not clearly justifiable as a canonicalization. Here, we only do the transform when the target tells us that sqrt can be lowered with inline code. This is the basic case. Some potential enhancements are in the TODO comments: 1. Generalize the transform for other exponents (allow more than 2 sqrt calcs if that's really cheaper). 2. If we have less fast-math-flags, generate code to avoid -0.0 and/or INF. 3. Allow the transform when optimizing/minimizing size (might require a target hook to get that right). Note that by default, x86 converts single-precision sqrt calcs into sqrt reciprocal estimate with refinement. That codegen is controlled by CPU attributes and can be manually overridden. We have plenty of test coverage for that already, so I didn't bother to include extra testing for that here. AArch uses its full-precision ops in all cases (not sure if that's the intended behavior or not, but that should also be covered by existing tests). Differential Revision: https://reviews.llvm.org/D51630 llvm-svn: 341481
* [DAGCombiner] Fix bad identation. NFCCraig Topper2018-08-301-1/+1
| | | | llvm-svn: 341103
* [DAGCombiner] Add X / X -> 1 & X % X -> 0 foldsSimon Pilgrim2018-08-291-1/+18
| | | | | | | | Adds more divrem folds to try and get in sync with InstructionSimplify Differential Revision: https://reviews.llvm.org/D50636 llvm-svn: 340919
* [DAGCombine] Rework MERGE_VALUES to inline in single pass. NFCI.Nirav Dave2018-08-281-1/+4
| | | | | | | Avoid hyperlinear cost of inlining MERGE_VALUE node by constructing temporary vector and doing a single replacement. llvm-svn: 340853
* [DAGCombiner][AMDGPU][Mips] Fold bitcast with volatile loads if the ↵Craig Topper2018-08-281-3/+12
| | | | | | | | | | | | | | | | | | | resulting load is legal for the target. Summary: I'm not sure if this patch is correct or if it needs more qualifying somehow. Bitcast shouldn't change the size of the load so it should be ok? We already do something similar for stores. We'll change the type of a volatile store if the resulting store is Legal or Custom. I'm not sure we should be allowing Custom there... I was playing around with converting X86 atomic loads/stores(except seq_cst) into regular volatile loads and stores during lowering. This would allow some special RMW isel patterns in X86InstrCompiler.td to be removed. But there's some floating point patterns in there that didn't work because we don't fold (f64 (bitconvert (i64 volatile load))) or (f32 (bitconvert (i32 volatile load))). Reviewers: efriedma, atanasyan, arsenm Reviewed By: efriedma Subscribers: jvesely, arsenm, sdardis, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, arichardson, jrtc27, atanasyan, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D50491 llvm-svn: 340797
* DAG: Check transformed type for forming fminnum/fmaxnum from vselectMatt Arsenault2018-08-271-2/+3
| | | | | | Follow up to r340655 to fix vector types which are split. llvm-svn: 340766
* [SelectionDAG] add helper query for binops; NFCSanjay Patel2018-08-271-11/+2
| | | | | | We will also use this in a planned enhancement for vector insertelement. llvm-svn: 340741
* [SelectionDAG][x86] turn insertelement into undef with variable index into splatSanjay Patel2018-08-261-3/+10
| | | | | | | | | | | | | | | | | | I noticed this along with the patterns in D51125, but when the index is variable, we don't convert insertelement into a build_vector. For x86, that means these get expanded at legalization time into the loading/spilling code that we see in the tests. I think it's always better to avoid going to memory on these, and we get the optimal 'broadcast' if it's available. I suspect other targets may want to look at enabling the hook. AArch64 and AMDGPU have regression tests that would be affected (although I did not check what would happen in those cases). In the most basic cases shown here, AArch64 would probably do much better with a splat. Differential Revision: https://reviews.llvm.org/D51186 llvm-svn: 340705
* DAG: Allow matching fminnum/fmaxnum from vselectMatt Arsenault2018-08-241-8/+27
| | | | llvm-svn: 340655
* [DAGCombiner][Mips] Don't combine bitcast+store after LegalOperations when ↵Craig Topper2018-08-241-1/+1
| | | | | | | | | | | | | | the store is volatile, if the resulting store isn't Legal Previously we allowed the store to be Custom. But without knowing for sure that the Custom handling won't split the store, we shouldn't convert a volatile store. We also probably shouldn't be creating a store the requires custom handling after LegalizeOps. This could lead to an infinite loop if the custom handling was to insert a bitcast. Though I guess isStoreBitCastBeneficial could be used to block such a loop. The test changes here are due to the volatile part of this. The stores in the test are all volatile and i32 stores are marked custom, So we are no longer converting them This is related to D50491 where I was trying to allow some bitcasting of volatile loads Differential Revision: https://reviews.llvm.org/D50578 llvm-svn: 340626
* [DAGCombiner] Reduce load widths of shifted masksSam Parker2018-08-211-8/+41
| | | | | | | | | | | During combining, ReduceLoadWdith is used to combine AND nodes that mask loads into narrow loads. This patch allows the mask to be a shifted constant. This results in a narrow load which is then left shifted to compensate for the new offset. Differential Revision: https://reviews.llvm.org/D50432 llvm-svn: 340261
* [DAGCombiner] Allow divide by constant optimization on opaque constants.Craig Topper2018-08-181-2/+2
| | | | | | | | | | | | | | | | | Summary: I believe this restores the behavior we had before r339147. Fixes PR38622. Reviewers: RKSimon, chandlerc, spatel Reviewed By: chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50936 llvm-svn: 340120
* [DAGCombiner] extractShiftForRotate - fix out of range shift issueSimon Pilgrim2018-08-171-2/+2
| | | | | | | | | Don't just check for negative shift amounts. Fixes OSS Fuzz #9935 https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9935 llvm-svn: 340015
* [DAGCombine] Improve (sra (sra x, c1), c2) -> (sra x, (add c1, c2)) foldingSimon Pilgrim2018-08-171-17/+16
| | | | | | | | Add support for cases where only some c1+c2 results exceed the max bitshift, clamping accordingly. Differential Revision: https://reviews.llvm.org/D35722 llvm-svn: 340010
* [DAGCombiner] Don't reassociate operations that have the vector reduction ↵Craig Topper2018-08-161-9/+13
| | | | | | | | | | | | | | flag set. When nodes are reassociated the vector-reduction flag gets lost. The test case is here is what would happen if you had a sum of absolute differences loop that started with a non-zero but contant sum and that loop was unrolled. The vectorizer will generate a constant vector for the initial value. And DAGCombiner reassociate tries to move it down the addition tree erasing the vector-reduction flag. Interestingly this moves constants the opposite direction of the reassociate IR pass. I've chosen to just punt on the reassociate, but I suppose we could maybe preserve the flag if both nodes have it set. Differential Revision: https://reviews.llvm.org/D50827 llvm-svn: 339946
* [DagCombiner] Don't bother adding to the work list if TLI.BuildSDIVPow2 ↵Simon Pilgrim2018-08-151-4/+6
| | | | | | | | failed. NFCI. Matches the code in BuildSDIV/BuildUDIV llvm-svn: 339757
* [ARM] Make PerformSHLSimplify add nodes to the DAG worklist correctly.Eli Friedman2018-08-141-2/+3
| | | | | | | | | | | | | | | | | | | | | Intentionally excluding nodes from the DAGCombine worklist is likely to lead to weird optimizations and infinite loops, so it's generally a bad idea. To avoid the infinite loops, fix DAGCombine to use the isDesirableToCommuteWithShift target hook before performing the transforms in question, and implement the target hook in the ARM backend disable the transforms in question. Fixes https://bugs.llvm.org/show_bug.cgi?id=38530 . (I don't have a reduced testcase for that bug. But we should have sufficient test coverage for PerformSHLSimplify given that we're not playing weird tricks with the worklist. I can try to bugpoint it if necessary, though.) Differential Revision: https://reviews.llvm.org/D50667 llvm-svn: 339734
* [DAG] Avoid redundant chain transversal in store merge cycle check. NFCI.Nirav Dave2018-08-141-1/+2
| | | | | | Patch by Henric Karlsson. llvm-svn: 339688
OpenPOWER on IntegriCloud