summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [DAGCombiner] make sure we have a whole-number extract before trying to ↵Sanjay Patel2018-11-011-1/+5
| | | | | | | | | | | | narrow a vector op (PR39511) The test causes a crash because we were trying to extract v4f32 to v3f32, and the narrowing factor was then 4/3 = 1 producing a bogus narrow type. This should fix: https://bugs.llvm.org/show_bug.cgi?id=39511 llvm-svn: 345842
* [DAGCombiner] Fold 0 div/rem X to 0David Bolvansky2018-10-311-2/+5
| | | | | | | | | | | | Reviewers: RKSimon, spatel, javed.absar, craig.topper, t.p.northover Reviewed By: RKSimon Subscribers: craig.topper, llvm-commits Differential Revision: https://reviews.llvm.org/D52504 llvm-svn: 345721
* [DAGCombiner] Fix for big endian in ForwardStoreValueToDirectLoadBjorn Pettersson2018-10-301-9/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Normalize the offset for endianess before checking if the store cover the load in ForwardStoreValueToDirectLoad. Without this we missed out on some optimizations for big endian targets. If for example having a 4 bytes store followed by a 1 byte load, loading the least significant byte from the store, the STCoversLD check would fail (see @test4 in test/CodeGen/AArch64/load-store-forwarding.ll). This patch also fixes a problem seen in an out-of-tree target. The target has i40 as a legal type, it is big endian, and the StoreSize for i40 is 48 bits. So when normalizing the offset for endianess we need to take the StoreSize into account (assuming that padding added when storing into a larger StoreSize always is added at the most significant end). Reviewers: niravd Reviewed By: niravd Subscribers: javed.absar, kristof.beyls, llvm-commits, uabelho Differential Revision: https://reviews.llvm.org/D53776 llvm-svn: 345636
* [DAGCombiner] narrow vector binops when extraction is cheapSanjay Patel2018-10-301-11/+30
| | | | | | | | | | | | | | | | | Narrowing vector binops came up in the demanded bits discussion in D52912. I don't think we're going to be able to do this transform in IR as a canonicalization because of the risk of creating unsupported widths for vector ops, but we already have a DAG TLI hook to allow what I was hoping for: isExtractSubvectorCheap(). This is currently enabled for x86, ARM, and AArch64 (although only x86 has existing regression test diffs). This is artificially limited to not look through bitcasts because there are so many test diffs already, but that's marked with a TODO and is a small follow-up. Differential Revision: https://reviews.llvm.org/D53784 llvm-svn: 345602
* [DAGCombiner] Improve X div/rem Y fold if single bit element typeDavid Bolvansky2018-10-301-3/+4
| | | | | | | | | | | | | | Summary: Tests by @spatel, thanks Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: sdardis, atanasyan, llvm-commits, spatel Differential Revision: https://reviews.llvm.org/D52668 llvm-svn: 345575
* [DAGCombiner] Better constant vector support for FCOPYSIGN.Craig Topper2018-10-281-4/+4
| | | | | | | | Enable constant folding when both operands are vectors of constants. Turn into FNEG/FABS when the RHS is a splat constant vector. llvm-svn: 345469
* [DAGCombiner] rearrange code in narrowExtractedVectorBinOp(); NFCSanjay Patel2018-10-261-22/+24
| | | | | | | We can extend this code to handle many more cases if an extract is cheap, so prepping for that change. llvm-svn: 345430
* [NFC] Rename minnan and maxnan to minimum and maximumThomas Lively2018-10-241-6/+6
| | | | | | | | | | | | | | | Summary: Changes all uses of minnan/maxnan to minimum/maximum globally. These names emphasize that the semantic difference between these operations is more than just NaN-propagation. Reviewers: arsenm, aheejin, dschuff, javed.absar Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D53112 llvm-svn: 345218
* [SelectionDAG] DAG combiner for fminnan and fmaxnanThomas Lively2018-10-241-20/+20
| | | | | | | | | | | | Summary: Depends on D52765. Reviewers: aheejin, dschuff Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52768 llvm-svn: 345210
* [DAG] check more operands for cycles when merging stores.Tim Northover2018-10-241-8/+8
| | | | | | | | | | | | | | | | | Until now, we've only checked whether merging stores would cause a cycle via the value argument, but the address and indexed offset arguments are also capable of creating cycles in some situations. The addresses are all base+offset with notionally the same base, but the base SDNode may still be different (e.g. via an indexed load in one case, and an ISD::ADD elsewhere). This allows cycles to creep in if one of these sources depends on another. The indexed offset is usually undef (representing a non-indexed store), but on some architectures (e.g. 32-bit ARM-mode ARM) it can be an arbitrary value, again allowing dependency cycles to creep in. llvm-svn: 345200
* SelectionDAG: Reuse bigger sized constants in memset expansion.Matthias Braun2018-10-231-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | When implementing memset's today we often see this pattern: $x0 = MOV 0xXYXYXYXYXYXYXYXY store $x0, ... $w1 = MOV 0xXYXYXYXY store $w1, ... We first create a 64bit constant in a 64bit register with all bytes the same and then create a 32bit constant with all bytes the same in a 32bit register. In many targets we could just access the lower byte of the 64bit register instead. - Ideally this would be handled by the ConstantHoist pass but it runs too early when memset isn't expanded yet. - The memset expansion code already had this optimization implemented, however SelectionDAG constantfolding would constantfold the "trunc(bigconstnat)" pattern to "smallconstant". - This patch makes the memset expansion mark the constant as Opaque and stop DAGCombiner from constant folding in this situation. (Similar to how ConstantHoisting marks things as Opaque to avoid folding ADD/SUB/etc.) Differential Revision: https://reviews.llvm.org/D53181 llvm-svn: 345102
* Recommit r344877 "[X86] Stop promoting integer loads to vXi64"Craig Topper2018-10-221-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | I've included a fix to DAGCombiner::ForwardStoreValueToDirectLoad that I believe will prevent the previous miscompile. Original commit message: Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to rem I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping. I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the lo I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D53306 llvm-svn: 344965
* DAG: Change behavior of fminnum/fmaxnum nodesMatt Arsenault2018-10-221-0/+11
| | | | | | | | | | | Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs. There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select which should be fixed in a future commit. llvm-svn: 344914
* [DAGCombiner] reduce insert+bitcast+extract vector ops to truncate (PR39016)Sanjay Patel2018-10-211-4/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | This is a late backend subset of the IR transform added with: D52439 We can confirm that the conversion to a 'trunc' is correct by running: $ opt -instcombine -data-layout="e" (assuming the IR transforms are correct; change "e" to "E" for big-endian) As discussed in PR39016: https://bugs.llvm.org/show_bug.cgi?id=39016 ...the pattern may emerge during legalization, so that's we are waiting for an insertelement to become a scalar_to_vector in the pattern matching here. The DAG allows for fun variations that are not possible in IR. Result types for extracts and scalar_to_vector don't necessarily match input types, so that means we have to be a bit more careful in the transform (see code comments). The tests show that we don't handle cases that require a shift (as we did in the IR version). I've left that as a potential follow-up because I'm not sure if that's a real concern at this late stage. Differential Revision: https://reviews.llvm.org/D53201 llvm-svn: 344872
* [DAGCombiner] allow undef elts in vector fmul matchingSanjay Patel2018-10-151-1/+1
| | | | llvm-svn: 344534
* [DAGCombiner] refactor folds for fadd (fmul X, -2.0), Y; NFCISanjay Patel2018-10-151-16/+18
| | | | | | The transform doesn't work if the vector constant has undef elements. llvm-svn: 344532
* [DAGCombiner] allow undef elts in vector fma matchingSanjay Patel2018-10-151-21/+22
| | | | llvm-svn: 344528
* [DAGCombiner] allow undef elts in vector fma matchingSanjay Patel2018-10-151-9/+10
| | | | llvm-svn: 344525
* [DAGCombiner] rearrange extract_element+bitcast fold; NFCSanjay Patel2018-10-111-6/+8
| | | | | | | | | | I want to add another pattern here that includes scalar_to_vector, so this makes that patch smaller. I was hoping to remove the hasOneUse() check because it shouldn't be necessary for common codegen, but an AMDGPU test has a comment suggesting that the extra check makes things better on one of those targets. llvm-svn: 344320
* [DAG] Fix Big Endian in Load-Store forwardingNirav Dave2018-10-111-0/+5
| | | | | | | | | | | | | | Summary: Correct offset calculation in load-store forwarding for big-endian targets. Reviewers: rnk, RKSimon, waltl Subscribers: sdardis, nemanjai, hiraditya, jrtc27, atanasyan, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D53147 llvm-svn: 344272
* [DAGCombiner] move comment closer to the corresponding code; NFC Sanjay Patel2018-10-111-2/+1
| | | | llvm-svn: 344255
* [DAGCombine] Improve Load-Store ForwardingNirav Dave2018-10-101-11/+134
| | | | | | | | | | | | | | | | | | Summary: Extend analysis forwarding loads from preceeding stores to work with extended loads and truncated stores to the same address so long as the load is fully subsumed by the store. Hexagon's swp-epilog-phis.ll and swp-memrefs-epilog1.ll test are deleted as they've no longer seem to be relevant. Reviewers: RKSimon, rnk, kparzysz, javed.absar Subscribers: sdardis, nemanjai, hiraditya, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D49200 llvm-svn: 344142
* [DAGCombiner] Expand combining of FP logical ops to sign-setting FP opsNemanja Ivanovic2018-10-091-3/+14
| | | | | | | | | | | | | | | | | We already do the following combines: (bitcast int (and (bitcast fp X to int), 0x7fff...) to fp) -> fabs X (bitcast int (xor (bitcast fp X to int), 0x8000...) to fp) -> fneg X When the target has "bit preserving fp logic". This patch just extends it to also combine: (bitcast int (or (bitcast fp X to int), 0x8000...) to fp) -> fneg (fabs X) As some targets have fnabs and even those that don't can efficiently lower both the fabs and the fneg. Differential revision: https://reviews.llvm.org/D44548 llvm-svn: 344093
* [DAGCombiner] simplify code for fmul with constant fold; NFCISanjay Patel2018-10-081-18/+8
| | | | llvm-svn: 343997
* [DAGCombiner] allow undef elts in vector fadd matchingSanjay Patel2018-10-071-1/+1
| | | | llvm-svn: 343945
* [DAGCombiner] allow undefs when matching vector splats for fmul foldsSanjay Patel2018-10-071-2/+2
| | | | llvm-svn: 343942
* [DAGCombiner] allow undef elts in vector fabs/fneg matchingSanjay Patel2018-10-071-1/+1
| | | | | | | | This change is proposed as a part of D44548, but we need this independently to avoid regressions from improved undef propagation in SimplifyDemandedVectorElts(). llvm-svn: 343940
* [DAGCombiner] shorten code for bitcast+fabs fold; NFCSanjay Patel2018-10-071-5/+2
| | | | llvm-svn: 343939
* [SelectionDAG] allow undefs when matching splat constantsSanjay Patel2018-10-051-4/+5
| | | | | | | | | And use that to transform fsub with zero constant operands. The integer part isn't used yet, but it is proposed for use in D44548, so adding both enhancements here makes that patch simpler. llvm-svn: 343865
* DAGCombiner: StoreMerging: Fix bad index calculating when adjusting ↵Matthias Braun2018-10-011-17/+8
| | | | | | | | | | | | | | | | mismatching vector types This fixes a case of bad index calculation when merging mismatching vector types. This changes the existing code to just use the existing extract_{subvector|element} and a bitcast (instead of bitcast first and then newly created extract_xxx) so we don't need to adjust any indices in the first place. rdar://44584718 Differential Revision: https://reviews.llvm.org/D52681 llvm-svn: 343493
* [DAG] Don't perform SINT_TO_FP<->UINT_TO_FP custom conversion after legalizationSimon Pilgrim2018-09-301-4/+4
| | | | | | | | The SINT_TO_FP<->UINT_TO_FP combines for non-negative integers should only occur for legal ops once LegalOperations = true No test case to hand, noticed when investigating PR38226 + PR38970 llvm-svn: 343405
* [DAGCombiner] [NFC] Improve X div/rem 1 foldDavid Bolvansky2018-09-281-8/+5
| | | | | | | | | | | | Reviewers: spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52661 llvm-svn: 343349
* llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...)Fangrui Song2018-09-271-6/+4
| | | | | | | | | | | | Summary: The convenience wrapper in STLExtras is available since rL342102. Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D52573 llvm-svn: 343163
* [DAGCombiner] Remove unnecessary check for visitSDIVLike/visitUDIVLike ↵Craig Topper2018-09-251-2/+1
| | | | | | | | returning a UDIVREM or SDIVREM node. This shouldn't be possible and is a leftover from when we used to recursively call combine here. llvm-svn: 343049
* [DAGCombine] Improve Predecessor check in SimplifySelectOps. NFCI.Nirav Dave2018-09-251-4/+36
| | | | | | | Reuse search space bookkeeping across multiple predecessor checks qdone to avoid redundancy. This should cut search cost by ~4x. llvm-svn: 342984
* [DAGCombine] Share predecessor bookkeeping in CombineToPostIndexedLoadStore. ↵Nirav Dave2018-09-251-2/+9
| | | | | | NFCI. llvm-svn: 342983
* [DAGCombine] Don't fold dependent loads across SELECT_CC.Nirav Dave2018-09-251-4/+5
| | | | | | | | | | | | | | | | | | DAGCombine will try to fold two loads that feed a SELECT or SELECT_CC after the select, resulting in a select of an address and a single load after. If either of the loads depend on the other, this is not legal as it could introduce cycles. However, it only checked this if the opcode was a SELECT, and not for a SELECT_CC. Unfortunately, the only reproducer I have for this is for our downstream target. I've tried getting it to trigger on an upstream one but haven't been successful. Patch thanks to Bevin Hansson. llvm-svn: 342980
* [DAGCombiner] use UADDO to optimize saturated unsigned addSanjay Patel2018-09-241-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a preliminary step towards solving PR14613: https://bugs.llvm.org/show_bug.cgi?id=14613 If we have an 'add' instruction that sets flags, we can use that to eliminate an explicit compare instruction or some other instruction (cmn) that sets flags for use in the later select. As shown in the unchanged tests that use 'icmp ugt %x, %a', we're effectively reversing an IR icmp canonicalization that replaces a variable operand with a constant: https://rise4fun.com/Alive/V1Q But we're not using 'uaddo' in those cases via DAG transforms. This happens in CGP after D8889 without checking target lowering to see if the op is supported. So AArch already shows 'uaddo' codegen for the i8/i16/i32/i64 test variants with "using_cmp_sum" in the title. That's the pattern that CGP matches as an unsigned saturated add and converts to uaddo without checking target capabilities. This patch is gated by isOperationLegalOrCustom(ISD::UADDO, VT), so we see only see AArch diffs for i32/i64 in the tests with "using_cmp_notval" in the title (unlike x86 which sees improvements for all sizes because all sizes are 'custom'). But the AArch code (like x86) looks better when translated to 'uaddo' in all cases. So someone that is involved with AArch may want to set i8/i16 to 'custom' for UADDO, so this patch will fire on those tests. Another possibility given the existing behavior: we could remove the legal-or-custom check altogether because we're assuming that a UADDO sequence is canonical/optimal before we ever reach here. But that seems like a bug to me. If the target doesn't have an add-with-flags op, then it's not likely that we'll get optimal DAG combining using a UADDO node. This is similar justification for why we don't canonicalize IR to the overflow math intrinsic sibling (llvm.uadd.with.overflow) for UADDO in the first place. Differential Revision: https://reviews.llvm.org/D51929 llvm-svn: 342886
* Remove debug printf leftover from r342397Hans Wennborg2018-09-241-2/+0
| | | | llvm-svn: 342863
* [DAGCombiner] Remove some dead code from ConstantFoldBITCASTofBUILD_VECTORCraig Topper2018-09-241-9/+2
| | | | | | This code handled SCALAR_TO_VECTOR being returned by the recursion, but the code that used to return SCALAR_TO_VECTOR was removed in 2015. llvm-svn: 342856
* [DAGCombiner] Clarify a comment. NFCCraig Topper2018-09-231-2/+4
| | | | | | This comment was misleading about why we were restricting to before legalize types. The reason given would only apply to before legalize ops. But there is a before legalize types reason that should also be listed. llvm-svn: 342851
* [DAGCombiner][x86] extend decompose of integer multiply into shift/add with ↵Sanjay Patel2018-09-231-6/+13
| | | | | | | | | | | | | | | | | negation This is an alternative to https://reviews.llvm.org/D37896. We can't decompose multiplies generically without a target hook to tell us when it's profitable. ARM and AArch64 may be able to remove some existing code that overlaps with this transform. This extends D52195 and may resolve PR34474: https://bugs.llvm.org/show_bug.cgi?id=34474 (still an open question about transforming legal vector multiplies, but we could open another bug report for those) llvm-svn: 342844
* [DAGCombiner] Simplify some code in visitBITCAST. NFCICraig Topper2018-09-221-9/+3
| | | | llvm-svn: 342826
* [DAGCombiner] Rewrite r331896 in a different way to address a FIXME. NFCICraig Topper2018-09-221-11/+14
| | | | llvm-svn: 342809
* [SelectionDAG] replace duplicated peekThroughBitcast helper functions; NFCISanjay Patel2018-09-201-38/+19
| | | | | | | | | | | | | | x86 had 2 versions of peekThroughBitcast. DAGCombiner had 1. Plus, it had a 1-off implementation for the one-use variant. Move the x86 versions of the code to SelectionDAG, so we don't have different copies of the code. No functional change intended. I'm putting this next to isBitwiseNot() because I am planning to use it in there. Another option is next to the helpers in the ISD namespace (eg, ISD::isConstantSplatVector()). But if there's no good reason for those to be there, I'd prefer to pull other helpers over to SelectionDAG in follow-up steps. Differential Revision: https://reviews.llvm.org/D52285 llvm-svn: 342669
* [SelectionDAG] allow vector types with isBitwiseNot()Sanjay Patel2018-09-191-3/+1
| | | | | | | The test diff in not-and-simplify.ll is from a use in SimplifyDemandedBits, and the test diff in add.ll is from a DAGCombiner transform. llvm-svn: 342594
* [DAGCombiner][x86] add transform/hook to decompose integer multiply into ↵Sanjay Patel2018-09-191-0/+26
| | | | | | | | | | | | | | | | | | | | | shift/add This is an alternative to D37896. I don't see a way to decompose multiplies generically without a target hook to tell us when it's profitable. ARM and AArch64 may be able to remove some duplicate code that overlaps with this transform. As a first step, we're only getting the most clear wins on the vector examples requested in PR34474: https://bugs.llvm.org/show_bug.cgi?id=34474 As noted in the code comment, it's likely that the x86 constraints are tighter than necessary, but it may not always be a win to replace a pmullw/pmulld. Differential Revision: https://reviews.llvm.org/D52195 llvm-svn: 342554
* Revert "Revert r342183 "[DAGCombine] Fix crash when store merging created an ↵Amara Emerson2018-09-171-1/+10
| | | | | | | | extract_subvector with invalid index."" Fixed the assertion failure. llvm-svn: 342397
* [DAGCombiner] try to convert pow(x, 1/3) to cbrt(x)Sanjay Patel2018-09-161-1/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | This is a follow-up suggested in D51630 and originally proposed as an IR transform in D49040. Copying the motivational statement by @evandro from that patch: "This transformation helps some benchmarks in SPEC CPU2000 and CPU2006, such as 188.ammp, 447.dealII, 453.povray, and especially 300.twolf, as well as some proprietary benchmarks. Otherwise, no regressions on x86-64 or A64." I'm proposing to add only the minimum support for a DAG node here. Since we don't have an LLVM IR intrinsic for cbrt, and there are no other DAG ways to create a FCBRT node yet, I don't think we need to worry about DAG builder, legalization, a strict variant, etc. We should be able to expand as needed when adding more functionality/transforms. For reference, these are transform suggestions currently listed in SimplifyLibCalls.cpp: // * cbrt(expN(X)) -> expN(x/3) // * cbrt(sqrt(x)) -> pow(x,1/6) // * cbrt(cbrt(x)) -> pow(x,1/9) Also, given that we bail out on long double for now, there should not be any logical differences between platforms (unless there's some platform out there that has pow() but not cbrt()). Differential Revision: https://reviews.llvm.org/D51753 llvm-svn: 342348
* Revert r342183 "[DAGCombine] Fix crash when store merging created an ↵Reid Kleckner2018-09-141-8/+1
| | | | | | | | | extract_subvector with invalid index." Causes 'isVector() && "Invalid vector type!"' assertion when building Skia in Chrome. llvm-svn: 342265
OpenPOWER on IntegriCloud