path: root/llvm/lib/Transforms/InstCombine
Commit log (newest first). Each entry shows the commit message, author, date, files changed, and lines changed (-removed/+added).
* [InstCombine] reduce more checks for power-of-2-or-zero using ctpop (Sanjay Patel, 2019-07-01, 1 file, -1/+7)
  Extends the transform from rL364341 to include another (more common?) pattern
  that tests whether a value is a power-of-2 (including or excluding zero).
  llvm-svn: 364856
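  An illustrative sketch (mine, not from the commit) of the kind of check this
  family of folds targets; the function name is made up:

    define i1 @is_pow2_or_zero(i32 %x) {
      %dec = add i32 %x, -1
      %and = and i32 %x, %dec
      %r = icmp eq i32 %and, 0          ; (x & (x - 1)) == 0
      ret i1 %r
      ; expected to canonicalize to something like:
      ;   %pop = call i32 @llvm.ctpop.i32(i32 %x)
      ;   %r = icmp ult i32 %pop, 2
    }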
* [InstCombine] (Y + ~X) + 1 --> Y - X fold (PR42459) (Roman Lebedev, 2019-07-01, 1 file, -1/+4)
  Summary: To be noted, this pattern is not entirely unhandled by instcombine per se;
  it does end up being folded when one runs opt -O3, but not with just -instcombine.
  Regardless, that fold is indirect, depends on some other folds, and is thus blind
  when there are extra uses. This addresses the regression exposed in D63992.
  https://godbolt.org/z/7DGltU
  https://rise4fun.com/Alive/EPO0
  Fixes PR42459: https://bugs.llvm.org/show_bug.cgi?id=42459
  Reviewers: spatel, nikic, huihuiz
  Reviewed By: spatel
  Subscribers: llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63993
  llvm-svn: 364792
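  A minimal sketch of the fold (my example, not from the commit):

    define i32 @fold_example(i32 %x, i32 %y) {
      %not = xor i32 %x, -1         ; ~X
      %add = add i32 %y, %not       ; Y + ~X
      %r = add i32 %add, 1          ; (Y + ~X) + 1
      ret i32 %r
      ; since ~X + 1 == -X, this is expected to fold to:
      ;   %r = sub i32 %y, %x
    }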
* [InstCombine] Shift amount reassociation in bittest (PR42399) (Roman Lebedev, 2019-07-01, 1 file, -0/+60)
  Summary: Given the pattern
    icmp eq/ne (and ((x shift Q), (y oppositeshift K))), 0
  we should move the shifts to the same hand of the 'and', i.e. rewrite it as
    icmp eq/ne (and (x shift (Q+K)), y), 0
  iff (Q+K) u< bitwidth(x).
  It might be tempting to not restrict this to situations where we know we'd fold
  the two shifts together, but I'm not sure what the rules should be to avoid
  endless combine loops. We pick the same shift kind that was originally used to
  shift the variable we picked to shift: https://rise4fun.com/Alive/6x1v
  Should fix PR42399: https://bugs.llvm.org/show_bug.cgi?id=42399
  Reviewers: spatel, nikic, RKSimon
  Reviewed By: spatel
  Subscribers: llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63829
  llvm-svn: 364791
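  A concrete instance with made-up constants (my sketch, not from the commit):

    define i1 @bittest(i32 %x, i32 %y) {
      %xs = lshr i32 %x, 1
      %ys = shl i32 %y, 2
      %and = and i32 %xs, %ys
      %r = icmp ne i32 %and, 0
      ret i1 %r
      ; 1 + 2 u< 32, so both shifts can land on %x:
      ;   %xs2 = lshr i32 %x, 3
      ;   %and2 = and i32 %xs2, %y
      ;   %r = icmp ne i32 %and2, 0
    }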
* [InstCombine] Omit 'urem' where possible (Roman Lebedev, 2019-07-01, 1 file, -4/+20)
  This was added in D63390 / rL364286 for the backend, but it makes sense to also
  handle it in the middle-end. https://rise4fun.com/Alive/Zsln
  llvm-svn: 364738
* [InstCombine] canonicalize fcmp+select to minnum/maxnum intrinsics (Sanjay Patel, 2019-06-30, 1 file, -0/+13)
  This is the opposite direction of D62158 (we have to choose one form or the other).
  Now that we have FMF on the select, this becomes more palatable. And the benefits
  of having a single IR instruction for this operation (fewer chances of missing
  folds based on extra uses, etc.) overcome my previous comments about the potential
  advantage of larger pattern matching/analysis.
  Differential Revision: https://reviews.llvm.org/D62414
  llvm-svn: 364721
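  A rough sketch of the canonicalization (mine, not from the commit; the exact FMF
  required for the fold may differ):

    define float @max_like(float %a, float %b) {
      %cmp = fcmp nnan nsz ogt float %a, %b
      %r = select nnan nsz i1 %cmp, float %a, float %b
      ret float %r
      ; expected to become something like:
      ;   %r = call nnan nsz float @llvm.maxnum.f32(float %a, float %b)
    }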
* [InstCombine] Shift amount reassociation (PR42391) (Roman Lebedev, 2019-06-29, 1 file, -0/+48)
  Summary: Given the pattern
    (x shiftopcode Q) shiftopcode K
  we should rewrite it as
    x shiftopcode (Q+K)
  iff (Q+K) u< bitwidth(x). This is valid for any shift, but they must be identical.
  * https://rise4fun.com/Alive/9E2
  * exact on both lshr => exact: https://rise4fun.com/Alive/plHk
  * exact on both ashr => exact: https://rise4fun.com/Alive/QDAA
  * nuw on both shl => nuw: https://rise4fun.com/Alive/5Uk
  * nsw on both shl => nsw: https://rise4fun.com/Alive/0plg
  Should fix PR42391: https://bugs.llvm.org/show_bug.cgi?id=42391
  Reviewers: spatel, nikic, RKSimon
  Reviewed By: nikic
  Subscribers: llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63812
  llvm-svn: 364712
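  A simple instance (my sketch, not from the commit):

    define i32 @two_lshr(i32 %x) {
      %s1 = lshr exact i32 %x, 3
      %s2 = lshr exact i32 %s1, 5
      ret i32 %s2
      ; 3 + 5 u< 32 and both shifts are exact lshr, so this can become:
      ;   %s2 = lshr exact i32 %x, 8
    }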
* [InstCombine] simplify code for inserts -> splat; NFC (Sanjay Patel, 2019-06-26, 1 file, -25/+17)
  llvm-svn: 364441
* [InstCombine] Simplify icmp ult/uge (shl %x, C2), C1 iff C1 is power of two -> icmp eq/ne (and %x, (lshr -C1, C2)), 0 (Huihui Zhang, 2019-06-25, 1 file, -0/+21)
  Simplify a 'shl' inequality test into an 'and' equality test. This pattern happens
  in the middle-end while simplifying bitfield access; exposed in
  https://reviews.llvm.org/D63505
  https://rise4fun.com/Alive/6uz
  Reviewers: lebedev.ri, efriedma
  Reviewed By: lebedev.ri
  Subscribers: spatel, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63675
  llvm-svn: 364348
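  A worked instance with small constants (my numbers, not from the commit):

    define i1 @shl_ult(i8 %x) {
      %shl = shl i8 %x, 1
      %r = icmp ult i8 %shl, 8        ; C1 = 8 is a power of two
      ret i1 %r
      ; -8 as an i8 is 0xF8, and lshr(0xF8, 1) = 0x7C, so this becomes:
      ;   %and = and i8 %x, 124
      ;   %r = icmp eq i8 %and, 0
    }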
* [InstCombine] reduce checks for power-of-2-or-zero using ctpop (Sanjay Patel, 2019-06-25, 1 file, -12/+10)
  This follows up the transform from rL363956 to use the ctpop intrinsic when
  checking for power-of-2-or-zero. This matches the isPowerOf2() patterns used in
  PR42314: https://bugs.llvm.org/show_bug.cgi?id=42314
  But there's at least one instcombine follow-up needed to match the alternate form:
    (v & (v - 1)) == 0;
  We should have all of the backend expansions handled with rL364319
  (x86-specific changes are still needed for optimal code based on subtarget).
  And the larger patterns to exclude zero as a power-of-2 join with this change
  after rL364153 (D63660) and rL364246.
  Differential Revision: https://reviews.llvm.org/D63777
  llvm-svn: 364341
* [InstCombine] Fold icmp eq/ne (and %x, C), 0 iff (-C) is power of two -> %x u</u>= (-C), earlier (Huihui Zhang, 2019-06-25, 1 file, -11/+9)
  Summary: To generate simplified IR, make sure the fold
    (X & ~C) ==/!= 0  -->  X u</u>= C+1
  is scheduled before the fold
    ((X << Y) & C) == 0  -->  (X & (C >> Y)) == 0.
  https://rise4fun.com/Alive/7ZN
  Reviewers: lebedev.ri, efriedma, spatel, craig.topper
  Reviewed By: lebedev.ri
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63505
  llvm-svn: 364255
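  An example of the first fold with a concrete mask (my sketch, not from the commit):

    define i1 @low_bits_only(i32 %x) {
      %and = and i32 %x, -16        ; C = -16, and -C = 16 is a power of two
      %r = icmp eq i32 %and, 0
      ret i1 %r
      ; expected to fold to:  %r = icmp ult i32 %x, 16
    }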
* [InstCombine] squash is-not-power-of-2 using ctpop (Sanjay Patel, 2019-06-24, 1 file, -5/+18)
  This is the De Morgan'd 'not' of the pattern handled in D63660 / rL364153.
  This is another intermediate IR step towards solving PR42314:
  https://bugs.llvm.org/show_bug.cgi?id=42314
  We can test if a value is not a power-of-2 using ctpop(X) > 1, so combining that
  with an is-zero check of the input is the same as testing if not exactly 1 bit is set:
    (X == 0) || (ctpop(X) u> 1) --> ctpop(X) != 1
  llvm-svn: 364246
* InstCombine: Preserve nuw when reassociating nuw ops [3/3] (Matt Arsenault, 2019-06-24, 1 file, -15/+27)
  Alive says this is OK.
  llvm-svn: 364235
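  One plausible case this series of changes covers (my sketch, not from the commits):

    define i32 @reassoc_nuw(i32 %x) {
      %a = add nuw i32 %x, 10
      %b = add nuw i32 %a, 20
      ret i32 %b
      ; if neither add wraps unsigned, the reassociated add cannot wrap either:
      ;   %b = add nuw i32 %x, 30
    }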
* InstCombine: Preserve nuw when reassociating nuw ops [2/3] (Matt Arsenault, 2019-06-24, 1 file, -2/+10)
  Alive says this is OK.
  llvm-svn: 364234
* InstCombine: Preserve nuw when reassociating nuw ops [1/3] (Matt Arsenault, 2019-06-24, 1 file, -4/+14)
  Alive says this is OK.
  llvm-svn: 364233
* [InstCombine] reduce funnel-shift i16 X, X, 8 to bswap X (Sanjay Patel, 2019-06-24, 1 file, -0/+7)
  Prefer the more exact intrinsic to remove a use of the input value and possibly
  make further transforms easier (we will still need to match patterns with
  funnel-shift of wider types as pieces of bswap, especially if we want to
  canonicalize to funnel-shift with constant shift amount). Discussed in D46760.
  llvm-svn: 364187
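  The fold in IR form (my sketch, not from the commit):

    define i16 @rot8_is_bswap(i16 %x) {
      %r = call i16 @llvm.fshl.i16(i16 %x, i16 %x, i16 8)
      ret i16 %r
      ; rotating an i16 by 8 swaps its two bytes, so this is expected to become:
      ;   %r = call i16 @llvm.bswap.i16(i16 %x)
    }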
* [InstCombine] SliceUpIllegalIntegerPHI - bail on out of range shifts (Simon Pilgrim, 2019-06-24, 1 file, -0/+5)
  In the trunc(lshr) handling: if the shift is out of range (undefined), then bail
  like we do for non-constant shifts.
  Fixes OSS-Fuzz #15217.
  llvm-svn: 364181
* [InstCombine] squash is-power-of-2 that uses ctpop (Sanjay Patel, 2019-06-23, 1 file, -0/+23)
  This is another intermediate IR step towards solving PR42314:
  https://bugs.llvm.org/show_bug.cgi?id=42314
  We can test if a value is power-of-2-or-0 using ctpop(X) < 2, so combining that
  with a non-zero check of the input is the same as testing if exactly 1 bit is set:
    (X != 0) && (ctpop(X) u< 2) --> ctpop(X) == 1
  Differential Revision: https://reviews.llvm.org/D63660
  llvm-svn: 364153
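  The same fold written out as IR (my sketch, not from the commit):

    define i1 @is_pow2(i32 %x) {
      %nz = icmp ne i32 %x, 0
      %pop = call i32 @llvm.ctpop.i32(i32 %x)
      %lt2 = icmp ult i32 %pop, 2
      %r = and i1 %nz, %lt2
      ret i1 %r
      ; expected to fold to:
      ;   %pop = call i32 @llvm.ctpop.i32(i32 %x)
      ;   %r = icmp eq i32 %pop, 1
    }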
* [InstCombine] (1 << (C - x)) -> ((1 << C) >> x) if C is bitwidth - 1 (David Bolvansky, 2019-06-21, 1 file, -0/+8)
  Summary:
    %a = sub i32 31, %x
    %r = shl i32 1, %a
      =>
    %d = shl i32 1, 31
    %r = lshr i32 %d, %x
    Done: 1
    Optimization is correct!
  https://rise4fun.com/Alive/btZm
  Reviewers: spatel, lebedev.ri, nikic
  Reviewed By: lebedev.ri
  Subscribers: llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63652
  llvm-svn: 364073
* [InstCombine] cttz(abs(x)) -> cttz(x) (David Bolvansky, 2019-06-21, 1 file, -4/+15)
  Summary: Signedness does not change the number of trailing zeros.
  Reviewers: spatel, lebedev.ri, nikic
  Reviewed By: lebedev.ri
  Differential Revision: https://reviews.llvm.org/D63546
  llvm-svn: 364064
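  A sketch of the pattern, with abs(x) spelled out as the usual select idiom (my
  example, not from the commit):

    define i32 @cttz_of_abs(i32 %x) {
      %neg = sub i32 0, %x
      %cmp = icmp sgt i32 %x, -1
      %abs = select i1 %cmp, i32 %x, i32 %neg       ; abs(x)
      %r = call i32 @llvm.cttz.i32(i32 %abs, i1 false)
      ret i32 %r
      ; negation does not change the lowest set bit, so this is expected to become:
      ;   %r = call i32 @llvm.cttz.i32(i32 %x, i1 false)
    }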
* [InstCombine] fix typo in comment; NFC (Sanjay Patel, 2019-06-20, 1 file, -1/+1)
  llvm-svn: 363974
* [InstCombine] canonicalize check for power-of-2 (Sanjay Patel, 2019-06-20, 1 file, -0/+20)
  The form that compares against 0 is better because:
  1. It removes a use of the input value.
  2. It's the more standard form for this pattern:
     https://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2
  3. It results in equal or better codegen (tested with x86, AArch64, ARM, PowerPC, MIPS).
  This is a root cause for PR42314, but probably doesn't completely answer the
  codegen request: https://bugs.llvm.org/show_bug.cgi?id=42314
  Alive proof: https://rise4fun.com/Alive/9kG
  Name: is power-of-2
    %neg = sub i32 0, %x
    %a = and i32 %neg, %x
    %r = icmp eq i32 %a, %x
      =>
    %dec = add i32 %x, -1
    %a2 = and i32 %dec, %x
    %r = icmp eq i32 %a2, 0
  Name: is not power-of-2
    %neg = sub i32 0, %x
    %a = and i32 %neg, %x
    %r = icmp ne i32 %a, %x
      =>
    %dec = add i32 %x, -1
    %a2 = and i32 %dec, %x
    %r = icmp ne i32 %a2, 0
  llvm-svn: 363956
* [InstCombine] cttz(-x) -> cttz(x) (David Bolvansky, 2019-06-20, 1 file, -0/+6)
  Summary: Signedness does not change the number of trailing zeros.
  Reviewers: spatel, lebedev.ri, nikic
  Reviewed By: spatel
  Subscribers: llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63534
  llvm-svn: 363951
* [InstCombine] Fold icmp eq/ne (and %x, signbit), 0 -> %x s>=/s< 0 earlier (Huihui Zhang, 2019-06-19, 1 file, -10/+17)
  Summary: To generate simplified IR, make sure the fold
    (X & signbit) ==/!= 0  -->  X s>=/s< 0
  is scheduled before the fold
    ((X << Y) & C) == 0  -->  (X & (C >> Y)) == 0.
  https://rise4fun.com/Alive/fbdh
  Reviewers: lebedev.ri, efriedma, spatel, craig.topper
  Reviewed By: lebedev.ri
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63026
  llvm-svn: 363845
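  The sign-bit fold in IR form (my sketch, not from the commit):

    define i1 @sign_bit_clear(i32 %x) {
      %and = and i32 %x, -2147483648    ; isolate the sign bit
      %r = icmp eq i32 %and, 0
      ret i1 %r
      ; expected to fold to:  %r = icmp sgt i32 %x, -1
    }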
* AMDGPU: Fold readlane/readfirstlane calls (Matt Arsenault, 2019-06-17, 1 file, -0/+24)
  llvm-svn: 363587
* AMDGPU: Fold readlane intrinsics of constants (Matt Arsenault, 2019-06-14, 1 file, -0/+7)
  I'm not 100% sure about this, since I'm worried about IR transforms that might end
  up introducing divergence downstream once replaced with a constant, but I haven't
  come up with an example yet.
  llvm-svn: 363406
* [AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32 (Stanislav Mekhanoshin, 2019-06-13, 1 file, -1/+3)
  Differential Revision: https://reviews.llvm.org/D63301
  llvm-svn: 363339
* [IntrinsicEmitter] Extend argument overloading with forward references (Sander de Smalen, 2019-06-13, 1 file, -4/+3)
  Extend the mechanism to overload intrinsic arguments by using either backward or
  forward references to the overloadable arguments. For example, in:
    def int_something : Intrinsic<[LLVMPointerToElt<0>], [llvm_anyvector_ty], []>;
  LLVMPointerToElt<0> is a forward reference to the overloadable operand of type
  'llvm_anyvector_ty' and would allow intrinsics such as:
    declare i32* @llvm.something.v4i32(<4 x i32>);
    declare i64* @llvm.something.v2i64(<2 x i64>);
  where the result pointer type is deduced from the element type of the first
  argument. If the returned pointer is not a pointer to the element type, LLVM will
  give an error:
    Intrinsic has incorrect return type!
    i64* (<4 x i32>)* @llvm.something.v4i32
  Reviewers: RKSimon, arsenm, rnk, greened
  Reviewed By: arsenm
  Differential Revision: https://reviews.llvm.org/D62995
  llvm-svn: 363233
* [InstCombine] Handle -(X-Y) --> (Y-X) for unary fneg when NSZ (Cameron McInally, 2019-06-11, 1 file, -1/+10)
  Differential Revision: https://reviews.llvm.org/D62612
  llvm-svn: 363082
* [InstCombine] Update fptrunc (fneg x) -> fneg (fptrunc x) for unary FNeg (Cameron McInally, 2019-06-11, 1 file, -4/+12)
  Differential Revision: https://reviews.llvm.org/D62629
  llvm-svn: 363080
* [InstCombine] allow unordered preds when canonicalizing to fabs() (Sanjay Patel, 2019-06-10, 1 file, -2/+2)
  We have a known-never-nan value via 'nnan', so an unordered predicate is the same
  as its ordered sibling. Similar to rL362937.
  llvm-svn: 362954
* [InstCombine] fix bug in canonicalization to fabs() (Sanjay Patel, 2019-06-10, 1 file, -2/+4)
  Forgot to translate the predicate clauses in rL362943.
  llvm-svn: 362945
* [InstCombine] change canonicalization to fabs() to use FMF on fsub (Sanjay Patel, 2019-06-10, 1 file, -19/+21)
  Similar to rL362909: this isn't the ideal fix (use FMF on the select), but it's
  still an improvement until we have better FMF propagation to selects and other FP
  math operators. I don't think there's much risk of regression from this change by
  not including the FMF on the fcmp any more. The nsz/nnan FMF should be the same on
  the fcmp and the fsub because they have the same operand.
  llvm-svn: 362943
* [InstCombine] allow unordered preds when canonicalizing to fabs() (Sanjay Patel, 2019-06-10, 1 file, -2/+4)
  PR42179: https://bugs.llvm.org/show_bug.cgi?id=42179
  llvm-svn: 362937
* [InstCombine] foldICmpWithLowBitMaskedVal(): 'icmp sgt/sle': avoid miscompiles (Roman Lebedev, 2019-06-09, 1 file, -0/+8)
  A precondition 'x != 0' was forgotten by me:
  https://rise4fun.com/Alive/JFNP
  https://rise4fun.com/Alive/jHvL
  These 4 folds with non-constants could be re-enabled, but for now let's go for the
  simplest solution.
  https://bugs.llvm.org/show_bug.cgi?id=42198
  llvm-svn: 362911
* [InstCombine] change canonicalization to fabs() to use FMF on fneg (Sanjay Patel, 2019-06-09, 1 file, -13/+25)
  This isn't the ideal fix (use FMF on the select), but it's still an improvement
  until we have better FMF propagation to selects and other FP math operators. I
  don't think there's much risk of regression from this change by not including the
  FMF on the fcmp any more. The nsz/nnan FMF should be the same on the fcmp and the
  fneg (fsub) because they have the same operand. This works around the most glaring
  FMF logical inconsistency cited in PR38086:
  https://bugs.llvm.org/show_bug.cgi?id=38086
  llvm-svn: 362909
* [InstCombine] simplify code for bitcast of insertelement; NFC (Sanjay Patel, 2019-06-05, 1 file, -5/+4)
  llvm-svn: 362655
* InstCombine: correctly change byval type attribute alongside call args (Tim Northover, 2019-06-05, 1 file, -4/+20)
  When the byval attribute has a type, it must match the pointee type of the
  parameter, but InstCombine was not updating the attribute when folding casts of
  various kinds away.
  llvm-svn: 362643
* [InstCombine] 'C-(C2-X) --> X+(C-C2)' constant-fold (Roman Lebedev, 2019-05-31, 1 file, -1/+6)
  It looks like this fold was already partially happening, indirectly via some other
  folds, but with a one-use limitation. No other fold here has that restriction.
  https://rise4fun.com/Alive/ftR
  llvm-svn: 362217
* [InstCombine] 'add (sub C1, X), C2 --> sub (add C1, C2), X' constant-fold (Roman Lebedev, 2019-05-31, 1 file, -1/+8)
  https://rise4fun.com/Alive/qJQ
  llvm-svn: 362216
* [InstCombine] Avoid use after free in DenseMap, when built with GCC (Martin Storsjo, 2019-05-30, 1 file, -1/+4)
  Previously, this used a statement like this:
    Map[A] = Map[B];
  This is equivalent to the following:
    const auto &Src = Map[B];
    auto &Dest = Map[A];
    Dest = Src;
  The second statement, "auto &Dest = Map[A];", can insert a new element into the
  DenseMap, which can potentially grow and reallocate the DenseMap's internal
  storage, invalidating the existing reference to the source. When doing the actual
  assignment, the Src reference is dereferenced, accessing memory that was freed
  when the DenseMap grew.
  This issue hasn't shown up when LLVM was built with Clang, because the right hand
  side ended up dereferenced before evaluating the left hand side. (If the value
  type is a larger data type, Clang doesn't do this but behaves like GCC.) With GCC,
  a cast to Value* isn't enough to make it dereference the right hand side reference
  before invoking operator[] (while that is enough to make Clang/LLVM do the right
  thing for larger types), but storing it in an intermediate variable in a separate
  statement works.
  This fixes PR42065.
  Differential Revision: https://reviews.llvm.org/D62624
  llvm-svn: 362150
* [InstCombine] Optimize always overflowing signed saturating add/sub (Nikita Popov, 2019-05-29, 1 file, -8/+12)
  Based on the overflow direction information added in D62463, we can now fold
  always-overflowing signed saturating add/sub to signed min/max.
  Differential Revision: https://reviews.llvm.org/D62544
  llvm-svn: 362006
* [InstCombine] Clean up saturating math overflow optimizations; NFC (Nikita Popov, 2019-05-28, 1 file, -29/+20)
  Reduce duplication and make it easier to handle signed always-overflow conditions
  in the future.
  llvm-svn: 361863
* [ValueTracking][ConstantRange] Distinguish low/high always overflow (Nikita Popov, 2019-05-28, 2 files, -3/+4)
  In order to fold an always-overflowing signed saturating add/sub, we need to know
  in which direction the always-overflow occurs. This patch splits up AlwaysOverflows
  into AlwaysOverflowsLow and AlwaysOverflowsHigh to pass through this information
  (but it is not used yet).
  Differential Revision: https://reviews.llvm.org/D62463
  llvm-svn: 361858
* [InstCombine] prevent crashing with invalid extractelement index (Sanjay Patel, 2019-05-26, 1 file, -2/+3)
  This was found/reduced from a fuzzer report:
  https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14956
  llvm-svn: 361729
* [InstCombine] Refactor OptimizeOverflowCheck; NFCI (Nikita Popov, 2019-05-26, 2 files, -79/+65)
  Extract a method to compute overflow based on the binop and signedness, and then
  make the result handling code generic. This extends the always-overflow handling
  to signed muls, but currently has no effect, as we don't compute always-overflow
  for them (thus NFC).
  llvm-svn: 361721
* [InstCombine] Remove OverflowCheckFlavor; NFC (Nikita Popov, 2019-05-26, 3 files, -59/+20)
  Instead pass binary op and signedness. The extra enum only makes things more
  complicated in this case.
  llvm-svn: 361720
* Use the DataLayout::typeSizeEqualsStoreSize helper. NFC (Bjorn Pettersson, 2019-05-24, 1 file, -1/+1)
  Just a minor refactoring to use the new helper method
  DataLayout::typeSizeEqualsStoreSize(). This is done when checking if
  getTypeSizeInBits is equal/non-equal to getTypeStoreSizeInBits.
  llvm-svn: 361613
* [InstSimplify] fold insertelement-of-extractelement (Sanjay Patel, 2019-05-24, 1 file, -5/+0)
  This was partly handled in InstCombine (only the constant index case), so delete
  that and zap it more generally in InstSimplify.
  llvm-svn: 361576
* [InstCombine] remove redundant fold for extractelement; NFC (Sanjay Patel, 2019-05-23, 1 file, -4/+0)
  The out-of-bounds index pattern is handled by InstSimplify, so the extractelement
  should be eliminated next time it is visited.
  llvm-svn: 361570
* [InstCombine] remove redundant fold for insertelement; NFC (Sanjay Patel, 2019-05-23, 1 file, -4/+0)
  The out-of-bounds index pattern is handled by InstSimplify.
  llvm-svn: 361569