summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms/InstCombine
Commit message (Collapse)AuthorAgeFilesLines
...
* [InstCombine] (1 << (C - x)) -> ((1 << C) >> x) if C is bitwidth - 1David Bolvansky2019-06-211-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: ``` %a = sub i32 31, %x %r = shl i32 1, %a => %d = shl i32 1, 31 %r = lshr i32 %d, %x Done: 1 Optimization is correct! ``` https://rise4fun.com/Alive/btZm Reviewers: spatel, lebedev.ri, nikic Reviewed By: lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63652 llvm-svn: 364073
* [InstCombine] cttz(abs(x)) -> cttz(x)David Bolvansky2019-06-211-4/+15
| | | | | | | | | | | | Summary: Signedness does not change number of trailing zeros. Reviewers: spatel, lebedev.ri, nikic Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D63546 llvm-svn: 364064
* [InstCombine] fix typo in comment; NFCSanjay Patel2019-06-201-1/+1
| | | | llvm-svn: 363974
* [InstCombine] canonicalize check for power-of-2Sanjay Patel2019-06-201-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The form that compares against 0 is better because: 1. It removes a use of the input value. 2. It's the more standard form for this pattern: https://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2 3. It results in equal or better codegen (tested with x86, AArch64, ARM, PowerPC, MIPS). This is a root cause for PR42314, but probably doesn't completely answer the codegen request: https://bugs.llvm.org/show_bug.cgi?id=42314 Alive proof: https://rise4fun.com/Alive/9kG Name: is power-of-2 %neg = sub i32 0, %x %a = and i32 %neg, %x %r = icmp eq i32 %a, %x => %dec = add i32 %x, -1 %a2 = and i32 %dec, %x %r = icmp eq i32 %a2, 0 Name: is not power-of-2 %neg = sub i32 0, %x %a = and i32 %neg, %x %r = icmp ne i32 %a, %x => %dec = add i32 %x, -1 %a2 = and i32 %dec, %x %r = icmp ne i32 %a2, 0 llvm-svn: 363956
* [InstCombine] cttz(-x) -> cttz(x)David Bolvansky2019-06-201-0/+6
| | | | | | | | | | | | | | | | Summary: Signedness does not change number of trailing zeros. Reviewers: spatel, lebedev.ri, nikic Reviewed By: spatel Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63534 llvm-svn: 363951
* [InstCombine] Fold icmp eq/ne (and %x, signbit), 0 -> %x s>=/s< 0 earlierHuihui Zhang2019-06-191-10/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: To generate simplified IR, make sure fold ``` (X & signbit) ==/!= 0) -> X s>=/s< 0; ``` is scheduled before fold ``` ((X << Y) & C) == 0 -> (X & (C >> Y)) == 0. ``` https://rise4fun.com/Alive/fbdh Reviewers: lebedev.ri, efriedma, spatel, craig.topper Reviewed By: lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63026 llvm-svn: 363845
* AMDGPU: Fold readlane/readfirstlane callsMatt Arsenault2019-06-171-0/+24
| | | | llvm-svn: 363587
* AMDGPU: Fold readlane intrinsics of constantsMatt Arsenault2019-06-141-0/+7
| | | | | | | | I'm not 100% sure about this, since I'm worried about IR transforms that might end up introducing divergence downstream once replaced with a constant, but I haven't come up with an example yet. llvm-svn: 363406
* [AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32Stanislav Mekhanoshin2019-06-131-1/+3
| | | | | | Differential Revision: https://reviews.llvm.org/D63301 llvm-svn: 363339
* [IntrinsicEmitter] Extend argument overloading with forward references.Sander de Smalen2019-06-131-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend the mechanism to overload intrinsic arguments by using either backward or forward references to the overloadable arguments. In for example: def int_something : Intrinsic<[LLVMPointerToElt<0>], [llvm_anyvector_ty], []>; LLVMPointerToElt<0> is a forward reference to the overloadable operand of type 'llvm_anyvector_ty' and would allow intrinsics such as: declare i32* @llvm.something.v4i32(<4 x i32>); declare i64* @llvm.something.v2i64(<2 x i64>); where the result pointer type is deduced from the element type of the first argument. If the returned pointer is not a pointer to the element type, LLVM will give an error: Intrinsic has incorrect return type! i64* (<4 x i32>)* @llvm.something.v4i32 Reviewers: RKSimon, arsenm, rnk, greened Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D62995 llvm-svn: 363233
* [InstCombine] Handle -(X-Y) --> (Y-X) for unary fneg when NSZCameron McInally2019-06-111-1/+10
| | | | | | Differential Revision: https://reviews.llvm.org/D62612 llvm-svn: 363082
* [InstCombine] Update fptrunc (fneg x)) -> (fneg (fptrunc x) for unary FNegCameron McInally2019-06-111-4/+12
| | | | | | Differential Revision: https://reviews.llvm.org/D62629 llvm-svn: 363080
* [InstCombine] allow unordered preds when canonicalizing to fabs()Sanjay Patel2019-06-101-2/+2
| | | | | | | | | | We have a known-never-nan value via 'nnan', so an unordered predicate is the same as its ordered sibling. Similar to: rL362937 llvm-svn: 362954
* [InstCombine] fix bug in canonicalization to fabs()Sanjay Patel2019-06-101-2/+4
| | | | | | Forgot to translate the predicate clauses in rL362943. llvm-svn: 362945
* [InstCombine] change canonicalization to fabs() to use FMF on fsubSanjay Patel2019-06-101-19/+21
| | | | | | | | | | | | | | Similar to rL362909: This isn't the ideal fix (use FMF on the select), but it's still an improvement until we have better FMF propagation to selects and other FP math operators. I don't think there's much risk of regression from this change by not including the FMF on the fcmp any more. The nsz/nnan FMF should be the same on the fcmp and the fsub because they have the same operand. llvm-svn: 362943
* [InstCombine] allow unordered preds when canonicalizing to fabs()Sanjay Patel2019-06-101-2/+4
| | | | | | | PR42179: https://bugs.llvm.org/show_bug.cgi?id=42179 llvm-svn: 362937
* [InstCombine] foldICmpWithLowBitMaskedVal(): 'icmp sgt/sle': avoid miscompilesRoman Lebedev2019-06-091-0/+8
| | | | | | | | | | | | | A precondition 'x != 0' was forgotten by me: https://rise4fun.com/Alive/JFNP https://rise4fun.com/Alive/jHvL These 4 folds with non-constants could be re-enabled, but for now let's go for the simplest solution. https://bugs.llvm.org/show_bug.cgi?id=42198 llvm-svn: 362911
* [InstCombine] change canonicalization to fabs() to use FMF on fnegSanjay Patel2019-06-091-13/+25
| | | | | | | | | | | | | | | | | This isn't the ideal fix (use FMF on the select), but it's still an improvement until we have better FMF propagation to selects and other FP math operators. I don't think there's much risk of regression from this change by not including the FMF on the fcmp any more. The nsz/nnan FMF should be the same on the fcmp and the fneg (fsub) because they have the same operand. This works around the most glaring FMF logical inconsistency cited in PR38086: https://bugs.llvm.org/show_bug.cgi?id=38086 llvm-svn: 362909
* [InstCombine] simplify code for bitcast of insertelement; NFCSanjay Patel2019-06-051-5/+4
| | | | llvm-svn: 362655
* InstCombine: correctly change byval type attribute alongside call args.Tim Northover2019-06-051-4/+20
| | | | | | | | When the byval attribute has a type, it must match the pointee type of any parameter; but InstCombine was not updating the attribute when folding casts of various kinds away. llvm-svn: 362643
* [InstCombine] 'C-(C2-X) --> X+(C-C2)' constant-foldRoman Lebedev2019-05-311-1/+6
| | | | | | | | | | It looks this fold was already partially happening, indirectly via some other folds, but with one-use limitation. No other fold here has that restriction. https://rise4fun.com/Alive/ftR llvm-svn: 362217
* [InstCombine] 'add (sub C1, X), C2 --> sub (add C1, C2), X' constant-foldRoman Lebedev2019-05-311-1/+8
| | | | | | https://rise4fun.com/Alive/qJQ llvm-svn: 362216
* [InstCombine] Avoid use after free in DenseMap, when built with GCCMartin Storsjo2019-05-301-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, this used a statement like this: Map[A] = Map[B]; This is equivalent to the following: const auto &Src = Map[B]; auto &Dest = Map[A]; Dest = Src; The second statement, "auto &Dest = Map[A];" can insert a new element into the DenseMap, which can potentially grow and reallocate the DenseMap's internal storage, which will invalidate the existing reference to the source. When doing the actual assignment, the Src reference is dereferenced, accessing memory that was freed when the DenseMap grew. This issue hasn't shown up when LLVM was built with Clang, because the right hand side ended up dereferenced before evaulating the left hand side. (If the value type is a larger data type, Clang doesn't do this but behaves like GCC.) With GCC, a cast to Value* isn't enough to make it dereference the right hand side reference before invoking operator[] (while that is enough to make Clang/LLVM do the right thing for larger types), but storing it in an intermediate variable in a separate statement works. This fixes PR42065. Differential Revision: https://reviews.llvm.org/D62624 llvm-svn: 362150
* [InstCombine] Optimize always overflowing signed saturating add/subNikita Popov2019-05-291-8/+12
| | | | | | | | | Based on the overflow direction information added in D62463, we can now fold always overflowing signed saturating add/sub to signed min/max. Differential Revision: https://reviews.llvm.org/D62544 llvm-svn: 362006
* [InstCombine] Clean up saturing math overflow optimizations; NFCNikita Popov2019-05-281-29/+20
| | | | | | | Reduce duplication and make it easier to handle signed always-overflows conditions in the future. llvm-svn: 361863
* [ValueTracking][ConstantRange] Distinguish low/high always overflowNikita Popov2019-05-282-3/+4
| | | | | | | | | | | | In order to fold an always overflowing signed saturating add/sub, we need to know in which direction the always overflow occurs. This patch splits up AlwaysOverflows into AlwaysOverflowsLow and AlwaysOverflowsHigh to pass through this information (but it is not used yet). Differential Revision: https://reviews.llvm.org/D62463 llvm-svn: 361858
* [InstCombine] prevent crashing with invalid extractelement indexSanjay Patel2019-05-261-2/+3
| | | | | | | This was found/reduced from a fuzzer report: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14956 llvm-svn: 361729
* [InstCombine] Refactor OptimizeOverflowCheck; NFCINikita Popov2019-05-262-79/+65
| | | | | | | | | Extract method to compute overflow based on binop and signedness, and then make the result handling code generic. This extends the always-overflow handling to signed muls, but has currently no effect, as we don't compute always overflow for them (thus NFC). llvm-svn: 361721
* [InstCombine] Remove OverflowCheckFlavor; NFCNikita Popov2019-05-263-59/+20
| | | | | | | Instead pass binary op and signedness. The extra enum only makes things more complicated in this case. llvm-svn: 361720
* Use the DataLayout::typeSizeEqualsStoreSize helper. NFCBjorn Pettersson2019-05-241-1/+1
| | | | | | | | | Just a minor refactoring to use the new helper method DataLayout::typeSizeEqualsStoreSize(). This is done when checking if getTypeSizeInBits is equal/non-equal to getTypeStoreSizeInBits. llvm-svn: 361613
* [InstSimplify] fold insertelement-of-extractelementSanjay Patel2019-05-241-5/+0
| | | | | | | | This was partly handled in InstCombine (only the constant index case), so delete that and zap it more generally in InstSimplify. llvm-svn: 361576
* [InstCombine] remove redundant fold for extractelement; NFCSanjay Patel2019-05-231-4/+0
| | | | | | | | The out-of-bounds index pattern is handled by InstSimplify, so the extractelement should be eliminated next time it is visited. llvm-svn: 361570
* [InstCombine] remove redundant fold for insertelement; NFCSanjay Patel2019-05-231-4/+0
| | | | | | The out-of-bounds index pattern is handled by InstSimplify. llvm-svn: 361569
* [InstSimplify] insertelement V, undef, ? --> VSanjay Patel2019-05-231-4/+0
| | | | | | | | This was part of InstCombine, but it's better placed in InstSimplify. InstCombine also had an unreachable but weaker fold for insertelement with undef index, so that is deleted. llvm-svn: 361559
* [InstCombine] be more careful when transforming a shuffle maskSanjay Patel2019-05-231-4/+21
| | | | | | | | | | | This is reduced from a fuzzer test: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14890 Usually, demanded elements should be able to simplify shuffle mask elements that are pointing to undef elements of its source operands, but that doesn't happen in the test case. llvm-svn: 361533
* [X86][InstCombine] Remove InstCombine code that turns X86 round intrinsics ↵Craig Topper2019-05-221-122/+0
| | | | | | | | | | | | | | | | | | | | | | | into llvm.ceil/floor. Remove some isel patterns that existed because that was happening. We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into llvm.floor/ceil. The llvm.ceil/floor intrinsics are supposed to correspond to the libm functions. For the libm functions we need to disable the precision exception so the llvm.floor/ceil functions should always map to encodings 0x9 and 0xA. We had a mix of isel patterns where some used 0x9 and 0xA and others used 0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA. Since we have no way in isel of knowing where the llvm.ceil/floor came from, we can't map X86 specific intrinsics with encodings 1 or 2 to it. We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like to see a use case and optimization advantage first. I've left the backend test cases to show the blend we now emit without the extra isel patterns. But I've removed the InstCombine tests completely. llvm-svn: 361425
* [InstCombine] fold shuffles of insert_subvectorsSanjay Patel2019-05-221-1/+52
| | | | | | | | | | | | | | | | | This should be a valid exception to the general rule of not creating new shuffle masks in IR... because we already do it. :) Also, DAG combining/legalization will undo this by widening the shuffle back out if needed. Explanation for how we already do this: SLP or vector source can create chains of insert/extract as shown in 1 of the examples from PR16739: https://godbolt.org/z/NlK7rA https://bugs.llvm.org/show_bug.cgi?id=16739 And we expect instcombine or DAGCombine to clean that up by creating relatively simple shuffles. Differential Revision: https://reviews.llvm.org/D62024 llvm-svn: 361338
* [NFC][InstCombine] Add FIXME for one-use check on constant negation transforms.Cameron McInally2019-05-201-0/+2
| | | | llvm-svn: 361197
* [InstCombine] Add visitFNeg(...) visitor for unary FnegCameron McInally2019-05-201-13/+27
| | | | | | | | Also, break out a helper function, namely foldFNegIntoConstant(...), which performs transforms common between visitFNeg(...) and visitFSub(...). Differential Revision: https://reviews.llvm.org/D61693 llvm-svn: 361188
* [InstCombine] move bitcast after insertelement-with-bitcasted-operandsSanjay Patel2019-05-171-0/+14
| | | | llvm-svn: 361058
* [InstCombine] canShiftBinOpWithConstantRHS(): drop bogus signbit checkRoman Lebedev2019-05-171-26/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: In D61918 i was looking at dropping it in DAGCombiner `visitShiftByConstant()`, but as @craig.topper pointed out, it was copied from here. That check claims that the transform is illegal otherwise. That isn't true: 1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter https://rise4fun.com/Alive/K4A 2. For `ISD::AND`, there is no restriction on constants: https://rise4fun.com/Alive/Wy3 3. For `ISD::OR`, there is no restriction on constants: https://rise4fun.com/Alive/GOH 3. For `ISD::XOR`, there is no restriction on constants: https://rise4fun.com/Alive/ml6 So, why is it there then? As far as i can tell, it dates all the way back to original check-in rL7793. I think we should just drop it. Reviewers: spatel, craig.topper, efriedma, majnemer Reviewed By: spatel Subscribers: llvm-commits, craig.topper Tags: #llvm Differential Revision: https://reviews.llvm.org/D61938 llvm-svn: 361043
* [InstCombine] try harder to form rotate (funnel shift) (PR20750)Sanjay Patel2019-05-131-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a similar match for patterns ending in a truncate. This should be ok for all targets because the default expansion would still likely be better from replacing 2 'and' ops with 1. Attempt to show the logic equivalence in Alive (which doesn't currently have funnel-shift in its vocabulary AFAICT): %shamt = zext i8 %i to i32 %m = and i32 %shamt, 31 %neg = sub i32 0, %shamt %and4 = and i32 %neg, 31 %shl = shl i32 %v, %m %shr = lshr i32 %v, %and4 %or = or i32 %shr, %shl => %a = and i8 %i, 31 %shamt2 = zext i8 %a to i32 %neg2 = sub i32 0, %shamt2 %and4 = and i32 %neg2, 31 %shl = shl i32 %v, %shamt2 %shr = lshr i32 %v, %and4 %or = or i32 %shr, %shl https://rise4fun.com/Alive/V9r llvm-svn: 360605
* Add InstCombine::visitFNeg(...)Cameron McInally2019-05-102-0/+10
| | | | | | Differential Revision: https://reviews.llvm.org/D61784 llvm-svn: 360461
* [InstCombine] When turning sext into zext due to known bits, return the new ↵Craig Topper2019-05-081-4/+2
| | | | | | | | | | | | ZExt instead of calling replaceinstuseswith The worklist loop that we're returning back to should be able to do the repacement itself. This is how we normally do replacements. My main motivation was that I observed that we weren't preserving the name of the result when we do this transform. The replacement code in the worklist loop will call takeName as part of the replacement. Differential Revision: https://reviews.llvm.org/D61695 llvm-svn: 360284
* [InstCombine] Add new combine to add foldingRobert Lougher2019-05-071-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (X | C1) + C2 --> (X | C1) ^ C1 iff (C1 == -C2) I verified the correctness using Alive: https://rise4fun.com/Alive/YNV This transform enables the following transform that already exists in instcombine: (X | Y) ^ Y --> X & ~Y As a result, the full expected transform is: (X | C1) + C2 --> X & ~C1 iff (C1 == -C2) There already exists the transform in the sub case: (X | Y) - Y --> X & ~Y However this does not trigger in the case where Y is constant due to an earlier transform: X - (-C) --> X + C With this new add fold, both the add and sub constant cases are handled. Patch by Chris Dawson. Differential Revision: https://reviews.llvm.org/D61517 llvm-svn: 360185
* [InstCombine] allow sinking fneg operands through an FP min/maxSanjay Patel2019-05-071-5/+5
| | | | | | | | | Fundamentally/generally, we should not have to rely on bailouts/crippling of folds. In this particular case, I think we always recognize the inverted predicate min/max pattern, so there should not be any loss of optimization. Codegen looks better because we are eliminating an fneg. llvm-svn: 360180
* [InstCombine] sink FP negation of operands through selectSanjay Patel2019-05-061-0/+12
| | | | | | | | | | | | | | We don't always get this: Cond ? -X : -Y --> -(Cond ? X : Y) ...even with the legacy IR form of fneg in the case with extra uses, and we miss matching with the newer 'fneg' instruction because we are expecting binops through the rest of the path. Differential Revision: https://reviews.llvm.org/D61604 llvm-svn: 360075
* [InstCombine] reduce code duplication; NFCSanjay Patel2019-05-061-7/+7
| | | | llvm-svn: 360059
* [InstCombine] reduce code duplication; NFCISanjay Patel2019-05-061-22/+19
| | | | llvm-svn: 360051
* Move Value *RHSCIOp def into the scope where its actually used. NFCI.Simon Pilgrim2019-05-051-2/+1
| | | | llvm-svn: 359973
OpenPOWER on IntegriCloud