path: root/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
Commit message (Author, Age, Files, Lines)
* [InstCombine] Simplify cttz/ctlz + icmp eq/ne into mask check (Nikita Popov, 2018-12-18, 1 file, -3/+22)
  Checking whether a number has a certain number of trailing / leading zeros means checking whether
  it is of the form XXXX1000 / 0001XXXX, which can be done with an and+icmp.
  Related to https://bugs.llvm.org/show_bug.cgi?id=28668. As a next step, this can be extended to
  non-equality predicates.
  Differential Revision: https://reviews.llvm.org/D55745
  llvm-svn: 349530
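As a sketch of the equality case (my own example with made-up constants, not taken from the patch's tests): asking whether cttz(x) is exactly 3 is the same as asking whether the low four bits of x are 0b1000:

```
; before: does %x have exactly three trailing zeros?
%tz = call i8 @llvm.cttz.i8(i8 %x, i1 false)
%r  = icmp eq i8 %tz, 3

; after: the low four bits must be exactly 0b1000
%m = and i8 %x, 15
%r = icmp eq i8 %m, 8
```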
* [InstCombine] Fix negative GEP offset evaluation for 32-bit pointers (Nikita Popov, 2018-12-12, 1 file, -5/+3)
  This fixes https://bugs.llvm.org/show_bug.cgi?id=39908.
  The evaluateGEPOffsetExpression() function simplifies GEP offsets for use in comparisons against
  zero, basically by converting X*Scale+Offset==0 to X+Offset/Scale==0 if Scale divides Offset.
  However, before this is done, Offset is masked down to the pointer size. This results in incorrect
  results for negative Offsets, because we basically end up dividing the 32-bit offset *zero* extended
  to 64 bits (rather than sign extended).
  Fix this by explicitly sign extending the truncated value.
  Differential Revision: https://reviews.llvm.org/D55449
  llvm-svn: 348987
* [InstCombine] foldICmpWithLowBitMaskedVal(): don't miscompile -1 vector elts (Roman Lebedev, 2018-12-06, 1 file, -0/+4)
  I was finally able to quantify what I thought was missing in the fix: it was vector constants.
  If we have a scalar (and %x, -1), it will be instsimplified before we reach this code, but if it
  is a vector, we may still have a -1 element.
  Thus, we want to avoid the fold if *at least one* element is -1. Or in other words, ignoring the
  undef elements, no sign bits should be set. Thus, m_NonNegative().
  A follow-up for rL348181.
  https://bugs.llvm.org/show_bug.cgi?id=39861
  llvm-svn: 348462
* [InstCombine] simplify icmps with same operands based on dominating cmp (Sanjay Patel, 2018-12-05, 1 file, -0/+5)
  The tests here are based on the motivating cases from D54827.
  More background:
  1. We don't get these cases in general with SimplifyCFG because the root of the pattern match is
     an icmp, not a branch. I'm not sure how often we encounter this pattern vs. the seemingly more
     likely case with branches, but I don't see evidence to leave the minimal pattern unoptimized.
  2. This has a chance of increasing compile-time because we're using a ValueTracking call to handle
     the match. The motivating cases could be handled with a simpler pair of calls to
     isImpliedTrueByMatchingCmp/isImpliedFalseByMatchingCmp, but I saw that we have a more
     comprehensive wrapper around those, so we might as well use it here unless there's evidence
     that it's significantly slower.
  3. Ideally, we'd handle the fold to constants in InstSimplify, but as with the existing code here,
     we could extend this to handle cases where the result is not a constant, but a new combined
     predicate. That would mean splitting the logic across the 2 passes and possibly duplicating the
     pattern-matching cost.
  4. As mentioned in D54827, this seems like the kind of thing that should be handled in Correlated
     Value Propagation, but that pass is currently limited to dealing with instructions with constant
     operands, so extending this bit of InstCombine is the smallest/easiest way to get these patterns
     optimized.
  llvm-svn: 348367
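For illustration (a hand-written sketch, not one of the patch's tests), this is the shape of pattern the fold targets: a compare whose operands match a dominating compare can be simplified using the known branch direction:

```
define i1 @dom_example(i32 %x, i32 %y) {
entry:
  %c1 = icmp sgt i32 %x, %y
  br i1 %c1, label %t, label %f

t:                                   ; %x > %y is known to hold here
  %c2 = icmp slt i32 %x, %y          ; same operands as %c1 -> folds to false
  ret i1 %c2

f:
  ret i1 false
}
```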
* [InstCombine] rearrange foldICmpWithDominatingICmp; NFC (Sanjay Patel, 2018-12-04, 1 file, -33/+45)
  Move it out from under the constant check, reorder predicates, add comments.
  This makes it easier to extend to handle the non-constant case.
  llvm-svn: 348284
* [InstCombine] add helper for icmp with dominator; NFC (Sanjay Patel, 2018-12-04, 1 file, -20/+23)
  There's a potential small enhancement to this code that could solve the cases currently under
  proposal in D54827 via SimplifyCFG. Whether instcombine should be doing this kind of semi-non-local
  analysis in the first place is an open question, but separating the logic out can only help if/when
  we decide to move it to a different pass.
  AFAICT, any proposal to do this in SimplifyCFG could also be seen as an overreach + it would be
  incomplete to start the fold from a branch rather than an icmp.
  There's another question here about the code for processUGT_ADDCST_ADD(). That part may be
  completely dead after rL234638?
  llvm-svn: 348273
* [InstCombine] foldICmpWithLowBitMaskedVal(): disable 2 faulty folds. (Roman Lebedev, 2018-12-03, 1 file, -0/+4)
  These two folds are invalid for this non-constant pattern when the mask ends up being all-ones:
  https://rise4fun.com/Alive/9au
  https://rise4fun.com/Alive/UcQM
  Fixes https://bugs.llvm.org/show_bug.cgi?id=39861
  llvm-svn: 348181
* [InstCombine] propagate FMF for fcmp+fabs folds (Sanjay Patel, 2018-11-07, 1 file, -7/+13)
  By morphing the instruction rather than deleting and creating a new one, we retain
  fast-math-flags and potentially other metadata (profile info?).
  llvm-svn: 346331
* [InstCombine] peek through fabs() when checking isnan() (Sanjay Patel, 2018-11-07, 1 file, -1/+6)
  That should be the end of the missing cases for this fold.
  See earlier patches in this series: rL346321, rL346324
  llvm-svn: 346327
* [InstCombine] add folds for fcmp Pred fabs(X), 0.0 (Sanjay Patel, 2018-11-07, 1 file, -0/+8)
  Similar to rL346321, we had folds for the ordered versions of these compares already,
  so add the unordered siblings for completeness.
  llvm-svn: 346324
* [InstCombine] add fold for fabs(X) u< 0.0 (Sanjay Patel, 2018-11-07, 1 file, -0/+5)
  The sibling fold for 'oge' --> 'ord' was already here, but this half was missing.
  The result of fabs() must be positive or nan, so asking if the result is negative or nan is
  the same as asking if the result is nan.
  This is another step towards fixing: https://bugs.llvm.org/show_bug.cgi?id=39475
  llvm-svn: 346321
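A hedged sketch of that reasoning in LLVM IR (my own illustration, not copied from the patch): since fabs cannot return a negative value, an unordered less-than-zero test can only succeed on nan, which is exactly the 'uno' predicate:

```
; before
%f = call double @llvm.fabs.f64(double %x)
%r = fcmp ult double %f, 0.0

; after: true only when %x is nan
%r = fcmp uno double %x, 0.0
```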
* [IR] add optional parameter for copying IR flags to compare instructions (Sanjay Patel, 2018-11-07, 1 file, -29/+14)
  As shown, this is used to eliminate redundant code in InstCombine, and there are more cases where
  we should be using this pattern, but we're currently unintentionally dropping flags.
  llvm-svn: 346282
* [InstCombine] allow vector types for fcmp+fpext fold (Sanjay Patel, 2018-11-06, 1 file, -9/+9)
  llvm-svn: 346245
* [InstCombine] propagate fast-math-flags when folding fcmp+fpext, part 2 (Sanjay Patel, 2018-11-06, 1 file, -3/+6)
  llvm-svn: 346242
* [InstCombine] rearrange code for fcmp+fpext; NFCI (Sanjay Patel, 2018-11-06, 1 file, -29/+27)
  llvm-svn: 346241
* [InstCombine] propagate fast-math-flags when folding fcmp+fpext (Sanjay Patel, 2018-11-06, 1 file, -5/+7)
  llvm-svn: 346240
* [InstCombine] propagate fast-math-flags when folding fcmp+fneg, part 2 (Sanjay Patel, 2018-11-06, 1 file, -2/+3)
  llvm-svn: 346238
* [InstCombine] reduce code; NFC (Sanjay Patel, 2018-11-06, 1 file, -1/+1)
  llvm-svn: 346235
* [InstCombine] propagate fast-math-flags when folding fcmp+fneg (Sanjay Patel, 2018-11-06, 1 file, -11/+16)
  This is another part of solving PR39475: https://bugs.llvm.org/show_bug.cgi?id=39475
  This might be enough to fix that particular issue, but as noted with the FIXME, we're still
  dropping FMF on other folds around here.
  llvm-svn: 346234
* [InstCombine] canonicalize -0.0 to +0.0 in fcmp (Sanjay Patel, 2018-11-05, 1 file, -0/+7)
  As stated in IEEE-754 and discussed in https://bugs.llvm.org/show_bug.cgi?id=38086,
  the sign of zero does not affect any FP compare predicate.
  Known regressions were fixed with: rL346097 (D54001), rL346143
  The transform will help reduce pattern-matching complexity to solve
  https://bugs.llvm.org/show_bug.cgi?id=39475, as well as improve CSE and codegen
  (a zero constant is almost always easier to produce than 0x80..00).
  llvm-svn: 346147
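A minimal hand-written illustration of the canonicalization (the predicate here is arbitrary, since the commit states the sign of zero does not affect any fcmp predicate):

```
; before
%r = fcmp olt double %x, -0.0

; after: same result for any predicate
%r = fcmp olt double %x, 0.0
```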
* [InstCombine] refactor fabs+fcmp fold; NFC (Sanjay Patel, 2018-10-31, 1 file, -39/+45)
  Also, remove/replace/minimize/enhance the tests for this fold.
  The code drops FMF, so it needs more tests and at least 1 fix.
  llvm-svn: 345734
* [InstCombine] add assertion that InstSimplify has folded a fabs+fcmp; NFC (Sanjay Patel, 2018-10-31, 1 file, -2/+5)
  The 'OLT' case was updated at rL266175, so I assume it was just an oversight that 'UGE' was not
  included because that patch handled both predicates in InstSimplify.
  llvm-svn: 345727
* [InstSimplify] fold 'fcmp nnan oge X, 0.0' when X is not negative (Sanjay Patel, 2018-10-31, 1 file, -2/+3)
  This re-raises some of the open questions about how to apply and use fast-math-flags in IR from
  PR38086 (https://bugs.llvm.org/show_bug.cgi?id=38086), but given the current implementation
  (no FMF on casts), this is likely the only way to predicate the transform.
  This is part of solving PR39475: https://bugs.llvm.org/show_bug.cgi?id=39475
  Differential Revision: https://reviews.llvm.org/D53874
  llvm-svn: 345725
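One plausible instance of the fold, assuming X is produced by an unsigned-to-FP conversion (my reading of the "no FMF on casts" remark; illustrative only):

```
; %conv is non-negative by construction
%conv = uitofp i32 %i to double
%cmp  = fcmp nnan oge double %conv, 0.0
; with nnan assumed, %conv is neither negative nor nan, so %cmp simplifies to true
```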
* [InstCombine] use 'match' to reduce code; NFC (Sanjay Patel, 2018-10-30, 1 file, -92/+90)
  llvm-svn: 345647
* [InstCombine] use getFltSemantics() instead of duplicating it; NFC (Sanjay Patel, 2018-10-30, 1 file, -19/+3)
  llvm-svn: 345613
* [InstCombine] reverse 'trunc X to <N x i1>' canonicalization; 2nd try (Sanjay Patel, 2018-10-10, 1 file, -0/+7)
  Re-trying r344082 because it unintentionally included extra diffs.
  Original commit message:
  icmp ne (and X, 1), 0 --> trunc X to N x i1
  Ideally, we'd do the same for scalars, but there will likely be regressions unless we add more
  trunc folds as we're doing here for vectors.
  The motivating vector case is from PR37549: https://bugs.llvm.org/show_bug.cgi?id=37549
    define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {
      %c = fcmp ole <4 x float> %x, %y
      %s = sext <4 x i1> %c to <4 x i32>
      %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
      %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
      %cond = or <4 x i32> %s1, %s2
      %condtr = trunc <4 x i32> %cond to <4 x i1>
      %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
      ret <4 x float> %r
    }
  Here's a sampling of the vector codegen for that case using mask+icmp (current behavior) vs. trunc (with this patch):
  AVX before:
    vcmpleps %xmm1, %xmm0, %xmm0
    vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
    vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
    vorps %xmm0, %xmm1, %xmm0
    vandps LCPI0_0(%rip), %xmm0, %xmm0
    vxorps %xmm1, %xmm1, %xmm1
    vpcmpeqd %xmm1, %xmm0, %xmm0
    vblendvps %xmm0, %xmm3, %xmm2, %xmm0
  AVX after:
    vcmpleps %xmm1, %xmm0, %xmm0
    vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
    vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
    vorps %xmm0, %xmm1, %xmm0
    vblendvps %xmm0, %xmm2, %xmm3, %xmm0
  AVX512f before:
    vcmpleps %xmm1, %xmm0, %xmm0
    vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
    vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
    vorps %xmm0, %xmm1, %xmm0
    vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
    vptestnmd %zmm1, %zmm0, %k1
    vblendmps %zmm3, %zmm2, %zmm0 {%k1}
  AVX512f after:
    vcmpleps %xmm1, %xmm0, %xmm0
    vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
    vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
    vorps %xmm0, %xmm1, %xmm0
    vpslld $31, %xmm0, %xmm0
    vptestmd %zmm0, %zmm0, %k1
    vblendmps %zmm2, %zmm3, %zmm0 {%k1}
  AArch64 before:
    fcmge v0.4s, v1.4s, v0.4s
    zip1 v1.4s, v0.4s, v0.4s
    zip2 v0.4s, v0.4s, v0.4s
    orr v0.16b, v1.16b, v0.16b
    movi v1.4s, #1
    and v0.16b, v0.16b, v1.16b
    cmeq v0.4s, v0.4s, #0
    bsl v0.16b, v3.16b, v2.16b
  AArch64 after:
    fcmge v0.4s, v1.4s, v0.4s
    zip1 v1.4s, v0.4s, v0.4s
    zip2 v0.4s, v0.4s, v0.4s
    orr v0.16b, v1.16b, v0.16b
    bsl v0.16b, v2.16b, v3.16b
  PowerPC-le before:
    xvcmpgesp 34, 35, 34
    vspltisw 0, 1
    vmrglw 3, 2, 2
    vmrghw 2, 2, 2
    xxlor 0, 35, 34
    xxlxor 35, 35, 35
    xxland 34, 0, 32
    vcmpequw 2, 2, 3
    xxsel 34, 36, 37, 34
  PowerPC-le after:
    xvcmpgesp 34, 35, 34
    vmrglw 3, 2, 2
    vmrghw 2, 2, 2
    xxlor 0, 35, 34
    xxsel 34, 37, 36, 0
  Differential Revision: https://reviews.llvm.org/D52747
  llvm-svn: 344181
* revert r344082: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization (Sanjay Patel, 2018-10-10, 1 file, -7/+0)
  This commit accidentally included the diffs from D53057.
  llvm-svn: 344178
* [InstCombine] reverse 'trunc X to <N x i1>' canonicalization (Sanjay Patel, 2018-10-09, 1 file, -0/+7)
  icmp ne (and X, 1), 0 --> trunc X to N x i1
  Ideally, we'd do the same for scalars, but there will likely be regressions unless we add more
  trunc folds as we're doing here for vectors.
  The motivating vector case is from PR37549: https://bugs.llvm.org/show_bug.cgi?id=37549
    define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {
      %c = fcmp ole <4 x float> %x, %y
      %s = sext <4 x i1> %c to <4 x i32>
      %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
      %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
      %cond = or <4 x i32> %s1, %s2
      %condtr = trunc <4 x i32> %cond to <4 x i1>
      %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
      ret <4 x float> %r
    }
  Here's a sampling of the vector codegen for that case using mask+icmp (current behavior) vs. trunc (with this patch):
  AVX before:
    vcmpleps %xmm1, %xmm0, %xmm0
    vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
    vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
    vorps %xmm0, %xmm1, %xmm0
    vandps LCPI0_0(%rip), %xmm0, %xmm0
    vxorps %xmm1, %xmm1, %xmm1
    vpcmpeqd %xmm1, %xmm0, %xmm0
    vblendvps %xmm0, %xmm3, %xmm2, %xmm0
  AVX after:
    vcmpleps %xmm1, %xmm0, %xmm0
    vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
    vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
    vorps %xmm0, %xmm1, %xmm0
    vblendvps %xmm0, %xmm2, %xmm3, %xmm0
  AVX512f before:
    vcmpleps %xmm1, %xmm0, %xmm0
    vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
    vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
    vorps %xmm0, %xmm1, %xmm0
    vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
    vptestnmd %zmm1, %zmm0, %k1
    vblendmps %zmm3, %zmm2, %zmm0 {%k1}
  AVX512f after:
    vcmpleps %xmm1, %xmm0, %xmm0
    vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
    vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
    vorps %xmm0, %xmm1, %xmm0
    vpslld $31, %xmm0, %xmm0
    vptestmd %zmm0, %zmm0, %k1
    vblendmps %zmm2, %zmm3, %zmm0 {%k1}
  AArch64 before:
    fcmge v0.4s, v1.4s, v0.4s
    zip1 v1.4s, v0.4s, v0.4s
    zip2 v0.4s, v0.4s, v0.4s
    orr v0.16b, v1.16b, v0.16b
    movi v1.4s, #1
    and v0.16b, v0.16b, v1.16b
    cmeq v0.4s, v0.4s, #0
    bsl v0.16b, v3.16b, v2.16b
  AArch64 after:
    fcmge v0.4s, v1.4s, v0.4s
    zip1 v1.4s, v0.4s, v0.4s
    zip2 v0.4s, v0.4s, v0.4s
    orr v0.16b, v1.16b, v0.16b
    bsl v0.16b, v2.16b, v3.16b
  PowerPC-le before:
    xvcmpgesp 34, 35, 34
    vspltisw 0, 1
    vmrglw 3, 2, 2
    vmrghw 2, 2, 2
    xxlor 0, 35, 34
    xxlxor 35, 35, 35
    xxland 34, 0, 32
    vcmpequw 2, 2, 3
    xxsel 34, 36, 37, 34
  PowerPC-le after:
    xvcmpgesp 34, 35, 34
    vmrglw 3, 2, 2
    vmrghw 2, 2, 2
    xxlor 0, 35, 34
    xxsel 34, 37, 36, 0
  Differential Revision: https://reviews.llvm.org/D52747
  llvm-svn: 344082
* [InstCombine] Handle vector compares in foldGEPIcmp(), take 2 (Jesper Antonsson, 2018-10-01, 1 file, -1/+2)
  Summary: This is a continuation of the fix for PR34627 "InstCombine assertion at vector gep/icmp
  folding". (I just realized bugpoint had fuzzed the original test for me, so I had fixed another
  trigger of the same assert in adjacent code in InstCombine.) This patch avoids optimizing an icmp
  (to look only at the base pointers) when the resulting icmp would have a different type. The patch
  adds a testcase and also cleans up and shrinks the pre-existing test for the adjacent assert trigger.
  Reviewers: lebedev.ri, majnemer, spatel
  Reviewed By: lebedev.ri
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D52494
  llvm-svn: 343486
* [InstCombine] Without infinites, fold (C / X) < 0.0 --> (X < 0) (Sanjay Patel, 2018-09-27, 1 file, -0/+55)
  When C is not zero and infinities are not allowed, (C / X) > 0 is a sign test.
  Depending on the sign of C, the predicate must be swapped.
  E.g.:
    foo(double X) { if ((-2.0 / X) <= 0) ... }
  =>
    foo(double X) { if (X >= 0) ... }
  Patch by: @marels (Martin Elshuber)
  Differential Revision: https://reviews.llvm.org/D51942
  llvm-svn: 343228
* [InstCombine] Handle vector compares in foldGEPIcmp() (Jesper Antonsson, 2018-09-20, 1 file, -1/+1)
  Summary: This is to fix PR38984 "InstCombine assertion at vector gep/icmp folding":
  https://bugs.llvm.org/show_bug.cgi?id=38984
  Reviewers: majnemer, spatel, lattner, lebedev.ri
  Reviewed By: lebedev.ri
  Subscribers: lebedev.ri, llvm-commits
  Differential Revision: https://reviews.llvm.org/D52263
  llvm-svn: 342647
* [InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((-1 << y) >> y) mask (Roman Lebedev, 2018-09-19, 1 file, -5/+7)
  Summary: The last low-bit-mask-pattern-producing-pattern I can think of.
  https://rise4fun.com/Alive/UGzE <- non-canonical
  But we can not canonicalize it because of extra uses.
  https://bugs.llvm.org/show_bug.cgi?id=38123
  Reviewers: spatel, craig.topper, RKSimon
  Reviewed By: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D52148
  llvm-svn: 342548
* [InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((1 << y)+(-1)) mask (Roman Lebedev, 2018-09-19, 1 file, -2/+5)
  Summary: Same as D52146. `((1 << y)+(-1))` is simply a non-canonical version of `~(-1 << y)`:
  https://rise4fun.com/Alive/0vl
  We can not canonicalize it due to the extra uses. But we can handle it here.
  Reviewers: spatel, craig.topper, RKSimon
  Reviewed By: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D52147
  llvm-svn: 342547
* [InstCombine] foldICmpWithLowBitMaskedVal(): handle ~(-1 << y) mask (Roman Lebedev, 2018-09-19, 1 file, -2/+7)
  Summary: Two folds are happening here:
  1. https://rise4fun.com/Alive/oaFX
  2. And then `foldICmpWithHighBitMask()` (D52001): https://rise4fun.com/Alive/wsP4
  This change doesn't just add the handling for eq/ne predicates, it actually builds upon the
  previous `foldICmpWithLowBitMaskedVal()` work, so **all** the 16 fold variants* are immediately
  supported.
  I'm indeed only testing these two predicates. I do not feel like re-proving all 16 folds*, because
  they were already proven for the general case of constant with all-ones in low bits. So as long as
  the mask produces all-ones in low bits, I'm pretty sure the fold is valid. But if required, I can
  re-prove, let me know.
  * eq/ne are commutative - 4 folds; ult/ule/ugt/uge are not commutative (the commuted variant is
    InstSimplified) - 4 folds; slt/sle/sgt/sge are not commutative - 4 folds. 12 folds in total.
  https://bugs.llvm.org/show_bug.cgi?id=38123
  https://bugs.llvm.org/show_bug.cgi?id=38708
  Reviewers: spatel, craig.topper, RKSimon
  Reviewed By: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D52146
  llvm-svn: 342546
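For orientation, a hand-written sketch of the pattern shape named in the title (illustrative only; the Alive links above carry the actual proofs and the folded forms):

```
; "does %x fit in the low %y bits?" spelled with a variable mask
%notmask = shl i32 -1, %y            ; -1 << y
%mask    = xor i32 %notmask, -1      ; ~(-1 << y): ones in the low y bits
%masked  = and i32 %mask, %x
%r       = icmp eq i32 %masked, %x   ; true iff %x has no bits above the mask
```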
* [InstCombine] Inefficient pattern for high-bits checking 3 (PR38708) (Roman Lebedev, 2018-09-15, 1 file, -6/+11)
  Summary: It is sometimes important to check that some newly-computed value is non-negative and
  only n bits wide (where n is a variable). There are many ways to check that:
  https://godbolt.org/z/o4RB8D
  The last variant seems best? (I'm sure there are some other variations I haven't thought of.)
  The last (as far as I know?) pattern, non-canonical due to the extra use.
  https://godbolt.org/z/aCMsPk
  https://rise4fun.com/Alive/I6f
  https://bugs.llvm.org/show_bug.cgi?id=38708
  Reviewers: spatel, craig.topper, RKSimon
  Reviewed By: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D52062
  llvm-svn: 342321
* [InstCombine] Inefficient pattern for high-bits checking 2 (PR38708) (Roman Lebedev, 2018-09-13, 1 file, -19/+36)
  Summary: It is sometimes important to check that some newly-computed value is non-negative and
  only n bits wide (where n is a variable). There are many ways to check that:
  https://godbolt.org/z/o4RB8D
  The last variant seems best? (I'm sure there are some other variations I haven't thought of.)
  More complicated, canonical pattern: https://rise4fun.com/Alive/uhA
  We do need to have two `switch()`'es like this, to not mismatch the swappable predicates.
  https://bugs.llvm.org/show_bug.cgi?id=38708
  Reviewers: spatel, craig.topper, RKSimon
  Reviewed By: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D52001
  llvm-svn: 342173
* [InstCombine] Inefficient pattern for high-bits checking (PR38708) (Roman Lebedev, 2018-09-12, 1 file, -0/+38)
  Summary: It is sometimes important to check that some newly-computed value is non-negative and
  only `n` bits wide (where `n` is a variable). There are **many** ways to check that:
  https://godbolt.org/z/o4RB8D
  The last variant seems best? (I'm sure there are some other variations I haven't thought of.)
  Let's handle the second variant first, since it is much simpler.
  https://rise4fun.com/Alive/LYjY
  https://bugs.llvm.org/show_bug.cgi?id=38708
  Reviewers: spatel, craig.topper, RKSimon
  Reviewed By: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D51985
  llvm-svn: 342067
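To make the motivation concrete, one way the property can be written in IR (my own illustration; not necessarily one of the exact variants this patch handles, see the godbolt link for those):

```
; "is %v non-negative and at most %n bits wide?" for an i32 %v, with %n < 32
%hi = lshr i32 %v, %n
%r  = icmp eq i32 %hi, 0        ; no bits set at or above position %n
```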
* [InstCombine] add folds for unsigned-overflow compares (Sanjay Patel, 2018-09-11, 1 file, -0/+12)
  Name: op_ugt_sum
    %a = add i8 %x, %y
    %r = icmp ugt i8 %x, %a
  =>
    %notx = xor i8 %x, -1
    %r = icmp ugt i8 %y, %notx
  Name: sum_ult_op
    %a = add i8 %x, %y
    %r = icmp ult i8 %a, %x
  =>
    %notx = xor i8 %x, -1
    %r = icmp ugt i8 %y, %notx
  https://rise4fun.com/Alive/ZRxI
  AFAICT, this doesn't interfere with any add-saturation patterns because those have >1 use for the
  'add'. But this should be better for IR analysis and codegen in the basic cases.
  This is another fold inspired by PR14613: https://bugs.llvm.org/show_bug.cgi?id=14613
  llvm-svn: 342004
* [InstCombine] add folds for icmp with xor mask constant (Sanjay Patel, 2018-09-11, 1 file, -10/+19)
  These are the folds in Alive:
  Name: xor_ult
  Pre: isPowerOf2(-C1)
    %xor = xor i8 %x, C1
    %r = icmp ult i8 %xor, C1
  =>
    %r = icmp ugt i8 %x, ~C1
  Name: xor_ugt
  Pre: isPowerOf2(C1+1)
    %xor = xor i8 %x, C1
    %r = icmp ugt i8 %xor, C1
  =>
    %r = icmp ugt i8 %x, C1
  https://rise4fun.com/Alive/Vty
  The ugt case in its simplest form was already handled by DemandedBits, but that's not ideal as
  shown in the multi-use test. I'm not sure if these are all of the symmetrical folds, but I adjusted
  the existing code for one of the folds to try to show the similarities.
  There's no obvious connection, but this is another preliminary step for PR14613:
  https://bugs.llvm.org/show_bug.cgi?id=14613
  llvm-svn: 341997
* InstCombine: move hasOneUse check to the top of foldICmpAddConstant (Tim Northover, 2018-09-10, 1 file, -3/+3)
  There were two combines not covered by the check before now, neither of which actually differed
  from normal in the benefit analysis.
  The most recent seems to be because it was just added at the top of the function (naturally).
  The older is from way back in 2008 (r46687) when we just didn't put those checks in so routinely,
  and has been diligently maintained since.
  llvm-svn: 341831
* [InstCombine] Fold icmp ugt/ult (add nuw X, C2), C --> icmp ugt/ult X, (C - C2) (Nicola Zaghen, 2018-09-04, 1 file, -5/+8)
  Support for sgt/slt was added in rL294898; this adds the same cases also for unsigned compares.
  This is the Alive proof: https://rise4fun.com/Alive/nyY
  Differential Revision: https://reviews.llvm.org/D50972
  llvm-svn: 341353
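A minimal hand-picked instance of the fold in the title (constants chosen arbitrarily with C > C2; the nuw flag is what justifies subtracting C2 from both sides):

```
; before: with nuw, %x + 5 cannot wrap
%a = add nuw i8 %x, 5
%r = icmp ugt i8 %a, 10

; after: compare directly against C - C2 = 10 - 5
%r = icmp ugt i8 %x, 5
```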
* [InstCombine] Add splat vector constant support to foldICmpAddOpConst. (Craig Topper, 2018-08-20, 1 file, -16/+18)
  Differential Revision: https://reviews.llvm.org/D50946
  llvm-svn: 340231
* [InstCombine] move vector compare before same-shuffled ops (Sanjay Patel, 2018-08-16, 1 file, -0/+28)
  This is a step towards fixing PR37463: https://bugs.llvm.org/show_bug.cgi?id=37463
  llvm-svn: 339875
* ValueTracking: Start enhancing isKnownNeverNaN (Matt Arsenault, 2018-08-09, 1 file, -2/+2)
  llvm-svn: 339399
* [InstCombine] Re-commit: Fold 'check for [no] signed truncation' pattern (Roman Lebedev, 2018-07-18, 1 file, -0/+78)
  Summary: [[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]
  As discussed in https://reviews.llvm.org/D49179#1158957 and later, the IR for the 'check for [no]
  signed truncation' pattern can be improved: https://rise4fun.com/Alive/gBf
  ^ that pattern will be produced by the Implicit Integer Truncation sanitizer
  (https://reviews.llvm.org/D48958, https://bugs.llvm.org/show_bug.cgi?id=21530) in the signed case,
  therefore it is probably a good idea to improve it.
  The DAGCombine will reverse this transform, see https://reviews.llvm.org/D49266
  This transform is surprisingly frustrating. This does not deal with non-splat shift amounts, or
  with undef shift amounts. I've outlined what I think the solution should be:
  ```
  // Potential handling of non-splats: for each element:
  // * if both are undef, replace with constant 0.
  //   Because (1<<0) is OK and is 1, and ((1<<0)>>1) is also OK and is 0.
  // * if both are not undef, and are different, bailout.
  // * else, only one is undef, then pick the non-undef one.
  ```
  This is a re-commit, as the original patch, committed in rL337190, was reverted in rL337344 as it
  broke the Chromium build: https://bugs.llvm.org/show_bug.cgi?id=38204 and https://crbug.com/864832
  Proofs that the fixed folds are ok: https://rise4fun.com/Alive/VYM
  Differential Revision: https://reviews.llvm.org/D49320
  llvm-svn: 337376
* Revert "[InstCombine] Fold 'check for [no] signed truncation' pattern"Bob Haarman2018-07-181-69/+0
| | | | | | | | | This reverts r337190 (and a few follow-up commits), which caused the Chromium build to fail. See https://bugs.llvm.org/show_bug.cgi?id=38204 and https://crbug.com/864832 llvm-svn: 337344
* Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI. (Simon Pilgrim, 2018-07-17, 1 file, -2/+2)
  llvm-svn: 337257
* [InstCombine] Fold 'check for [no] signed truncation' pattern (Roman Lebedev, 2018-07-16, 1 file, -0/+69)
  Summary: [[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]
  As discussed in https://reviews.llvm.org/D49179#1158957 and later, the IR for the 'check for [no]
  signed truncation' pattern can be improved: https://rise4fun.com/Alive/gBf
  ^ that pattern will be produced by the Implicit Integer Truncation sanitizer
  (https://reviews.llvm.org/D48958, https://bugs.llvm.org/show_bug.cgi?id=21530) in the signed case,
  therefore it is probably a good idea to improve it.
  Proofs for this transform: https://rise4fun.com/Alive/mgu
  This transform is surprisingly frustrating. This does not deal with non-splat shift amounts, or
  with undef shift amounts. I've outlined what I think the solution should be:
  ```
  // Potential handling of non-splats: for each element:
  // * if both are undef, replace with constant 0.
  //   Because (1<<0) is OK and is 1, and ((1<<0)>>1) is also OK and is 0.
  // * if both are not undef, and are different, bailout.
  // * else, only one is undef, then pick the non-undef one.
  ```
  The DAGCombine will reverse this transform, see https://reviews.llvm.org/D49266
  Reviewers: spatel, craig.topper
  Reviewed By: spatel
  Subscribers: JDevlieghere, rkruppe, llvm-commits
  Differential Revision: https://reviews.llvm.org/D49320
  llvm-svn: 337190
* [InstCombine] fold icmp pred (sub 0, X) C for vector type (Chen Zheng, 2018-07-16, 1 file, -2/+2)
  Differential Revision: https://reviews.llvm.org/D49283
  llvm-svn: 337141
* [NFC][InstCombine] foldICmpWithLowBitMaskedVal(): update comments. (Roman Lebedev, 2018-07-14, 1 file, -2/+3)
  All predicates are handled. There do not seem to be any other possible folds here.
  There are some more folds possible with an inverted mask, though.
  llvm-svn: 337112