| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I'm circling back around to a loose end from D51929.
The backend (either CGP or DAG) doesn't recognize this pattern, so we end up with different asm for these IR variants.
Regardless of any future changes to canonicalize to saturation/overflow intrinsics, we want to get raw IR variations
into the minimal number of raw IR forms. If/when we can canonicalize to intrinsics, that will make that step easier.
Pre: C2 == ~C1
%a = add i32 %x, C1
%c = icmp ugt i32 %x, C2
%r = select i1 %c, i32 -1, i32 %a
=>
%a = add i32 %x, C1
%c2 = icmp ult i32 %x, C2
%r = select i1 %c2, i32 %a, i32 -1
https://rise4fun.com/Alive/pkH
Differential Revision: https://reviews.llvm.org/D57352
llvm-svn: 352536
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
|
| |
|
|
|
|
|
|
|
| |
This is matching the equivalent of the DAG expansion,
so it should never end up with worse perf than the
original code even if the target doesn't have a rotate
instruction.
llvm-svn: 350672
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cttz/ctlz intrinsics have a parameter specifying whether the
result is undefined for zero. cttz(x, false) can be relaxed to
cttz(x, true) if x is known non-zero, and in fact such an optimization
is already performed. However, this currently doesn't work if x is
non-zero as a result of a select rather than an explicit branch.
This patch adds handling for this case, thus allowing
x != 0 ? cttz(x, false) : y to simplify to x != 0 ? cttz(x, true) : y.
Differential Revision: https://reviews.llvm.org/D55786
llvm-svn: 350463
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The final piece of IR-level analysis to allow this was committed with:
rL350188
Using the intrinsics should improve transforms based on cost models
like vectorization and inlining.
The backend should be prepared too, so we can now canonicalize more
sequences of shift/logic to the intrinsics and know that the end
result should be equal or better to the original code even if the
target does not have an actual rotate instruction.
llvm-svn: 350199
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is an almost direct move of the functionality from InstCombine to
InstSimplify. There's no reason not to do this in InstSimplify because
we never create a new value with this transform.
(There's a question of whether any dominance-based transform belongs in
either of these passes, but that's a separate issue.)
I've changed 1 of the conditions for the fold (1 of the blocks for the
branch must be the block we started with) into an assert because I'm not
sure how that could ever be false.
We need 1 extra check to make sure that the instruction itself is in a
basic block because passes other than InstCombine may be using InstSimplify
as an analysis on values that are not wired up yet.
The 3-way compare changes show that InstCombine has some kind of
phase-ordering hole. Otherwise, we would have already gotten the intended
final result that we now show here.
llvm-svn: 347896
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
These asserts are based on the assumption that the order of true/false operands in a select and those in the compare would always be the same.
This fixes PR39595.
Reviewers: craig.topper, spatel, dmgreen
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54359
llvm-svn: 346874
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cmp+branch variant of this pattern is shown in:
https://bugs.llvm.org/show_bug.cgi?id=34924
...and as discussed there, we probably can't transform
that without a rotate intrinsic. We do have that now
via funnel shift, but we're not quite ready to
canonicalize IR to that form yet. The case with 'select'
should already be transformed though, so that's this patch.
The sequence with negation followed by masking is what we
use in the backend and partly in clang (though that part
should be updated).
https://rise4fun.com/Alive/TplC
%cmp = icmp eq i32 %shamt, 0
%sub = sub i32 32, %shamt
%shr = lshr i32 %x, %shamt
%shl = shl i32 %x, %sub
%or = or i32 %shr, %shl
%r = select i1 %cmp, i32 %x, i32 %or
=>
%neg = sub i32 0, %shamt
%masked = and i32 %shamt, 31
%maskedneg = and i32 %neg, 31
%shl2 = lshr i32 %x, %masked
%shr2 = shl i32 %x, %maskedneg
%r = or i32 %shl2, %shr2
llvm-svn: 346807
|
| |
|
|
|
|
|
|
|
| |
This is NFCI for InstCombine because it calls InstSimplify,
so I left the tests for this transform there. As noted in
the code comment, we can allow this fold more often by using
FMF and/or value tracking.
llvm-svn: 346169
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
It looks like we correctly removed edge cases with 0.0 from D50714,
but we were a bit conservative because getBinOpIdentity() doesn't
distinguish between +0.0 and -0.0 and 'nsz' is effectively always
true for fcmp (see discussion in:
https://bugs.llvm.org/show_bug.cgi?id=38086
Without this change, we would get regressions by canonicalizing
to +0.0 in all fcmp, and that's a step towards solving:
https://bugs.llvm.org/show_bug.cgi?id=39475
llvm-svn: 346143
|
| |
|
|
|
|
|
| |
This is another step towards completely removing the fake
binop queries for not/neg/fneg.
llvm-svn: 345036
|
| |
|
|
| |
llvm-svn: 345034
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The IRBuilder CreateIntrinsic method wouldn't allow you to specify the
types that you wanted the intrinsic to be mangled with. To fix this
I've:
- Added an ArrayRef<Type *> member to both CreateIntrinsic overloads.
- Used that array to pass into the Intrinsic::getDeclaration call.
- Added a CreateUnaryIntrinsic to replace the most common use of
CreateIntrinsic where the type was auto-deduced from operand 0.
- Added a bunch more unit tests to test Create*Intrinsic calls that
weren't being tested (including the FMF flag that wasn't checked).
This was suggested as part of the AMDGPU specific atomic optimizer
review (https://reviews.llvm.org/D51969).
Differential Revision: https://reviews.llvm.org/D52087
llvm-svn: 343962
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
invertible
Summary: This restores the combine that was reverted in r341883. The infinite loop from the failing test no longer occurs due to changes from r342163.
Reviewers: spatel, dmgreen
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52070
llvm-svn: 342797
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
are freely invertible.
This allows the xor to be removed completely.
This might help with recomitting r341674, but seems good regardless.
Coincidentally fixes PR38915.
Differential Revision: https://reviews.llvm.org/D51964
llvm-svn: 342163
|
| |
|
|
|
|
|
|
|
| |
I accidentally committed this diff with rL342147 because
I had applied D51964. We probably do need those checks,
but D51964 has tests and more discussion/motivation,
so they should be re-added with that patch.
llvm-svn: 342149
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I don't have a test case for this, but it's motivated by
the discussion in D51964, and I've added TODO comments for
the better fix - move simplifications into instsimplify
because that's more efficient and reduces risk of infinite
loops in instcombine caused by transforms trying to do the
opposite folds.
In this case, we know that the transform that tries to move
'not' through min/max can be fooled by the multiple uses
of a value in another min/max, so try to squash the
foldSPFofSPF() patterns first.
llvm-svn: 342147
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Revert min/max changes in rL341674 dues to high compile times causing timeouts (PR38897).
Checking in to unblock failing builds. Patch available for post-commit review and re-revert once resolved.
Working on a smaller reproducer for PR38897.
Reviewers: craig.topper, spatel
Subscribers: sanjoy, jlebar, llvm-commits
Differential Revision: https://reviews.llvm.org/D51897
llvm-svn: 341883
|
| |
|
|
|
|
|
|
|
|
|
|
| |
invertible
If the ~X wasn't able to simplify above the max/min, we might be able to simplify it by moving it below the max/min.
I had to modify the ~(min/max ~X, Y) transform to prevent getting stuck in a loop when we saw the new ~(max/min X, ~Y) before the ~Y had been folded away to remove the new not.
Differential Revision: https://reviews.llvm.org/D51398
llvm-svn: 341674
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
If OtherOpT or OtherOpF have scalar types and the condition is a vector,
we would create an invalid select.
Reviewers: spatel, john.brawn, mssimpso, craig.topper
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D51781
llvm-svn: 341666
|
| |
|
|
|
|
| |
We were calling getNumUses to check for 1 or 2 uses. But getNumUses is linear in the number of uses. We can instead use !hasNUsesOrMore(3) which will stop the linear scan as soon as it determines there are at least 3 uses even if there are more.
llvm-svn: 340939
|
| |
|
|
| |
llvm-svn: 340790
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Follow up for https://reviews.llvm.org/rL339520 and https://reviews.llvm.org/rL338300
Alive:
```
%A = fcmp oeq float %x, 0.0
%B = fadd nsz float %x, %z
%C = select i1 %A, float %B, float %y
=>
%C = select i1 %A, float %z, float %y
----------
%A = fcmp oeq float %x, 0.0
%B = fadd nsz float %x, %z
%C = select %A, float %B, float %y
=>
%C = select %A, float %z, float %y
Done: 1
Optimization is correct
%A = fcmp une float %x, -0.0
%B = fadd nsz float %x, %z
%C = select i1 %A, float %y, float %B
=>
%C = select i1 %A, float %y, float %z
----------
%A = fcmp une float %x, -0.0
%B = fadd nsz float %x, %z
%C = select %A, float %y, float %B
=>
%C = select %A, float %y, float %z
Done: 1
Optimization is correct
```
Reviewers: spatel, lebedev.ri
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50714
llvm-svn: 340538
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
intersection
Summary: This change address bug 38641
Reviewers: spatel, wristow
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D50996
llvm-svn: 340222
|
| |
|
|
| |
llvm-svn: 340150
|
| |
|
|
| |
llvm-svn: 339938
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Basic version was merged - https://reviews.llvm.org/D49954
This adds support for FP & non-commutative opcodes
Precommited tests: https://reviews.llvm.org/rL338727
Reviewers: spatel, lebedev.ri
Reviewed By: spatel
Subscribers: jfb
Differential Revision: https://reviews.llvm.org/D50190
llvm-svn: 339520
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This is a retry of rL339439 with a fix for the problem that
caused the original commit to be reverted at rL339446.
That problem was that the compare can be integer while
the binop is FP or vice-versa, so we need to use the binop
type when we ask for the identity constant.
A test to guard against the problem was added at rL339453.
llvm-svn: 339469
|
| |
|
|
|
|
|
| |
That was supposed to be NFC, but it exposed a logic hole somewhere that
caused bots to fail.
llvm-svn: 339446
|
| |
|
|
|
|
|
| |
This should make it easier to folow and to add the planned enhancements
such as D50190.
llvm-svn: 339439
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Fold
%A = icmp eq i8 %x, 0
%B = xor i8 %x, %z
%C = select i1 %A, i8 %B, i8 %y
To
%C = select i1 %A, i8 %z, i8 %y
Fixes https://bugs.llvm.org/show_bug.cgi?id=38345
Proof: https://rise4fun.com/Alive/43J
Reviewers: lebedev.ri, spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49954
llvm-svn: 338300
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D48754
llvm-svn: 338092
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D49238
llvm-svn: 337143
|
| |
|
|
|
|
|
|
|
|
| |
When adjusting a cmp in order to canonicalize an abs/nabs select pattern we need
to use the type of the existing operand when creating a new operand not the
type of a select operand, as the two may be different.
This fixes PR37686.
llvm-svn: 334019
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This is the planned enhancement to D47163 / rL333611.
We want to match cmp/select sizes because that will be recognized
as min/max more easily and lead to better codegen (especially for
vector types).
As mentioned in D47163, this improves some of the tests that would
also be folded by D46380, so we may want to adjust that patch to
match the new patterns where the extend op occurs after the select.
llvm-svn: 333689
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already do this for min/max (see the blob above the diff),
so we should do the same for abs/nabs.
A sign-bit check (<s 0) is used as a predicate for other IR
transforms and it's likely the best for codegen.
This might solve the motivating cases for D47037 and D47041,
but I think those patches still make sense. We can't guarantee
this canonicalization if the icmp has more than one use.
Differential Revision: https://reviews.llvm.org/D47076
llvm-svn: 332819
|
| |
|
|
|
|
| |
min/max and ignore abs/nabs.
llvm-svn: 332770
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add logic for the special case when a cmp+select can clearly be
reduced to just a bitwise logic instruction, and remove an
over-reaching chunk of general purpose bit magic. The primary goal
is to remove cases where we are not improving the IR instruction
count when doing these select transforms, and in all cases here that
is true.
In the motivating 3-way compare tests, there are further improvements
because we can combine/propagate select values (not sure if that
belongs in instcombine, but it's there for now).
DAGCombiner has folds to turn some of these selects into bit magic,
so there should be no difference in the end result in those cases.
Not all constant combinations are handled there yet, however, so it
is possible that some targets will see more cmov/csel codegen with
this change in IR canonicalization.
Ideally, we'll go further to *not* turn selects into multiple
logic/math ops in instcombine, and we'll canonicalize to selects.
But we should make sure that this step does not result in regressions
first (and if it does, we should fix those in the backend).
The general direction for this change was discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105373.html
http://lists.llvm.org/pipermail/llvm-dev/2017-July/114885.html
Alive proofs for the new bit magic:
https://rise4fun.com/Alive/XG7
Differential Revision: https://reviews.llvm.org/D46086
llvm-svn: 331486
|
| |
|
|
|
|
|
|
|
|
| |
As discussed in D45862, we want to delete parts of
this code because it can create more instructions
than it removes. But we also want to preserve some
folds that are winners, so tidy up what's here to
make splitting the good from bad a bit easier.
llvm-svn: 330841
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The fold added in D45108 did not account for the fact that
the and instruction is commutative, and if the mask is a variable,
the mask variable and the fold variable may be swapped.
I have noticed this by accident when looking into [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]]
This extends/generalizes that fold, so it is handled too.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45539
llvm-svn: 330001
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
See [[ https://bugs.llvm.org/show_bug.cgi?id=36950 | PR36950 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=17564 | PR17564 ]], D45065, D45107
https://godbolt.org/g/iAYRup
Alive proof: https://rise4fun.com/Alive/uiH
Testing: `ninja check-llvm`
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45108
llvm-svn: 329492
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This continues the FP constant pattern matching improvements from:
https://reviews.llvm.org/rL327627
https://reviews.llvm.org/rL327339
https://reviews.llvm.org/rL327307
Several integer constant matchers also have this ability. I'm
separating matching of integer/pointer null from FP positive zero
and renaming/commenting to make the functionality clearer.
llvm-svn: 328461
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is complicated by -0.0 and nan. This is based on the DAG patterns
as shown in D44091. I'm hoping that we can just remove those DAG folds
and always rely on IR canonicalization to handle the matching to fabs.
We would still need to delete the broken code from DAGCombiner to fix
PR36600:
https://bugs.llvm.org/show_bug.cgi?id=36600
Differential Revision: https://reviews.llvm.org/D44550
llvm-svn: 327858
|
| |
|
|
|
|
|
|
|
|
| |
getNumUses is a linear time operation. It traverses the user linked list to the end and counts as it goes. Since we are only interested in small constant counts, we should use hasNUses or hasNUsesMore more that terminate the traversal as soon as it can provide the answer.
There are still two other locations in InstCombine, but changing those would force a rebase of D44266 which if accepted would remove them.
Differential Revision: https://reviews.llvm.org/D44398
llvm-svn: 327315
|
| |
|
|
| |
llvm-svn: 326828
|
| |
|
|
|
|
|
|
|
| |
ValueTracking
Most of the folds based on SelectPatternResult belong in InstSimplify rather than
InstCombine, so the helper code should be available to other passes/analysis.
llvm-svn: 326812
|
| |
|
|
|
|
|
|
|
|
|
|
| |
select(C, Z, binop(Y, W)) if the binop is rem or div.
The select may have been preventing a division by zero or INT_MIN/-1 so removing it might not be safe.
Fixes PR36362.
Differential Revision: https://reviews.llvm.org/D43276
llvm-svn: 325148
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the instcombine part of unsigned saturation canonicalization.
Backend patches already commited:
https://reviews.llvm.org/D37510
https://reviews.llvm.org/D37534
It converts unsigned saturated subtraction patterns to forms recognized
by the backend:
(a > b) ? a - b : 0 -> ((a > b) ? a : b) - b)
(b < a) ? a - b : 0 -> ((a > b) ? a : b) - b)
(b > a) ? 0 : a - b -> ((a > b) ? a : b) - b)
(a < b) ? 0 : a - b -> ((a > b) ? a : b) - b)
((a > b) ? b - a : 0) -> - ((a > b) ? a : b) - b)
((b < a) ? b - a : 0) -> - ((a > b) ? a : b) - b)
((b > a) ? 0 : b - a) -> - ((a > b) ? a : b) - b)
((a < b) ? 0 : b - a) -> - ((a > b) ? a : b) - b)
Patch by Yulia Koval!
Differential Revision: https://reviews.llvm.org/D41480
llvm-svn: 324255
|
| |
|
|
|
|
|
|
|
|
| |
Three (or more) operand getelementptrs could plausibly also be handled, but
handling only two-operand fits in easily with the existing BinaryOperator
handling.
Differential Revision: https://reviews.llvm.org/D39958
llvm-svn: 322930
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is precedence for factorization transforms in instcombine for FP ops with fast-math.
We also have similar logic in foldSPFofSPF().
It would take more work to add this to reassociate because that's specialized for binops,
and min/max are not binops (or even single instructions). Also, I don't have evidence that
larger min/max trees than this exist in real code, but if we find that's true, we might
want to reorganize where/how we do this optimization.
In the motivating example from https://bugs.llvm.org/show_bug.cgi?id=35717 , we have:
int test(int xc, int xm, int xy) {
int xk;
if (xc < xm)
xk = xc < xy ? xc : xy;
else
xk = xm < xy ? xm : xy;
return xk;
}
This patch solves that problem because we recognize more min/max patterns after rL321672
https://rise4fun.com/Alive/Qjne
https://rise4fun.com/Alive/3yg
Differential Revision: https://reviews.llvm.org/D41603
llvm-svn: 321998
|