| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
| |
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/JvS
This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.
llvm-svn: 337100
|
|
|
|
|
|
|
|
|
|
| |
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/ocb
This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.
llvm-svn: 337098
|
|
|
|
|
|
|
|
|
|
| |
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/azI
This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.
llvm-svn: 337096
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This bug was created by rL335258 because we used to always call instsimplify
after trying the associative folds. After that change it became possible
for subsequent folds to encounter unsimplified code (and potentially assert
because of it).
Instead of carrying changed state through instcombine, we can just return
immediately. This allows instsimplify to run, so we can continue assuming
that easy folds have already occurred.
llvm-svn: 336965
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch is crucial for proving equality laundered/stripped
pointers. eg:
bool foo(A *a) {
return a == std::launder(a);
}
Clang with -fstrict-vtable-pointers will emit something like:
define dso_local zeroext i1 @_Z3fooP1A(%struct.A* %a) {
entry:
%c = bitcast %struct.A* %a to i8*
%call = tail call i8* @llvm.launder.invariant.group.p0i8(i8* %c)
%0 = bitcast %struct.A* %a to i8*
%1 = tail call i8* @llvm.strip.invariant.group.p0i8(i8* %0)
%2 = tail call i8* @llvm.strip.invariant.group.p0i8(i8* %call)
%cmp = icmp eq i8* %1, %2
ret i1 %cmp
}
and because %2 can be replaced with @llvm.strip.invariant.group(%0)
and that %2 and %1 will produce the same value (because strip is readnone)
we can replace compare with true.
Reviewers: rsmith, hfinkel, majnemer, amharc, kuhar
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D47423
llvm-svn: 336963
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
A complementary fold to D49179.
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/Rny
Caveat: one more thing in `test/Transforms/InstCombine/icmp-logical.ll` breaks.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49205
llvm-svn: 336911
|
|
|
|
|
|
| |
This converts them to what clang is now using for codegen. Unfortunately, there seem to be a few kinks to work out still. I'll try to address with follow up patches.
llvm-svn: 336871
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
https://bugs.llvm.org/show_bug.cgi?id=38123
This pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958
https://bugs.llvm.org/show_bug.cgi?id=21530
in unsigned case, therefore it is probably a good idea to improve it.
https://rise4fun.com/Alive/Rny
^ there are more opportunities for folds, i will follow up with them afterwards.
Caveat: this somehow exposes a missing opportunities
in `test/Transforms/InstCombine/icmp-logical.ll`
It seems, the problem is in `foldLogOpOfMaskedICmps()` in `InstCombineAndOrXor.cpp`.
But i'm not quite sure what is wrong, because it calls `getMaskedTypeForICmpPair()`,
which calls `decomposeBitTestICmp()` which should already work for these cases...
As @spatel notes in https://reviews.llvm.org/D49179#1158760,
that code is a rather complex mess, so we'll let it slide.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: yamauchi, majnemer, t.p.northover, llvm-commits
Differential Revision: https://reviews.llvm.org/D49179
llvm-svn: 336834
|
|
|
|
|
|
|
| |
This corresponds with the code for the single binop pattern
added in rL336684.
llvm-svn: 336696
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was originally intended with D48893, but as discussed there, we
have to make the folds safe from producing extra poison. This should
give the single binop folds the same capabilities as the existing
folds for 2-binops+shuffle.
LLVM binary opcode review: there are a total of 18 binops. There are 7
commutative binops (add, mul, and, or, xor, fadd, fmul) which we already
fold. We're able to fold 6 more opcodes with this patch (shl, lshr, ashr,
fdiv, udiv, sdiv). There are no folds for srem/urem/frem AFAIK. We don't
bother with sub/fsub with constant operand 1 because those are
canonicalized to add/fadd. 7 + 6 + 3 + 2 = 18.
llvm-svn: 336684
|
|
|
|
| |
llvm-svn: 336679
|
|
|
|
|
|
|
|
|
|
|
|
| |
The case with 2 variables is more complicated than the case where
we eliminate the shuffle entirely because a shuffle with an undef
mask element creates an undef result.
I'm not aware of any current analysis/transform that recognizes that
undef propagating to a div/rem/shift, but we have to guard against
the possibility.
llvm-svn: 336668
|
|
|
|
|
|
|
|
|
|
| |
getSafeVectorConstantForBinop() was calling getBinOpIdentity() assuming
that the constant we wanted was operand 1 (RHS). That's wrong, but I
don't think we could expose a bug or even a suboptimal fold from that
because the callers have other guards for any binop that would have
been affected.
llvm-svn: 336617
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Support for this option is needed for building Linux kernel.
This is a very frequently requested feature by kernel developers.
More details : https://lkml.org/lkml/2018/4/4/601
GCC option description for -fdelete-null-pointer-checks:
This Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.
-fno-delete-null-pointer-checks is the inverse of this implying that
null pointer dereferencing is not undefined.
This feature is implemented in LLVM IR in this CL as the function attribute
"null-pointer-is-valid"="true" in IR (Under review at D47894).
The CL updates several passes that assumed null pointer dereferencing is
undefined to not optimize when the "null-pointer-is-valid"="true"
attribute is present.
Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv
Reviewed By: efriedma, george.burgess.iv
Subscribers: eraman, haicheng, george.burgess.iv, drinkcat, theraven, reames, sanjoy, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D47895
llvm-svn: 336613
|
|
|
|
|
|
|
|
|
|
| |
As discussed in D49047 / D48987, shift-by-undef produces poison,
so we can't use undef vector elements in that case..
Note that we need to extend this for poison-generating flags,
and there's a proposal to create poison from FMF in D47963,
llvm-svn: 336562
|
|
|
|
|
|
|
|
|
|
|
| |
This is almost NFC, but there could be some case where the original
code had undefs in the constants (rather than just the shuffle mask),
and we'll use safe constants rather than undefs now.
The FIXME noted in foldShuffledBinop() is already visible in existing
tests, so correcting that is the next step.
llvm-svn: 336558
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As noted in D48987, there are many different ways for this transform to go wrong.
In particular, the poison potential for shifts means we have to more careful with those ops.
I added tests to make that behavior visible for all of the different cases that I could find.
This is a partial fix. To make this review easier, I did not make changes for the single binop
pattern (handled in foldSelectShuffleWith1Binop()). I also left out some potential optimizations
noted with TODO comments. I'll follow-up once we're confident that things are correct here.
The goal is to correct all marked FIXME tests to either avoid the shuffle transform or do it safely.
Note that distinguishing when the shuffle mask contains undefs and using getBinOpIdentity() allows
for some improvements to div/rem patterns, so there are wins along with the missed opportunities
and fixes.
Differential Revision: https://reviews.llvm.org/D49047
llvm-svn: 336546
|
|
|
|
|
|
|
|
|
|
|
| |
It's a bit neater to write T.isIntOrPtrTy() over `T.isIntegerTy() ||
T.isPointerTy()`.
I used Python's re.sub with this regex to update users:
r'([\w.\->()]+)isIntegerTy\(\)\s*\|\|\s*\1isPointerTy\(\)'
llvm-svn: 336462
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The replaceAllDbgUsesWith utility helps passes preserve debug info when
replacing one value with another.
This improves upon the existing insertReplacementDbgValues API by:
- Updating debug intrinsics in-place, while preventing use-before-def of
the replacement value.
- Falling back to salvageDebugInfo when a replacement can't be made.
- Moving the responsibiliy for rewriting llvm.dbg.* DIExpressions into
common utility code.
Along with the API change, this teaches replaceAllDbgUsesWith how to
create DIExpressions for three basic integer and pointer conversions:
- The no-op conversion. Applies when the values have the same width, or
have bit-for-bit compatible pointer representations.
- Truncation. Applies when the new value is wider than the old one.
- Zero/sign extension. Applies when the new value is narrower than the
old one.
Testing:
- check-llvm, check-clang, a stage2 `-g -O3` build of clang,
regression/unit testing.
- This resolves a number of mis-sized dbg.value diagnostics from
Debugify.
Differential Revision: https://reviews.llvm.org/D48676
llvm-svn: 336451
|
|
|
|
|
|
| |
are done"
llvm-svn: 336410
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Better NaN handling for AMDGCN fmed3.
All operands are checked for NaN now. The checks
were moved before the canonicalization to provide
a better mapping from fclamp. Changed the behaviour
of fmed3(x,y,NaN) to return max(x,y) instead of
min(x,y) in light of this. Updated tests as a result
and added some new cases to cover the fix.
Patch by Alan Baker
llvm-svn: 336375
|
|
|
|
|
|
| |
independent FMA and extractelement/insertelement.
llvm-svn: 336315
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have bailout hacks based on min/max in various places in instcombine
that shouldn't be necessary. The affected test was added for:
D48930
...which is a consequence of the improvement in:
D48584 (https://reviews.llvm.org/rL336172)
I'm assuming the visitTrunc bailout in this patch was added specifically
to avoid a change from SimplifyDemandedBits, so I'm just moving that
below the EvaluateInDifferentType optimization. A narrow min/max is still
a min/max.
llvm-svn: 336293
|
|
|
|
|
|
|
|
|
|
|
| |
When zext is EvaluatedInDifferentType, InstCombine
drops the dbg.value intrinsic. This patch tries to
preserve said DI, by inserting the zext's old DI in the
resulting instruction. (Only for integer type for now)
Differential Revision: https://reviews.llvm.org/D48331
llvm-svn: 336254
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the last significant change suggested in PR37806:
https://bugs.llvm.org/show_bug.cgi?id=37806#c5
...though there are several follow-ups noted in the code comments
in this patch to complete this transform.
It's possible that a binop feeding a select-shuffle has been eliminated
by earlier transforms (or the code was just written like this in the 1st
place), so we'll fail to match the patterns that have 2 binops from:
D48401,
D48678,
D48662,
D48485.
In that case, we can try to materialize identity constants for the remaining
binop to fill in the "ghost" lanes of the vector (where we just want to pass
through the original values of the source operand).
I added comments to ConstantExpr::getBinOpIdentity() to show planned follow-ups.
For now, we only handle the 5 commutative integer binops (add/mul/and/or/xor).
Differential Revision: https://reviews.llvm.org/D48830
llvm-svn: 336196
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch changes order of transform in InstCombineCompares to avoid
performing transforms based on ranges which produce complex bit arithmetics
before more simple things (like folding with constants) are done. See PR37636
for the motivating example.
Differential Revision: https://reviews.llvm.org/D48584
Reviewed By: spatel, lebedev.ri
llvm-svn: 336172
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
folding
This extends D48485 to allow another pair of binops (add/or) to be combined either
with or without a leading shuffle:
or X, C --> add X, C (when X and C have no common bits set)
Here, we need value tracking to determine that the 'or' can be reversed into an 'add',
and we've added general infrastructure to allow extending to other opcodes or moving
to where other passes could use that functionality.
Differential Revision: https://reviews.llvm.org/D48662
llvm-svn: 336128
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR37806)
This was discussed in D48401 as another improvement for:
https://bugs.llvm.org/show_bug.cgi?id=37806
If we have 2 different variable values, then we shuffle (select) those lanes,
shuffle (select) the constants, and then perform the binop. This eliminates a binop.
The new shuffle uses the same shuffle mask as the existing shuffle, so there's no
danger of creating a difficult shuffle.
All of the earlier constraints still apply, but we also check for extra uses to
avoid creating more instructions than we'll remove.
Additionally, we're disallowing the fold for div/rem because that could expose a
UB hole.
Differential Revision: https://reviews.llvm.org/D48678
llvm-svn: 335974
|
|
|
|
|
|
|
|
|
| |
There's no way to expose this difference currently,
but we should use the updated variable because the
original opcodes can go stale if we transform into
something new.
llvm-svn: 335920
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is an enhancement to D48401 that was discussed in:
https://bugs.llvm.org/show_bug.cgi?id=37806
We can convert a shift-left-by-constant into a multiply (we canonicalize IR in the other
direction because that's generally better of course). This allows us to remove the shuffle
as we do in the regular opcodes-are-the-same cases.
This requires a small hack to make sure we don't introduce any extra poison:
https://rise4fun.com/Alive/ZGv
Other examples of opcodes where this would work are add+sub and fadd+fsub, but we already
canonicalize those subs into adds, so there's nothing to do for those cases AFAICT. There
are planned enhancements for opcode transforms such or -> add.
Note that there's a different fold needed if we've already managed to simplify away a binop
as seen in the test based on PR37806, but we manage to get that one case here because this
fold is positioned above the demanded elements fold currently.
Differential Revision: https://reviews.llvm.org/D48485
llvm-svn: 335888
|
|
|
|
|
|
|
|
| |
that don't take a mask as input to exclude '.mask.' from their name.
I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask".
llvm-svn: 335744
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This prevents InstCombine from creating mis-sized dbg.values when
replacing a sequence of casts with a simpler cast. For example, in:
(fptrunc (floor (fpext X))) -> (floorf X)
We no longer emit dbg.value(X) (with a 32-bit float operand) to describe
(fpext X) (which is a 64-bit float).
This was diagnosed by the debugify check added in r335682.
llvm-svn: 335696
|
|
|
|
|
|
|
| |
Add an overload for the common case where the replacement dbg.values
have the same DIExpressions as the originals.
llvm-svn: 335643
|
|
|
|
| |
llvm-svn: 335623
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to other patches in this series:
https://reviews.llvm.org/rL335512
https://reviews.llvm.org/rL335527
https://reviews.llvm.org/rL335597
https://reviews.llvm.org/rL335616
...this is filling a gap in analysis that is exposed by an unrelated select-of-constants transform.
I didn't see a way to unify the sext cases because each div/rem opcode results in a different fold.
Note that in this case, the backend might want to convert the select into math:
Name: sext urem
%e = sext i1 %x to i32
%r = urem i32 %y, %e
=>
%c = icmp eq i32 %y, -1
%z = zext i1 %c to i32
%r = add i32 %z, %y
llvm-svn: 335622
|
|
|
|
|
|
|
|
|
|
| |
Note: I didn't add a hasOneUse() check because the existing,
related fold doesn't have that check. I suspect that the
improved analysis and codegen make these some of the rare
canonicalization cases where we allow an increase in
instructions.
llvm-svn: 335597
|
|
|
|
|
|
|
|
|
| |
Turn canonicalized subtraction back into (-1 - B) and combine it with (A + 1) into (A - B).
This is similar to the folding already done for (B ^ -1) + Const into (-1 + Const) - B.
Differential Revision: https://reviews.llvm.org/D48535
llvm-svn: 335579
|
|
|
|
|
|
|
|
|
| |
This removes a "UDivFoldAction" in favor of a simple constant
matcher. In theory, the existing code could do more matching,
but I don't see any evidence or need for it. I've left a TODO
about using ValueTracking in case we see any regressions.
llvm-svn: 335545
|
|
|
|
| |
llvm-svn: 335527
|
|
|
|
|
|
| |
bits" MSVC warning (again). NFCI.
llvm-svn: 335457
|
|
|
|
|
|
| |
bits" MSVC warning. NFCI.
llvm-svn: 335454
|
|
|
|
|
|
|
|
| |
The commutative matcher makes things more complicated
here, and I'm planning an enhancement where this
form is more readable.
llvm-svn: 335343
|
|
|
|
|
|
|
|
|
| |
With non-commutative binops, we could be using the same
variable value as operand 0 in 1 binop and operand 1 in
the other, so we have to check for that possibility and
bail out.
llvm-svn: 335312
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR37806)
This is the simplest case from PR37806:
https://bugs.llvm.org/show_bug.cgi?id=37806
If we have a common variable operand used in a pair of binops with vector constants
that are vector selected together, then we can constant shuffle the constant vectors
to eliminate the shuffle instruction.
This has some tricky parts that are hopefully addressed in the tests and their
respective comments:
1. If the shuffle mask contains an undef element, then that lane of the result is
undef:
http://llvm.org/docs/LangRef.html#shufflevector-instruction
Therefore, we can replace the constant in that lane with an undef value except
for div/rem. With div/rem, an undef in the divisor would cause the whole op to
be undef. So I'm using the same hack as in D47686 - replace the undefs with '1'.
2. Intersect the wrapping and FMF of the original binops for the new binop. There
should be no extra poison or fast-math potential in the new binop that wasn't
possible in the original code.
3. Disregard other uses. Given that we're eliminating uses (shortening the
dependency chain), I think that's always the right IR canonicalization. But
I purposely chose the udiv test to demonstrate the scenario where both
intermediate values have other uses because that seems likely worse for
codegen with an expensive math op. This seems like a very rare possibility to
me, so I don't think it requires a backend patch first.
Differential Revision: https://reviews.llvm.org/D48401
llvm-svn: 335283
|
|
|
|
|
|
|
|
| |
The previous code worked with vectors, but it failed when the
vector constants contained undef elements.
The matchers handle those cases.
llvm-svn: 335262
|
|
|
|
|
|
|
|
|
|
| |
This is outwardly NFC from what I can tell, but it should be more efficient
to simplify first (despite the name, SimplifyAssociativeOrCommutative does
not actually simplify as InstSimplify does - it creates/morphs instructions).
This should make it easier to refactor duplicated code that runs for all binops.
llvm-svn: 335258
|
|
|
|
|
|
| |
This was originally in D48401 and will be used there.
llvm-svn: 335242
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This also removes the need for atomic pseudo instructions, since
we select the correct encoding directly in SITargetLowering::lowerImage
for dimension-aware image intrinsics.
Mesa uses dimension-aware image intrinsics since
commit a9a7993441.
Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a
Reviewers: arsenm, rampitec, mareko, tpr, b-sumner
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48167
llvm-svn: 335231
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Use the expanded features of the TableGen generic tables to avoid manually
adding the combinatorially exploded set of intrinsics. The
getAMDGPUImageDimIntrinsic lookup function is early-out,
i.e. non-AMDGPU intrinsics will never look at the underlying table.
Use a generic approach for getting the new intrinsic overload to keep the
code simple, and make the image dmask handling more generic:
- handle non-sampler image loads
- handle the case where the set of demanded elements is not a prefix
There is some overlap between this code and an optimization that happens
in the backend during code generation. They currently complement each other:
- only the codegen optimization can generate vec3 loads
- only the InstCombine optimization can handle D16
The InstCombine optimization also likely covers more cases since the
codegen optimization is fairly ad-hoc. Ideally, we'll remove the optimization
in codegen once the infrastructure for vec3 is in place (which will probably
take a long time).
Modify the test cases to use dimension-aware intrinsics. This makes it
easier to see that the test coverage for the new intrinsics is equivalent,
and the old style intrinsics will be removed in a follow-up commit anyway.
Change-Id: I4b91ea661413d13004956fe4ef7d13d41b8ce3ad
Reviewers: arsenm, rampitec, majnemer
Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48165
llvm-svn: 335230
|
|
|
|
|
|
|
|
| |
There are more existing potential users of this,
but I've limited this patch to the first couple
that I found to minimize typo risk.
llvm-svn: 335157
|