| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
```
%a = sub i32 31, %x
%r = shl i32 1, %a
=>
%d = shl i32 1, 31
%r = lshr i32 %d, %x
Done: 1
Optimization is correct!
```
https://rise4fun.com/Alive/btZm
Reviewers: spatel, lebedev.ri, nikic
Reviewed By: lebedev.ri
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63652
llvm-svn: 364073
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Signedness does not change number of trailing zeros.
Reviewers: spatel, lebedev.ri, nikic
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D63546
llvm-svn: 364064
|
|
|
|
| |
llvm-svn: 363974
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The form that compares against 0 is better because:
1. It removes a use of the input value.
2. It's the more standard form for this pattern: https://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2
3. It results in equal or better codegen (tested with x86, AArch64, ARM, PowerPC, MIPS).
This is a root cause for PR42314, but probably doesn't completely answer the codegen request:
https://bugs.llvm.org/show_bug.cgi?id=42314
Alive proof:
https://rise4fun.com/Alive/9kG
Name: is power-of-2
%neg = sub i32 0, %x
%a = and i32 %neg, %x
%r = icmp eq i32 %a, %x
=>
%dec = add i32 %x, -1
%a2 = and i32 %dec, %x
%r = icmp eq i32 %a2, 0
Name: is not power-of-2
%neg = sub i32 0, %x
%a = and i32 %neg, %x
%r = icmp ne i32 %a, %x
=>
%dec = add i32 %x, -1
%a2 = and i32 %dec, %x
%r = icmp ne i32 %a2, 0
llvm-svn: 363956
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Signedness does not change number of trailing zeros.
Reviewers: spatel, lebedev.ri, nikic
Reviewed By: spatel
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63534
llvm-svn: 363951
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
To generate simplified IR, make sure fold
```
(X & signbit) ==/!= 0) -> X s>=/s< 0;
```
is scheduled before fold
```
((X << Y) & C) == 0 -> (X & (C >> Y)) == 0.
```
https://rise4fun.com/Alive/fbdh
Reviewers: lebedev.ri, efriedma, spatel, craig.topper
Reviewed By: lebedev.ri
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63026
llvm-svn: 363845
|
|
|
|
| |
llvm-svn: 363587
|
|
|
|
|
|
|
|
| |
I'm not 100% sure about this, since I'm worried about IR transforms
that might end up introducing divergence downstream once replaced with
a constant, but I haven't come up with an example yet.
llvm-svn: 363406
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D63301
llvm-svn: 363339
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Extend the mechanism to overload intrinsic arguments by using either
backward or forward references to the overloadable arguments.
In for example:
def int_something : Intrinsic<[LLVMPointerToElt<0>],
[llvm_anyvector_ty], []>;
LLVMPointerToElt<0> is a forward reference to the overloadable operand
of type 'llvm_anyvector_ty' and would allow intrinsics such as:
declare i32* @llvm.something.v4i32(<4 x i32>);
declare i64* @llvm.something.v2i64(<2 x i64>);
where the result pointer type is deduced from the element type of the
first argument.
If the returned pointer is not a pointer to the element type, LLVM will
give an error:
Intrinsic has incorrect return type!
i64* (<4 x i32>)* @llvm.something.v4i32
Reviewers: RKSimon, arsenm, rnk, greened
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D62995
llvm-svn: 363233
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D62612
llvm-svn: 363082
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D62629
llvm-svn: 363080
|
|
|
|
|
|
|
|
|
|
| |
We have a known-never-nan value via 'nnan', so an unordered predicate
is the same as its ordered sibling.
Similar to:
rL362937
llvm-svn: 362954
|
|
|
|
|
|
| |
Forgot to translate the predicate clauses in rL362943.
llvm-svn: 362945
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to rL362909:
This isn't the ideal fix (use FMF on the select), but it's still an
improvement until we have better FMF propagation to selects and other
FP math operators.
I don't think there's much risk of regression from this change by
not including the FMF on the fcmp any more. The nsz/nnan FMF
should be the same on the fcmp and the fsub because they have the
same operand.
llvm-svn: 362943
|
|
|
|
|
|
|
| |
PR42179:
https://bugs.llvm.org/show_bug.cgi?id=42179
llvm-svn: 362937
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A precondition 'x != 0' was forgotten by me:
https://rise4fun.com/Alive/JFNP
https://rise4fun.com/Alive/jHvL
These 4 folds with non-constants could be re-enabled,
but for now let's go for the simplest solution.
https://bugs.llvm.org/show_bug.cgi?id=42198
llvm-svn: 362911
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This isn't the ideal fix (use FMF on the select), but it's still an
improvement until we have better FMF propagation to selects and other
FP math operators.
I don't think there's much risk of regression from this change by
not including the FMF on the fcmp any more. The nsz/nnan FMF
should be the same on the fcmp and the fneg (fsub) because they
have the same operand.
This works around the most glaring FMF logical inconsistency cited
in PR38086:
https://bugs.llvm.org/show_bug.cgi?id=38086
llvm-svn: 362909
|
|
|
|
| |
llvm-svn: 362655
|
|
|
|
|
|
|
|
| |
When the byval attribute has a type, it must match the pointee type of
any parameter; but InstCombine was not updating the attribute when
folding casts of various kinds away.
llvm-svn: 362643
|
|
|
|
|
|
|
|
|
|
| |
It looks this fold was already partially happening, indirectly
via some other folds, but with one-use limitation.
No other fold here has that restriction.
https://rise4fun.com/Alive/ftR
llvm-svn: 362217
|
|
|
|
|
|
| |
https://rise4fun.com/Alive/qJQ
llvm-svn: 362216
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, this used a statement like this:
Map[A] = Map[B];
This is equivalent to the following:
const auto &Src = Map[B];
auto &Dest = Map[A];
Dest = Src;
The second statement, "auto &Dest = Map[A];" can insert a new
element into the DenseMap, which can potentially grow and reallocate
the DenseMap's internal storage, which will invalidate the existing
reference to the source. When doing the actual assignment,
the Src reference is dereferenced, accessing memory that was
freed when the DenseMap grew.
This issue hasn't shown up when LLVM was built with Clang, because
the right hand side ended up dereferenced before evaulating the
left hand side. (If the value type is a larger data type, Clang doesn't
do this but behaves like GCC.)
With GCC, a cast to Value* isn't enough to make it dereference the
right hand side reference before invoking operator[] (while that is
enough to make Clang/LLVM do the right thing for larger types), but
storing it in an intermediate variable in a separate statement works.
This fixes PR42065.
Differential Revision: https://reviews.llvm.org/D62624
llvm-svn: 362150
|
|
|
|
|
|
|
|
|
| |
Based on the overflow direction information added in D62463, we can
now fold always overflowing signed saturating add/sub to signed min/max.
Differential Revision: https://reviews.llvm.org/D62544
llvm-svn: 362006
|
|
|
|
|
|
|
| |
Reduce duplication and make it easier to handle signed
always-overflows conditions in the future.
llvm-svn: 361863
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to fold an always overflowing signed saturating add/sub,
we need to know in which direction the always overflow occurs.
This patch splits up AlwaysOverflows into AlwaysOverflowsLow and
AlwaysOverflowsHigh to pass through this information (but it is
not used yet).
Differential Revision: https://reviews.llvm.org/D62463
llvm-svn: 361858
|
|
|
|
|
|
|
| |
This was found/reduced from a fuzzer report:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14956
llvm-svn: 361729
|
|
|
|
|
|
|
|
|
| |
Extract method to compute overflow based on binop and signedness,
and then make the result handling code generic. This extends the
always-overflow handling to signed muls, but has currently no effect,
as we don't compute always overflow for them (thus NFC).
llvm-svn: 361721
|
|
|
|
|
|
|
| |
Instead pass binary op and signedness. The extra enum only makes
things more complicated in this case.
llvm-svn: 361720
|
|
|
|
|
|
|
|
|
| |
Just a minor refactoring to use the new helper method
DataLayout::typeSizeEqualsStoreSize(). This is done when
checking if getTypeSizeInBits is equal/non-equal to
getTypeStoreSizeInBits.
llvm-svn: 361613
|
|
|
|
|
|
|
|
| |
This was partly handled in InstCombine (only the constant
index case), so delete that and zap it more generally in
InstSimplify.
llvm-svn: 361576
|
|
|
|
|
|
|
|
| |
The out-of-bounds index pattern is handled by InstSimplify,
so the extractelement should be eliminated next time it is
visited.
llvm-svn: 361570
|
|
|
|
|
|
| |
The out-of-bounds index pattern is handled by InstSimplify.
llvm-svn: 361569
|
|
|
|
|
|
|
|
| |
This was part of InstCombine, but it's better placed in
InstSimplify. InstCombine also had an unreachable but weaker
fold for insertelement with undef index, so that is deleted.
llvm-svn: 361559
|
|
|
|
|
|
|
|
|
|
|
| |
This is reduced from a fuzzer test:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14890
Usually, demanded elements should be able to simplify shuffle
mask elements that are pointing to undef elements of its source
operands, but that doesn't happen in the test case.
llvm-svn: 361533
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
into llvm.ceil/floor. Remove some isel patterns that existed because that was happening.
We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into
llvm.floor/ceil. The llvm.ceil/floor intrinsics are supposed to correspond
to the libm functions. For the libm functions we need to disable the
precision exception so the llvm.floor/ceil functions should always map to
encodings 0x9 and 0xA.
We had a mix of isel patterns where some used 0x9 and 0xA and others used
0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA.
Since we have no way in isel of knowing where the llvm.ceil/floor came
from, we can't map X86 specific intrinsics with encodings 1 or 2 to it.
We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like
to see a use case and optimization advantage first.
I've left the backend test cases to show the blend we now emit without
the extra isel patterns. But I've removed the InstCombine tests completely.
llvm-svn: 361425
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This should be a valid exception to the general rule of not creating new shuffle masks in IR...
because we already do it. :)
Also, DAG combining/legalization will undo this by widening the shuffle back out if needed.
Explanation for how we already do this: SLP or vector source can create chains of insert/extract
as shown in 1 of the examples from PR16739:
https://godbolt.org/z/NlK7rA
https://bugs.llvm.org/show_bug.cgi?id=16739
And we expect instcombine or DAGCombine to clean that up by creating relatively simple shuffles.
Differential Revision: https://reviews.llvm.org/D62024
llvm-svn: 361338
|
|
|
|
| |
llvm-svn: 361197
|
|
|
|
|
|
|
|
| |
Also, break out a helper function, namely foldFNegIntoConstant(...), which performs transforms common between visitFNeg(...) and visitFSub(...).
Differential Revision: https://reviews.llvm.org/D61693
llvm-svn: 361188
|
|
|
|
| |
llvm-svn: 361058
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In D61918 i was looking at dropping it in DAGCombiner `visitShiftByConstant()`,
but as @craig.topper pointed out, it was copied from here.
That check claims that the transform is illegal otherwise.
That isn't true:
1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter
https://rise4fun.com/Alive/K4A
2. For `ISD::AND`, there is no restriction on constants:
https://rise4fun.com/Alive/Wy3
3. For `ISD::OR`, there is no restriction on constants:
https://rise4fun.com/Alive/GOH
3. For `ISD::XOR`, there is no restriction on constants:
https://rise4fun.com/Alive/ml6
So, why is it there then?
As far as i can tell, it dates all the way back to original check-in rL7793.
I think we should just drop it.
Reviewers: spatel, craig.topper, efriedma, majnemer
Reviewed By: spatel
Subscribers: llvm-commits, craig.topper
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61938
llvm-svn: 361043
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have a similar match for patterns ending in a truncate. This
should be ok for all targets because the default expansion would
still likely be better from replacing 2 'and' ops with 1.
Attempt to show the logic equivalence in Alive (which doesn't
currently have funnel-shift in its vocabulary AFAICT):
%shamt = zext i8 %i to i32
%m = and i32 %shamt, 31
%neg = sub i32 0, %shamt
%and4 = and i32 %neg, 31
%shl = shl i32 %v, %m
%shr = lshr i32 %v, %and4
%or = or i32 %shr, %shl
=>
%a = and i8 %i, 31
%shamt2 = zext i8 %a to i32
%neg2 = sub i32 0, %shamt2
%and4 = and i32 %neg2, 31
%shl = shl i32 %v, %shamt2
%shr = lshr i32 %v, %and4
%or = or i32 %shr, %shl
https://rise4fun.com/Alive/V9r
llvm-svn: 360605
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61784
llvm-svn: 360461
|
|
|
|
|
|
|
|
|
|
|
|
| |
ZExt instead of calling replaceinstuseswith
The worklist loop that we're returning back to should be able to do the repacement itself. This is how we normally do replacements.
My main motivation was that I observed that we weren't preserving the name of the result when we do this transform. The replacement code in the worklist loop will call takeName as part of the replacement.
Differential Revision: https://reviews.llvm.org/D61695
llvm-svn: 360284
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(X | C1) + C2 --> (X | C1) ^ C1 iff (C1 == -C2)
I verified the correctness using Alive:
https://rise4fun.com/Alive/YNV
This transform enables the following transform that already exists in
instcombine:
(X | Y) ^ Y --> X & ~Y
As a result, the full expected transform is:
(X | C1) + C2 --> X & ~C1 iff (C1 == -C2)
There already exists the transform in the sub case:
(X | Y) - Y --> X & ~Y
However this does not trigger in the case where Y is constant due to an earlier
transform:
X - (-C) --> X + C
With this new add fold, both the add and sub constant cases are handled.
Patch by Chris Dawson.
Differential Revision: https://reviews.llvm.org/D61517
llvm-svn: 360185
|
|
|
|
|
|
|
|
|
| |
Fundamentally/generally, we should not have to rely on bailouts/crippling of
folds. In this particular case, I think we always recognize the inverted
predicate min/max pattern, so there should not be any loss of optimization.
Codegen looks better because we are eliminating an fneg.
llvm-svn: 360180
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't always get this:
Cond ? -X : -Y --> -(Cond ? X : Y)
...even with the legacy IR form of fneg in the case with extra uses,
and we miss matching with the newer 'fneg' instruction because we
are expecting binops through the rest of the path.
Differential Revision: https://reviews.llvm.org/D61604
llvm-svn: 360075
|
|
|
|
| |
llvm-svn: 360059
|
|
|
|
| |
llvm-svn: 360051
|
|
|
|
| |
llvm-svn: 359973
|