Commit message
| |
Invert `add A, sext(B) --> sub A, zext(B)` canonicalization (to `sub A, zext B -> add A, sext B`)
Summary:
D68408 proposes to greatly improve our negation-sinking abilities.
But with the current canonicalization we produce `sub A, zext(B)`,
which that patch would consider non-canonical and try to sink as a negation,
undoing the existing canonicalization.
So unless we explicitly stop producing the previous canonical form,
we will have two conflicting folds and will end up looping endlessly.
This inverts the canonicalization and adds back the obvious folds
that we would otherwise miss (a short IR sketch follows the list):
* `sub [nsw] Op0, sext/zext (bool Y) -> add [nsw] Op0, zext/sext (bool Y)`
https://rise4fun.com/Alive/xx4
* `sext(bool) + C -> bool ? C - 1 : C`
https://rise4fun.com/Alive/fBl
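A minimal hand-written sketch of both folds in IR (the function names, value names, and the constant 42 below are invented for illustration and are not part of the patch); running something like this through `opt -instcombine` should exercise the new canonical forms:
define i32 @sub_of_zext_bool(i32 %a, i1 %y) {
  %z = zext i1 %y to i32
  %r = sub i32 %a, %z        ; expected to become: add i32 %a, (sext i1 %y to i32)
  ret i32 %r
}
define i32 @sext_bool_plus_const(i1 %y) {
  %s = sext i1 %y to i32
  %r = add i32 %s, 42        ; expected to become: select i1 %y, i32 41, i32 42
  ret i32 %r
}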
It is obvious that `@ossfuzz_9880()` / `@lshr_out_of_range()` / `@ashr_out_of_range()`
(oss-fuzz 4871) are no longer folded as much, though those aren't really worrying.
Reviewers: spatel, efriedma, t.p.northover, hfinkel
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71064
| |
The reversion apparently deleted the test/Transforms directory.
Will be re-reverting again.
llvm-svn: 358552
| |
As it's causing some bot failures (and per request from kbarton).
This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.
llvm-svn: 358546
| |
add A, sext(B) --> sub A, zext(B)
We have to choose 1 of these forms, so I'm opting for the
zext because that's easier for value tracking.
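A hand-written sketch of that canonicalization (names invented for illustration; %b is assumed to be an i1):
define i32 @add_of_sext_bool(i32 %a, i1 %b) {
  %s = sext i1 %b to i32
  %r = add i32 %a, %s        ; canonical form after this patch: sub i32 %a, (zext i1 %b to i32)
  ret i32 %r
}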
The backend should be prepared for this change after:
D57401
rL353433
This is also a preliminary step towards reducing the amount
of bit hackery that we do in IR to optimize icmp/select.
That sort of optimization should wait for a later stage.
The seeming regression in the fuzzer test was discussed in:
D58359
We were only managing that fold in instcombine by luck, and
other passes should be able to deal with that better anyway.
llvm-svn: 354748
| |
llvm-svn: 352517
| |
llvm-svn: 346225
| |
Re-trying r344082 because it unintentionally included extra diffs.
Original commit message:
icmp ne (and X, 1), 0 --> trunc X to N x i1
Ideally, we'd do the same for scalars, but there will likely be
regressions unless we add more trunc folds as we're doing here
for vectors.
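As a rough hand-written illustration of that fold on a <4 x i32> vector (the function and value names are invented, not from the original message):
define <4 x i1> @mask_and_icmp(<4 x i32> %x) {
  %a = and <4 x i32> %x, <i32 1, i32 1, i32 1, i32 1>
  %r = icmp ne <4 x i32> %a, zeroinitializer    ; expected to fold to: trunc <4 x i32> %x to <4 x i1>
  ret <4 x i1> %r
}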
The motivating vector case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549
define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {
%c = fcmp ole <4 x float> %x, %y
%s = sext <4 x i1> %c to <4 x i32>
%s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
%s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
%cond = or <4 x i32> %s1, %s2
%condtr = trunc <4 x i32> %cond to <4 x i1>
%r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
ret <4 x float> %r
}
Here's a sampling of the vector codegen for that case using
mask+icmp (current behavior) vs. trunc (with this patch):
AVX before:
vcmpleps %xmm1, %xmm0, %xmm0
vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps %xmm0, %xmm1, %xmm0
vandps LCPI0_0(%rip), %xmm0, %xmm0
vxorps %xmm1, %xmm1, %xmm1
vpcmpeqd %xmm1, %xmm0, %xmm0
vblendvps %xmm0, %xmm3, %xmm2, %xmm0
AVX after:
vcmpleps %xmm1, %xmm0, %xmm0
vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps %xmm0, %xmm1, %xmm0
vblendvps %xmm0, %xmm2, %xmm3, %xmm0
AVX512f before:
vcmpleps %xmm1, %xmm0, %xmm0
vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps %xmm0, %xmm1, %xmm0
vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
vptestnmd %zmm1, %zmm0, %k1
vblendmps %zmm3, %zmm2, %zmm0 {%k1}
AVX512f after:
vcmpleps %xmm1, %xmm0, %xmm0
vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps %xmm0, %xmm1, %xmm0
vpslld $31, %xmm0, %xmm0
vptestmd %zmm0, %zmm0, %k1
vblendmps %zmm2, %zmm3, %zmm0 {%k1}
AArch64 before:
fcmge v0.4s, v1.4s, v0.4s
zip1 v1.4s, v0.4s, v0.4s
zip2 v0.4s, v0.4s, v0.4s
orr v0.16b, v1.16b, v0.16b
movi v1.4s, #1
and v0.16b, v0.16b, v1.16b
cmeq v0.4s, v0.4s, #0
bsl v0.16b, v3.16b, v2.16b
AArch64 after:
fcmge v0.4s, v1.4s, v0.4s
zip1 v1.4s, v0.4s, v0.4s
zip2 v0.4s, v0.4s, v0.4s
orr v0.16b, v1.16b, v0.16b
bsl v0.16b, v2.16b, v3.16b
PowerPC-le before:
xvcmpgesp 34, 35, 34
vspltisw 0, 1
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxlxor 35, 35, 35
xxland 34, 0, 32
vcmpequw 2, 2, 3
xxsel 34, 36, 37, 34
PowerPC-le after:
xvcmpgesp 34, 35, 34
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxsel 34, 37, 36, 0
Differential Revision: https://reviews.llvm.org/D52747
llvm-svn: 344181
| |
This commit accidentally included the diffs from D53057.
llvm-svn: 344178
| |
icmp ne (and X, 1), 0 --> trunc X to N x i1
Ideally, we'd do the same for scalars, but there will likely be
regressions unless we add more trunc folds as we're doing here
for vectors.
The motivating vector case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549
define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {
%c = fcmp ole <4 x float> %x, %y
%s = sext <4 x i1> %c to <4 x i32>
%s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
%s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
%cond = or <4 x i32> %s1, %s2
%condtr = trunc <4 x i32> %cond to <4 x i1>
%r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
ret <4 x float> %r
}
Here's a sampling of the vector codegen for that case using
mask+icmp (current behavior) vs. trunc (with this patch):
AVX before:
vcmpleps %xmm1, %xmm0, %xmm0
vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps %xmm0, %xmm1, %xmm0
vandps LCPI0_0(%rip), %xmm0, %xmm0
vxorps %xmm1, %xmm1, %xmm1
vpcmpeqd %xmm1, %xmm0, %xmm0
vblendvps %xmm0, %xmm3, %xmm2, %xmm0
AVX after:
vcmpleps %xmm1, %xmm0, %xmm0
vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps %xmm0, %xmm1, %xmm0
vblendvps %xmm0, %xmm2, %xmm3, %xmm0
AVX512f before:
vcmpleps %xmm1, %xmm0, %xmm0
vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps %xmm0, %xmm1, %xmm0
vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
vptestnmd %zmm1, %zmm0, %k1
vblendmps %zmm3, %zmm2, %zmm0 {%k1}
AVX512f after:
vcmpleps %xmm1, %xmm0, %xmm0
vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps %xmm0, %xmm1, %xmm0
vpslld $31, %xmm0, %xmm0
vptestmd %zmm0, %zmm0, %k1
vblendmps %zmm2, %zmm3, %zmm0 {%k1}
AArch64 before:
fcmge v0.4s, v1.4s, v0.4s
zip1 v1.4s, v0.4s, v0.4s
zip2 v0.4s, v0.4s, v0.4s
orr v0.16b, v1.16b, v0.16b
movi v1.4s, #1
and v0.16b, v0.16b, v1.16b
cmeq v0.4s, v0.4s, #0
bsl v0.16b, v3.16b, v2.16b
AArch64 after:
fcmge v0.4s, v1.4s, v0.4s
zip1 v1.4s, v0.4s, v0.4s
zip2 v0.4s, v0.4s, v0.4s
orr v0.16b, v1.16b, v0.16b
bsl v0.16b, v2.16b, v3.16b
PowerPC-le before:
xvcmpgesp 34, 35, 34
vspltisw 0, 1
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxlxor 35, 35, 35
xxland 34, 0, 32
vcmpequw 2, 2, 3
xxsel 34, 36, 37, 34
PowerPC-le after:
xvcmpgesp 34, 35, 34
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxsel 34, 37, 36, 0
Differential Revision: https://reviews.llvm.org/D52747
llvm-svn: 344082
| |
llvm-svn: 330516
| |
More fixes are needed to enable the helper SimplifyShrShlDemandedBits().
llvm-svn: 300898
| |
This fold already existed for vectors but only when 'C1' was a splat
constant (but 'C2' could be any constant).
There were no tests for any vector constants, so I'm adding a test
that shows non-splat constants for both operands.
llvm-svn: 294650
| |
llvm-svn: 293816
| |
Although this is 'no-functional-change-intended', I'm adding tests
for shl-shl and lshr-lshr pairs because there is no existing test
coverage for those folds.
It seems like we should be able to remove some code from foldShiftedShift()
at this point because we're handling those patterns on the general path.
llvm-svn: 293814
| |
We already have this fold when the lshr has one use, but it doesn't need that
restriction. We may be able to remove some code from foldShiftedShift().
Also, move the similar:
(X << C) >>u C --> X & (-1 >>u C)
...directly into visitLShr to help clean up foldShiftByConstOfShiftByConst().
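A hand-written sketch of that shl-then-lshr fold (names and the shift amount 3 are invented for illustration):
define i32 @shl_then_lshr(i32 %x) {
  %s = shl i32 %x, 3
  %r = lshr i32 %s, 3        ; expected to fold to: and i32 %x, 536870911 (-1 >>u 3)
  ret i32 %r
}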
That whole function seems questionable since it is called by commonShiftTransforms(),
but there's really not much in common if we're checking the shift opcodes for every
fold.
llvm-svn: 293215
| |
splat vectors
llvm-svn: 293208
| |
llvm-svn: 293205
| |
constants
Some existing 'FIXME' tests are still not folded because of splat holes in value tracking.
llvm-svn: 292151
| |
The shift-shift possibilities became easier to see after:
https://reviews.llvm.org/rL292145
llvm-svn: 292150
| |
Also, add comments and remove bogus comment.
llvm-svn: 292082
| |
splat constant vectors
llvm-svn: 280873
| |
constant vectors
llvm-svn: 279937
| |
constant vectors
llvm-svn: 279677
| |
llvm-svn: 278689
| |
Note that several of these tests belong in InstSimplify rather than
InstCombine because they return existing operands or constants.
llvm-svn: 278684
| |
llvm-svn: 278639
| |
No functionality change.
This update was done with the following bash script:
  find test/Transforms -name "*.ll" | \
  while read NAME; do
    echo "$NAME"
    if ! grep -q "^; *RUN: *llc" $NAME; then
      TEMP=`mktemp -t temp`
      cp $NAME $TEMP
      sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
      while read FUNC; do
        sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3@$FUNC(/g" $TEMP
      done
      mv $TEMP $NAME
    fi
  done
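For example (test contents and function name invented for illustration), a check line such as
; CHECK: @foo(
in a test that defines @foo would be rewritten by the script to
; CHECK-LABEL: @foo(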
llvm-svn: 186268
| |
Original commit message:
Defer some shl transforms to DAGCombine.
The shl instruction is used to represent multiplication by a constant
power of two as well as bitwise left shifts. Some InstCombine
transformations would turn an shl instruction into a bit mask operation,
making it difficult for later analysis passes to recognize the
constant multiplication.
Disable those shl transformations, deferring them to DAGCombine time.
An 'shl X, C' instruction is now treated mostly the same way as 'mul X, C'.
These transformations are deferred:
(X >>? C) << C --> X & (-1 << C) (When X >> C has multiple uses)
(X >>? C1) << C2 --> X << (C2-C1) & (-1 << C2) (When C2 > C1)
(X >>? C1) << C2 --> X >>? (C1-C2) & (-1 << C2) (When C1 > C2)
The corresponding exact transformations are preserved, just like
div-exact + mul:
(X >>?,exact C) << C --> X
(X >>?,exact C1) << C2 --> X << (C2-C1)
(X >>?,exact C1) << C2 --> X >>?,exact (C1-C2)
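A small hand-written illustration of the preserved exact-shift case (function and value names, and the shift amount 4, are invented):
define i32 @exact_lshr_then_shl(i32 %x) {
  %s = lshr exact i32 %x, 4
  %r = shl i32 %s, 4         ; 'exact' means no set bits were shifted out, so this folds back to %x
  ret i32 %r
}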
The disabled transformations could also prevent the instruction selector
from recognizing rotate patterns in hash functions and cryptographic
primitives. I have a test case for that, but it is too fragile.
llvm-svn: 155362
| |
While the patch was perfect and defect free, it exposed a really nasty
bug in X86 SelectionDAG that caused an llc crash when compiling lencod.
I'll put the patch back in after fixing the SelectionDAG problem.
llvm-svn: 155181
| |
The shl instruction is used to represent multiplication by a constant
power of two as well as bitwise left shifts. Some InstCombine
transformations would turn an shl instruction into a bit mask operation,
making it difficult for later analysis passes to recognize the
constant multiplication.
Disable those shl transformations, deferring them to DAGCombine time.
An 'shl X, C' instruction is now treated mostly the same way as 'mul X, C'.
These transformations are deferred:
(X >>? C) << C --> X & (-1 << C) (When X >> C has multiple uses)
(X >>? C1) << C2 --> X << (C2-C1) & (-1 << C2) (When C2 > C1)
(X >>? C1) << C2 --> X >>? (C1-C2) & (-1 << C2) (When C1 > C2)
The corresponding exact transformations are preserved, just like
div-exact + mul:
(X >>?,exact C) << C --> X
(X >>?,exact C1) << C2 --> X << (C2-C1)
(X >>?,exact C1) << C2 --> X >>?,exact (C1-C2)
The disabled transformations could also prevent the instruction selector
from recognizing rotate patterns in hash functions and cryptographic
primitives. I have a test case for that, but it is too fragile.
llvm-svn: 155136
| |
llvm-svn: 155010
| |
llvm-svn: 155009
| |
lshr+ashr instead of trunc+sext. We want to avoid type
conversions whenever possible, it is easier to codegen expressions
without truncates and extensions.
llvm-svn: 93107
| |
llvm-svn: 81257
| |
of using llvm-as, now that opt supports this.
llvm-svn: 81226
| |
Upgrade tests to work with new llvm.exp version of llvm_runtest.
llvm-svn: 36013
| |
llvm-svn: 35288