llvm-svn: 335773
As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc
can produce -0.0 where the original code does not:
#include <stdio.h>
int main(int argc, char **argv) {
  float x;
  x = -0.8 * argc;
  printf("%f\n", (float)((int)x));
  return 0;
}

$ clang -O0 -mavx fp.c ; ./a.out
0.000000
$ clang -O1 -mavx fp.c ; ./a.out
-0.000000
Ideally, we'd use IR/node flags to predicate the transform, but the IR parser
doesn't currently allow fast-math-flags on the cast instructions. So for now,
just use the function attribute that corresponds to clang's "-fno-signed-zeros"
option.
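A minimal IR sketch of the guarded shape, assuming the "no-signed-zeros-fp-math" function attribute is the one consulted (function name hypothetical):

define float @trunc_via_int(float %x) #0 {
  %i = fptosi float %x to i32
  %r = sitofp i32 %i to float    ; candidate to become a single ftrunc
  ret float %r
}
attributes #0 = { "no-signed-zeros-fp-math"="true" }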
Differential Revision: https://reviews.llvm.org/D48085
llvm-svn: 335761
pow2 expansion
For divisor = 1, perform a select of X; this reduces scalarisation of simple SDIVs.
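A hypothetical case this helps: a vector divide whose divisor mixes 1 with other powers of two, where the divisor-1 lane can become a plain select of X rather than forcing the whole divide to scalarise:

define <4 x i32> @div_mixed_pow2(<4 x i32> %x) {
  %r = sdiv <4 x i32> %x, <i32 1, i32 2, i32 4, i32 8>
  ret <4 x i32> %r
}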
llvm-svn: 335727
Use the builtin constant folding of getNode() etc. instead of doing it manually.
llvm-svn: 335720
Fixes PR37569.
llvm-svn: 335719
expansion (PR37569)
llvm-svn: 335717
llvm-svn: 335652
(PR37119)
Temporary fix until I've managed to get D45806 updated - both +1 and -1 special cases need to be properly supported.
llvm-svn: 335637
llvm-svn: 335617
llvm-svn: 335451
This patch has the same motivating example as D48466:
define void @foo(i64 %x, i32 %c.0282.in, i32 %d.0280, i32* %ptr0, i32* %ptr1) {
  %c.0282 = and i32 %c.0282.in, 268435455
  %a16 = lshr i64 32508, %x
  %a17 = and i64 %a16, 1
  %tobool = icmp eq i64 %a17, 0
  %. = select i1 %tobool, i32 1, i32 2
  %.286 = select i1 %tobool, i32 27, i32 26
  %shr97 = lshr i32 %c.0282, %.
  %shl98 = shl i32 %c.0282.in, %.286
  %or99 = or i32 %shr97, %shl98
  %shr100 = lshr i32 %d.0280, %.
  %shl101 = shl i32 %d.0280, %.286
  %or102 = or i32 %shr100, %shl101
  store i32 %or99, i32* %ptr0
  store i32 %or102, i32* %ptr1
  ret void
}
...but I'm trying to kill the setcc bool math sooner rather than later.
By matching a larger pattern that includes both the low-bit mask and the trailing add/sub,
we can create a universally good fold because we always eliminate the condition code
intermediate value.
Here are Alive proofs for these (currently instcombine folds the 'add' variants, but
misses the 'sub' patterns):
https://rise4fun.com/Alive/Gsyp
Name: sub of zext cmp mask
%a = and i8 %x, 1
%c = icmp eq i8 %a, 0
%z = zext i1 %c to i32
%r = sub i32 C1, %z
=>
%optional_cast = zext i8 %a to i32
%r = add i32 %optional_cast, C1-1
Name: add of zext cmp mask
%a = and i32 %x, 1
%c = icmp eq i32 %a, 0
%z = zext i1 %c to i8
%r = add i8 %z, C1
=>
%optional_cast = trunc i32 %a to i8
%r = sub i8 C1+1, %optional_cast
All of the tests look like improvements or neutral to me. But it is possible that x86
test+set+bitop is better than what we now show here. I suspect we could do better by
adding another fold for the 'sub' variants.
We start with select-of-constant in IR in the larger motivating test, so that's why I
included tests with selects. Proofs for those variants:
https://rise4fun.com/Alive/Bx1
Name: true const is bigger
Pre: C2 == (C1 + 1)
%a = and i8 %x, 1
%c = icmp eq i8 %a, 0
%r = select i1 %c, i64 C2, i64 C1
=>
%z = zext i8 %a to i64
%r = sub i64 C2, %z
Name: false const is bigger
Pre: C2 == (C1 + 1)
%a = and i8 %x, 1
%c = icmp eq i8 %a, 0
%r = select i1 %c, i64 C1, i64 C2
=>
%z = zext i8 %a to i64
%r = add i64 C1, %z
Differential Revision: https://reviews.llvm.org/D48466
llvm-svn: 335433
Allowed folding for "and"/"or" binops with a non-constant operand if the arguments of the select are 0/-1 values.
Normally the "and" form of this pattern does not reach the DAG combiner, because InstCombine has already simplified it. However, AMDGPU produces it during lowering, where InstCombine has no chance to optimize it out. The same pattern with the "or" opcode can reach the DAG directly.
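A sketch of the folded shape (the fold itself runs on DAG nodes; IR is used here only for illustration):

define i32 @and_of_select(i1 %c, i32 %x) {
  %s = select i1 %c, i32 -1, i32 0
  %r = and i32 %s, %x            ; folds to: select i1 %c, i32 %x, i32 0
  ret i32 %r
}

The "or" form folds analogously, to select i1 %c, i32 -1, i32 %x.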
Differential Revision: https://reviews.llvm.org/D48301
llvm-svn: 335250
The alignment parameter to getExtLoad is treated as a base alignment,
not the alignment of the load (base + offset). When we infer a better
alignment for a Ptr we need to ensure that it applies to the base to
prevent the alignment on the load from being wrong.
This fixes a bug where the alignment could then be used to incorrectly
prove noalias between a load and a store, leading to a miscompile.
Differential Revision: https://reviews.llvm.org/D48029
llvm-svn: 335210
Previously this folding was done only if the select was the first operand. However, for non-commutative operations the constant may come before the select.
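For example (a sketch), with a non-commutative operation such as sub the constant can be the first operand and the select the second:

define i32 @sub_of_select(i1 %c) {
  %s = select i1 %c, i32 2, i32 1
  %r = sub i32 7, %s             ; can fold to: select i1 %c, i32 5, i32 6
  ret i32 %r
}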
Differential Revision: https://reviews.llvm.org/D48223
llvm-svn: 335167
Summary:
Check that "and" masks are strictly smaller than the implicit mask from the narrowed load.
Fixes PR37820.
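An illustration of the constraint (mask values hypothetical): an "and" mask may only be folded into a narrowed load when it fits under the narrow type's implicit mask:

define i32 @narrow_load_mask(i32* %p) {
  %v = load i32, i32* %p
  %m = and i32 %v, 255           ; 0xFF sits under an i16 zextload's implicit 0xFFFF mask
  ret i32 %m
}

A mask such as 0x1FFFF has bits above i16, so the load must not be narrowed to i16.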
Reviewers: samparker, RKSimon, nemanjai
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D48335
llvm-svn: 335137
obvious what they are. NFC
llvm-svn: 335095
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed.
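These FMF-guard patches key transforms off per-instruction fast-math flags rather than only the global unsafe option; a minimal sketch of flagged IR (which flags each fold requires is patch-specific):

define float @reassoc_fadd(float %a, float %b, float %c) {
  %t = fadd reassoc nsz float %a, %b   ; flags travel on the instructions,
  %r = fadd reassoc nsz float %t, %c   ; not on a global compilation switch
  ret float %r
}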
Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar
Reviewed By: spatel
Subscribers: wdng, nhaehnle
Differential Revision: https://reviews.llvm.org/D47909
llvm-svn: 334996
Summary: Refactoring for all constant cases which require AllowNewConst, plus some staging for future FMF usage.
Reviewers: spatel, hfinkel, wristow
Reviewed By: spatel
Subscribers: nhaehnle
Differential Revision: https://reviews.llvm.org/D48289
llvm-svn: 334984
Summary: This patch originated from D47388 and is a proper subset of the originating changes, containing only the fmf optimization guard extensions.
Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar, rampitec, nhaehnle, nemanjai
Reviewed By: rampitec, nhaehnle
Subscribers: tpr, nemanjai, wdng
Differential Revision: https://reviews.llvm.org/D47918
llvm-svn: 334876
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed.
Reviewers: spatel, hfinkel, wristow, arsenm
Reviewed By: spatel
Subscribers: wdng, nhaehnle
Differential Revision: https://reviews.llvm.org/D47954
llvm-svn: 334862
Test passes as is, but fails with a future patch that makes v4i16/v4f16 legal.
llvm-svn: 334823
Summary:
Here we relax the old constraint, which required the global unsafe flag together with the TargetOptions flag HonorSignDependentRoundingFPMathOption, on the understanding that unsafe is no longer needed, and may never have been required, for correctness on FDIV/FMUL.
Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar
Reviewed By: spatel
Subscribers: efriedma, wdng, tpr
Differential Revision: https://reviews.llvm.org/D48057
llvm-svn: 334769
Summary: An FMF constraint is added to FADD, with unsafe still available as the fallback.
Reviewers: spatel, wristow, arsenm, hfinkel
Reviewed By: spatel
Subscribers: wdng
Differential Revision: https://reviews.llvm.org/D48180
llvm-svn: 334753
We're constant folding here, so we shouldn't check uses. This matches
the IR optimizer behavior.
The x86 test shows the expected win. The AArch64 test shows something
else. This only seems to happen if the "generic" AArch64 CPU model is
used by MachineCombiner, so I'll file a bug report to follow up.
llvm-svn: 334608
Differential Revision: https://reviews.llvm.org/D47831
llvm-svn: 334553
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed for fmul.
Reviewers: spatel, hfinkel, wristow, arsenm
Reviewed By: spatel
Subscribers: nhaehnle, wdng
Differential Revision: https://reviews.llvm.org/D47911
llvm-svn: 334514
Implement default legalization of rotates: either in terms of the rotation
in the opposite direction (if legal), or in terms of shifts and ors.
Implement generating of rotate instructions for Hexagon. Hexagon only
supports rotates by an immediate value, so implement custom lowering of
ROTL/ROTR on Hexagon. If a rotate is not legal, use the default expansion.
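A sketch of the shifts-and-ors form for a 32-bit left rotate; the amount masking keeps both shifts in range, including the zero-amount case:

define i32 @rotl32(i32 %x, i32 %s) {
  %amt = and i32 %s, 31
  %sub = sub i32 32, %amt
  %namt = and i32 %sub, 31       ; 32 - 0 wraps back to 0, avoiding an out-of-range shift
  %hi = shl i32 %x, %amt
  %lo = lshr i32 %x, %namt
  %r = or i32 %hi, %lo
  ret i32 %r
}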
Differential Revision: https://reviews.llvm.org/D47725
llvm-svn: 334497
This would fail before because 1x vectors aren't legal,
so instead just use the scalar type.
Avoids regressions in a future AMDGPU commit to add
v4i16/v4f16 as legal types.
The test update covers the one in-tree test this currently triggers on; it wasn't checking anything before.
The result is completely changed since the selects
are eliminated. Not sure if it's considered better
or not.
llvm-svn: 334440
(PR37427)
This patch started off much more general and ambitious, but it's been a nightmare
seeing all the ways x86 vector codegen can go wrong.
So the code is still structured to allow extending easily, but it's currently
limited in several ways:
1. Only handle cases with an extending load.
2. Only handle cases with a zero constant compare.
3. Ignore setcc with vector bitmask (SetCCWidth != 1) - so AVX512 should be unaffected.
The motivating case from PR37427:
https://bugs.llvm.org/show_bug.cgi?id=37427
...is the 1st test, and that shows the expected win - we eliminated the unnecessary
intermediate cast.
There's a clear regression in the last test (sgt_zero_fp_select) because we no longer
recognize a 'SHRUNKBLEND' opportunity. I think that general problem is also present
in sgt_zero, so I'll try to fix that in a follow-up. We need to match a sign-bit
setcc from a sign-extended operand and remove it.
Differential Revision: https://reviews.llvm.org/D47330
llvm-svn: 334378
SmallSet forwards to SmallPtrSet for pointer types. SmallPtrSet supports iteration, but a normal SmallSet doesn't, so if it weren't for the forwarding this wouldn't work. These places were found by hiding the begin/end methods in the SmallSet forwarding.
llvm-svn: 334343
llvm-svn: 334312
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed for fsub.
Reviewers: spatel, hfinkel, wristow, arsenm
Reviewed By: spatel
Subscribers: wdng
Differential Revision: https://reviews.llvm.org/D47910
llvm-svn: 334306
While trying to propagate AND masks back to loads, we currently allow one non-load node to be included as a leaf in the chain. This fix limits that node to producing only a single data value.
Differential Revision: https://reviews.llvm.org/D47878
llvm-svn: 334268
Summary: This change uses FMF subflags, in addition to the global unsafe flag, to guard FMA optimizations. These changes originated from D46483 and have been simplified via getNode.
Reviewers: spatel, arsenm, hfinkel, javed.absar
Reviewed By: spatel
Subscribers: nemanjai, wdng
Differential Revision: https://reviews.llvm.org/D47388
llvm-svn: 334242
This avoids regressions in a future AMDGPU change
to make v4i16/v4f16 legal. For these types, build_vector
is implemented as bitcasted operations on v2i32. This
combine was creating v4i16s out of what would already have
been a v2i32 build_vector, creating a mess
of nodes that never get cleaned up.
I'm not sure this is the right condition to check.
I initially tried just checking for the legality of the
new build_vector. This works for my case, but breaks dozens
of x86 tests. A Mips test seems to show some improvement
or at least a neutral change. I don't want to think
about how long it would take to analyze the set of
different x86 vector operations impacted.
Test included in future commit.
llvm-svn: 334218
Summary:
This change uses FMF subflags, in addition to the global unsafe flag, to guard optimizations. These changes originated from D46483.
It contains only the fsqrt context.
Reviewers: spatel, hfinkel, arsenm
Reviewed By: spatel
Subscribers: hfinkel, wdng, andrew.w.kaylor, wristow, efriedma, nemanjai
Differential Revision: https://reviews.llvm.org/D47749
llvm-svn: 334113
llvm-svn: 333967
Summary: This includes variants for add, uaddo, and addcarry. usubo and subcarry require the carry to be flipped to preserve semantics, but we chose to do the transform anyway in that case, so as to push the transform down the carry chain.
Reviewers: efriedma, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46505
llvm-svn: 333943
Summary: It has been deprecated in favor of SETCCCARRY for a year now and isn't used by any in-tree backend.
Reviewers: efriedma, craig.topper, dblaikie, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47685
llvm-svn: 333939
llvm-svn: 333907
llvm-svn: 333766
llvm-svn: 333765
The candidate check precludes this check.
llvm-svn: 333764
Do not consider store sizes larger than the maximum legal store size.
llvm-svn: 333763
Additionally, implement handling of ADD/SUBCARRY on Hexagon, utilizing
the UADDO/USUBO expansion.
Differential Revision: https://reviews.llvm.org/D47559
llvm-svn: 333751
Summary:
As pointed out in D46528, we erroneously transform cases like `xor X, -1`,
even though we use said function.
It's because the `-1` there is actually a bitcast.
So I think we can just look through it in the function.
Differential Revision: https://reviews.llvm.org/D47156
llvm-svn: 332905
Summary:
This **appears** to be the last missing piece for the masked merge pattern handling in the backend.
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].
[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andps`+`andnps` / `bsl` would be generated (see `@out`).
After that canonicalization they would no longer be (see `@in`), and we need to make sure that they still are.
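For reference, the masked-merge shape in question looks like this (a sketch; `@out` and `@in` are the names used in the patch's tests):

define i32 @masked_merge(i32 %x, i32 %y, i32 %m) {
  %mx = and i32 %x, %m           ; bits of %x chosen by the mask
  %notm = xor i32 %m, -1
  %my = and i32 %y, %notm        ; bits of %y from the inverted mask
  %r = or i32 %mx, %my
  ret i32 %r
}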
Differential Revision: https://reviews.llvm.org/D46528
llvm-svn: 332904
their amount masking modified by SimplifyDemandedBits
SimplifyDemandedBits can remove bits from the masks for the shift amounts we need to see to detect rotates.
This patch uses zeroes from computeKnownBits to fill in some of these mask bits to make the match work.
As currently written this calls computeKnownBits even when the mask hasn't been simplified because it made the code simpler. If we're worried about compile time performance we can improve this.
I know we're talking about making a rotate intrinsic, but hopefully we can go ahead and do this change and just make sure the rotate intrinsic also handles it.
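A sketch of the situation (amount ranges chosen so the example stays well defined): once the amount is known to be in range, SimplifyDemandedBits can delete the "and ..., 31" mask the rotate matcher expects, and computeKnownBits lets the matcher recover the missing bits:

define i32 @rot_known_amount(i32 %x, i32 %t) {
  %lo4 = and i32 %t, 15
  %s = or i32 %lo4, 16           ; rotate amount known to be in [16, 31]
  %sub = sub i32 32, %s          ; in [1, 16]; the explicit 'and %sub, 31'
                                 ; was simplified away as redundant
  %hi = shl i32 %x, %s
  %lo = lshr i32 %x, %sub
  %r = or i32 %hi, %lo           ; still rotl(%x, %s)
  ret i32 %r
}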
Differential Revision: https://reviews.llvm.org/D47116
llvm-svn: 332895
As part of merging stores we check that fusing the nodes does not
cause a cycle due to one candidate store being indirectly dependent on
another store (this may happen via chained memory copies). This is
done by searching if a store is a predecessor to another store's
value.
Prune the search at the candidate search's root node which is a
predecessor to all candidate stores. This reduces the
size of the subgraph searched in large basic blocks.
Reviewers: jyknight
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D46955
llvm-svn: 332490
llvm-svn: 332489