| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
| |
We don't unroll vector loops for MVE targets, but we miss the case
when loops only contain intrinsic calls. So just move the logic a
bit to catch this case.
Differential Revision: https://reviews.llvm.org/D72440
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch updates the shape propagation to iterate until no new shape
information is discovered.
As initial seed for the forward propagation, we use the matrix intrinsic
instructions. Both propagateShapeForward and propagateShapeBackward
return new work lists, with the instructions to be used for the next
iteration. When propagating forward, we record all instructions we added
new shape information for. When propagating backward, we record all
users of instructions we added new shape information for.
Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D70901
|
|
|
|
|
|
|
|
|
|
|
| |
This patch extends to shape propagation to also include load
instructions and implements shape aware lowering for vector loads.
Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D70900
|
|
|
|
|
|
|
|
|
|
|
| |
This patch extends the shape propagation for matrix operations to also
propagate the shape of instructions to their operands.
Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D70899
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This addresses a vectorisation regression for tail-folded loops that are
counting down, e.g. loops as simple as this:
void foo(char *A, char *B, char *C, uint32_t N) {
while (N > 0) {
*C++ = *A++ + *B++;
N--;
}
}
These are loops that can be vectorised, but when tail-folding is requested, it
can't find a primary induction variable which we do need for predicating the
loop. As a result, the loop isn't vectorised at all, which it is able to do
when tail-folding is not attempted. So, this adds a check for the primary
induction variable where we decide how to lower the scalar epilogue. I.e., when
there isn't a primary induction variable, a scalar epilogue loop is allowed
(i.e. don't request tail-folding) so that vectorisation could still be
triggered.
Having this check for the primary induction variable make sense anyway, and in
addition, in a follow-up of this I will look into discovering earlier the
primary induction variable for counting down loops, so that this can also be
tail-folded.
Differential revision: https://reviews.llvm.org/D72324
|
|
|
|
|
|
| |
Before we manually inserted unreachable early but that could lead to
broken PHI nodes. Now we use the existing late modification
functionality.
|
|
|
|
|
|
|
| |
When we replace instructions with unreachable we delete instructions. We
now avoid dangling pointers to those deleted instructions in the
`ToBeChangedToUnreachableInsts` set. Other modification collections
might need to be updated in the future as well.
|
|
|
|
|
|
|
|
| |
It looks like my patch breaks the sanitizer-windows build:
http://lab.llvm.org:8011/builders/sanitizer-windows/builds/56324
This reverts commit ead815924e6ebeaf02c31c37ebf7a560b5fdf67b.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The added testcase shows the current transformation for the operation
Z / (1.0 / Y), which remains unchanged. This will be updated to align
with the transformed code (Y * Z) with D72319.
The existing transformation Z / (X / Y) => (Y * Z) / X is not handling
this case as there are multiple uses for (1.0 / Y) in this testcase.
Patch by: @raghesh (Raghesh Aloor)
Differential Revision: https://reviews.llvm.org/D72388
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch teaches JumpThreading.cpp to thread through two basic
blocks like:
bb3:
%var = phi i32* [ null, %bb1 ], [ @a, %bb2 ]
%tobool = icmp eq i32 %cond, 0
br i1 %tobool, label %bb4, label ...
bb4:
%cmp = icmp eq i32* %var, null
br i1 %cmp, label bb5, label bb6
by duplicating basic blocks like bb3 above. Once we duplicate bb3 as
bb3.dup and redirect edge bb2->bb3 to bb2->bb3.dup, we have:
bb3:
%var = phi i32* [ @a, %bb2 ]
%tobool = icmp eq i32 %cond, 0
br i1 %tobool, label %bb4, label ...
bb3.dup:
%var = phi i32* [ null, %bb1 ]
%tobool = icmp eq i32 %cond, 0
br i1 %tobool, label %bb4, label ...
bb4:
%cmp = icmp eq i32* %var, null
br i1 %cmp, label bb5, label bb6
Then the existing code in JumpThreading.cpp can thread edge
bb3.dup->bb4 through bb4 and eventually create bb3.dup->bb5.
Reviewers: wmi
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70247
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit a041c4ec6f7aa659b235cb67e9231a05e0a33b7d.
This looks like a non-trivial change and there has been no code
reviews (at least there were no phabricator revisions attached to the
commit description). It is also causing a regression in one of our
downstream integration tests, we haven't been able to come up with a
minimal reproducer yet.
|
|
|
|
|
|
| |
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D64869
|
|
|
|
|
|
| |
Factor out common logic into some reasonable commented helper functions. In the process, ensure that the in-block vs cross-block cases are handled the same. They previously weren't.
Differential Revision: https://reviews.llvm.org/D67126
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
not (select ?, (cmp TPred, ?, ?), (cmp FPred, ?, ?) -->
select ?, (cmp TPred', ?, ?), (cmp FPred', ?, ?)
If both sides of the select are cmps, we can remove an instruction.
The case where only side is a cmp is deferred to a possible
follow-on patch.
We have a more general 'isFreeToInvert' analysis, but I'm not seeing
a way to use that more widely without inducing infinite looping
(opposing transforms).
Here, we flip the compare predicates directly, so we should not have
any danger by creating extra intermediate 'not' ops.
Alive proofs:
https://rise4fun.com/Alive/jKa
Name: both select values are compares - invert predicates
%tcmp = icmp sle i32 %x, %y
%fcmp = icmp ugt i32 %z, %w
%sel = select i1 %cond, i1 %tcmp, i1 %fcmp
%not = xor i1 %sel, true
=>
%tcmp_not = icmp sgt i32 %x, %y
%fcmp_not = icmp ule i32 %z, %w
%not = select i1 %cond, i1 %tcmp_not, i1 %fcmp_not
Name: false val is compare - invert/not
%fcmp = icmp ugt i32 %z, %w
%sel = select i1 %cond, i1 %tcmp, i1 %fcmp
%not = xor i1 %sel, true
=>
%tcmp_not = xor i1 %tcmp, -1
%fcmp_not = icmp ule i32 %z, %w
%not = select i1 %cond, i1 %tcmp_not, i1 %fcmp_not
Differential Revision: https://reviews.llvm.org/D72007
|
|
|
|
|
|
|
|
|
|
|
| |
Don't overwrite existing target-cpu attributes.
I've often found the replacement behavior annoying, and this is
inconsistent with how the fast math command line flags interact with
the function attributes.
Does not yet change target-features, since I think that should behave
as a concatenation.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
and other floating point type
In https://reviews.llvm.org/D67148, we use isFloatTy to test floating
point type, otherwise we return GPRRC.
So 'double' will be classified as GPRRC, which is not accurate.
This patch covers other floating point types.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D71946
|
|
|
|
|
|
|
|
| |
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D72143
Patch by Kazuaki Ishizaki.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Make `AAMDNodes`' `getAAMetadata()` and `setAAMetadata()` to take `!tbaa.struct`
into account as well as `!tbaa`. This impacts llvm.org/pr42022.
This is a temprorary fix needed to keep `!tbaa.struct` tag by SROA pass.
New field `TBAAStruct` should be deleted when `!tbaa` tag replaces `!tbaa.struct`.
Merging two `!tbaa.struct`'s to one is conservatively considered to be `nullptr`
(giving `MayAlias`) -- this could be enhanced, but relying on the said future
replacement.
Reviewers: RKSimon, spatel, vporpo
Subscribers: hiraditya, kosarev, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70924
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
simplifyTerminatorLeadingToRet
Summary:
In addMustTailToCoroResumes, we set musttail on those resume instructions that are followed by a ret instruction. This is done by simplifyTerminatorLeadingToRet which replace a sequence of branches leading to a ret with a clone of the ret.
However it forgets to remove corresponding PHI values that come from basic block of replaced branch, and may cause jumpthreading pass hangs (https://bugs.llvm.org/show_bug.cgi?id=43720)
This patch fix this issue
Test Plan:
cppcoro library with O3+flto
check-llvm
Reviewers: modocache, GorNishanov, lewissbaker
Reviewed By: modocache
Subscribers: mehdi_amini, EricWF, hiraditya, dexonsmith, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71826
Patch by junparser (JunMa)!
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR44426)
This decreases use count of %Op0, makes one hand of select to be 0,
and possibly exposes further folding potential.
Name: sub %Op0, (select %Cond, %Op0, %FalseVal) -> select %Cond, 0, (sub %Op0, %FalseVal)
%Op0 = %TrueVal
%o = select i1 %Cond, i8 %Op0, i8 %FalseVal
%r = sub i8 %Op0, %o
=>
%n = sub i8 %Op0, %FalseVal
%r = select i1 %Cond, i8 0, i8 %n
Name: sub %Op0, (select %Cond, %TrueVal, %Op0) -> select %Cond, (sub %Op0, %TrueVal), 0
%Op0 = %FalseVal
%o = select i1 %Cond, i8 %TrueVal, i8 %Op0
%r = sub i8 %Op0, %o
=>
%n = sub i8 %Op0, %TrueVal
%r = select i1 %Cond, i8 %n, i8 0
https://rise4fun.com/Alive/aHRt
https://bugs.llvm.org/show_bug.cgi?id=44426
|
|
|
|
| |
https://bugs.llvm.org/show_bug.cgi?id=44426
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This decreases use count of %Op1, makes one hand of select to be 0,
and possibly exposes further folding potential.
Name: sub (select %Cond, %Op1, %FalseVal), %Op1 -> select %Cond, 0, (sub %FalseVal, %Op1)
%Op1 = %TrueVal
%o = select i1 %Cond, i8 %Op1, i8 %FalseVal
%r = sub i8 %o, %Op1
=>
%n = sub i8 %FalseVal, %Op1
%r = select i1 %Cond, i8 0, i8 %n
Name: sub (select %Cond, %TrueVal, %Op1), %Op1 -> select %Cond, (sub %TrueVal, %Op1), 0
%Op1 = %FalseVal
%o = select i1 %Cond, i8 %TrueVal, i8 %Op1
%r = sub i8 %o, %Op1
=>
%n = sub i8 %TrueVal, %Op1
%r = select i1 %Cond, i8 %n, i8 0
https://rise4fun.com/Alive/avL
https://bugs.llvm.org/show_bug.cgi?id=44426
|
|
|
|
| |
https://bugs.llvm.org/show_bug.cgi?id=44426
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
memory usage.
Summary:
For artificial cases (huge array, few usages), Global SRA optimization creates
a lot of redundant data. It creates an instance of GlobalVariable for each array
element. For huge array, that means huge compilation time and huge memory usage.
Following example compiles for 10 minutes and requires 40GB of memory.
namespace {
char LargeBuffer[64 * 1024 * 1024];
}
int main ( void ) {
LargeBuffer[0] = 0;
printf("\n ");
return LargeBuffer[0] == 0;
}
The fix is to avoid Global SRA for large arrays.
Reviewers: craig.topper, rnk, efriedma, fhahn
Reviewed By: rnk
Subscribers: xbolva00, lebedev.ri, lkail, merge_guards_bot, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71993
|
|
|
|
| |
Add two tests to reg-usage.ll
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This decreases use count of Op1, potentially allows
us to further hoist said 'neg' later on,
and results in marginally better X86 codegen.
Name: (Op1 & С) - Op1 -> -(Op1 & ~C)
%o = and i64 %Op1, C1
%r = sub i64 %o, %Op1
=>
%n = and i64 %Op1, ~C1
%r = sub i64 0, %n
https://rise4fun.com/Alive/rwgA
https://godbolt.org/z/R_RMfM
https://bugs.llvm.org/show_bug.cgi?id=44427
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Name: (X & (- Y)) - X -> - (X & (Y - 1)) (PR44448)
%negy = sub i8 0, %y
%unbiasedx = and i8 %negy, %x
%r = sub i8 %unbiasedx, %x
=>
%ymask = add i8 %y, -1
%xmasked = and i8 %ymask, %x
%r = sub i8 0, %xmasked
https://rise4fun.com/Alive/OIpla
This decreases use count of %x, may allow us to
later hoist said negation even further,
and results in marginally nicer X86 codegen.
See
https://bugs.llvm.org/show_bug.cgi?id=44448
https://reviews.llvm.org/D71499
|
|
|
|
|
| |
As discussed in https://bugs.llvm.org/show_bug.cgi?id=44448,
we can hoist negation out of the pattern.
|
|
|
|
|
|
|
| |
If we replace a function with a new one because we rewrite the
signature, dead users may still refer to the old version. With this
patch we reuse the code that deals with dead functions, which the old
versions are, to avoid problems.
|
|
|
|
|
| |
An integer isn't allowed in getAlignmentForValue so we need to stop at a
ptr2int instruction during exploration.
|
|
|
|
|
|
|
| |
An inbounds GEP results in poison if the value is not "inbounds", not in
UB. We accidentally derived nonnull and dereferenceable from these
inbounds GEPs even in the absence of accesses that would make the poison
to UB.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The patch makes sure that the LastThrowing pointer does not point to any instruction deleted by call to DeleteDeadInstruction.
While iterating through the instructions the pass maintains a pointer to the lastThrowing Instruction. A call to deleteDeadInstruction deletes a dead store and other instructions feeding the original dead instruction which also become dead. The instruction pointed by the lastThrowing pointer could also be deleted by the call to DeleteDeadInstruction and thus it becomes a dangling pointer. Because of this, we see an error in the next iteration.
In the patch, we maintain a list of throwing instructions encountered previously and use the last non deleted throwing instruction from the container.
Reviewers: fhahn, bcahoon, efriedma
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D65326
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
folds (PR44383)
As shown in P44383:
https://bugs.llvm.org/show_bug.cgi?id=44383
...we can't safely propagate a vector constant through this icmp fold
if that vector constant contains undefined elements.
We know that each defined element of the constant is safe though, so
find the first of those and replicate it into the formerly undef lanes.
Differential Revision: https://reviews.llvm.org/D72101
|
|
|
|
|
|
| |
constant range"
This reverts commit e9963034314edf49a12ea5e29f694d8f9f52734a.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a less ambitious alternative to previous attempts to fix
this bug with:
rG56b2aee1875a
rGef02831f0a4e
rG56b2aee1875a
...because those all failed bot testing with use-after-free or
other problems.
The original crashing/assert problem is still showing up on
various fuzzers, so I've added a new minimal test based on
another one of those failures.
Instead of trying to manage and coordinate the logic in
isAllocSiteRemovable() with the deletion loops, just loosen
the existing code that handles casts and GEP by replacing
with undef to allow other opcodes. That means that no
instructions with uses should assert on deletion, and there
are hopefully no non-obvious sanitizer bugs induced.
|
|
|
|
|
|
|
|
| |
This addresses https://bugs.llvm.org/show_bug.cgi?id=44423.
If one of the GEPs is inbounds and the other is zero-index,
we can also preserve inbounds.
Differential Revision: https://reviews.llvm.org/D72060
|
|
|
|
|
|
|
|
| |
This fixes https://bugs.llvm.org/show_bug.cgi?id=44425. We need to
drop inbounds if one of the GEPs is not inbounds. This was already
done when creating a new GEP, but not when modifying in place.
Differential Revision: https://reviews.llvm.org/D72059
|
| |
|
| |
|
|
|
|
| |
Tests for PR44419.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces `AAValueConstantRange`, which answers a possible range for integer value in a specific program point.
One of the motivations is propagating existing `range` metadata. (I think we need to change the situation that `range` metadata cannot be put to Argument).
The state is a tuple of `ConstantRange` and it is initialized to (known, assumed) = ([-∞, +∞], empty).
Currently, AAValueConstantRange is created when AAValueSimplify cannot
simplify the value.
Supported
- BinaryOperator(add, sub, ...)
- CmpInst(icmp eq, ...)
- !range metadata
`AAValueConstantRange` is not intended to extend to polyhedral range value analysis.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D71620
|
|
|
|
|
|
|
|
|
|
|
|
| |
and pext
The instructions use a mask to either pack disjoint bits together(pext) or spread bits to disjoint locations(pdep). If the mask is all 0s then no bits are extracted or deposited. If the mask is all ones, then the source value is written to the result since no compression or expansion happens. Otherwise if both the source and mask are constant we can walk the bits in the source/mask and calculate the result.
There other crazier things we could do like computeKnownBits or turning pext into shift/and if only a single contiguous range of bits is extracted.
Fixes PR44389
Differential Revision: https://reviews.llvm.org/D71952
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This does not solve PR17101, but it is one of the
underlying diffs noted here:
https://bugs.llvm.org/show_bug.cgi?id=17101#c8
We could ease the one-use checks for the 'clear'
(no 'not' op) half of the transform, but I do not
know if that asymmetry would make things better
or worse.
Proofs:
https://rise4fun.com/Alive/uVB
Name: masked bit set
%sh1 = shl i32 1, %y
%and = and i32 %sh1, %x
%cmp = icmp ne i32 %and, 0
%r = zext i1 %cmp to i32
=>
%s = lshr i32 %x, %y
%r = and i32 %s, 1
Name: masked bit clear
%sh1 = shl i32 1, %y
%and = and i32 %sh1, %x
%cmp = icmp eq i32 %and, 0
%r = zext i1 %cmp to i32
=>
%xn = xor i32 %x, -1
%s = lshr i32 %xn, %y
%r = and i32 %s, 1
|
| |
|
|
|
|
|
|
| |
This reverts commit 27a0795943fee0f30b995fe5165428afc2dfd402.
Seems to break test-suite.
|