| Commit message | Author | Age | Files | Lines |
|
We're already removing allocsize attributes from Functions that we
remove args from, since removing arguments from a function may make the
allocsize attribute incorrect. It appears we forgot to also remove them
from callsites.
Without this, I get verifier errors on `@Test2`.
It probably wouldn't be too hard to make DAE properly update allocsize
attributes instead of dropping them, but I can't think of a scenario
where that'd be useful in practice.
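As a rough illustration of what "properly updating" the attribute could involve (the commit itself simply drops it), here is a minimal sketch. It assumes `allocsize` holds zero-based argument indices; the helper name `remap_allocsize` is hypothetical and is not part of LLVM.

```python
def remap_allocsize(allocsize, removed):
    """Remap zero-based allocsize argument indices after argument removal.

    allocsize: tuple of argument indices the attribute refers to.
    removed:   set of argument indices that were deleted.
    Returns the remapped tuple, or None when the attribute must be dropped
    because one of the arguments it refers to no longer exists.
    """
    remapped = []
    for idx in allocsize:
        if idx in removed:
            return None  # the size argument itself is gone
        # Every removed argument before idx shifts it down by one.
        remapped.append(idx - sum(1 for r in removed if r < idx))
    return tuple(remapped)
```

Leaving the old indices in place after a removal is what makes the attribute incorrect: `allocsize(1)` on a function whose argument 0 was removed would point at the wrong argument, or past the end of the argument list.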
llvm-svn: 329868
|
Summary:
These tests show that DSE currently does nothing with the atomic memory
intrinsics. Future work will teach DSE how to simplify these.
llvm-svn: 329845
|
Summary:
In preparation for a future commit, this regenerates the test checks for
test/Transforms/DeadStoreElimination/OverwriteStoreBegin.ll
test/Transforms/DeadStoreElimination/OverwriteStoreEnd.ll
llvm-svn: 329839
|
Summary:
In preparation for a future commit, this regenerates the test checks for
test/Transforms/DeadStoreElimination/simple.ll
test/Transforms/DeadStoreElimination/memintrinsics.ll
llvm-svn: 329824
|
llvm-svn: 329821
|
llvm-svn: 329818
|
Bitwise 'not' of the min/max could be eliminated in the pattern:
%notx = xor i32 %x, -1
%cmp1 = icmp sgt[slt/ugt/ult] i32 %notx, %y
%smax = select i1 %cmp1, i32 %notx, i32 %y
%res = xor i32 %smax, -1
https://rise4fun.com/Alive/lCN
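The identity behind this fold can be checked exhaustively at a small bit width. The sketch below assumes the fold rewrites `~smax(~x, y)` to `smin(x, ~y)`; the 8-bit width and all helper names are illustrative choices, not taken from the patch.

```python
from itertools import product

BITS = 8  # small width so the check can be exhaustive
MASK = (1 << BITS) - 1


def to_signed(v):
    """Interpret a BITS-wide bit pattern as a two's-complement integer."""
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v


def bit_not(v):
    """Model `xor %v, -1` on a BITS-wide value."""
    return ~v & MASK


def smax(a, b):
    return a if to_signed(a) > to_signed(b) else b


def smin(a, b):
    return a if to_signed(a) < to_signed(b) else b


def with_nots(x, y):
    """The original pattern: not(smax(not(x), y))."""
    return bit_not(smax(bit_not(x), y))


def without_outer_not(x, y):
    """The assumed replacement: smin(x, not(y))."""
    return smin(x, bit_not(y))


def check_all():
    """Compare both forms for every pair of BITS-wide inputs."""
    values = range(1 << BITS)
    return all(with_nots(x, y) == without_outer_not(x, y)
               for x, y in product(values, repeat=2))
```

The check works because bitwise 'not' reverses both the signed and the unsigned ordering, so a max of negated values is the negation of a min.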
Reviewers: spatel
Reviewed by: spatel
Subscribers: a.elovikov, llvm-commits
Differential Revision: https://reviews.llvm.org/D45317
llvm-svn: 329791
|
Author: Samuel Pitoiset
ds_read_b128 and ds_write_b128 have been recently enabled
under the amdgpu-ds128 option because the performance benefit
is unclear.
However, using 128-bit loads/stores for the local address space
appears to introduce regressions in tessellation shaders. Not
sure what is broken, but as ds_read_b128/ds_write_b128 are not
enabled by default, just introduce a global option and enable
128-bit only if requested (until it's fixed/used correctly).
v2: - fix regressions in merge-stores.ll and multiple_tails.ll
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
llvm-svn: 329764
|
edge values
This is based on an example that was recently posted on llvm-dev:
void *propagate_null(void* b, int* g) {
  if (!b) {
    return 0;
  }
  (*g)++;
  return b;
}
https://godbolt.org/g/xYk3qG
The original code or constant propagation in other passes has obscured the fact
that the phi can be removed completely.
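One way to picture why the phi is redundant: on the early-return edge the constant 0 is exactly the value `b` is known to have there, so every incoming value equals `b`. The sketch below is a toy model under that assumption. The data layout and the helper name `simplify_phi` are hypothetical and do not mirror the pass's real interface.

```python
def simplify_phi(incoming, edge_facts):
    """Return the variable that can replace a phi, or None.

    incoming:   {predecessor: value}, where a str is a variable name and
                an int is a constant.
    edge_facts: {predecessor: {variable: constant}}, values known to hold
                on the edge from that predecessor.
    """
    variables = {v for v in incoming.values() if isinstance(v, str)}
    if len(variables) != 1:
        return None  # need exactly one common variable
    var = next(iter(variables))
    for pred, value in incoming.items():
        if value == var:
            continue
        # A constant is acceptable only if the variable is known to
        # equal that constant on this edge.
        if edge_facts.get(pred, {}).get(var) != value:
            return None
    return var
```

For `propagate_null`, the phi merges 0 from the `!b` edge with `b` from the fall-through edge, and `b == 0` is known on the first edge, so the model returns `"b"`.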
Differential Revision: https://reviews.llvm.org/D45448
llvm-svn: 329755
|
llvm-svn: 329660
|
Fix PR36484, as suggested:
<quote>
during moves, mark the direct users of the erased things that were phis as "not to be optimized"
</quote>
llvm-svn: 329621
|
llvm-svn: 329603
|
Summary:
We do not try to move the instructions and split the block till we
know the blocks can be split, i.e. BCE-cmp-insts can be separated from
non-BCE-cmp-insts.
Reviewers: davide, courbet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44443
llvm-svn: 329564
|
In IRCE, we have a very old legacy check that works when we collect comparisons that we
treat as range checks. It ensures that the value against which the indvar is compared is
loop invariant and is also positive.
This latter condition remained there since the times when IRCE was only able to handle
signed latch comparison. As the optimization evolved, it now learned how to intersect
signed or unsigned ranges, and this logic has no reliance on the fact that the right border
of each range should be positive.
The old implementation of this non-negativity check was also rather naive: it only looked
at ranges (while most of the other IRCE logic tries to use the power of SCEV implications), so
it could not handle even the simplest case, which looks as follows:
  int size; // not known non-negative
  int length; // known non-negative
  i = 0;
  if (size != 0) {
    do {
      range_check(i < size);
      range_check(i < length);
      ++i;
    } while (i < size);
  }
In this case, even if IRCE could parse the loop structure thanks to some dominating
conditions, it could only remove the range check against `length` and simply
ignored the check against `size`.
In this patch we remove this obsolete check. This allows IRCE to pick the comparison
against `size` as a potential range check and then lets the Range Intersection logic
decide whether it is OK to eliminate it or not.
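The effect on the example loop can be illustrated with a toy model. This is not IRCE's actual algorithm: it simulates the do-while loop and compares the first failing iteration with the intersection of the two `[0, bound)` ranges. Both function names are hypothetical.

```python
def first_failing_iteration(size, length):
    """Run the example loop; return the first i whose range check fails,
    or None if the loop finishes with every check passing."""
    if size == 0:
        return None
    i = 0
    while True:
        if not (i < size) or not (i < length):
            return i
        i += 1
        if not (i < size):
            return None


def safe_range(size, length):
    """Intersect [0, size) and [0, length); a negative bound gives an
    empty range, so no iteration is considered safe."""
    return (0, max(0, min(size, length)))
```

A negative `size` yields an empty safe range, which is why the decision is left to the range intersection step instead of a blanket non-negativity requirement.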
Differential Revision: https://reviews.llvm.org/D45362
Reviewed By: samparker
llvm-svn: 329547
|
llvm-svn: 329533
|
There are a pair of folds that try to merge fneg into fsub
with an intervening cast, but as shown in the FIXME tests,
they can create extra instructions.
llvm-svn: 329501
|
Summary:
See [[ https://bugs.llvm.org/show_bug.cgi?id=36950 | PR36950 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=17564 | PR17564 ]], D45065, D45107
https://godbolt.org/g/iAYRup
Alive proof: https://rise4fun.com/Alive/uiH
Testing: `ninja check-llvm`
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45108
llvm-svn: 329492
|
Reviewers: pcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45193
llvm-svn: 329459
|
As noted in the post-commit discussion for r329350, we shouldn't
generally assume that fsub is the same cost as fneg.
llvm-svn: 329429
|
llvm-svn: 329418
|
D45344 is proposing to remove the use restriction that made the calloc
transform safe, but it doesn't currently address the problematic example
given in D16337. Add a test to make sure that doesn't break.
llvm-svn: 329412
|
Summary:
Fixing an issue where initializations of globals where constructors use
casts were silently translated to 0-initialization.
Reviewers: davidxl, evgeny777
Reviewed By: evgeny777
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45198
llvm-svn: 329409
|
The SimpleLoopUnrollPass isn't supposed to perform loop peeling.
Differential Revision: https://reviews.llvm.org/D45334
llvm-svn: 329395
|
Inserting instrumentation between a musttail call and ret instruction
would create invalid IR. Instead, treat musttail calls as function
exits.
llvm-svn: 329385
|
This restores what was lost with rL73243 but without
re-introducing the bug that was present in the old code.
Note that we already have these transforms if the ops are
marked 'fast' (and I assume that's happening somewhere in
the code added with rL170471), but we clearly don't need
all of 'fast' for these transforms.
llvm-svn: 329362
|
A fold for this pattern was removed at rL73243 to fix PR4374:
https://bugs.llvm.org/show_bug.cgi?id=4374
...and apparently there were no tests that went with that fold.
llvm-svn: 329360
|
This restores part of the fold that was removed with rL73243 (PR4374).
llvm-svn: 329350
|
As requested by spatel in https://reviews.llvm.org/D45329
llvm-svn: 329349
|
(D45108, PR36950 / PR17564)
Summary:
More tests for D45108:
* One use tests
* allow shift to be a variable, too
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45329
llvm-svn: 329348
|
chains
Summary:
This is a fix to PR37005.
Essentially, rL328539 ([InstCombine] reassociate loop invariant GEP chains to enable LICM) contains a bug
whereby it will convert:
%src = getelementptr inbounds i8, i8* %base, <2 x i64> %val
%res = getelementptr inbounds i8, <2 x i8*> %src, i64 %val2
into:
%src = getelementptr inbounds i8, i8* %base, i64 %val2
%res = getelementptr inbounds i8, <2 x i8*> %src, <2 x i64> %val
It does so by swapping the index operands when the GEPs are in a loop and %val is loop-variant
while %val2 is loop-invariant.
This fix recreates new GEP instructions if the index operand swap would result in the type
of %src changing from vector to scalar, or vice versa.
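The type problem can be modelled in a few lines. The sketch relies on the rule that a getelementptr yields a vector of pointers when its base or any index is a vector; the helper names are hypothetical.

```python
def gep_is_vector(base_is_vector, index_is_vector):
    """A GEP result is a vector of pointers if the base or index is a vector."""
    return base_is_vector or index_is_vector


def swap_changes_src_type(base_is_vector, inner_index_is_vector,
                          outer_index_is_vector):
    """Would swapping the two index operands change the type of %src?"""
    before = gep_is_vector(base_is_vector, inner_index_is_vector)
    after = gep_is_vector(base_is_vector, outer_index_is_vector)
    return before != after
```

In the example above the base is scalar, `%val` is a vector and `%val2` is scalar, so the swap turns `%src` from `<2 x i8*>` into `i8*`, and the second GEP no longer matches its operand type.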
Reviewers: sebpop, spatel
Reviewed By: sebpop
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45287
llvm-svn: 329331
|
There used to be a fold that would handle this case more generally,
but it was removed at rL73243 to fix PR4374:
https://bugs.llvm.org/show_bug.cgi?id=4374
llvm-svn: 329322
|
This allows folding for vectors with undef elements.
llvm-svn: 329316
|
llvm-svn: 329313
|
Using cstfp_pred_ty in the definition allows us to match vectors with undef elements.
This replicates the change for m_Not from D44076 / rL326823 and continues
towards making all pattern matchers allow undef elements in vectors.
llvm-svn: 329303
|
llvm-svn: 329294
|
This fixes a buildbot failure.
llvm-svn: 329279
|
Reviewers: samparker, karthikthecool, blitz.opensource
Reviewed By: samparker
Differential Revision: https://reviews.llvm.org/D45209
llvm-svn: 329269
|
LoopInterchange relies on LoopInfo being up-to-date, so we should
preserve it after interchanging. This patch updates restructureLoops to
move the BBs of the interchanged loops to the right place.
Reviewers: davide, efriedma, karthikthecool, mcrosier
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D45278
llvm-svn: 329264
|
Summary:
If the callsite is inside a landing pad, do not perform callsite splitting.
Callsite splitting uses the utility function llvm::DuplicateInstructionsInSplitBetween, which eventually calls llvm::SplitEdge. llvm::SplitEdge calls llvm::SplitCriticalEdge with the assumption that the function returns nullptr only when the target edge is not a critical edge (and further assumes that if the return value was not nullptr, the predecessor of the original target edge always has a single successor because critical edge splitting was successful). However, this assumption is not true because SplitCriticalEdge returns nullptr if the destination block is a landing pad. This invalid assumption results in an assertion failure.
A fundamental solution might be to fix llvm::SplitEdge so that it does not rely on the invalid assumption. However, that would involve a lot of work because the current API assumes that llvm::SplitEdge never fails. Instead, this patch makes callsite splitting not attempt splitting if the callsite is in a landing pad.
The attached test case crashes with an assertion failure without the fix.
Reviewers: fhahn, junbuml, dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45130
llvm-svn: 329250
|
Summary: @llvm.icall.branch.funnel is musttail with a variable number of
arguments. After inlining, the current backend can't separate call targets from call
arguments.
Reviewers: pcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45116
llvm-svn: 329235
|
Summary:
Clang's __builtin_operator_new/delete was recently taught about the aligned allocation overloads (r328134). This patch makes LLVM aware of them as well.
This allows the compiler to perform certain optimizations including eliding new/delete calls.
Reviewers: rsmith, majnemer, dblaikie, vsk, bkramer
Reviewed By: bkramer
Subscribers: ckennelly, llvm-commits
Differential Revision: https://reviews.llvm.org/D44769
llvm-svn: 329218
|
This reverts commit bee3bbd9bdd3ab3364b8fb0cdb6326bc1ae740e0.
llvm-svn: 329217
|
Summary:
Clang's __builtin_operator_new/delete was recently taught about the aligned allocation overloads (r328134). This patch makes LLVM aware of them as well.
This allows the compiler to perform certain optimizations including eliding new/delete calls.
Reviewers: rsmith, majnemer, dblaikie, vsk, bkramer
Reviewed By: bkramer
Subscribers: ckennelly, llvm-commits
Differential Revision: https://reviews.llvm.org/D44769
llvm-svn: 329215
|
/ PR17564)
Summary: See [[ https://bugs.llvm.org/show_bug.cgi?id=36950 | PR36950 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=17564 | PR17564 ]], D45065, D45108
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45107
llvm-svn: 329198
|
llvm-svn: 329196
|
Fixes cases like the new test @nonuniform. In that test, %cc itself
is a uniform value; however, when reading it after the end of the loop in
basic block %if, its value is effectively non-uniform, so the branch is
non-uniform.
This problem was encountered in
https://bugs.freedesktop.org/show_bug.cgi?id=103743; however, this change
in itself is not sufficient to fix that bug, as there is another issue
in the AMDGPU backend.
As discovered after committing an earlier version of this change, this
exposes a subtle interaction between this pass and DivergenceAnalysis:
since we remove and re-create branch instructions, we can no longer rely
on DivergenceAnalysis for branches in subregions that were already
processed by the pass.
Explicitly remove branch instructions from DivergenceAnalysis to
avoid dangling pointers as a matter of defensive programming, and
change how we detect non-uniform subregions.
Change-Id: I32bbffece4a32f686fab54964dae1a5dd72949d4
Differential Revision: https://reviews.llvm.org/D43743
llvm-svn: 329165
|
This patch teaches SCEV how to prove implications for SCEVUnknown nodes that are Phis.
If we need to prove `Pred` for `LHS, RHS`, and `LHS` is a Phi with possible incoming values
`L1, L2, ..., LN`, then if we prove `Pred` for `(L1, RHS), (L2, RHS), ..., (LN, RHS)` then we can also
prove it for `(LHS, RHS)`. If both `LHS` and `RHS` are Phis from the same block, it is sufficient
to prove the predicate for values that come from the same predecessor block.
The typical case that it handles is that we sometimes need to prove that `Phi(Len, Len - 1) >= 0`
given that `Len > 0`. The new logic was added to `isImpliedViaOperations` and only uses it and
non-recursive reasoning to prove the facts we need, so it should not hurt compile time a lot.
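The two rules can be sketched with a toy prover. This is only an illustration: values are modelled as `(lo, hi)` intervals, `known_ge` stands in for SCEV's real implication machinery, and all names are hypothetical.

```python
def known_ge(lhs, rhs):
    """Toy oracle: `lhs >= rhs` always holds if min(lhs) >= max(rhs)."""
    return lhs[0] >= rhs[1]


def implied_via_phi(prove, lhs_incoming, rhs):
    """LHS is a phi: the predicate must hold for every incoming value."""
    return all(prove(value, rhs) for value in lhs_incoming.values())


def implied_via_phi_pair(prove, lhs_incoming, rhs_incoming):
    """Both sides are phis of the same block: it is enough to compare the
    values that arrive from the same predecessor."""
    if lhs_incoming.keys() != rhs_incoming.keys():
        return False
    return all(prove(lhs_incoming[bb], rhs_incoming[bb])
               for bb in lhs_incoming)


INT_MAX = 2**31 - 1
LEN = (1, INT_MAX)              # Len > 0
LEN_MINUS_1 = (0, INT_MAX - 1)  # Len - 1
ZERO = (0, 0)
PHI = {"preheader": LEN, "latch": LEN_MINUS_1}
```

With these inputs `implied_via_phi(known_ge, PHI, ZERO)` models the `Phi(Len, Len - 1) >= 0` case. The pairwise variant accepts some cases that an all-pairs comparison would reject, because values from different predecessors are never compared with each other.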
Differential Revision: https://reviews.llvm.org/D44001
Reviewed By: anna
llvm-svn: 329150
|
PostBB has more than 2 predecessors by inserting a new block for the store.
Summary:
Currently merge conditional stores can't handle cases where PostBB (the block we need to move the store to) has more than 2 predecessors.
This patch removes that restriction by creating a new block with only the 2 predecessors we care about and an unconditional branch to the original block. This provides a place to put the store.
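The CFG change can be sketched on a plain successor map. This is a simplified model, not the SimplifyCFG code: blocks are strings, and `split_for_store` and the block names are hypothetical.

```python
def predecessors(cfg, block):
    """Return the sorted list of blocks that branch to `block`."""
    return sorted(b for b, succs in cfg.items() if block in succs)


def split_for_store(cfg, post_bb, store_preds, new_name):
    """Route the edges from `store_preds` to `post_bb` through a new block
    that ends in an unconditional branch to `post_bb`."""
    new_cfg = {block: list(succs) for block, succs in cfg.items()}
    for pred in store_preds:
        new_cfg[pred] = [new_name if s == post_bb else s
                         for s in new_cfg[pred]]
    new_cfg[new_name] = [post_bb]  # unconditional branch to the old block
    return new_cfg
```

After the split, the new block has exactly the two predecessors of interest, which gives the merged store a place to live without affecting the other paths into PostBB.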
Reviewers: efriedma, jmolloy, ABataev
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39760
llvm-svn: 329142
|
The tests marked with 'FIXME' require loosening the check
in SimplifyAssociativeOrCommutative() to optimize completely;
that's still checking isFast() in Instruction::isAssociative().
llvm-svn: 329121
|
Summary:
If an alloca needs to be stored in the coroutine frame, has an alignment specified, and that alignment does not match the natural alignment of the alloca type, insert appropriate padding into the coroutine frame to make sure that it gets the requested alignment.
For example, for a packed type (whose natural alignment is 1) with an alloca alignment of 8, we may need to insert a padding field with the required number of bytes to make sure it is properly aligned.
```
%PackedStruct = type <{ i64 }>
...
%data = alloca %PackedStruct, align 8
```
If the previous field in the coroutine frame had alignment 2, we would have [6 x i8] inserted before %PackedStruct in the coroutine frame:
```
%f.Frame = type { ..., i16, [6 x i8], %PackedStruct }
```
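The padding arithmetic can be sketched as follows. This is a simplified layout model, not the coroutine frame builder itself: fields are `(name, size, alignment)` tuples, the frame is assumed to start at offset 0, and `layout` is a hypothetical helper.

```python
def padding_for(offset, alignment):
    """Bytes needed to round `offset` up to a multiple of `alignment`."""
    return (-offset) % alignment


def layout(fields):
    """Place fields in order, emitting explicit padding entries.

    fields: list of (name, size_in_bytes, required_alignment).
    Returns a list of (name, offset, size), including "<padding>" entries.
    """
    placed = []
    offset = 0
    for name, size, alignment in fields:
        pad = padding_for(offset, alignment)
        if pad:
            placed.append(("<padding>", offset, pad))
            offset += pad
        placed.append((name, offset, size))
        offset += size
    return placed
```

For an `i16` at offset 0 followed by the 8-byte `%PackedStruct` with alignment 8, the model inserts 6 bytes of padding, matching the `[6 x i8]` field in the frame type above.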
Reviewers: rnk, lewissbaker, modocache
Reviewed By: modocache
Subscribers: EricWF, llvm-commits
Differential Revision: https://reviews.llvm.org/D45221
llvm-svn: 329112
|