| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For some unclear reason rewriteLoopExitValues considers recalculation
after the loop profitable if it has some "soft uses" outside the loop (i.e. any
use other than call and return), even if we have proved that it has a user inside
the loop which we think will not be optimized away.
There is no existing unit test that would explain this. This patch provides an
example when rematerialisation of exit value is not profitable but it passes
this check due to presence of a "soft use" outside the loop.
It makes no sense to recalculate value on exit if we are going to compute it
due to some irremovable within the loop. This patch disallows applying this
transform in the described situation.
Differential Revision: https://reviews.llvm.org/D51581
Reviewed By: etherzhhb
llvm-svn: 345708
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
optsize using masked wide loads
Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).
Reviewers: Ayal, hsaito, dcaballe, fhahn
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D53668
llvm-svn: 345705
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Turns out it's not always possible to figure out whether an asm()
statement argument points to a valid memory region.
One example would be per-CPU objects in the Linux kernel, for which the
addresses are calculated using the FS register and a small offset in the
.data..percpu section.
To avoid pulling all sorts of checks into the instrumentation, we replace
actual checking/unpoisoning code with calls to
msan_instrument_asm_load(ptr, size) and
msan_instrument_asm_store(ptr, size) functions in the runtime.
This patch doesn't implement the runtime hooks in compiler-rt, as there's
been no demand in assembly instrumentation for userspace apps so far.
llvm-svn: 345702
|
| |
|
|
|
|
|
|
| |
This is modeled after C++17 std::empty().
Differential Revision: https://reviews.llvm.org/D53909
llvm-svn: 345679
|
| |
|
|
| |
llvm-svn: 345647
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
noop casts
InstCombine features an optimization that essentially replaces:
if (a)
free(a)
into:
free(a)
Right now, this optimization is gated by the minsize attribute and therefore
we only perform it if we can prove that we are going to be able to eliminate
the branch and the destination block.
However when casts are involved the optimization would fail to apply, because
the optimization was not smart enough to realize that it is possible to also
move the casts away from the destination block and that is harmless to the
performance since they are just noops.
E.g.,
foo(int *a)
if (a)
free((char*)a)
Wouldn't be optimized by instcombine, because
- We would refuse to hoist the `bitcast i32* %a to i8` in the source block
- We would fail to see that `bitcast i32* %a to i8` and %a are the same value.
This patch fixes both these problems:
- It teaches the pattern matching of the comparison how to look
through casts.
- It checks that whether the additional instruction in the destination block
can be hoisted and are harmless performance-wise.
- It hoists all the code of the destination block in the source block.
Differential Revision: D53356
llvm-svn: 345644
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
After commit https://reviews.llvm.org/rL344228, the function definitions have a counter but when on one line the counter is wrong (e.g. void foo() { })
I added a test in: https://reviews.llvm.org/D53601
Reviewers: marco-c
Reviewed By: marco-c
Subscribers: llvm-commits, sylvestre.ledru
Differential Revision: https://reviews.llvm.org/D53600
llvm-svn: 345624
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Correct costings of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector.
Unlike the rest of the shuffle kinds this means that the main Ty argument represents the source vector type not the destination!
I've done my best to fix a number of vectorizer uses:
SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment.
LV - I'm not clear on what the SK_ExtractSubvector should represents for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta.
Differential Revision: https://reviews.llvm.org/D53573
llvm-svn: 345617
|
| |
|
|
| |
llvm-svn: 345613
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
shuffle (insert ?, Scalar, IndexC), V1, Mask --> insert V1, Scalar, IndexC'
The motivating case is at least a couple of steps away: I noticed that
SLPVectorizer does not analyze shuffles as well as sequences of
insert/extract in PR34724:
https://bugs.llvm.org/show_bug.cgi?id=34724
...so SLP may fail to vectorize when source code has shuffles to start
with or instcombine has converted insert/extract to shuffles.
Independent of that, an insertelement is always a simpler op for IR
analysis vs. a shuffle, so we should transform to insert when possible.
I don't think there's any codegen concern here - if a target can't insert
a scalar directly to some fixed element in a vector (x86?), then this
should get expanded to the insert+shuffle that we started with.
Differential Revision: https://reviews.llvm.org/D53507
llvm-svn: 345607
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit is a combination of two patches:
* "Fix in getScalarizationOverhead()"
If target returns false in TTI.prefersVectorizedAddressing(), it means the
address registers will not need to be extracted. Therefore, there should
be no operands scalarization overhead for a load instruction.
* "Don't pass the instruction pointer from getMemInstScalarizationCost."
Since VF is always > 1, this is a cost query for an instruction in the
vectorized loop and it should not be evaluated within the scalar
context of the instruction.
Review: Ulrich Weigand, Hal Finkel
https://reviews.llvm.org/D52351
https://reviews.llvm.org/D52417
llvm-svn: 345603
|
| |
|
|
|
|
|
|
|
|
| |
This fixes an assertion when constant folding a GEP when the part of the offset
was in i32 (IndexSize, as per DataLayout) and part in the i64 (PointerSize) in
the newly created test case.
Differential Revision: https://reviews.llvm.org/D52609
llvm-svn: 345585
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It can be profitable to outline single-block cold regions because they
may be large.
Allow outlining single-block regions if they have over some threshold of
non-debug, non-terminator instructions. I chose 3 as the threshold after
experimenting with several internal frameworks.
In practice, reducing the threshold further did not give much
improvement, whereas increasing it resulted in substantial regressions.
Differential Revision: https://reviews.llvm.org/D53824
llvm-svn: 345524
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As K has to dominate I, IIUC I's range metadata must be a subset of
K's. After Eli's recent clarification to the LangRef, loading a value
outside of the range is undefined behavior.
Therefore if I's range contains elements outside of K's range and we would load
one such value, K would cause undefined behavior.
In cases like hoisting/sinking, we still want the most generic range
over all code paths to/from the hoist/sink point. As suggested in the
patches related to D47339, I will refactor the handling of those
scenarios and try to decouple it from this function as follow up, once
we switched to a similar handling of metadata in most of
combineMetadata.
I updated some tests checking mostly the merging of metadata to keep the
metadata of to dominating load. The most interesting one is probably test8 in
test/Transforms/JumpThreading/thread-loads.ll. It contained a comment
about the alias metadata preventing us to eliminate the branch, but it
seem like the actual problem currently is that we merge the ranges of
both loads and cannot eliminate the icmp afterwards. With this patch, we
manage to eliminate the icmp, as the range of the first load excludes 8.
Reviewers: efriedma, nlopes, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D51629
llvm-svn: 345456
|
| |
|
|
| |
llvm-svn: 345454
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
passes with -fsanitize=address"
This reverts commit 8d6af840396f2da2e4ed6aab669214ae25443204 and commit
b78d19c287b6e4a9abc9fb0545de9a3106d38d3d which causes slower build times
by initializing the AddressSanitizer on every function run.
The corresponding revisions are https://reviews.llvm.org/D52814 and
https://reviews.llvm.org/D52739.
llvm-svn: 345433
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The visitICmp analysis function would record compares of pointer types, as size 0. This causes the resulting memcmp() call to have the wrong total size.
Found with "self-build" of clang/LLVM on Windows.
Reviewers: christylee, trentxintong, courbet
Reviewed By: courbet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53536
llvm-svn: 345413
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support of `llvm.experimental.guard` intrinsics to non-trivial
simple loop unswitching. These intrinsics represent implicit control flow which
has pretty much the same semantics as usual conditional branches. The
algorithm of dealing with them is following:
- Consider guards as unswitching candidates;
- If a guard is considered the best candidate, turn it into a branch;
- Apply normal unswitching algorithm on this branch.
The patch has no compile time effect on code that does not contain any guards.
Differential Revision: https://reviews.llvm.org/D53744
Reviewed By: chandlerc
llvm-svn: 345387
|
| |
|
|
|
|
|
|
|
|
|
| |
We should be able to make all relevant checks before we actually start the non-trivial
unswitching, so that we could guarantee that once we have started the transform,
it will always succeed.
Reviewed By: chandlerc
Differential Revision: https://reviews.llvm.org/D53747
llvm-svn: 345375
|
| |
|
|
|
|
|
|
|
| |
Replacing BinaryOperator::isFNeg(...) to avoid regressions when we
separate FNeg from the FSub IR instruction.
Differential Revision: https://reviews.llvm.org/D53650
llvm-svn: 345295
|
| |
|
|
|
|
|
|
| |
When SimplifyCFG changes the PHI node into a select instruction, the debug line records becomes ambiguous. It causes the debugger to display unreachable source lines.
Differential Revision: https://reviews.llvm.org/D53287
llvm-svn: 345250
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: craig.topper, andrew.w.kaylor, efriedma
Reviewed By: craig.topper, andrew.w.kaylor, efriedma
Differential Revision: https://reviews.llvm.org/D52709
llvm-svn: 345248
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Teach LoopRotate to preserve MemorySSA.
Enable tests for correctness, dependency disabled by default.
Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits
Differential Revision: https://reviews.llvm.org/D51718
llvm-svn: 345216
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current splitting algorithm works in three stages:
1) Identify cold blocks, then
2) Use forward/backward propagation to mark hot blocks, then
3) Grow a SESE region of blocks *outside* of the set of hot blocks and
start outlining.
While testing this pass on Apple internal frameworks I noticed that some
kinds of control flow (e.g. loops) are never outlined, even though they
unconditionally lead to / follow cold blocks. I noticed two other issues
related to how cold regions are identified:
- An inconsistency can arise in the internal state of the hotness
propagation stage, as a block may end up in both the ColdBlocks set
and the HotBlocks set. Further inconsistencies can arise as these sets
do not match what's in ProfileSummaryInfo.
- It isn't necessary to limit outlining to single-exit regions.
This patch teaches the splitting algorithm to identify maximal cold
regions and outline them. A maximal cold region is defined as the set of
blocks post-dominated by a cold sink block, or dominated by that sink
block. This approach can successfully outline loops in the cold path. As
a side benefit, it maintains less internal state than the current
approach.
Due to a limitation in CodeExtractor, blocks within the maximal cold
region which aren't dominated by a single entry point (a so-called "max
ancestor") are filtered out.
Results:
- X86 (LNT + -Os + externals): 134KB of TEXT were outlined compared to
47KB pre-patch, or a ~3x improvement. Did not see a performance impact
across two runs.
- AArch64 (LNT + -Os + externals + Apple-internal benchmarks): 149KB
of TEXT were outlined. Ditto re: performance impact.
- Outlining results improve marginally in the internal frameworks I
tested.
Follow-ups:
- Outline more than once per function, outline large single basic
blocks, & try to remove unconditional branches in outlined functions.
Differential Revision: https://reviews.llvm.org/D53627
llvm-svn: 345209
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The current default of appending "_"+entry block label to the new
extracted cold function breaks demangling. Change the deliminator from
"_" to "." to enable demangling. Because the header block label will
be empty for release compile code, use "extracted" after the "." when
the label is empty.
Additionally, add a mechanism for the client to pass in an alternate
suffix applied after the ".", and have the hot cold split pass use
"cold."+Count, where the Count is currently 1 but can be used to
uniquely number multiple cold functions split out from the same function
with D53588.
Reviewers: sebpop, hiraditya
Subscribers: llvm-commits, erik.pilkington
Differential Revision: https://reviews.llvm.org/D53534
llvm-svn: 345178
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original patch was committed here:
rL344609
...and reverted:
rL344612
...because it did not properly check/test data types before calling
ComputeNumSignBits().
The tests that caused bot failures for the previous commit are
over-reaching front-end tests that run the entire -O optimizer
pipeline:
Clang :: CodeGen/builtins-systemz-zvector.c
Clang :: CodeGen/builtins-systemz-zvector2.c
I've added a negative test here to ensure coverage for that case.
The new early exit check also tests the type of the 'B' parameter,
so we don't waste time on matching if either value is unsuitable.
Original commit message:
This is part of solving PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549
The patterns shown here are a special case of something
that we already convert to select. Using ComputeNumSignBits()
catches that case (but not the more complicated motivating
patterns yet).
The backend has hooks/logic to convert back to logic ops
if that's better for the target.
llvm-svn: 345149
|
| |
|
|
|
|
|
|
| |
This work is to avoid regressions when we seperate FNeg from the FSub IR instruction.
Differential Revision: https://reviews.llvm.org/D53205
llvm-svn: 345146
|
| |
|
|
|
|
| |
Investigating fails.
llvm-svn: 345123
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
masked-interleaving is enabled
Enable interleave-groups under fold-tail scenario for Opt for size compilation;
D50480 added support for vectorizing loops of arbitrary trip-count without a
remiander, which in turn makes everything in the loop conditional, including
interleave-groups if any. It therefore invalidated all interleave-groups
because we didn't have support for vectorizing predicated interleaved-groups
at the time. In the meantime, D53011 introduced this support, so we don't
have to invalidate interleave-groups when masked-interleaved support is enabled.
Reviewers: Ayal, hsaito, dcaballe, fhahn
Reviewed By: hsaito
Differential Revision: https://reviews.llvm.org/D53559
llvm-svn: 345115
|
| |
|
|
|
|
|
|
|
|
|
|
| |
LSR reassociates constants as unfolded offsets when the constants fit as
immediate add operands, which currently prevents such constants from being
combined later with loop invariant registers.
This patch modifies GenerateCombinations() to generate a second formula which
includes the unfolded offset in the combined loop-invariant register.
Differential Revision: https://reviews.llvm.org/D51861
llvm-svn: 345114
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
in the same round of SCC update.
In https://reviews.llvm.org/rL309784, inline history is added to prevent
infinite inlining across multiple run of inliner and SCC update, but the
history will only be kept when new SCC is actually generated during SCC update.
We found a case that SCC can be split and then merge into itself in the same
round of SCC update, so the same SCC will be pop out from UR.CWorklist and
then added back immediately, without any new SCC generated, that is why the
existing patch cannot catch the infinite inline case.
What the patch does is even if no new SCC is generated, if only the current
SCC appears in UR.CWorklist again, then keep the inline history.
Differential Revision: https://reviews.llvm.org/D52915
llvm-svn: 345103
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Outlined code is cold by assumption, so it makes sense to optimize it
for minimal code size rather than performance.
After r344869 moved the splitting pass to the end of the IR pipeline,
this does not result in much of a code size reduction. This is probably
because a comparatively small number backend transforms make use of the
MinSize hint.
Running LNT on x86_64, I see that 33/1020 binaries shrink for a total of
919 bytes of TEXT reduction. I didn't measure a significant performance
impact.
Differential Revision: https://reviews.llvm.org/D53518
llvm-svn: 345072
|
| |
|
|
|
|
|
|
|
|
|
|
| |
There's probably some vector-with-undef-element pattern
that shows an improvement, so this is probably not quite
'NFC'.
This is the last step towards removing the fake binop
queries for not/neg. Ie, there are no more uses of those
functions in trunk. Fneg should follow.
llvm-svn: 345050
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
booleans.
Summary:
TryToShrinkGlobalToBoolean, when possible, will split store <value> + load <value> into store <bool> + select <bool ? value : 0>. This preserves DebugLoc during that pass.
Fixes PR37959. The test case here is the simplified .ll for:
```
static int foo;
int bar() {
foo = 5;
return foo;
}
```
Reviewers: dblaikie, gbedwell, aprantl
Reviewed By: dblaikie
Subscribers: mehdi_amini, JDevlieghere, dexonsmith, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D53531
llvm-svn: 345046
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to update this code before introducing an 'fneg' instruction in IR,
so we might as well kill off the integer neg/not queries too.
This is no-functional-change-intended for scalar code and most vector code.
For vectors, we can see that the 'match' API allows for undef elements in
constants, so we optimize those cases better.
Ideally, there would be a test for each code diff, but I don't see evidence
of that for the existing code, so I didn't try very hard to come up with new
vector tests for each code change.
Differential Revision: https://reviews.llvm.org/D53533
llvm-svn: 345042
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Expand arithmetic reduction to include mul/and/or/xor instructions.
This patch just fixes the SLPVectorizer - the effective reduction costs for AVX1+ are still poor (see rL344846) and will need to be improved before SLP sees this as a valid transform - but we can already see the effect on SSE2 tests.
This partially helps PR37731, but doesn't fix it all as it still falls over on the extraction/reduction order for some reason.
Differential Revision: https://reviews.llvm.org/D53473
llvm-svn: 345037
|
| |
|
|
|
|
|
| |
This is another step towards completely removing the fake
binop queries for not/neg/fneg.
llvm-svn: 345036
|
| |
|
|
| |
llvm-svn: 345034
|
| |
|
|
|
|
|
|
|
| |
This pass could probably be modified slightly to allow
vector splat transforms for practically no cost, but
it only works on scalars for now. So the use of the
newer 'match' API should make no functional difference.
llvm-svn: 345030
|
| |
|
|
|
|
| |
out of revision 344883
llvm-svn: 345021
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
At compile-time, create an array of {PC,HumanReadableStackFrameDescription}
for every function that has an instrumented frame, and pass this array
to the run-time at the module-init time.
Similar to how we handle pc-table in SanitizerCoverage.
The run-time is dummy, will add the actual logic in later commits.
Reviewers: morehouse, eugenis
Reviewed By: eugenis
Subscribers: srhines, llvm-commits, kubamracek
Differential Revision: https://reviews.llvm.org/D53227
llvm-svn: 344985
|
| |
|
|
| |
llvm-svn: 344959
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: Emit optimization remark on successful hot cold split.
Reviewers: sebpop, hiraditya
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53512
llvm-svn: 344938
|
| |
|
|
|
|
| |
No functionality change.
llvm-svn: 344893
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
optimizing for size
LV is careful to respect -Os and not to create a scalar epilog in all cases
(runtime tests, trip-counts that require a remainder loop) except for peeling
due to gaps in interleave-groups. This patch fixes that; -Os will now have us
invalidate such interleave-groups and vectorize without an epilog.
The patch also removes a related FIXME comment that is now obsolete, and was
also inaccurate:
"FIXME: return None if loop requiresScalarEpilog(<MaxVF>), or look for a smaller
MaxVF that does not require a scalar epilog."
(requiresScalarEpilog() has nothing to do with VF).
Reviewers: Ayal, hsaito, dcaballe, fhahn
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D53420
llvm-svn: 344883
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In the new+old pass manager, hot cold splitting was schedule too early.
Thanks to Vedant for pointing this out.
Reviewers: sebpop, vsk
Reviewed By: sebpop, vsk
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D53437
llvm-svn: 344869
|
| |
|
|
| |
llvm-svn: 344855
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I couldn't tell from svn history when these checks were added,
but it pre-dates the split of instcombine into its own directory
at rL92459.
The motivation for changing the check is partly shown by the
code in PR34724:
https://bugs.llvm.org/show_bug.cgi?id=34724
There are also existing regression tests for SLPVectorizer with
sequences of extract+insert that are likely assumed to become
shuffles by the vectorizer cost models.
llvm-svn: 344854
|
| |
|
|
| |
llvm-svn: 344852
|
| |
|
|
|
|
| |
Undo stray change introduced by r344725.
llvm-svn: 344814
|