| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implements PGO-driven loop peeling.
The basic idea is that when the average dynamic trip-count of a loop is known,
based on PGO, to be low, we can expect a performance win by peeling off the
first several iterations of that loop.
Unlike unrolling based on a known trip count, or a trip count multiple, this
doesn't save us the conditional check and branch on each iteration. However,
it does allow us to simplify the straight-line code we get (constant-folding,
etc.). This is important given that we know that we will usually only hit this
code, and not the actual loop.
This is currently disabled by default.
Differential Revision: https://reviews.llvm.org/D25963
llvm-svn: 288274
|
| |
|
|
|
|
|
| |
We had a limited version of this for scalar 'and'; this expands
the transform to 'or' and 'xor' and allows vectors types too.
llvm-svn: 288273
|
| |
|
|
| |
llvm-svn: 288270
|
| |
|
|
| |
llvm-svn: 288261
|
| |
|
|
| |
llvm-svn: 288254
|
| |
|
|
|
|
|
|
| |
This reverts commit r288210.
The failure on the stage2 LTO build is back.
llvm-svn: 288226
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[recommiting patches one-by-one to see which breaks the stage2 LTO bot]
Follow-on patches will add more interesting cases.
The goal of this patch-set is to get the GVN messages printed in
opt-viewer from Dhrystone as was presented in my Dev Meeting talk. This
is the optimization view for the function (the last remark in the
function has a bug which is fixed in this series):
http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L430
Differential Revision: https://reviews.llvm.org/D26488
llvm-svn: 288210
|
| |
|
|
|
|
|
|
|
| |
Michel Dänzer reported that r288051, "[StructurizeCFG] Use range-based
for loops", introduced a bug into rebuildSSA, wherein we were iterating
over an instruction's use list while modifying it, without taking care
to do this correctly.
llvm-svn: 288200
|
| |
|
|
|
|
|
|
|
| |
This reverts commit r288046.
Trying to see if the revert fixes a compiler crash during a stage2 LTO
build with a GVN backtrace.
llvm-svn: 288179
|
| |
|
|
|
|
|
|
|
| |
This reverts commit r288047.
Trying to see if the revert fixes a compiler crash during a stage2 LTO
build with a GVN backtrace.
llvm-svn: 288178
|
| |
|
|
|
|
|
|
|
|
|
| |
load-elimination"
This reverts commit r288090.
Trying to see if the revert fixes a compiler crash during a stage2 LTO
build with a GVN backtrace.
llvm-svn: 288177
|
| |
|
|
|
|
| |
The flag was removed by 288154
llvm-svn: 288161
|
| |
|
|
|
|
| |
instruction.
llvm-svn: 288148
|
| |
|
|
|
|
|
|
|
|
|
| |
Currently SLP vectorizer tries to vectorize a binary operation and dies
immediately after unsuccessful the first unsuccessfull attempt. Patch
tries to improve the situation, trying to vectorize all binary
operations of all children nodes in the binop tree.
Differential Revision: https://reviews.llvm.org/D25517
llvm-svn: 288115
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
load-elimination
This includes the intervening store and the load/store that we're trying
to forward from in the optimization remark for the missed load
elimination.
This is hooked up under a new mode in ORE that allows for compile-time
budget for a bit more analysis to print more insightful messages. This
mode is currently enabled for -fsave-optimization-record (-Rpass is
trickier since it is controlled in the front-end).
With this we can now print the red remark in http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L446
Differential Revision: https://reviews.llvm.org/D26490
llvm-svn: 288090
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Preserving lifetime markers isn't as important as allowing promotion,
so just drop the lifetime markers if necessary.
This also fixes an assertion failure where other parts of SROA assumed
that lifetime markers never block promotion.
Fixes https://llvm.org/bugs/show_bug.cgi?id=29139.
Differential Revision: https://reviews.llvm.org/D24854
llvm-svn: 288074
|
| |
|
|
|
|
|
| |
It results in assertions in lib/Analysis/BlockFrequencyInfoImpl.cpp line
670 ("Expected irreducible CFG").
llvm-svn: 288052
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This requires some changes to the opt-diag API. Hal and I have
discussed this at the Dev Meeting and came up with a streaming delimiter
(setExtraArgs) to solve this.
Arguments after this delimiter are only included in the optimization
records and not in the remarks printed in the compiler output. (Note,
how in the test the content of the YAML file changes but the remarks on
the compiler output don't.)
This implements the green GVN message with a bug fix at line
http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L446
The fix is that now we properly include the constant value in the
message: "load of type i32 eliminated in favor of 7"
Differential Revision: https://reviews.llvm.org/D26489
llvm-svn: 288047
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Follow-on patches will add more interesting cases.
The goal of this patch-set is to get the GVN messages printed in
opt-viewer from Dhrystone as was presented in my Dev Meeting talk. This
is the optimization view for the function (the last remark in the
function has a bug which is fixed in this series):
http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L430
Differential Revision: https://reviews.llvm.org/D26488
llvm-svn: 288046
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In r286814, the algorithm for calculating inline costs changed. This
caused more inlining to take place which is especially apparent
in optsize and minsize modes.
As the cost calculation removed a skewed behaviour (we were inconsistent
about the cost of calls) it isn't possible to update the thresholds to
get exactly the same behaviour as before. However, this threshold change
accounts for the very common case where an inline candidate has no
calls within it. In this case, r286814 would inline around 5-6 more (IR)
instructions.
The changes to -Oz have been heavily benchmarked. The "obvious" value
for the inline threshold at -Oz is zero, but due to inaccuracies in the
inline heuristics this can actually cause code size increases due to
not inlining key thunk functions (that then disappear). Experimentally,
5 was the sweet spot for code size over the test-suite.
For -Os, this change removes the outlier results shown up by green dragon
(http://104.154.54.203/db_default/v4/nts/13248).
Fixes D26848.
llvm-svn: 288024
|
| |
|
|
|
|
|
| |
incoming patch for vectorization of jumbled load
Change-Id: Ifb9091bb0f84c1937c2c8bd2fc345734f250d2f9
llvm-svn: 287992
|
| |
|
|
| |
llvm-svn: 287982
|
| |
|
|
| |
llvm-svn: 287980
|
| |
|
|
| |
llvm-svn: 287954
|
| |
|
|
| |
llvm-svn: 287953
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instructions.
Summary:
The iterative algorithm for Loop Unswitching may render some of the branches unreachable in the unswitched loops.
Given the exponential nature of the algorithm, this is quite an overhead.
This patch fixes this problem by selectively unswitching only those branches within a loop that are reachable from the loop header.
Reviewers: Michael Zolothukin, Anna Thomas, Weiming Zhao.
Subscribers: llvm-commits.
Differential Revision: http://reviews.llvm.org/D26299
llvm-svn: 287925
|
| |
|
|
|
|
|
|
| |
AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances
llvm-svn: 287882
|
| |
|
|
| |
llvm-svn: 287801
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
aliasing load
Summary:
The "getVectorizablePrefix" method would give up if it found an aliasing load for a store chain.
In practice, the aliasing load can be treated as a memory barrier and all stores that precede it
are a valid vectorizable prefix.
Issue found by volkan in D26962. Testcase is a pruned version of the one in the original patch.
Reviewers: jlebar, arsenm, tstellarAMD
Subscribers: mzolotukhin, wdng, nhaehnle, anna, volkan, llvm-commits
Differential Revision: https://reviews.llvm.org/D27008
llvm-svn: 287781
|
| |
|
|
|
|
|
|
| |
AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances
llvm-svn: 287762
|
| |
|
|
| |
llvm-svn: 287760
|
| |
|
|
|
|
|
|
|
|
| |
Without this test, you can just remove the code fixing the
switch to the first constant in ResolvedUndefs in and everything
pass. This test, instead, fails with an assertion if the code
is removed. Found while refactoring SCCP to integrate undef in
the solver.
llvm-svn: 287731
|
| |
|
|
|
|
|
|
| |
info. (NFC)
If there is no debug info in the callee, inlining it will not help annotator. This avoids infinite loop as reported in PR/31119.
llvm-svn: 287710
|
| |
|
|
|
|
|
|
|
|
| |
We visit and/or, we try to derive a lattice value for the
instruction even if one of the operands is overdefined.
If the non-overdefined value is still 'unknown' just return and wait
for ResolvedUndefsIn to "plug in" the correct value. This simplifies
the logic a bit. While I'm here add tests for missing cases.
llvm-svn: 287709
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In PR27925:
https://llvm.org/bugs/show_bug.cgi?id=27925
...we proposed adding this fold to eliminate a bitcast. In D20774, there was
some concern about changing the type of a bitwise op as well as creating
bitcasts that might not be free for a target. However, if we're strictly
eliminating an instruction (by limiting this to one-use ops), then we should
be able to do this in InstCombine.
But we're cautiously restricting the transform for now to vector types to
avoid possible backend problems. A transform to make sure the logic op is
legal for the target should be added to reverse this transform and improve
codegen.
Differential Revision: https://reviews.llvm.org/D26641
llvm-svn: 287707
|
| |
|
|
|
|
|
| |
Reviewer: Hal Finkel.
Differential Revision: https://reviews.llvm.org/D26952
llvm-svn: 287700
|
| |
|
|
|
|
|
| |
Reviewer: Hal Finkel.
Differential Revision: https://reviews.llvm.org/D26957
llvm-svn: 287695
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Previously, CGP would unconditionally sink addrspacecast instructions,
even going so far as to sink them into a loop.
Now we check that the cast is "cheap", as defined by TLI.
We introduce a new "is-cheap" function to TLI rather than using
isNopAddrSpaceCast because some GPU platforms want the ability to ask
for non-nop casts to be sunk.
Reviewers: arsenm, tra
Subscribers: jholewinski, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D26923
llvm-svn: 287591
|
| |
|
|
|
|
|
|
|
|
| |
Allow using an instruction other than a mul or phi as the base for
root-finding. For example, the included testcase includes a loop
which requires using a getelementptr as the base for root-finding.
Differential Revision: https://reviews.llvm.org/D26529
llvm-svn: 287588
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a first step towards canonicalization and improved folding/codegen
for integer min/max as discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/106868.html
Here, we're just matching the simplest min/max patterns and adjusting the
icmp predicate while swapping the select operands.
I've included FIXME tests in test/Transforms/InstCombine/select_meta.ll
so it's easier to see how this might be extended (corresponds to the TODO
comment in the code). That's also why I'm using matchSelectPattern()
rather than a simpler check; once the backend is patched, we can just
remove some of the restrictions to allow the obfuscated min/max patterns
in the FIXME tests to be matched.
Differential Revision: https://reviews.llvm.org/D26525
llvm-svn: 287585
|
| |
|
|
|
|
| |
Pipe from stdin to avoid accidentally matching on the path.
llvm-svn: 287583
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
D26704 fixed the non-determinism in codegen by sorting basic blocks before
iteration so as to have a defined iteration order. As a result we need to fix
the names (numbers) of the temporaries in the following unit tests:
test/Transforms/Util/MemorySSA/multi-edges.ll
test/Transforms/Util/MemorySSA/multiple-backedges-hal.ll
Reviewers: dberlin, david2050, mgrang
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26926
llvm-svn: 287575
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes the non-determinism caused due to iterating SmallPtrSet's
which was uncovered due to the experimental "reverse iteration order " patch:
https://reviews.llvm.org/D26718
The following unit tests failed because of the undefined order of iteration.
LLVM :: Transforms/Util/MemorySSA/cyclicphi.ll
LLVM :: Transforms/Util/MemorySSA/many-dom-backedge.ll
LLVM :: Transforms/Util/MemorySSA/many-doms.ll
LLVM :: Transforms/Util/MemorySSA/phi-translation.ll
Reviewers: dberlin, mgrang
Subscribers: dberlin, llvm-commits, david2050
Differential Revision: https://reviews.llvm.org/D26704
llvm-svn: 287563
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Merging an empty case block into the header block of switch could cause
ISel to add COPY instructions in the header of switch, instead of the case
block, if the case block is used as an incoming block of a PHI. This could
potentially increase dynamic instructions, especially when the switch is in a
loop. I added a test case which was reduced from the benchmark I was targetting.
Reviewers: t.p.northover, mcrosier, manmanren, wmi, davidxl
Subscribers: qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D22696
llvm-svn: 287553
|
| |
|
|
|
|
|
|
|
|
| |
Currently LLVM assumes that a pointer addrspacecasted to a different addr space is equivalent to trunc or zext bitwise, which is not true. For example, in amdgcn target, when a null pointer is addrspacecasted from addr space 4 to 0, its value is changed from i64 0 to i32 -1.
This patch teaches LLVM not to assume known bits of addrspacecast instruction to its operand.
Differential Revision: https://reviews.llvm.org/D26803
llvm-svn: 287545
|
| |
|
|
| |
llvm-svn: 287511
|
| |
|
|
|
|
|
|
|
| |
This is a prerequisite patch for D26556:
https://reviews.llvm.org/D26556
...because there was no direct coverage for these folds (which in some cases are adding instructions).
llvm-svn: 287400
|
| |
|
|
|
|
| |
This fixes PR30454.
llvm-svn: 287379
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
insertUniqueBackedgeBlock in lib/Transforms/Utils/LoopSimplify.cpp now
propagates existing llvm.loop metadata to newly the added backedge.
llvm::TryToSimplifyUncondBranchFromEmptyBlock in lib/Transforms/Utils/Local.cpp
now propagates existing llvm.loop metadata to the branch instructions in the
predecessor blocks of the empty block that is removed.
Differential Revision: https://reviews.llvm.org/D26495
llvm-svn: 287341
|
| |
|
|
|
|
|
|
| |
for variable shift with 16-bit elements.
This is a straightforward extension of the existing support for 32/64-bit element types. Just needed to add the additional instrinsics to the switches.
llvm-svn: 287316
|