| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
The SLP vectorizer should propagate IR-level optimization hints/flags (nsw, nuw, exact, fast-math)
when converting scalar instructions into vectors. But this isn't a simple copy - we need to take
the intersection (the logical 'and') of the sets of flags on the scalars.
The solution is further complicated because we can have non-uniform (non-SIMD) vector ops after:
http://reviews.llvm.org/D4015
http://llvm.org/viewvc/llvm-project?view=revision&revision=211339
The vast majority of changed files are existing tests that were not propagating IR flags, but I've
also added a new test file for focused testing of IR flag possibilities.
Differential Revision: http://reviews.llvm.org/D5172
llvm-svn: 217051
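As a rough illustration of the flag intersection described in this message, the standalone C++ sketch below models the flags as a bitmask and ANDs them across the scalars of a bundle. The `IRFlag` enum and `intersectFlags` function are hypothetical names invented for this example; they are not LLVM's API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical bitmask model of the IR flags named in the commit message.
enum IRFlag : unsigned { NSW = 1u, NUW = 2u, Exact = 4u, FastMath = 8u };

// Returns the flags that every scalar in the bundle carries.
// The vector instruction may only keep a flag if all of its scalars had it.
unsigned intersectFlags(const std::vector<unsigned>& scalarFlags) {
    assert(!scalarFlags.empty() && "a bundle needs at least one scalar");
    unsigned common = scalarFlags.front();
    for (unsigned flags : scalarFlags)
        common &= flags;  // logical 'and' of the flag sets
    return common;
}
```

For example, a bundle whose scalars carry {nsw, nuw}, {nsw} and {nsw, exact} would keep only nsw on the vector instruction.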
|
instruction. radar://18144665
llvm-svn: 216946
|
vector instruction.
For a detailed description of the problem see the comment in the test file.
The problematic moveBefore() calls are not required anymore because the new
scheduling algorithm ensures a correct ordering anyway.
llvm-svn: 216656
|
llvm-svn: 216549
|
PR 20642.
llvm-svn: 216475
|
This patch adds support for recognizing division by a uniform power of 2 and modifies the cost table to vectorize such divisions whenever possible.
Updates the cost model for the Loop and SLP vectorizers. The cost table is currently only updated for the X86 backend.
Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971)
llvm-svn: 216371
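A minimal sketch of what "division by uniform power of 2" means, assuming the divisors of a candidate bundle are available as plain integers. The function names are hypothetical and the code does not reflect the actual cost-model implementation; it only shows the check and why unsigned division by 2^k reduces to a shift.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// True when d is a nonzero power of two.
bool isPowerOf2(std::uint64_t d) { return d != 0 && (d & (d - 1)) == 0; }

// True when every lane divides by the same power-of-two constant.
bool isUniformPowerOf2(const std::vector<std::uint64_t>& divisors) {
    if (divisors.empty() || !isPowerOf2(divisors.front()))
        return false;
    for (std::uint64_t d : divisors)
        if (d != divisors.front())
            return false;
    return true;
}

// Unsigned division by 2^k equals a right shift by k.
std::uint64_t udivByPowerOf2(std::uint64_t x, std::uint64_t d) {
    assert(isPowerOf2(d) && "divisor must be a power of two");
    unsigned shift = 0;
    while ((d >> shift) != 1)
        ++shift;
    return x >> shift;
}
```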
|
instructions.
In unreachable blocks it's legal to have instructions like "%x = op %x".
Such instructions are not schedulable. Therefore the SLPVectorizer has to check for
unreachable blocks and ignore them.
Fixes bug 20646.
llvm-svn: 216256
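One way to find the blocks to ignore is a reachability walk from the entry block. The sketch below assumes a CFG given as successor lists with block 0 as the entry; the representation and the `reachableBlocks` name are made up for this example and are not how the SLPVectorizer stores its CFG.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// successors[b] lists the blocks that block b can branch to; block 0 is the entry.
// Returns one flag per block telling whether it is reachable from the entry.
std::vector<bool> reachableBlocks(
        const std::vector<std::vector<std::size_t>>& successors) {
    std::vector<bool> reachable(successors.size(), false);
    if (successors.empty())
        return reachable;
    std::vector<std::size_t> worklist{0};
    reachable[0] = true;
    while (!worklist.empty()) {
        std::size_t block = worklist.back();
        worklist.pop_back();
        for (std::size_t succ : successors[block]) {
            assert(succ < successors.size() && "successor index out of range");
            if (!reachable[succ]) {
                reachable[succ] = true;
                worklist.push_back(succ);
            }
        }
    }
    return reachable;
}
```

A pass following this idea would skip every block whose flag is false before trying to schedule its instructions.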
|
We were using the pointer type which is incorrect.
llvm-svn: 215162
|
have a cost.
Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account.
llvm-svn: 214859
|
llvm-svn: 214638
|
llvm-svn: 214494
|
llvm-svn: 214338
|
This patch adds support to recognize patterns such as fadd,fsub,fadd,fsub.../add,sub,add,sub... and
vectorizes them as vector shuffles if they are profitable.
These patterns of vector shuffle can later be converted to instructions such as addsubpd etc on X86.
Thanks to Arnold and Hal for the reviews. http://reviews.llvm.org/D4015
llvm-svn: 211339
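The alternating pattern can be pictured lane by lane: compute a full vector add and a full vector sub, then pick lanes from the two results alternately, which is the shuffle this message refers to. The sketch below is a scalar model of that idea using the fadd,fsub,fadd,fsub order from the message; `addSubBlend` is a hypothetical name and the code is not the vectorizer's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Even lanes take a + b, odd lanes take a - b.
std::vector<double> addSubBlend(const std::vector<double>& a,
                                const std::vector<double>& b) {
    assert(a.size() == b.size() && "both operands need the same lane count");
    std::vector<double> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        double sum = a[i] + b[i];   // lane i of the full vector add
        double diff = a[i] - b[i];  // lane i of the full vector sub
        out[i] = (i % 2 == 0) ? sum : diff;  // shuffle mask alternates sources
    }
    return out;
}
```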
|
We can just split targets_to_build in one place and make it immutable.
llvm-svn: 210496
|
llvm-svn: 210343
|
The use cases look like the following:
x->a = y->a + 10
x->b = y->b + 12
llvm-svn: 210342
|
If we have common uses on separate paths in the tree, process the one with greater common depth first.
This makes sure that we do not assume we need to extract a load when it is actually going to be part of a vectorized tree.
Review: http://reviews.llvm.org/D3800
llvm-svn: 210310
|
Vectorizer.
This patch adds support for vectorizing intrinsics such as powi, cttz and ctlz in the Vectorizer. These intrinsics differ from other
intrinsics in that the second argument to these functions must be the same in order to vectorize them, and it should be represented as a scalar.
Review: http://reviews.llvm.org/D3851#inline-32769 and http://reviews.llvm.org/D3937#inline-32857
llvm-svn: 209873
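The constraint on the second argument can be sketched as a simple check over a candidate bundle of calls. The `IntrinsicCall` struct and `canWidenWithScalarArg` function below are hypothetical stand-ins chosen for illustration, not the Vectorizer's data structures.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a call: intrinsic name plus its second argument.
struct IntrinsicCall {
    std::string name;
    int secondArg;
};

// A bundle qualifies only if all calls target the same intrinsic and share
// one second argument, which then stays a single scalar in the vector call.
bool canWidenWithScalarArg(const std::vector<IntrinsicCall>& bundle) {
    assert(!bundle.empty() && "a bundle needs at least one call");
    for (const IntrinsicCall& call : bundle) {
        if (call.name != bundle.front().name ||
            call.secondArg != bundle.front().secondArg)
            return false;
    }
    return true;
}
```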
|
This commit starts with a "git mv ARM64 AArch64" and continues out
from there, renaming the C++ classes, intrinsics, and other
target-local objects for consistency.
"ARM64" test directories are also moved, and tests that began their
life in ARM64 use an arm64 triple; those from AArch64 use an aarch64
triple. Both should be equivalent though.
This finishes the AArch64 merge, and everyone should feel free to
continue committing as normal now.
llvm-svn: 209577
|
sext{C1,+,C2} --> sext(C1) + sext{0,+,C2} transformation in Scalar
Evolution.
That helps SLP-vectorizer to recognize consecutive loads/stores.
<rdar://problem/14860614>
llvm-svn: 209568
|
llvm-svn: 209538
|
The cost model conservatively assumes that it will always get scalarized and
that's about as good as we can get with the generic TTI; reasoning whether a
shuffle with an efficient lowering is available is hard. We can override that
conservative estimate for some targets in the future.
llvm-svn: 209125
|
unreachable code.
There is no total ordering if the CFG is disconnected. We don't care if we
catch all CSE opportunities in dead code either, so just ignore them in
the assert.
PR19646
llvm-svn: 208461
|
<rdar://problem/16812145>
llvm-svn: 207983
|
We can't assume a vectorized tree is rooted in an instruction. The IRBuilder
could have constant folded it. When we rebuild the build_vector (the series of
InsertElement instructions) use the last original InsertElement instruction. The
vectorized tree root is guaranteed to be before it.
Also, we can't assume that the n-th InsertElement inserts the n-th element into
a vector.
This reverts r207746 which reverted the revert of the revert of r205018 or so.
Fixes the test case in PR19621.
llvm-svn: 207939
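The point about InsertElement ordering can be shown with a small model: each insert carries its own lane index, and that index, not the position of the insert in the sequence, decides where the value lands. The `InsertElement` struct and `buildVector` function here are hypothetical names for this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one InsertElement: target lane and inserted value.
struct InsertElement {
    std::size_t index;
    int value;
};

// Rebuilds the vector by honoring each insert's own index operand.
std::vector<int> buildVector(const std::vector<InsertElement>& inserts,
                             std::size_t width) {
    std::vector<int> lanes(width, 0);
    for (const InsertElement& ins : inserts) {
        assert(ins.index < width && "lane index out of range");
        lanes[ins.index] = ins.value;  // use the index, not the position
    }
    return lanes;
}
```

In the usage below the first insert writes lane 1 and the second writes lane 0, so assuming "n-th insert fills lane n" would swap the two values.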
|
This patch adds support to recognize and vectorize intrinsic math functions in SLPVectorizer.
Review: http://reviews.llvm.org/D3560 and http://reviews.llvm.org/D3559
llvm-svn: 207901
|
=[
Turns out that this was the root cause of PR19621. We found a crasher
only recently (likely due to improvements elsewhere in the SLP
vectorizer) but the reduced test case failed all the way back to here.
I've confirmed that reverting this patch both fixes the reduced test
case in PR19621 and the actual source file that led to it, so it seems
to really be rooted here. I've replied to the commit thread with
discussion of my (feeble) attempts to debug this. Didn't make it very
far, so reverting now that we have a good test case so that things can
get back to healthy while the debugging carries on.
llvm-svn: 207746
|
radar://16641956
llvm-svn: 207572
|
reschedule them"
This commit reapplies 205018. After 205855 we should correctly vectorize
intrinsics.
llvm-svn: 205965
|
The vectorizer only knows how to vectorize intrinsics by widening all operands by
the same factor.
Patch by Tyler Nowicki!
llvm-svn: 205855
|
Some Intrinsics are overloaded to the extent that return type equality (all
that's been checked up to now) does not guarantee that the arguments are the
same. In these cases SLP vectorizer should not recurse into the operands, which
can be achieved by comparing them as "Function *" rather than simply the ID.
llvm-svn: 205424
|
reschedule them"
This reverts commit r205018.
Conflicts:
lib/Transforms/Vectorize/SLPVectorizer.cpp
test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll
This is breaking libclc build.
llvm-svn: 205260
|
Extract element instructions that will be removed when vectorizing lower the
cost.
Patch by Arch D. Robison!
llvm-svn: 205020
|
Patch by Arch D. Robison!
llvm-svn: 205018
|
This reverts commit 86cb795388643710dab34941ddcb5a9470ac39d8.
The problems previously found have been resolved through other CLs.
llvm-svn: 203707
|
llvm-svn: 202924
|
Vectorize sequential stores of a broadcasted value.
5% on eon.
radar://16124699
llvm-svn: 202067
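A sketch of the pattern named in this message, assuming stores are described by an element offset and a stored value: a run qualifies when the offsets are consecutive and every store writes the same value, so the group could become one vector store of a splat. The `Store` struct and `isBroadcastStoreRun` function are hypothetical and only model the recognition step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a scalar store: element offset and stored value.
struct Store {
    std::size_t offset;
    int value;
};

// True when the stores hit consecutive offsets and all write one value.
bool isBroadcastStoreRun(const std::vector<Store>& stores) {
    assert(!stores.empty() && "expects at least one store");
    for (std::size_t i = 1; i < stores.size(); ++i) {
        bool consecutive = stores[i].offset == stores[0].offset + i;
        bool sameValue = stores[i].value == stores[0].value;
        if (!consecutive || !sameValue)
            return false;
    }
    return stores.size() > 1;  // a single store is not a run
}
```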
|
vectorizeTree()). radar://16064178
llvm-svn: 201501
|
This reverts commit r200576. It broke 32-bit self-host builds by
vectorizing two calls to @llvm.bswap.i64, which we then fail to expand.
llvm-svn: 200602
|
transform accordingly. Based on similar code from Loop vectorization.
Subsequent commits will include vectorization of function calls to
vector intrinsics and form function calls to vector library calls.
Patch by Raul Silvera! (Much delayed due to my not running dcommit)
llvm-svn: 200576
|
llvm-svn: 199016
|
We were creating external uses for scalar values in MustGather entries that also
had a ScalarToTreeEntry (they also are present in a vectorized tuple). This
meant we would keep a value 'alive' both as a scalar and vectorized, causing havoc.
This is not necessary because when we create a MustGather vector we explicitly
create external uses entries for the insertelement instructions of the
MustGather vector elements.
Fixes PR18129.
radar://15582184
llvm-svn: 196508
|
clang enables vectorization at optimization levels > 1 and size level < 2. opt
should behave similarly.
Loop vectorization and SLP vectorization can be disabled with the flags
-disable-(loop/slp)-vectorization.
llvm-svn: 196294
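The rule stated in this message can be written as a one-line predicate. The helper below is a hypothetical illustration, not code from opt; the `disabledByFlag` parameter stands in for the -disable-(loop/slp)-vectorization flags, and the level ranges in the assertion are assumptions of the sketch.

```cpp
#include <cassert>

// Vectorization is on for optimization level > 1 and size level < 2,
// unless it was switched off explicitly.
bool vectorizationEnabled(unsigned optLevel, unsigned sizeLevel,
                          bool disabledByFlag = false) {
    // Assumed ranges for this sketch: optimization 0..3, size 0..2.
    assert(optLevel <= 3 && sizeLevel <= 2);
    return !disabledByFlag && optLevel > 1 && sizeLevel < 2;
}
```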
|
some of these instructions
may be removed and optimized in future iterations. Instead we save a list of basic blocks that we need to CSE.
llvm-svn: 195791
|
we generate PHI nodes with multiple entries from the same basic block but
with different values. Enabling CSE on ExtractElement instructions makes sure
that all of the RAUWed instructions are the same.
llvm-svn: 195773
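The effect of CSE on ExtractElement instructions can be modeled by keying each extract on its (vector, lane) pair and mapping duplicates to the first occurrence. This is a simplified sketch with hypothetical names (`Extract`, `cseExtracts`); in particular it ignores dominance, which a real CSE over basic blocks has to respect.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical model of an ExtractElement: source vector id and lane.
struct Extract {
    int vectorId;
    unsigned lane;
};

// For each extract, returns the index of the first identical extract.
std::vector<std::size_t> cseExtracts(const std::vector<Extract>& extracts) {
    std::map<std::pair<int, unsigned>, std::size_t> firstSeen;
    std::vector<std::size_t> canonical(extracts.size());
    for (std::size_t i = 0; i < extracts.size(); ++i) {
        auto key = std::make_pair(extracts[i].vectorId, extracts[i].lane);
        auto inserted = firstSeen.emplace(key, i);  // keeps the first occurrence
        canonical[i] = inserted.first->second;
        assert(canonical[i] <= i && "canonical extract must come first");
    }
    return canonical;
}
```

After this mapping, two PHI entries that extract the same lane of the same vector refer to one value instead of two distinct instructions.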
|
llvm-svn: 195691
|
We are going to drop debug info without a version number or with a different
version number, to make sure we don't crash when we see bitcode files with
different debug info metadata format.
llvm-svn: 195504
|
multiple external uses.
llvm-svn: 195406
|
require ARM in targets.
llvm-svn: 193580
|
Updated a test case that assumed that <2 x double> would vectorize to use
<4 x float>.
radar://15338229
llvm-svn: 193574
|