When unrolling is disabled in the pass manager, the loop vectorizer should also
not unroll loops. This will allow the -fno-unroll-loops option in Clang to
behave as expected (even for vectorizable loops). The loop vectorizer's
-force-vector-unroll option will (continue to) override the pass-manager
setting (including -force-vector-unroll=0 to force use of the internal
auto-selection logic).
To test this, I added a flag to opt (-disable-loop-unrolling) that
force-disables unrolling through opt (the analog of -fno-unroll-loops in
Clang). This also fixes a small bug in opt where the loop vectorizer was
enabled only after the pass manager had populated the queue of passes (the
global_alias.ll test needed a slight update to its RUN line as a result of
this fix).
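As an illustration of how the flags compose, a hypothetical lit-style test
(the loop, function name, and pass ordering are assumptions; only the flag
names come from this commit):

; RUN: opt < %s -O3 -S
; RUN: opt < %s -O3 -disable-loop-unrolling -S
; With -disable-loop-unrolling the loop below should still be vectorized,
; but with an unroll (interleave) factor of 1; -force-vector-unroll=N
; would override either setting.
define void @inc(i32* nocapture %a, i64 %n) {
entry:
  br label %loop
loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %p = getelementptr inbounds i32* %a, i64 %i
  %v = load i32* %p
  %v1 = add i32 %v, 1
  store i32 %v1, i32* %p
  %i.next = add i64 %i, 1
  %done = icmp eq i64 %i.next, %n
  br i1 %done, label %exit, label %loop
exit:
  ret void
}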
llvm-svn: 189499
- Instead of setting the suffixes in a bunch of places, just set one master
list in the top-level config. We now only modify the suffix list in a few
suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).
- Aside from removing the need for a bunch of lit.local.cfg files, this enables
4 tests that were inadvertently being skipped (one in
Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and
CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been
XFAILED).
- This commit also fixes a bunch of config files to use config.root instead of
older copy-pasted code.
llvm-svn: 188513
functionality change.
This update was done with the following bash script:
find test/Transforms -name "*.ll" | \
while read NAME; do
  echo "$NAME"
  if ! grep -q "^; *RUN: *llc" $NAME; then
    TEMP=`mktemp -t temp`
    cp $NAME $TEMP
    sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
    while read FUNC; do
      sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3@$FUNC(/g" $TEMP
    done
    mv $TEMP $NAME
  fi
done
llvm-svn: 186268
radar://14351991
llvm-svn: 186189
- llvm.loop.parallel metadata has been renamed to llvm.loop to be more generic,
making it the root of additional loop metadata (see the sketch after this list)
- Loop::isAnnotatedParallel now looks for llvm.loop and associated
llvm.mem.parallel_loop_access
- document llvm.loop and update llvm.mem.parallel_loop_access
- add support for llvm.vectorizer.width and llvm.vectorizer.unroll
- document llvm.vectorizer.* metadata
- add utility class LoopVectorizerHints for getting/setting loop metadata
- use llvm.vectorizer.width=1 to indicate already vectorized instead of
already_vectorized
- update existing tests that used llvm.loop.parallel and
llvm.vectorizer.already_vectorized
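A sketch of the resulting shape, in the typed-pointer IR and metadata syntax
of this period; the loop, the hint values, and the function name are
illustrative, not taken from the commit:

define void @scale(float* nocapture %a, i64 %n) {
entry:
  br label %for.body
for.body:
  %i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
  %p = getelementptr inbounds float* %a, i64 %i
  %v = load float* %p, !llvm.mem.parallel_loop_access !0
  %v2 = fmul float %v, 2.000000e+00
  store float %v2, float* %p, !llvm.mem.parallel_loop_access !0
  %i.next = add i64 %i, 1
  %done = icmp eq i64 %i.next, %n
  br i1 %done, label %for.end, label %for.body, !llvm.loop !0
for.end:
  ret void
}

; !0 is the self-referential loop identifier; the hints hang off of it.
; A width of 1 is what now marks a loop as already vectorized.
!0 = metadata !{metadata !0, metadata !1, metadata !2}
!1 = metadata !{metadata !"llvm.vectorizer.width", i32 4}
!2 = metadata !{metadata !"llvm.vectorizer.unroll", i32 1}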
Reviewed by: Nadav Rotem
llvm-svn: 182802
This will make it easier to turn on struct-path aware TBAA since the metadata
format will change.
llvm-svn: 180796
This patch disables memory-instruction vectorization for types that need
padding bytes; for example, x86_fp80 has a store size of 10 bytes with 6 bytes
of padding on x86_64 darwin. Because load/store vectorization is performed via
a bitcast to a packed vector, whose memory layout is incompatible because it
lacks the padding bytes, the present vectorizer produces inconsistent results
for memory instructions of those types.
This patch checks that the AllocSize of the scalar type equals the allocated
size of each vector element, ensuring that there are no padding bytes and that
the array can be read/written using vector operations.
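For illustration, a pair of consecutive x86_fp80 accesses of the kind the
check now keeps scalar (the function and value names are hypothetical):

define void @copy2(x86_fp80* nocapture %dst, x86_fp80* nocapture %src) {
entry:
  ; Rewriting these as a bitcast to a <2 x x86_fp80> packed vector would
  ; drop the 6 padding bytes per element and change the memory layout.
  %s1 = getelementptr inbounds x86_fp80* %src, i64 1
  %d1 = getelementptr inbounds x86_fp80* %dst, i64 1
  %v0 = load x86_fp80* %src
  %v1 = load x86_fp80* %s1
  store x86_fp80 %v0, x86_fp80* %dst
  store x86_fp80 %v1, x86_fp80* %d1
  ret void
}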
Patch by Daisuke Takahashi!
Fixes PR15758.
llvm-svn: 180196
llvm-svn: 180195
Made the uniform write test's checks a bit stricter.
llvm-svn: 180119
even if erroneously annotated with the parallel loop metadata.
Fixes Bug 15794:
"Loop Vectorizer: Crashes with the use of llvm.loop.parallel metadata"
llvm-svn: 180081
Pass down the fact that an operand is going to be a vector of constants.
This should bring back the performance of MultiSource/Benchmarks/PAQ8p/paq8p
on x86. It had degraded to scalar performance due to my previous shift cost
change that made all shifts expensive on x86.
radar://13576547
llvm-svn: 178809
instruction counts.
llvm-svn: 178459
Also remove some unneeded function attributes.
llvm-svn: 177114
llvm-svn: 177102
We generate a select with a vectorized condition argument when the condition is
NOT loop invariant. Not the other way around.
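In IR terms, a sketch of the two cases after vectorization (the values and
function name are invented for illustration):

define <4 x i32> @pick(<4 x i32> %va, <4 x i32> %vb, <4 x i32> %vx,
                       <4 x i32> %vy, i1 %inv) {
entry:
  ; Condition computed inside the loop: it is vectorized too, giving a
  ; per-lane select.
  %vcond = icmp sgt <4 x i32> %va, %vb
  %s1 = select <4 x i1> %vcond, <4 x i32> %vx, <4 x i32> %vy
  ; Loop-invariant condition: a single scalar i1 selects whole vectors.
  %s2 = select i1 %inv, <4 x i32> %s1, <4 x i32> %vy
  ret <4 x i32> %s2
}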
llvm-svn: 177098
llvm-svn: 176702
domination.
Fixes PR15344.
llvm-svn: 176701
This matters, for example, in the following matrix multiply:
int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
  int i, j, k, val;
  for (i=0; i<rows; i++) {
    for (j=0; j<cols; j++) {
      val = 0;
      for (k=0; k<cols; k++) {
        val += m1[i][k] * m2[k][j];
      }
      m3[i][j] = val;
    }
  }
  return(m3);
}
Taken from the test-suite benchmark Shootout.
We estimate the cost of the multiply to be 2 while we actually generate 9
instructions for it, and we end up quite a bit slower than the scalar version
(48% on my machine).
Also, properly differentiate between AVX1 and AVX2. On AVX1 we still split the
vector into two 128-bit halves and handle the subvector multiplies as above,
with 9 instructions. Only on AVX2 do we have a cost of 9 for v4i64.
I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an
add instead of a mul because with a mul we now no longer vectorize. I did
verify that the mul would indeed be more expensive when vectorized, with 3
kernels:
for (i ...)
  r += a[i] * 3;
for (i ...)
  m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
and a matrix multiply.
In each case the vectorized version was considerably slower.
radar://13304919
llvm-svn: 176403
metadata, sorry.
llvm-svn: 175311
llvm-svn: 174380
requires +Asserts.
llvm-svn: 174379
In the loop vectorizer cost model, we used to ignore stores/loads of a pointer
type when computing the widest type within a loop. This meant that if a loop
had only stores/loads of pointers we would return a widest type of 8 bits
(instead of 32 or 64 bits) and therefore a vector factor that was too big.
Now, if we see a consecutive store/load of pointers, we use the size of a
pointer (from the data layout).
This problem occurred in SingleSource/Benchmarks/Shootout-C++/hash.cpp (the
reduced test case is the first test in vector_ptr_load_store.ll).
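A reduced sketch of the pattern (illustrative, not the actual test file):

define void @copy_ptrs(i8** nocapture %dst, i8** nocapture %src, i64 %n) {
entry:
  br label %loop
loop:
  ; The only memory accesses are pointer-typed; with a 64-bit data layout
  ; the widest type should be 64 bits, not the old 8-bit fallback.
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %s = getelementptr inbounds i8** %src, i64 %i
  %d = getelementptr inbounds i8** %dst, i64 %i
  %p = load i8** %s
  store i8* %p, i8** %d
  %i.next = add i64 %i, 1
  %done = icmp eq i64 %i.next, %n
  br i1 %done, label %exit, label %loop
exit:
  ret void
}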
radar://13139343
llvm-svn: 174377
breakage with builds without X86 support.
llvm-svn: 174052
We ignore the CPU frontend and focus on pipeline utilization, because we don't
have a good way to estimate the loop body size at the IR level.
llvm-svn: 172964
llvm-svn: 172963
Should fix the arm buildbot (which only builds the arm target).
llvm-svn: 172611
instruction to determine the max vectorization factor.
llvm-svn: 172010
vectorizer does it now.
llvm-svn: 171930
Cost Model support on ARM.
llvm-svn: 171928
at once. This is a good thing, except for small loops, where the scalar
post-loop (which runs slower) can take more time to execute than the rest of
the loop. This patch disables widening of loops with a small static trip
count.
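For instance, a loop like this hypothetical one has a static trip count of 6;
widening it by 4 would leave a third of the work to the slower scalar
post-loop:

define void @tiny(i32* nocapture %a) {
entry:
  br label %loop
loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %p = getelementptr inbounds i32* %a, i64 %i
  %v = load i32* %p
  %v3 = mul i32 %v, 3
  store i32 %v3, i32* %p
  %i.next = add i64 %i, 1
  %done = icmp eq i64 %i.next, 6
  br i1 %done, label %exit, label %loop
exit:
  ret void
}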
llvm-svn: 171798
1. Add code to estimate register pressure.
2. Add code to select the unroll factor based on register pressure.
3. Add bits to TargetTransformInfo to provide the number of registers.
llvm-svn: 171469
llvm-svn: 171043
the StoreInst operands.
PR14705.
llvm-svn: 171023
the cost of arithmetic functions. We now assume that the cost of arithmetic
operations marked as Legal or Promote is low, while ops marked as Custom are
more expensive.
llvm-svn: 171002
them more expensive.
llvm-svn: 170995
- An MVT can become an EVT when being split (e.g. v2i8 -> v1i8, the latter doesn't exist)
- Return the scalar value when an MVT is scalarized (v1i64 -> i64)
Fixes PR14639ff.
llvm-svn: 170546
induction variables costs the same as scalar trunc.
llvm-svn: 170051
llvm-svn: 167480
llvm-svn: 167412
llvm-svn: 167395
llvm-svn: 167174
bitcasts cost zero.
llvm-svn: 167170
This is important for loops in the LAPACK test-suite.
These loops start at 1 because they are auto-converted from Fortran.
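A sketch of the shape such loops take (illustrative; note the induction
variable starts at 1 and the bound is inclusive):

define void @from_one(double* nocapture %a, i64 %n) {
entry:
  br label %loop
loop:
  %i = phi i64 [ 1, %entry ], [ %i.next, %loop ]
  %p = getelementptr inbounds double* %a, i64 %i
  %v = load double* %p
  %v2 = fadd double %v, 1.000000e+00
  store double %v2, double* %p
  %i.next = add i64 %i, 1
  %done = icmp sgt i64 %i.next, %n
  br i1 %done, label %exit, label %loop
exit:
  ret void
}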
llvm-svn: 167084
return the type of the split result.
2. Change the maximum vectorization width from 4 to 8.
3. A test for both.
llvm-svn: 166864
Add getCostXXX calls for different families of opcodes, such as casts, arithmetic, cmp, etc.
Port the LoopVectorizer to the new API.
The LoopVectorizer now finds instructions which will remain uniform after vectorization. It uses this information when calculating the cost of these instructions.
llvm-svn: 166836
that only run if the target is present.
llvm-svn: 166796