| Commit message | Author | Age | Files | Lines |
llvm-svn: 175964
llvm-svn: 175898
Store the load/store instructions together with their values and inspect
them using Alias Analysis to make sure they don't alias, since the GEP
pointer operand doesn't take the offset into account.
To avoid adding any extra cost to loads and stores that don't overlap on
global values, AA is *only* run if all of the previous attempts failed.
Use the biggest vector register size as the stride for the vectorization
access, as we're being conservative and the cost model (which calculates
the real vectorization factor) is only run after the legalization phase.
We might rethink this relationship in the future, but for now, I'd rather
be safe than sorry.
llvm-svn: 175818
metadata, sorry.
llvm-svn: 175311
llvm-svn: 174380
requires +Asserts.
llvm-svn: 174379
In the loop vectorizer cost model, we used to ignore stores/loads of a pointer
type when computing the widest type within a loop. This meant that if we had
only stores/loads of pointers in a loop we would return a widest type of 8 bits
(instead of 32 or 64 bits) and therefore a vector factor that was too big.
Now, if we see a consecutive store/load of pointers we use the size of a pointer
(from the data layout).
This problem occurred in SingleSource/Benchmarks/Shootout-C++/hash.cpp (the
reduced test case is the first test in vector_ptr_load_store.ll).
radar://13139343
llvm-svn: 174377
breakage with builds without X86-support.
llvm-svn: 174052
Changing ARMBaseTargetMachine to return ARMTargetLowering instead of
the generic one (similar to the x86 code).
Tests showing which instructions were added to cast when necessary,
or cost zero when not. Downcasts to 16 bits are not lowered in NEON,
so those costs are not there yet.
llvm-svn: 173849
to a command line switch.
llvm-svn: 173837
contain pointers that count backwards.
For example, this is the hot loop in BZIP:
  do {
    m = *--p;
    *p = ( ... );
  } while (--n);
llvm-svn: 173219
We ignore the CPU frontend and focus on pipeline utilization, because we
don't have a good way to estimate the loop body size at the IR level.
llvm-svn: 172964
llvm-svn: 172963
This separates the check for "too few elements to run the vector loop" from the
"memory overlap" check, giving much nicer code and allowing us to skip the memory
checks when we're not going to execute the vector code anyway. We still leave
the decision of whether to emit the memory checks as branches or setccs, but it
seems to be doing a good job. If ugly code pops up we may want to emit them as
separate blocks too. Small speedup on MultiSource/Benchmarks/MallocBench/espresso.
Most of this is legwork to allow multiple bypass blocks while updating PHIs,
dominators and loop info.
llvm-svn: 172902
Should fix the ARM buildbot (which only builds the ARM target).
llvm-svn: 172611
and i16).
llvm-svn: 172348
the target if it supports the different CAST types. We didn't do this
on X86 because of the different register sizes and types, but on ARM
this makes sense.
llvm-svn: 172245
order to select the max vectorization factor.
We don't have a detailed analysis on which values are vectorized and which stay scalars in the vectorized loop so we use
another method. We look at reduction variables, loads and stores, which are the only ways to get information in and out
of loop iterations. If the data types are extended and truncated then the cost model will catch the cost of the vector
zext/sext/trunc operations.
llvm-svn: 172178
A BinaryOperator can be folded to an Undef, and we don't want to set NSW
flags on undef values.
PR14878
llvm-svn: 172079
instruction to determine the max vectorization factor.
llvm-svn: 172010
llvm-svn: 171931
vectorizer does it now.
llvm-svn: 171930
Cost Model support on ARM.
llvm-svn: 171928
llvm-svn: 171812
at once. This is a good thing, except for small loops, where the scalar
post-loop (which runs slower) can take more time to execute than the rest of
the loop. This patch disables widening of loops with a small static trip count.
llvm-svn: 171798
llvm-svn: 171584
as long as the reduction chain is used in the LHS.
PR14803.
llvm-svn: 171583
This should fix clang-native-arm-cortex-a9. Thanks Renato.
llvm-svn: 171582
Since subtraction does not commute, the loop vectorizer incorrectly vectorizes
reductions such as x = A[i] - x.
Disabling for now.
llvm-svn: 171537
1. Add code to estimate register pressure.
2. Add code to select the unroll factor based on register pressure.
3. Add bits to TargetTransformInfo to provide the number of registers.
llvm-svn: 171469
llvm-svn: 171446
llvm-svn: 171429
LCSSA PHIs may have undef values. The vectorizer updates values that are used by outside users such as PHIs.
The bug happened because undefs are not loop values. This patch handles these PHIs.
PR14725
llvm-svn: 171251
even if the read objects are unidentified.
PR14719.
llvm-svn: 171124
the iteration step is -1.
llvm-svn: 171114
llvm-svn: 171076
llvm-svn: 171043
the StoreInst operands.
PR14705.
llvm-svn: 171023
The bug was in the code that detects PHIs in if-then-else block sequence.
PR14701.
llvm-svn: 171008
the cost of arithmetic functions. We now assume that the cost of arithmetic
operations that are marked as Legal or Promote is low, while the cost of ops
that are marked as Custom is higher.
llvm-svn: 171002
them more expensive.
llvm-svn: 170995
memory bound checks. Before the fix we were able to vectorize this loop from
the Livermore Loops benchmark:
for ( k=1 ; k<n ; k++ )
x[k] = x[k-1] + y[k];
llvm-svn: 170811
Before if-conversion we could check whether a value was loop-invariant by
checking that it was declared inside the basic block. Now that loops have
multiple blocks this check is incorrect.
This fixes External/SPEC/CINT95/099_go/099_go.
llvm-svn: 170756
- An MVT can become an EVT when being split (e.g. v2i8 -> v1i8, the latter doesn't exist)
- Return the scalar value when an MVT is scalarized (v1i64 -> i64)
Fixes PR14639ff.
llvm-svn: 170546
instead of scalar operations.
For example on x86 with SSE4.2 a <8 x i8> add reduction becomes
movdqa %xmm0, %xmm1
movhlps %xmm1, %xmm1 ## xmm1 = xmm1[1,1]
paddw %xmm0, %xmm1
pshufd $1, %xmm1, %xmm0 ## xmm0 = xmm1[1,0,0,0]
paddw %xmm1, %xmm0
phaddw %xmm0, %xmm0
pextrb $0, %xmm0, %edx
instead of
pextrb $2, %xmm0, %esi
pextrb $0, %xmm0, %edx
addb %sil, %dl
pextrb $4, %xmm0, %esi
addb %dl, %sil
pextrb $6, %xmm0, %edx
addb %sil, %dl
pextrb $8, %xmm0, %esi
addb %dl, %sil
pextrb $10, %xmm0, %edi
pextrb $14, %xmm0, %edx
addb %sil, %dil
pextrb $12, %xmm0, %esi
addb %dil, %sil
addb %sil, %dl
llvm-svn: 170439
induction variables costs the same as scalar trunc.
llvm-svn: 170051
increase the function size.
llvm-svn: 170004
in if-conversion.
llvm-svn: 169916
truncation is now done on scalars.
llvm-svn: 169904
llvm-svn: 169813