| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
| |
non-consecutive (jumbled) way.
The jumbled scalar loads will be sorted while building the tree and these accesses will be marked to generate shufflevector after the vectorized load with proper mask.
Reviewers: hfinkel, mssimpso, mkuper
Differential Revision: https://reviews.llvm.org/D26905
Change-Id: I9c0c8e6f91a00076a7ee1465440a3f6ae092f7ad
llvm-svn: 293386
|
|
|
|
|
|
| |
it vectorizable.
llvm-svn: 293162
|
|
|
|
| |
llvm-svn: 293153
|
|
|
|
| |
llvm-svn: 293076
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instructions.
If number of instructions in horizontal reduction list is not power of 2
then only PowerOf2Floor(NumberOfInstructions) last elements are actually
vectorized, other instructions remain scalar. Patch tries to vectorize
the remaining elements either.
Differential Revision: https://reviews.llvm.org/D28959
llvm-svn: 293042
|
|
|
|
|
|
| |
not reconstructed.
llvm-svn: 292911
|
|
|
|
| |
llvm-svn: 292821
|
|
|
|
| |
llvm-svn: 292783
|
|
|
|
| |
llvm-svn: 292631
|
|
|
|
|
|
| |
instructions.
llvm-svn: 292626
|
|
|
|
|
| |
Change-Id: I905ce08a02c76a6896dcfd9629547417c99adc4a
llvm-svn: 292581
|
|
|
|
|
|
|
| |
Add a test for PR30787: Failure to beneficially vectorize 'copyable'
elements in integer binary ops.
llvm-svn: 292416
|
|
|
|
|
|
|
|
|
|
|
|
| |
The removed assert seems bogus - it's perfectly legal for the roots of the
vectorized subtrees to be equal even if the original scalar values aren't,
if the original scalars happen to be equivalent.
This fixes PR31599.
Differential Revision: https://reviews.llvm.org/D28539
llvm-svn: 291692
|
|
|
|
|
|
| |
The check script will use var names before they are declared, which filecheck doesn't like.
llvm-svn: 290971
|
|
|
|
|
|
| |
Missed var name
llvm-svn: 290970
|
|
|
|
| |
llvm-svn: 290969
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a combine that canonicalizes a chain of inserts which broadcasts
a value into a single insert + a splat shufflevector.
This fixes PR31286.
Differential Revision: https://reviews.llvm.org/D27992
llvm-svn: 290641
|
|
|
|
| |
llvm-svn: 289817
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch ensures the correct minimum bit width during type-shrinking.
Previously when type-shrinking, we always sign-extended values back to their
original width. However, if we are going to sign-extend, and the sign bit is
unknown, we have to increase the minimum bit width by one bit so the
sign-extend will fill the upper bits correctly. If the sign bit is known to be
zero, we can perform a zero-extend instead. This should fix PR31243.
Reference: https://llvm.org/bugs/show_bug.cgi?id=31243
Differential Revision: https://reviews.llvm.org/D27466
llvm-svn: 289470
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This change adds some verification in the IR verifier around struct path
TBAA metadata.
Other than some basic sanity checks (e.g. we get constant integers where
we expect constant integers), this checks:
- That by the time an struct access tuple `(base-type, offset)` is
"reduced" to a scalar base type, the offset is `0`. For instance, in
C++ you can't start from, say `("struct-a", 16)`, and end up with
`("int", 4)` -- by the time the base type is `"int"`, the offset
better be zero. In particular, a variant of this invariant is needed
for `llvm::getMostGenericTBAA` to be correct.
- That there are no cycles in a struct path.
- That struct type nodes have their offsets listed in an ascending
order.
- That when generating the struct access path, you eventually reach the
access type listed in the tbaa tag node.
Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D26438
llvm-svn: 289402
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When trying to vectorize trees that start at insertelement instructions
function tryToVectorizeList() uses vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree
cost for this fixed vectorization factor is too high.
Patch tries to improve the situation. It tries different vectorization
factors from max(PowerOf2Floor(NumberOfVectorizedValues),
MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries
to choose the best one.
Differential Revision: https://reviews.llvm.org/D27215
llvm-svn: 289043
|
|
|
|
|
|
|
|
|
| |
vectorizations
e.g.
buildvector(sitofp(i32), sitofp(i32), sitofp(i32), sitofp(i32)) --> sitofp(buildvector(i32, i32, i32, i32))
llvm-svn: 288807
|
|
|
|
|
|
|
|
| |
This reverts commit r288497, as it broke the AArch64 build of Compiler-RT's
builtins (twice: once in r288412 and once in r288497). We should investigate
this offline.
llvm-svn: 288508
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When trying to vectorize trees that start at insertelement instructions
function tryToVectorizeList() uses vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree
cost for this fixed vectorization factor is too high.
Patch tries to improve the situation. It tries different vectorization
factors from max(PowerOf2Floor(NumberOfVectorizedValues),
MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries
to choose the best one.
Differential Revision: https://reviews.llvm.org/D27215
llvm-svn: 288497
|
|
|
|
|
|
| |
fp-ops (PR6246)
llvm-svn: 288492
|
|
|
|
|
|
| |
This reverts r288412 which causes severe compile-time regression.
llvm-svn: 288431
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When trying to vectorize trees that start at insertelement instructions
function tryToVectorizeList() uses vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree
cost for this fixed vectorization factor is too high.
Patch tries to improve the situation. It tries different vectorization
factors from max(PowerOf2Floor(NumberOfVectorizedValues),
MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries
to choose the best one.
Differential Revision: https://reviews.llvm.org/D27215
llvm-svn: 288412
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently when cost of scalar operations is evaluated the vector type is
used for scalar operations. Patch fixes this issue and fixes evaluation
of the vector operations cost.
Several test showed that vector cost model is too optimistic. It
allowed vectorization of 8 or less add/fadd operations, though scalar
code is faster. Actually, only for 16 or more operations vector code
provides better performance.
Differential Revision: https://reviews.llvm.org/D26277
llvm-svn: 288398
|
|
|
|
| |
llvm-svn: 288377
|
|
|
|
|
|
| |
This reverts commit a61718435fc4118c82f8aa6133fd81f803789c1e.
llvm-svn: 288371
|
|
|
|
| |
llvm-svn: 288369
|
|
|
|
|
|
| |
instruction.
llvm-svn: 288148
|
|
|
|
|
|
|
|
|
|
|
| |
Currently SLP vectorizer tries to vectorize a binary operation and dies
immediately after unsuccessful the first unsuccessfull attempt. Patch
tries to improve the situation, trying to vectorize all binary
operations of all children nodes in the binop tree.
Differential Revision: https://reviews.llvm.org/D25517
llvm-svn: 288115
|
|
|
|
|
|
|
| |
incoming patch for vectorization of jumbled load
Change-Id: Ifb9091bb0f84c1937c2c8bd2fc345734f250d2f9
llvm-svn: 287992
|
|
|
|
|
|
|
|
| |
AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances
llvm-svn: 287882
|
|
|
|
| |
llvm-svn: 287801
|
|
|
|
|
|
|
|
| |
AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances
llvm-svn: 287762
|
|
|
|
| |
llvm-svn: 287760
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This extends FCOPYSIGN support to 512-bit vectors.
I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads.
Reviewers: delena, zvi, RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26791
llvm-svn: 287298
|
|
|
|
|
|
|
| |
Reviewer: Michael Zolotukhin.
Differential Revision: https://reviews.llvm.org/D26575
llvm-svn: 287064
|
|
|
|
|
|
|
| |
Reviewer: Michael Zolotukhin.
Differential Revision: https://reviews.llvm.org/D26543
llvm-svn: 286626
|
|
|
|
|
|
|
|
|
|
| |
This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.
This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when its useful.
Differential Revision: https://reviews.llvm.org/D25910
llvm-svn: 286233
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After successfull horizontal reduction vectorization attempt for PHI node
vectorizer tries to update root binary op by combining vectorized tree
and the ReductionPHI node. But during vectorization this ReductionPHI
can be vectorized itself and replaced by the `undef` value, while the
instruction itself is marked for deletion. This 'marked for deletion'
PHI node then can be used in new binary operation, causing "Use still
stuck around after Def is destroyed" crash upon PHI node deletion.
Also the test is fixed to make it perform actual testing.
Differential Revision: https://reviews.llvm.org/D25671
llvm-svn: 285286
|
|
|
|
|
|
|
|
|
|
|
|
| |
As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.
This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.
It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch.
Differential Revision: https://reviews.llvm.org/D23808
llvm-svn: 284459
|
|
|
|
| |
llvm-svn: 283756
|
|
|
|
| |
llvm-svn: 283751
|
|
|
|
|
|
|
|
| |
Fixed copy+paste vector alignment to correct for per-element scalar loads
Increased to 512-bit data sizes in preparation of avx512 tests
llvm-svn: 283748
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
unrolling.
The next code is not vectorized by the SLPVectorizer:
```
int test(unsigned int *p) {
int sum = 0;
for (int i = 0; i < 8; i++)
sum += p[i];
return sum;
}
```
During optimization this loop is fully unrolled and SLPVectorizer is
unable to vectorize it. Patch tries to fix this problem.
Differential Revision: https://reviews.llvm.org/D24796
llvm-svn: 283535
|
|
|
|
| |
llvm-svn: 283225
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR30433)
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30433
There are a couple of open questions about the codegen:
1. Should we let scalar ops be scalars and avoid vector constant loads/splats?
2. Should we have a pass to combine constants such as the inverted pair that we have here?
Differential Revision: https://reviews.llvm.org/D25165
llvm-svn: 283119
|