| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
shuf (inselt ?, C, IndexC), undef, <IndexC, IndexC...> --> <C, C...>
This is another missing shuffle fold pattern uncovered by the
shuffle correctness fix from D70246.
The problem was visible in the post-commit thread example, but
we managed to overcome the limitation for that particular case
with D71220.
This is something like the inverse of the previous fix - there
we didn't demand the inserted scalar, and here we are only
demanding an inserted scalar.
Differential Revision: https://reviews.llvm.org/D71488
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
not demanded
This pattern is noted as a regression from:
D70246
...where we removed an over-aggressive shuffle simplification.
SimplifyDemandedVectorElts fails to catch this case when the insert has multiple uses,
so I'm proposing to pattern match the minimal sequence directly. This fold does not
conflict with any of our current shuffle undef/poison semantics.
Differential Revision: https://reviews.llvm.org/D71220
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is similar to the existing fold for splats added with:
rL365379
If we can adjust the shuffle mask to include another element
in an identity mask (if it changes vector length, that's an
extract/insert subvector operation in the backend), then that
can eliminate extractelement/insertelement pairs in IR.
All targets are expected to lower shuffles with identity masks
efficiently.
llvm-svn: 371340
|
|
|
|
| |
llvm-svn: 370901
|
|
|
|
|
|
|
|
|
|
|
|
| |
Forming the canonical splat shuffle improves analysis and
may allow follow-on transforms (although some possibilities
are missing as shown in the test diffs).
The backend generically turns these patterns into build_vector,
so there should be no codegen regressions. All targets are
expected to be able to lower splats efficiently.
llvm-svn: 365379
|
|
|
|
| |
llvm-svn: 365362
|
|
|
|
|
|
|
|
|
|
|
| |
We recognize a splat from element 0 in (VectorUtils) llvm::getSplatValue()
and also in ShuffleVectorInst::isZeroEltSplatMask(), so this converts
to that form for better matching.
The backend generically turns these patterns into build_vector,
so there should be no codegen difference.
llvm-svn: 365342
|
|
|
|
|
|
| |
I added this test in rL365325, but didn't mean to create an undef insert.
llvm-svn: 365333
|
|
|
|
| |
llvm-svn: 365325
|
|
|
|
|
|
|
|
| |
The reversion apparently deleted the test/Transforms directory.
Will be re-reverting again.
llvm-svn: 358552
|
|
|
|
|
|
|
|
| |
As it's causing some bot failures (and per request from kbarton).
This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.
llvm-svn: 358546
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
shuffle (insert ?, Scalar, IndexC), V1, Mask --> insert V1, Scalar, IndexC'
The motivating case is at least a couple of steps away: I noticed that
SLPVectorizer does not analyze shuffles as well as sequences of
insert/extract in PR34724:
https://bugs.llvm.org/show_bug.cgi?id=34724
...so SLP may fail to vectorize when source code has shuffles to start
with or instcombine has converted insert/extract to shuffles.
Independent of that, an insertelement is always a simpler op for IR
analysis vs. a shuffle, so we should transform to insert when possible.
I don't think there's any codegen concern here - if a target can't insert
a scalar directly to some fixed element in a vector (x86?), then this
should get expanded to the insert+shuffle that we started with.
Differential Revision: https://reviews.llvm.org/D53507
llvm-svn: 345607
|
|
|
|
| |
llvm-svn: 344908
|
|
|
|
| |
llvm-svn: 344860
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a bug that can cause extractelements with operands that
haven't been defined yet to be inserted at a wrong point when
optimising insertelements.
Patch by Karl Hylen.
Differential Revision: https://reviews.llvm.org/D33449
llvm-svn: 304701
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
insertelement of variable
insertelement (insertelement X, Y, IdxC1), ScalarC, IdxC2 -->
insertelement (insertelement X, ScalarC, IdxC2), Y, IdxC1
As noted in the code comment and seen in the test changes, the motivation is that by pulling
constant insertion up, we may be able to constant fold some insertelement instructions.
Differential Revision: https://reviews.llvm.org/D31196
llvm-svn: 298520
|
|
|
|
|
|
|
|
|
|
|
|
| |
Removing the limitation in visitInsertElementInst() causes several regressions
because we're not prepared to fold sequences of shuffles or inserts and extracts
separated by shuffles. Fixing that appears to be a difficult mission because we
are purposely trying to avoid creating shuffles with arbitrary shuffle masks
because some targets may choke on those.
https://llvm.org/bugs/show_bug.cgi?id=30923
llvm-svn: 286423
|
|
|
|
| |
llvm-svn: 286399
|
|
|
|
|
|
|
|
|
|
|
| |
opposite extractelement fold (PR26354)
We would infinite loop because we created a shufflevector that was wider than
needed and then failed to combine that with the insertelement. When subsequently
visiting the extractelement from that shuffle, we see that it's unnecessary,
delete it, and trigger another visit to the insertelement.
llvm-svn: 259236
|
|
|
|
|
|
|
|
| |
Limit this transform to a basic block and guard against PHIs.
Hopefully, this fixes the remaining failures in PR25999:
https://llvm.org/bugs/show_bug.cgi?id=25999
llvm-svn: 257133
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Although this solves the test case in PR26015:
https://llvm.org/bugs/show_bug.cgi?id=26015
And may solve PR25999:
https://llvm.org/bugs/show_bug.cgi?id=25999
...I suspect this is not the best solution. I think we want to insert the new shuffle
just ahead of the earliest ExtractElementInst that we're replacing, but I don't know
how that should be implemented.
Differential Revision: http://reviews.llvm.org/D15878
llvm-svn: 256857
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is an extension of the shuffle combining from r203229:
http://reviews.llvm.org/rL203229
The idea is to widen a short input vector with undef elements so the
existing shuffle transform for extract/insert can kick in.
The motivation is to finally solve PR2109:
https://llvm.org/bugs/show_bug.cgi?id=2109
For that example, the IR becomes:
%1 = bitcast <2 x i32>* %P to <2 x float>*
%ld1 = load <2 x float>, <2 x float>* %1, align 8
%2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
ret <4 x float> %i2
And x86 SSE output improves from:
movq (%rdi), %xmm1 ## xmm1 = mem[0],zero
movdqa %xmm1, %xmm2
shufps $229, %xmm2, %xmm2 ## xmm2 = xmm2[1,1,2,3]
shufps $48, %xmm0, %xmm1 ## xmm1 = xmm1[0,0],xmm0[3,0]
shufps $132, %xmm1, %xmm0 ## xmm0 = xmm0[0,1],xmm1[0,2]
shufps $32, %xmm0, %xmm2 ## xmm2 = xmm2[0,0],xmm0[2,0]
shufps $36, %xmm2, %xmm0 ## xmm0 = xmm0[0,1],xmm2[2,0]
retq
To the almost optimal:
movhpd (%rdi), %xmm0
Note: There's a tension in the existing transform related to generating
arbitrary shufflevector masks. We avoid that in other places in InstCombine
because we're scared that codegen can't handle strange masks, but it looks
like we're ok with producing those here. I purposely chose weird insert/extract
indexes for the regression tests to see the effect in these cases.
For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or
better for these examples.
Differential Revision: http://reviews.llvm.org/D15096
llvm-svn: 256394
|
|
|
|
| |
llvm-svn: 254342
|
|
Sequences of insertelement/extractelements are sometimes used to build
vectorsr; this code tries to put them back together into shuffles, but
could only produce a completely uniform shuffle types (<N x T> from two
<N x T> sources).
This should allow shuffles with different numbers of elements on the
input and output sides as well.
llvm-svn: 203229
|