path: root/llvm/test/Transforms/InstCombine/insert-extract-shuffle.ll
Commit message (Author, Age, Files, Lines)
* [InstSimplify] fold splat of inserted constant to vector constant (Sanjay Patel, 2019-12-15, 1, -2/+1)
| shuf (inselt ?, C, IndexC), undef, <IndexC, IndexC...> --> <C, C...>
| This is another missing shuffle fold pattern uncovered by the shuffle correctness fix from D70246. The problem was visible in the post-commit thread example, but we managed to overcome the limitation for that particular case with D71220.
| This is something like the inverse of the previous fix - there we didn't demand the inserted scalar, and here we are only demanding an inserted scalar.
| Differential Revision: https://reviews.llvm.org/D71488
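The fold above can be illustrated with a minimal Python sketch. This is only a model, not LLVM code: vectors are plain lists, `UNDEF` is a stand-in for an undef lane, and `insertelement`, `shufflevector`, and `fold_splat_of_inserted_constant` are hypothetical helper names introduced here for illustration.

```python
UNDEF = None  # stand-in for an undef lane (modelling assumption)

def insertelement(vec, scalar, index):
    """Model of insertelement: copy the vector and overwrite one lane."""
    out = list(vec)
    out[index] = scalar
    return out

def shufflevector(v1, v2, mask):
    """Model of shufflevector: mask indexes into the concatenation v1 ++ v2."""
    both = list(v1) + list(v2)
    return [UNDEF if m is UNDEF else both[m] for m in mask]

def fold_splat_of_inserted_constant(c, index_c, mask):
    """If every mask lane selects IndexC, the result is the constant splat <C, C...>."""
    if mask and all(m == index_c for m in mask):
        return [c] * len(mask)
    return None  # pattern does not match

base = [UNDEF, 1, 2, 3]  # the '?' operand; its contents do not matter
unfolded = shufflevector(insertelement(base, 42, 1), [UNDEF] * 4, [1, 1, 1, 1])
folded = fold_splat_of_inserted_constant(42, 1, [1, 1, 1, 1])
```

In this model the folded result never looks at the base vector, which matches the "only demanding an inserted scalar" description.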
* [InstSimplify] add tests for insert constant + splat; NFC (Sanjay Patel, 2019-12-10, 1, -0/+13)
|
* [InstCombine] replace shuffle's insertelement operand if inserted scalar is not demanded (Sanjay Patel, 2019-12-10, 1, -2/+6)
| This pattern is noted as a regression from: D70246 ...where we removed an over-aggressive shuffle simplification.
| SimplifyDemandedVectorElts fails to catch this case when the insert has multiple uses, so I'm proposing to pattern match the minimal sequence directly. This fold does not conflict with any of our current shuffle undef/poison semantics.
| Differential Revision: https://reviews.llvm.org/D71220
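A rough Python model of the "inserted scalar is not demanded" idea follows. It assumes the insertelement feeds the first shuffle operand and that the fold simply swaps in the insert's base vector; the helper names are hypothetical and the real pattern match in InstCombine may differ in detail.

```python
UNDEF = None  # stand-in for an undef lane (modelling assumption)

def insertelement(vec, scalar, index):
    """Model of insertelement: copy the vector and overwrite one lane."""
    out = list(vec)
    out[index] = scalar
    return out

def shufflevector(v1, v2, mask):
    """Model of shufflevector: mask indexes into the concatenation v1 ++ v2."""
    both = list(v1) + list(v2)
    return [UNDEF if m is UNDEF else both[m] for m in mask]

def drop_undemanded_insert(x, idx, y, mask):
    """shuf (inselt x, s, idx), y, mask --> shuf x, y, mask when lane idx of operand 0 is unused."""
    if idx in mask:  # mask values below len(x) refer to operand 0
        return None  # the inserted scalar is demanded; no fold
    return shufflevector(x, y, mask)

x, y = [1, 2, 3, 4], [5, 6, 7, 8]
mask = [0, 5, 3, 7]  # never selects lane 2 of operand 0
before = shufflevector(insertelement(x, 99, 2), y, mask)
after = drop_undemanded_insert(x, 2, y, mask)
```

Because the mask never reads lane 2 of the first operand, the inserted value 99 cannot reach the result, so the shuffle can use `x` directly even when the insert has other uses.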
* [InstCombine] add tests for shuffle with insertelement operand; NFC (Sanjay Patel, 2019-12-09, 1, -0/+52)
|
* [InstCombine] fold extract+insert into identity shuffle (Sanjay Patel, 2019-09-08, 1, -8/+9)
| This is similar to the existing fold for splats added with: rL365379
| If we can adjust the shuffle mask to include another element in an identity mask (if it changes vector length, that's an extract/insert subvector operation in the backend), then that can eliminate extractelement/insertelement pairs in IR.
| All targets are expected to lower shuffles with identity masks efficiently.
| llvm-svn: 371340
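As a sketch of the mask adjustment described above, the following Python model extends an identity mask by one lane instead of keeping an extractelement/insertelement pair. The helper names and the exact matching condition are assumptions made for illustration only.

```python
UNDEF = None  # stand-in for an undef lane (modelling assumption)

def insertelement(vec, scalar, index):
    """Model of insertelement: copy the vector and overwrite one lane."""
    out = list(vec)
    out[index] = scalar
    return out

def shufflevector(v1, v2, mask):
    """Model of shufflevector: mask indexes into the concatenation v1 ++ v2."""
    both = list(v1) + list(v2)
    return [UNDEF if m is UNDEF else both[m] for m in mask]

def fold_extract_insert(mask, idx):
    """Pattern: inselt (shuf x, undef, mask), (extelt x, idx), idx.

    Returns a new identity mask that also covers lane idx, or None if the
    adjusted mask would not be an identity mask (undef lanes are allowed).
    """
    new_mask = list(mask)
    new_mask[idx] = idx
    if all(m is UNDEF or m == lane for lane, m in enumerate(new_mask)):
        return new_mask
    return None

x = [10, 20, 30, 40]
undef4 = [UNDEF] * 4
mask = [0, 1, UNDEF, 3]
before = insertelement(shufflevector(x, undef4, mask), x[2], 2)
new_mask = fold_extract_insert(mask, 2)
after = shufflevector(x, undef4, new_mask)
```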
* [InstCombine] add tests for insert/extract with identity shuffles; NFC (Sanjay Patel, 2019-09-04, 1, -0/+92)
| llvm-svn: 370901
* [InstCombine] fold insertelement into splat of same scalar (Sanjay Patel, 2019-07-08, 1, -3/+8)
| Forming the canonical splat shuffle improves analysis and may allow follow-on transforms (although some possibilities are missing as shown in the test diffs).
| The backend generically turns these patterns into build_vector, so there should be no codegen regressions. All targets are expected to be able to lower splats efficiently.
| llvm-svn: 365379
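A small Python model may help picture this fold: inserting the already-splatted scalar into one more lane is expressed by widening the splat mask instead. This is a sketch under the assumption that the splat reads element 0 of its source; all helper names are hypothetical.

```python
UNDEF = None  # stand-in for an undef lane (modelling assumption)

def insertelement(vec, scalar, index):
    """Model of insertelement: copy the vector and overwrite one lane."""
    out = list(vec)
    out[index] = scalar
    return out

def shufflevector(v1, v2, mask):
    """Model of shufflevector: mask indexes into the concatenation v1 ++ v2."""
    both = list(v1) + list(v2)
    return [UNDEF if m is UNDEF else both[m] for m in mask]

def fold_insert_into_splat(mask, idx):
    """Pattern: inselt (shuf (inselt undef, x, 0), undef, mask), x, idx.

    If mask is a (possibly partial) splat of element 0, set lane idx to 0 too.
    """
    if all(m is UNDEF or m == 0 for m in mask):
        new_mask = list(mask)
        new_mask[idx] = 0
        return new_mask
    return None

undef4 = [UNDEF] * 4
src = insertelement(undef4, 7, 0)          # scalar 7 in element 0
mask = [0, UNDEF, 0, UNDEF]                # partial splat
before = insertelement(shufflevector(src, undef4, mask), 7, 1)
new_mask = fold_insert_into_splat(mask, 1)
after = shufflevector(src, undef4, new_mask)
```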
* [InstCombine] add tests for insert of same splatted scalar; NFC (Sanjay Patel, 2019-07-08, 1, -0/+69)
| llvm-svn: 365362
* [InstCombine] canonicalize insert+splat to/from element 0 of vector (Sanjay Patel, 2019-07-08, 1, -6/+34)
| We recognize a splat from element 0 in (VectorUtils) llvm::getSplatValue() and also in ShuffleVectorInst::isZeroEltSplatMask(), so this converts to that form for better matching.
| The backend generically turns these patterns into build_vector, so there should be no codegen difference.
| llvm-svn: 365342
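The canonicalization can be sketched in Python as moving both the insert index and the splat mask to element 0. This is an illustrative model only; `canonicalize_splat` is a hypothetical name and does not correspond to `llvm::getSplatValue()` or `isZeroEltSplatMask()`.

```python
UNDEF = None  # stand-in for an undef lane (modelling assumption)

def insertelement(vec, scalar, index):
    """Model of insertelement: copy the vector and overwrite one lane."""
    out = list(vec)
    out[index] = scalar
    return out

def shufflevector(v1, v2, mask):
    """Model of shufflevector: mask indexes into the concatenation v1 ++ v2."""
    both = list(v1) + list(v2)
    return [UNDEF if m is UNDEF else both[m] for m in mask]

def canonicalize_splat(index_c, mask):
    """shuf (inselt undef, x, index_c), undef, <index_c...> --> same splat via element 0.

    Returns (new_insert_index, new_mask), or None if there is nothing to do.
    """
    if index_c != 0 and all(m is UNDEF or m == index_c for m in mask):
        return 0, [UNDEF if m is UNDEF else 0 for m in mask]
    return None

undef4 = [UNDEF] * 4
mask = [2, 2, UNDEF, 2]
before = shufflevector(insertelement(undef4, 5, 2), undef4, mask)
new_index, new_mask = canonicalize_splat(2, mask)
after = shufflevector(insertelement(undef4, 5, new_index), undef4, new_mask)
```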
* [InstCombine] fix typo in test; NFC (Sanjay Patel, 2019-07-08, 1, -3/+5)
| I added this test in rL365325, but didn't mean to create an undef insert.
| llvm-svn: 365333
* [InstCombine] add tests for splat shuffles; NFC (Sanjay Patel, 2019-07-08, 1, -0/+43)
| llvm-svn: 365325
* Revert "Temporarily Revert "Add basic loop fusion pass."" (Eric Christopher, 2019-04-17, 1, -0/+427)
| The reversion apparently deleted the test/Transforms directory.
| Will be re-reverting again.
| llvm-svn: 358552
* Temporarily Revert "Add basic loop fusion pass." (Eric Christopher, 2019-04-17, 1, -427/+0)
| As it's causing some bot failures (and per request from kbarton).
| This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.
| llvm-svn: 358546
* [InstCombine] try to turn shuffle into insertelement (Sanjay Patel, 2018-10-30, 1, -20/+19)
| shuffle (insert ?, Scalar, IndexC), V1, Mask --> insert V1, Scalar, IndexC'
| The motivating case is at least a couple of steps away: I noticed that SLPVectorizer does not analyze shuffles as well as sequences of insert/extract in PR34724: https://bugs.llvm.org/show_bug.cgi?id=34724 ...so SLP may fail to vectorize when source code has shuffles to start with or instcombine has converted insert/extract to shuffles.
| Independent of that, an insertelement is always a simpler op for IR analysis vs. a shuffle, so we should transform to insert when possible.
| I don't think there's any codegen concern here - if a target can't insert a scalar directly to some fixed element in a vector (x86?), then this should get expanded to the insert+shuffle that we started with.
| Differential Revision: https://reviews.llvm.org/D53507
| llvm-svn: 345607
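The shuffle-to-insert rewrite can be modelled in Python as a mask check: every lane must pass V1 through unchanged except for exactly one lane that picks the inserted scalar. The sketch below is conservative (it rejects undef mask lanes) and uses hypothetical helper names; it is not the InstCombine implementation.

```python
UNDEF = None  # stand-in for an undef lane (modelling assumption)

def insertelement(vec, scalar, index):
    """Model of insertelement: copy the vector and overwrite one lane."""
    out = list(vec)
    out[index] = scalar
    return out

def shufflevector(v1, v2, mask):
    """Model of shufflevector: mask indexes into the concatenation v1 ++ v2."""
    both = list(v1) + list(v2)
    return [UNDEF if m is UNDEF else both[m] for m in mask]

def shuffle_to_insert_index(index_c, n, mask):
    """Return IndexC' for: shuffle (insert ?, Scalar, IndexC), V1, Mask.

    n is the operand vector length. Lanes must be identity picks from V1
    (mask value n + lane) except for a single lane that selects IndexC.
    """
    new_index = None
    for lane, m in enumerate(mask):
        if m == n + lane:          # identity lane taken from V1
            continue
        if m == index_c and new_index is None:
            new_index = lane       # the one lane holding the scalar
            continue
        return None                # anything else defeats the fold
    return new_index

base, v1 = [0, 0, 0, 0], [1, 2, 3, 4]
mask = [4, 3, 6, 7]
before = shufflevector(insertelement(base, 7, 3), v1, mask)
new_index = shuffle_to_insert_index(3, 4, mask)
after = insertelement(v1, 7, new_index)
```

Here the scalar was inserted at index 3 but lands in lane 1 of the result, so IndexC' is 1 in this model.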
* [InstCombine] add tests for shuffle+insert folds; NFC (Sanjay Patel, 2018-10-22, 1, -0/+123)
| llvm-svn: 344908
* [InstCombine] add test for possible shuffle fold; NFC (Sanjay Patel, 2018-10-20, 1, -31/+51)
| llvm-svn: 344860
* [InstCombine] Fix extractelement use before def (Sven van Haastregt, 2017-06-05, 1, -0/+23)
| This fixes a bug that can cause extractelements with operands that haven't been defined yet to be inserted at a wrong point when optimising insertelements.
| Patch by Karl Hylen.
| Differential Revision: https://reviews.llvm.org/D33449
| llvm-svn: 304701
* [InstCombine] canonicalize insertelement of scalar constant ahead of insertelement of variable (Sanjay Patel, 2017-03-22, 1, -7/+3)
| insertelement (insertelement X, Y, IdxC1), ScalarC, IdxC2 --> insertelement (insertelement X, ScalarC, IdxC2), Y, IdxC1
| As noted in the code comment and seen in the test changes, the motivation is that by pulling constant insertion up, we may be able to constant fold some insertelement instructions.
| Differential Revision: https://reviews.llvm.org/D31196
| llvm-svn: 298520
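The reordering can be checked with a short Python model. The sketch assumes the two insert indices are distinct, since swapping the inserts is only value-preserving in that case; the helper name `reorder_constant_insert` is hypothetical and any further preconditions of the real transform are not modelled.

```python
def insertelement(vec, scalar, index):
    """Model of insertelement: copy the vector and overwrite one lane."""
    out = list(vec)
    out[index] = scalar
    return out

def reorder_constant_insert(x, y, idx1, scalar_c, idx2):
    """inselt (inselt x, y, idx1), scalar_c, idx2 --> inselt (inselt x, scalar_c, idx2), y, idx1."""
    if idx1 == idx2:
        return None  # assumption: swapping is only safe for distinct lanes
    return insertelement(insertelement(x, scalar_c, idx2), y, idx1)

x = [0, 0, 0, 0]                       # a constant base vector
before = insertelement(insertelement(x, "y", 0), 3, 1)
after = reorder_constant_insert(x, "y", 0, 3, 1)
inner_constant = insertelement(x, 3, 1)  # all-constant, so it could be folded
```

With the constant insert pulled up, the inner operation works on constants only, which is the constant-folding opportunity the commit message mentions.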
* [InstCombine] avoid infinite loop from shuffle-extract-insert sequence (PR30923) (Sanjay Patel, 2016-11-10, 1, -0/+27)
| Removing the limitation in visitInsertElementInst() causes several regressions because we're not prepared to fold sequences of shuffles or inserts and extracts separated by shuffles. Fixing that appears to be a difficult mission because we are purposely trying to avoid creating shuffles with arbitrary shuffle masks because some targets may choke on those.
| https://llvm.org/bugs/show_bug.cgi?id=30923
| llvm-svn: 286423
* [InstCombine] regenerate checks; NFC (Sanjay Patel, 2016-11-09, 1, -51/+83)
| llvm-svn: 286399
* [InstCombine] avoid an insertelement transformation that induces the opposite extractelement fold (PR26354) (Sanjay Patel, 2016-01-29, 1, -0/+30)
| We would infinite loop because we created a shufflevector that was wider than needed and then failed to combine that with the insertelement. When subsequently visiting the extractelement from that shuffle, we see that it's unnecessary, delete it, and trigger another visit to the insertelement.
| llvm-svn: 259236
* [InstCombine] insert a new shuffle in a safe place (PR25999) (Sanjay Patel, 2016-01-08, 1, -0/+50)
| Limit this transform to a basic block and guard against PHIs.
| Hopefully, this fixes the remaining failures in PR25999: https://llvm.org/bugs/show_bug.cgi?id=25999
| llvm-svn: 257133
* [InstCombine] insert a new shuffle before its uses (PR26015) (Sanjay Patel, 2016-01-05, 1, -0/+53)
| Although this solves the test case in PR26015: https://llvm.org/bugs/show_bug.cgi?id=26015
| And may solve PR25999: https://llvm.org/bugs/show_bug.cgi?id=25999
| ...I suspect this is not the best solution. I think we want to insert the new shuffle just ahead of the earliest ExtractElementInst that we're replacing, but I don't know how that should be implemented.
| Differential Revision: http://reviews.llvm.org/D15878
| llvm-svn: 256857
* [InstCombine] transform more extract/insert pairs into shuffles (PR2109) (Sanjay Patel, 2015-12-24, 1, -16/+10)
| This is an extension of the shuffle combining from r203229: http://reviews.llvm.org/rL203229
| The idea is to widen a short input vector with undef elements so the existing shuffle transform for extract/insert can kick in.
| The motivation is to finally solve PR2109: https://llvm.org/bugs/show_bug.cgi?id=2109
| For that example, the IR becomes:
|   %1 = bitcast <2 x i32>* %P to <2 x float>*
|   %ld1 = load <2 x float>, <2 x float>* %1, align 8
|   %2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
|   %i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
|   ret <4 x float> %i2
| And x86 SSE output improves from:
|   movq (%rdi), %xmm1 ## xmm1 = mem[0],zero
|   movdqa %xmm1, %xmm2
|   shufps $229, %xmm2, %xmm2 ## xmm2 = xmm2[1,1,2,3]
|   shufps $48, %xmm0, %xmm1 ## xmm1 = xmm1[0,0],xmm0[3,0]
|   shufps $132, %xmm1, %xmm0 ## xmm0 = xmm0[0,1],xmm1[0,2]
|   shufps $32, %xmm0, %xmm2 ## xmm2 = xmm2[0,0],xmm0[2,0]
|   shufps $36, %xmm2, %xmm0 ## xmm0 = xmm0[0,1],xmm2[2,0]
|   retq
| To the almost optimal:
|   movhpd (%rdi), %xmm0
| Note: There's a tension in the existing transform related to generating arbitrary shufflevector masks. We avoid that in other places in InstCombine because we're scared that codegen can't handle strange masks, but it looks like we're ok with producing those here. I purposely chose weird insert/extract indexes for the regression tests to see the effect in these cases. For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or better for these examples.
| Differential Revision: http://reviews.llvm.org/D15096
| llvm-svn: 256394
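The two-shuffle form in the IR above can be traced with a small Python model. The sketch assumes the starting point was a pair of extract/insert operations copying both loaded elements into lanes 2 and 3 of %A, which is what the final mask <0, 1, 4, 5> implies; lists stand in for vectors and the helper names are hypothetical.

```python
UNDEF = None  # stand-in for an undef lane (modelling assumption)

def insertelement(vec, scalar, index):
    """Model of insertelement: copy the vector and overwrite one lane."""
    out = list(vec)
    out[index] = scalar
    return out

def shufflevector(v1, v2, mask):
    """Model of shufflevector: mask indexes into the concatenation v1 ++ v2."""
    both = list(v1) + list(v2)
    return [UNDEF if m is UNDEF else both[m] for m in mask]

a = ["a0", "a1", "a2", "a3"]   # models %A
ld1 = ["p0", "p1"]             # models the <2 x float> load

# Assumed original form: extract each loaded element and insert it into %A.
scalar_form = insertelement(insertelement(a, ld1[0], 2), ld1[1], 3)

# Shuffle form: widen the short vector with undef lanes, then blend with %A.
widened = shufflevector(ld1, [UNDEF] * 2, [0, 1, UNDEF, UNDEF])
shuffle_form = shufflevector(a, widened, [0, 1, 4, 5])
```

The widening shuffle exists only to make both operands of the second shuffle the same length, since the model (like shufflevector itself) indexes into two equally sized inputs.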
* [InstCombine] add tests to show potential vector IR shuffle transforms (Sanjay Patel, 2015-11-30, 1, -5/+48)
| llvm-svn: 254342
* InstCombine: form shuffles from wider range of insert/extractelements (Tim Northover, 2014-03-07, 1, -0/+37)
| Sequences of insertelement/extractelements are sometimes used to build vectors; this code tries to put them back together into shuffles, but could only produce completely uniform shuffle types (<N x T> from two <N x T> sources).
| This should allow shuffles with different numbers of elements on the input and output sides as well.
| llvm-svn: 203229