summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/InstCombine/vec_shuffle.ll
Commit message (Collapse)AuthorAgeFilesLines
* [InstCombine] remove identity shuffle simplification for mask with undefsSanjay Patel2019-11-241-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | And simultaneously enhance SimplifyDemandedVectorElts() to rcognize that pattern. That preserves some of the old optimizations in IR. Given a shuffle that includes undef elements in an otherwise identity mask like: define <4 x float> @shuffle(<4 x float> %arg) { %shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 undef, i32 1, i32 2, i32 3> ret <4 x float> %shuf } We were simplifying that to the input operand. But as discussed in PR43958: https://bugs.llvm.org/show_bug.cgi?id=43958 ...that means that per-vector-element poison that would be stopped by the shuffle can now leak to the result. Also note that we still have (and there are tests for) the same transform with no undef elements in the mask (a fully-defined identity mask). I don't think there's any controversy about that case - it's a valid transform under any interpretation of shufflevector/undef/poison. Looking at a few of the diffs into codegen, I don't see any difference in final asm. So depending on your perspective, that's good (no real loss of optimization power) or bad (poison exists in the DAG, so we only partially fixed the bug). Differential Revision: https://reviews.llvm.org/D70246
* [ConstantFold] Handle identity folds at top of ConstantFoldBinaryInstFlorian Hahn2019-11-171-14/+10
| | | | | | | | | | | | | | | | | | | | | | Currently we miss folds with undef and identity values for binary ops that do not fold to undef in general. We can generalize the identity simplifications and do them before checking for undef in particular. Alive checks: * OR - https://rise4fun.com/Alive/8OsK * AND - https://rise4fun.com/Alive/e3tE This will also allow us to remove some now redundant cases throughout the function, but I would like to do this as follow-up. That should make tracking down potential issues easier. Reviewers: spatel, RKSimon, lebedev.ri Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D70169
* [InstCombine] Avoid moving ops that do restrict undef across shuffles.Florian Hahn2019-11-131-13/+32
| | | | | | | | | | | | | | | | | | | I think we have to be a bit more careful when it comes to moving ops across shuffles, if the op does restrict undef. For example, without this patch, we would move 'and %v, <0, 0, -1, -1>' over a 'shufflevector %a, undef, <undef, undef, 1, 2>'. As a result, the first 2 lanes of the result are undef after the combine, but they really should be 0, unless I am missing something. For ops that do fold to undef on undef operands, the current behavior should be fine. I've add conservative check OpDoesRestrictUndef, maybe there's a better existing utility? Reviewers: spatel, RKSimon, lebedev.ri Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D70093
* [InstCombine] Precommit shuffle tests for D70093.Florian Hahn2019-11-131-0/+186
|
* [InstCombine] be more careful when transforming a shuffle maskSanjay Patel2019-05-231-0/+17
| | | | | | | | | | | This is reduced from a fuzzer test: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14890 Usually, demanded elements should be able to simplify shuffle mask elements that are pointing to undef elements of its source operands, but that doesn't happen in the test case. llvm-svn: 361533
* [InstCombine] fold shuffles of insert_subvectorsSanjay Patel2019-05-221-9/+6
| | | | | | | | | | | | | | | | | This should be a valid exception to the general rule of not creating new shuffle masks in IR... because we already do it. :) Also, DAG combining/legalization will undo this by widening the shuffle back out if needed. Explanation for how we already do this: SLP or vector source can create chains of insert/extract as shown in 1 of the examples from PR16739: https://godbolt.org/z/NlK7rA https://bugs.llvm.org/show_bug.cgi?id=16739 And we expect instcombine or DAGCombine to clean that up by creating relatively simple shuffles. Differential Revision: https://reviews.llvm.org/D62024 llvm-svn: 361338
* [InstCombine] add more tests for shuffle folding; NFCSanjay Patel2019-05-211-0/+26
| | | | | | | | | As discussed in D62024, we want to limit any potential IR transforms of shuffles to cases where we know the SDAG conversion would result in equivalent patterns for these IR variants. llvm-svn: 361317
* [InstCombine] add tests for shuffle of insert subvectors; NFCSanjay Patel2019-05-161-0/+72
| | | | llvm-svn: 360923
* Revert "Temporarily Revert "Add basic loop fusion pass.""Eric Christopher2019-04-171-0/+1142
| | | | | | | | The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552
* Temporarily Revert "Add basic loop fusion pass."Eric Christopher2019-04-171-1142/+0
| | | | | | | | As it's causing some bot failures (and per request from kbarton). This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda. llvm-svn: 358546
* [InstCombine] fix crash while trying to narrow a binop of shuffles (PR40734)Sanjay Patel2019-02-151-0/+15
| | | | | | https://bugs.llvm.org/show_bug.cgi?id=40734 llvm-svn: 354144
* [InstCombine] limit extracting shuffle transform based on usesSanjay Patel2019-02-051-2/+2
| | | | | | | | | | As discussed in D53037, this can lead to worse codegen, and we don't generally expect the backend to be able to optimize arbitrary shuffles. If there's only one use of the 1st shuffle, that means it's getting removed, so that should always be safe. llvm-svn: 353235
* [InstCombine] split shuffle test to show extra use constraint; NFCSanjay Patel2019-02-051-3/+14
| | | | | | | As discussed in D53037, this transform can cause codegen problems if the 1st shuffle has multiple uses. llvm-svn: 353233
* [InstCombine] fix undef propagation bug with shuffle+binopSanjay Patel2018-12-031-6/+6
| | | | | | | | | | | | | | When we have a shuffle that extends a source vector with undefs and then do some binop on that, we must make sure that the extra elements remain undef with that binop if we reverse the order of the binop and shuffle. 'or' is probably the easiest example to show the bug because 'or C, undef --> -1' (not undef). But there are other opcode/constant combinations where this is true as shown by the 'shl' test. llvm-svn: 348191
* [InstCombine] add tests for shuffle+binop fold; NFCSanjay Patel2018-12-031-2/+58
| | | | llvm-svn: 348173
* [InstCombine] combine a shuffle and an extract subvector shuffle Sanjay Patel2018-10-141-5/+3
| | | | | | | | | | | | This is part of the missing IR-level folding noted in D52912. This should be ok as a canonicalization because the new shuffle mask can't be any more complicated than the existing shuffle mask. If there's some target where the shorter vector shuffle is not legal, it should just end up expanding to something like the pair of shuffles that we're starting with here. Differential Revision: https://reviews.llvm.org/D53037 llvm-svn: 344476
* revert r344082: [InstCombine] reverse 'trunc X to <N x i1>' canonicalizationSanjay Patel2018-10-101-2/+3
| | | | | | This commit accidentally included the diffs from D53057. llvm-svn: 344178
* [InstCombine] reverse 'trunc X to <N x i1>' canonicalizationSanjay Patel2018-10-091-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | icmp ne (and X, 1), 0 --> trunc X to N x i1 Ideally, we'd do the same for scalars, but there will likely be regressions unless we add more trunc folds as we're doing here for vectors. The motivating vector case is from PR37549: https://bugs.llvm.org/show_bug.cgi?id=37549 define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) { %c = fcmp ole <4 x float> %x, %y %s = sext <4 x i1> %c to <4 x i32> %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1> %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3> %cond = or <4 x i32> %s1, %s2 %condtr = trunc <4 x i32> %cond to <4 x i1> %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w ret <4 x float> %r } Here's a sampling of the vector codegen for that case using mask+icmp (current behavior) vs. trunc (with this patch): AVX before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vandps LCPI0_0(%rip), %xmm0, %xmm0 vxorps %xmm1, %xmm1, %xmm1 vpcmpeqd %xmm1, %xmm0, %xmm0 vblendvps %xmm0, %xmm3, %xmm2, %xmm0 AVX after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vblendvps %xmm0, %xmm2, %xmm3, %xmm0 AVX512f before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1] vptestnmd %zmm1, %zmm0, %k1 vblendmps %zmm3, %zmm2, %zmm0 {%k1} AVX512f after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpslld $31, %xmm0, %xmm0 vptestmd %zmm0, %zmm0, %k1 vblendmps %zmm2, %zmm3, %zmm0 {%k1} AArch64 before: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b movi v1.4s, #1 and v0.16b, v0.16b, v1.16b cmeq v0.4s, v0.4s, #0 bsl v0.16b, v3.16b, v2.16b AArch64 after: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b bsl v0.16b, v2.16b, v3.16b PowerPC-le before: xvcmpgesp 34, 35, 34 vspltisw 0, 1 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxlxor 35, 35, 35 xxland 34, 0, 32 vcmpequw 2, 2, 3 xxsel 34, 36, 37, 34 PowerPC-le after: xvcmpgesp 34, 35, 34 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxsel 34, 37, 36, 0 Differential Revision: https://reviews.llvm.org/D52747 llvm-svn: 344082
* [InstCombine] add tests for extract subvector shuffles; NFCSanjay Patel2018-10-091-6/+39
| | | | llvm-svn: 344067
* [InstCombine] allow SimplifyDemandedVectorElts to work with FP binopsSanjay Patel2018-10-031-1/+1
| | | | | | | | | | | | | | | | | We're a long way from D50992 and D51553, but this is where we have to start. We weren't back-propagating undefs into binop constant values for anything but add/sub/mul/and/or/xor. This is likely because we have to be careful about not introducing UB/poison with div/rem/shift. But I suspect we already are getting the poison part wrong for add/sub/mul (although it may not be possible to expose the bug currently because we use SimplifyDemandedVectorElts from a limited set of opcodes). See the discussion/implementation from D48987 and D49047. This patch just enables functionality for FP ops because those do not have UB/poison potential. llvm-svn: 343727
* [InstCombine] allow lengthening of insertelement to eliminate shufflesSanjay Patel2018-09-301-5/+4
| | | | | | | | | | | | | As noted in post-commit comments for D52548, the limitation on increasing vector length can be applied by opcode. As a first step, this patch only allows insertelement to be widened because that has no logical downsides for IR and has little risk of pessimizing codegen. This may cause PR39132 to go into hiding during a full compile, but that bug is not fixed. llvm-svn: 343406
* [InstCombine] add test for vector widening of insertelements; NFCSanjay Patel2018-09-291-0/+15
| | | | | | The test shows a potential overreach with the fix from D52548. llvm-svn: 343378
* [InstCombine] don't propagate wider shufflevector arguments to predecessorsSanjay Patel2018-09-281-10/+15
| | | | | | | | | | | | | | | InstCombine would propagate shufflevector insts that had wider output vectors onto predecessors, which would sometimes push undef's onto the divisor of a div/rem and result in bad codegen. I've fixed this by just banning propagating shufflevector back if the result of the shufflevector is wider than the input vectors. Patch by: @sheredom (Neil Henning) Differential Revision: https://reviews.llvm.org/D52548 llvm-svn: 343329
* [InstCombine] adjust shuffle undef propagation tests; NFCSanjay Patel2018-09-281-0/+29
| | | | | | | | | | | These are the updated baseline tests for D52548 - I'm putting the tests next to the tests where the transform functions as expected, so we can see the intended/unintended consequences. Patch by: @sheredom (Neil Henning) llvm-svn: 343328
* [InstCombine] add fneg variation of shuffle-binop fold; NFCSanjay Patel2018-09-251-0/+11
| | | | | | | | If the fsub in this pattern was replaced by an actual fneg instruction, we would need to add a fold to recognize that because fneg would not be a binop. llvm-svn: 343041
* [InstCombine] allow shuffle+binop canonicalization with widening shufflesSanjay Patel2018-08-271-8/+8
| | | | | | | | This lines up with the behavior of an existing transform where if both operands of the binop are shuffled, we allow moving the binop before the shuffle regardless of whether the shuffle changes the size of the vector. llvm-svn: 340787
* [InstCombine] add tests for shuffle+binop transform; NFCSanjay Patel2018-08-251-0/+52
| | | | llvm-svn: 340683
* [InstCombine] avoid extra poison when moving shift above shuffleSanjay Patel2018-07-091-3/+3
| | | | | | | | | | As discussed in D49047 / D48987, shift-by-undef produces poison, so we can't use undef vector elements in that case.. Note that we need to extend this for poison-generating flags, and there's a proposal to create poison from FMF in D47963, llvm-svn: 336562
* [InstCombine] refine UB-handling in shuffle-binop transformSanjay Patel2018-06-041-19/+19
| | | | | | | | | | | | | | | | | | As noted in rL333782, we can be both better for optimization and safer with this transform: BinOp (shuffle V1, Mask), C --> shuffle (BinOp V1, NewC), Mask The only potentially unsafe-to-speculate binops are integer div/rem. All other binops are always safe (although I don't see a way to assert that in code here). For opcodes like shifts that can produce poison, it can't matter here because we know the lanes with undef are dropped by the subsequent shuffle. Differential Revision: https://reviews.llvm.org/D47686 llvm-svn: 333962
* [InstCombine] call simplify before trying vector foldsSanjay Patel2018-06-021-76/+76
| | | | | | | | | | | | | | | | | | | | As noted in the review thread for rL333782, we could have made a bug harder to hit if we were simplifying instructions before trying other folds. The shuffle transform in question isn't ever a simplification; it's just a canonicalization. So I've renamed that to make that clearer. This is NFCI at this point, but I've regenerated the test file to show the cosmetic value naming difference of using instcombine's RAUW vs. the builder. Possible follow-ups: 1. Move reassociation folds after simplifies too. 2. Refactor common code; we shouldn't have so much repetition. llvm-svn: 333820
* [InstCombine] add more tests for shuffle-binop; NFCSanjay Patel2018-06-021-4/+315
| | | | | | | | As noted in the review thread for rL333782, we're lacking coverage for this transform, so add tests for each binop opcode with constant operand. llvm-svn: 333818
* [InstCombine] fix vector shuffle transform to replace undef elements (PR37648)Sanjay Patel2018-06-011-4/+4
| | | | | | | | | | | | | | This bug: https://bugs.llvm.org/show_bug.cgi?id=37648 ...was created with the enhancement to this transform with rL332479. The urem test shows the disaster potential: any undef divisor lane makes the whole op undef. The test diffs show that vector demanded elements turns some of the potential, but not all, unused binop operands back into undef already. llvm-svn: 333782
* [InstCombine] add tests for broken shuffle transform (PR37648)Sanjay Patel2018-06-011-0/+23
| | | | llvm-svn: 333779
* [InstCombine] allow more binop (shuffle X), C transformsSanjay Patel2018-05-161-6/+19
| | | | | | | | | The canonicalization was restricted to shuffle masks with a 1-to-1 mapping to the constant vector, but that disqualifies the common splat pattern. This is part of solving PR37463: https://bugs.llvm.org/show_bug.cgi?id=37463 llvm-svn: 332479
* [InstCombine] fix binop (shuffle X), C --> shuffle (binop X, C') to check usesSanjay Patel2018-05-151-4/+3
| | | | llvm-svn: 332407
* [InstCombine] add more tests for binop-shuffle; NFCSanjay Patel2018-05-151-5/+58
| | | | | | | The splat pattern is part of PR37463: https://bugs.llvm.org/show_bug.cgi?id=37463 llvm-svn: 332393
* [InstCombine] fix binop-of-shuffles to check usesSanjay Patel2018-05-151-4/+3
| | | | llvm-svn: 332375
* [InstCombine] add multi-use shuffle tests and regenerate checks; NFCSanjay Patel2018-05-151-141/+196
| | | | llvm-svn: 332373
* [InstCombine] regenerate checks; NFCSanjay Patel2016-11-091-159/+185
| | | | llvm-svn: 286402
* [InstCombine] fix propagation of fast-math-flagsSanjay Patel2015-11-241-6/+6
| | | | | | | Noticed while working on D4583: http://reviews.llvm.org/D4583 llvm-svn: 253997
* move a single test case to where most other instcombine shuffle bug test ↵Sanjay Patel2015-11-211-0/+15
| | | | | | cases exist llvm-svn: 253784
* Fix a typoDavid Majnemer2015-04-031-1/+1
| | | | | | CHECK-LABEL had the wrong function name. llvm-svn: 234051
* [InstCombine] Use DataLayout to determine vector element widthDavid Majnemer2015-04-031-0/+8
| | | | | | | | | InstCombine didn't realize that it needs to use DataLayout to determine how wide pointers are. This lead to assertion failures. This fixes PR23113. llvm-svn: 234046
* [opaque pointer type] Add textual IR support for explicit type parameter to ↵David Blaikie2015-02-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794
* InstCombine: Don't try to reorder shuffles where the mask is a ConstantExpr.Benjamin Kramer2014-06-241-0/+9
| | | | | | We can't analyze the individual values of a vector expression. PR20114. llvm-svn: 211581
* Fix the case when reordering shuffle and binop produces a constant.Serge Pavlov2014-05-141-0/+11
| | | | | | This resolves PR19737. llvm-svn: 208762
* Fix type of shuffle resulted from shuffle merge.Serge Pavlov2014-05-131-0/+8
| | | | | | This fix resolves PR19730. llvm-svn: 208666
* Fix type of shuffle obtained from reordering with binary operationSerge Pavlov2014-05-121-0/+11
| | | | | | | | In transformation: BinOp(shuffle(v1,undef), shuffle(v2,undef)) -> shuffle(BinOp(v1, v2),undef) type of the undef argument must be same as type of BinOp. llvm-svn: 208531
* Fix reordering of shuffles and binary operationsSerge Pavlov2014-05-121-0/+12
| | | | | | | | | | | | Do not apply transformation: BinOp(shuffle(v1), shuffle(v2)) -> shuffle(BinOp(v1, v2)) if operands v1 and v2 are of different size. This change fixes PR19717, which was caused by r208488. llvm-svn: 208518
* Reorder shuffle and binary operation.Serge Pavlov2014-05-111-1/+119
| | | | | | | | | | | | | This patch enables transformations: BinOp(shuffle(v1), shuffle(v2)) -> shuffle(BinOp(v1, v2)) BinOp(shuffle(v1), const1) -> shuffle(BinOp, const2) They allow to eliminate extra shuffles in some cases. Differential Revision: http://reviews.llvm.org/D3525 llvm-svn: 208488
OpenPOWER on IntegriCloud