path: root/llvm/test/Transforms/InstCombine/vector-casts.ll
Commit history (newest first); each entry shows [author, date, files changed, lines -removed/+added]:
* Revert "Temporarily Revert "Add basic loop fusion pass."" [Eric Christopher, 2019-04-17, 1 file, -0/+413]
  The reversion apparently deleted the test/Transforms directory. Will be
  re-reverting again.
  llvm-svn: 358552
* Temporarily Revert "Add basic loop fusion pass." [Eric Christopher, 2019-04-17, 1 file, -413/+0]
  As it's causing some bot failures (and per request from kbarton).
  This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.
  llvm-svn: 358546
* [InstCombine] don't widen an arbitrary sequence of vector ops (PR40032) [Sanjay Patel, 2018-12-17, 1 file, -14/+16]
  The problem is shown specifically for a case with vector multiply here:
  https://bugs.llvm.org/show_bug.cgi?id=40032
  ...and this might mask the original backend bug for ARM shown in:
  https://bugs.llvm.org/show_bug.cgi?id=39967
  As the test diffs here show, we weren't (and probably still aren't) doing
  these kinds of transforms in a principled way. In some cases we end up
  producing as many or more wide instructions than we started with, so we
  still need to restrict/correct other transforms from overstepping. If
  there are perf regressions from this change, we can either carve out
  exceptions to the general IR rules, or improve the backend to do these
  transforms when we know the transform is profitable. That's probably
  similar to a change like D55448.
  Differential Revision: https://reviews.llvm.org/D55744
  llvm-svn: 349389
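  A minimal hypothetical sketch (not the exact PR40032 IR) of why widening
  can lose: the narrow form needs one cheap multiply, while the widened
  form keeps the arithmetic at the wider type.

    define <4 x i16> @narrow_mul(<4 x i32> %a, <4 x i32> %b) {
      %ta = trunc <4 x i32> %a to <4 x i16>
      %tb = trunc <4 x i32> %b to <4 x i16>
      %m = mul <4 x i16> %ta, %tb        ; one narrow multiply
      ret <4 x i16> %m
    }

  An over-eager widening transform would instead compute
  'mul <4 x i32> %a, %b' and truncate the result, which is the same value
  (only the low 16 bits of each lane matter) but uses wider instructions.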
* [InstCombine] add tests for vector widening transforms (PR40032); NFC [Sanjay Patel, 2018-12-16, 1 file, -0/+32]
  llvm-svn: 349306
* [InstCombine] reverse 'trunc X to <N x i1>' canonicalization; 2nd try [Sanjay Patel, 2018-10-10, 1 file, -12/+7]
  Re-trying r344082 because it unintentionally included extra diffs.
  Original commit message:

    icmp ne (and X, 1), 0 --> trunc X to N x i1

  Ideally, we'd do the same for scalars, but there will likely be
  regressions unless we add more trunc folds as we're doing here for
  vectors.
  The motivating vector case is from PR37549:
  https://bugs.llvm.org/show_bug.cgi?id=37549

    define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {
      %c = fcmp ole <4 x float> %x, %y
      %s = sext <4 x i1> %c to <4 x i32>
      %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
      %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
      %cond = or <4 x i32> %s1, %s2
      %condtr = trunc <4 x i32> %cond to <4 x i1>
      %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
      ret <4 x float> %r
    }

  Here's a sampling of the vector codegen for that case using mask+icmp
  (current behavior) vs. trunc (with this patch):

    AVX before:
      vcmpleps %xmm1, %xmm0, %xmm0
      vpermilps $80, %xmm0, %xmm1   ## xmm1 = xmm0[0,0,1,1]
      vpermilps $250, %xmm0, %xmm0  ## xmm0 = xmm0[2,2,3,3]
      vorps %xmm0, %xmm1, %xmm0
      vandps LCPI0_0(%rip), %xmm0, %xmm0
      vxorps %xmm1, %xmm1, %xmm1
      vpcmpeqd %xmm1, %xmm0, %xmm0
      vblendvps %xmm0, %xmm3, %xmm2, %xmm0
    AVX after:
      vcmpleps %xmm1, %xmm0, %xmm0
      vpermilps $80, %xmm0, %xmm1   ## xmm1 = xmm0[0,0,1,1]
      vpermilps $250, %xmm0, %xmm0  ## xmm0 = xmm0[2,2,3,3]
      vorps %xmm0, %xmm1, %xmm0
      vblendvps %xmm0, %xmm2, %xmm3, %xmm0
    AVX512f before:
      vcmpleps %xmm1, %xmm0, %xmm0
      vpermilps $80, %xmm0, %xmm1   ## xmm1 = xmm0[0,0,1,1]
      vpermilps $250, %xmm0, %xmm0  ## xmm0 = xmm0[2,2,3,3]
      vorps %xmm0, %xmm1, %xmm0
      vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
      vptestnmd %zmm1, %zmm0, %k1
      vblendmps %zmm3, %zmm2, %zmm0 {%k1}
    AVX512f after:
      vcmpleps %xmm1, %xmm0, %xmm0
      vpermilps $80, %xmm0, %xmm1   ## xmm1 = xmm0[0,0,1,1]
      vpermilps $250, %xmm0, %xmm0  ## xmm0 = xmm0[2,2,3,3]
      vorps %xmm0, %xmm1, %xmm0
      vpslld $31, %xmm0, %xmm0
      vptestmd %zmm0, %zmm0, %k1
      vblendmps %zmm2, %zmm3, %zmm0 {%k1}
    AArch64 before:
      fcmge v0.4s, v1.4s, v0.4s
      zip1 v1.4s, v0.4s, v0.4s
      zip2 v0.4s, v0.4s, v0.4s
      orr v0.16b, v1.16b, v0.16b
      movi v1.4s, #1
      and v0.16b, v0.16b, v1.16b
      cmeq v0.4s, v0.4s, #0
      bsl v0.16b, v3.16b, v2.16b
    AArch64 after:
      fcmge v0.4s, v1.4s, v0.4s
      zip1 v1.4s, v0.4s, v0.4s
      zip2 v0.4s, v0.4s, v0.4s
      orr v0.16b, v1.16b, v0.16b
      bsl v0.16b, v2.16b, v3.16b
    PowerPC-le before:
      xvcmpgesp 34, 35, 34
      vspltisw 0, 1
      vmrglw 3, 2, 2
      vmrghw 2, 2, 2
      xxlor 0, 35, 34
      xxlxor 35, 35, 35
      xxland 34, 0, 32
      vcmpequw 2, 2, 3
      xxsel 34, 36, 37, 34
    PowerPC-le after:
      xvcmpgesp 34, 35, 34
      vmrglw 3, 2, 2
      vmrghw 2, 2, 2
      xxlor 0, 35, 34
      xxsel 34, 37, 36, 0

  Differential Revision: https://reviews.llvm.org/D52747
  llvm-svn: 344181
* revert r344082: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization [Sanjay Patel, 2018-10-10, 1 file, -7/+12]
  This commit accidentally included the diffs from D53057.
  llvm-svn: 344178
* [InstCombine] reverse 'trunc X to <N x i1>' canonicalization [Sanjay Patel, 2018-10-09, 1 file, -12/+7]

    icmp ne (and X, 1), 0 --> trunc X to N x i1

  Ideally, we'd do the same for scalars, but there will likely be
  regressions unless we add more trunc folds as we're doing here for
  vectors.
  The motivating vector case is from PR37549:
  https://bugs.llvm.org/show_bug.cgi?id=37549

    define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {
      %c = fcmp ole <4 x float> %x, %y
      %s = sext <4 x i1> %c to <4 x i32>
      %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
      %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
      %cond = or <4 x i32> %s1, %s2
      %condtr = trunc <4 x i32> %cond to <4 x i1>
      %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
      ret <4 x float> %r
    }

  Here's a sampling of the vector codegen for that case using mask+icmp
  (current behavior) vs. trunc (with this patch):

    AVX before:
      vcmpleps %xmm1, %xmm0, %xmm0
      vpermilps $80, %xmm0, %xmm1   ## xmm1 = xmm0[0,0,1,1]
      vpermilps $250, %xmm0, %xmm0  ## xmm0 = xmm0[2,2,3,3]
      vorps %xmm0, %xmm1, %xmm0
      vandps LCPI0_0(%rip), %xmm0, %xmm0
      vxorps %xmm1, %xmm1, %xmm1
      vpcmpeqd %xmm1, %xmm0, %xmm0
      vblendvps %xmm0, %xmm3, %xmm2, %xmm0
    AVX after:
      vcmpleps %xmm1, %xmm0, %xmm0
      vpermilps $80, %xmm0, %xmm1   ## xmm1 = xmm0[0,0,1,1]
      vpermilps $250, %xmm0, %xmm0  ## xmm0 = xmm0[2,2,3,3]
      vorps %xmm0, %xmm1, %xmm0
      vblendvps %xmm0, %xmm2, %xmm3, %xmm0
    AVX512f before:
      vcmpleps %xmm1, %xmm0, %xmm0
      vpermilps $80, %xmm0, %xmm1   ## xmm1 = xmm0[0,0,1,1]
      vpermilps $250, %xmm0, %xmm0  ## xmm0 = xmm0[2,2,3,3]
      vorps %xmm0, %xmm1, %xmm0
      vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
      vptestnmd %zmm1, %zmm0, %k1
      vblendmps %zmm3, %zmm2, %zmm0 {%k1}
    AVX512f after:
      vcmpleps %xmm1, %xmm0, %xmm0
      vpermilps $80, %xmm0, %xmm1   ## xmm1 = xmm0[0,0,1,1]
      vpermilps $250, %xmm0, %xmm0  ## xmm0 = xmm0[2,2,3,3]
      vorps %xmm0, %xmm1, %xmm0
      vpslld $31, %xmm0, %xmm0
      vptestmd %zmm0, %zmm0, %k1
      vblendmps %zmm2, %zmm3, %zmm0 {%k1}
    AArch64 before:
      fcmge v0.4s, v1.4s, v0.4s
      zip1 v1.4s, v0.4s, v0.4s
      zip2 v0.4s, v0.4s, v0.4s
      orr v0.16b, v1.16b, v0.16b
      movi v1.4s, #1
      and v0.16b, v0.16b, v1.16b
      cmeq v0.4s, v0.4s, #0
      bsl v0.16b, v3.16b, v2.16b
    AArch64 after:
      fcmge v0.4s, v1.4s, v0.4s
      zip1 v1.4s, v0.4s, v0.4s
      zip2 v0.4s, v0.4s, v0.4s
      orr v0.16b, v1.16b, v0.16b
      bsl v0.16b, v2.16b, v3.16b
    PowerPC-le before:
      xvcmpgesp 34, 35, 34
      vspltisw 0, 1
      vmrglw 3, 2, 2
      vmrghw 2, 2, 2
      xxlor 0, 35, 34
      xxlxor 35, 35, 35
      xxland 34, 0, 32
      vcmpequw 2, 2, 3
      xxsel 34, 36, 37, 34
    PowerPC-le after:
      xvcmpgesp 34, 35, 34
      vmrglw 3, 2, 2
      vmrghw 2, 2, 2
      xxlor 0, 35, 34
      xxsel 34, 37, 36, 0

  Differential Revision: https://reviews.llvm.org/D52747
  llvm-svn: 344082
* [InstCombine] add tests with undef elements; NFC [Sanjay Patel, 2018-10-02, 1 file, -0/+30]
  See discussion in D52747.
  llvm-svn: 343595
* [InstCombine] add inverse test for vector trunc canonical form; NFC [Sanjay Patel, 2018-10-01, 1 file, -2/+14]
  llvm-svn: 343529
* [InstCombine] regenerate test checks; NFC [Sanjay Patel, 2018-10-01, 1 file, -26/+26]
  These files used an old version of the script. We regex more now.
  llvm-svn: 343527
* [InstCombine] Remove check for sext of vector icmp from shouldOptimizeCast [Craig Topper, 2017-08-22, 1 file, -3/+2]
  Looks like for 'and' and 'or' we end up performing at least some of the
  transformations this is blocking in a roundabout way anyway. For
  'and (sext cmp1), (sext cmp2)' we end up later turning it into
  'select cmp1, sext(cmp2), 0'. Then we optimize that back to
  'sext (and cmp1, cmp2)'. This is the same result we would have gotten if
  shouldOptimizeCast hadn't blocked it. We do something analogous for 'or'.
  With this patch we allow that transformation to happen directly in
  foldCastedBitwiseLogic. And we now support the same thing for 'xor'. This
  is definitely opening up many other cases, but since we already went
  around it for some cases hopefully it's ok.
  Differential Revision: https://reviews.llvm.org/D36213
  llvm-svn: 311508
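  A minimal sketch of the end result described above (function and value
  names are illustrative, not from the commit):

    define <4 x i32> @and_of_sexts(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
      %cmp1 = icmp sgt <4 x i32> %a, %b
      %cmp2 = icmp sgt <4 x i32> %a, %c
      %s1 = sext <4 x i1> %cmp1 to <4 x i32>
      %s2 = sext <4 x i1> %cmp2 to <4 x i32>
      %and = and <4 x i32> %s1, %s2
      ; expected to become:
      ;   %both = and <4 x i1> %cmp1, %cmp2
      ;   %and  = sext <4 x i1> %both to <4 x i32>
      ret <4 x i32> %and
    }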
* [InstCombine] Teach the code that pulls logical operators through constant shifts to handle vector splats too [Craig Topper, 2017-08-05, 1 file, -2/+2]
  llvm-svn: 310185
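  A sketch of the flavor of fold involved, now also firing on splat vector
  constants (hypothetical values; this works because shl distributes over
  xor, so the constant can be pre-shifted):

    define <2 x i8> @shift_of_xor(<2 x i8> %x) {
      %op = xor <2 x i8> %x, <i8 3, i8 3>
      %sh = shl <2 x i8> %op, <i8 2, i8 2>
      ; expected to become:
      ;   %sh0 = shl <2 x i8> %x, <i8 2, i8 2>
      ;   %sh  = xor <2 x i8> %sh0, <i8 12, i8 12>  ; 3 << 2 == 12
      ret <2 x i8> %sh
    }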
* [InstCombine] Remove the (not (sext)) case from foldBoolSextMaskToSelect and inline the remaining code to match visitOr [Craig Topper, 2017-08-04, 1 file, -1/+1]
  Summary: The (not (sext)) case is really (xor (sext), -1), which should
  have been simplified to (sext (xor X, 1)) before we got here, so we
  shouldn't need to handle it. With that taken care of we only need two
  cases, so we don't need the swap anymore. This makes us in sync with the
  equivalent code in visitOr, so inline this to match.
  Reviewers: spatel, eli.friedman, majnemer
  Reviewed By: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D36240
  llvm-svn: 310063
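  For i1 inputs the two forms are equivalent because sext produces an
  all-ones or all-zeros lane; a small illustrative example (names are
  hypothetical):

    define <2 x i32> @not_of_sext(<2 x i1> %x) {
      %s = sext <2 x i1> %x to <2 x i32>
      %n = xor <2 x i32> %s, <i32 -1, i32 -1>  ; (not (sext %x))
      ; equivalent canonical form:
      ;   %nx = xor <2 x i1> %x, <i1 true, i1 true>
      ;   %n  = sext <2 x i1> %nx to <2 x i32>
      ret <2 x i32> %n
    }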
* [InstCombine] Add test cases for 'or' and 'xor' to match the vector 'and' of 'sext' of 'cmp' test [Craig Topper, 2017-08-02, 1 file, -0/+37]
  When the 'and' test was originally added, it was intended to make sure we
  didn't change it to a sext of and of cmp. But since then the test was
  changed to expect it to be turned into 'select cmp1, sext cmp2, 0'. Then
  another optimization was added to turn the select into
  'sext (and cmp1, cmp2)', which is exactly the transformation that was
  being blocked when the test case started. Looks like 'or' gets optimized
  in a similar way, but not 'xor'.
  llvm-svn: 309793
* [InstCombine] allow ashr/lshr demanded bits folds with splat constants [Sanjay Patel, 2017-04-20, 1 file, -3/+3]
  llvm-svn: 300888
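  One flavor of demanded-bits fold this enables for splat shift amounts (a
  hypothetical example, not taken from the test file): after a lshr of 4 on
  i8 lanes only the low 4 bits can be set, so a mask of 15 is redundant.

    define <2 x i8> @redundant_mask(<2 x i8> %x) {
      %sh = lshr <2 x i8> %x, <i8 4, i8 4>
      %r  = and <2 x i8> %sh, <i8 15, i8 15>   ; folds away; %r == %sh
      ret <2 x i8> %r
    }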
* [InstCombine] shrink truncated insertelement into undef vector [Sanjay Patel, 2017-03-07, 1 file, -0/+32]
  This is the 2nd part of solving:
  http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html
  D30123 moves the trunc ahead of the shuffle, and this moves the trunc
  ahead of the insertelement. We're limiting this transform to undef rather
  than any constant to avoid backend problems.
  Differential Revision: https://reviews.llvm.org/D30137
  llvm-svn: 297242
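  A minimal sketch of the transform (illustrative names):

    define <4 x i16> @shrink_insert(i32 %x) {
      %ins = insertelement <4 x i32> undef, i32 %x, i32 0
      %tr  = trunc <4 x i32> %ins to <4 x i16>
      ; expected to become:
      ;   %xs = trunc i32 %x to i16
      ;   %tr = insertelement <4 x i16> undef, i16 %xs, i32 0
      ret <4 x i16> %tr
    }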
* [InstCombine] add tests for trunc(insertelement); NFC [Sanjay Patel, 2017-02-18, 1 file, -0/+56]
  llvm-svn: 295553
* [InstCombine] use m_APInt to allow ashr folds for vectors with splat constants [Sanjay Patel, 2017-01-15, 1 file, -15/+3]
  llvm-svn: 292064
* clean up tests and auto-generate checks [Sanjay Patel, 2016-09-30, 1 file, -56/+124]
  llvm-svn: 282896
* [InstCombine] try to fold (select C, (sext A), B) into logical ops [Nicolai Haehnle, 2016-08-05, 1 file, -3/+2]
  Summary: Turn (select C, (sext A), B) into (sext (select C, A, B')) when
  A is i1 and B is a compatible constant, also for zext instead of sext.
  This will then be further folded into logical operations. The
  transformation would be valid for non-i1 types as well, but other parts
  of InstCombine prefer to have sext from non-i1 as an operand of select.
  Motivated by the shader compiler frontend in Mesa for AMDGPU, which emits
  i32 for boolean operations. With this change, the boolean logic is fully
  recovered.
  Reviewers: majnemer, spatel, tstellarAMD
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D22747
  llvm-svn: 277801
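  A minimal sketch with B = 0, a constant compatible with sext (names are
  illustrative, not from the patch):

    define i32 @select_of_sext(i1 %c, i1 %a) {
      %s = sext i1 %a to i32
      %r = select i1 %c, i32 %s, i32 0
      ; expected to become:
      ;   %a2 = select i1 %c, i1 %a, i1 false   ; i.e. and i1 %c, %a
      ;   %r  = sext i1 %a2 to i32
      ret i32 %r
    }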
* Fix a crash where a utility function wasn't aware of fcmp vectors and created a value with the wrong type. Fixes PR24458! [Nick Lewycky, 2015-08-14, 1 file, -0/+11]
  llvm-svn: 245119
* [opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction [David Blaikie, 2015-02-27, 1 file, -1/+1]
  One of several parallel first steps to remove the target type of
  pointers, replacing them with a single opaque pointer type. This adds an
  explicit type parameter to the gep instruction so that when the first
  parameter becomes an opaque pointer type, the type to gep through is
  still available to the instructions.
  * This doesn't modify gep operators, only instructions (operators will be
    handled separately).
  * Textual IR changes only. Bitcode (including upgrade) and changing the
    in-memory representation will be in separate changes.
  * geps of vectors are transformed as:
      getelementptr <4 x float*> %x, ...
      -> getelementptr float, <4 x float*> %x, ...
    Then, once the opaque pointer type is introduced, this will ultimately
    look like:
      getelementptr float, <4 x ptr> %x
    with the unambiguous interpretation that it is a vector of pointers to
    float.
  * address spaces remain on the pointer, not the type:
      getelementptr float addrspace(1)* %x
      -> getelementptr float, float addrspace(1)* %x
    Then, eventually:
      getelementptr float, ptr addrspace(1) %x
  Importantly, the massive amount of test case churn has been automated by
  the same crappy python code. I had to manually update a few test cases
  that wouldn't fit the script's model (r228970, r229196, r229197,
  r229198). The python script just massages stdin and writes the result to
  stdout; I then wrapped that in a shell script to handle replacing files,
  then used the usual find+xargs to migrate all the files.

  update.py:

    import fileinput
    import sys
    import re

    ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
    normrep = re.compile(r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")

    def conv(match, line):
      if not match:
        return line
      line = match.groups()[0]
      if len(match.groups()[5]) == 0:
        line += match.groups()[2]
      line += match.groups()[3]
      line += ", "
      line += match.groups()[1]
      line += "\n"
      return line

    for line in sys.stdin:
      if line.find("getelementptr ") == line.find("getelementptr inbounds"):
        if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
          line = conv(re.match(ibrep, line), line)
      elif line.find("getelementptr ") != line.find("getelementptr ("):
        line = conv(re.match(normrep, line), line)
      sys.stdout.write(line)

  apply.sh:

    for name in "$@"
    do
      python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
      rm -f "$name.tmp"
    done

  The actual commands:
  From llvm/src:
    find test/ -name *.ll | xargs ./apply.sh
  From llvm/src/tools/clang:
    find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
  From llvm/src/tools/polly:
    find test/ -name *.ll | xargs ./apply.sh

  After that, check-all (with llvm, clang, clang-tools-extra, lld,
  compiler-rt, and polly all checked out). The extra 'rm' in the apply.sh
  script is due to a few files in clang's test suite using interesting
  unicode stuff that my python script was throwing exceptions on. None of
  those files needed to be migrated, so it seemed sufficient to ignore
  those cases.
  Reviewers: rafael, dexonsmith, grosser
  Differential Revision: http://reviews.llvm.org/D7636
  llvm-svn: 230786
* Update Transforms tests to use CHECK-LABEL for easier debugging. No functionality change. [Stephen Lin, 2013-07-14, 1 file, -6/+6]
  This update was done with the following bash script:

    find test/Transforms -name "*.ll" | \
    while read NAME; do
      echo "$NAME"
      if ! grep -q "^; *RUN: *llc" $NAME; then
        TEMP=`mktemp -t temp`
        cp $NAME $TEMP
        sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
        while read FUNC; do
          sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3@$FUNC(/g" $TEMP
        done
        mv $TEMP $NAME
      fi
    done

  llvm-svn: 186268
* InstCombine: canonicalize sext-and --> select, sext-not-and --> select [Nadav Rotem, 2013-01-30, 1 file, -1/+2]
  Patch by Muhammad Tauqir Ahmad.
  llvm-svn: 173901
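  A sketch of the canonicalization (hypothetical vector example): each lane
  of the sext is all-ones or all-zeros, so the 'and' acts as a per-lane
  select.

    define <2 x i32> @sext_and(<2 x i1> %c, <2 x i32> %x) {
      %s = sext <2 x i1> %c to <2 x i32>
      %r = and <2 x i32> %s, %x
      ; canonical form:
      ;   %r = select <2 x i1> %c, <2 x i32> %x, <2 x i32> zeroinitializer
      ret <2 x i32> %r
    }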
* Stop emitting instructions with the name "tmp"; they eat up memory and have to be uniqued, without any benefit [Benjamin Kramer, 2011-09-27, 1 file, -1/+1]
  If someone prefers %tmp42 to %42, run instnamer.
  llvm-svn: 140634
* Teach PatternMatch that splat vectors could be floating point as well as integer. Fixes PR9228! [Nick Lewycky, 2011-02-15, 1 file, -0/+28]
  llvm-svn: 125613
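  This enables vector folds that need to recognize a floating-point splat
  constant; an illustrative (not PR9228-specific) example:

    define <4 x float> @fmul_by_one(<4 x float> %x) {
      %r = fmul <4 x float> %x, <float 1.0, float 1.0, float 1.0, float 1.0>
      ; folds to plain %x: multiplying by a splat of 1.0 is an identity
      ret <4 x float> %r
    }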
* Rename ValueRequiresCast to ShouldOptimizeCast, to better reflect what it does [Chris Lattner, 2010-02-11, 1 file, -0/+16]
  Enhance it to return false for vector sign extensions from vector
  comparisons (i.e., decline to optimize them), which is the idiom used to
  get a splatted vector for a vector comparison. Doing this breaks
  vector-casts.ll; add some compensating transformations to handle the
  important case they cover without depending on this canonicalization.
  This fixes rdar://7434900, a serious pessimization of vector compares.
  llvm-svn: 95855
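  The idiom being protected, sketched with illustrative names: the sext
  turns a vector compare into a per-lane all-ones/all-zeros mask.

    define <4 x i32> @cmp_mask(<4 x i32> %a, <4 x i32> %b) {
      %cmp  = icmp slt <4 x i32> %a, %b
      %mask = sext <4 x i1> %cmp to <4 x i32>   ; -1 where %a < %b, else 0
      ret <4 x i32> %mask
    }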
* Change tests from "opt %s" to "opt < %s" so that opt doesn't see the input filename [Dan Gohman, 2009-09-11, 1 file, -1/+1]
  That way opt doesn't print the input filename in the output, so grep
  lines in the tests don't unintentionally match strings in the input
  filename.
  llvm-svn: 81537
* Use opt -S instead of piping bitcode output through llvm-dis [Dan Gohman, 2009-09-08, 1 file, -1/+1]
  llvm-svn: 81257
* Change these tests to feed the assembly files to opt directly, instead of using llvm-as, now that opt supports this [Dan Gohman, 2009-09-08, 1 file, -1/+1]
  llvm-svn: 81226
* merge vector-casts-0.ll into vector-casts.ll [Chris Lattner, 2009-07-23, 1 file, -0/+56]
  llvm-svn: 76864
* Make some existing optimizations that would only trigger on scalars also apply to vectors [Chris Lattner, 2009-07-23, 1 file, -2/+31]
  This allows us to compile this:

    #include <emmintrin.h>
    __m128i a(__m128 a, __m128 b) { return a==a & b==b; }
    __m128i b(__m128 a, __m128 b) { return a!=a | b!=b; }

  to:

    _a:
      cmpordps %xmm1, %xmm0
      ret
    _b:
      cmpunordps %xmm1, %xmm0
      ret

  with clang instead of to a ton of horrible code.
  llvm-svn: 76863
* convert a test to FileCheck format [Chris Lattner, 2009-07-23, 1 file, -5/+12]
  This fixes an endemic problem with negative tests: this test wasn't
  checking what it thought it was, because it was grepping .bc, not .ll.
  llvm-svn: 76861
* rename test [Chris Lattner, 2009-07-23, 1 file, -0/+15]
  llvm-svn: 76860
* Generalize a few more instcombines to be vector/scalar-independent [Dan Gohman, 2009-06-16, 1 file, -55/+0]
  llvm-svn: 73541
* Support vector casts in more places, fixing a variety of assertion failures [Dan Gohman, 2009-06-15, 1 file, -0/+55]
  To support this, add some utility functions to Type to help support
  vector/scalar-independent code. Change ConstantInt::get and
  ConstantFP::get to support vector types, and add an overload to
  ConstantInt::get that uses a static IntegerType type, for convenience.
  Introduce a new getConstant method for ScalarEvolution, to simplify
  common use cases.
  llvm-svn: 73431