summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [InstCombine] rearrange shuffle-of-binops logic; NFCSanjay Patel2018-06-221-17/+12
| | | | | | | | The commutative matcher makes things more complicated here, and I'm planning an enhancement where this form is more readable. llvm-svn: 335343
* [InstCombine] fix shuffle-of-binops bugSanjay Patel2018-06-211-2/+8
| | | | | | | | | With non-commutative binops, we could be using the same variable value as operand 0 in 1 binop and operand 1 in the other, so we have to check for that possibility and bail out. llvm-svn: 335312
* [InstCombine] fold vector select of binops with constant ops to 1 binop ↵Sanjay Patel2018-06-211-0/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (PR37806) This is the simplest case from PR37806: https://bugs.llvm.org/show_bug.cgi?id=37806 If we have a common variable operand used in a pair of binops with vector constants that are vector selected together, then we can constant shuffle the constant vectors to eliminate the shuffle instruction. This has some tricky parts that are hopefully addressed in the tests and their respective comments: 1. If the shuffle mask contains an undef element, then that lane of the result is undef: http://llvm.org/docs/LangRef.html#shufflevector-instruction Therefore, we can replace the constant in that lane with an undef value except for div/rem. With div/rem, an undef in the divisor would cause the whole op to be undef. So I'm using the same hack as in D47686 - replace the undefs with '1'. 2. Intersect the wrapping and FMF of the original binops for the new binop. There should be no extra poison or fast-math potential in the new binop that wasn't possible in the original code. 3. Disregard other uses. Given that we're eliminating uses (shortening the dependency chain), I think that's always the right IR canonicalization. But I purposely chose the udiv test to demonstrate the scenario where both intermediate values have other uses because that seems likely worse for codegen with an expensive math op. This seems like a very rare possibility to me, so I don't think it requires a backend patch first. Differential Revision: https://reviews.llvm.org/D48401 llvm-svn: 335283
* [InstCombine] Gracefully handle out of range extractelement indicesSimon Pilgrim2017-12-271-3/+5
| | | | | | | | InstSimplify is responsible for handling these, but we shouldn't just assert here. Reduced from oss-fuzz #4808 test case llvm-svn: 321489
* Reintroduce r320049, r320014 and r319894.Igor Laevsky2017-12-131-0/+4
| | | | | | OpenGL issues should be fixed by now. llvm-svn: 320568
* Revert r320049, r320014 and r319894Igor Laevsky2017-12-121-4/+0
| | | | | | | They were causing failures of the piglit OpenGL tests with AMD GPUs using the Mesa radeonsi driver. llvm-svn: 320466
* [InstCombine] Don't crash on out of bounds index in the insertelementIgor Laevsky2017-12-071-0/+4
| | | | | | Differential Revision: https://reviews.llvm.org/D40390 llvm-svn: 320049
* [InstCombine] use 'auto' with 'dyn_cast'; NFCSanjay Patel2017-11-271-3/+2
| | | | llvm-svn: 319067
* [Transforms] Fix some Clang-tidy modernize and Include What You Use ↵Eugene Zelenko2017-10-241-3/+26
| | | | | | warnings; other minor fixes (NFC). llvm-svn: 316503
* [InstCombine] fix formatting; NFCSanjay Patel2017-10-091-9/+7
| | | | llvm-svn: 315223
* [InstCombine] remove extract-of-select vector transform (2nd try)Sanjay Patel2017-09-251-33/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 1st attempt at this: https://reviews.llvm.org/rL314117 was reverted at: https://reviews.llvm.org/rL314118 because of bot fails for clang tests that were checking optimized IR. That should be fixed with: https://reviews.llvm.org/rL314144 ...so try again. Original commit message: The transform to convert an extract-of-a-select-of-vectors was added at: https://reviews.llvm.org/rL194013 And a question about the validity of this transform was raised in the review: https://reviews.llvm.org/D1539: ...but not answered AFAICT> Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with the original commit, but they are not regressing even after we remove the transform in this patch. The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do those transforms as canonicalizations. The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301: https://bugs.llvm.org/show_bug.cgi?id=33301 ...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see. Differential Revision: https://reviews.llvm.org/D38006 llvm-svn: 314147
* revert r314117 because there are bogus clang tests that depend on the optimizerSanjay Patel2017-09-251-0/+33
| | | | llvm-svn: 314118
* [InstCombine] remove extract-of-select vector transformSanjay Patel2017-09-251-33/+0
| | | | | | | | | | | | | | | | | | | | | | | The transform to convert an extract-of-a-select-of-vectors was added at: rL194013 And a question about the validity of this transform was raised in the review: https://reviews.llvm.org/D1539: ...but not answered AFAICT> Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with the original commit, but they are not regressing even after we remove the transform in this patch. The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do those transforms as canonicalizations. The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301: https://bugs.llvm.org/show_bug.cgi?id=33301 ...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see. Differential Revision: https://reviews.llvm.org/D38006 llvm-svn: 314117
* [InstCombine] improve demanded vector elements analysis of insertelementSanjay Patel2017-08-311-3/+1
| | | | | | | | | | | | | Recurse instead of returning on the first found optimization. Also, return early in the caller instead of continuing because that allows another round of simplification before we might potentially lose undef information from a shuffle mask by eliminating the shuffle. As noted in the review, we could probably do better and be more efficient by moving all of demanded elements into a separate pass, but this is yet another quick fix to instcombine. Differential Revision: https://reviews.llvm.org/D37236 llvm-svn: 312248
* [InstCombine] Fold insert sequence if first ins has multiple users.Florian Hahn2017-08-301-6/+18
| | | | | | | | | | | | | | | | | | | | | | | Summary: If the first insertelement instruction has multiple users and inserts at position 0, we can re-use this instruction when folding a chain of insertelement instructions. As we need to generate the first insertelement instruction anyways, this should be a strict improvement. We could get rid of the restriction of inserting at position 0 by creating a different shufflemask, but it is probably worth to keep the first insertelement instruction with position 0, as this is easier to do efficiently than at other positions I think. Reviewers: grosser, mkuper, fpetrogalli, efriedma Reviewed By: fpetrogalli Subscribers: gareevroman, llvm-commits Differential Revision: https://reviews.llvm.org/D37064 llvm-svn: 312110
* [InstCombine] Make InstCombine's IRBuilder be passed by reference everywhereCraig Topper2017-07-071-24/+24
| | | | | | | | Previously the InstCombiner class contained a pointer to an IR builder that had been passed to the constructor. Sometimes this would be passed to helper functions as either a pointer or the pointer would be dereferenced to be passed by reference. This patch makes it a reference everywhere including the InstCombiner class itself so there is more inconsistency. This a large, but mechanical patch. I've done very minimal formatting changes on it despite what clang-format wanted to do. llvm-svn: 307451
* [InstCombine] Pass a proper context instruction to all of the calls into ↵Craig Topper2017-06-091-3/+4
| | | | | | | | | | | | | | | | InstSimplify Summary: This matches the behavior we already had for compares and makes us consistent everywhere. Reviewers: dberlin, hfinkel, spatel Reviewed By: dberlin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33604 llvm-svn: 305049
* [InstCombine] Fix extractelement use before defSven van Haastregt2017-06-051-1/+1
| | | | | | | | | | | | This fixes a bug that can cause extractelements with operands that haven't been defined yet to be inserted at a wrong point when optimising insertelements. Patch by Karl Hylen. Differential Revision: https://reviews.llvm.org/D33449 llvm-svn: 304701
* InstCombine: Use the new SimplifyQuery versions of Simplify*. Use ↵Daniel Berlin2017-04-261-4/+4
| | | | | | AssumptionCache, DominatorTree, TargetLibraryInfo everywhere. llvm-svn: 301464
* InstCombine: Use the InstSimplify hook for shufflevectorZvi Rackover2017-04-041-5/+4
| | | | | | | | | | | | | | Summary: Start using the recently added InstSimplify hook for shuffles in the respective InstCombine visitor. Reviewers: spatel, RKSimon, craig.topper, majnemer Reviewed By: majnemer Subscribers: majnemer, llvm-commits Differential Revision: https://reviews.llvm.org/D31526 llvm-svn: 299412
* [InstCombine] canonicalize insertelement of scalar constant ahead of ↵Sanjay Patel2017-03-221-0/+33
| | | | | | | | | | | | | | insertelement of variable insertelement (insertelement X, Y, IdxC1), ScalarC, IdxC2 --> insertelement (insertelement X, ScalarC, IdxC2), Y, IdxC1 As noted in the code comment and seen in the test changes, the motivation is that by pulling constant insertion up, we may be able to constant fold some insertelement instructions. Differential Revision: https://reviews.llvm.org/D31196 llvm-svn: 298520
* InstCombine: fix extraction when performing vector/array punningEugene Leviant2017-02-171-1/+1
| | | | | | Differential revision: https://reviews.llvm.org/D29491 llvm-svn: 295429
* [InstCombine] Use getVectorNumElements instead of explicitly casting to ↵Craig Topper2016-12-291-8/+7
| | | | | | VectorType and calling getNumElements. NFC llvm-svn: 290707
* [InstCombine] Canonicalize insert splat sequences into an insert + shuffleMichael Kuperstein2016-12-281-0/+57
| | | | | | | | | | | This adds a combine that canonicalizes a chain of inserts which broadcasts a value into a single insert + a splat shufflevector. This fixes PR31286. Differential Revision: https://reviews.llvm.org/D27992 llvm-svn: 290641
* Revert @llvm.assume with operator bundles (r289755-r289757)Daniel Jasper2016-12-191-1/+1
| | | | | | | This creates non-linear behavior in the inliner (see more details in r289755's commit thread). llvm-svn: 290086
* Remove the AssumptionCacheHal Finkel2016-12-151-1/+1
| | | | | | | | | After r289755, the AssumptionCache is no longer needed. Variables affected by assumptions are now found by using the new operand-bundle-based scheme. This new scheme is more computationally efficient, and also we need much less code... llvm-svn: 289756
* [InstCombine] avoid infinite loop from shuffle-extract-insert sequence (PR30923)Sanjay Patel2016-11-101-0/+8
| | | | | | | | | | | | Removing the limitation in visitInsertElementInst() causes several regressions because we're not prepared to fold sequences of shuffles or inserts and extracts separated by shuffles. Fixing that appears to be a difficult mission because we are purposely trying to avoid creating shuffles with arbitrary shuffle masks because some targets may choke on those. https://llvm.org/bugs/show_bug.cgi?id=30923 llvm-svn: 286423
* [InstCombine] Fix for PR29124: reduce insertelements to shufflevectorAlexey Bataev2016-09-231-44/+89
| | | | | | | | | | | | | | | | | | | | | | | | | If inserting more than one constant into a vector: define <4 x float> @foo(<4 x float> %x) { %ins1 = insertelement <4 x float> %x, float 1.0, i32 1 %ins2 = insertelement <4 x float> %ins1, float 2.0, i32 2 ret <4 x float> %ins2 } InstCombine could reduce that to a shufflevector: define <4 x float> @goo(<4 x float> %x) { %shuf = shufflevector <4 x float> %x, <4 x float> <float undef, float 1.0, float 2.0, float undef>, <4 x i32><i32 0, i32 5, i32 6, i32 3> ret <4 x float> %shuf } Also, InstCombine tries to convert shuffle instruction to single insertelement, if one of the vectors is a constant vector and only a single element from this constant should be used in shuffle, i.e. shufflevector <4 x float> %v, <4 x float> <float undef, float 1.0, float undef, float undef>, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef> -> insertelement <4 x float> %v, float 1.0, 1 Differential Revision: https://reviews.llvm.org/D24182 llvm-svn: 282237
* [InsttCombine] fold insertelement of constant into shuffle with constant ↵Sanjay Patel2016-09-021-0/+76
| | | | | | | | | | | | | | | | | operand (PR29126) The motivating case occurs with SSE/AVX scalar intrinsics, so this is a first step towards shrinking that to a single shufflevector. Note that the transform is intentionally limited to shuffles that are equivalent to vector selects to avoid creating arbitrary shuffle masks that may not lower well. This should solve PR29126: https://llvm.org/bugs/show_bug.cgi?id=29126 Differential Revision: https://reviews.llvm.org/D23886 llvm-svn: 280504
* InstCombine: Replace some never-null pointers with references. NFCJustin Bogner2016-08-051-1/+1
| | | | llvm-svn: 277792
* [InstCombine] scalarizePHI should not assume the code it sees has been CSE'dMichael Kuperstein2016-06-061-12/+26
| | | | | | | | | | | | | | scalarizePHI only looked for phis that have exactly two uses - the "latch" use, and an extract. Unfortunately, we can not assume all equivalent extracts are CSE'd, since InstCombine itself may create an extract which is a duplicate of an existing one. This extends it to handle several distinct extracts from the same index. This should fix at least some of the performance regressions from PR27988. Differential Revision: http://reviews.llvm.org/D20983 llvm-svn: 271961
* Fix an issue where fast math flags were dropped during scalarization.Owen Anderson2016-03-011-2/+4
| | | | | | | Most portions of InstCombine properly propagate fast math flags, but apparently the vector scalarization section was overlooked. llvm-svn: 262376
* function names start with a lowercase letter; NFCSanjay Patel2016-02-011-21/+21
| | | | llvm-svn: 259425
* [InstCombine] avoid an insertelement transformation that induces the ↵Sanjay Patel2016-01-291-1/+17
| | | | | | | | | | | opposite extractelement fold (PR26354) We would infinite loop because we created a shufflevector that was wider than needed and then failed to combine that with the insertelement. When subsequently visiting the extractelement from that shuffle, we see that it's unnecessary, delete it, and trigger another visit to the insertelement. llvm-svn: 259236
* [InstCombine] insert a new shuffle in a safe place (PR25999)Sanjay Patel2016-01-081-10/+7
| | | | | | | | Limit this transform to a basic block and guard against PHIs. Hopefully, this fixes the remaining failures in PR25999: https://llvm.org/bugs/show_bug.cgi?id=25999 llvm-svn: 257133
* [InstCombine] insert a new shuffle before its uses (PR26015)Sanjay Patel2016-01-051-8/+21
| | | | | | | | | | | | | | | | Although this solves the test case in PR26015: https://llvm.org/bugs/show_bug.cgi?id=26015 And may solve PR25999: https://llvm.org/bugs/show_bug.cgi?id=25999 ...I suspect this is not the best solution. I think we want to insert the new shuffle just ahead of the earliest ExtractElementInst that we're replacing, but I don't know how that should be implemented. Differential Revision: http://reviews.llvm.org/D15878 llvm-svn: 256857
* [InstCombine] transform more extract/insert pairs into shuffles (PR2109)Sanjay Patel2015-12-241-3/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is an extension of the shuffle combining from r203229: http://reviews.llvm.org/rL203229 The idea is to widen a short input vector with undef elements so the existing shuffle transform for extract/insert can kick in. The motivation is to finally solve PR2109: https://llvm.org/bugs/show_bug.cgi?id=2109 For that example, the IR becomes: %1 = bitcast <2 x i32>* %P to <2 x float>* %ld1 = load <2 x float>, <2 x float>* %1, align 8 %2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef> %i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5> ret <4 x float> %i2 And x86 SSE output improves from: movq (%rdi), %xmm1 ## xmm1 = mem[0],zero movdqa %xmm1, %xmm2 shufps $229, %xmm2, %xmm2 ## xmm2 = xmm2[1,1,2,3] shufps $48, %xmm0, %xmm1 ## xmm1 = xmm1[0,0],xmm0[3,0] shufps $132, %xmm1, %xmm0 ## xmm0 = xmm0[0,1],xmm1[0,2] shufps $32, %xmm0, %xmm2 ## xmm2 = xmm2[0,0],xmm0[2,0] shufps $36, %xmm2, %xmm0 ## xmm0 = xmm0[0,1],xmm2[2,0] retq To the almost optimal: movhpd (%rdi), %xmm0 Note: There's a tension in the existing transform related to generating arbitrary shufflevector masks. We avoid that in other places in InstCombine because we're scared that codegen can't handle strange masks, but it looks like we're ok with producing those here. I purposely chose weird insert/extract indexes for the regression tests to see the effect in these cases. For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or better for these examples. Differential Revision: http://reviews.llvm.org/D15096 llvm-svn: 256394
* fix typos in comments; NFCSanjay Patel2015-11-291-6/+8
| | | | llvm-svn: 254266
* function names start with a lower case letter; NFCSanjay Patel2015-11-171-20/+20
| | | | llvm-svn: 253348
* use range-based for loop; NFCISanjay Patel2015-11-161-2/+2
| | | | llvm-svn: 253256
* InstCombine: Remove ilist iterator implicit conversions, NFCDuncan P. N. Exon Smith2015-10-131-2/+1
| | | | | | | Stop relying on implicit conversions of ilist iterators in LLVMInstCombine. No functionality change intended. llvm-svn: 250183
* don't repeat function names in comments; NFCSanjay Patel2015-09-091-6/+5
| | | | llvm-svn: 247154
* [InstSimplify] Teach InstSimplify how to simplify extractelementDavid Majnemer2015-07-131-58/+9
| | | | llvm-svn: 242008
* [InstCombine] Use DataLayout to determine vector element widthDavid Majnemer2015-04-031-3/+2
| | | | | | | | | InstCombine didn't realize that it needs to use DataLayout to determine how wide pointers are. This lead to assertion failures. This fixes PR23113. llvm-svn: 234046
* [opaque pointer type] more gep API migrationsDavid Blaikie2015-03-141-1/+2
| | | | | | | | | | Adding nullptr to all the IRBuilder stuff because it's the first thing that fails to build when testing without the back-compat functions, so I'll keep having to re-add these locally for each chunk of migration I do. Might as well check them in to save me the churn. Eventually I'll have to migrate these too, but I'm going breadth-first. llvm-svn: 232270
* DataLayout is mandatory, update the API to reflect it with references.Mehdi Amini2015-03-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Now that the DataLayout is a mandatory part of the module, let's start cleaning the codebase. This patch is a first attempt at doing that. This patch is not exactly NFC as for instance some places were passing a nullptr instead of the DataLayout, possibly just because there was a default value on the DataLayout argument to many functions in the API. Even though it is not purely NFC, there is no change in the validation. I turned as many pointer to DataLayout to references, this helped figuring out all the places where a nullptr could come up. I had initially a local version of this patch broken into over 30 independant, commits but some later commit were cleaning the API and touching part of the code modified in the previous commits, so it seemed cleaner without the intermediate state. Test Plan: Reviewers: echristo Subscribers: llvm-commits From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 231740
* InstCombine: extract instead of shuffle when performing vector/array type ↵JF Bastien2015-02-251-5/+116
| | | | | | | | | | | | | punning Summary: SROA generates code that isn't quite as easy to optimize and contains unusual-sized shuffles, but that code is generally correct. As discussed in D7487 the right place to clean things up is InstCombine, which will pick up the type-punning pattern and transform it into a more obvious bitcast+extractelement, while leaving the other patterns SROA encounters as-is. Test Plan: make check Reviewers: jvoung, chandlerc Subscribers: llvm-commits llvm-svn: 230560
* [PM] Rename InstCombine.h to InstCombineInternal.h in preparation forChandler Carruth2015-01-221-1/+1
| | | | | | | | | | | | | | | | creating a non-internal header file for the InstCombine pass. I thought about calling this InstCombiner.h or in some way more clearly associating it with the InstCombiner clas that it is primarily defining, but there are several other utility interfaces defined within this for InstCombine. If, in the course of refactoring, those end up moving elsewhere or going away, it might make more sense to make this the combiner's header alone. Naturally, this is a bikeshed to a certain degree, so feel free to lobby for a different shade of paint if this name just doesn't suit you. llvm-svn: 226783
* fixed some typosSanjay Patel2014-07-071-4/+4
| | | | llvm-svn: 212495
* Fix type of shuffle resulted from shuffle merge.Serge Pavlov2014-05-131-6/+4
| | | | | | This fix resolves PR19730. llvm-svn: 208666
OpenPOWER on IntegriCloud