summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms/Vectorize
Commit message (Collapse)AuthorAgeFilesLines
...
* [NFC][LV][LoopUtil] Move LoopVectorizationLegality to its own fileHideki Saito2018-04-293-1456/+1075
| | | | | | | | | | | | | | | | | | | | | Summary: This is a follow up to D45420 (included here since it is still under review and this change is dependent on that) and D45072 (committed). Actual change for this patch is LoopVectorize* and cmakefile. All others are all from D45420. LoopVectorizationLegality is an analysis and thus really belongs to Analysis tree. It is modular enough and it is reusable enough ---- we can further improve those aspects once uses outside of LV picks up. Hopefully, this will make it easier for people familiar with vectorization theory, but not necessarily LV itself to contribute, by lowering the volume of code they should deal with. We probably should start adding some code in LV to check its own capability (i.e., vectorization is legal but LV is not ready to handle it) and then bail out. Reviewers: rengolin, fhahn, hfinkel, mkuper, aemerson, mssimpso, dcaballe, sguggill Reviewed By: rengolin, dcaballe Subscribers: egarcia, rogfer01, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45552 llvm-svn: 331139
* [LV] Common duplicate vector load/store address calculation (NFC)Daniel Neilson2018-04-271-32/+18
| | | | | | | | Summary: Commoning some obviously copy/paste code in InnerLoopVectorizer::vectorizeMemoryInstruction llvm-svn: 331076
* [LV][VPlan] Detect outer loops for explicit vectorization.Diego Caballero2018-04-242-29/+394
| | | | | | | | | | | | | | | | | Patch #2 from VPlan Outer Loop Vectorization Patch Series #1 (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html). This patch introduces the basic infrastructure to detect, legality check and process outer loops annotated with hints for explicit vectorization. All these changes are protected under the feature flag -enable-vplan-native-path. This should make this patch NFC for the existing inner loop vectorizer. Reviewers: hfinkel, mkuper, rengolin, fhahn, aemerson, mssimpso. Differential Revision: https://reviews.llvm.org/D42447 llvm-svn: 330739
* [LoadStoreVectorize] Ignore interleaved invariant loads.Benjamin Kramer2018-04-241-5/+13
| | | | | | | | | The memory location an invariant load is using can never be clobbered by any store, so it's safe to move the load ahead of the store. Differential Revision: https://reviews.llvm.org/D46011 llvm-svn: 330725
* LoadStoreVectorizer crashes due to unsized typeStanislav Mekhanoshin2018-04-171-2/+4
| | | | | | | | | When we skip bitcasts while looking for GEP in LoadSoreVectorizer we should also verify that the type is sized otherwise we assert Differential Revision: https://reviews.llvm.org/D45709 llvm-svn: 330221
* [SLP] Use getExtractWithExtendCost() to compute the scalar cost of ↵Haicheng Wu2018-04-161-1/+17
| | | | | | | | | | | | | extractelement/ext pair We use getExtractWithExtendCost to calculate the cost of extractelement and s|zext together when computing the extract cost after vectorization, but we calculate the cost of extractelement and s|zext separately when computing the scalar cost which is larger than it should be. Differential Revision: https://reviews.llvm.org/D45469 llvm-svn: 330143
* [LV] Introduce TTI::getMinimumVFKrzysztof Parzyszek2018-04-131-0/+7
| | | | | | | | | | | The function getMinimumVF(ElemWidth) will return the minimum VF for a vector with elements of size ElemWidth bits. This value will only apply to targets for which TTI::shouldMaximizeVectorBandwidth returns true. The value of 0 indicates that there is no minimum VF. Differential Revision: https://reviews.llvm.org/D45271 llvm-svn: 330062
* Fix for the buildbot failure. Now-unused private field TTI deleted.Hideki Saito2018-04-101-6/+2
| | | | llvm-svn: 329649
* [NFC][LV] Move InterleaveInfo from Legal to CostModelHideki Saito2018-04-091-57/+56
| | | | | | | | | | | | | | | | | | | | | | Summary: Another clean up, following D43208. Interleaved memory access analysis/optimization has nothing to do with vectorization legality. It doesn't really belong there. On the other hand, cost model certainly has to know about it. In principle, vectorization should proceed like Legality ==> Optimization ==> CostModel ==> CodeGen, and this change just does that, by moving the interleaved access analysis/decision out of Legal, and run it just before CostModel object is created. After this, I can move LoopVectorizationLegality and Hints/Requirements classes into it's own header file, making it shareable within Transform tree. I have the patch already but I don't want to mix with this change. Eventual goal is to move to Analysis tree, but I first need to move RecurrenceDescriptor/InductionDescriptor from Transform/Util/LoopUtil.* to Analysis. Reviewers: rengolin, hfinkel, mkuper, dcaballe, sguggill, fhahn, aemerson Reviewed By: rengolin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45072 llvm-svn: 329645
* [SLP] Fixed formatting, NFC.Alexey Bataev2018-04-031-1/+2
| | | | llvm-svn: 329091
* [SLP] Fix PR36481: vectorize reassociated instructions.Alexey Bataev2018-04-031-92/+228
| | | | | | | | | | | | | | | | | | Summary: If the load/extractelement/extractvalue instructions are not originally consecutive, the SLP vectorizer is unable to vectorize them. Patch allows reordering of such instructions. Patch does not support reordering of the repeated instruction, this must be handled in the separate patch. Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43776 llvm-svn: 329085
* Recommit "[SLP] Fix issues with debug output in the SLP vectorizer."Alexey Bataev2018-04-031-3/+4
| | | | | | | | | | | | | The primary issue here is that using NDEBUG alone isn't enough to guard debug printing -- instead the DEBUG() macro needs to be used so that the specific pass debug logging check is employed. Without this, every asserts-enabled build was printing out information when it hit this. I also fixed another place where we had multiple statements in a DEBUG macro to use {}s to be a bit cleaner. And I fixed a place that used errs() rather than dbgs(). llvm-svn: 329082
* Revert "[SLP] Fix PR36481: vectorize reassociated instructions."Benjamin Kramer2018-04-031-230/+95
| | | | | | This reverts commit r328980 and r329046. Makes the vectorizer crash. llvm-svn: 329071
* [SLP] Fix issues with debug output in the SLP vectorizer.Chandler Carruth2018-04-031-10/+10
| | | | | | | | | | | | | The primary issue here is that using NDEBUG alone isn't enough to guard debug printing -- instead the DEBUG() macro needs to be used so that the specific pass debug logging check is employed. Without this, every asserts-enabled build was printing out information when it hit this. I also fixed another place where we had multiple statements in a DEBUG macro to use {}s to be a bit cleaner. And I fixed a place that used `errs()` rather than `dbgs()`. llvm-svn: 329046
* [SLP] Distinguish "demanded and shrinkable" from "demanded and not ↵Haicheng Wu2018-04-031-17/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | shrinkable" values when determining the minimum bitwidth We use two approaches for determining the minimum bitwidth. * Demanded bits * Value tracking If demanded bits doesn't result in a narrower type, we then try value tracking. We need this if we want to root SLP trees with the indices of getelementptr instructions since all the bits of the indices are demanded. But there is a missing piece though. We need to be able to distinguish "demanded and shrinkable" from "demanded and not shrinkable". For example, the bits of %i in %i = sext i32 %e1 to i64 %gep = getelementptr inbounds i64, i64* %p, i64 %i are demanded, but we can shrink %i's type to i32 because it won't change the result of the getelementptr. On the other hand, in %tmp15 = sext i32 %tmp14 to i64 %tmp16 = insertvalue { i64, i64 } undef, i64 %tmp15, 0 it doesn't make sense to shrink %tmp15 and we can skip the value tracking. Ideas are from Matthew Simpson! Differential Revision: https://reviews.llvm.org/D44868 llvm-svn: 329035
* [SLP] Fix PR36481: vectorize reassociated instructions.Alexey Bataev2018-04-021-92/+227
| | | | | | | | | | | | | | | Summary: If the load/extractelement/extractvalue instructions are not originally consecutive, the SLP vectorizer is unable to vectorize them. Patch allows reordering of such instructions. Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43776 llvm-svn: 328980
* [LV] Add TTI::shouldMaximizeVectorBandwidth to allow enabling it per targetKrzysztof Parzyszek2018-03-271-1/+2
| | | | | | | | The default implementation returns false and keeps the current behavior. Differential Revision: https://reviews.llvm.org/D44735 llvm-svn: 328632
* [SLP] Stop counting cost of gather sequences with multiple usesMatthew Simpson2018-03-231-1/+22
| | | | | | | | | | | | | | | When building the SLP tree, we look for reuse among the vectorized tree entries. However, each gather sequence is represented by a unique tree entry, even though the sequence may be identical to another one. This means, for example, that a gather sequence with two uses will be counted twice when computing the cost of the tree. We should only count the cost of the definition of a gather sequence rather than its uses. During code generation, the redundant gather sequences are emitted, but we optimize them away with CSE. So it looks like this problem just affects the cost model. Differential Revision: https://reviews.llvm.org/D44742 llvm-svn: 328316
* Fix a couple of layering violations in TransformsDavid Blaikie2018-03-211-1/+1
| | | | | | | | | | | | | Remove #include of Transforms/Scalar.h from Transform/Utils to fix layering. Transforms depends on Transforms/Utils, not the other way around. So remove the header and the "createStripGCRelocatesPass" function declaration (& definition) that is unused and motivated this dependency. Move Transforms/Utils/Local.h into Analysis because it's used by Analysis/MemoryBuiltins.cpp. llvm-svn: 328165
* [LV] Let recordVectorLoopValueForInductionCast to check if IV was created ↵Andrei Elovikov2018-03-201-15/+37
| | | | | | | | | | | | | | | | | | | | | | from the cast. Summary: It turned out to be error-prone to expect the callers to handle that - better to leave the decision to this routine and make the required data to be explicitly passed to the function. This handles the case that was missed in the r322473 and fixes the assert mentioned in PR36524. Reviewers: dorit, mssimpso, Ayal, dcaballe Reviewed By: dcaballe Subscribers: Ka-Ka, hiraditya, dneilson, hsaito, llvm-commits Differential Revision: https://reviews.llvm.org/D43812 llvm-svn: 327960
* [LV] Test commit. Removing white space.Diego Caballero2018-03-151-1/+1
| | | | | | This is just to check that I have commit access privilege. llvm-svn: 327656
* [CleanUp] Remove NumInstructions field from LoopVectorizer's RegisterUsage ↵Matt Davis2018-03-141-7/+0
| | | | | | | | | | | | | | | | | | | | | struct. Summary: This variable is largely going unused; aside from reporting number of instructions for in DEBUG builds. The only use of NumInstructions is in debug output to represent the LoopSize. That value can be can be misleading as it also includes metadata instructions (e.g., DBG_VALUE) which have no real impact. If we do choose to keep this around, we probably should guard it by a DEBUG macro, as it's not used in production builds. Reviewers: majnemer, congh, rengolin Reviewed By: rengolin Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D44495 llvm-svn: 327589
* [SLP] clean some formatsHaicheng Wu2018-03-131-3/+3
| | | | llvm-svn: 327433
* [NFC] Consolidate six getPointerOperand() utility functions into one placeRenato Golin2018-03-092-46/+28
| | | | | | | | | | | | | | | | | There are six separate instances of getPointerOperand() utility. LoopVectorize.cpp has one of them, and I don't want to create a 7th one while I'm trying to move LoopVectorizationLegality into a separate file (eventual objective is to move it to Analysis tree). See http://lists.llvm.org/pipermail/llvm-dev/2018-February/120999.html for llvm-dev discussions Closes D43323. Patch by Hideki Saito <hideki.saito@intel.com>. llvm-svn: 327173
* [LV] Fix vectorizer's isUniform() abuse triggers assert in SCEVRenato Golin2018-03-091-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes PR36311. See more detailed analysis in https://bugs.llvm.org/show_bug.cgi?id=36311. isUniform() information is recomputed after LV started transforming the underlying IR and that triggered an assert in SCEV. From vectorizer's architectural perspective, such information, while still useful in vector code gen, should not be recomputed after the start of transforming the LLVM IR. Instead, we should collect and cache such information during the analysis phase of LV and use the cached info during code gen. From the symptom perspective, this assert as it stands right now is not very useful. Legality already rejected loops that would trigger the assert. As such, commenting out the assert is NFC from vectorizer's functionality perspective. On top of that, just above the assertion, we check for unit-strided load/store or gather scatter. Addresses can't be uniform below that check. From vectorization theory point of view, we don't have to reject all cases of stores to uniform addresses. Eventually, we should support safe/profitable cases. This patch resolves the issue by removing the useless assertion that is invoking LAA's isUniform() that requires up-to-date DomTree ---- once vector code gen starts modifying CFG, we don't have an up-to-date DomTree. Patch by Hideki Saito <hideki.saito@intel.com>. llvm-svn: 327109
* [AMDGPU] Increased vector length for global/constant loads.Farhana Aleen2018-03-071-2/+10
| | | | | | | | | | | | | | | Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords. Author: FarhanaAleen Reviewed By: rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44179 llvm-svn: 326910
* Revert "[AMDGPU] Widened vector length for global/constant address space."Farhana Aleen2018-03-071-10/+2
| | | | | | This reverts commit ce988cc100dc65e7c6c727aff31ceb99231cab03. llvm-svn: 326907
* [AMDGPU] Widened vector length for global/constant address space.Farhana Aleen2018-03-071-2/+10
| | | | llvm-svn: 326904
* [LoadStoreVectorizer] Differentiate between <1 x T> and TSven van Haastregt2018-03-071-0/+1
| | | | | | | | | | | The LoadStoreVectorizer thought that <1 x T> and T were the same types when merging stores, leading to a crash later. Patch by Erik Hogeman. Differential Revision: https://reviews.llvm.org/D44014 llvm-svn: 326884
* [LV][CFG] Add irreducible CFG detection for outer loopsFlorian Hahn2018-03-021-23/+8
| | | | | | | | | | | | | | | | This patch adds support for detecting outer loops with irreducible control flow in LV. Current detection uses SCCs and only works for innermost loops. This patch adds a utility function that works on any CFG, given its RPO traversal and its LoopInfoBase. This function is a generalization of isIrreducibleCFG from lib/CodeGen/ShrinkWrap.cpp. The code in lib/CodeGen/ShrinkWrap.cpp is also updated to use the new generic utility function. Patch by Diego Caballero <diego.caballero@intel.com> Differential Revision: https://reviews.llvm.org/D40874 llvm-svn: 326568
* [Dominators] Remove verifyDomTree and add some verifying for Post Dom TreesDavid Green2018-02-281-1/+1
| | | | | | | | | | | | Removes verifyDomTree, using assert(verify()) everywhere instead, and changes verify a little to always run IsSameAsFreshTree first in order to print good output when we find errors. Also adds verifyAnalysis for PostDomTrees, which will allow checking of PostDomTrees it the same way we check DomTrees and MachineDomTrees. Differential Revision: https://reviews.llvm.org/D41298 llvm-svn: 326315
* [LV] Move isLegalMasked* functions from Legality to CostModelRenato Golin2018-02-261-101/+124
| | | | | | | | | | | | | | | | | | | | | | | All SIMD architectures can emulate masked load/store/gather/scatter through element-wise condition check, scalar load/store, and insert/extract. Therefore, bailing out of vectorization as legality failure, when they return false, is incorrect. We should proceed to cost model and determine profitability. This patch is to address the vectorizer's architectural limitation described above. As such, I tried to keep the cost model and vectorize/don't-vectorize behavior nearly unchanged. Cost model tuning should be done separately. Please see http://lists.llvm.org/pipermail/llvm-dev/2018-January/120164.html for RFC and the discussions. Closes D43208. Patch by: Hideki Saito <hideki.saito@intel.com> llvm-svn: 326079
* [SLP] Allow vectorization of reversed loads.Alexey Bataev2018-02-141-6/+20
| | | | | | | | | | | | | | Summary: Reversed loads are handled as gathering. But we can just reshuffle these values. Patch adds support for vectorization of reversed loads. Reviewers: RKSimon, spatel, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43022 llvm-svn: 325134
* Adding a width of the GEP index to the Data Layout.Elena Demikhovsky2018-02-141-1/+2
| | | | | | | | | | | | | | | | | | Making a width of GEP Index, which is used for address calculation, to be one of the pointer properties in the Data Layout. p[address space]:size:memory_size:alignment:pref_alignment:index_size_in_bits. The index size parameter is optional, if not specified, it is equal to the pointer size. Till now, the InstCombiner normalized GEPs and extended the Index operand to the pointer width. It works fine if you can convert pointer to integer for address calculation and all registered targets do this. But some ISAs have very restricted instruction set for the pointer calculation. During discussions were desided to retrieve information for GEP index from the Data Layout. http://lists.llvm.org/pipermail/llvm-dev/2018-January/120416.html I added an interface to the Data Layout and I changed the InstCombiner and some other passes to take the Index width into account. This change does not affect any in-tree target. I added tests to cover data layouts with explicitly specified index size. Differential Revision: https://reviews.llvm.org/D42123 llvm-svn: 325102
* [SLP] Take user instructions cost into consideration in insertelement ↵Alexey Bataev2018-02-121-8/+20
| | | | | | | | | | | | | | | | | | | | | | | | vectorization. Summary: For better vectorization result we should take into consideration the cost of the user insertelement instructions when we try to vectorize sequences that build the whole vector. I.e. if we have the following scalar code: ``` <Scalar code> insertelement <ScalarCode>, ... ``` we should consider the cost of the last `insertelement ` instructions as the cost of the scalar code. Reviewers: RKSimon, spatel, hfinkel, mkuper Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D42657 llvm-svn: 324893
* [LV] Fix analyzeInterleaving when -pass-remarks enabledMircea Trofin2018-02-101-1/+6
| | | | | | | | | | | | | | | | | | | | | | | Summary: If -pass-remarks=loop-vectorize, atomic ops will be seen by analyzeInterleaving(), even though canVectorizeMemory() == false. This is because we are requesting extra analysis instead of bailing out. In such a case, we end up with a Group in both Load- and StoreGroups, and then we'll try to access freed memory when traversing LoadGroups after having had released the Group when iterating over StoreGroups. The fix is to include mayWriteToMemory() when validating that two instructions are the same kind of memory operation. Reviewers: mssimpso, davidxl Reviewed By: davidxl Subscribers: hsaito, fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D43064 llvm-svn: 324786
* Verify profile data confirms large loop trip counts.Mircea Trofin2018-02-071-4/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Loops with inequality comparers, such as: // unsigned bound for (unsigned i = 1; i < bound; ++i) {...} have getSmallConstantMaxTripCount report a large maximum static trip count - in this case, 0xffff fffe. However, profiling info may show that the trip count is much smaller, and thus counter-recommend vectorization. This change: - flips loop-vectorize-with-block-frequency on by default. - validates profiled loop frequency data supports vectorization, when static info appears to not counter-recommend it. Absence of profile data means we rely on static data, just as we've done so far. Reviewers: twoh, mkuper, davidxl, tejohnson, Ayal Reviewed By: davidxl Subscribers: bkramer, llvm-commits Differential Revision: https://reviews.llvm.org/D42946 llvm-svn: 324543
* [SLPVectorizer][NFC] Make a loop more readable.Clement Courbet2018-02-071-7/+5
| | | | llvm-svn: 324482
* [LV] Use Demanded Bits and ValueTracking for reduction type-shrinkingChad Rosier2018-02-041-4/+14
| | | | | | | | | | | | | | | The type-shrinking logic in reduction detection, although narrow in scope, is also rather ad-hoc, which has led to bugs (e.g., PR35734). This patch modifies the approach to rely on the demanded bits and value tracking analyses, if available. We currently perform type-shrinking separately for reductions and other instructions in the loop. Long-term, we should probably think about computing minimal bit widths in a more complete way for the loops we want to vectorize. PR35734 Differential Revision: https://reviews.llvm.org/D42309 llvm-svn: 324195
* Add missing includesDavid Blaikie2018-02-021-0/+3
| | | | llvm-svn: 324040
* [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.Alexey Bataev2018-01-291-133/+382
| | | | | | | | | | | | | | | | | Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counter as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323662
* Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements ↵Alexey Bataev2018-01-271-369/+131
| | | | | | | | as shuffle." This reverts commit r323530 to fix possible problems in users code. llvm-svn: 323581
* Revert "[SLP] Removed the warning about unused variable, NFC."Alexey Bataev2018-01-271-1/+1
| | | | | | This reverts commit r323533 to fix possible problems in users code. llvm-svn: 323580
* [SLP] Removed the warning about unused variable, NFC.Alexey Bataev2018-01-261-1/+1
| | | | llvm-svn: 323533
* [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.Alexey Bataev2018-01-261-131/+369
| | | | | | | | | | | | | | | | | Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counter as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323530
* [NFC] fix trivial typos in comments and documentsHiroshi Inoue2018-01-261-1/+1
| | | | | | "in in" -> "in", "on on" -> "on" etc. llvm-svn: 323508
* Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements ↵Alexey Bataev2018-01-251-350/+129
| | | | | | | | as shuffle." This reverts commit r323441 to fix buildbots. llvm-svn: 323447
* [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.Alexey Bataev2018-01-251-129/+350
| | | | | | | | | | | | | | | | | Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counter as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323441
* Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements ↵Alexey Bataev2018-01-251-352/+130
| | | | | | | | as shuffle." This reverts commit r323430 to fix buildbots. llvm-svn: 323432
* [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.Alexey Bataev2018-01-251-130/+352
| | | | | | | | | | | | | | | | | Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counter as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323430
OpenPOWER on IntegriCloud