summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms/Vectorize
Commit message (Collapse)AuthorAgeFilesLines
* [LV] Transform truncations of non-primary induction variablesMatthew Simpson2017-03-271-11/+10
| | | | | | | | | | | | The vectorizer tries to replace truncations of induction variables with new induction variables having the smaller type. After r295063, this optimization was applied to all integer induction variables, including non-primary ones. When optimizing the truncation of a non-primary induction variable, we still need to transform the new induction so that it has the correct start value. This should fix PR32419. Reference: https://bugs.llvm.org/show_bug.cgi?id=32419 llvm-svn: 298882
* Revert r298620: [LV] Vectorize GEPsIvan Krasin2017-03-241-117/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Reason: breaks linking Chromium with LLD + ThinLTO (a pass crashes) LLVM bug: https://bugs.llvm.org//show_bug.cgi?id=32413 Original change description: [LV] Vectorize GEPs This patch adds support for vectorizing GEPs. Previously, we only generated vector GEPs on-demand when creating gather or scatter operations. All GEPs from the original loop were scalarized by default, and if a pointer was to be stored to memory, we would have to build up the pointer vector with insertelement instructions. With this patch, we will vectorize all GEPs that haven't already been marked for scalarization. The patch refines collectLoopScalars to more exactly identify the scalar GEPs. The function now more closely resembles collectLoopUniforms. And the patch moves vector GEP creation out of vectorizeMemoryInstruction and into the main vectorization loop. The vector GEPs needed for gather and scatter operations will have already been generated before vectoring the memory accesses. Original Differential Revision: https://reviews.llvm.org/D30710 llvm-svn: 298735
* [LV] Vectorize GEPsMatthew Simpson2017-03-231-67/+117
| | | | | | | | | | | | | | | | | | | | | This patch adds support for vectorizing GEPs. Previously, we only generated vector GEPs on-demand when creating gather or scatter operations. All GEPs from the original loop were scalarized by default, and if a pointer was to be stored to memory, we would have to build up the pointer vector with insertelement instructions. With this patch, we will vectorize all GEPs that haven't already been marked for scalarization. The patch refines collectLoopScalars to more exactly identify the scalar GEPs. The function now more closely resembles collectLoopUniforms. And the patch moves vector GEP creation out of vectorizeMemoryInstruction and into the main vectorization loop. The vector GEPs needed for gather and scatter operations will have already been generated before vectoring the memory accesses. Differential Revision: https://reviews.llvm.org/D30710 llvm-svn: 298620
* [LV] Delete unneeded scalar GEP creation codeMatthew Simpson2017-03-231-33/+1
| | | | | | | | | | | | | The code for generating scalar base pointers in vectorizeMemoryInstruction is not needed. We currently scalarize all GEPs and maintain the scalarized values in VectorLoopValueMap. The GEP cloning in this unneeded code is the same as that in scalarizeInstruction. The test cases that changed as a result of this patch changed because we were able to reuse the scalarized GEP that we previously generated instead of cloning a new one. Differential Revision: https://reviews.llvm.org/D30587 llvm-svn: 298615
* Test commit accessYi Kong2017-03-211-10/+10
| | | | | | Remove some trailing whitespaces. llvm-svn: 298379
* [LV] Refactor cross-iteration phi's back-patching; NFCGil Rapaport2017-03-141-232/+244
| | | | | | | | | | | | | | This patch refactors the PHisToFix loop as follows: - The loop itself now resides in its own method. - The new method iterates on scalar-loop's header; the PHIsToFix map formerly propagated as an output parameter and filled during phi widening is removed. - The code handling reductions is moved into its own method, similar to the existing fixFirstOrderRecurrence(). Differential Revision: https://reviews.llvm.org/D30755 llvm-svn: 297740
* [LV] Refactor Cost Model's selectVectorizationFactor(); NFCAyal Zaks2017-03-141-73/+132
| | | | | | | | | | | Refactoring Cost Model's selectVectorizationFactor() so that it handles only the selection of the best VF from a pre-computed range of candidate VF's, extracting early-exit criteria and the computation of a MaxVF upper-bound to other methods, all driven by a newly introduced LoopVectorizationPlanner. Differential Revision: https://reviews.llvm.org/D30653 llvm-svn: 297737
* [TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improvedJonas Paulsson2017-03-143-26/+33
| | | | | | | | | | | | | | | | | | | | | getIntrinsicInstrCost() used to only compute scalarization cost based on types. This patch improves this so that the actual arguments are checked when they are available, in order to handle only unique non-constant operands. Tests updates: Analysis/CostModel/X86/arith-fp.ll Transforms/LoopVectorize/AArch64/interleaved_cost.ll Transforms/LoopVectorize/ARM/interleaved_cost.ll The improvement in getOperandsScalarizationOverhead() to differentiate on constants made it necessary to update the interleaved_cost.ll tests even though they do not relate to intrinsics. Review: Hal Finkel https://reviews.llvm.org/D29540 llvm-svn: 297705
* [LV] Set memcheck metadata also for VF==1Gil Rapaport2017-03-131-5/+1
| | | | | | | | This commit is a follow-up on r297580. It fixes the FIXME added temporarily by that commit to keep the removal of Unroller's specialized version of scalarizeInstruction() an NFC. See https://reviews.llvm.org/D30715 for details. llvm-svn: 297610
* [LV] A unified scalarizeInstruction() for Vectorizer and Unroller; NFCGil Rapaport2017-03-121-66/+8
| | | | | | | | | | | | | | | | Unroller's specialized scalarizeInstruction() is mostly duplicating Vectorizer's variant. OTOH Vectorizer's scalarizeInstruction() already supports the special case of VF==1 except for avoiding mask-bit extraction in that case. This patch removes Unroller's specialized version in favor of a unified method. The only functional difference between the two variants seems to be setting memcheck metadata for loads and stores only in Vectorizer's variant, which is a bug in Unroller. To keep this patch an NFC the unified method doesn't set memcheck metadata for VF==1. Differential Revision: https://reviews.llvm.org/D30715 llvm-svn: 297580
* Test commit.Ayal Zaks2017-03-121-0/+1
| | | | llvm-svn: 297579
* [SLP] Revert everything that has to do with memory access sorting.Michael Kuperstein2017-03-101-122/+57
| | | | | | | | | | | | This reverts r293386, r294027, r294029 and r296411. Turns out the SLP tree isn't actually a "tree" and we don't handle accessing the same packet of loads in several different orders well, causing miscompiles. Revert until we can fix this properly. llvm-svn: 297493
* Add support for DenseMap/DenseSet count and find using const pointersDaniel Berlin2017-03-101-3/+3
| | | | | | | | | | | | | | Summary: Similar to SmallPtrSet, this makes find and count work with both const referneces and const pointers. Reviewers: dblaikie Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D30713 llvm-svn: 297424
* [SLP] Mark values in Dot that need to be extractedAdam Nemet2017-03-091-3/+9
| | | | llvm-svn: 297361
* [SLP] Visualize SLP trees with -view-slp-treeAdam Nemet2017-03-081-62/+167
| | | | | | | | | | | | | | | | | Analyzing larger trees is extremely difficult with the current debug output so this adds GraphTraits and DOTGraphTraits on top of the VectorizableTree data structure. We can now display the SLP trees with Graphviz as in https://reviews.llvm.org/F3132765. I decorated the graph where a value needs to be gathered for one reason or another. These are the red nodes. There are other improvement I am planning to make as I work through my case here. For example, I would also like to mark nodes that need to be extracted. Differential Revision: https://reviews.llvm.org/D30731 llvm-svn: 297303
* [LV] Select legal insert point when fixing first-order recurrencesMatthew Simpson2017-03-081-7/+9
| | | | | | | | | | Because IRBuilder performs constant-folding, it's not guaranteed that an instruction in the original loop map to an instruction in the vector loop. It could map to a constant vector instead. The handling of first-order recurrences was incorrectly making this assumption when setting the IRBuilder's insert point. llvm-svn: 297302
* [LV] Consider users that are memory accesses in uniforms expansion stepMatthew Simpson2017-03-071-1/+3
| | | | | | | | | | | When expanding the set of uniform instructions beyond the seed instructions (e.g., consecutive pointers), we mark a new instruction uniform if all its loop-varying users are uniform. We should also allow users that are consecutive or interleaved memory accesses. This fixes cases where we have an instruction that is used as the pointer operand of a consecutive access but also used by a non-memory instruction that later becomes uniform as part of the expansion. llvm-svn: 297179
* [SLP] Revert r296863 due to miscompiles.Michael Kuperstein2017-03-061-79/+72
| | | | | | Details and reproducer are on the email thread for r296863. llvm-svn: 297103
* [SLP] Fixes the bug due to absence of in order uses of scalars which needs ↵Mohammad Shahid2017-03-031-72/+79
| | | | | | | | | | | | | | | | to be available for VectorizeTree() API.This API uses it for proper mask computation to be used in shufflevector IR. The fix is to compute the mask for out of order memory accesses while building the vectorizable tree instead of actual vectorization of vectorizable tree.It also needs to recompute the proper Lane for external use of vectorizable scalars based on shuffle mask. Reviewers: mkuper Differential Revision: https://reviews.llvm.org/D30159 Change-Id: Ide8773ce0ad3562f3cf4d1a0ad0f487e2f60ce5d llvm-svn: 296863
* [LV] Considier non-consecutive but vectorizable accesses for VF selectionMatthew Simpson2017-03-021-3/+10
| | | | | | | | | | | | | | | | When computing the smallest and largest types for selecting the maximum vectorization factor, we currently ignore loads and stores of pointer types if the memory access is non-consecutive. We do this because such accesses must be scalarized regardless of vectorization factor, and thus shouldn't be considered when determining the factor. This patch makes this check less aggressive by also considering non-consecutive accesses that may be vectorized, such as interleaved accesses. Because we don't know at the time of the check if an accesses will certainly be vectorized (this is a cost model decision given a particular VF), we consider all accesses that can potentially be vectorized. Differential Revision: https://reviews.llvm.org/D30305 llvm-svn: 296747
* Revert r296575 "[SLP] Fixes the bug due to absence of in order uses of ↵Hans Wennborg2017-03-011-65/+70
| | | | | | | | scalars which needs to be available" It caused miscompiles, e.g. in Chromium (PR32109). llvm-svn: 296654
* [SLP] Preserve IR flags when vectorizing horizontal reductions.Alexey Bataev2017-03-011-2/+5
| | | | | | | | | | | | | | | | | | Summary: The SLP vectorizer should propagate IR-level optimization hints/flags (nsw, nuw, exact, fast-math) when converting scalar horizontal reductions instructions into vectors, just like for other vectorized instructions. It doe not include IR propagation for extra arguments, we need to handle original scalar operations for extra args to propagate correct flags. Reviewers: mkuper, mzolotukhin, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30418 llvm-svn: 296614
* [SLP] Preserve IR flags for extra args.Alexey Bataev2017-03-011-9/+11
| | | | | | | | | | | | | | | Summary: We should preserve IR flags for extra args. These IR flags should be taken from original scalar operations, not from the reduction operations. Reviewers: mkuper, mzolotukhin, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30447 llvm-svn: 296613
* [SLP] Fix for PR32038: extra add of PHI node when it is not required.Alexey Bataev2017-03-011-17/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: If horizontal reduction tree starts from the binary operation that is used in PHI node, but this PHI is not used in horizontal reduction, we may end up with extra addition of this PHI node after vectorization. Here is an example: ``` %phi = phi i32 [ %tmp, %end], ... ... %tmp = add i32 %tmp1, %tmp2 end: ``` after vectorization we always have something like: ``` %phi = phi i32 [ %tmp, %end], ... ... %red = extractelement <8 x 32> %vec.red, 0 %tmp = add i32 %red, %phi end: ``` even if `%phi` is not used in reduction tree. Patch considers these PHI nodes as extra arguments and considers them in the final result iff they really used in reduction. Reviewers: mkuper, hfinkel, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30409 llvm-svn: 296606
* [LV] These remark should have been missed remarksAdam Nemet2017-03-011-2/+2
| | | | | | | | The practice in LV is that we emit analysis remarks and then finally report either a missed or applied remark on the final decision whether vectorization is taking place. On this code path, we were closing with an analysis remark. llvm-svn: 296578
* [SLP] Fixes the bug due to absence of in order uses of scalars which needs ↵Mohammad Shahid2017-03-011-70/+65
| | | | | | | | | | | | | | | to be available for VectorizeTree() API.This API uses it for proper mask computation to be used in shufflevector IR. The fix is to compute the mask for out of order memory accesses while building the vectorizable tree instead of actual vectorization of vectorizable tree. Reviewers: mkuper Differential Revision: https://reviews.llvm.org/D30159 Change-Id: Id1e287f073fa4959713ba545fa4254db5da8b40d llvm-svn: 296575
* Revert "(HEAD, origin/master, origin/HEAD, master) [LV] These should missed ↵Adam Nemet2017-02-281-2/+2
| | | | | | | | | | remarks" This reverts commit r296544. This got committed by accident. llvm-svn: 296546
* [LV] These should missed remarksAdam Nemet2017-02-281-2/+2
| | | | llvm-svn: 296544
* [LV] Merge floating-point and integer induction widening codeMatthew Simpson2017-02-241-65/+92
| | | | | | | | | | | This patch merges the existing floating-point induction variable widening code into the integer induction variable widening code, creating a single set of functions for both kinds of inductions. The primary motivation for doing this is to enable vector phi node creation for floating-point induction variables. Differential Revision: https://reviews.llvm.org/D30211 llvm-svn: 296145
* [LAA] Remove unused LoopAccessReportAdam Nemet2017-02-231-16/+0
| | | | | | | The need for this removed when I converted everything to use the opt-remark classes directly with the streaming interface. llvm-svn: 296017
* [LV] Remove unused VectorizationReportAdam Nemet2017-02-231-15/+0
| | | | | | | The need for this removed when I converted everything to use the opt-remark classes directly with the streaming interface. llvm-svn: 296016
* [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrongAlexey Bataev2017-02-231-13/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295972
* Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong"Alexey Bataev2017-02-231-21/+14
| | | | | | This reverts commit 7c5141e577d9efd1c8e3087566a38ce6b3a41a84. llvm-svn: 295957
* [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrongAlexey Bataev2017-02-231-14/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295956
* Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong"Alexey Bataev2017-02-231-19/+13
| | | | | | This reverts commit d83c81ee6a8dea662808ac22b396d1bb0595c89d. llvm-svn: 295951
* [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrongAlexey Bataev2017-02-231-13/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295949
* LoadStoreVectorizer: Split even sized illegal chains properlyMatt Arsenault2017-02-231-3/+6
| | | | | | | | | | | | | | | | | | | | Implement isLegalToVectorizeLoadChain for AMDGPU to avoid producing private address spaces accesses that will need to be split up later. This was doing the wrong thing in the case where the queried chain was an even number of elements. A possible <4 x i32> store was being split into store <2 x i32> store i32 store i32 rather than store <2 x i32> store <2 x i32> when legal. llvm-svn: 295933
* Revert r295868 because it breaks a different SLP lit test.Michael Kuperstein2017-02-221-18/+13
| | | | llvm-svn: 295906
* [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong resultAlexey Bataev2017-02-221-13/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295868
* [LoopVectorize] Added address space check when analysing interleaved accessesKarl-Johan Karlsson2017-02-221-14/+19
| | | | | | | | | | | | | | | | | | | | Prevent memory objects of different address spaces to be part of the same load/store groups when analysing interleaved accesses. This is fixing pr31900. Reviewers: HaoLiu, mssimpso, mkuper Reviewed By: mssimpso, mkuper Subscribers: llvm-commits, efriedma, mzolotukhin Differential Revision: https://reviews.llvm.org/D29717 This reverts r295042 (re-applies r295038) with an additional fix for the buildbot problem. llvm-svn: 295858
* [SLP] Remove unused initial value from the variable, NFC.Alexey Bataev2017-02-221-1/+1
| | | | llvm-svn: 295826
* [SLP] nullptr'ize initial value in `findBuildAggregate()`, NFC.Alexey Bataev2017-02-201-1/+1
| | | | | | Initial value of V is sett nullptr, as it is not used. llvm-svn: 295642
* [SLP] Rework `findBuildAggregate()` from ercursive form to iterative, NFC.Alexey Bataev2017-02-201-9/+12
| | | | | | | | | | Reviewers: mkuper Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D30103 llvm-svn: 295641
* [LV] Remove constant restriction for vector phi creationMatthew Simpson2017-02-171-44/+37
| | | | | | | | | | | | We previously only created a vector phi node for an induction variable if its step had a constant integer type. However, the step actually only needs to be loop-invariant. We only handle inductions having loop-invariant steps, so this patch should enable vector phi node creation for all integer induction variables that will be vectorized. Differential Revision: https://reviews.llvm.org/D29956 llvm-svn: 295456
* [LV] Rename Induction to PrimaryInduction. NFC.Michael Kuperstein2017-02-141-12/+12
| | | | llvm-svn: 295111
* Reapply "[LV] Extend trunc optimization to all IVs with constant integer steps"Matthew Simpson2017-02-141-10/+47
| | | | | | | | | | | This reapplies commit r294967 with a fix for the execution time regressions caught by the clang-cmake-aarch64-quick bot. We now extend the truncate optimization to non-primary induction variables only if the truncate isn't already free. Differential Revision: https://reviews.llvm.org/D29847 llvm-svn: 295063
* [SLP] Fix for PR31879: vectorize repeated scalar ops that don't get putAlexey Bataev2017-02-141-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | back into a vector Previously the cost of the existing ExtractElement/ExtractValue instructions was considered as a dead cost only if it was detected that they have only one use. But these instructions may be considered dead also if users of the instructions are also going to be vectorized, like: ``` %x0 = extractelement <2 x float> %x, i32 0 %x1 = extractelement <2 x float> %x, i32 1 %x0x0 = fmul float %x0, %x0 %x1x1 = fmul float %x1, %x1 %add = fadd float %x0x0, %x1x1 ``` This can be transformed to ``` %1 = fmul <2 x float> %x, %x %2 = extractelement <2 x float> %1, i32 0 %3 = extractelement <2 x float> %1, i32 1 %add = fadd float %2, %3 ``` because though `%x0` and `%x1` have 2 users each other, these users are part of the vectorized tree and we can consider these `extractelement` instructions as dead. Differential Revision: https://reviews.llvm.org/D29900 llvm-svn: 295056
* Revert "[LoopVectorize] Added address space check when analysing interleaved ↵Karl-Johan Karlsson2017-02-141-20/+14
| | | | | | | | | accesses" This reverts r295038. The buildbot clang-with-thin-lto-ubuntu failed. I'm reverting to investigate. llvm-svn: 295042
* [LoopVectorize] Added address space check when analysing interleaved accessesKarl-Johan Karlsson2017-02-141-14/+20
| | | | | | | | | | | | | | | | | Prevent memory objects of different address spaces to be part of the same load/store groups when analysing interleaved accesses. This is fixing pr31900. Reviewers: HaoLiu, mssimpso, mkuper Reviewed By: mssimpso, mkuper Subscribers: llvm-commits, efriedma, mzolotukhin Differential Revision: https://reviews.llvm.org/D29717 llvm-svn: 295038
* Revert "[LV] Extend trunc optimization to all IVs with constant integer steps"Matthew Simpson2017-02-131-20/+12
| | | | | | | | This reverts commit r294967. This patch caused execution time slowdowns in a few LLVM test-suite tests, as reported by the clang-cmake-aarch64-quick bot. I'm reverting to investigate. llvm-svn: 294973
OpenPOWER on IntegriCloud