summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms/Vectorize
Commit message (Collapse)AuthorAgeFilesLines
...
* [SLP] Revert everything that has to do with memory access sorting.Michael Kuperstein2017-03-101-122/+57
| | | | | | | | | | | | This reverts r293386, r294027, r294029 and r296411. Turns out the SLP tree isn't actually a "tree" and we don't handle accessing the same packet of loads in several different orders well, causing miscompiles. Revert until we can fix this properly. llvm-svn: 297493
* Add support for DenseMap/DenseSet count and find using const pointersDaniel Berlin2017-03-101-3/+3
| | | | | | | | | | | | | | Summary: Similar to SmallPtrSet, this makes find and count work with both const referneces and const pointers. Reviewers: dblaikie Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D30713 llvm-svn: 297424
* [SLP] Mark values in Dot that need to be extractedAdam Nemet2017-03-091-3/+9
| | | | llvm-svn: 297361
* [SLP] Visualize SLP trees with -view-slp-treeAdam Nemet2017-03-081-62/+167
| | | | | | | | | | | | | | | | | Analyzing larger trees is extremely difficult with the current debug output so this adds GraphTraits and DOTGraphTraits on top of the VectorizableTree data structure. We can now display the SLP trees with Graphviz as in https://reviews.llvm.org/F3132765. I decorated the graph where a value needs to be gathered for one reason or another. These are the red nodes. There are other improvement I am planning to make as I work through my case here. For example, I would also like to mark nodes that need to be extracted. Differential Revision: https://reviews.llvm.org/D30731 llvm-svn: 297303
* [LV] Select legal insert point when fixing first-order recurrencesMatthew Simpson2017-03-081-7/+9
| | | | | | | | | | Because IRBuilder performs constant-folding, it's not guaranteed that an instruction in the original loop map to an instruction in the vector loop. It could map to a constant vector instead. The handling of first-order recurrences was incorrectly making this assumption when setting the IRBuilder's insert point. llvm-svn: 297302
* [LV] Consider users that are memory accesses in uniforms expansion stepMatthew Simpson2017-03-071-1/+3
| | | | | | | | | | | When expanding the set of uniform instructions beyond the seed instructions (e.g., consecutive pointers), we mark a new instruction uniform if all its loop-varying users are uniform. We should also allow users that are consecutive or interleaved memory accesses. This fixes cases where we have an instruction that is used as the pointer operand of a consecutive access but also used by a non-memory instruction that later becomes uniform as part of the expansion. llvm-svn: 297179
* [SLP] Revert r296863 due to miscompiles.Michael Kuperstein2017-03-061-79/+72
| | | | | | Details and reproducer are on the email thread for r296863. llvm-svn: 297103
* [SLP] Fixes the bug due to absence of in order uses of scalars which needs ↵Mohammad Shahid2017-03-031-72/+79
| | | | | | | | | | | | | | | | to be available for VectorizeTree() API.This API uses it for proper mask computation to be used in shufflevector IR. The fix is to compute the mask for out of order memory accesses while building the vectorizable tree instead of actual vectorization of vectorizable tree.It also needs to recompute the proper Lane for external use of vectorizable scalars based on shuffle mask. Reviewers: mkuper Differential Revision: https://reviews.llvm.org/D30159 Change-Id: Ide8773ce0ad3562f3cf4d1a0ad0f487e2f60ce5d llvm-svn: 296863
* [LV] Considier non-consecutive but vectorizable accesses for VF selectionMatthew Simpson2017-03-021-3/+10
| | | | | | | | | | | | | | | | When computing the smallest and largest types for selecting the maximum vectorization factor, we currently ignore loads and stores of pointer types if the memory access is non-consecutive. We do this because such accesses must be scalarized regardless of vectorization factor, and thus shouldn't be considered when determining the factor. This patch makes this check less aggressive by also considering non-consecutive accesses that may be vectorized, such as interleaved accesses. Because we don't know at the time of the check if an accesses will certainly be vectorized (this is a cost model decision given a particular VF), we consider all accesses that can potentially be vectorized. Differential Revision: https://reviews.llvm.org/D30305 llvm-svn: 296747
* Revert r296575 "[SLP] Fixes the bug due to absence of in order uses of ↵Hans Wennborg2017-03-011-65/+70
| | | | | | | | scalars which needs to be available" It caused miscompiles, e.g. in Chromium (PR32109). llvm-svn: 296654
* [SLP] Preserve IR flags when vectorizing horizontal reductions.Alexey Bataev2017-03-011-2/+5
| | | | | | | | | | | | | | | | | | Summary: The SLP vectorizer should propagate IR-level optimization hints/flags (nsw, nuw, exact, fast-math) when converting scalar horizontal reductions instructions into vectors, just like for other vectorized instructions. It doe not include IR propagation for extra arguments, we need to handle original scalar operations for extra args to propagate correct flags. Reviewers: mkuper, mzolotukhin, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30418 llvm-svn: 296614
* [SLP] Preserve IR flags for extra args.Alexey Bataev2017-03-011-9/+11
| | | | | | | | | | | | | | | Summary: We should preserve IR flags for extra args. These IR flags should be taken from original scalar operations, not from the reduction operations. Reviewers: mkuper, mzolotukhin, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30447 llvm-svn: 296613
* [SLP] Fix for PR32038: extra add of PHI node when it is not required.Alexey Bataev2017-03-011-17/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: If horizontal reduction tree starts from the binary operation that is used in PHI node, but this PHI is not used in horizontal reduction, we may end up with extra addition of this PHI node after vectorization. Here is an example: ``` %phi = phi i32 [ %tmp, %end], ... ... %tmp = add i32 %tmp1, %tmp2 end: ``` after vectorization we always have something like: ``` %phi = phi i32 [ %tmp, %end], ... ... %red = extractelement <8 x 32> %vec.red, 0 %tmp = add i32 %red, %phi end: ``` even if `%phi` is not used in reduction tree. Patch considers these PHI nodes as extra arguments and considers them in the final result iff they really used in reduction. Reviewers: mkuper, hfinkel, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30409 llvm-svn: 296606
* [LV] These remark should have been missed remarksAdam Nemet2017-03-011-2/+2
| | | | | | | | The practice in LV is that we emit analysis remarks and then finally report either a missed or applied remark on the final decision whether vectorization is taking place. On this code path, we were closing with an analysis remark. llvm-svn: 296578
* [SLP] Fixes the bug due to absence of in order uses of scalars which needs ↵Mohammad Shahid2017-03-011-70/+65
| | | | | | | | | | | | | | | to be available for VectorizeTree() API.This API uses it for proper mask computation to be used in shufflevector IR. The fix is to compute the mask for out of order memory accesses while building the vectorizable tree instead of actual vectorization of vectorizable tree. Reviewers: mkuper Differential Revision: https://reviews.llvm.org/D30159 Change-Id: Id1e287f073fa4959713ba545fa4254db5da8b40d llvm-svn: 296575
* Revert "(HEAD, origin/master, origin/HEAD, master) [LV] These should missed ↵Adam Nemet2017-02-281-2/+2
| | | | | | | | | | remarks" This reverts commit r296544. This got committed by accident. llvm-svn: 296546
* [LV] These should missed remarksAdam Nemet2017-02-281-2/+2
| | | | llvm-svn: 296544
* [LV] Merge floating-point and integer induction widening codeMatthew Simpson2017-02-241-65/+92
| | | | | | | | | | | This patch merges the existing floating-point induction variable widening code into the integer induction variable widening code, creating a single set of functions for both kinds of inductions. The primary motivation for doing this is to enable vector phi node creation for floating-point induction variables. Differential Revision: https://reviews.llvm.org/D30211 llvm-svn: 296145
* [LAA] Remove unused LoopAccessReportAdam Nemet2017-02-231-16/+0
| | | | | | | The need for this removed when I converted everything to use the opt-remark classes directly with the streaming interface. llvm-svn: 296017
* [LV] Remove unused VectorizationReportAdam Nemet2017-02-231-15/+0
| | | | | | | The need for this removed when I converted everything to use the opt-remark classes directly with the streaming interface. llvm-svn: 296016
* [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrongAlexey Bataev2017-02-231-13/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295972
* Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong"Alexey Bataev2017-02-231-21/+14
| | | | | | This reverts commit 7c5141e577d9efd1c8e3087566a38ce6b3a41a84. llvm-svn: 295957
* [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrongAlexey Bataev2017-02-231-14/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295956
* Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong"Alexey Bataev2017-02-231-19/+13
| | | | | | This reverts commit d83c81ee6a8dea662808ac22b396d1bb0595c89d. llvm-svn: 295951
* [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrongAlexey Bataev2017-02-231-13/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295949
* LoadStoreVectorizer: Split even sized illegal chains properlyMatt Arsenault2017-02-231-3/+6
| | | | | | | | | | | | | | | | | | | | Implement isLegalToVectorizeLoadChain for AMDGPU to avoid producing private address spaces accesses that will need to be split up later. This was doing the wrong thing in the case where the queried chain was an even number of elements. A possible <4 x i32> store was being split into store <2 x i32> store i32 store i32 rather than store <2 x i32> store <2 x i32> when legal. llvm-svn: 295933
* Revert r295868 because it breaks a different SLP lit test.Michael Kuperstein2017-02-221-18/+13
| | | | llvm-svn: 295906
* [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong resultAlexey Bataev2017-02-221-13/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295868
* [LoopVectorize] Added address space check when analysing interleaved accessesKarl-Johan Karlsson2017-02-221-14/+19
| | | | | | | | | | | | | | | | | | | | Prevent memory objects of different address spaces to be part of the same load/store groups when analysing interleaved accesses. This is fixing pr31900. Reviewers: HaoLiu, mssimpso, mkuper Reviewed By: mssimpso, mkuper Subscribers: llvm-commits, efriedma, mzolotukhin Differential Revision: https://reviews.llvm.org/D29717 This reverts r295042 (re-applies r295038) with an additional fix for the buildbot problem. llvm-svn: 295858
* [SLP] Remove unused initial value from the variable, NFC.Alexey Bataev2017-02-221-1/+1
| | | | llvm-svn: 295826
* [SLP] nullptr'ize initial value in `findBuildAggregate()`, NFC.Alexey Bataev2017-02-201-1/+1
| | | | | | Initial value of V is sett nullptr, as it is not used. llvm-svn: 295642
* [SLP] Rework `findBuildAggregate()` from ercursive form to iterative, NFC.Alexey Bataev2017-02-201-9/+12
| | | | | | | | | | Reviewers: mkuper Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D30103 llvm-svn: 295641
* [LV] Remove constant restriction for vector phi creationMatthew Simpson2017-02-171-44/+37
| | | | | | | | | | | | We previously only created a vector phi node for an induction variable if its step had a constant integer type. However, the step actually only needs to be loop-invariant. We only handle inductions having loop-invariant steps, so this patch should enable vector phi node creation for all integer induction variables that will be vectorized. Differential Revision: https://reviews.llvm.org/D29956 llvm-svn: 295456
* [LV] Rename Induction to PrimaryInduction. NFC.Michael Kuperstein2017-02-141-12/+12
| | | | llvm-svn: 295111
* Reapply "[LV] Extend trunc optimization to all IVs with constant integer steps"Matthew Simpson2017-02-141-10/+47
| | | | | | | | | | | This reapplies commit r294967 with a fix for the execution time regressions caught by the clang-cmake-aarch64-quick bot. We now extend the truncate optimization to non-primary induction variables only if the truncate isn't already free. Differential Revision: https://reviews.llvm.org/D29847 llvm-svn: 295063
* [SLP] Fix for PR31879: vectorize repeated scalar ops that don't get putAlexey Bataev2017-02-141-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | back into a vector Previously the cost of the existing ExtractElement/ExtractValue instructions was considered as a dead cost only if it was detected that they have only one use. But these instructions may be considered dead also if users of the instructions are also going to be vectorized, like: ``` %x0 = extractelement <2 x float> %x, i32 0 %x1 = extractelement <2 x float> %x, i32 1 %x0x0 = fmul float %x0, %x0 %x1x1 = fmul float %x1, %x1 %add = fadd float %x0x0, %x1x1 ``` This can be transformed to ``` %1 = fmul <2 x float> %x, %x %2 = extractelement <2 x float> %1, i32 0 %3 = extractelement <2 x float> %1, i32 1 %add = fadd float %2, %3 ``` because though `%x0` and `%x1` have 2 users each other, these users are part of the vectorized tree and we can consider these `extractelement` instructions as dead. Differential Revision: https://reviews.llvm.org/D29900 llvm-svn: 295056
* Revert "[LoopVectorize] Added address space check when analysing interleaved ↵Karl-Johan Karlsson2017-02-141-20/+14
| | | | | | | | | accesses" This reverts r295038. The buildbot clang-with-thin-lto-ubuntu failed. I'm reverting to investigate. llvm-svn: 295042
* [LoopVectorize] Added address space check when analysing interleaved accessesKarl-Johan Karlsson2017-02-141-14/+20
| | | | | | | | | | | | | | | | | Prevent memory objects of different address spaces to be part of the same load/store groups when analysing interleaved accesses. This is fixing pr31900. Reviewers: HaoLiu, mssimpso, mkuper Reviewed By: mssimpso, mkuper Subscribers: llvm-commits, efriedma, mzolotukhin Differential Revision: https://reviews.llvm.org/D29717 llvm-svn: 295038
* Revert "[LV] Extend trunc optimization to all IVs with constant integer steps"Matthew Simpson2017-02-131-20/+12
| | | | | | | | This reverts commit r294967. This patch caused execution time slowdowns in a few LLVM test-suite tests, as reported by the clang-cmake-aarch64-quick bot. I'm reverting to investigate. llvm-svn: 294973
* [LV] Extend trunc optimization to all IVs with constant integer stepsMatthew Simpson2017-02-131-12/+20
| | | | | | | | | | | | | | This patch extends the optimization of truncations whose operand is an induction variable with a constant integer step. Previously we were only applying this optimization to the primary induction variable. However, the cost model assumes the optimization is applied to the truncation of all integer induction variables (even regardless of step type). The transformation is now applied to the other induction variables, and I've updated the cost model to ensure it is better in sync with the transformation we actually perform. Differential Revision: https://reviews.llvm.org/D29847 llvm-svn: 294967
* [SLP] Fix for PR31690: Allow using of extra values in horizontalAlexey Bataev2017-02-131-18/+130
| | | | | | | | | | | | | | | | | | | | | | | reductions. Currently, LLVM supports vectorization of horizontal reduction instructions with initial value set to 0. Patch supports vectorization of reduction with non-zero initial values. Also, it supports a vectorization of instructions with some extra arguments, like: ``` float f(float x[], int a, int b) { float p = a % b; p += x[0] + 3; for (int i = 1; i < 32; i++) p += x[i]; return p; } ``` Patch allows vectorization of this kind of horizontal reductions. Differential Revision: https://reviews.llvm.org/D29727 llvm-svn: 294934
* Encode duplication factor from loop vectorization and loop unrolling to ↵Dehao Chen2017-02-101-6/+12
| | | | | | | | | | | | | | | | | | | | | discriminator. Summary: This patch starts the implementation as discuss in the following RFC: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html When optimization duplicates code that will scale down the execution count of a basic block, we will record the duplication factor as part of discriminator so that the offline process tool can find the duplication factor and collect the accurate execution frequency of the corresponding source code. Two important optimization that fall into this category is loop vectorization and loop unroll. This patch records the duplication factor for these 2 optimizations. The recording will be guarded by a flag encode-duplication-in-discriminators, which is off by default. Reviewers: probinson, aprantl, davidxl, hfinkel, echristo Reviewed By: hfinkel Subscribers: mehdi_amini, anemet, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D26420 llvm-svn: 294782
* [LV] Remove type restriction for vector phi creationMatthew Simpson2017-02-101-2/+1
| | | | | | | | | We previously only created a vector phi node for an induction variable if its type matched the type of the canonical induction variable. Differential Revision: https://reviews.llvm.org/D29776 llvm-svn: 294755
* [Loop Vectorizer] Cost-based decision for vectorization form of memory ↵Elena Demikhovsky2017-02-081-269/+485
| | | | | | | | | | | | | | | | | | instruction. Making the cost model selecting between Interleave, GatherScatter or Scalar vectorization form of memory instruction. The right decision should be done for non-consecutive memory access instrcuctions that may have more than one vectorization solution. This patch includes the following changes: - Cost Model calculates the cost of Load/Store vector form and choose the better option between Widening, Interleave, GatherScactter and Scalarization. Cost Model keeps the widening decision. - Arrays of Uniform and Scalar values are moved from Legality to Cost Model. - Cost Model collects Uniforms and Scalars per VF. The collection is based on CM decision map of Loadis/Stores vectorization form. - Vectorization of memory instruction is performed according to the CM decision. Differential Revision: https://reviews.llvm.org/D27919 llvm-svn: 294503
* [SLP] Revert "Allow using of extra values in horizontal reductions."Michael Kuperstein2017-02-061-67/+12
| | | | | | | | This breaks when one of the extra values is also a scalar that participates in the same vectorization tree which we'll end up reducing. llvm-svn: 294245
* [SLP] Make sortMemAccesses explicitly return an error. NFC.Michael Kuperstein2017-02-031-12/+15
| | | | llvm-svn: 294029
* [SLP] Fix for PR31690: Allow using of extra values in horizontal reductions.Alexey Bataev2017-02-031-12/+67
| | | | | | | | | | | | | | | | | | | | | Currently LLVM supports vectorization of horizontal reduction instructions with initial value set to 0. Patch supports vectorization of reduction with non-zero initial values. Also it supports a vectorization of instructions with some extra arguments, like: float f(float x[], int a, int b) { float p = a % b; p += x[0] + 3; for (int i = 1; i < 32; i++) p += x[i]; return p; } Patch allows vectorization of this kind of horizontal reductions. Differential Revision: https://reviews.llvm.org/D28961 llvm-svn: 293994
* [LV] Also port failure remarks to new OptimizationRemarkEmitter APIAdam Nemet2017-02-021-6/+10
| | | | llvm-svn: 293866
* [LV] Move interleaved access helper functions to VectorUtils (NFC)Matthew Simpson2017-02-011-99/+3
| | | | | | | | | | | | This patch moves some helper functions related to interleaved access vectorization out of LoopVectorize.cpp and into VectorUtils.cpp. We would like to use these functions in a follow-on patch that improves interleaved load and store lowering in (ARM/AArch64)ISelLowering.cpp. One of the functions was already duplicated there and has been removed. Differential Revision: https://reviews.llvm.org/D29398 llvm-svn: 293788
* [LoopVectorize] Improve getVectorCallCost() getScalarizationOverhead() call.Jonas Paulsson2017-01-301-19/+8
| | | | | | | | | | | | | | By calling getScalarizationOverhead with the CallInst instead of the types of its arguments, we make sure that only unique call arguments are added to the scalarization cost. getScalarizationOverhead() is extended to handle calls by only passing on the actual call arguments (which is not all the operands). This also eliminates a wrapper function with the same name. review: Hal Finkel llvm-svn: 293459
OpenPOWER on IntegriCloud