path: root/llvm/lib/Transforms/Vectorize
Commit message (Author, Age; Files, Lines)
...
* [SLP] General improvements of SLP vectorization process. (Alexey Bataev, 2017-08-07; 1 file, -106/+109)
  Summary: This patch tries to improve the two-pass vectorization analysis in the SLP vectorizer. What it does:
  1. Defines key nodes, which are the vectorization roots. Previously, vectorization started when a StoreInst or ReturnInst was found. Now vectorization starts for all Instructions with no users and void type (Terminators, StoreInst), plus CallInsts.
  2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in an array. This array is processed only after vectorization of the first key node following these instructions has finished. Vectorization goes in reverse order to try to vectorize as much code as possible.
  Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon
  Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits
  Differential Revision: https://reviews.llvm.org/D29826
  llvm-svn: 310255
* [SLPVectorizer] Add extra parameter to setInsertPointAfterBundle to handle different opcodes, NFCI. (Dinar Temirbulatov, 2017-08-05; 1 file, -23/+54)
  Differential Revision: https://reviews.llvm.org/D35769
  llvm-svn: 310183
* LV: Don't insert runtime ptr checks on divergent targets (Matt Arsenault, 2017-08-02; 1 file, -0/+12)
  llvm-svn: 309890
* [SLPVectorizer] Generalize interface of functions, NFC. (Alexey Bataev, 2017-08-02; 1 file, -11/+15)
  llvm-svn: 309816
* [SLP] Fix for PR31880: shuffle and vectorize repeated scalar ops on extracted elements (Alexey Bataev, 2017-08-02; 1 file, -0/+133)
  Summary: Currently, most of the time, vectors of extractelement instructions are treated as scalars that must be gathered into vectors. But in some cases, such as when the extractelement instructions read from a single vector with different constant indices, or from 2 vectors of the same size, we can treat these operations as a shuffle of a single vector or a blend of 2 vectors.
  ```
  define <2 x i8> @g(<2 x i8> %x, <2 x i8> %y) {
    %x0 = extractelement <2 x i8> %x, i32 0
    %y1 = extractelement <2 x i8> %y, i32 1
    %x0x0 = mul i8 %x0, %x0
    %y1y1 = mul i8 %y1, %y1
    %ins1 = insertelement <2 x i8> undef, i8 %x0x0, i32 0
    %ins2 = insertelement <2 x i8> %ins1, i8 %y1y1, i32 1
    ret <2 x i8> %ins2
  }
  ```
  can be converted to something like
  ```
  define <2 x i8> @g(<2 x i8> %x, <2 x i8> %y) {
    %1 = shufflevector <2 x i8> %x, <2 x i8> %y, <2 x i32> <i32 0, i32 3>
    %2 = mul <2 x i8> %1, %1
    ret <2 x i8> %2
  }
  ```
  Currently this type of conversion is considered a high-cost transformation.
  Reviewers: mzolotukhin, delena, mkuper, hfinkel, RKSimon
  Subscribers: ashahid, RKSimon, spatel, llvm-commits
  Differential Revision: https://reviews.llvm.org/D30200
  llvm-svn: 309812
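The shuffle mask in the example above can be derived mechanically. The following is a minimal Python sketch, not the SLP vectorizer's code: the helper names `shuffle_mask`, `shufflevector` and `mul_i8` are made up for illustration, and the only LLVM fact relied on is that shufflevector numbers the lanes of its second operand starting at the vector width.

```python
def shuffle_mask(extracts, width):
    """Build a shufflevector-style mask from (source, lane) pairs.

    source is 0 for the first input vector and 1 for the second; lanes of
    the second vector are numbered starting at `width`.
    """
    return [source * width + lane for source, lane in extracts]


def shufflevector(v1, v2, mask):
    """Select elements from the concatenation v1 + v2 by mask index."""
    concat = list(v1) + list(v2)
    return [concat[i] for i in mask]


def mul_i8(a, b):
    """Elementwise i8 multiply with wraparound (modulo 256)."""
    return [(p * q) & 0xFF for p, q in zip(a, b)]


x, y = [7, 200], [13, 21]

# Scalar form: extract x[0] and y[1], square each, reassemble.
scalar = [(x[0] * x[0]) & 0xFF, (y[1] * y[1]) & 0xFF]

# Vector form: one shuffle followed by one vector multiply.
mask = shuffle_mask([(0, 0), (1, 1)], width=2)
blended = shufflevector(x, y, mask)
vector = mul_i8(blended, blended)
```

Here `mask` comes out as `[0, 3]`, matching the `<i32 0, i32 3>` operand in the converted IR, and both forms compute the same two lanes.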
* [SLPVectorizer] Unbreak the build with -Werror. (Davide Italiano, 2017-07-31; 1 file, -3/+3)
  GCC was complaining about `&&` within `||` without explicit parentheses. NFCI.
  llvm-svn: 309606
* [SLP] Initial rework for min/max horizontal reduction vectorization, NFC. (Alexey Bataev, 2017-07-31; 1 file, -47/+158)
  Summary: All getReductionCost() functions are renamed to getArithmeticReductionCost(), and basic infrastructure is added to handle non-binary reduction operations.
  Reviewers: spatel, mzolotukhin, Ayal, mkuper, gilr, hfinkel
  Subscribers: RKSimon, llvm-commits
  Differential Revision: https://reviews.llvm.org/D29402
  llvm-svn: 309566
* [Cost] Rename getReductionCost() to getArithmeticReductionCost(), NFC. (Alexey Bataev, 2017-07-31; 1 file, -2/+4)
  llvm-svn: 309563
* [LV] Avoid redundant operations manipulating masks (Ayal Zaks, 2017-07-31; 1 file, -36/+37)
  The Loop Vectorizer generates redundant operations when manipulating masks: AND with true, OR with false, compare equal to true. Instead of relying on a subsequent pass to clean them up, this patch avoids generating them.
  Use null (no-mask) to represent all-one full masks, instead of a constant all-one vector, following the convention of masked gathers and scatters.
  Preparing for a follow-up VPlan patch in which these mask manipulating operations are modeled using recipes.
  Differential Revision: https://reviews.llvm.org/D35725
  llvm-svn: 309558
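The null-mask convention can be illustrated with a small Python model. This is only a sketch of the idea, under the assumption that `None` plays the role of the null (all-one) mask; `mask_and`, `mask_or` and `block_mask` are hypothetical helpers, not functions from LoopVectorize.

```python
def mask_and(a, b):
    """AND two masks; None stands for the all-true (null) mask."""
    if a is None:
        return b  # true AND b == b, so no operation needs to be emitted
    if b is None:
        return a
    return [p and q for p, q in zip(a, b)]


def mask_or(a, b):
    """OR two masks; an all-true operand makes the result all-true."""
    if a is None or b is None:
        return None
    return [p or q for p, q in zip(a, b)]


def block_mask(incoming_edge_masks):
    """Combine the masks of all incoming edges of a block.

    Starting from the first edge mask, rather than from an all-false
    constant, avoids generating an initial 'OR with false'.
    """
    result = incoming_edge_masks[0]
    for edge in incoming_edge_masks[1:]:
        result = mask_or(result, edge)
    return result
```

With this representation, an unconditionally executed block simply carries `None`, and combining it with a real mask returns that mask unchanged instead of emitting an AND with a constant all-one vector.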
* [SLP] Allow vectorization of the instruction from the same basic blocks only, NFC. (Alexey Bataev, 2017-07-28; 1 file, -3/+8)
  Summary: After some changes in the SLP vectorizer we lost some additional checks that limit the instructions considered for vectorization. We should not analyze an instruction if its parent is not the same as the parent of the first instruction in the tree, or if it was already analyzed.
  Subscribers: mzolotukhin
  Differential Revision: https://reviews.llvm.org/D34881
  llvm-svn: 309425
* [SLP] Outline code for the check that instruction users are part of vectorization tree, NFC. (Alexey Bataev, 2017-07-27; 1 file, -4/+11)
  llvm-svn: 309284
* [SLPVectorizer] Replace E->Scalars to VL0 at vectorizeTree and move comment, NFCI. (Dinar Temirbulatov, 2017-07-21; 1 file, -4/+3)
  llvm-svn: 308750
* [SLPVectorizer] buildTree_rec replace cast<Instruction>(VL[0]) to VL0, NFCI. (Dinar Temirbulatov, 2017-07-21; 1 file, -4/+4)
  llvm-svn: 308745
* [SLPVectorizer] Change canReuseExtract function parameter Opcode from unsigned to Value *, NFCI. (Dinar Temirbulatov, 2017-07-21; 1 file, -19/+15)
  llvm-svn: 308739
* [LV] Test once if vector trip count is zero, instead of twice (Ayal Zaks, 2017-07-19; 1 file, -44/+20)
  Generate a single test to decide if there are enough iterations to jump to the vectorized loop, or else go to the scalar remainder loop. This test compares the Scalar Trip Count: if STC < VF * UF go to the scalar loop.
  If requiresScalarEpilogue() holds, at least one iteration must remain scalar; the rest can be used to form vector iterations. So in this case the test checks instead if (STC - 1) < VF * UF by comparing STC <= VF * UF, and going to the scalar loop if so. Otherwise the vector loop is entered for at least one vector iteration.
  This test covers the case where incrementing the backedge-taken count will overflow, leading to an incorrect trip count of zero. In this (rare) case we will also avoid the vector loop and jump to the scalar loop.
  This patch simplifies the existing tests and effectively removes the basic-block originally named "min.iters.checked", leaving the single test in block "vector.ph".
  Original observation and initial patch by Evgeny Stupachenko.
  Differential Revision: https://reviews.llvm.org/D34150
  llvm-svn: 308421
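The single test described above can be modelled in a few lines of Python. This is a sketch of the comparison only, assuming a 32-bit trip count; `scalar_trip_count` and `enters_vector_loop` are illustrative names, not the vectorizer's functions.

```python
def scalar_trip_count(backedge_taken_count, bits=32):
    """STC = backedge-taken count + 1, computed in fixed-width arithmetic.

    When the backedge-taken count is the maximum value, the increment
    wraps around and yields an (incorrect) trip count of zero.
    """
    return (backedge_taken_count + 1) % (1 << bits)


def enters_vector_loop(stc, vf, uf, requires_scalar_epilogue=False):
    """Return True if the vector loop is entered, False for the scalar loop."""
    step = vf * uf
    if requires_scalar_epilogue:
        # One iteration must stay scalar: (STC - 1) < VF * UF is checked
        # as STC <= VF * UF.
        return not (stc <= step)
    return not (stc < step)
```

For example, with VF = 4 and UF = 2, a trip count of 8 enters the vector loop in the plain case but not when a scalar epilogue is required, and a wrapped trip count of 0 always falls through to the scalar loop.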
* Remove unnecessary cast. NFCI. (Simon Pilgrim, 2017-07-17; 1 file, -1/+1)
  llvm-svn: 308166
* [SLPVectorizer] Add an extra parameter to tryScheduleBundle function, NFCI. (Dinar Temirbulatov, 2017-07-15; 1 file, -6/+6)
  llvm-svn: 308081
* [SLPVectorizer] Add an extra parameter to alreadyVectorized function, NFCI. (Dinar Temirbulatov, 2017-07-14; 1 file, -8/+8)
  llvm-svn: 307996
* [LV] Don't allow outside uses of IVs if the SCEV is predicated on loop conditions. (Michael Kuperstein, 2017-07-12; 1 file, -2/+7)
  This fixes PR33706.
  Differential Revision: https://reviews.llvm.org/D35227
  llvm-svn: 307837
* [SLPVectorizer] Revert change in cancelScheduling with referencing to FirstInBundle, NFCI. (Dinar Temirbulatov, 2017-07-11; 1 file, -1/+1)
  llvm-svn: 307667
* [SLPVectorizer] Add an extra parameter to cancelScheduling function, NFCI. (Dinar Temirbulatov, 2017-07-05; 1 file, -22/+23)
  llvm-svn: 307158
* Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by default." (Teresa Johnson, 2017-07-01; 1 file, -1/+1)
  This still breaks PPC tests we have. I'll forward reproduction instructions to dehao.
  llvm-svn: 306936
* re-commit r306336: Enable vectorizer-maximize-bandwidth by default. (Teresa Johnson, 2017-07-01; 1 file, -1/+1)
  Differential Revision: https://reviews.llvm.org/D33341
  llvm-svn: 306935
* revert r306336 for breaking ppc test. (Teresa Johnson, 2017-07-01; 1 file, -1/+1)
  llvm-svn: 306934
* Enable vectorizer-maximize-bandwidth by default. (Teresa Johnson, 2017-07-01; 1 file, -1/+1)
  Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of enabling this by default on SPEC CPU benchmarks on Sandy Bridge machines. The results show a non-negative impact:
    spec/2006/fp/C++/444.namd          26.84   -0.31%
    spec/2006/fp/C++/447.dealII        46.19   +0.89%
    spec/2006/fp/C++/450.soplex        42.92   -0.44%
    spec/2006/fp/C++/453.povray        38.57   -2.25%
    spec/2006/fp/C/433.milc            24.54   -0.76%
    spec/2006/fp/C/470.lbm             41.08   +0.26%
    spec/2006/fp/C/482.sphinx3         47.58   -0.99%
    spec/2006/int/C++/471.omnetpp      22.06   +1.87%
    spec/2006/int/C++/473.astar        22.65   -0.12%
    spec/2006/int/C++/483.xalancbmk    33.69   +4.97%
    spec/2006/int/C/400.perlbench      33.43   +1.70%
    spec/2006/int/C/401.bzip2          23.02   -0.19%
    spec/2006/int/C/403.gcc            32.57   -0.43%
    spec/2006/int/C/429.mcf            40.35   +0.27%
    spec/2006/int/C/445.gobmk          26.96   +0.06%
    spec/2006/int/C/456.hmmer          24.4    +0.19%
    spec/2006/int/C/458.sjeng          27.91   -0.08%
    spec/2006/int/C/462.libquantum     57.47   -0.20%
    spec/2006/int/C/464.h264ref        46.52   +1.35%
    geometric mean                             +0.29%
  The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag.
  I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decide if this should be target-dependent.
  Reviewers: hfinkel, mkuper, davidxl, chandlerc
  Reviewed By: chandlerc
  Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin
  Differential Revision: https://reviews.llvm.org/D33341
  llvm-svn: 306933
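The reported geometric mean can be cross-checked from the per-benchmark deltas. The sketch below assumes each percentage is a relative change and that the summary is the geometric mean of the corresponding ratios (1 + delta); the commit message does not say how the figure was actually computed, so this is one plausible reading, and `geometric_mean_percent` is an illustrative name.

```python
import math

# Per-benchmark deltas in percent, in the order listed in the commit message.
deltas_percent = [
    -0.31, 0.89, -0.44, -2.25, -0.76, 0.26, -0.99,        # SPEC2006 fp
    1.87, -0.12, 4.97, 1.70, -0.19, -0.43, 0.27, 0.06,    # SPEC2006 int
    0.19, -0.08, -0.20, 1.35,
]


def geometric_mean_percent(deltas):
    """Geometric mean of the ratios (1 + d/100), returned as a percent delta."""
    logs = [math.log1p(d / 100.0) for d in deltas]
    return math.expm1(sum(logs) / len(logs)) * 100.0


overall = geometric_mean_percent(deltas_percent)
```

Under that reading, `overall` lands within rounding distance of the reported +0.29%.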
* [SLPVectorizer] Add isOdd() helper function, NFCI. (Dinar Temirbulatov, 2017-06-30; 1 file, -2/+7)
  llvm-svn: 306887
* [LV] Sink casts to unravel first order recurrence (Ayal Zaks, 2017-06-30; 1 file, -1/+17)
  Check if a single cast is preventing handling a first-order-recurrence Phi, because the scheduling constraints it imposes on the first-order-recurrence shuffle are infeasible; but they can be made feasible by moving the cast downwards. Record such casts and move them when vectorizing the loop.
  Differential Revision: https://reviews.llvm.org/D33058
  llvm-svn: 306884
* [LV] Optimize for size when vectorizing loops with tiny trip count (Ayal Zaks, 2017-06-30; 1 file, -29/+30)
  It may be detrimental to vectorize loops with very small trip count, as various costs of the vectorized loop body as well as enclosing overheads including runtime tests and scalar iterations may outweigh the gains of vectorizing. The current cost model measures the cost of the vectorized loop body only, expecting it will amortize other costs, and loops with known or expected very small trip counts are not vectorized at all. This patch allows loops with very small trip counts to be vectorized, but under OptForSize constraints, which ensure the cost of the loop body is dominant, having no runtime guards nor scalar iterations.
  Patch inspired by D32451.
  Differential Revision: https://reviews.llvm.org/D34373
  llvm-svn: 306803
* Remove the BBVectorize pass. (Chandler Carruth, 2017-06-30; 3 files, -3285/+1)
  It served us well, helped kick-start much of the vectorization efforts in LLVM, etc. Its time has come and passed. Back in 2014: http://lists.llvm.org/pipermail/llvm-dev/2014-November/079091.html
  Time to actually let go and move forward. =]
  I've updated the release notes both about the removal and the deprecation of the corresponding C API.
  llvm-svn: 306797
* Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by default." (Daniel Jasper, 2017-06-30; 1 file, -1/+1)
  This still breaks PPC tests we have. I'll forward reproduction instructions to dehao.
  llvm-svn: 306792
* [SLPVectorizer] Moving Entry->NeedToGather check out of inner loop, since it is invariant there. NFCI. (Dinar Temirbulatov, 2017-06-29; 1 file, -4/+4)
  llvm-svn: 306749
* [SLPVectorizer] Introducing getTreeEntry() helper function [NFC] (Dinar Temirbulatov, 2017-06-29; 1 file, -34/+33)
  Differential Revision: https://reviews.llvm.org/D34756
  llvm-svn: 306655
* [LV] Fix PR33613 - retain order of insertelement per part (Ayal Zaks, 2017-06-28; 1 file, -6/+7)
  r306381 caused PR33613, by reversing the order in which insertelements were generated per unroll part. This patch fixes PR33613 by retaining this order, placing each set of insertelements per part immediately after the last scalar being packed for this part. Includes a test case derived from PR33613.
  Reference: https://bugs.llvm.org/show_bug.cgi?id=33613
  Differential Revision: https://reviews.llvm.org/D34760
  llvm-svn: 306575
* re-commit r306336: Enable vectorizer-maximize-bandwidth by default. (Dehao Chen, 2017-06-27; 1 file, -1/+1)
  Differential Revision: https://reviews.llvm.org/D33341
  llvm-svn: 306473
* Recommitting 306331. (Ayal Zaks, 2017-06-27; 1 file, -287/+300)
  Undoing revert 306338 after fixing the bug: add metadata to the load instead of the reverse shuffle added to it, retaining the original ValueMap implementation.
  llvm-svn: 306381
* revert r306336 for breaking ppc test. (Dehao Chen, 2017-06-26; 1 file, -1/+1)
  llvm-svn: 306344
* reverting 306331. (Ayal Zaks, 2017-06-26; 1 file, -293/+286)
  Causes TBAA metadata to be generated on reverse shuffles, investigating.
  llvm-svn: 306338
* Enable vectorizer-maximize-bandwidth by default. (Dehao Chen, 2017-06-26; 1 file, -1/+1)
  Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of enabling this by default on SPEC CPU benchmarks on Sandy Bridge machines. The results show a non-negative impact:
    spec/2006/fp/C++/444.namd          26.84   -0.31%
    spec/2006/fp/C++/447.dealII        46.19   +0.89%
    spec/2006/fp/C++/450.soplex        42.92   -0.44%
    spec/2006/fp/C++/453.povray        38.57   -2.25%
    spec/2006/fp/C/433.milc            24.54   -0.76%
    spec/2006/fp/C/470.lbm             41.08   +0.26%
    spec/2006/fp/C/482.sphinx3         47.58   -0.99%
    spec/2006/int/C++/471.omnetpp      22.06   +1.87%
    spec/2006/int/C++/473.astar        22.65   -0.12%
    spec/2006/int/C++/483.xalancbmk    33.69   +4.97%
    spec/2006/int/C/400.perlbench      33.43   +1.70%
    spec/2006/int/C/401.bzip2          23.02   -0.19%
    spec/2006/int/C/403.gcc            32.57   -0.43%
    spec/2006/int/C/429.mcf            40.35   +0.27%
    spec/2006/int/C/445.gobmk          26.96   +0.06%
    spec/2006/int/C/456.hmmer          24.4    +0.19%
    spec/2006/int/C/458.sjeng          27.91   -0.08%
    spec/2006/int/C/462.libquantum     57.47   -0.20%
    spec/2006/int/C/464.h264ref        46.52   +1.35%
    geometric mean                             +0.29%
  The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag.
  I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decide if this should be target-dependent.
  Reviewers: hfinkel, mkuper, davidxl, chandlerc
  Reviewed By: chandlerc
  Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin
  Differential Revision: https://reviews.llvm.org/D33341
  llvm-svn: 306336
* [LV] Changing the interface of ValueMap, NFC. (Ayal Zaks, 2017-06-26; 1 file, -286/+293)
  Instead of providing access to the internal MapStorage holding all Values associated with a given Key, used for setting or resetting them all together, ValueMap keeps its MapStorage internal; its new interface allows getting, setting or resetting a single Value, per part or per part-and-lane.
  Follows the discussion in https://reviews.llvm.org/D32871.
  Differential Revision: https://reviews.llvm.org/D34473
  llvm-svn: 306331
* Revert "Enable vectorizer-maximize-bandwidth by default." (Diana Picus, 2017-06-22; 1 file, -1/+1)
  This reverts commit r305960 because it broke self-hosting on AArch64.
  llvm-svn: 305990
* Enable vectorizer-maximize-bandwidth by default. (Dehao Chen, 2017-06-21; 1 file, -1/+1)
  Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of enabling this by default on SPEC CPU benchmarks on Sandy Bridge machines. The results show a non-negative impact:
    spec/2006/fp/C++/444.namd          26.84   -0.31%
    spec/2006/fp/C++/447.dealII        46.19   +0.89%
    spec/2006/fp/C++/450.soplex        42.92   -0.44%
    spec/2006/fp/C++/453.povray        38.57   -2.25%
    spec/2006/fp/C/433.milc            24.54   -0.76%
    spec/2006/fp/C/470.lbm             41.08   +0.26%
    spec/2006/fp/C/482.sphinx3         47.58   -0.99%
    spec/2006/int/C++/471.omnetpp      22.06   +1.87%
    spec/2006/int/C++/473.astar        22.65   -0.12%
    spec/2006/int/C++/483.xalancbmk    33.69   +4.97%
    spec/2006/int/C/400.perlbench      33.43   +1.70%
    spec/2006/int/C/401.bzip2          23.02   -0.19%
    spec/2006/int/C/403.gcc            32.57   -0.43%
    spec/2006/int/C/429.mcf            40.35   +0.27%
    spec/2006/int/C/445.gobmk          26.96   +0.06%
    spec/2006/int/C/456.hmmer          24.4    +0.19%
    spec/2006/int/C/458.sjeng          27.91   -0.08%
    spec/2006/int/C/462.libquantum     57.47   -0.20%
    spec/2006/int/C/464.h264ref        46.52   +1.35%
    geometric mean                             +0.29%
  The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag.
  I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decide if this should be target-dependent.
  Reviewers: hfinkel, mkuper, davidxl, chandlerc
  Reviewed By: chandlerc
  Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin
  Differential Revision: https://reviews.llvm.org/D33341
  llvm-svn: 305960
* Improve profile-guided heuristics to use estimated trip count. (Taewook Oh, 2017-06-19; 1 file, -27/+20)
  Summary: The existing heuristic uses the ratio between the function entry frequency and the loop invocation frequency to find cold loops. However, even if the loop executes frequently, vectorization is not beneficial if it has a small trip count per invocation. On the other hand, even if the loop invocation frequency is much smaller than the function invocation frequency, it is still beneficial to vectorize the loop if the trip count is high.
  This patch uses the estimated trip count computed from the profile metadata as the primary metric to determine coldness of the loop. If the estimated trip count cannot be computed, it falls back to the original heuristic.
  Reviewers: Ayal, mssimpso, mkuper, danielcdh, wmi, tejohnson
  Reviewed By: tejohnson
  Subscribers: tejohnson, mzolotukhin, llvm-commits
  Differential Revision: https://reviews.llvm.org/D32451
  llvm-svn: 305729
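The decision order described above (estimated trip count first, frequency ratio as the fallback) can be sketched as follows. This is only an illustration: `is_cold_loop` and both threshold values are hypothetical and are not taken from the patch.

```python
def is_cold_loop(estimated_trip_count, loop_freq, entry_freq,
                 tiny_trip_count=16, cold_ratio=0.2):
    """Decide whether a loop should be treated as cold.

    estimated_trip_count is None when it cannot be computed from profile
    metadata. tiny_trip_count and cold_ratio are made-up thresholds.
    """
    if estimated_trip_count is not None:
        # Primary metric: per-invocation trip count from the profile.
        return estimated_trip_count < tiny_trip_count
    # Fallback: the original ratio of loop frequency to entry frequency.
    return loop_freq < cold_ratio * entry_freq
```

A frequently invoked loop with only a few iterations per invocation is classified as cold, while a rarely invoked loop with a high trip count is not; the frequency ratio matters only when no trip count estimate is available.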
* Remove brackets, NFC. (Dinar Temirbulatov, 2017-06-19; 1 file, -4/+2)
  llvm-svn: 305706
* [LoopVectorize] Don't preserve nsw/nuw flags on shrunken ops. (George Burgess IV, 2017-06-09; 1 file, -1/+5)
  If we're shrinking a binary operation, it may be the case that the new operation wraps where the old one didn't. If this happens, the behavior should be well-defined. So, we can't always carry wrapping flags with us when we shrink operations.
  If we do, we get incorrect optimizations in cases like:
      void foo(const unsigned char *from, unsigned char *to, int n) {
        for (int i = 0; i < n; i++)
          to[i] = from[i] - 128;
      }
  which gets optimized to:
      void foo(const unsigned char *from, unsigned char *to, int n) {
        for (int i = 0; i < n; i++)
          to[i] = from[i] | 128;
      }
  Because:
  - InstCombine turned `sub i32 %from.i, 128` into `add nuw nsw i32 %from.i, 128`.
  - LoopVectorize vectorized the add to be `add nuw nsw <16 x i8>` with a vector full of `i8 128`s.
  - InstCombine took advantage of the fact that the newly-shrunken add "couldn't wrap", and changed the `add` to an `or`.
  InstCombine seems happy to figure out whether we can add nuw/nsw on its own, so I just decided to drop the flags. There are already a number of places in LoopVectorize where we rely on InstCombine to clean up.
  llvm-svn: 305053
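The miscompile is easy to reproduce with plain 8-bit arithmetic. The Python sketch below models only the i8 values involved, not the passes themselves; `to_i8` is an illustrative helper that truncates to 8 bits.

```python
def to_i8(value):
    """Truncate an integer to 8 bits, as storing to unsigned char does."""
    return value & 0xFF


# Inputs for which `from[i] | 128` differs from the intended `from[i] - 128`.
mismatches = [b for b in range(256) if to_i8(b - 128) != (b | 128)]

# A wrapping 8-bit add of 128 is still a correct replacement for the sub.
wrapping_add_ok = all(to_i8(b - 128) == to_i8(b + 128) for b in range(256))

# The add-to-or rewrite is only valid when the add cannot wrap, i.e. when
# bit 7 of the input is clear. This is what the stale nuw flag asserted.
or_ok_without_wrap = all(b + 128 == (b | 128) for b in range(128))
```

Every input with the top bit set (128 through 255) ends up in `mismatches`, which is exactly the range where the shrunken `add` wraps and the carried-over `nuw` flag no longer holds.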
* Sort the remaining #include lines in include/... and lib/.... (Chandler Carruth, 2017-06-06; 2 files, -2/+2)
  I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days.
  I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch.
  This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files.
  Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again).
  llvm-svn: 304787
* [LV] Make scalarizeInstruction() non-virtual. NFC. (Ayal Zaks, 2017-06-04; 1 file, -2/+1)
  Following the request made in https://reviews.llvm.org/D32871, scalarizeInstruction() which is no longer overridden by InnerLoopUnroller is hereby made non-virtual in InnerLoopVectorizer.
  Should have been part of r297580 originally.
  llvm-svn: 304685
* Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC. (Galina Kistanova, 2017-06-03; 2 files, -0/+3)
  llvm-svn: 304636
* [SLP] Improve comments and naming of functions/variables/members, NFC. (Alexey Bataev, 2017-06-03; 1 file, -91/+59)
  Fixed some comments, added an additional description of the algorithms, improved readability of the code.
  Differential revision: https://reviews.llvm.org/D33320
  llvm-svn: 304616
* Revert "[SLP] Improve comments and naming of functions/variables/members, NFC." (Alexey Bataev, 2017-06-02; 1 file, -59/+91)
  This reverts commit 6e311de8b907aa20da9a1a13ab07c3ce2ef4068a.
  llvm-svn: 304609
* [SLP] Improve comments and naming of functions/variables/members, NFC. (Alexey Bataev, 2017-06-02; 1 file, -91/+59)
  Summary: Fixed some comments, added an additional description of the algorithms, improved readability of the code.
  Reviewers: anemet
  Subscribers: llvm-commits, mzolotukhin
  Differential Revision: https://reviews.llvm.org/D33320
  llvm-svn: 304593