summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/LoopVectorize
Commit message (Collapse)AuthorAgeFilesLines
* [LV] Fix the vector code generation for first order recurrenceAnna Thomas2017-04-132-12/+49
| | | | | | | | | | | | | | | | | | | Summary: In first order recurrences where phi's are used outside the loop, we should generate an additional vector.extract of the second last element from the vectorized phi update. This is because we require the phi itself (which is the value at the second last iteration of the vector loop) and not the phi's update within the loop. Also fix the code gen when we just unroll, but don't vectorize. Fixes PR32396. Reviewers: mssimpso, mkuper, anemet Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D31979 llvm-svn: 300238
* [SystemZ] Fix more target specific testsRenato Golin2017-04-121-0/+2
| | | | llvm-svn: 300081
* [InstCombine] morph an existing instruction instead of creating a new oneSanjay Patel2017-04-121-2/+2
| | | | | | | | | | | | One potential way to make InstCombine (very slightly?) faster is to recycle instructions when possible instead of creating new ones. It's not explicitly stated AFAIK, but we don't consider this an "InstSimplify". We could, however, make a new layer to house transforms like this if that makes InstCombine more manageable (just throwing out an idea; not sure how much opportunity is actually here). Differential Revision: https://reviews.llvm.org/D31863 llvm-svn: 300067
* [LoopVectorizer] Improve handling of branches during cost estimation.Jonas Paulsson2017-04-121-0/+38
| | | | | | | | | | | | | | | The cost for a branch after vectorization is very different depending on if the vectorizer will if-convert the block (branch is eliminated), or if scalarized and predicated blocks will be produced (branch duplicated before each block). There is also the case of remaining scalar branches, such as the back-edge branch. This patch handles these cases differently with TTI based cost estimates. Review: Matthew Simpson https://reviews.llvm.org/D31175 llvm-svn: 300058
* [LoopVectorizer, TTI] New method supportsEfficientVectorElementLoadStore()Jonas Paulsson2017-04-121-0/+33
| | | | | | | | | | | | | | | | | | | Since SystemZ supports vector element load/store instructions, there is no need for extracts/inserts if a vector load/store gets scalarized. This patch lets Target specify that it supports such instructions by means of a new TTI hook that defaults to false. The use for this is in the LoopVectorizer getScalarizationOverhead() method, which will with this patch produce a smaller sum for a vector load/store on SystemZ. New test: test/Transforms/LoopVectorize/SystemZ/load-store-scalarization-cost.ll Review: Adam Nemet https://reviews.llvm.org/D30680 llvm-svn: 300056
* [SystemZ] TargetTransformInfo cost functions implemented.Jonas Paulsson2017-04-121-0/+70
| | | | | | | | | | | | | | | | getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(), getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(), getInterleavedMemoryOpCost() implemented. Interleaved access vectorization enabled. BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads, in which case the cost of the z/sext instruction becomes 0. Review: Ulrich Weigand, Renato Golin. https://reviews.llvm.org/D29631 llvm-svn: 300052
* [LV] Avoid vectorizing first order recurrence when phi uses are outside loopAnna Thomas2017-04-112-3/+28
| | | | | | | | | | | | | | | | | | In the vectorization of first order recurrence, we vectorize such that the last element in the vector will be the one extracted to pass into the scalar remainder loop. However, this is not true when there is a phi (other than the primary induction variable) is used outside the loop. In such a case, we need the value from the second last iteration (i.e. the phi value), not the last iteration (which would be the phi update). I've added a test case for this. Also see PR32396. A follow up patch would generate the correct code gen for such cases, and turn this vectorization on. Differential Revision: https://reviews.llvm.org/D31910 Reviewers: mssimpso llvm-svn: 299985
* [LV] Move first order recurrence test to common folder. NFCAnna Thomas2017-04-111-0/+0
| | | | llvm-svn: 299969
* Add address space mangling to lifetime intrinsicsMatt Arsenault2017-04-101-12/+12
| | | | | | In preparation for allowing allocas to have non-0 addrspace. llvm-svn: 299876
* Reapply r298620: [LV] Vectorize GEPsMatthew Simpson2017-04-074-107/+247
| | | | | | | | | | | | | This patch reapplies r298620. The original patch was reverted because of two issues. First, the patch exposed a bug in InstCombine that caused the Chromium builds to fail (PR32414). This issue was fixed in r299017. Second, the patch introduced a bug in the vectorizer's scalars analysis that caused test suite builds to fail on SystemZ. The scalars analysis was too aggressive and marked a memory instruction scalar, even though it was going to be vectorized. This issue has been fixed in the current patch and several new test cases for the scalars analysis have been added. llvm-svn: 299770
* [LV] Make test case more robustMatthew Simpson2017-04-051-33/+49
| | | | | | | | | | This test case depends on the loop being vectorized without forcing the vectorization factor. If the profitability ever changes in the future (due to cost model improvements), the test may no longer work as intended. Instead of checking the resulting IR, we should just check the instruction costs. The costs will be computed regardless if vectorization is profitable. llvm-svn: 299545
* [LV] Transform truncations of non-primary induction variablesMatthew Simpson2017-03-271-0/+45
| | | | | | | | | | | | The vectorizer tries to replace truncations of induction variables with new induction variables having the smaller type. After r295063, this optimization was applied to all integer induction variables, including non-primary ones. When optimizing the truncation of a non-primary induction variable, we still need to transform the new induction so that it has the correct start value. This should fix PR32419. Reference: https://bugs.llvm.org/show_bug.cgi?id=32419 llvm-svn: 298882
* Revert r298620: [LV] Vectorize GEPsIvan Krasin2017-03-243-104/+107
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Reason: breaks linking Chromium with LLD + ThinLTO (a pass crashes) LLVM bug: https://bugs.llvm.org//show_bug.cgi?id=32413 Original change description: [LV] Vectorize GEPs This patch adds support for vectorizing GEPs. Previously, we only generated vector GEPs on-demand when creating gather or scatter operations. All GEPs from the original loop were scalarized by default, and if a pointer was to be stored to memory, we would have to build up the pointer vector with insertelement instructions. With this patch, we will vectorize all GEPs that haven't already been marked for scalarization. The patch refines collectLoopScalars to more exactly identify the scalar GEPs. The function now more closely resembles collectLoopUniforms. And the patch moves vector GEP creation out of vectorizeMemoryInstruction and into the main vectorization loop. The vector GEPs needed for gather and scatter operations will have already been generated before vectoring the memory accesses. Original Differential Revision: https://reviews.llvm.org/D30710 llvm-svn: 298735
* [LV] Add regression test for r297610Gil Rapaport2017-03-231-0/+36
| | | | | | | | | The new test asserts that scalarized memory operations get memcheck metadata added even if the loop is only unrolled. Differential Revision: https://reviews.llvm.org/D30972 llvm-svn: 298641
* [LV] Vectorize GEPsMatthew Simpson2017-03-233-107/+104
| | | | | | | | | | | | | | | | | | | | | This patch adds support for vectorizing GEPs. Previously, we only generated vector GEPs on-demand when creating gather or scatter operations. All GEPs from the original loop were scalarized by default, and if a pointer was to be stored to memory, we would have to build up the pointer vector with insertelement instructions. With this patch, we will vectorize all GEPs that haven't already been marked for scalarization. The patch refines collectLoopScalars to more exactly identify the scalar GEPs. The function now more closely resembles collectLoopUniforms. And the patch moves vector GEP creation out of vectorizeMemoryInstruction and into the main vectorization loop. The vector GEPs needed for gather and scatter operations will have already been generated before vectoring the memory accesses. Differential Revision: https://reviews.llvm.org/D30710 llvm-svn: 298620
* [LV] Delete unneeded scalar GEP creation codeMatthew Simpson2017-03-232-3/+2
| | | | | | | | | | | | | The code for generating scalar base pointers in vectorizeMemoryInstruction is not needed. We currently scalarize all GEPs and maintain the scalarized values in VectorLoopValueMap. The GEP cloning in this unneeded code is the same as that in scalarizeInstruction. The test cases that changed as a result of this patch changed because we were able to reuse the scalarized GEP that we previously generated instead of cloning a new one. Differential Revision: https://reviews.llvm.org/D30587 llvm-svn: 298615
* AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernelMatt Arsenault2017-03-211-1/+1
| | | | | | | | | | | | Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444
* [TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improvedJonas Paulsson2017-03-142-4/+4
| | | | | | | | | | | | | | | | | | | | | getIntrinsicInstrCost() used to only compute scalarization cost based on types. This patch improves this so that the actual arguments are checked when they are available, in order to handle only unique non-constant operands. Tests updates: Analysis/CostModel/X86/arith-fp.ll Transforms/LoopVectorize/AArch64/interleaved_cost.ll Transforms/LoopVectorize/ARM/interleaved_cost.ll The improvement in getOperandsScalarizationOverhead() to differentiate on constants made it necessary to update the interleaved_cost.ll tests even though they do not relate to intrinsics. Review: Hal Finkel https://reviews.llvm.org/D29540 llvm-svn: 297705
* AMDGPU/SI: Disable unrolling in the loop vectorizer if the loop is not ↵Changpeng Fang2017-03-092-0/+30
| | | | | | | | | | | | vectorized. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D30719 llvm-svn: 297328
* [LV] Select legal insert point when fixing first-order recurrencesMatthew Simpson2017-03-081-0/+22
| | | | | | | | | | Because IRBuilder performs constant-folding, it's not guaranteed that an instruction in the original loop map to an instruction in the vector loop. It could map to a constant vector instead. The handling of first-order recurrences was incorrectly making this assumption when setting the IRBuilder's insert point. llvm-svn: 297302
* [LV] Make the test case for PR30183 less fragileMatthew Simpson2017-03-081-22/+31
| | | | | | | | This patch also renames the PR number the test points to. The previous reference was PR29559, but that bug was somehow deleted and recreated under PR30183. llvm-svn: 297295
* [LV] Add missing check labels to tests and reformatMatthew Simpson2017-03-081-115/+94
| | | | llvm-svn: 297294
* [LV] Consider users that are memory accesses in uniforms expansion stepMatthew Simpson2017-03-071-0/+50
| | | | | | | | | | | When expanding the set of uniform instructions beyond the seed instructions (e.g., consecutive pointers), we mark a new instruction uniform if all its loop-varying users are uniform. We should also allow users that are consecutive or interleaved memory accesses. This fixes cases where we have an instruction that is used as the pointer operand of a consecutive access but also used by a non-memory instruction that later becomes uniform as part of the expansion. llvm-svn: 297179
* [ARM/AArch64] Update costs for interleaved accesses with wide typesMatthew Simpson2017-03-022-44/+89
| | | | | | | | | After r296750, we're able to match interleaved accesses having types wider than 128 bits. This patch updates the associated TTI costs. Differential Revision: https://reviews.llvm.org/D29675 llvm-svn: 296751
* [LV] Considier non-consecutive but vectorizable accesses for VF selectionMatthew Simpson2017-03-021-0/+33
| | | | | | | | | | | | | | | | When computing the smallest and largest types for selecting the maximum vectorization factor, we currently ignore loads and stores of pointer types if the memory access is non-consecutive. We do this because such accesses must be scalarized regardless of vectorization factor, and thus shouldn't be considered when determining the factor. This patch makes this check less aggressive by also considering non-consecutive accesses that may be vectorized, such as interleaved accesses. Because we don't know at the time of the check if an accesses will certainly be vectorized (this is a cost model decision given a particular VF), we consider all accesses that can potentially be vectorized. Differential Revision: https://reviews.llvm.org/D30305 llvm-svn: 296747
* [LV] These remark should have been missed remarksAdam Nemet2017-03-011-1/+1
| | | | | | | | The practice in LV is that we emit analysis remarks and then finally report either a missed or applied remark on the final decision whether vectorization is taking place. On this code path, we were closing with an analysis remark. llvm-svn: 296578
* [AVX-512] Fix the execution domain for AVX-512 integer broadcasts.Craig Topper2017-02-261-1/+1
| | | | llvm-svn: 296290
* [LV] Merge floating-point and integer induction widening codeMatthew Simpson2017-02-241-63/+56
| | | | | | | | | | | This patch merges the existing floating-point induction variable widening code into the integer induction variable widening code, creating a single set of functions for both kinds of inductions. The primary motivation for doing this is to enable vector phi node creation for floating-point induction variables. Differential Revision: https://reviews.llvm.org/D30211 llvm-svn: 296145
* [LV] Update floating-point induction test checks (NFC)Matthew Simpson2017-02-221-57/+129
| | | | llvm-svn: 295885
* [LV] Add scalar floating-point induction test (NFC)Matthew Simpson2017-02-221-0/+58
| | | | llvm-svn: 295862
* [LoopVectorize] Added address space check when analysing interleaved accessesKarl-Johan Karlsson2017-02-221-0/+37
| | | | | | | | | | | | | | | | | | | | Prevent memory objects of different address spaces to be part of the same load/store groups when analysing interleaved accesses. This is fixing pr31900. Reviewers: HaoLiu, mssimpso, mkuper Reviewed By: mssimpso, mkuper Subscribers: llvm-commits, efriedma, mzolotukhin Differential Revision: https://reviews.llvm.org/D29717 This reverts r295042 (re-applies r295038) with an additional fix for the buildbot problem. llvm-svn: 295858
* Increases full-unroll threshold.Dehao Chen2017-02-181-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The default threshold for fully unroll is too conservative. This patch doubles the full-unroll threshold This change will affect the following speccpu2006 benchmarks (performance numbers were collected from Intel Sandybridge): Performance: 403 0.11% 433 0.51% 445 0.48% 447 3.50% 453 1.49% 464 0.75% Code size: 403 0.56% 433 0.96% 445 2.16% 447 2.96% 453 0.94% 464 8.02% The compiler time overhead is similar with code size. Reviewers: davidxl, mkuper, mzolotukhin, hfinkel, chandlerc Reviewed By: hfinkel, chandlerc Subscribers: mehdi_amini, zzheng, efriedma, haicheng, hfinkel, llvm-commits Differential Revision: https://reviews.llvm.org/D28368 llvm-svn: 295538
* [LV] Remove constant restriction for vector phi creationMatthew Simpson2017-02-171-12/+89
| | | | | | | | | | | | We previously only created a vector phi node for an induction variable if its step had a constant integer type. However, the step actually only needs to be loop-invariant. We only handle inductions having loop-invariant steps, so this patch should enable vector phi node creation for all integer induction variables that will be vectorized. Differential Revision: https://reviews.llvm.org/D29956 llvm-svn: 295456
* Reapply "[LV] Extend trunc optimization to all IVs with constant integer steps"Matthew Simpson2017-02-143-2/+64
| | | | | | | | | | | This reapplies commit r294967 with a fix for the execution time regressions caught by the clang-cmake-aarch64-quick bot. We now extend the truncate optimization to non-primary induction variables only if the truncate isn't already free. Differential Revision: https://reviews.llvm.org/D29847 llvm-svn: 295063
* Revert "[LoopVectorize] Added address space check when analysing interleaved ↵Karl-Johan Karlsson2017-02-141-37/+0
| | | | | | | | | accesses" This reverts r295038. The buildbot clang-with-thin-lto-ubuntu failed. I'm reverting to investigate. llvm-svn: 295042
* [LoopVectorize] Added address space check when analysing interleaved accessesKarl-Johan Karlsson2017-02-141-0/+37
| | | | | | | | | | | | | | | | | Prevent memory objects of different address spaces to be part of the same load/store groups when analysing interleaved accesses. This is fixing pr31900. Reviewers: HaoLiu, mssimpso, mkuper Reviewed By: mssimpso, mkuper Subscribers: llvm-commits, efriedma, mzolotukhin Differential Revision: https://reviews.llvm.org/D29717 llvm-svn: 295038
* Revert "[LV] Extend trunc optimization to all IVs with constant integer steps"Matthew Simpson2017-02-132-34/+2
| | | | | | | | This reverts commit r294967. This patch caused execution time slowdowns in a few LLVM test-suite tests, as reported by the clang-cmake-aarch64-quick bot. I'm reverting to investigate. llvm-svn: 294973
* [LV] Extend trunc optimization to all IVs with constant integer stepsMatthew Simpson2017-02-132-2/+34
| | | | | | | | | | | | | | This patch extends the optimization of truncations whose operand is an induction variable with a constant integer step. Previously we were only applying this optimization to the primary induction variable. However, the cost model assumes the optimization is applied to the truncation of all integer induction variables (even regardless of step type). The transformation is now applied to the other induction variables, and I've updated the cost model to ensure it is better in sync with the transformation we actually perform. Differential Revision: https://reviews.llvm.org/D29847 llvm-svn: 294967
* [LV/LoopAccess] Check statically if an unknown dependence distance can be Dorit Nuzman2017-02-122-4/+104
| | | | | | | | | | | | | | | | | | | | | | | proven larger than the loop-count This fixes PR31098: Try to resolve statically data-dependences whose compile-time-unknown distance can be proven larger than the loop-count, instead of resorting to runtime dependence checking (which are not always possible). For vectorization it is sufficient to prove that the dependence distance is >= VF; But in some cases we can prune unknown dependence distances early, and even before selecting the VF, and without a runtime test, by comparing the distance against the loop iteration count. Since the vectorized code will be executed only if LoopCount >= VF, proving distance >= LoopCount also guarantees that distance >= VF. This check is also equivalent to the Strong SIV Test. Reviewers: mkuper, anemet, sanjoy Differential Revision: https://reviews.llvm.org/D28044 llvm-svn: 294892
* [ARM] Make f16 interleaved accesses expensive.Ahmed Bougacha2017-02-111-0/+31
| | | | | | | | | | | | | There are no vldN/vstN f16 variants, even with +fullfp16. We could use the i16 variants, but, in practice, even with +fullfp16, the f16 sequence leading to the i16 shuffle usually gets scalarized. We'd need to improve our support for f16 codegen before getting there. Teach the cost model to consider f16 interleaved operations as expensive. Otherwise, we are all but guaranteed to end up with a large block of scalarized vector code. llvm-svn: 294819
* Encode duplication factor from loop vectorization and loop unrolling to ↵Dehao Chen2017-02-101-0/+70
| | | | | | | | | | | | | | | | | | | | | discriminator. Summary: This patch starts the implementation as discuss in the following RFC: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html When optimization duplicates code that will scale down the execution count of a basic block, we will record the duplication factor as part of discriminator so that the offline process tool can find the duplication factor and collect the accurate execution frequency of the corresponding source code. Two important optimization that fall into this category is loop vectorization and loop unroll. This patch records the duplication factor for these 2 optimizations. The recording will be guarded by a flag encode-duplication-in-discriminators, which is off by default. Reviewers: probinson, aprantl, davidxl, hfinkel, echristo Reviewed By: hfinkel Subscribers: mehdi_amini, anemet, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D26420 llvm-svn: 294782
* [LV] Remove type restriction for vector phi creationMatthew Simpson2017-02-101-4/+12
| | | | | | | | | We previously only created a vector phi node for an induction variable if its type matched the type of the canonical induction variable. Differential Revision: https://reviews.llvm.org/D29776 llvm-svn: 294755
* [Loop Vectorizer] Cost-based decision for vectorization form of memory ↵Elena Demikhovsky2017-02-083-1/+81
| | | | | | | | | | | | | | | | | | instruction. Making the cost model selecting between Interleave, GatherScatter or Scalar vectorization form of memory instruction. The right decision should be done for non-consecutive memory access instrcuctions that may have more than one vectorization solution. This patch includes the following changes: - Cost Model calculates the cost of Load/Store vector form and choose the better option between Widening, Interleave, GatherScactter and Scalarization. Cost Model keeps the widening decision. - Arrays of Uniform and Scalar values are moved from Legality to Cost Model. - Cost Model collects Uniforms and Scalars per VF. The collection is based on CM decision map of Loadis/Stores vectorization form. - Vectorization of memory instruction is performed according to the CM decision. Differential Revision: https://reviews.llvm.org/D27919 llvm-svn: 294503
* [X86] Remove PCOMMIT instruction support since Intel has deprecated this ↵Craig Topper2017-02-081-2/+2
| | | | | | | | instruction with no plans to release products with it. Intel's documentation for the deprecation https://software.intel.com/en-us/blogs/2016/09/12/deprecate-pcommit-instruction llvm-svn: 294405
* [LV] Add new ARM/AArch64 interleaved access cost model tests (NFC)Matthew Simpson2017-02-072-0/+153
| | | | llvm-svn: 294342
* [LV] Simplify ARM/AArch64 interleaved access cost model tests (NFC)Matthew Simpson2017-02-072-89/+76
| | | | | | | | This patch removes unneeded instructions from the existing ARM/AArch64 interleaved access cost model tests. I'll be adding a similar set of tests in a follow-on patch to increase coverage. llvm-svn: 294336
* [LV] Also port failure remarks to new OptimizationRemarkEmitter APIAdam Nemet2017-02-021-0/+57
| | | | llvm-svn: 293866
* [LV] Fix an issue where forming LCSSA in the place that we did wouldChandler Carruth2017-01-261-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | change the set of uniform instructions in the loop causing an assert failure. The problem is that the legalization checking also builds data structures mapping various facts about the loop body. The immediate cause was the set of uniform instructions. If these then change when LCSSA is formed, the data structures would already have been built and become stale. The included test case triggered an assert in loop vectorize that was reduced out of the new PM's pipeline. The solution is to form LCSSA early enough that no information is cached across the changes made. The only really obvious position is outside of the main logic to vectorize the loop. This also has the advantage of removing one case where forming LCSSA could mutate the loop but we wouldn't track that as a "Changed" state. If it is significantly advantageous to do some legalization checking prior to this, we can do a more careful positioning but it seemed best to just back off to a safe position first. llvm-svn: 293168
* [X86] enable memory interleaving for X86\SLM arch. Mohammed Agabaria2017-01-251-0/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D28547 llvm-svn: 293040
* [LV] Run loop-simplify and LCSSA explicitly instead of "requiring" themMichael Kuperstein2017-01-192-8/+58
| | | | | | | | | | | | This changes the vectorizer to explicitly use the loopsimplify and lcssa utils, instead of "requiring" the transformations as if they were analyses. This is not NFC, since it changes the LCSSA behavior - we no longer run LCSSA for all loops, but rather only for the loops we expect to modify. Differential Revision: https://reviews.llvm.org/D28868 llvm-svn: 292456
OpenPOWER on IntegriCloud