summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/LoopVectorize
Commit message (Collapse)AuthorAgeFilesLines
* [LV] Ensure reverse interleaved group GEPs remain uniformMatthew Simpson2016-09-021-2/+8
| | | | | | | | | | | | | | For uniform instructions, we're only required to generate a scalar value for the first vector lane of each unroll iteration. Thus, if we have a reverse interleaved group, computing the member index off the scalar GEP corresponding to the last vector lane of its pointer operand technically makes the GEP non-uniform. We should compute the member index off the first scalar GEP instead. I've added the updated member index computation to the existing reverse interleaved group test. llvm-svn: 280497
* [LoopVectorizer] Predicate instructions in blocks with several incoming edgesMichael Kuperstein2016-08-302-4/+62
| | | | | | | | | | We don't need to limit predication to blocks that have a single incoming edge, we just need to use the right mask. This fixes PR30172. Differential Revision: https://reviews.llvm.org/D24009 llvm-svn: 280148
* [LV] Move insertelement sequence after scalar definitionsMatthew Simpson2016-08-292-16/+58
| | | | | | | | | | | | | | After r279649 when getting a vector value from VectorLoopValueMap, we create an insertelement sequence on-demand if the value has been scalarized instead of vectorized. We previously inserted this insertelement sequence before the value's first vector user. However, this insert location is problematic if that user is the phi node of a first-order recurrence. With this patch, we move the insertelement sequence after the last scalar instruction we created when scalarizing the value. Thus, the value's vector definition in the new loop will immediately follow its scalar definitions. This should fix PR30183. Reference: https://llvm.org/bugs/show_bug.cgi?id=30183 llvm-svn: 280001
* [Loop Vectorizer] Fixed memory confilict checks.Elena Demikhovsky2016-08-282-11/+11
| | | | | | | | | Fixed a bug in run-time checks for possible memory conflicts inside loop. The bug is in Low <-> High boundaries calculation. The High boundary should be calculated as "last memory access pointer + element size". Differential revision: https://reviews.llvm.org/D23176 llvm-svn: 279930
* [LV] Unify vector and scalar mapsMatthew Simpson2016-08-244-89/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch unifies the data structures we use for mapping instructions from the original loop to their corresponding instructions in the new loop. Previously, we maintained two distinct maps for this purpose: WidenMap and ScalarIVMap. WidenMap maintained the vector values each instruction from the old loop was represented with, and ScalarIVMap maintained the scalar values each scalarized induction variable was represented with. With this patch, all values created for the new loop are maintained in VectorLoopValueMap. The change allows for several simplifications. Previously, when an instruction was scalarized, we had to insert the scalar values into vectors in order to maintain the mapping in WidenMap. Then, if a user of the scalarized value was also scalar, we had to extract the scalar values from the temporary vector we created. We now aovid these unnecessary scalar-to-vector-to-scalar conversions. If a scalarized value is used by a scalar instruction, the scalar value is used directly. However, if the scalarized value is needed by a vector instruction, we generate the needed insertelement instructions on-demand. A common idiom in several locations in the code (including the scalarization code), is to first get the vector values an instruction from the original loop maps to, and then extract a particular scalar value. This patch adds getScalarValue for this purpose along side getVectorValue as an interface into VectorLoopValueMap. These functions work together to return the requested values if they're available or to produce them if they're not. The mapping has also be made less permissive. Entries can be added to VectorLoopValue map with the new initVector and initScalar functions. getVectorValue has been modified to return a constant reference to the mapped entries. There's no real functional change with this patch; however, in some cases we will generate slightly different code. For example, instead of an insertelement sequence following the definition of an instruction, it will now precede the first use of that instruction. This can be seen in the test case changes. Differential Revision: https://reviews.llvm.org/D23169 llvm-svn: 279649
* [Loop Vectorizer] Support predication of div/remGil Rapaport2016-08-243-27/+267
| | | | | | | | | | div/rem instructions in basic blocks that require predication currently prevent vectorization. This patch extends the existing mechanism for predicating stores to handle other instructions and leverages it to predicate divs and rems. Differential Revision: https://reviews.llvm.org/D22918 llvm-svn: 279620
* [LoopVectorize] Detect loops in the innermost loop before creating ↵Tim Shen2016-08-121-0/+71
| | | | | | | | | | | | | InnerLoopVectorizer InnerLoopVectorizer shouldn't handle a loop with cycles inside the loop body, even if that cycle isn't a natural loop. Fixes PR28541. Differential Revision: https://reviews.llvm.org/D22952 llvm-svn: 278573
* [ValueTracking] Teach computeKnownBits about [su]min/maxDavid Majnemer2016-08-062-7/+7
| | | | | | | Reasoning about a select in terms of a min or max allows us to derive a tigher bound on the result. llvm-svn: 277914
* [LV, X86] Be more optimistic about vectorizing shifts.Michael Kuperstein2016-08-041-0/+23
| | | | | | | | | | | | | | | Shifts with a uniform but non-constant count were considered very expensive to vectorize, because the splat of the uniform count and the shift would tend to appear in different blocks. That made the splat invisible to ISel, and we'd scalarize the shift at codegen time. Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we are able to select the appropriate vector shifts. This updates the cost model to to take this into account by making shifts by a uniform cheap again. Differential Revision: https://reviews.llvm.org/D23049 llvm-svn: 277782
* [LoopVectorize] Change comment for isOutOfScope in collectLoopUniforms, NFCWei Mi2016-08-021-0/+27
| | | | | | | | | Update comment for isOutOfScope and add a testcase for uniform value being used out of scope. Differential Revision: https://reviews.llvm.org/D23073 llvm-svn: 277515
* [LV] Generate both scalar and vector integer induction variablesMatthew Simpson2016-08-022-81/+182
| | | | | | | | | | | | | | | | This patch enables the vectorizer to generate both scalar and vector versions of an integer induction variable for a given loop. Previously, we only generated a scalar induction variable if we knew all its users were going to be scalar. Otherwise, we generated a vector induction variable. In the case of a loop with both scalar and vector users of the induction variable, we would generate the vector induction variable and extract scalar values from it for the scalar users. With this patch, we now generate both versions of the induction variable when there are both scalar and vector users and select which version to use based on whether the user is scalar or vector. Differential Revision: https://reviews.llvm.org/D22869 llvm-svn: 277474
* [LV] Untangle the concepts of uniform and scalarMatthew Simpson2016-08-021-0/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch refactors the logic in collectLoopUniforms and collectValuesToIgnore, untangling the concepts of "uniform" and "scalar". It adds isScalarAfterVectorization along side isUniformAfterVectorization to distinguish the two. Known scalar values include those that are uniform, getelementptr instructions that won't be vectorized, and induction variables and induction variable update instructions whose users are all known to be scalar. This patch includes the following functional changes: - In collectLoopUniforms, we mark uniform the pointer operands of interleaved accesses. Although non-consecutive, these pointers are treated like consecutive pointers during vectorization. - In collectValuesToIgnore, we insert a value into VecValuesToIgnore if it isScalarAfterVectorization rather than isUniformAfterVectorization. This differs from the previous functionaly in that we now add getelementptr instructions that will not be vectorized into VecValuesToIgnore. This patch also removes the ValuesNotWidened set used for induction variable scalarization since, after the above changes, it is now equivalent to isScalarAfterVectorization. Differential Revision: https://reviews.llvm.org/D22867 llvm-svn: 277460
* [AVX512] Don't use i128 masked gather/scatter/load/store. Do more accurately ↵Igor Breger2016-08-021-0/+76
| | | | | | | | dataWidth check. Differential Revision: http://reviews.llvm.org/D23055 llvm-svn: 277435
* [AVX-512] Fix a test missed in r277327.Craig Topper2016-08-011-1/+1
| | | | llvm-svn: 277330
* Initial support for vectorization using svml (short vector math library).Matt Masten2016-07-291-0/+185
| | | | | | Differential Revision: https://reviews.llvm.org/D19544 llvm-svn: 277166
* Fix the assertion error in collectLoopUniforms caused by empty Worklist ↵Wei Mi2016-07-271-0/+19
| | | | | | | | | | before expanding. Contributed-by: David Callahan Differential Revision: https://reviews.llvm.org/D22886 llvm-svn: 276943
* [Loop Vectorizer] Handling loops FP induction variables.Elena Demikhovsky2016-07-242-0/+304
| | | | | | | | | | | | | | | | Allowed loop vectorization with secondary FP IVs. Like this: float *A; float x = init; for (int i=0; i < N; ++i) { A[i] = x; x -= fp_inc; } The auto-vectorization is possible when the induction binary operator is "fast" or the function has "unsafe" attribute. Differential Revision: https://reviews.llvm.org/D21330 llvm-svn: 276554
* [LV] Move vector int induction update to end of latchMatthew Simpson2016-07-213-14/+15
| | | | | | | | | | | This patch moves the update instruction for vectorized integer induction phi nodes to the end of the latch block. This ensures consistent placement of all induction updates across all the kinds of int inductions we create (scalar, splat vector, or vector phi). Differential Revision: https://reviews.llvm.org/D22416 llvm-svn: 276339
* [OptDiag,LV] Add hotness attribute to applied-optimization remarksAdam Nemet2016-07-211-4/+4
| | | | | | | Test coverage is provided by modifying the function in the FP-math testcase that we are allowed to vectorize. llvm-svn: 276223
* [OptDiag,LV] Add hotness attribute to the derived analysis remarksAdam Nemet2016-07-201-0/+113
| | | | | | | | This includes FPCompute and Aliasing. Testcase is based on no_fpmath.ll. llvm-svn: 276211
* [OptDiag,LV] Add hotness attribute to analysis remarksAdam Nemet2016-07-201-0/+201
| | | | | | | | The earlier change added hotness attribute to missed-optimization remarks. This follows up with the analysis remarks (the ones explaining the reason for the missed optimization). llvm-svn: 276192
* [LV] Add hotness attribute to missed-optimization remarksAdam Nemet2016-07-201-0/+213
| | | | | | | The new OptimizationRemarkEmitter analysis pass is hooked up to both new and old PM passes. llvm-svn: 276080
* Recommit the patch "Use uniforms set to populate VecValuesToIgnore".Wei Mi2016-07-195-17/+96
| | | | | | | | | | | | | | | | | | For instructions in uniform set, they will not have vector versions so add them to VecValuesToIgnore. For induction vars, those only used in uniform instructions or consecutive ptrs instructions have already been added to VecValuesToIgnore above. For those induction vars which are only used in uniform instructions or non-consecutive/non-gather scatter ptr instructions, the related phi and update will also be added into VecValuesToIgnore set. The change will make the vector RegUsages estimation less conservative. Differential Revision: https://reviews.llvm.org/D20474 The recommit fixed the testcase global_alias.ll. llvm-svn: 275936
* Revert rL275912.Wei Mi2016-07-184-93/+14
| | | | llvm-svn: 275915
* Use uniforms set to populate VecValuesToIgnore.Wei Mi2016-07-184-14/+93
| | | | | | | | | | | | | | | | For instructions in uniform set, they will not have vector versions so add them to VecValuesToIgnore. For induction vars, those only used in uniform instructions or consecutive ptrs instructions have already been added to VecValuesToIgnore above. For those induction vars which are only used in uniform instructions or non-consecutive/non-gather scatter ptr instructions, the related phi and update will also be added into VecValuesToIgnore set. The change will make the vector RegUsages estimation less conservative. Differential Revision: https://reviews.llvm.org/D20474 llvm-svn: 275912
* [LV] Allow interleaved accesses in loops with predicated blocksMatthew Simpson2016-07-141-0/+164
| | | | | | | | | | This patch allows the formation of interleaved access groups in loops containing predicated blocks. However, the predicated accesses are prevented from forming groups. Differential Revision: https://reviews.llvm.org/D19694 llvm-svn: 275471
* [LV] Avoid unnecessary IV scalar-to-vector-to-scalar conversionsMatthew Simpson2016-07-142-24/+49
| | | | | | | | | | | | This patch prevents increases in the number of instructions, pre-instcombine, due to induction variable scalarization. An increase in instructions can lead to an increase in the compile-time required to simplify the induction variables. We now maintain a new map for scalarized induction variables to prevent us from converting between the scalar and vector forms. This patch should resolve compile-time regressions seen after r274627. llvm-svn: 275419
* [LV] Remove wrong assumption about LCSSAMichael Kuperstein2016-07-121-0/+25
| | | | | | | | | The LCSSA pass itself will not generate several redundant PHI nodes in a single exit block. However, such redundant PHI nodes don't violate LCSSA form, and may be introduced by passes that preserve LCSSA, and/or preserved by the LCSSA pass itself. So, assuming a single PHI node per exit block is not safe. llvm-svn: 275217
* [X86] Make some cast costs more preciseMichael Kuperstein2016-07-111-2/+2
| | | | | | | | | Make some AVX and AVX512 cast costs more precise. Based on part of a patch by Elena Demikhovsky (D15604). Differential Revision: http://reviews.llvm.org/D22064 llvm-svn: 275106
* [PM] Port LoopVectorize to the new PM.Sean Silva2016-07-091-0/+1
| | | | llvm-svn: 275000
* Fixed a bug in vectorizing GEP before gather/scatter intrinsic.Elena Demikhovsky2016-07-072-3/+221
| | | | | | | | | | Vectorizing GEP was incorrect and broke SSA in some cases. The patch fixes PR27997 https://llvm.org/bugs/show_bug.cgi?id=27997. Differential revision: http://reviews.llvm.org/D22035 llvm-svn: 274735
* [TTI] The cost model should not assume vector casts get completely scalarizedMichael Kuperstein2016-07-061-3/+3
| | | | | | | | | | | | | | | | The cost model should not assume vector casts get completely scalarized, since on targets that have vector support, the common case is a partial split up to the legal vector size. So, when a vector cast gets split, the resulting casts end up legal and cheap. Instead of pessimistically assuming scalarization, base TTI can use the costs the concrete TTI provides for the split vector, plus a fudge factor to account for the cost of the split itself. This fudge factor is currently 1 by default, except on AMDGPU where inserts and extracts are considered free. Differential Revision: http://reviews.llvm.org/D21251 llvm-svn: 274642
* [LV] Don't widen trivial induction variablesMatthew Simpson2016-07-064-16/+198
| | | | | | | | | | | | | | | | | | | | We currently always vectorize induction variables. However, if an induction variable is only used for counting loop iterations or computing addresses with getelementptr instructions, we don't need to do this. Vectorizing these trivial induction variables can create vector code that is difficult to simplify later on. This is especially true when the unroll factor is greater than one, and we create vector arithmetic when computing step vectors. With this patch, we check if an induction variable is only used for counting iterations or computing addresses, and if so, scalarize the arithmetic when computing step vectors instead. This allows for greater simplification. This patch addresses the suboptimal pointer arithmetic sequence seen in PR27881. Reference: https://llvm.org/bugs/show_bug.cgi?id=27881 Differential Revision: http://reviews.llvm.org/D21620 llvm-svn: 274627
* SLPVectorizer: Move propagateMetadata to VectorUtilsMatt Arsenault2016-06-301-0/+25
| | | | | | | | This will be re-used by the LoadStoreVectorizer. Fix handling of range metadata and testcase by Justin Lebar. llvm-svn: 274281
* Refine the set of UniformAfterVectorization instructions.Wei Mi2016-06-301-0/+50
| | | | | | | | | | Except the seed uniform instructions (conditional branch and consecutive ptr instructions), dependencies to be added into uniform set should only be used by existing uniform instructions or intructions outside of current loop. Differential Revision: http://reviews.llvm.org/D21755 llvm-svn: 274262
* Reverted patch 273864Elena Demikhovsky2016-06-292-44/+0
| | | | llvm-svn: 274115
* Support arbitrary addrspace pointers in masked load/store intrinsicsArtur Pilipenko2016-06-281-20/+95
| | | | | | | | | | | | | | This is a resubmittion of 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details). This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 274043
* Revert -r273892 "Support arbitrary addrspace pointers in masked load/store ↵Artur Pilipenko2016-06-271-95/+20
| | | | | | intrinsics" since some of the clang tests don't expect to see the updated signatures. llvm-svn: 273895
* Support arbitrary addrspace pointers in masked load/store intrinsicsArtur Pilipenko2016-06-271-20/+95
| | | | | | | | | | | | | | This is a resubmittion of 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details). This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 273892
* Removed extra test from the prev commit.Elena Demikhovsky2016-06-271-34/+0
| | | | llvm-svn: 273865
* Fixed consecutive memory access detection in Loop Vectorizer.Elena Demikhovsky2016-06-273-0/+78
| | | | | | | | | | | | | | | | It did not handle correctly cases without GEP. The following loop wasn't vectorized: for (int i=0; i<len; i++) *to++ = *from++; I use getPtrStride() to find Stride for memory access and return 0 is the Stride is not 1 or -1. Re-commit rL273257 - revision: http://reviews.llvm.org/D20789 llvm-svn: 273864
* [LV] Preserve order of dependences in interleaved accesses analysisMatthew Simpson2016-06-241-0/+305
| | | | | | | | | | | | | | | | | | | | | | | | | | The interleaved access analysis currently assumes that the inserted run-time pointer aliasing checks ensure the absence of dependences that would prevent its instruction reordering. However, this is not the case. Issues can arise from how code generation is performed for interleaved groups. For a load group, all loads in the group are essentially moved to the location of the first load in program order, and for a store group, all stores in the group are moved to the location of the last store. For groups having members involved in a dependence relation with any other instruction in the loop, this reordering can violate the dependence. This patch teaches the interleaved access analysis how to avoid breaking such dependences, and should fix PR27626. An assumption of the original analysis was that the accesses had been collected in "program order". The analysis was then simplified by visiting the accesses bottom-up. However, this ordering was never guaranteed for anything other than single basic block loops. Thus, this patch also enforces the desired ordering. Reference: https://llvm.org/bugs/show_bug.cgi?id=27626 Differential Revision: http://reviews.llvm.org/D19984 llvm-svn: 273687
* reverted the prev commit due to assertion failureElena Demikhovsky2016-06-211-43/+0
| | | | llvm-svn: 273258
* Fixed consecutive memory access detection in Loop Vectorizer.Elena Demikhovsky2016-06-211-0/+43
| | | | | | | | | | | | | | | It did not handle correctly cases without GEP. The following loop wasn't vectorized: for (int i=0; i<len; i++) *to++ = *from++; I use getPtrStride() to find Stride for memory access and return 0 is the Stride is not 1 or -1. Differential revision: http://reviews.llvm.org/D20789 llvm-svn: 273257
* Recommit [LV] Enable vectorization of loops where the IV has an external useMichael Kuperstein2016-06-152-32/+110
| | | | | | | | | | | | | r272715 broke libcxx because it did not correctly handle cases where the last iteration of one IV is the second-to-last iteration of another. Original commit message: Vectorizing loops with "escaping" IVs has been disabled since r190790, due to PR17179. This re-enables it, with support for external use of both "post-increment" (last iteration) and "pre-increment" (second-to-last iteration) IVs. llvm-svn: 272742
* Reverting r272715 since it broke libcxx.Michael Kuperstein2016-06-142-84/+32
| | | | llvm-svn: 272730
* [LV] Enable vectorization of loops where the IV has an external useMichael Kuperstein2016-06-142-32/+84
| | | | | | | | | | | Vectorizing loops with "escaping" IVs has been disabled since r190790, due to PR17179. This re-enables it, with support for external use of both "post-increment" (last iteration) and "pre-increment" (second-to-last iteration) IVs. Differential Revision: http://reviews.llvm.org/D21048 llvm-svn: 272715
* Reapply "[TTI] Refine default cost for interleaved load groups with gaps"Matthew Simpson2016-06-101-0/+42
| | | | | | | | This reapplies commit r272385 with a fix. The build was failing when compiled with gcc, but not with clang. With the fix, we now get the data layout from the current TTI implementation, which will hopefully solve the issue. llvm-svn: 272395
* Revert "[TTI] Refine default cost for interleaved load groups with gaps"Matthew Simpson2016-06-101-42/+0
| | | | | | | This reverts commit r272385. This commit broke the build. I'm temporarily reverting to investigate. llvm-svn: 272391
* [TTI] Refine default cost for interleaved load groups with gapsMatthew Simpson2016-06-101-0/+42
| | | | | | | | | | | | This patch refines the default cost for interleaved load groups having gaps. If a load group has gaps, the legalized instructions corresponding to the unused elements will be dead. Thus, we don't need to account for them in the cost model. Instead, we only need to account for the fraction of legalized loads that will actually be used. Differential Revision: http://reviews.llvm.org/D20873 llvm-svn: 272385
OpenPOWER on IntegriCloud