summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/LoopVectorize
Commit message (Collapse)AuthorAgeFilesLines
...
* [LV] Remove wrong assumption about LCSSAMichael Kuperstein2016-07-121-0/+25
| | | | | | | | | The LCSSA pass itself will not generate several redundant PHI nodes in a single exit block. However, such redundant PHI nodes don't violate LCSSA form, and may be introduced by passes that preserve LCSSA, and/or preserved by the LCSSA pass itself. So, assuming a single PHI node per exit block is not safe. llvm-svn: 275217
* [X86] Make some cast costs more preciseMichael Kuperstein2016-07-111-2/+2
| | | | | | | | | Make some AVX and AVX512 cast costs more precise. Based on part of a patch by Elena Demikhovsky (D15604). Differential Revision: http://reviews.llvm.org/D22064 llvm-svn: 275106
* [PM] Port LoopVectorize to the new PM.Sean Silva2016-07-091-0/+1
| | | | llvm-svn: 275000
* Fixed a bug in vectorizing GEP before gather/scatter intrinsic.Elena Demikhovsky2016-07-072-3/+221
| | | | | | | | | | Vectorizing GEP was incorrect and broke SSA in some cases. The patch fixes PR27997 https://llvm.org/bugs/show_bug.cgi?id=27997. Differential revision: http://reviews.llvm.org/D22035 llvm-svn: 274735
* [TTI] The cost model should not assume vector casts get completely scalarizedMichael Kuperstein2016-07-061-3/+3
| | | | | | | | | | | | | | | | The cost model should not assume vector casts get completely scalarized, since on targets that have vector support, the common case is a partial split up to the legal vector size. So, when a vector cast gets split, the resulting casts end up legal and cheap. Instead of pessimistically assuming scalarization, base TTI can use the costs the concrete TTI provides for the split vector, plus a fudge factor to account for the cost of the split itself. This fudge factor is currently 1 by default, except on AMDGPU where inserts and extracts are considered free. Differential Revision: http://reviews.llvm.org/D21251 llvm-svn: 274642
* [LV] Don't widen trivial induction variablesMatthew Simpson2016-07-064-16/+198
| | | | | | | | | | | | | | | | | | | | We currently always vectorize induction variables. However, if an induction variable is only used for counting loop iterations or computing addresses with getelementptr instructions, we don't need to do this. Vectorizing these trivial induction variables can create vector code that is difficult to simplify later on. This is especially true when the unroll factor is greater than one, and we create vector arithmetic when computing step vectors. With this patch, we check if an induction variable is only used for counting iterations or computing addresses, and if so, scalarize the arithmetic when computing step vectors instead. This allows for greater simplification. This patch addresses the suboptimal pointer arithmetic sequence seen in PR27881. Reference: https://llvm.org/bugs/show_bug.cgi?id=27881 Differential Revision: http://reviews.llvm.org/D21620 llvm-svn: 274627
* SLPVectorizer: Move propagateMetadata to VectorUtilsMatt Arsenault2016-06-301-0/+25
| | | | | | | | This will be re-used by the LoadStoreVectorizer. Fix handling of range metadata and testcase by Justin Lebar. llvm-svn: 274281
* Refine the set of UniformAfterVectorization instructions.Wei Mi2016-06-301-0/+50
| | | | | | | | | | Except the seed uniform instructions (conditional branch and consecutive ptr instructions), dependencies to be added into uniform set should only be used by existing uniform instructions or intructions outside of current loop. Differential Revision: http://reviews.llvm.org/D21755 llvm-svn: 274262
* Reverted patch 273864Elena Demikhovsky2016-06-292-44/+0
| | | | llvm-svn: 274115
* Support arbitrary addrspace pointers in masked load/store intrinsicsArtur Pilipenko2016-06-281-20/+95
| | | | | | | | | | | | | | This is a resubmittion of 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details). This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 274043
* Revert -r273892 "Support arbitrary addrspace pointers in masked load/store ↵Artur Pilipenko2016-06-271-95/+20
| | | | | | intrinsics" since some of the clang tests don't expect to see the updated signatures. llvm-svn: 273895
* Support arbitrary addrspace pointers in masked load/store intrinsicsArtur Pilipenko2016-06-271-20/+95
| | | | | | | | | | | | | | This is a resubmittion of 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details). This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 273892
* Removed extra test from the prev commit.Elena Demikhovsky2016-06-271-34/+0
| | | | llvm-svn: 273865
* Fixed consecutive memory access detection in Loop Vectorizer.Elena Demikhovsky2016-06-273-0/+78
| | | | | | | | | | | | | | | | It did not handle correctly cases without GEP. The following loop wasn't vectorized: for (int i=0; i<len; i++) *to++ = *from++; I use getPtrStride() to find Stride for memory access and return 0 is the Stride is not 1 or -1. Re-commit rL273257 - revision: http://reviews.llvm.org/D20789 llvm-svn: 273864
* [LV] Preserve order of dependences in interleaved accesses analysisMatthew Simpson2016-06-241-0/+305
| | | | | | | | | | | | | | | | | | | | | | | | | | The interleaved access analysis currently assumes that the inserted run-time pointer aliasing checks ensure the absence of dependences that would prevent its instruction reordering. However, this is not the case. Issues can arise from how code generation is performed for interleaved groups. For a load group, all loads in the group are essentially moved to the location of the first load in program order, and for a store group, all stores in the group are moved to the location of the last store. For groups having members involved in a dependence relation with any other instruction in the loop, this reordering can violate the dependence. This patch teaches the interleaved access analysis how to avoid breaking such dependences, and should fix PR27626. An assumption of the original analysis was that the accesses had been collected in "program order". The analysis was then simplified by visiting the accesses bottom-up. However, this ordering was never guaranteed for anything other than single basic block loops. Thus, this patch also enforces the desired ordering. Reference: https://llvm.org/bugs/show_bug.cgi?id=27626 Differential Revision: http://reviews.llvm.org/D19984 llvm-svn: 273687
* reverted the prev commit due to assertion failureElena Demikhovsky2016-06-211-43/+0
| | | | llvm-svn: 273258
* Fixed consecutive memory access detection in Loop Vectorizer.Elena Demikhovsky2016-06-211-0/+43
| | | | | | | | | | | | | | | It did not handle correctly cases without GEP. The following loop wasn't vectorized: for (int i=0; i<len; i++) *to++ = *from++; I use getPtrStride() to find Stride for memory access and return 0 is the Stride is not 1 or -1. Differential revision: http://reviews.llvm.org/D20789 llvm-svn: 273257
* Recommit [LV] Enable vectorization of loops where the IV has an external useMichael Kuperstein2016-06-152-32/+110
| | | | | | | | | | | | | r272715 broke libcxx because it did not correctly handle cases where the last iteration of one IV is the second-to-last iteration of another. Original commit message: Vectorizing loops with "escaping" IVs has been disabled since r190790, due to PR17179. This re-enables it, with support for external use of both "post-increment" (last iteration) and "pre-increment" (second-to-last iteration) IVs. llvm-svn: 272742
* Reverting r272715 since it broke libcxx.Michael Kuperstein2016-06-142-84/+32
| | | | llvm-svn: 272730
* [LV] Enable vectorization of loops where the IV has an external useMichael Kuperstein2016-06-142-32/+84
| | | | | | | | | | | Vectorizing loops with "escaping" IVs has been disabled since r190790, due to PR17179. This re-enables it, with support for external use of both "post-increment" (last iteration) and "pre-increment" (second-to-last iteration) IVs. Differential Revision: http://reviews.llvm.org/D21048 llvm-svn: 272715
* Reapply "[TTI] Refine default cost for interleaved load groups with gaps"Matthew Simpson2016-06-101-0/+42
| | | | | | | | This reapplies commit r272385 with a fix. The build was failing when compiled with gcc, but not with clang. With the fix, we now get the data layout from the current TTI implementation, which will hopefully solve the issue. llvm-svn: 272395
* Revert "[TTI] Refine default cost for interleaved load groups with gaps"Matthew Simpson2016-06-101-42/+0
| | | | | | | This reverts commit r272385. This commit broke the build. I'm temporarily reverting to investigate. llvm-svn: 272391
* [TTI] Refine default cost for interleaved load groups with gapsMatthew Simpson2016-06-101-0/+42
| | | | | | | | | | | | This patch refines the default cost for interleaved load groups having gaps. If a load group has gaps, the legalized instructions corresponding to the unused elements will be dead. Thus, we don't need to account for them in the cost model. Instead, we only need to account for the fraction of legalized loads that will actually be used. Differential Revision: http://reviews.llvm.org/D20873 llvm-svn: 272385
* [LV] Use vector phis for some secondary induction variablesMichael Kuperstein2016-06-092-8/+57
| | | | | | | | | | | | | | Previously, we materialized secondary vector IVs from the primary scalar IV, by offseting the primary to match the correct start value, and then broadcasting it - inside the loop body. Instead, we can use a real vector IV, like we do for the primary. This enables using vector IVs for secondary integer IVs whose type matches the type of the primary. Differential Revision: http://reviews.llvm.org/D20932 llvm-svn: 272283
* Quick fix for the test from rL272014 "[LAA] Improve non-wrapping pointerAndrey Turetskiy2016-06-071-1/+1
| | | | | | | | detection by handling loop-invariant case" (s couple of buildbots failed). Patch by Roman Shirokiy. llvm-svn: 272019
* [LAA] Improve non-wrapping pointer detection by handling loop-invariant case.Andrey Turetskiy2016-06-071-0/+65
| | | | | | | | | | | | | This fixes PR26314. This patch adds new helper “isNoWrap” with detection of loop-invariant pointer case. Patch by Roman Shirokiy. Ref: https://llvm.org/bugs/show_bug.cgi?id=26314 Differential Revision: http://reviews.llvm.org/D17268 llvm-svn: 272014
* [InstCombine] scalarizePHI should not assume the code it sees has been CSE'dMichael Kuperstein2016-06-061-1/+1
| | | | | | | | | | | | | | scalarizePHI only looked for phis that have exactly two uses - the "latch" use, and an extract. Unfortunately, we can not assume all equivalent extracts are CSE'd, since InstCombine itself may create an extract which is a duplicate of an existing one. This extends it to handle several distinct extracts from the same index. This should fix at least some of the performance regressions from PR27988. Differential Revision: http://reviews.llvm.org/D20983 llvm-svn: 271961
* Revert "Claim NoAlias if two GEPs index different fields of the same struct"Daniel Berlin2016-06-011-77/+81
| | | | | | This reverts commit 2d5d6493f43eb68493a3852b8c226ac9fafdc7eb. llvm-svn: 271422
* Claim NoAlias if two GEPs index different fields of the same structDaniel Berlin2016-06-011-81/+77
| | | | | | | | | | | | | | Patch by Taewook Oh Summary: Patch for Bug 27478. Make BasicAliasAnalysis claims NoAlias if two GEPs index different fields of the same structure. Reviewers: hfinkel, dberlin Subscribers: dberlin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D20665 llvm-svn: 271415
* [LV] For some IVs, use vector phis instead of widening in the loop bodyMichael Kuperstein2016-06-018-20/+85
| | | | | | | | | | | | | Previously, whenever we needed a vector IV, we would create it on the fly, by splatting the scalar IV and adding a step vector. Instead, we can create a real vector IV. This tends to save a couple of instructions per iteration. This only changes the behavior for the most basic case - integer primary IVs with a constant step. Differential Revision: http://reviews.llvm.org/D20315 llvm-svn: 271410
* Move test to X86 directory: I think it depends on X86 TTI.Tim Northover2016-05-271-0/+0
| | | | llvm-svn: 271019
* Vectorizer: track non-fast FP instructions through phis when finding reductions.Tim Northover2016-05-271-0/+75
| | | | | | | | When we traced through a phi node looking for floating-point reductions, we forgot whether we'd ever seen an instruction without fast-math flags (that would block vectorization). This propagates it through to the end. llvm-svn: 271015
* Look for a loop's starting location in the llvm.loop metadataHal Finkel2016-05-251-0/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Getting accurate locations for loops is important, because those locations are used by the frontend to generate optimization remarks. Currently, optimization remarks for loops often appear on the wrong line, often the first line of the loop body instead of the loop itself. This is confusing because that line might itself be another loop, or might be somewhere else completely if the body was inlined function call. This happens because of the way we find the loop's starting location. First, we look for a preheader, and if we find one, and its terminator has a debug location, then we use that. Otherwise, we look for a location on an instruction in the loop header. The fallback heuristic is not bad, but will almost always find the beginning of the body, and not the loop statement itself. The preheader location search often fails because there's often not a preheader, and even when there is a preheader, depending on how it was formed, it sometimes carries the location of some preceeding code. I don't see any good theoretical way to fix this problem. On the other hand, this seems like a straightforward solution: Put the debug location in the loop's llvm.loop metadata. A companion Clang patch will cause Clang to insert llvm.loop metadata with appropriate locations when generating debugging information. With these changes, our loop remarks have much more accurate locations. Differential Revision: http://reviews.llvm.org/D19738 llvm-svn: 270771
* [x86] avoid code explosion from LoopVectorizer for gather loop (PR27826) Sanjay Patel2016-05-251-0/+41
| | | | | | | | | | | | | | By making pointer extraction from a vector more expensive in the cost model, we avoid the vectorization of a loop that is very likely to be memory-bound: https://llvm.org/bugs/show_bug.cgi?id=27826 There are still bugs related to this, so we may need a more general solution to avoid vectorizing obviously memory-bound loops when we don't have HW gather support. Differential Revision: http://reviews.llvm.org/D20601 llvm-svn: 270729
* Recommit r255691 since PR26509 has been fixed.Wei Mi2016-05-192-1/+72
| | | | llvm-svn: 270113
* [LAA] Rename forwarding conflict detection option (NFC)Matthew Simpson2016-05-161-1/+1
| | | | | | | This patch renames the option enabling the store-to-load forwarding conflict detection optimization. This change was requested in the review of D20241. llvm-svn: 269668
* [LV] Ensure safe VF for loops with interleaved accessesMatthew Simpson2016-05-161-0/+56
| | | | | | | | | | | | | The selection of the vectorization factor currently doesn't consider interleaved accesses. The vectorization factor is based on the maximum safe dependence distance computed by LAA. However, for loops with interleaved groups, we should instead base the vectorization factor on the maximum safe dependence distance divided by the maximum interleave factor of all the interleaved groups. Interleaved accesses not in a group will be scalarized. Differential Revision: http://reviews.llvm.org/D20241 llvm-svn: 269659
* [InstCombine] canonicalize* LE/GE vector integer comparisons to LT/GT ↵Sanjay Patel2016-05-131-1/+1
| | | | | | | | | | | | | | | (PR26701, PR26819) *We don't currently handle the edge case constants (min/max values), so it's not a complete canonicalization. To fully solve the motivating bugs, we need to enhance this to recognize a zero vector too because that's a ConstantAggregateZero which is a ConstantData, not a ConstantVector or a ConstantDataVector. Differential Revision: http://reviews.llvm.org/D17859 llvm-svn: 269426
* Revert "[VectorUtils] Query number of sign bits to allow more truncations"James Molloy2016-05-101-36/+0
| | | | | | | | This was a fairly simple patch but on closer inspection was seriously flawed and caused PR27690. This reverts commit r268921. llvm-svn: 269051
* [LoopVectorize] Handling induction variable with non-constant step.Elena Demikhovsky2016-05-101-0/+124
| | | | | | | | | | | | | | | | | | | | | | | Allow vectorization when the step is a loop-invariant variable. This is the loop example that is getting vectorized after the patch: int int_inc; int bar(int init, int *restrict A, int N) { int x = init; for (int i=0;i<N;i++){ A[i] = x; x += int_inc; } return x; } "x" is an induction variable with *loop-invariant* step. But it is not a primary induction. Primary induction variable with non-constant step is not handled yet. Differential Revision: http://reviews.llvm.org/D19258 llvm-svn: 269023
* [LV] Hint at the new loop distribution pragma in optimization remarkAdam Nemet2016-05-091-0/+74
| | | | | | | | | | When we encounter unsafe memory dependencies, loop distribution could help. Even though, the diagnostics is in LAA, it's only currently emitted in the vectorizer. llvm-svn: 268987
* [VectorUtils] Query number of sign bits to allow more truncationsJames Molloy2016-05-091-0/+36
| | | | | | When deciding if a vector calculation can be done in a smaller bitwidth, use sign bit information from ValueTracking to add more information and allow more truncations. llvm-svn: 268921
* [LV] Identify more induction PHIs by coercing expressions to AddRecExprsSilviu Baranga2016-05-051-0/+75
| | | | | | | | | | | | | | | | | | Summary: Some PHIs can have expressions that are not AddRecExprs due to the presence of sext/zext instructions. In order to prevent the Loop Vectorizer from bailing out when encountering these PHIs, we now coerce the SCEV expressions to AddRecExprs using SCEV predicates (when possible). We only do this when the alternative would be to not vectorize. Reviewers: mzolotukhin, anemet Subscribers: mssimpso, sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17153 llvm-svn: 268633
* [LoopVectorize] Add operand bundles to vectorized functionsDavid Majnemer2016-04-291-0/+45
| | | | | | | Also, do not crash when calculating a cost model for loop-invariant token values. llvm-svn: 268003
* [PR25281] Remove AAResultsWrapper from preserved analyses of loop vectorizer.Michael Zolotukhin2016-04-291-0/+59
| | | | | | | We don't preserve AAResults, because, for one, we don't preserve SCEV-AA. That fixes PR25281. llvm-svn: 267980
* [LoopVectorize] Keep hints from original loop on the vector loopHal Finkel2016-04-291-0/+30
| | | | | | | | | | | | | | | | We need to keep loop hints from the original loop on the new vector loop. Failure to do this meant that, for example: void foo(int *b) { #pragma clang loop unroll(disable) for (int i = 0; i < 16; ++i) b[i] = 1; } this loop would be unrolled. Why? Because we'd vectorize it, thus dropping the hints that unrolling should be disabled, and then we'd unroll it. llvm-svn: 267970
* [LV] Reallow positive-stride interleaved load groups with gapsMatthew Simpson2016-04-271-6/+99
| | | | | | | | | | | | | | We previously disallowed interleaved load groups that may cause us to speculatively access memory out-of-bounds (r261331). We did this by ensuring each load group had an access corresponding to the first and last member. Instead of bailing out for these interleaved groups, this patch enables us to peel off the last vector iteration, ensuring that we execute at least one iteration of the scalar remainder loop. This solution was proposed in the review of the previous patch. Differential Revision: http://reviews.llvm.org/D19487 llvm-svn: 267751
* Masked Store in Loop Vectorizer - bugfixElena Demikhovsky2016-04-261-0/+46
| | | | | | | | Fixed a bug in loop vectorization with conditional store. Differential Revision: http://reviews.llvm.org/D19532 llvm-svn: 267597
* [LoopVectorize] Don't consider conditional-load dereferenceability for ↵Hal Finkel2016-04-261-0/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | marked parallel loops I really thought we were doing this already, but we were not. Given this input: void Test(int *res, int *c, int *d, int *p) { for (int i = 0; i < 16; i++) res[i] = (p[i] == 0) ? res[i] : res[i] + d[i]; } we did not vectorize the loop. Even with "assume_safety" the check that we don't if-convert conditionally-executed loads (to protect against data-dependent deferenceability) was not elided. One subtlety: As implemented, it will still prefer to use a masked-load instrinsic (given target support) over the speculated load. The choice here seems architecture specific; the best option depends on how expensive the masked load is compared to a regular load. Ideally, using the masked load still reduces unnecessary memory traffic, and so should be preferred. If we'd rather do it the other way, flipping the order of the checks is easy. The LangRef is updated to make explicit that llvm.mem.parallel_loop_access also implies that if conversion is okay. Differential Revision: http://reviews.llvm.org/D19512 llvm-svn: 267514
* [ARM] AArch32 v8 NEON is still not IEEE-754 compliantRenato Golin2016-04-181-14/+8
| | | | llvm-svn: 266603
OpenPOWER on IntegriCloud