summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/LoopVectorize
Commit message (Collapse)AuthorAgeFilesLines
...
* Small refactor on VectorizerHint for deduplicationRenato Golin2014-09-011-0/+30
| | | | | | | | | | | | | | | | | | | | Previously, the hint mechanism relied on clean up passes to remove redundant metadata, which still showed up if running opt at low levels of optimization. That also has shown that multiple nodes of the same type, but with different values could still coexist, even if temporary, and cause confusion if the next pass got the wrong value. This patch makes sure that, if metadata already exists in a loop, the hint mechanism will never append a new node, but always replace the existing one. It also enhances the algorithm to cope with more metadata types in the future by just adding a new type, not a lot of code. Re-applying again due to MSVC 2013 being minimum requirement, and this patch having C++11 that MSVC 2012 didn't support. Fixes PR20655. llvm-svn: 216870
* Allow vectorization of division by uniform power of 2.Karthik Bhat2014-08-251-0/+32
| | | | | | | | This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible. Updates Cost model for Loop and SLP Vectorizer.The cost table is currently only updated for X86 backend. Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971) llvm-svn: 216371
* Revert "Small refactor on VectorizerHint for deduplication"Renato Golin2014-08-191-30/+0
| | | | | | This reverts commit r215994 because MSVC 2012 can't cope with its C++11 goodness. llvm-svn: 215999
* Small refactor on VectorizerHint for deduplicationRenato Golin2014-08-191-0/+30
| | | | | | | | | | | | | | | Previously, the hint mechanism relied on clean up passes to remove redundant metadata, which still showed up if running opt at low levels of optimization. That also has shown that multiple nodes of the same type, but with different values could still coexist, even if temporary, and cause confusion if the next pass got the wrong value. This patch makes sure that, if metadata already exists in a loop, the hint mechanism will never append a new node, but always replace the existing one. It also enhances the algorithm to cope with more metadata types in the future by just adding a new type, not a lot of code. llvm-svn: 215994
* [LoopVectorizer] Enable support for floating-point subtraction reductionsJames Molloy2014-08-081-0/+22
| | | | llvm-svn: 215200
* Add diagnostics to the vectorizer cost model.Tyler Nowicki2014-08-021-0/+58
| | | | | | | | | | When the cost model determines vectorization is not possible/profitable these remarks print an analysis of that decision. Note that in selectVectorizationFactor() we can assume that OptForSize and ForceVectorization are mutually exclusive. Reviewed by Arnold Schwaighofer llvm-svn: 214599
* Improve the remark generated for -Rpass-missed.Tyler Nowicki2014-07-313-4/+4
| | | | | | | | The current remark is ambiguous and makes it sounds like explicitly specifying vectorization will allow the loop to be vectorized. This is not the case. The improved remark directs the user to -Rpass-analysis=loop-vectorize to determine the cause of the pass-miss. Reviewed by Arnold Schwaighofer` llvm-svn: 214445
* Improve the remark generated when a variable that is used outside the loop ↵Tyler Nowicki2014-07-311-1/+4
| | | | | | | | is not a reduction or induction variable. Reviewed by Arnold Schwaighofer llvm-svn: 214440
* Rename metadata llvm.loop.vectorize.unroll to llvm.loop.vectorize.interleave.Mark Heffernan2014-07-214-4/+4
| | | | llvm-svn: 213588
* [LoopVectorize] Use AA to partition potential dependency checksHal Finkel2014-07-209-9/+111
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to this change, the loop vectorizer did not make use of the alias analysis infrastructure. Instead, it performed memory dependence analysis using ScalarEvolution-based linear dependence checks within equivalence classes derived from the results of ValueTracking's GetUnderlyingObjects. Unfortunately, this meant that: 1. The loop vectorizer had logic that essentially duplicated that in BasicAA for aliasing based on identified objects. 2. The loop vectorizer could not partition the space of dependency checks based on information only easily available from within AA (TBAA metadata is currently the prime example). This means, for example, regardless of whether -fno-strict-aliasing was provided, the vectorizer would only vectorize this loop with a runtime memory-overlap check: void foo(int *a, float *b) { for (int i = 0; i < 1600; ++i) a[i] = b[i]; } This is suboptimal because the TBAA metadata already provides the information necessary to show that this check unnecessary. Of course, the vectorizer has a limit on the number of such checks it will insert, so in practice, ignoring TBAA means not vectorizing more-complicated loops that we should. This change causes the vectorizer to use an AliasSetTracker to keep track of the pointers in the loop. The resulting alias sets are then used to partition the space of dependency checks, and potential runtime checks; this results in more-efficient vectorizations. When pointer locations are added to the AliasSetTracker, two things are done: 1. The location size is set to UnknownSize (otherwise you'd not catch inter-iteration dependencies) 2. For instructions in blocks that would need to be predicated, TBAA is removed (because the metadata might have a control dependency on the condition being speculated). For non-predicated blocks, you can leave the TBAA metadata. This is safe because you can't have an iteration dependency on the TBAA metadata (if you did, and you unrolled sufficiently, you'd end up with the same pointer value used by two accesses that TBAA says should not alias, and that would yield undefined behavior). llvm-svn: 213486
* [LoopVectorize] Propagate known metadata to vectorized instructionsHal Finkel2014-07-191-0/+44
| | | | | | | | | | | | | There are some kinds of metadata that are safe to propagate from the scalar instructions to the vector instructions (fpmath and tbaa currently). Regarding TBAA, one might worry about propagating it on if-converted loads and stores, because the metadata might have had a control dependency on the condition, and thus actually aliased with some other non-speculated memory access when the condition was false. However, this would be caught by the runtime overlap checks. llvm-svn: 213452
* Emit warnings if vectorization is forced and fails.Tyler Nowicki2014-07-163-0/+103
| | | | | | | | | | | This patch modifies the existing DiagnosticInfo system to create a generic base class that is inherited to produce diagnostic-based warnings. This is used by the loop vectorizer to trigger a warning when vectorization is forced and fails. Several tests have been added to verify this behavior. Reviewed by: Arnold Schwaighofer llvm-svn: 213110
* When we sink an instruction, this can open up opportunity for the operands ↵Aditya Nandakumar2014-07-111-1/+1
| | | | | | to be sunk - add them to the worklist llvm-svn: 212847
* [X86] AVX512: Enable it in the Loop VectorizerAdam Nemet2014-07-091-0/+35
| | | | | | | | | | This lets us experiment with 512-bit vectorization without passing force-vector-width manually. The code generated for a simple integer memset loop is properly vectorized. Disassembly is still broken for it though :(. llvm-svn: 212634
* IR: Fold away compares between GV GEPs and GVsDavid Majnemer2014-07-041-1/+1
| | | | | | | | | A GEP of a non-weak global variable will not be equivalent to another non-weak global variable or a GEP of such a variable. Differential Revision: http://reviews.llvm.org/D4238 llvm-svn: 212360
* Add Rpass-missed and Rpass-analysis reports to the loop vectorizer. The ↵Tyler Nowicki2014-06-254-3/+328
| | | | | | | | remarks give the vector width of vectorized loops and a brief analysis of loops that fail to be vectorized. For example, an analysis will be generated for loops containing control flow that cannot be simplified to a select. The optimization remarks also give the debug location of expressions that cannot be vectorized, for example the location of an unvectorizable call. Reviewed by: Arnold Schwaighofer llvm-svn: 211721
* Rename loop unrolling and loop vectorizer metadata to have a common prefix.Eli Bendersky2014-06-258-17/+18
| | | | | | | | | | | | | | | | | | | [LLVM part] These patches rename the loop unrolling and loop vectorizer metadata such that they have a common 'llvm.loop.' prefix. Metadata name changes: llvm.vectorizer.* => llvm.loop.vectorizer.* llvm.loopunroll.* => llvm.loop.unroll.* This was a suggestion from an earlier review (http://reviews.llvm.org/D4090) which added the loop unrolling metadata. Patch by Mark Heffernan. llvm-svn: 211710
* Add new debug kind LocTrackingOnly.Diego Novillo2014-06-241-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | Summary: This new debug emission kind supports emitting line location information in all instructions, but stops code generation from emitting debug info to the final output. This mode is useful when the backend wants to track source locations during code generation, but it does not want to produce debug info. This is currently used by optimization remarks (-pass-remarks, -pass-remarks-missed and -pass-remarks-analysis). To prevent debug info emission, DIBuilder never inserts the annotation 'llvm.dbg.cu' when LocTrackingOnly is enabled. Reviewers: echristo, dblaikie Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4234 llvm-svn: 211609
* LoopVectorizer: Fix a dominance issueArnold Schwaighofer2014-06-221-0/+34
| | | | | | | The induction variables start value needs to be defined before we branch (overflow check) to the scalar preheader where we used it. llvm-svn: 211460
* Reduce verbiage of lit.local.cfg filesAlp Toker2014-06-095-10/+5
| | | | | | We can just split targets_to_build in one place and make it immutable. llvm-svn: 210496
* Use AArch64 instead of now removed ARM64 in test configsAlexey Samsonov2014-06-051-1/+1
| | | | llvm-svn: 210229
* Allow vectorization of intrinsics such as powi,cttz and ctlz in Loop and SLP ↵Karthik Bhat2014-05-301-0/+102
| | | | | | | | | | Vectorizer. This patch adds support to vectorize intrinsics such as powi, cttz and ctlz in Vectorizer. These intrinsics are different from other intrinsics as second argument to these function must be same in order to vectorize them and it should be represented as a scalar. Review: http://reviews.llvm.org/D3851#inline-32769 and http://reviews.llvm.org/D3937#inline-32857 llvm-svn: 209873
* LoopVectorizer: Add a check that the backedge taken count + 1 does not overflowArnold Schwaighofer2014-05-292-0/+28
| | | | | | | | | | | The loop vectorizer instantiates be-taken-count + 1 as the loop iteration count. If this expression overflows the generated code was invalid. In case of overflow the code now jumps to the scalar loop. Fixes PR17288. llvm-svn: 209854
* AArch64/ARM64: move ARM64 into AArch64's placeTim Northover2014-05-243-6/+0
| | | | | | | | | | | | | | | This commit starts with a "git mv ARM64 AArch64" and continues out from there, renaming the C++ classes, intrinsics, and other target-local objects for consistency. "ARM64" test directories are also moved, and tests that began their life in ARM64 use an arm64 triple, those from AArch64 use an aarch64 triple. Both should be equivalent though. This finishes the AArch64 merge, and everyone should feel free to continue committing as normal now. llvm-svn: 209577
* AArch64/ARM64: remove AArch64 from tree prior to renaming ARM64.Tim Northover2014-05-241-1/+1
| | | | | | | | | | | | | | | | I'm doing this in two phases for a better "git blame" record. This commit removes the previous AArch64 backend and redirects all functionality to ARM64. It also deduplicates test-lines and removes orphaned AArch64 tests. The next step will be "git mv ARM64 AArch64" and rewire most of the tests. Hopefully LLVM is still functional, though it would be even better if no-one ever had to care because the rename happens straight afterwards. llvm-svn: 209576
* [Test] Trim unnecessary .c and .cpp from config.suffix in lit.local.cfgAdam Nemet2014-05-122-2/+2
| | | | | | | | | | | | | | Tested by comparing make check VERBOSE=1 before and after to make sure no tests are missed. (VERBOSE=1 prints the list of tests.) Only one test :( remains where .cpp is required: tools/llvm-cov/range_based_for.cpp:// RUN: llvm-cov range_based_for.cpp | FileCheck %s --check-prefix=STDOUT The topic was discussed in this thread: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140428/214905.html llvm-svn: 208621
* Reorder shuffle and binary operation.Serge Pavlov2014-05-111-11/+6
| | | | | | | | | | | | | This patch enables transformations: BinOp(shuffle(v1), shuffle(v2)) -> shuffle(BinOp(v1, v2)) BinOp(shuffle(v1), const1) -> shuffle(BinOp, const2) They allow to eliminate extra shuffles in some cases. Differential Revision: http://reviews.llvm.org/D3525 llvm-svn: 208488
* Move late partial-unrolling thresholds into the processor definitionsHal Finkel2014-05-081-10/+10
| | | | | | | | | | | | | | | | | | | | | | The old method used by X86TTI to determine partial-unrolling thresholds was messy (because it worked by testing target features), and also would not correctly identify the target CPU if certain target features were disabled. After some discussions on IRC with Chandler et al., it was decided that the processor scheduling models were the right containers for this information (because it is often tied to special uop dispatch-buffer sizes). This does represent a small functionality change: - For generic x86-64 (which uses the SB model and, thus, will get some unrolling). - For AMD cores (because they still currently use the SB scheduling model) - For Haswell (based on benchmarking by Louis Gerbarg, it was decided to bump the default threshold to 50; we're working on a test case for this). Otherwise, nothing has changed for any other targets. The logic, however, has been moved into BasicTTI, so other targets may now also opt-in to this functionality simply by setting LoopMicroOpBufferSize in their processor model definitions. llvm-svn: 208289
* Fix vectorization remarks.Diego Novillo2014-04-291-0/+67
| | | | | | | | | This patch changes the vectorization remarks to also inform when vectorization is possible but not beneficial. Added tests to exercise some loop remarks. llvm-svn: 207574
* [OPENMP][LV][D3423] Respect Hints.Force meta-data for loops in LoopVectorizerZinovy Nis2014-04-292-0/+166
| | | | llvm-svn: 207512
* [CLNUP] Test commit. Remove newline.Zinovy Nis2014-04-241-2/+1
| | | | llvm-svn: 207089
* [LV] Statistics numbers for LoopVectorize introduced: a number of analyzed ↵Alexander Musman2014-04-231-0/+66
| | | | | | | | | | | loops & a number of vectorized loops. Use -stats to see how many loops were analyzed for possible vectorization and how many of them were actually vectorized. Patch by Zinovy Nis Differential Revision: http://reviews.llvm.org/D3438 llvm-svn: 206956
* Add missing config file for newly added test case introduced by r206563.Jiangning Liu2014-04-181-0/+6
| | | | llvm-svn: 206567
* This commit allows vectorized loops to be unrolled by a factor of 2 for AArch64.Jiangning Liu2014-04-182-0/+84
| | | | | | | | A new test case is also added for ARM64. Patched by Z.Zheng llvm-svn: 206563
* vect.omp.persistence.ll REQUIRES asserts due to -debug-only.NAKAMURA Takumi2014-04-151-0/+1
| | | | llvm-svn: 206271
* D3348 - [BUG] "Rotate Loop" pass kills "llvm.vectorizer.enable" metadataAlexey Bataev2014-04-151-0/+87
| | | | llvm-svn: 206266
* [LoopVectorizer] Count dependencies of consecutive pointers as uniformsHal Finkel2014-04-022-0/+55
| | | | | | | | | | | | | | | | | | | | | For the purpose of calculating the cost of the loop at various vectorization factors, we need to count dependencies of consecutive pointers as uniforms (which means that the VF = 1 cost is used for all overall VF values). For example, the TSVC benchmark function s173 has: ... %3 = add nsw i64 %indvars.iv, 16000 %arrayidx8 = getelementptr inbounds %struct.GlobalData* @global_data, i64 0, i32 0, i64 %3 ... and we must realize that the add will be a scalar in order to correctly deduce it to be profitable to vectorize this on PowerPC with VSX enabled. In fact, all dependencies of a consecutive pointer must be a scalar (uniform), and so we simply need to add all consecutive pointers to the worklist that currently detects collects uniforms. Fixes PR19296. llvm-svn: 205387
* Implement X86TTI::getUnrollingPreferencesHal Finkel2014-04-011-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | This provides an initial implementation of getUnrollingPreferences for x86. getUnrollingPreferences is used by the generic (concatenation) unroller, which is distinct from the unrolling done by the loop vectorizer. Many modern x86 cores have some kind of uop cache and loop-stream detector (LSD) used to efficiently dispatch small loops, and taking full advantage of this requires unrolling small loops (small here means 10s of uops). These caches also have limits on the number of taken branches in the loop, and so we also cap the loop unrolling factor based on the maximum "depth" of the loop. This is currently calculated with a partial DFS traversal (partial because it will stop early if the path length grows too much). This is still an approximation, and one that is both conservative (because it does not account for branches eliminated via block placement) and optimistic (because it is only recording the maximum depth over minimum paths). Nevertheless, because the loops that fit in these uop caches are so small, it is not clear how much the details matter. The original set of patches posted for review produced the following test-suite performance results (from the TSVC benchmark) at that time: ControlLoops-dbl - 13% speedup ControlLoops-flt - 15% speedup Reductions-dbl - 7.5% speedup llvm-svn: 205348
* Move partial/runtime unrolling late in the pipelineHal Finkel2014-03-311-1/+1
| | | | | | | | | | | | | | | | The generic (concatenation) loop unroller is currently placed early in the standard optimization pipeline. This is a good place to perform full unrolling, but not the right place to perform partial/runtime unrolling. However, most targets don't enable partial/runtime unrolling, so this never mattered. However, even some x86 cores benefit from partial/runtime unrolling of very small loops, and follow-up commits will enable this. First, we need to move partial/runtime unrolling late in the optimization pipeline (importantly, this is after SLP and loop vectorization, as vectorization can drastically change the size of a loop), while keeping the full unrolling where it is now. This change does just that. llvm-svn: 205264
* [X86] Adjust cost of FP_TO_UINT v4f64->v4i32 as wellAdam Nemet2014-03-311-0/+40
| | | | | | | | | Pretty obvious follow-on to r205159 to also handle conversion from double besides float. Fixes <rdar://problem/16373208> llvm-svn: 205253
* [X86] Adjust cost of FP_TO_UINT v8f32->v8i32Adam Nemet2014-03-301-0/+39
| | | | | | | | | | | | | | | There is no direct AVX instruction to convert to unsigned. I have some ideas how we may be able to do this with three vector instructions but the current backend just bails on this to get it scalarized. See the comment why we need to adjust the cost returned by BasicTTI. The test is a bit roundabout (and checks assembly rather than bit code) because I'd like it to work even if at some point we could vectorize this conversion. Fixes <rdar://problem/16371920> llvm-svn: 205159
* ARM64: initial backend importTim Northover2014-03-292-0/+91
| | | | | | | | | | | | This adds a second implementation of the AArch64 architecture to LLVM, accessible in parallel via the "arm64" triple. The plan over the coming weeks & months is to merge the two into a single backend, during which time thorough code review should naturally occur. Everything will be easier with the target in-tree though, hence this commit. llvm-svn: 205090
* [X86][Vectorizer Cost Model] Correct vectorization cost model for v2i64->v2f64Quentin Colombet2014-03-271-0/+26
| | | | | | | | | | and v4i64->v4f64. The new costs match what we did for SSE2 and reflect the reality of our codegen. <rdar://problem/16381225> llvm-svn: 204884
* add 'requires asserts' to test that needs itJim Grosbach2014-03-271-0/+1
| | | | llvm-svn: 204882
* X86: Correct vectorization cost model for v8f32->v8i8.Jim Grosbach2014-03-271-0/+24
| | | | | | | | Fix the cost model to reflect the reality of our codegen. rdar://16370633 llvm-svn: 204880
* LoopVectorizer: Preserve fast-math flagsArnold Schwaighofer2014-03-052-1/+27
| | | | | | Fixes PR19045. llvm-svn: 203008
* LoopVectorizer: Keep track of conditional store basic blocksArnold Schwaighofer2014-02-081-0/+40
| | | | | | | | | | | | | Before conditional store vectorization/unrolling we had only one vectorized/unrolled basic block. After adding support for conditional store vectorization this will not only be one block but multiple basic blocks. The last block would have the back-edge. I updated the code to use a vector of basic blocks instead of a single basic block and fixed the users to use the last entry in this vector. But, I forgot to add the basic blocks to this vector! Fixes PR18724. llvm-svn: 201028
* LoopVectorizer: Enable unrolling of conditional stores and the load/storeArnold Schwaighofer2014-02-021-0/+3
| | | | | | | | | unrolling heuristic per default Benchmarking on x86_64 (thanks Chandler!) and ARM has shown those options speed up some benchmarks while not causing any interesting regressions. llvm-svn: 200621
* ARMTTI: We don't have 16 allocatable scalar registersArnold Schwaighofer2014-02-011-0/+36
| | | | | | | This caused an regression on libquantum after enabling the new loop vectorizer unroll heuristics. llvm-svn: 200616
* [vectorizer] Tweak the way we do small loop runtime unrolling in theChandler Carruth2014-01-311-15/+42
| | | | | | | | | | | | | | loop vectorizer to not do so when runtime pointer checks are needed and share code with the new (not yet enabled) load/store saturation runtime unrolling. Also ensure that we only consider the runtime checks when the loop hasn't already been vectorized. If it has, the runtime check cost has already been paid. I've fleshed out a test case to cover the scalar unrolling as well as the vector unrolling and comment clearly why we are or aren't following the pattern. llvm-svn: 200530
OpenPOWER on IntegriCloud