summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms/Vectorize
Commit message (Collapse)AuthorAgeFilesLines
...
* [SLP] Vectorize loads of consecutive memory accesses, accessed in ↵Mohammad Shahid2017-01-281-57/+120
| | | | | | | | | | | | | non-consecutive (jumbled) way. The jumbled scalar loads will be sorted while building the tree and these accesses will be marked to generate shufflevector after the vectorized load with proper mask. Reviewers: hfinkel, mssimpso, mkuper Differential Revision: https://reviews.llvm.org/D26905 Change-Id: I9c0c8e6f91a00076a7ee1465440a3f6ae092f7ad llvm-svn: 293386
* [SLP] Refactoring of horizontal reduction analysis, NFC.Alexey Bataev2017-01-271-24/+25
| | | | | | | | | | | Some checks in SLP horizontal reduction analysis function are performed several times, though it is enough to perform these checks only once during an initial attempt at adding candidate for the reduction instruction/reduced value. Differential Revision: https://reviews.llvm.org/D29175 llvm-svn: 293274
* [LV] Fix an issue where forming LCSSA in the place that we did wouldChandler Carruth2017-01-261-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | change the set of uniform instructions in the loop causing an assert failure. The problem is that the legalization checking also builds data structures mapping various facts about the loop body. The immediate cause was the set of uniform instructions. If these then change when LCSSA is formed, the data structures would already have been built and become stale. The included test case triggered an assert in loop vectorize that was reduced out of the new PM's pipeline. The solution is to form LCSSA early enough that no information is cached across the changes made. The only really obvious position is outside of the main logic to vectorize the loop. This also has the advantage of removing one case where forming LCSSA could mutate the loop but we wouldn't track that as a "Changed" state. If it is significantly advantageous to do some legalization checking prior to this, we can do a more careful positioning but it seemed best to just back off to a safe position first. llvm-svn: 293168
* [TargetTransformInfo] Refactor and improve getScalarizationOverhead()Jonas Paulsson2017-01-261-33/+15
| | | | | | | | | | | | | | | | | Refactoring to remove duplications of this method. New method getOperandsScalarizationOverhead() that looks at the present unique operands and add extract costs for them. Old behaviour was to just add extract costs for one operand of the type always, which still happens in getArithmeticInstrCost() if no operands are provided by the caller. This is a good start of improving on this, but there are more places that can be improved by using getOperandsScalarizationOverhead(). Review: Hal Finkel https://reviews.llvm.org/D29017 llvm-svn: 293155
* [SLP] Improve horizontal vectorization for non-power-of-2 number ofAlexey Bataev2017-01-251-2/+4
| | | | | | | | | | | | | instructions. If number of instructions in horizontal reduction list is not power of 2 then only PowerOf2Floor(NumberOfInstructions) last elements are actually vectorized, other instructions remain scalar. Patch tries to vectorize the remaining elements either. Differential Revision: https://reviews.llvm.org/D28959 llvm-svn: 293042
* [SLP] Refactoring of HorizontalReduction class, NFC.Alexey Bataev2017-01-241-34/+20
| | | | | | | | | Removed data members ReduxWidth and MinVecRegSize + some C++11 stylish improvements. Differential Revision: https://reviews.llvm.org/D29010 llvm-svn: 292899
* [SLP] Make ReductionOpcode have the right (enum) type. NFC.Michael Kuperstein2017-01-211-13/+10
| | | | llvm-svn: 292703
* [SLP] Delete useless helper. NFC.Michael Kuperstein2017-01-211-14/+10
| | | | | | | The helper contained a branch for a special case that is unnecessary, and a cast. llvm-svn: 292698
* Test commit access, remove trailing whitespaceMikael Holmen2017-01-191-1/+1
| | | | llvm-svn: 292482
* [LV] Run loop-simplify and LCSSA explicitly instead of "requiring" themMichael Kuperstein2017-01-191-5/+13
| | | | | | | | | | | | This changes the vectorizer to explicitly use the loopsimplify and lcssa utils, instead of "requiring" the transformations as if they were analyses. This is not NFC, since it changes the LCSSA behavior - we no longer run LCSSA for all loops, but rather only for the loops we expect to modify. Differential Revision: https://reviews.llvm.org/D28868 llvm-svn: 292456
* [LV] Allow reductions that have several uses outside the loopMichael Kuperstein2017-01-181-5/+2
| | | | | | | | | | | We currently check whether a reduction has a single outside user. We don't really need to require that - we just need to make sure a single value is used externally. The number of external users of that value shouldn't actually matter. Differential Revision: https://reviews.llvm.org/D28830 llvm-svn: 292424
* [LV] Mark non-consecutive-like pointers non-uniformMatthew Simpson2017-01-171-0/+7
| | | | | | | | | | | If a memory instruction will be vectorized, but it's pointer operand is non-consecutive-like, the instruction is a gather or scatter operation. Its pointer operand will be non-uniform. This should fix PR31671. Reference: https://llvm.org/bugs/show_bug.cgi?id=31671 Differential Revision: https://reviews.llvm.org/D28819 llvm-svn: 292254
* [PM] Introduce an analysis set used to preserve all analyses overChandler Carruth2017-01-151-2/+2
| | | | | | | | | | | | | | | a function's CFG when that CFG is unchanged. This allows transformation passes to simply claim they preserve the CFG and analysis passes to check for the CFG being preserved to remove the fanout of all analyses being listed in all passes. I've gone through and removed or cleaned up as many of the comments reminding us to do this as I could. Differential Revision: https://reviews.llvm.org/D28627 llvm-svn: 292054
* Give comparator const call operatorEric Fiselier2017-01-151-1/+1
| | | | llvm-svn: 292043
* Remove unused lambda captures. NFCMalcolm Parsons2017-01-131-2/+2
| | | | llvm-svn: 291916
* [SLP] Remove bogus assert.Michael Kuperstein2017-01-111-4/+0
| | | | | | | | | | | | The removed assert seems bogus - it's perfectly legal for the roots of the vectorized subtrees to be equal even if the original scalar values aren't, if the original scalars happen to be equivalent. This fixes PR31599. Differential Revision: https://reviews.llvm.org/D28539 llvm-svn: 291692
* [X86] updating TTI costs for arithmetic instructions on X86\SLM arch.Mohammed Agabaria2017-01-111-3/+4
| | | | | | | | | | | | updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. In case if the real operands bitwidth <= 16. Differential Revision: https://reviews.llvm.org/D28104 llvm-svn: 291657
* [PM] Rewrite the loop pass manager to use a worklist and augmented runChandler Carruth2017-01-111-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | arguments much like the CGSCC pass manager. This is a major redesign following the pattern establish for the CGSCC layer to support updates to the set of loops during the traversal of the loop nest and to support invalidation of analyses. An additional significant burden in the loop PM is that so many passes require access to a large number of function analyses. Manually ensuring these are cached, available, and preserved has been a long-standing burden in LLVM even with the help of the automatic scheduling in the old pass manager. And it made the new pass manager extremely unweildy. With this design, we can package the common analyses up while in a function pass and make them immediately available to all the loop passes. While in some cases this is unnecessary, I think the simplicity afforded is worth it. This does not (yet) address loop simplified form or LCSSA form, but those are the next things on my radar and I have a clear plan for them. While the patch is very large, most of it is either mechanically updating loop passes to the new API or the new testing for the loop PM. The code for it is reasonably compact. I have not yet updated all of the loop passes to correctly leverage the update mechanisms demonstrated in the unittests. I'll do that in follow-up patches along with improved FileCheck tests for those passes that ensure things work in more realistic scenarios. In many cases, there isn't much we can do with these until the loop simplified form and LCSSA form are in place. Differential Revision: https://reviews.llvm.org/D28292 llvm-svn: 291651
* [LV] Fix-up external IV users after updating dominator treeMatthew Simpson2017-01-091-7/+20
| | | | | | | | | | | | | This patch delays the fix-up step for external induction variable users until after the dominator tree has been properly updated. This should fix PR30742. The SCEVExpander in InductionDescriptor::transform can generate code in the wrong location if the dominator tree is not up-to-date. We should work towards keeping the dominator tree up-to-date throughout the transformation. Reference: https://llvm.org/bugs/show_bug.cgi?id=30742 Differential Revision: https://reviews.llvm.org/D28168 llvm-svn: 291462
* Remove unused method in LoopVectorize.cpp.Jonas Paulsson2017-01-091-7/+0
| | | | | | | computeInterleaveCount() is not defined/used and is therefore removed. Review: Davide Italiano llvm-svn: 291423
* Currently isLikelyComplexAddressComputation tries to figure out if the given ↵Mohammed Agabaria2017-01-051-42/+17
| | | | | | | | | | | | | stride seems to be 'complex' and need some extra cost for address computation handling. This code seems to be target dependent which may not be the same for all targets. Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'. Specifically at X86 targets we dont see any significant address computation cost in case of the strided access in general. Differential Revision: https://reviews.llvm.org/D27518 llvm-svn: 291106
* [LV] Sink tripcount query to where it's actually used. NFC.Michael Kuperstein2016-12-191-5/+4
| | | | llvm-svn: 290142
* Revert @llvm.assume with operator bundles (r289755-r289757)Daniel Jasper2016-12-193-26/+53
| | | | | | | This creates non-linear behavior in the inliner (see more details in r289755's commit thread). llvm-svn: 290086
* Reapply "[LV] Enable vectorization of loops with conditional stores by default"Matthew Simpson2016-12-161-1/+1
| | | | | | | | This patch reapplies r289863. The original patch was reverted because it exposed a bug causing the loop vectorizer to crash in the Python runtime on PPC. The underlying issue was fixed with r289958. llvm-svn: 289975
* [LV] Don't attempt to type-shrink scalarized instructionsMatthew Simpson2016-12-161-5/+21
| | | | | | | | | | After r288909, instructions feeding predicated instructions may be scalarized if profitable. Since these instructions will remain scalar, we shouldn't attempt to type-shrink them. We should only truncate vector types to their minimal bit widths. This bug was exposed by enabling the vectorization of loops containing conditional stores by default. llvm-svn: 289958
* Revert r289863: [LV] Enable vectorization of loops with conditionalChandler Carruth2016-12-161-1/+1
| | | | | | | | | | stores by default This uncovers a crasher in the loop vectorizer on PPC when building the Python runtime. I'll send the testcase to the review thread for the original commit. llvm-svn: 289934
* [LV] Enable vectorization of loops with conditional stores by defaultMatthew Simpson2016-12-151-1/+1
| | | | | | | | | This patch sets the default value of the "-enable-cond-stores-vec" command line option to "true". Differential Revision: https://reviews.llvm.org/D27814 llvm-svn: 289863
* Remove the AssumptionCacheHal Finkel2016-12-153-53/+26
| | | | | | | | | After r289755, the AssumptionCache is no longer needed. Variables affected by assumptions are now found by using the new operand-bundle-based scheme. This new scheme is more computationally efficient, and also we need much less code... llvm-svn: 289756
* [LV] Don't vectorize when we have a small static bound on trip countMichael Kuperstein2016-12-131-2/+2
| | | | | | | | | | We currently check if the exact trip count is known and is smaller than the "tiny loop" bound. We should be checking the maximum bound on the trip count instead. Differential Revision: https://reviews.llvm.org/D27690 llvm-svn: 289583
* [SLP] Fix sign-extends for type-shrinkingMatthew Simpson2016-12-121-15/+62
| | | | | | | | | | | | | | This patch ensures the correct minimum bit width during type-shrinking. Previously when type-shrinking, we always sign-extended values back to their original width. However, if we are going to sign-extend, and the sign bit is unknown, we have to increase the minimum bit width by one bit so the sign-extend will fill the upper bits correctly. If the sign bit is known to be zero, we can perform a zero-extend instead. This should fix PR31243. Reference: https://llvm.org/bugs/show_bug.cgi?id=31243 Differential Revision: https://reviews.llvm.org/D27466 llvm-svn: 289470
* [SLP] Fix for PR6246: vectorization for scalar ops on vector elements.Alexey Bataev2016-12-081-66/+80
| | | | | | | | | | | | | | | When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 289043
* [LV] Scalarize operands of predicated instructionsMatthew Simpson2016-12-071-7/+210
| | | | | | | | | | | | | | | | | | | | | | This patch attempts to scalarize the operand expressions of predicated instructions if they were conditionally executed in the original loop. After scalarization, the expressions will be sunk inside the blocks created for the predicated instructions. The transformation essentially performs un-if-conversion on the operands. The cost model has been updated to determine if scalarization is profitable. It compares the cost of a vectorized instruction, assuming it will be if-converted, to the cost of the scalarized instruction, assuming that the instructions corresponding to each vector lane will be sunk inside a predicated block, possibly avoiding execution. If it's more profitable to scalarize the entire expression tree feeding the predicated instruction, the expression will be scalarized; otherwise, it will be vectorized. We only consider the cost of the entire expression to accurately estimate the cost of the required insertelement and extractelement instructions. Differential Revision: https://reviews.llvm.org/D26083 llvm-svn: 288909
* Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements."Renato Golin2016-12-021-74/+66
| | | | | | | | This reverts commit r288497, as it broke the AArch64 build of Compiler-RT's builtins (twice: once in r288412 and once in r288497). We should investigate this offline. llvm-svn: 288508
* [SLP] Fix for PR6246: vectorization for scalar ops on vector elements.Alexey Bataev2016-12-021-66/+74
| | | | | | | | | | | | | | | When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 288497
* Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements."Artem Belevich2016-12-011-72/+66
| | | | | | This reverts r288412 which causes severe compile-time regression. llvm-svn: 288431
* [SLP] Fix for PR6246: vectorization for scalar ops on vector elements.Alexey Bataev2016-12-011-66/+72
| | | | | | | | | | | | | | | When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 288412
* [SLP] Fixed cost model for horizontal reduction.Alexey Bataev2016-12-011-1/+2
| | | | | | | | | | | | | | Currently when cost of scalar operations is evaluated the vector type is used for scalar operations. Patch fixes this issue and fixes evaluation of the vector operations cost. Several test showed that vector cost model is too optimistic. It allowed vectorization of 8 or less add/fadd operations, though scalar code is faster. Actually, only for 16 or more operations vector code provides better performance. Differential Revision: https://reviews.llvm.org/D26277 llvm-svn: 288398
* [SLPVectorizer] Improved support of partial tree vectorization.Alexey Bataev2016-11-291-87/+178
| | | | | | | | | | | Currently SLP vectorizer tries to vectorize a binary operation and dies immediately after unsuccessful the first unsuccessfull attempt. Patch tries to improve the situation, trying to vectorize all binary operations of all children nodes in the binop tree. Differential Revision: https://reviews.llvm.org/D25517 llvm-svn: 288115
* [LoadStoreVectorizer] Enable vectorization of stores in the presence of an ↵Alina Sbirlea2016-11-231-3/+25
| | | | | | | | | | | | | | | | | | aliasing load Summary: The "getVectorizablePrefix" method would give up if it found an aliasing load for a store chain. In practice, the aliasing load can be treated as a memory barrier and all stores that precede it are a valid vectorizable prefix. Issue found by volkan in D26962. Testcase is a pruned version of the one in the original patch. Reviewers: jlebar, arsenm, tstellarAMD Subscribers: mzolotukhin, wdng, nhaehnle, anna, volkan, llvm-commits Differential Revision: https://reviews.llvm.org/D27008 llvm-svn: 287781
* Fix spelling mistakes in Transforms comments. NFC.Simon Pilgrim2016-11-201-1/+1
| | | | | | Identified by Pedro Giffuni in PR27636. llvm-svn: 287488
* [CMake] NFC. Updating CMake dependency specificationsChris Bieneman2016-11-171-2/+3
| | | | | | This patch updates a bunch of places where add_dependencies was being explicitly called to add dependencies on intrinsics_gen to instead use the DEPENDS named parameter. This cleanup is needed for a patch I'm working on to add a dependency debugging mode to the build system. llvm-svn: 287206
* [LoopVectorize] Fix for non-determinism in codegenMandeep Singh Grang2016-11-161-1/+1
| | | | | | | | | | | | Summary: This patch fixes issues in codegen uncovered due to https://reviews.llvm.org/D26718 Reviewers: mssimpso Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D26727 llvm-svn: 287135
* Fixed the lost FastMathFlags for CALL operations in SLPVectorizer.Vyacheslav Klochkov2016-11-161-0/+1
| | | | | | | Reviewer: Michael Zolotukhin. Differential Revision: https://reviews.llvm.org/D26575 llvm-svn: 287064
* [LoopVectorizer] When estimating reg usage, unused insts may "end" another useRobert Lougher2016-11-151-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | The register usage algorithm incorrectly treats instructions whose value is not used within the loop (e.g. those that do not produce a value). The algorithm first calculates the usages within the loop. It iterates over the instructions in order, and records at which instruction index each use ends (in fact, they're actually recorded against the next index, as this is when we want to delete them from the open intervals). The algorithm then iterates over the instructions again, adding each instruction in turn to a list of open intervals. Instructions are then removed from the list of open intervals when they occur in the list of uses ended at the current index. The problem is, instructions which are not used in the loop are skipped. However, although they aren't used, the last use of a value may have been recorded against that instruction index. In this case, the use is not deleted from the open intervals, which may then bump up the estimated register usage. This patch fixes the issue by simply moving the "is used" check after the loop which erases the uses at the current index. Differential Revision: https://reviews.llvm.org/D26554 llvm-svn: 286969
* Test commit, remove trailing space.Florian Hahn2016-11-151-1/+1
| | | | | | This commit is used to test commit access. llvm-svn: 286957
* [LV] Stop saying "use -Rpass-analysis=loop-vectorize"Adam Nemet2016-11-111-2/+1
| | | | | | | | | | | | | | | | | | This is PR28376. Unfortunately given the current structure of optimization diagnostics we lack the capability to tell whether the user has passed -Rpass-analysis=loop-vectorize since this is local to the front-end (BackendConsumer::OptimizationRemarkHandler). So rather than printing this even if the user has already passed -Rpass-analysis, this patch just punts and stops recommending this option. I don't think that getting this right is worth the complexity. Differential Revision: https://reviews.llvm.org/D26563 llvm-svn: 286662
* Fixed the lost FastMathFlags for FCmp operations in SLPVectorizer.Vyacheslav Klochkov2016-11-111-3/+4
| | | | | | | Reviewer: Michael Zolotukhin. Differential Revision: https://reviews.llvm.org/D26543 llvm-svn: 286626
* Reset debug loc to OldInduction in ↵Dehao Chen2016-11-071-1/+3
| | | | | | | | InnerLoopVectorizer::createInductionVariable. (NFC) This is to prevent SetInsertionPoint from setting debug loc to Latch->getTerminator(). llvm-svn: 286159
* Second attempt at r285517.Dorit Nuzman2016-10-311-11/+65
| | | | llvm-svn: 285568
* Revert r285517 due to build failures.Dorit Nuzman2016-10-301-59/+1
| | | | llvm-svn: 285518
OpenPOWER on IntegriCloud