path: root/llvm/lib/Transforms/Vectorize
Commit message | Author | Age | Files | Lines
* [LV] Fold tail by masking - handle reductions | Ayal Zaks | 2019-08-28 | 3 | -11/+57
  Allow vectorizing loops that have reductions when tail is folded by masking. A select is introduced in VPlan, choosing between the last value carried by the loop-exit/live-out instruction of the reduction, and the penultimate value carried by the reduction phi, according to the "i < n" mask of fold-tail. This select replaces the last value as the live-out value of the loop.
  Differential Revision: https://reviews.llvm.org/D66720
  llvm-svn: 370173
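A minimal sketch of the select described in the entry above, written as a scalar simulation rather than as VPlan code. The function name, the runtime `vf` parameter, and the stand-in value used for inactive lanes are all assumptions made for illustration; none of them come from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums `data` using a tail-folded loop of width `vf`; there is no scalar
// remainder loop. All names here are illustrative.
int foldedTailSum(const std::vector<int> &data, std::size_t vf) {
  assert(vf > 0 && "vector width must be positive");
  const std::size_t n = data.size();
  std::vector<int> phi(vf, 0); // reduction phi: one partial sum per lane
  for (std::size_t base = 0; base < n; base += vf) {
    std::vector<int> next(vf);
    for (std::size_t lane = 0; lane < vf; ++lane) {
      const std::size_t i = base + lane;
      const bool mask = i < n; // the "i < n" fold-tail mask
      // Masked load: inactive lanes get an arbitrary stand-in value.
      const int loaded = mask ? data[i] : 12345;
      const int updated = phi[lane] + loaded; // value after this iteration
      // The select: active lanes take the updated value, inactive lanes
      // keep the value carried by the phi.
      next[lane] = mask ? updated : phi[lane];
    }
    phi = next;
  }
  int total = 0;
  for (int partial : phi)
    total += partial;
  return total;
}
```

Without the select, the stand-in value loaded by inactive lanes in the last iteration would leak into the final sum.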
* Add a clarify comment for meaning of SafePointes [NFC] | Philip Reames | 2019-08-26 | 1 | -1/+5
  Extracted from D66688 as requested.
  llvm-svn: 369962
* [SLP] use range-for loops, fix formatting; NFC | Sanjay Patel | 2019-08-23 | 1 | -32/+32
  These are part of D57059, but that patch doesn't apply cleanly to trunk at this point, so we might as well remove some of the noise.
  llvm-svn: 369776
* [SLP] fix formatting; NFC | Sanjay Patel | 2019-08-23 | 1 | -4/+3
  These are part of D57059, but that patch doesn't apply cleanly to trunk at this point, so we might as well remove some of the noise.
  llvm-svn: 369769
* [SLP][NFC] Avoid repetitive calls to getSameOpcode() | Dinar Temirbulatov | 2019-08-20 | 1 | -120/+176
  We can avoid repetitive calls to getSameOpcode() for already known tree elements by keeping MainOp and AltOp in TreeEntry.
  Differential Revision: https://reviews.llvm.org/D64700
  llvm-svn: 369315
* [SLP] reduce duplicated code; NFC | Sanjay Patel | 2019-08-19 | 1 | -2/+4
  llvm-svn: 369250
* [SLPVectorizer] Make the scheduler aware of the TreeEntry operands. | Vasileios Porpodas | 2019-08-16 | 1 | -79/+171
  Summary: The scheduler's dependence graph gets the use-def dependencies by accessing the operands of the instructions in a bundle. However, buildTree_rec() may change the order of the operands in TreeEntry, and the scheduler is currently not aware of this. This is not causing any functional issues currently, because reordering is restricted to the operands of a single instruction. Once we support operand reordering across multiple TreeEntries, as shown here: http://www.llvm.org/devmtg/2019-04/slides/Poster-Porpodas-Supernode_SLP.pdf , the scheduler will need to get the correct operands from TreeEntry and not from the individual instructions.
  In short, this patch:
  - Connects the scheduler's bundle with the corresponding TreeEntry. It introduces new TE and Lane fields in ScheduleData.
  - Moves the location where the operands of the TreeEntry are initialized. This used to take place in newTreeEntry() setting one operand at a time, but is now moved pre-order just before the recursion of buildTree_rec(). This is required because the scheduler needs to access both operands of the TreeEntry in tryScheduleBundle().
  - Updates the scheduler to access the instruction operands through the TreeEntry operands instead of accessing the instruction operands directly.
  Reviewers: ABataev, RKSimon, dtemirbulatov, Ayal, dorit, hfinkel
  Reviewed By: ABataev
  Subscribers: hiraditya, llvm-commits, lebedev.ri, rcorcs
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62432
  llvm-svn: 369131
* [SLPVectorizer] Silence null dereference warning. NFCI. | Simon Pilgrim | 2019-08-16 | 1 | -0/+1
  cppcheck + MSVC analyzer both overzealously warn that we might dereference a null Bundle pointer - add an assertion to check for null to silence the warning, plus it's a good idea to check that we succeeded in finding a schedule bundle anyway.
  llvm-svn: 369094
* [llvm] Migrate llvm::make_unique to std::make_unique | Jonas Devlieghere | 2019-08-15 | 2 | -6/+6
  Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo.
  llvm-svn: 369013
* [LV] fold-tail predication should be respected even with assume_safety | Dorit Nuzman | 2019-08-15 | 2 | -5/+5
  assume_safety implies that loads under "if's" can be safely executed speculatively (unguarded, unmasked). However this assumption holds only for the original user "if's", not those introduced by the compiler, such as the fold-tail "if" that guards us from loading beyond the original loop trip-count. Currently the combination of fold-tail and assume-safety pragmas results in ignoring the fold-tail predicate that guards the loads, generating unmasked loads. This patch fixes this behavior.
  Differential Revision: https://reviews.llvm.org/D66106
  Reviewers: Ayal, hsaito, fhahn
  llvm-svn: 368973
* [SLP][NFC] Use pointers to address to ScalarToTreeEntry elements, instead of indexes. | Dinar Temirbulatov | 2019-08-14 | 1 | -4/+4
  llvm-svn: 368906
* [LV] Fold-tail flag | Dorit Nuzman | 2019-08-14 | 1 | -5/+13
  This is the compiler-flag equivalent of the Predicate pragma (https://reviews.llvm.org/D65197), to direct the vectorizer to fold the remainder-loop into the main-loop using predication.
  Differential Revision: https://reviews.llvm.org/D66108
  Reviewers: Ayal, hsaito, fhahn, SjoerdMeije
  llvm-svn: 368801
* [LoopVectorize][X86] Clamp interleave factor if we have a known constant trip count that is less than VF*interleave | Craig Topper | 2019-08-07 | 1 | -1/+9
  If we know the trip count, we should make sure the interleave factor won't cause the vectorized loop to exceed it.
  Improves one of the cases from PR42674
  Differential Revision: https://reviews.llvm.org/D65896
  llvm-svn: 368215
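The clamping idea in the entry above can be sketched as follows. This is not the code from the patch: the function name, the convention that a trip count of 0 means "unknown", and the fallback to an interleave count of 1 are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>

// Returns an interleave count such that vf * count does not exceed a known
// constant trip count. A tripCount of 0 stands for "not known" (assumption).
unsigned clampInterleaveCount(unsigned tripCount, unsigned vf, unsigned ic) {
  assert(vf > 0 && ic > 0 && "VF and interleave count must be positive");
  if (tripCount == 0)
    return ic; // unknown trip count: nothing to clamp against
  const unsigned maxIC = tripCount / vf; // largest count that still fits
  if (maxIC == 0)
    return 1; // even a single vector iteration overshoots; do not interleave
  return std::min(ic, maxIC);
}
```

For a trip count of 16 and VF of 4, a requested interleave count of 8 would be reduced to 4, so one pass of the interleaved vector body covers exactly the 16 iterations.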
* Revert "[X86] Add more extract subvector cost model tests for smaller element sizes and smaller than 128-bit vectors." | Mitch Phillips | 2019-08-06 | 1 | -9/+0
  This reverts commit fc33e33776b7a7ce22e539f0ec2e3bfdb09ad361. This commit depends on the rolled back commit rL367901, and thus needs to be rolled back.
  llvm-svn: 368109
* [X86] Add more extract subvector cost model tests for smaller element sizes and smaller than 128-bit vectors. | Craig Topper | 2019-08-06 | 1 | -0/+9
  With the switch to widening legalization, we need to do a better job of costing extractions of less than 128 bits.
  llvm-svn: 368081
* [LV][NFC] Share the LV illegality reporting with LoopVectorize. | Hideki Saito | 2019-08-06 | 2 | -133/+135
  Reviewers: hsaito, fhahn, rengolin
  Reviewed By: rengolin
  Patch by psamolysov, thanks!
  Differential Revision: https://reviews.llvm.org/D62997
  llvm-svn: 367980
* Handle casts changing pointer size in the vectorizer | Stanislav Mekhanoshin | 2019-08-02 | 1 | -5/+16
  Added code to truncate or shrink offsets so that we can continue base pointer search if size has changed along the way.
  Differential Revision: https://reviews.llvm.org/D65612
  llvm-svn: 367646
* Relax load store vectorizer pointer strip checks | Stanislav Mekhanoshin | 2019-08-01 | 1 | -3/+2
  The previous change to fix a crash in the vectorizer introduced performance regressions. The condition to preserve pointer address space during the search is too tight; we only need to match the size.
  Differential Revision: https://reviews.llvm.org/D65600
  llvm-svn: 367624
* Follow up of rL367592, fix the build | Sjoerd Meijer | 2019-08-01 | 1 | -2/+0
  Some buildbots complained about:
    error: default label in switch which covers all enumeration values
  llvm-svn: 367603
* [LV] Tail-Loop Folding | Sjoerd Meijer | 2019-08-01 | 2 | -54/+99
  This allows folding of the scalar epilogue loop (the tail) into the main vectorised loop body when the loop is annotated with a "vector predicate" metadata hint. To fold the tail, instructions need to be predicated (masked), enabling/disabling lanes for the remainder iterations.
  Differential Revision: https://reviews.llvm.org/D65197
  llvm-svn: 367592
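To illustrate what folding the tail means in the entry above, here is a small scalar model of a predicated loop, assuming a hypothetical `doubledFolded` helper and a runtime width `vf`. It only models the lane mask; it does not use any LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Doubles every element with a single predicated loop of width `vf`.
// The final, partial group of elements is handled by the same loop body with
// some lanes disabled, so no separate scalar epilogue loop is needed.
std::vector<int> doubledFolded(std::vector<int> values, std::size_t vf) {
  assert(vf > 0 && "vector width must be positive");
  const std::size_t n = values.size();
  for (std::size_t base = 0; base < n; base += vf) {
    for (std::size_t lane = 0; lane < vf; ++lane) {
      const std::size_t i = base + lane;
      if (i < n) // lane mask: disabled lanes neither load nor store
        values[i] *= 2;
    }
  }
  return values;
}
```

With five elements and a width of 4, the second pass through the body runs with only one lane enabled, which is the job the scalar epilogue would otherwise do.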
* [AMDGPU] Fix for vectorizer crash with pointers of different size | Stanislav Mekhanoshin | 2019-07-31 | 1 | -0/+5
  When the vectorizer strips pointers, it can eventually end up with pointers of two different sizes, and SCEV will then crash.
  Differential Revision: https://reviews.llvm.org/D65480
  llvm-svn: 367443
* [LV] Scalar Epilogue Lowering. NFC. | Sjoerd Meijer | 2019-07-25 | 2 | -57/+66
  This refactors the boolean 'OptForSize' that was passed around in a lot of places. It controlled folding of the tail loop, the scalar epilogue, into the main loop, but code size may not be the only reason to do this. Thus, this is a first step to generalise the concept of tail-loop folding, and hence OptForSize has been renamed and is using an enum ScalarEpilogueStatus that holds the status of how the epilogue should be lowered.
  This will be followed up by D65197, which picks up the predicate loop hint and performs the tail-loop folding.
  Differential Revision: https://reviews.llvm.org/D64916
  llvm-svn: 366993
* [SLPVectorizer] Revert local change that got accidently got committed in rL366799 | Simon Pilgrim | 2019-07-23 | 1 | -1/+0
  This wasn't part of D63281
  llvm-svn: 366807
* [TargetLowering] Add SimplifyMultipleUseDemandedBits | Simon Pilgrim | 2019-07-23 | 1 | -0/+1
  This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demand all bits/elts. The intention is to remove a similar instruction - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured.
  The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use.
  We do see a couple of regressions that need to be addressed:
  - AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better).
  - X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG.
  The code owners have confirmed it's ok for these cases to be fixed up in future patches.
  Differential Revision: https://reviews.llvm.org/D63281
  llvm-svn: 366799
* [SLPVectorizer] Remove null-pointer test. NFCI. | Simon Pilgrim | 2019-07-23 | 1 | -4/+4
  cast<CallInst> shouldn't return null and we dereference the pointer in a lot of other places, causing both MSVC + cppcheck to warn about dereferenced null pointers
  llvm-svn: 366793
* [SLPVectorizer] Fix some MSVC/cppcheck uninitialized variable warnings. NFCI. | Simon Pilgrim | 2019-07-22 | 1 | -3/+3
  llvm-svn: 366712
* Temporarily Revert "[SLP] Recommit: Look-ahead operand reordering heuristic." | Eric Christopher | 2019-07-15 | 1 | -248/+46
  As there are some reported miscompiles with AVX512 and performance regressions in Eigen. Verified with the original committer and testcases will be forthcoming.
  This reverts commit r364964.
  llvm-svn: 366154
* [LoopVectorize] Pass unfiltered list of arguments to getIntrinsicInstCost. | Florian Hahn | 2019-07-15 | 1 | -5/+2
  We do not compute the scalarization overhead in getVectorIntrinsicCost and TTI::getIntrinsicInstrCost requires the full arguments list.
  llvm-svn: 366049
* [LV] Exclude loop-invariant inputs from scalar cost computation. | Florian Hahn | 2019-07-14 | 1 | -22/+42
  Loop invariant operands do not need to be scalarized, as we are using the values outside the loop. We should ignore them when computing the scalarization overhead.
  Fixes PR41294
  Reviewers: hsaito, rengolin, dcaballe, Ayal
  Reviewed By: Ayal
  Differential Revision: https://reviews.llvm.org/D59995
  llvm-svn: 366030
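A toy illustration of the cost rule in the entry above. The `Operand` struct, its fields, and the per-operand extract cost are invented for the example and do not correspond to LLVM's cost-model interfaces.

```cpp
#include <cassert>
#include <vector>

// Hypothetical operand description: what it would cost to extract the
// operand's lanes into scalars, and whether it is loop-invariant.
struct Operand {
  unsigned extractCost;
  bool loopInvariant;
};

// Sums the extract cost of the operands that actually need scalarizing.
// Loop-invariant operands are skipped: their scalar value already exists
// outside the loop.
unsigned scalarizationOverhead(const std::vector<Operand> &operands) {
  unsigned cost = 0;
  for (const Operand &op : operands)
    if (!op.loopInvariant)
      cost += op.extractCost;
  return cost;
}
```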
* Delete dead stores | Fangrui Song | 2019-07-12 | 1 | -7/+2
  llvm-svn: 365903
* [SLP] Optimize getSpillCost(); NFCI | Nikita Popov | 2019-07-09 | 1 | -6/+10
  For a given set of live values, the spill cost will always be the same for each call. Compute the cost once and multiply it by the number of calls.
  (I'm not sure this spill cost modeling makes sense if there are multiple calls, as the spill cost will likely be shared across calls in that case. But that's how it currently works.)
  llvm-svn: 365552
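The "compute once, multiply by the number of calls" idea from the entry above, reduced to a sketch. The function names and the representation of live values as a list of per-value costs are assumptions for illustration only.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Cost of spilling one fixed set of live values around a single call.
unsigned spillCostPerCall(const std::vector<unsigned> &liveValueCosts) {
  return std::accumulate(liveValueCosts.begin(), liveValueCosts.end(), 0u);
}

// The live set is the same for every call, so the per-call cost is computed
// once and scaled, instead of being recomputed inside a loop over the calls.
unsigned totalSpillCost(const std::vector<unsigned> &liveValueCosts,
                        unsigned numCalls) {
  return numCalls * spillCostPerCall(liveValueCosts);
}
```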
* [SLP] Recommit: Look-ahead operand reordering heuristic. | Vasileios Porpodas | 2019-07-02 | 1 | -46/+248
  Summary: This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example).
  Reviewers: RKSimon, ABataev, dtemirbulatov, Ayal, hfinkel, rnk
  Reviewed By: RKSimon, dtemirbulatov
  Subscribers: hiraditya, phosek, rnk, rcorcs, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60897
  llvm-svn: 364964
* Revert [SLP] Look-ahead operand reordering heuristic. | Jordan Rupprecht | 2019-07-01 | 1 | -236/+46
  This reverts r364478 (git commit 574cb0eb3a7ac95e62d223a60bef891171dfe321)
  The patch is causing compilation timeouts.
  llvm-svn: 364846
* [SLP] Look-ahead operand reordering heuristic. | Vasileios Porpodas | 2019-06-26 | 1 | -46/+236
  Summary: This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example).
  Reviewers: RKSimon, ABataev, dtemirbulatov, Ayal, hfinkel, rnk
  Reviewed By: RKSimon, dtemirbulatov
  Subscribers: rnk, rcorcs, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60897
  llvm-svn: 364478
* [SLP] NFC: Fixed typo in comment | Vasileios Porpodas | 2019-06-24 | 1 | -1/+1
  llvm-svn: 364237
* [SLP] Support unary FNeg vectorization | Cameron McInally | 2019-06-24 | 1 | -2/+30
  Differential Revision: https://reviews.llvm.org/D63609
  llvm-svn: 364219
* Revert [SLP] Look-ahead operand reordering heuristic. | Reid Kleckner | 2019-06-21 | 1 | -232/+46
  This reverts r364084 (git commit 5698921be2d567f6abf925479ac9f5a376d6d74f)
  It caused crashes while compiling a file in Chrome. Reduction forthcoming.
  llvm-svn: 364111
* [SLP] Look-ahead operand reordering heuristic. | Simon Pilgrim | 2019-06-21 | 1 | -46/+232
  This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example).
  Committed on behalf of @vporpo (Vasileios Porpodas)
  Differential Revision: https://reviews.llvm.org/D60897
  llvm-svn: 364084
* [DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion | Orlando Cazalet-Hyams | 2019-06-19 | 1 | -6/+14
  Summary:
  Bug: https://bugs.llvm.org/show_bug.cgi?id=39024
  The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:
  A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
  B) Instructions in the middle block have different line numbers which give the impression of another iteration.
  In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks. I have set up a separate review D61933 for a fix which is required for this patch.
  Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse
  Reviewed By: hfinkel, jmorse
  Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits
  Tags: #llvm, #debug-info
  Differential Revision: https://reviews.llvm.org/D60831
  > llvm-svn: 363046
  llvm-svn: 363786
* [LV] Suppress vectorization in some nontemporal cases | Warren Ristow | 2019-06-17 | 2 | -1/+33
  When considering a loop containing nontemporal stores or loads for vectorization, suppress the vectorization if the corresponding vectorized store or load with the alignment of the original scalar memory op is not supported with the nontemporal hint on the target.
  This adds two new functions:
    bool isLegalNTStore(Type *DataType, unsigned Alignment) const;
    bool isLegalNTLoad(Type *DataType, unsigned Alignment) const;
  to TTI, leaving the target independent default implementation as returning true, but with overriding implementations for X86 that check the legality based on available Subtarget features.
  This fixes https://llvm.org/PR40759
  Differential Revision: https://reviews.llvm.org/D61764
  llvm-svn: 363581
* PHINode: introduce setIncomingValueForBlock() function, and use it. | Whitney Tsang | 2019-06-17 | 1 | -4/+2
  Summary: There is PHINode::getBasicBlockIndex() and PHINode::setIncomingValue() but no function to replace incoming value for a specified BasicBlock* predecessor. Clearly, there are a lot of places that could use that functionality.
  Reviewer: craig.topper, lebedev.ri, Meinersbur, kbarton, fhahn
  Reviewed By: Meinersbur, fhahn
  Subscribers: fhahn, hiraditya, zzheng, jsji, llvm-commits
  Tag: LLVM
  Differential Revision: https://reviews.llvm.org/D63338
  llvm-svn: 363566
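The convenience that the entry above describes can be modelled with a toy structure. `ToyPhi` below is a hypothetical stand-in that uses strings for blocks and ints for values; it is not LLVM's `PHINode`, and only the three member names are borrowed from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a PHI node: a list of (predecessor block, incoming value).
struct ToyPhi {
  std::vector<std::pair<std::string, int>> incoming;

  // Index of the first entry for `block`, or -1 if there is none.
  int getBasicBlockIndex(const std::string &block) const {
    for (std::size_t i = 0; i < incoming.size(); ++i)
      if (incoming[i].first == block)
        return static_cast<int>(i);
    return -1;
  }

  void setIncomingValue(std::size_t index, int value) {
    incoming[index].second = value;
  }

  // One call instead of the index lookup followed by setIncomingValue.
  // Every entry whose predecessor is `block` is updated.
  void setIncomingValueForBlock(const std::string &block, int value) {
    bool found = false;
    for (auto &entry : incoming)
      if (entry.first == block) {
        entry.second = value;
        found = true;
      }
    assert(found && "block is not a predecessor of this phi");
    (void)found; // keeps release builds free of unused-variable warnings
  }
};
```

Call sites that previously looked up the index and then set the value by index can collapse to the single call, which also avoids forgetting the "not found" case.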
* [LV] Deny irregular types in interleavedAccessCanBeWidened | Bjorn Pettersson | 2019-06-17 | 1 | -0/+7
  Summary: Avoid that loop vectorizer creates loads/stores of vectors with "irregular" types when interleaving. An example of an irregular type is x86_fp80 that is 80 bits, but that may have an allocation size that is 96 bits. So an array of x86_fp80 is not bitcast compatible with a vector of the same type.
  Not sure if interleavedAccessCanBeWidened is the best place for this check, but it solves the problem seen in the added test case. And it is the same kind of check that already exists in memoryInstructionCanBeWidened.
  Reviewers: fhahn, Ayal, craig.topper
  Reviewed By: fhahn
  Subscribers: hiraditya, rkruppe, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63386
  llvm-svn: 363547
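The notion of an "irregular" type in the entry above comes down to a size mismatch, which the following sketch models. `ToyType`, `hasIrregularLayout`, and `canWidenInterleavedAccess` are hypothetical names; the real check works on LLVM types and the data layout.

```cpp
#include <cassert>

// Hypothetical type description: the number of bits the value occupies and
// the number of bits one array element is allocated.
struct ToyType {
  unsigned sizeInBits;
  unsigned allocSizeInBits;
};

// A type is treated as irregular when the two sizes differ, because an array
// of such elements contains padding that a packed vector would not have.
bool hasIrregularLayout(const ToyType &type) {
  assert(type.allocSizeInBits >= type.sizeInBits &&
         "allocation cannot be smaller than the value");
  return type.sizeInBits != type.allocSizeInBits;
}

// Widening an interleaved access is refused for irregular element types.
bool canWidenInterleavedAccess(const ToyType &elementType) {
  return !hasIrregularLayout(elementType);
}
```

With the sizes quoted in the commit message, an 80-bit value allocated in 96 bits is rejected, while a 32-bit float allocated in 32 bits is accepted.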
* Revert "[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion" | Orlando Cazalet-Hyams | 2019-06-12 | 1 | -14/+6
  This reverts commit 1a0f7a2077b70c9864faa476e15b048686cf1ca7.
  See phabricator thread for D60831.
  llvm-svn: 363132
* [DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion | Orlando Cazalet-Hyams | 2019-06-11 | 1 | -6/+14
  Summary:
  Bug: https://bugs.llvm.org/show_bug.cgi?id=39024
  The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:
  A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
  B) Instructions in the middle block have different line numbers which give the impression of another iteration.
  In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks. I have set up a separate review D61933 for a fix which is required for this patch.
  Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse
  Reviewed By: hfinkel, jmorse
  Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits
  Tags: #llvm, #debug-info
  Differential Revision: https://reviews.llvm.org/D60831
  llvm-svn: 363046
* [LV] Fix -Wunused-function after r362736 | Fangrui Song | 2019-06-07 | 1 | -0/+2
  llvm-svn: 362762
* [LV] Wrap LV illegality reporting in a function. NFC. | Renato Golin | 2019-06-06 | 1 | -100/+120
  A function for loop vectorization illegality reporting has been introduced:
    void LoopVectorizationLegality::reportVectorizationFailure(
        const StringRef DebugMsg, const StringRef OREMsg,
        const StringRef ORETag, Instruction * const I) const;
  The function prints a debug message when the debug for the compilation unit is enabled as well as invokes the optimization report emitter to generate a message with a specified tag. The function doesn't cover any complicated logic when a custom lambda should be passed to the emitter, only generating a message with a tag is supported.
  The function always prints the instruction `I` after the debug message whenever the instruction is specified, otherwise the debug message ends with a dot: 'LV: Not vectorizing: Disabled/already vectorized.'
  Patch by Pavel Samolysov <samolisov@gmail.com>
  llvm-svn: 362736
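The message-formatting rule in the entry above (instruction appended when present, trailing dot otherwise) can be sketched as a plain string function. The function name, the use of a string in place of an instruction, and the single-space separator before the instruction are assumptions; only the prefix and the trailing-dot behaviour come from the commit message.

```cpp
#include <cassert>
#include <string>

// Builds the debug line for a vectorization failure. `instruction` is a
// textual stand-in for the offending instruction and may be null.
std::string formatVectorizationFailure(const std::string &debugMsg,
                                       const std::string *instruction) {
  std::string line = "LV: Not vectorizing: " + debugMsg;
  if (instruction)
    line += " " + *instruction; // separator is an assumption of this sketch
  else
    line += "."; // no instruction: the message ends with a dot
  return line;
}
```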
* [SLP] Fix regression in broadcasts caused by operand reordering patch D59973. | Dinar Temirbulatov | 2019-06-05 | 1 | -5/+35
  This patch fixes a regression caused by the operand reordering refactoring patch https://reviews.llvm.org/D59973 . The fix changes the strategy to Splat instead of Opcode, if broadcast opportunities are found. Please see the lit test for some examples.
  Committed on behalf of @vporpo (Vasileios Porpodas)
  Differential Revision: https://reviews.llvm.org/D62427
  llvm-svn: 362613
* [LoopUtils][SLPVectorizer] clean up management of fast-math-flags | Sanjay Patel | 2019-06-05 | 1 | -3/+9
  Instead of passing around fast-math-flags as a parameter, we can set those using an IRBuilder guard object. This is no-functional-change-intended.
  The motivation is to eventually fix the vectorizers to use and set the correct fast-math-flags for reductions. Examples of that not behaving as expected are:
  - https://bugs.llvm.org/show_bug.cgi?id=23116 (should be able to reduce with less than 'fast')
  - https://bugs.llvm.org/show_bug.cgi?id=35538 (possible miscompile for -0.0)
  - D61802 (should be able to reduce with IR-level FMF)
  Differential Revision: https://reviews.llvm.org/D62272
  llvm-svn: 362612
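The guard-object pattern mentioned in the entry above is an RAII save-and-restore. The sketch below shows the pattern on a toy builder; `ToyBuilder`, `FlagGuard`, and the use of a plain bitmask for the flags are illustrative assumptions and are not LLVM's IRBuilder classes.

```cpp
#include <cassert>

// Toy builder that carries the flags applied to newly created instructions.
struct ToyBuilder {
  unsigned fastMathFlags = 0;
};

// Saves the builder's flags on construction and restores them on destruction,
// so callers can change the flags for a scope without passing them around.
class FlagGuard {
  ToyBuilder &Builder;
  unsigned Saved;

public:
  explicit FlagGuard(ToyBuilder &B) : Builder(B), Saved(B.fastMathFlags) {}
  ~FlagGuard() { Builder.fastMathFlags = Saved; }
  FlagGuard(const FlagGuard &) = delete;
  FlagGuard &operator=(const FlagGuard &) = delete;
};

// Pretends to emit a reduction under temporary flags and returns the flags
// that were in effect while emitting.
unsigned emitReductionWithFlags(ToyBuilder &builder, unsigned flags) {
  FlagGuard guard(builder);      // remember the caller's flags
  builder.fastMathFlags = flags; // flags used for the reduction only
  return builder.fastMathFlags;  // guard restores the old flags on return
}
```

The caller's flags are untouched after the call, which is what makes it safe to drop the explicit flags parameter from helper functions.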
* [LV] Remove the redundant using LoopVectorizationPlanner:VPlanPtr | Florian Hahn | 2019-05-30 | 2 | -7/+4
  VPlan.h already contains the declaration of VPlanPtr type alias:
    using VPlanPtr = std::unique_ptr<VPlan>;
  The LoopVectorizationPlanner class also contains the same declaration of VPlanPtr and therefore LoopVectorize requires a long wording when its methods return VPlanPtr:
    LoopVectorizationPlanner::VPlanPtr LoopVectorizationPlanner::buildVPlanWithVPRecipes(...)
  but LoopVectorize.cpp includes VPlan.h (via LoopVectorizationPlanner.h) and can use VPlanPtr from that header.
  Patch by Pavel Samolysov.
  Reviewers: hsaito, rengolin, fhahn
  Reviewed By: fhahn
  Differential Revision: https://reviews.llvm.org/D62576
  llvm-svn: 362126
* [LoopVectorize] Add FNeg instruction support | Craig Topper | 2019-05-30 | 1 | -9/+20
  Differential Revision: https://reviews.llvm.org/D62510
  llvm-svn: 362124