bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	Revert rL355906: [SLP] Remove redundancy of performing operand reordering ↵	Simon Pilgrim	2019-03-12	1	-169/+82
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	twice: once in buildTree() and later in vectorizeTree(). This is a refactoring patch that removes the redundancy of performing operand reordering twice, once in buildTree() and later in vectorizeTree(). To achieve this we need to keep track of the operands within the TreeEntry struct while building the tree, and later in vectorizeTree() we are just accessing them from the TreeEntry in the right order. This patch is the first in a series of patches that will allow for better operand reordering across chains of instructions (e.g., a chain of ADDs), as presented here: https://www.youtube.com/watch?v=gIEn34LvyNo Patch by: @vporpo (Vasileios Porpodas) Differential Revision: https://reviews.llvm.org/D59059 ........ Reverted due to buildbot failures that I don't have time to track down. llvm-svn: 355913
*	Try to fix SLPVectorizer BoUpSLP::BoEdgeInfo::dump visibility on non-debug ↵	Simon Pilgrim	2019-03-12	1	-3/+1
\| \| \| \| \| \|	builds llvm-svn: 355912
*	[SLP] Remove redundancy of performing operand reordering twice: once in ↵	Simon Pilgrim	2019-03-12	1	-82/+171
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	buildTree() and later in vectorizeTree(). This is a refactoring patch that removes the redundancy of performing operand reordering twice, once in buildTree() and later in vectorizeTree(). To achieve this we need to keep track of the operands within the TreeEntry struct while building the tree, and later in vectorizeTree() we are just accessing them from the TreeEntry in the right order. This patch is the first in a series of patches that will allow for better operand reordering across chains of instructions (e.g., a chain of ADDs), as presented here: https://www.youtube.com/watch?v=gIEn34LvyNo Patch by: @vporpo (Vasileios Porpodas) Differential Revision: https://reviews.llvm.org/D59059 llvm-svn: 355906
*	Reland "Relax constraints for reduction vectorization"	Sanjoy Das	2019-03-12	2	-7/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change from original commit: move test (that uses an X86 triple) into the X86 subdirectory. Original description: Gating vectorizing reductions on all fastmath flags seems unnecessary; `reassoc` should be sufficient. Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal Reviewed By: sdesmalen Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57728 llvm-svn: 355889
*	Revert "Relax constraints for reduction vectorization"	Sanjoy Das	2019-03-11	2	-12/+7
\| \| \| \| \| \|	This reverts commit r355868. Breaks hexagon. llvm-svn: 355873
*	Relax constraints for reduction vectorization	Sanjoy Das	2019-03-11	2	-7/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Gating vectorizing reductions on all fastmath flags seems unnecessary; `reassoc` should be sufficient. Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal Reviewed By: sdesmalen Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57728 llvm-svn: 355868
*	Hide two unused debugging methods, NFCI.	Jonas Hahnfeld	2019-03-01	1	-0/+4
\| \| \| \| \| \| \| \|	GCC correctly moans that PlainCFGBuilder::isExternalDef(llvm::Value*) and StackSafetyDataFlowAnalysis::verifyFixedPoint() are defined but not used in Release builds. Hide them behind 'ifndef NDEBUG'. llvm-svn: 355205
*	[Vectorizer] Add vectorization support for fixed smul/umul intrinsics	Simon Pilgrim	2019-02-25	2	-29/+38
\| \| \| \| \| \| \| \|	This requires a couple of tweaks to existing vectorization functions as they were assuming that only the second call argument (ctlz/cttz/powi) could ever be the 'always scalar' argument, but for smul.fix + umul.fix its the third argument. Differential Revision: https://reviews.llvm.org/D58616 llvm-svn: 354790
*	[MemorySSA & LoopPassManager] Add remaining book keeping [NFCI].	Alina Sbirlea	2019-02-12	1	-1/+5
\| \| \| \| \| \| \| \|	Add plumbing to get MemorySSA in the remaining loop passes. Also update unit test to add the dependency. [EnableMSSALoopDependency remains disabled]. llvm-svn: 353901
*	Refactor setAlreadyUnrolled() and setAlreadyVectorized().	Michael Kruse	2019-02-11	1	-51/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Loop::setAlreadyUnrolled() and LoopVectorizeHints::setLoopAlreadyUnrolled() both add loop metadata that stops the same loop from being transformed multiple times. This patch merges both implementations. In doing so we fix 3 potential issues: * setLoopAlreadyUnrolled() kept the llvm.loop.vectorize/interleave.* metadata even though it will not be used anymore. This already caused problems such as http://llvm.org/PR40546. Change the behavior to the one of setAlreadyUnrolled which deletes this loop metadata. * setAlreadyUnrolled() used to create a new LoopID by calling MDNode::get with nullptr as the first operand, then replacing it by the returned references using replaceOperandWith. It is possible that MDNode::get would instead return an existing node (due to de-duplication) that then gets modified. To avoid, use a fresh TempMDNode that does not get uniqued with anything else before replacing it with replaceOperandWith. * LoopVectorizeHints::matchesHintMetadataName() only compares the suffix of the attribute to set the new value for. That is, when called with "enable", would erase attributes such as "llvm.loop.unroll.enable", "llvm.loop.vectorize.enable" and "llvm.loop.distribute.enable" instead of the one to replace. Fortunately, function was only called with "isvectorized". Differential Revision: https://reviews.llvm.org/D57566 llvm-svn: 353738
*	Update files that were mistakenly added with the old file header to the	Chandler Carruth	2019-02-11	2	-8/+6
\| \| \| \| \| \|	new one. llvm-svn: 353665
*	[LV] Remove unnecessary assignment to UserIC.	Florian Hahn	2019-02-07	1	-1/+0
\| \| \| \|	llvm-svn: 353469
*	[LV] Prevent interleaving if computeMaxVF returned None.	Florian Hahn	2019-02-07	2	-22/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As discussed in D57382, interleaving should be avoided if computeMaxVF returns None, same as we currently do for vectorization. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6477 Reviewers: Ayal, dcaballe, hsaito, mkuper, rengolin Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D57837 llvm-svn: 353461
*	[opaque pointer types] Pass value type to GetElementPtr creation.	James Y Knight	2019-02-01	1	-8/+10
\| \| \| \| \| \| \| \| \|	This cleans up all GetElementPtr creation in LLVM to explicitly pass a value type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57173 llvm-svn: 352913
*	[opaque pointer types] Pass value type to LoadInst creation.	James Y Knight	2019-02-01	3	-6/+7
\| \| \| \| \| \| \| \| \|	This cleans up all LoadInst creation in LLVM to explicitly pass the value type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57172 llvm-svn: 352911
*	[SLPVectorizer] Get rid of IndexQueue array from vectorizeStores. NFCI.	Yevgeny Rouban	2019-02-01	1	-27/+18
\| \| \| \| \| \| \| \|	Indices are checked as they are generated. No need to fill the whole array of indices. Differential Revision: https://reviews.llvm.org/D57144 llvm-svn: 352839
*	[llvm] Clarify responsiblity of some of DILocation discriminator APIs	Mircea Trofin	2019-01-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Renamed setBaseDiscriminator to cloneWithBaseDiscriminator, to match similar APIs. Also changed its behavior to copy over the other discriminator components, instead of eliding them. Renamed cloneWithDuplicationFactor to cloneByMultiplyingDuplicationFactor, which more closely matches what this API does. Reviewers: dblaikie, wmi Reviewed By: dblaikie Subscribers: zzheng, llvm-commits Differential Revision: https://reviews.llvm.org/D56220 llvm-svn: 351996
*	[LV][VPlan] Change to implement VPlan based predication for	Hideki Saito	2019-01-23	7	-6/+419
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	VPlan-native path Context: Patch Series #2 for outer loop vectorization support in LV using VPlan. (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html). Patch series #2 checks that inner loops are still trivially lock-step among all vector elements. Non-loop branches are blindly assumed as divergent. Changes here implement VPlan based predication algorithm to compute predicates for blocks that need predication. Predicates are computed for the VPLoop region in reverse post order. A block's predicate is computed as OR of the masks of all incoming edges. The mask for an incoming edge is computed as AND of predecessor block's predicate and either predecessor's Condition bit or NOT(Condition bit) depending on whether the edge from predecessor block to the current block is true or false edge. Reviewers: fhahn, rengolin, hsaito, dcaballe Reviewed By: fhahn Patch by Satish Guggilla, thanks! Differential Revision: https://reviews.llvm.org/D53349 llvm-svn: 351990
*	Update the file headers across all of the LLVM projects in the monorepo	Chandler Carruth	2019-01-19	20	-80/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
*	[SLP] Fix PR40310: The reduction nodes should stay scalar.	Alexey Bataev	2019-01-16	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Sometimes the SLP vectorizer tries to vectorize the horizontal reduction nodes during regular vectorization. This may happen inside of the loops, when there are some vectorizable PHIs. Patch fixes this by checking if the node is the reduction node and thus it must not be vectorized, it must be gathered. Reviewers: RKSimon, spatel, hfinkel, fedor.sergeev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56783 llvm-svn: 351349
*	[LoopVectorizer] give more advice in remark about failure to vectorize call	Sanjay Patel	2019-01-12	1	-3/+23
\| \| \| \| \| \| \| \| \| \|	Something like this is requested by: https://bugs.llvm.org/show_bug.cgi?id=40265 ...and it seems like a common enough case that we should acknowledge it. Differential Revision: https://reviews.llvm.org/D56551 llvm-svn: 351010
*	[llvm] API for encoding/decoding DWARF discriminators.	Mircea Trofin	2018-12-21	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Added a pair of APIs for encoding/decoding the 3 components of a DWARF discriminator described in http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html: the base discriminator, the duplication factor (useful in profile-guided optimization) and the copy index (used to identify copies of code in cases like loop unrolling) The encoding packs 3 unsigned values in 32 bits. This CL addresses 2 issues: - communicates overflow back to the user - supports encoding all 3 components together. Current APIs assume a sequencing of events. For example, creating a new discriminator based on an existing one by changing the base discriminator was not supported. Reviewers: davidxl, danielcdh, wmi, dblaikie Reviewed By: dblaikie Subscribers: zzheng, dmgreen, aprantl, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D55681 llvm-svn: 349973
*	Test commit	Anton Afanasyev	2018-12-19	1	-4/+4
\| \| \| \| \| \|	Fix typos. llvm-svn: 349644
*	[LoopVectorize] Rename pass options. NFC.	Michael Kruse	2018-12-18	2	-14/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rename: NoUnrolling to InterleaveOnlyWhenForced and AlwaysVectorize to !VectorizeOnlyWhenForced Contrary to what the name 'AlwaysVectorize' suggests, it does not unconditionally vectorize all loops, but applies a cost model to determine whether vectorization is profitable to all loops. Hence, passing false will disable the cost model, except when a loop is marked with llvm.loop.vectorize.enable. The 'OnlyWhenForced' suffix (suggested by @hfinkel in D55716) better matches this behavior. Similarly, 'NoUnrolling' disables the profitability cost model for interleaving (a term to distinguish it from unrolling by the LoopUnrollPass); rename it for consistency. Differential Revision: https://reviews.llvm.org/D55785 llvm-svn: 349513
*	[Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.	Michael Kruse	2018-12-12	1	-30/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g. #pragma clang loop unroll_and_jam(enable) #pragma clang loop distribute(enable) is the same as #pragma clang loop distribute(enable) #pragma clang loop unroll_and_jam(enable) and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used. This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance, !0 = !{!0, !1, !2} !1 = !{!"llvm.loop.unroll_and_jam.enable"} !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3} !3 = !{!"llvm.loop.distribute.enable"} defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop. Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account. For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations. Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated. To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied. With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling). Reviewed By: hfinkel, dmgreen Differential Revision: https://reviews.llvm.org/D49281 Differential Revision: https://reviews.llvm.org/D55288 llvm-svn: 348944
*	[PM] Port LoadStoreVectorizer to the new pass manager.	Markus Lavin	2018-12-07	2	-16/+34
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D54848 llvm-svn: 348570
*	[SLP]PR39774: Update references of the replaced external instructions.	Alexey Bataev	2018-11-30	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: An additional fix for PR39774. Need to update the references for the RedcutionRoot instruction when it is replaced during the vectorization phase to avoid compiler crash on reduction vectorization. Reviewers: RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55017 llvm-svn: 347997
*	[SLP]Fix PR39774: Set ReductionRoot if the original instruction is vectorized.	Alexey Bataev	2018-11-28	1	-5/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: If the original reduction root instruction was vectorized, it might be removed from the tree. It means that the insertion point may become invalidated and the whole vectorization of the reduction leads to the incorrect output result. The ReductionRoot instruction must be marked as externally used so it could not be removed. Otherwise it might cause inconsistency with the cost model and we may end up with too optimistic optimization. Reviewers: RKSimon, spatel, hfinkel, mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54955 llvm-svn: 347759
*	[IR] Add hasNPredecessors, hasNPredecessorsOrMore to BasicBlock	Vedant Kumar	2018-11-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add methods to BasicBlock which make it easier to efficiently check whether a block has N (or more) predecessors. This can be more efficient than using pred_size(), which is a linear time operation. We might consider adding similar methods for successors. I haven't done so in this patch because succ_size() is already O(1). With this patch applied, I measured a 0.065% compile-time reduction in user time for running `opt -O3` on the sqlite3 amalgamation (30 trials). The change in mergeStoreIntoSuccessor alone saves 45 million linked list iterations in a stage2 Release build of llc. See llvm.org/PR39702 for a harder but more general way of achieving similar results. Differential Revision: https://reviews.llvm.org/D54686 llvm-svn: 347256
*	[LV] Avoid vectorizing unsafe dependencies in uniform address	Anna Thomas	2018-11-19	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Currently, when vectorizing stores to uniform addresses, the only instance we prevent vectorization is if there are multiple stores to the same uniform address causing an unsafe dependency. This patch teaches LAA to avoid vectorizing loops that have an unsafe cross-iteration dependency between a load and a store to the same uniform address. Fixes PR39653. Reviewers: Ayal, efriedma Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D54538 llvm-svn: 347220
*	[VPlan, SLP] Use SmallPtrSet for Candidates.	Florian Hahn	2018-11-14	2	-27/+26
\| \| \| \| \| \|	This slightly improves the candidate handling in getBest(). llvm-svn: 346870
*	[VPlan] Remove LLVM_DEBUG from VPlanSlp::dumpBundle.	Florian Hahn	2018-11-14	1	-4/+4
\| \| \| \| \| \|	The caller should take care of only calling it with debug enabled. llvm-svn: 346860
*	[VPlan] Update ifdef.	Florian Hahn	2018-11-14	1	-2/+2
\| \| \| \|	llvm-svn: 346858
*	[VPlan, SLP] Add simple SLP analysis on top of VPlan.	Florian Hahn	2018-11-14	5	-1/+623
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds an initial implementation of the look-ahead SLP tree construction described in 'Look-Ahead SLP: Auto-vectorization in the Presence of Commutative Operations, CGO 2018 by Vasileios Porpodas, Rodrigo C. O. Rocha, Luís F. W. Góes'. It returns an SLP tree represented as VPInstructions, with combined instructions represented as a single, wider VPInstruction. This initial version does not support instructions with multiple different users (either inside or outside the SLP tree) or non-instruction operands; it won't generate any shuffles or insertelement instructions. It also just adds the analysis that builds an SLP tree rooted in a set of stores. It does not include any cost modeling or memory legality checks. The plan is to integrate it with VPlan based cost modeling, once available and to only apply it to operations that can be widened. A follow-up patch will add a support for replacing instructions in a VPlan with their SLP counter parts. Reviewers: Ayal, mssimpso, rengolin, mkuper, hfinkel, hsaito, dcaballe, vporpo, RKSimon, ABataev Reviewed By: rengolin Differential Revision: https://reviews.llvm.org/D4949 llvm-svn: 346857
*	[VPlan] VPlan version of InterleavedAccessInfo.	Florian Hahn	2018-11-13	4	-9/+102
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch turns InterleaveGroup into a template with the instruction type being a template parameter. It also adds a VPInterleavedAccessInfo class, which only contains a mapping from VPInstructions to their respective InterleaveGroup. As we do not have access to scalar evolution in VPlan, we can re-use convert InterleavedAccessInfo to VPInterleavedAccess info. Reviewers: Ayal, mssimpso, hfinkel, dcaballe, rengolin, mkuper, hsaito Reviewed By: rengolin Differential Revision: https://reviews.llvm.org/D49489 llvm-svn: 346758
*	[CostModel] Add more realistic SK_ExtractSubvector generic costs.	Simon Pilgrim	2018-11-12	1	-1/+2
\| \| \| \| \| \| \| \|	Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles. This exposes an issue in LoopVectorize which could call SK_ExtractSubvector with a scalar subvector type. llvm-svn: 346656
*	[LV] Avoid vectorizing loops under opt for size that involve SCEV checks	Ayal Zaks	2018-11-02	1	-1/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix PR39417, PR39497 The loop vectorizer may generate runtime SCEV checks for overflow and stride==1 cases, leading to execution of original scalar loop. The latter is forbidden when optimizing for size. An assert introduced in r344743 triggered the above PR's showing it does happen. This patch fixes this behavior by preventing vectorization in such cases. Differential Revision: https://reviews.llvm.org/D53612 llvm-svn: 345959
*	[LV] Support vectorization of interleave-groups that require an epilog under	Dorit Nuzman	2018-10-31	1	-32/+82
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	optsize using masked wide loads Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53668 llvm-svn: 345705
*	[TTI] Fix uses of SK_ExtractSubvector shuffle costs (PR39368)	Simon Pilgrim	2018-10-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Correct costings of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector. Unlike the rest of the shuffle kinds this means that the main Ty argument represents the source vector type not the destination! I've done my best to fix a number of vectorizer uses: SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment. LV - I'm not clear on what the SK_ExtractSubvector should represents for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta. Differential Revision: https://reviews.llvm.org/D53573 llvm-svn: 345617
*	[LoopVectorizer] Fix for cost values of memory accesses.	Jonas Paulsson	2018-10-30	1	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit is a combination of two patches: * "Fix in getScalarizationOverhead()" If target returns false in TTI.prefersVectorizedAddressing(), it means the address registers will not need to be extracted. Therefore, there should be no operands scalarization overhead for a load instruction. * "Don't pass the instruction pointer from getMemInstScalarizationCost." Since VF is always > 1, this is a cost query for an instruction in the vectorized loop and it should not be evaluated within the scalar context of the instruction. Review: Ulrich Weigand, Hal Finkel https://reviews.llvm.org/D52351 https://reviews.llvm.org/D52417 llvm-svn: 345603
*	[LV] Don't have fold-tail under optsize invalidate interleave-groups when	Dorit Nuzman	2018-10-24	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	masked-interleaving is enabled Enable interleave-groups under fold-tail scenario for Opt for size compilation; D50480 added support for vectorizing loops of arbitrary trip-count without a remiander, which in turn makes everything in the loop conditional, including interleave-groups if any. It therefore invalidated all interleave-groups because we didn't have support for vectorizing predicated interleaved-groups at the time. In the meantime, D53011 introduced this support, so we don't have to invalidate interleave-groups when masked-interleaved support is enabled. Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: hsaito Differential Revision: https://reviews.llvm.org/D53559 llvm-svn: 345115
*	[SLPVectorizer] Add basic support for mul/and/or/xor horizontal reductions	Simon Pilgrim	2018-10-23	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Expand arithmetic reduction to include mul/and/or/xor instructions. This patch just fixes the SLPVectorizer - the effective reduction costs for AVX1+ are still poor (see rL344846) and will need to be improved before SLP sees this as a valid transform - but we can already see the effect on SSE2 tests. This partially helps PR37731, but doesn't fix it all as it still falls over on the extraction/reduction order for some reason. Differential Revision: https://reviews.llvm.org/D53473 llvm-svn: 345037
*	Leftover bits from https://reviews.llvm.org/D53420 that were accidentally left	Dorit Nuzman	2018-10-23	1	-2/+1
\| \| \| \| \| \|	out of revision 344883 llvm-svn: 345021
*	[IAI,LV] Avoid creating a scalar epilogue due to gaps in interleave-groups when	Dorit Nuzman	2018-10-22	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	optimizing for size LV is careful to respect -Os and not to create a scalar epilog in all cases (runtime tests, trip-counts that require a remainder loop) except for peeling due to gaps in interleave-groups. This patch fixes that; -Os will now have us invalidate such interleave-groups and vectorize without an epilog. The patch also removes a related FIXME comment that is now obsolete, and was also inaccurate: "FIXME: return None if loop requiresScalarEpilog(<MaxVF>), or look for a smaller MaxVF that does not require a scalar epilog." (requiresScalarEpilog() has nothing to do with VF). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53420 llvm-svn: 344883
*	[LV] Fold tail by masking to vectorize loops of arbitrary trip count under ↵	Ayal Zaks	2018-10-18	4	-38/+188
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	opt for size When optimizing for size, a loop is vectorized only if the resulting vector loop completely replaces the original scalar loop. This holds if no runtime guards are needed, if the original trip-count TC does not overflow, and if TC is a known constant that is a multiple of the VF. The last two TC-related conditions can be overcome by 1. rounding the trip-count of the vector loop up from TC to a multiple of VF; 2. masking the vector body under a newly introduced "if (i <= TC-1)" condition. The patch allows loops with arbitrary trip counts to be vectorized under -Os, subject to the existing cost model considerations. It also applies to loops with small trip counts (under -O2) which are currently handled as if under -Os. The patch does not handle loops with reductions, live-outs, or w/o a primary induction variable, and disallows interleave groups. (Third, final and main part of -) Differential Revision: https://reviews.llvm.org/D50480 llvm-svn: 344743
*	[LV] Teach vectorizer about variant value store into uniform address	Anna Thomas	2018-10-16	2	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Teach vectorizer about vectorizing variant value stores to uniform address. Similar to rL343028, we do not allow vectorization if we have multiple stores to the same uniform address. Cost model already has the change for considering the extract instruction cost for a variant value store. See added test cases for how vectorization is done. The patch also contains changes to the ORE messages. Reviewers: Ayal, mkuper, anemet, hsaito Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D52656 llvm-svn: 344613
*	[TI removal] Make `getTerminator()` return a generic `Instruction`.	Chandler Carruth	2018-10-15	1	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This removes the primary remaining API producing `TerminatorInst` which will reduce the rate at which code is introduced trying to use it and generally make it much easier to remove the remaining APIs across the codebase. Also clean up some of the stragglers that the previous mechanical update of variables missed. Users of LLVM and out-of-tree code generally will need to update any explicit variable types to handle this. Replacing `TerminatorInst` with `Instruction` (or `auto`) almost always works. Most of these edits were made in prior commits using the perl one-liner: ``` perl -i -ple 's/TerminatorInst(\b.* = .*getTerminator\(\))/Instruction\1/g' ``` This also my break some rare use cases where people overload for both `Instruction` and `TerminatorInst`, but these should be easily fixed by removing the `TerminatorInst` overload. llvm-svn: 344504
*	[TI removal] Make variables declared as `TerminatorInst` and initialized	Chandler Carruth	2018-10-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	by `getTerminator()` calls instead be declared as `Instruction`. This is the biggest remaining chunk of the usage of `getTerminator()` that insists on the narrow type and so is an easy batch of updates. Several files saw more extensive updates where this would cascade to requiring API updates within the file to use `Instruction` instead of `TerminatorInst`. All of these were trivial in nature (pervasively using `Instruction` instead just worked). llvm-svn: 344502
*	[LV] Fix comments reported when not vectorizing single iteration loops; NFC	Ayal Zaks	2018-10-14	1	-1/+8
\| \| \| \| \| \| \| \|	Landing this as a separate part of https://reviews.llvm.org/D50480, being a seemingly unrelated change ([LV] Vectorizing loops of arbitrary trip count without remainder under opt for size). llvm-svn: 344483
*	recommit 344472 after fixing build failure on ARM and PPC.	Dorit Nuzman	2018-10-14	3	-21/+116
\| \| \| \|	llvm-svn: 344475