summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms/Vectorize
Commit message (Collapse)AuthorAgeFilesLines
* [SLP]PR39774: Update references of the replaced external instructions.Alexey Bataev2018-11-301-0/+2
| | | | | | | | | | | | | | | Summary: An additional fix for PR39774. Need to update the references for the RedcutionRoot instruction when it is replaced during the vectorization phase to avoid compiler crash on reduction vectorization. Reviewers: RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55017 llvm-svn: 347997
* [SLP]Fix PR39774: Set ReductionRoot if the original instruction is vectorized.Alexey Bataev2018-11-281-5/+9
| | | | | | | | | | | | | | | | | | | Summary: If the original reduction root instruction was vectorized, it might be removed from the tree. It means that the insertion point may become invalidated and the whole vectorization of the reduction leads to the incorrect output result. The ReductionRoot instruction must be marked as externally used so it could not be removed. Otherwise it might cause inconsistency with the cost model and we may end up with too optimistic optimization. Reviewers: RKSimon, spatel, hfinkel, mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54955 llvm-svn: 347759
* [IR] Add hasNPredecessors, hasNPredecessorsOrMore to BasicBlockVedant Kumar2018-11-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Add methods to BasicBlock which make it easier to efficiently check whether a block has N (or more) predecessors. This can be more efficient than using pred_size(), which is a linear time operation. We might consider adding similar methods for successors. I haven't done so in this patch because succ_size() is already O(1). With this patch applied, I measured a 0.065% compile-time reduction in user time for running `opt -O3` on the sqlite3 amalgamation (30 trials). The change in mergeStoreIntoSuccessor alone saves 45 million linked list iterations in a stage2 Release build of llc. See llvm.org/PR39702 for a harder but more general way of achieving similar results. Differential Revision: https://reviews.llvm.org/D54686 llvm-svn: 347256
* [LV] Avoid vectorizing unsafe dependencies in uniform addressAnna Thomas2018-11-191-4/+3
| | | | | | | | | | | | | | | | | | | Summary: Currently, when vectorizing stores to uniform addresses, the only instance we prevent vectorization is if there are multiple stores to the same uniform address causing an unsafe dependency. This patch teaches LAA to avoid vectorizing loops that have an unsafe cross-iteration dependency between a load and a store to the same uniform address. Fixes PR39653. Reviewers: Ayal, efriedma Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D54538 llvm-svn: 347220
* [VPlan, SLP] Use SmallPtrSet for Candidates.Florian Hahn2018-11-142-27/+26
| | | | | | This slightly improves the candidate handling in getBest(). llvm-svn: 346870
* [VPlan] Remove LLVM_DEBUG from VPlanSlp::dumpBundle.Florian Hahn2018-11-141-4/+4
| | | | | | The caller should take care of only calling it with debug enabled. llvm-svn: 346860
* [VPlan] Update ifdef.Florian Hahn2018-11-141-2/+2
| | | | llvm-svn: 346858
* [VPlan, SLP] Add simple SLP analysis on top of VPlan.Florian Hahn2018-11-145-1/+623
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds an initial implementation of the look-ahead SLP tree construction described in 'Look-Ahead SLP: Auto-vectorization in the Presence of Commutative Operations, CGO 2018 by Vasileios Porpodas, Rodrigo C. O. Rocha, Luís F. W. Góes'. It returns an SLP tree represented as VPInstructions, with combined instructions represented as a single, wider VPInstruction. This initial version does not support instructions with multiple different users (either inside or outside the SLP tree) or non-instruction operands; it won't generate any shuffles or insertelement instructions. It also just adds the analysis that builds an SLP tree rooted in a set of stores. It does not include any cost modeling or memory legality checks. The plan is to integrate it with VPlan based cost modeling, once available and to only apply it to operations that can be widened. A follow-up patch will add a support for replacing instructions in a VPlan with their SLP counter parts. Reviewers: Ayal, mssimpso, rengolin, mkuper, hfinkel, hsaito, dcaballe, vporpo, RKSimon, ABataev Reviewed By: rengolin Differential Revision: https://reviews.llvm.org/D4949 llvm-svn: 346857
* [VPlan] VPlan version of InterleavedAccessInfo.Florian Hahn2018-11-134-9/+102
| | | | | | | | | | | | | | | | | This patch turns InterleaveGroup into a template with the instruction type being a template parameter. It also adds a VPInterleavedAccessInfo class, which only contains a mapping from VPInstructions to their respective InterleaveGroup. As we do not have access to scalar evolution in VPlan, we can re-use convert InterleavedAccessInfo to VPInterleavedAccess info. Reviewers: Ayal, mssimpso, hfinkel, dcaballe, rengolin, mkuper, hsaito Reviewed By: rengolin Differential Revision: https://reviews.llvm.org/D49489 llvm-svn: 346758
* [CostModel] Add more realistic SK_ExtractSubvector generic costs.Simon Pilgrim2018-11-121-1/+2
| | | | | | | | Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles. This exposes an issue in LoopVectorize which could call SK_ExtractSubvector with a scalar subvector type. llvm-svn: 346656
* [LV] Avoid vectorizing loops under opt for size that involve SCEV checksAyal Zaks2018-11-021-1/+25
| | | | | | | | | | | | | | Fix PR39417, PR39497 The loop vectorizer may generate runtime SCEV checks for overflow and stride==1 cases, leading to execution of original scalar loop. The latter is forbidden when optimizing for size. An assert introduced in r344743 triggered the above PR's showing it does happen. This patch fixes this behavior by preventing vectorization in such cases. Differential Revision: https://reviews.llvm.org/D53612 llvm-svn: 345959
* [LV] Support vectorization of interleave-groups that require an epilog underDorit Nuzman2018-10-311-32/+82
| | | | | | | | | | | | | | | | | | | | | | optsize using masked wide loads Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53668 llvm-svn: 345705
* [TTI] Fix uses of SK_ExtractSubvector shuffle costs (PR39368)Simon Pilgrim2018-10-301-1/+1
| | | | | | | | | | | | | | | | Correct costings of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector. Unlike the rest of the shuffle kinds this means that the main Ty argument represents the source vector type not the destination! I've done my best to fix a number of vectorizer uses: SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment. LV - I'm not clear on what the SK_ExtractSubvector should represents for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta. Differential Revision: https://reviews.llvm.org/D53573 llvm-svn: 345617
* [LoopVectorizer] Fix for cost values of memory accesses.Jonas Paulsson2018-10-301-1/+8
| | | | | | | | | | | | | | | | | | | | | | This commit is a combination of two patches: * "Fix in getScalarizationOverhead()" If target returns false in TTI.prefersVectorizedAddressing(), it means the address registers will not need to be extracted. Therefore, there should be no operands scalarization overhead for a load instruction. * "Don't pass the instruction pointer from getMemInstScalarizationCost." Since VF is always > 1, this is a cost query for an instruction in the vectorized loop and it should not be evaluated within the scalar context of the instruction. Review: Ulrich Weigand, Hal Finkel https://reviews.llvm.org/D52351 https://reviews.llvm.org/D52417 llvm-svn: 345603
* [LV] Don't have fold-tail under optsize invalidate interleave-groups whenDorit Nuzman2018-10-241-1/+7
| | | | | | | | | | | | | | | | | | | | masked-interleaving is enabled Enable interleave-groups under fold-tail scenario for Opt for size compilation; D50480 added support for vectorizing loops of arbitrary trip-count without a remiander, which in turn makes everything in the loop conditional, including interleave-groups if any. It therefore invalidated all interleave-groups because we didn't have support for vectorizing predicated interleaved-groups at the time. In the meantime, D53011 introduced this support, so we don't have to invalidate interleave-groups when masked-interleaved support is enabled. Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: hsaito Differential Revision: https://reviews.llvm.org/D53559 llvm-svn: 345115
* [SLPVectorizer] Add basic support for mul/and/or/xor horizontal reductionsSimon Pilgrim2018-10-231-2/+5
| | | | | | | | | | | | Expand arithmetic reduction to include mul/and/or/xor instructions. This patch just fixes the SLPVectorizer - the effective reduction costs for AVX1+ are still poor (see rL344846) and will need to be improved before SLP sees this as a valid transform - but we can already see the effect on SSE2 tests. This partially helps PR37731, but doesn't fix it all as it still falls over on the extraction/reduction order for some reason. Differential Revision: https://reviews.llvm.org/D53473 llvm-svn: 345037
* Leftover bits from https://reviews.llvm.org/D53420 that were accidentally leftDorit Nuzman2018-10-231-2/+1
| | | | | | out of revision 344883 llvm-svn: 345021
* [IAI,LV] Avoid creating a scalar epilogue due to gaps in interleave-groups when Dorit Nuzman2018-10-221-2/+8
| | | | | | | | | | | | | | | | | | | | | | | optimizing for size LV is careful to respect -Os and not to create a scalar epilog in all cases (runtime tests, trip-counts that require a remainder loop) except for peeling due to gaps in interleave-groups. This patch fixes that; -Os will now have us invalidate such interleave-groups and vectorize without an epilog. The patch also removes a related FIXME comment that is now obsolete, and was also inaccurate: "FIXME: return None if loop requiresScalarEpilog(<MaxVF>), or look for a smaller MaxVF that does not require a scalar epilog." (requiresScalarEpilog() has nothing to do with VF). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53420 llvm-svn: 344883
* [LV] Fold tail by masking to vectorize loops of arbitrary trip count under ↵Ayal Zaks2018-10-184-38/+188
| | | | | | | | | | | | | | | | | | | | | | | | opt for size When optimizing for size, a loop is vectorized only if the resulting vector loop completely replaces the original scalar loop. This holds if no runtime guards are needed, if the original trip-count TC does not overflow, and if TC is a known constant that is a multiple of the VF. The last two TC-related conditions can be overcome by 1. rounding the trip-count of the vector loop up from TC to a multiple of VF; 2. masking the vector body under a newly introduced "if (i <= TC-1)" condition. The patch allows loops with arbitrary trip counts to be vectorized under -Os, subject to the existing cost model considerations. It also applies to loops with small trip counts (under -O2) which are currently handled as if under -Os. The patch does not handle loops with reductions, live-outs, or w/o a primary induction variable, and disallows interleave groups. (Third, final and main part of -) Differential Revision: https://reviews.llvm.org/D50480 llvm-svn: 344743
* [LV] Teach vectorizer about variant value store into uniform addressAnna Thomas2018-10-162-4/+4
| | | | | | | | | | | | | | | | | | | | Summary: Teach vectorizer about vectorizing variant value stores to uniform address. Similar to rL343028, we do not allow vectorization if we have multiple stores to the same uniform address. Cost model already has the change for considering the extract instruction cost for a variant value store. See added test cases for how vectorization is done. The patch also contains changes to the ORE messages. Reviewers: Ayal, mkuper, anemet, hsaito Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D52656 llvm-svn: 344613
* [TI removal] Make `getTerminator()` return a generic `Instruction`.Chandler Carruth2018-10-151-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | This removes the primary remaining API producing `TerminatorInst` which will reduce the rate at which code is introduced trying to use it and generally make it much easier to remove the remaining APIs across the codebase. Also clean up some of the stragglers that the previous mechanical update of variables missed. Users of LLVM and out-of-tree code generally will need to update any explicit variable types to handle this. Replacing `TerminatorInst` with `Instruction` (or `auto`) almost always works. Most of these edits were made in prior commits using the perl one-liner: ``` perl -i -ple 's/TerminatorInst(\b.* = .*getTerminator\(\))/Instruction\1/g' ``` This also my break some rare use cases where people overload for both `Instruction` and `TerminatorInst`, but these should be easily fixed by removing the `TerminatorInst` overload. llvm-svn: 344504
* [TI removal] Make variables declared as `TerminatorInst` and initializedChandler Carruth2018-10-151-1/+1
| | | | | | | | | | | | | by `getTerminator()` calls instead be declared as `Instruction`. This is the biggest remaining chunk of the usage of `getTerminator()` that insists on the narrow type and so is an easy batch of updates. Several files saw more extensive updates where this would cascade to requiring API updates within the file to use `Instruction` instead of `TerminatorInst`. All of these were trivial in nature (pervasively using `Instruction` instead just worked). llvm-svn: 344502
* [LV] Fix comments reported when not vectorizing single iteration loops; NFCAyal Zaks2018-10-141-1/+8
| | | | | | | | Landing this as a separate part of https://reviews.llvm.org/D50480, being a seemingly unrelated change ([LV] Vectorizing loops of arbitrary trip count without remainder under opt for size). llvm-svn: 344483
* recommit 344472 after fixing build failure on ARM and PPC.Dorit Nuzman2018-10-143-21/+116
| | | | llvm-svn: 344475
* revert 344472 due to failures.Dorit Nuzman2018-10-143-116/+21
| | | | llvm-svn: 344473
* [IAI,LV] Add support for vectorizing predicated strided accesses using maskedDorit Nuzman2018-10-143-21/+116
| | | | | | | | | | | | | | | | | | | | | | | interleave-group The vectorizer currently does not attempt to create interleave-groups that contain predicated loads/stores; predicated strided accesses can currently be vectorized only using masked gather/scatter or scalarization. This patch makes predicated loads/stores candidates for forming interleave-groups during the Loop-Vectorizer's analysis, and adds the proper support for masked-interleave- groups to the Loop-Vectorizer's planning and transformation stages. The patch also extends the TTI API to allow querying the cost of masked interleave groups (which each target can control); Targets that support masked vector loads/ stores may choose to enable this feature and allow vectorizing predicated strided loads/stores using masked wide loads/stores and shuffles. Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53011 llvm-svn: 344472
* [LV] Use SmallVector instead of DenseMap in calculateRegisterUsage (NFC).Florian Hahn2018-10-111-5/+4
| | | | | | | | | | | | | | We assign indices sequentially for seen instructions, so we can just use a vector and push back the seen instructions. No need for using a DenseMap. Reviewers: hsaito, rengolin, nadav, dcaballe Reviewed By: rengolin Differential Revision: https://reviews.llvm.org/D53089 llvm-svn: 344233
* [LV] Ignore more debug info.Florian Hahn2018-10-111-2/+2
| | | | | | | | | | | | | | | We can avoid doing some unnecessary work by skipping debug instructions in a few loops. It also helps to ensure debug instructions do not prevent vectorization, although I do not have any concrete test cases for that. Reviewers: rengolin, hsaito, dcaballe, aprantl, vsk Reviewed By: rengolin, dcaballe Differential Revision: https://reviews.llvm.org/D53091 llvm-svn: 344232
* [VPlan] Fix CondBit quoting in dumpBasicBlockRenato Golin2018-10-101-1/+3
| | | | | | Quotes were being printed for VPInstructions but not the rest. llvm-svn: 344161
* [LV] Do not create SCEVs on broken IR in emitTransformedIndex. PR39160Max Kazantsev2018-10-081-17/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | At the point when we perform `emitTransformedIndex`, we have a broken IR (in particular, we have Phis for which not every incoming value is properly set). On such IR, it is illegal to create SCEV expressions, because their internal simplification process may try to prove some predicates and break when it stumbles across some broken IR. The only purpose of using SCEV in this particular place is attempt to simplify the generated code slightly. It seems that the result isn't worth it, because some trivial cases (like addition of zero and multiplication by 1) can be handled separately if needed, but more generally InstCombine is able to achieve the goals we want to achieve by using SCEV. This patch fixes a functional crash described in PR39160, and as side-effect it also generates a bit smarter code in some simple cases. It also may cause some optimality loss (i.e. we will now generate `mul` by power of `2` instead of shift etc), but there is nothing what InstCombine could not handle later. In case of dire need, we can support more trivial cases just in place. Note that this patch only fixes one particular case of the general problem that LV misuses SCEV, attempting to create SCEVs or prove predicates on invalid IR. The general solution, however, seems complex enough. Differential Revision: https://reviews.llvm.org/D52881 Reviewed By: fhahn, hsaito llvm-svn: 343954
* [LoopVectorizer] Use TTI.getOperandInfo()Jonas Paulsson2018-10-051-28/+8
| | | | | | | | | | | | | | | | Call getOperandInfo() instead of using (near) duplicated code in LoopVectorizationCostModel::getInstructionCost(). This gets the OperandValueKind and OperandValueProperties values for a Value passed as operand to an arithmetic instruction. getOperandInfo() used to be a static method in TargetTransformInfo.cpp, but is now instead a public member. Review: Florian Hahn https://reviews.llvm.org/D52883 llvm-svn: 343852
* [LV][LAA] Vectorize loop invariant values stored into loop invariant addressAnna Thomas2018-09-252-13/+32
| | | | | | | | | | | | | | | | | | | Summary: We are overly conservative in loop vectorizer with respect to stores to loop invariant addresses. More details in https://bugs.llvm.org/show_bug.cgi?id=38546 This is the first part of the fix where we start with vectorizing loop invariant values to loop invariant addresses. This also includes changes to ORE for stores to invariant address. Reviewers: anemet, Ayal, mkuper, mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50665 llvm-svn: 343028
* [Loop Vectorizer] Abandon vectorization when no integer IV foundWarren Ristow2018-09-212-0/+5
| | | | | | | | | | | | | | | | | Support for vectorizing loops with secondary floating-point induction variables was added in r276554. A primary integer IV is still required for vectorization to be done. If an FP IV was found, but no integer IV was found at all (primary or secondary), the attempt to vectorize still went forward, causing a compiler-crash. This change abandons that attempt when no integer IV is found. (Vectorizing FP-only cases like this, rather than bailing out, is discussed as possible future work in D52327.) See PR38800 for more information. Differential Revision: https://reviews.llvm.org/D52327 llvm-svn: 342786
* LSV: Fix adjust alloca alignment trick for AMDGPUMatt Arsenault2018-09-181-29/+31
| | | | | | | | | | This was checking the hardcoded address space 0 for the stack. Additionally, this should be checking for legality with the adjusted alignment, so defer the alignment check. Also try to split if the unaligned access isn't allowed. llvm-svn: 342442
* Fix for the buildbot failure ↵Hideki Saito2018-09-143-4/+11
| | | | | | | | http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/23635 from the commit (r342197) of https://reviews.llvm.org/D50820. llvm-svn: 342201
* [VPlan] Implement initial vector code generation support for simple outer loops.Hideki Saito2018-09-146-15/+287
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: [VPlan] Implement vector code generation support for simple outer loops. Context: Patch Series #1 for outer loop vectorization support in LV using VPlan. (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html). This patch introduces vector code generation support for simple outer loops that are currently supported in the VPlanNativePath. Changes here essentially do the following: - force vector code generation using explicit vectorize_width - add conservative early returns in cost model and other places for VPlanNativePath - add code for setting up outer loop inductions - support for widening non-induction PHIs that can result from inner loops and uniform conditional branches - support for generating uniform inner branches We plan to add a handful C outer loop executable tests once the initial code generation support is committed. This patch is expected to be NFC for the inner loop vectorizer path. Since we are moving in the direction of supporting outer loop vectorization in LV, it may also be time to rename classes such as InnerLoopVectorizer. Reviewers: fhahn, rengolin, hsaito, dcaballe, mkuper, hfinkel, Ayal Reviewed By: fhahn, hsaito Subscribers: dmgreen, bollu, tschuett, rkruppe, rogfer01, llvm-commits Differential Revision: https://reviews.llvm.org/D50820 llvm-svn: 342197
* [LV] Move InterleaveGroup and InterleavedAccessInfo to VectorUtils.h (NFC)Florian Hahn2018-09-121-694/+11
| | | | | | | | | | | | | Move the 2 classes out of LoopVectorize.cpp to make it easier to re-use them for VPlan outside LoopVectorize.cpp Reviewers: Ayal, mssimpso, rengolin, dcaballe, mkuper, hsaito, hfinkel, xbolva00 Reviewed By: rengolin, xbolva00 Differential Revision: https://reviews.llvm.org/D49488 llvm-svn: 342027
* Move a transformation routine from LoopUtils to LoopVectorize.Vikram TV2018-09-101-4/+84
| | | | | | | | | | | | | | | | Summary: Move InductionDescriptor::transform() routine from LoopUtils to its only uses in LoopVectorize.cpp. Specifically, the function is renamed as InnerLoopVectorizer::emitTransformedIndex(). This is a child to D51153. Reviewers: dmgreen, llvm-commits Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D51837 llvm-svn: 341776
* Move createMinMaxOp() out of RecurrenceDescriptor.Vikram TV2018-09-101-2/+2
| | | | | | | | | | Reviewers: dmgreen, llvm-commits Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D51838 llvm-svn: 341773
* [LV] Fix code gen for conditionally executed loads and storesAnna Thomas2018-09-071-8/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a latent bug in loop vectorizer which generates incorrect code for memory accesses that are executed conditionally. As pointed in review, this bug definitely affects uniform loads and may affect conditional stores that should have turned into scatters as well). The code gen for conditionally executed uniform loads on architectures that support masked gather instructions is broken. Without this patch, we were unconditionally executing the *conditional* load in the vectorized version. This patch does the following: 1. Uniform conditional loads on architectures with gather support will have correct code generated. In particular, the cost model (setCostBasedWideningDecision) is fixed. 2. For the recipes which are handled after the widening decision is set, we use the isScalarWithPredication(I, VF) form which is added in the patch. 3. Fix the vectorization cost model for scalarization (getMemInstScalarizationCost): implement and use isPredicatedInst to identify *all* predicated instructions, not just scalar+predicated. So, now the cost for scalarization will be increased for maskedloads/stores and gather/scatter operations. In short, we should be choosing the gather/scatter in place of scalarization on archs where it is profitable. 4. We needed to weaken the assert in useEmulatedMaskMemRefHack. Reviewers: Ayal, hsaito, mkuper Differential Revision: https://reviews.llvm.org/D51313 llvm-svn: 341673
* [LV] First order recurrence phis should not be treated as uniformAnna Thomas2018-09-041-0/+5
| | | | | | | | | | | | | | This is fix for PR38786. First order recurrence phis were incorrectly treated as uniform, which caused them to be vectorized as uniform instructions. Patch by Ayal Zaks and Orivej Desh! Reviewed by: Anna Differential Revision: https://reviews.llvm.org/D51639 llvm-svn: 341416
* SLPVectorizer: Fix assert with different sized address spacesMatt Arsenault2018-08-311-1/+1
| | | | llvm-svn: 341215
* [LoopVectorize][NFCI] Use find instead of countDavid Bolvansky2018-08-231-42/+58
| | | | | | | | | | | | | | | | | Summary: Avoid "count" if possible -> use "find" to check for the existence of keys. Passed llvm test suite. Reviewers: fhahn, dcaballe, mkuper, rengolin Reviewed By: fhahn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51054 llvm-svn: 340563
* [LV] Vectorize loops where non-phi instructions used outside loopAnna Thomas2018-08-212-7/+36
| | | | | | | | | | | | | | | | | | | Summary: Follow up change to rL339703, where we now vectorize loops with non-phi instructions used outside the loop. Note that the cyclic dependency identification occurs when identifying reduction/induction vars. We also need to identify that we do not allow users where the PSCEV information within and outside the loop are different. This was the fix added in rL307837 for PR33706. Reviewers: Ayal, mkuper, fhahn Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D50778 llvm-svn: 340278
* NFC: Clarify comment in loop vectorization legalityAnna Thomas2018-08-141-1/+2
| | | | | | | Clarifying the comment about PSCEV and external IV users by referencing the bug in question. llvm-svn: 339722
* [LV] Teach about non header phis that have uses outside the loopAnna Thomas2018-08-142-12/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch teaches the loop vectorizer to vectorize loops with non header phis that have have outside uses. This is because the iteration dependence distance for these phis can be widened upto VF (similar to how we do for induction/reduction) if they do not have a cyclic dependence with header phis. When identifying reduction/induction/first order recurrence header phis, we already identify if there are any cyclic dependencies that prevents vectorization. The vectorizer is taught to extract the last element from the vectorized phi and update the scalar loop exit block phi to contain this extracted element from the vector loop. This patch can be extended to vectorize loops where instructions other than phis have outside uses. Reviewers: Ayal, mkuper, mssimpso, efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50579 llvm-svn: 339703
* [SLP] Fix insert point for reused extract instructions.Alexey Bataev2018-08-071-7/+1
| | | | | | | | | | | | | | | Summary: Reworked the previously committed patch to insert shuffles for reused extract element instructions in the correct position. Previous logic was incorrect, and might lead to the crash with PHIs and EH instructions. Reviewers: efriedma, javed.absar Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50143 llvm-svn: 339166
* [SLP] Fix PR38339: Instruction does not dominate all uses!Alexey Bataev2018-07-311-0/+6
| | | | | | | | | | | | | | | | Summary: If the ExtractElement instructions can be optimized out during the vectorization and we need to reshuffle the parent vector, this ShuffleInstruction may be inserted in the wrong place causing compiler to produce incorrect code. Reviewers: spatel, RKSimon, mkuper, hfinkel, javed.absar Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49928 llvm-svn: 338380
* [VPlan] Introduce VPLoopInfo analysis.Diego Caballero2018-07-313-1/+66
| | | | | | | | | | | | | | The patch introduces loop analysis (VPLoopInfo/VPLoop) for VPBlockBases. This analysis will be necessary to perform some H-CFG transformations and detect and introduce regions representing a loop in the H-CFG. Reviewers: fhahn, rengolin, mkuper, hfinkel, mssimpso Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D48816 llvm-svn: 338346
* [VPlan] Introduce VPlan-based dominator analysis.Diego Caballero2018-07-306-21/+172
| | | | | | | | | | | | | | | The patch introduces dominator analysis for VPBlockBases and extend VPlan's GraphTraits specialization with the required interfaces. Dominator analysis will be necessary to perform some H-CFG transformations and to introduce VPLoopInfo (LoopInfo analysis on top of the VPlan representation). Reviewers: fhahn, rengolin, mkuper, hfinkel, mssimpso Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D48815 llvm-svn: 338310
OpenPOWER on IntegriCloud