summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/LoopVectorize
Commit message (Collapse)AuthorAgeFilesLines
* [LV][VPlan] Detect outer loops for explicit vectorization.Diego Caballero2018-04-243-0/+553
| | | | | | | | | | | | | | | | | Patch #2 from VPlan Outer Loop Vectorization Patch Series #1 (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html). This patch introduces the basic infrastructure to detect, legality check and process outer loops annotated with hints for explicit vectorization. All these changes are protected under the feature flag -enable-vplan-native-path. This should make this patch NFC for the existing inner loop vectorizer. Reviewers: hfinkel, mkuper, rengolin, fhahn, aemerson, mssimpso. Differential Revision: https://reviews.llvm.org/D42447 llvm-svn: 330739
* [LV] Introduce TTI::getMinimumVFKrzysztof Parzyszek2018-04-132-0/+175
| | | | | | | | | | | The function getMinimumVF(ElemWidth) will return the minimum VF for a vector with elements of size ElemWidth bits. This value will only apply to targets for which TTI::shouldMaximizeVectorBandwidth returns true. The value of 0 indicates that there is no minimum VF. Differential Revision: https://reviews.llvm.org/D45271 llvm-svn: 330062
* [InstCombine] reassociate loop invariant GEP chains to enable LICMSebastian Pop2018-03-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | This change brings performance of zlib up by 10%. The example below is from a hot loop in longest_match() from zlib. do.body: %cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ] %idx.ext = zext i32 %cur_match.addr.0 to i64 %add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext %add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 %idx.ext1 %add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 -1 In this example %idx.ext1 is a loop invariant. It will be moved above the use of loop induction variable %idx.ext such that it can be hoisted out of the loop by LICM. The operands that have dependences carried by the loop will be sinked down in the GEP chain. This patch will produce the following output: do.body: %cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ] %idx.ext = zext i32 %cur_match.addr.0 to i64 %add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext1 %add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 -1 %add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 %idx.ext llvm-svn: 328539
* Revert r325687 (workaround for PR36032).Evgeny Stupachenko2018-03-224-5/+5
| | | | | | | | | | | | | | Summary: Revert r325687 workaround for PR36032 since a fix was committed in r326154. Reviewers: sbaranga Differential Revision: http://reviews.llvm.org/D44768 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 328257
* [LV] Let recordVectorLoopValueForInductionCast to check if IV was created ↵Andrei Elovikov2018-03-201-0/+39
| | | | | | | | | | | | | | | | | | | | | | from the cast. Summary: It turned out to be error-prone to expect the callers to handle that - better to leave the decision to this routine and make the required data to be explicitly passed to the function. This handles the case that was missed in the r322473 and fixes the assert mentioned in PR36524. Reviewers: dorit, mssimpso, Ayal, dcaballe Reviewed By: dcaballe Subscribers: Ka-Ka, hiraditya, dneilson, hsaito, llvm-commits Differential Revision: https://reviews.llvm.org/D43812 llvm-svn: 327960
* [LV] Adding test for r327109Renato Golin2018-03-091-0/+49
| | | | llvm-svn: 327155
* [SCEV] Smart range calculation for SCEVUnknown PhisMax Kazantsev2018-03-011-3/+43
| | | | | | | | | | The range of SCEVUnknown Phi which merges values `X1, X2, ..., XN` can be evaluated as `U(Range(X1), Range(X2), ..., Range(XN))`. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D43810 llvm-svn: 326418
* [ARM] add loop vectorizer test based on 482.sphinx3 from SPEC2006; NFCSanjay Patel2018-02-271-0/+165
| | | | | | | | This is a slight reduction of one of the benchmarks that suffered with D43079. Cost model changes should not cause this test to remain scalarized. llvm-svn: 326221
* Make test agnostic to cost modelAdam Nemet2018-02-271-1/+1
| | | | | | This was causing bot failures on greendragon llvm-svn: 326169
* Fix r326154 buildbots test failEvgeny Stupachenko2018-02-272-3/+2
| | | | | | | | | | Summary: Add specific mtriples to tests added in r326154. From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 326158
* Fix PR36032, PR35432Evgeny Stupachenko2018-02-272-0/+367
| | | | | | | | | | | | | | | Summary: The change fix an assert fail at ScalarEvolutionExpander.cpp: assert(ExitCount != SE.getCouldNotCompute() && "Invalid loop count"); Reviewers: sbaranga Differential Revision: http://reviews.llvm.org/D42604 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 326154
* [LV] Move isLegalMasked* functions from Legality to CostModelRenato Golin2018-02-262-2/+3
| | | | | | | | | | | | | | | | | | | | | | | All SIMD architectures can emulate masked load/store/gather/scatter through element-wise condition check, scalar load/store, and insert/extract. Therefore, bailing out of vectorization as legality failure, when they return false, is incorrect. We should proceed to cost model and determine profitability. This patch is to address the vectorizer's architectural limitation described above. As such, I tried to keep the cost model and vectorize/don't-vectorize behavior nearly unchanged. Cost model tuning should be done separately. Please see http://lists.llvm.org/pipermail/llvm-dev/2018-January/120164.html for RFC and the discussions. Closes D43208. Patch by: Hideki Saito <hideki.saito@intel.com> llvm-svn: 326079
* [LV] Fix test checks, NFCAlexey Bataev2018-02-211-76/+2363
| | | | llvm-svn: 325699
* [SCEV] Temporarily disable loop versioning for the purposeSilviu Baranga2018-02-213-4/+4
| | | | | | | | | | of turning SCEVUnknowns of PHIs into AddRecExprs. This feature is now hidden behind the -scev-version-unknown flag. Fixes PR36032 and PR35432. llvm-svn: 325687
* revert r325515: [TTI CostModel] change default cost of FP ops to 1 (PR36280)Sanjay Patel2018-02-211-9/+36
| | | | | | | | There are too many perf regressions resulting from this, so we need to investigate (and add tests for) targets like ARM and AArch64 before trying to reinstate. llvm-svn: 325658
* [LV] Fix test checks, NFC.Alexey Bataev2018-02-202-140/+3506
| | | | llvm-svn: 325617
* [TTI CostModel] change default cost of FP ops to 1 (PR36280)Sanjay Patel2018-02-191-36/+9
| | | | | | | | | | | | | | | | | | This change was mentioned at least as far back as: https://bugs.llvm.org/show_bug.cgi?id=26837#c26 ...and I found a real program that is harmed by this: Himeno running on AMD Jaguar gets 6% slower with SLP vectorization: https://bugs.llvm.org/show_bug.cgi?id=36280 ...but the change here appears to solve that bug only accidentally. The div/rem costs for x86 look very wrong in some cases, but that's already true, so we can fix those in follow-up patches. There's also evidence that more cost model changes are needed to solve SLP problems as shown in D42981, but that's an independent problem (though the solution may be adjusted after this change is made). Differential Revision: https://reviews.llvm.org/D43079 llvm-svn: 325515
* [LV] Fix analyzeInterleaving when -pass-remarks enabledMircea Trofin2018-02-101-0/+43
| | | | | | | | | | | | | | | | | | | | | | | Summary: If -pass-remarks=loop-vectorize, atomic ops will be seen by analyzeInterleaving(), even though canVectorizeMemory() == false. This is because we are requesting extra analysis instead of bailing out. In such a case, we end up with a Group in both Load- and StoreGroups, and then we'll try to access freed memory when traversing LoadGroups after having had released the Group when iterating over StoreGroups. The fix is to include mayWriteToMemory() when validating that two instructions are the same kind of memory operation. Reviewers: mssimpso, davidxl Reviewed By: davidxl Subscribers: hsaito, fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D43064 llvm-svn: 324786
* [LoopVectorize] auto-generate complete checks; NFCSanjay Patel2018-02-081-5/+80
| | | | llvm-svn: 324611
* Verify profile data confirms large loop trip counts.Mircea Trofin2018-02-071-1/+121
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Loops with inequality comparers, such as: // unsigned bound for (unsigned i = 1; i < bound; ++i) {...} have getSmallConstantMaxTripCount report a large maximum static trip count - in this case, 0xffff fffe. However, profiling info may show that the trip count is much smaller, and thus counter-recommend vectorization. This change: - flips loop-vectorize-with-block-frequency on by default. - validates profiled loop frequency data supports vectorization, when static info appears to not counter-recommend it. Absence of profile data means we rely on static data, just as we've done so far. Reviewers: twoh, mkuper, davidxl, tejohnson, Ayal Reviewed By: davidxl Subscribers: bkramer, llvm-commits Differential Revision: https://reviews.llvm.org/D42946 llvm-svn: 324543
* [NFC] Add tests for PR35743Max Kazantsev2018-02-051-0/+102
| | | | llvm-svn: 324209
* [LV] Use Demanded Bits and ValueTracking for reduction type-shrinkingChad Rosier2018-02-041-2/+35
| | | | | | | | | | | | | | | The type-shrinking logic in reduction detection, although narrow in scope, is also rather ad-hoc, which has led to bugs (e.g., PR35734). This patch modifies the approach to rely on the demanded bits and value tracking analyses, if available. We currently perform type-shrinking separately for reductions and other instructions in the loop. Long-term, we should probably think about computing minimal bit widths in a more complete way for the loops we want to vectorize. PR35734 Differential Revision: https://reviews.llvm.org/D42309 llvm-svn: 324195
* [X86] Add support for passing 'prefer-vector-width' function attribute into ↵Craig Topper2018-01-201-0/+77
| | | | | | | | | | | | X86Subtarget and exposing via X86's getRegisterWidth TTI interface. This will cause the vectorizers to do some limiting of the vector widths they create. This is not a strict limit. There are reasons I know of that the loop vectorizer will generate larger vectors for. I've written this in such a way that the interface will only return a properly supported width(0/128/256/512) even if the attribute says something funny like 384 or 10. This has been split from D41895 with the remainder in a follow up commit. llvm-svn: 323015
* [X86] Use vmovdqu64/vmovdqa64 for unmasked integer vector stores for ↵Craig Topper2018-01-181-1/+1
| | | | | | | | consistency with loads. Previously we used 64 for vXi64 stores and 32 for everything else. This change uses 64 for everything just like do for loads. llvm-svn: 322820
* [LV] Don't call recordVectorLoopValueForInductionCast for newly-created IV ↵Andrei Elovikov2018-01-151-0/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | from a trunc. Summary: This method is supposed to be called for IVs that have casts in their use-def chains that are completely ignored after vectorization under PSE. However, for truncates of such IVs the same InductionDescriptor is used during creation/widening of both original IV based on PHINode and new IV based on TruncInst. This leads to unintended second call to recordVectorLoopValueForInductionCast with a VectorLoopVal set to the newly created IV for a trunc and causes an assert due to attempt to store new information for already existing entry in the map. This is wrong and should not be done. Fixes PR35773. Reviewers: dorit, Ayal, mssimpso Reviewed By: dorit Subscribers: RKSimon, dim, dcaballe, hsaito, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D41913 llvm-svn: 322473
* [LV] Remove unnecessary DoExtraAnalysis guard (silent bug)Florian Hahn2017-12-201-0/+27
| | | | | | | | | | | | | | canVectorize is only checking if the loop has a normalized pre-header if DoExtraAnalysis is true. This doesn't make sense to me because reporting analysis information shouldn't alter legality checks. This is probably the result of a last minute minor change before committing (?). Patch by Diego Caballero. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D40973 llvm-svn: 321172
* Move Transforms/LoopVectorize/consecutive-ptr-cg-bug.ll into the X86 ↵Hal Finkel2017-12-161-0/+0
| | | | | | | | subdirectory This test depends on X86's TTI; move into the X86 subdirectory. llvm-svn: 320914
* [LV] Extend InstWidening with CM_Widen_RecursiveHal Finkel2017-12-161-0/+68
| | | | | | | | | | | | | | | | | | | | Changes to the original scalar loop during LV code gen cause the return value of Legal->isConsecutivePtr() to be inconsistent with the return value during legal/cost phases (further analysis and information of the bug is in D39346). This patch is an alternative fix to PR34965 following the CM_Widen approach proposed by Ayal and Gil in D39346. It extends InstWidening enum with CM_Widen_Reverse to properly record the widening decision for consecutive reverse memory accesses and, consequently, get rid of the Legal->isConsetuviePtr() call in LV code gen. I think this is a simpler/cleaner solution to PR34965 than the one in D39346. Fixes PR34965. Patch by Diego Caballero, thanks! Differential Revision: https://reviews.llvm.org/D40742 llvm-svn: 320913
* [LV] Support efficient vectorization of an induction with redundant castsDorit Nuzman2017-12-141-0/+211
| | | | | | | | | | | | | | | | | | | | | | | | | | | | D30041 extended SCEVPredicateRewriter to improve handling of Phi nodes whose update chain involves casts; PSCEV can now build an AddRecurrence for some forms of such phi nodes, under the proper runtime overflow test. This means that we can identify such phi nodes as an induction, and the loop-vectorizer can now vectorize such inductions, however inefficiently. The vectorizer doesn't know that it can ignore the casts, and so it vectorizes them. This patch records the casts in the InductionDescriptor, so that they could be marked to be ignored for cost calculation (we use VecValuesToIgnore for that) and ignored for vectorization/widening/scalarization (i.e. treated as TriviallyDead). In addition to marking all these casts to be ignored, we also need to make sure that each cast is mapped to the right vector value in the vector loop body (be it a widened, vectorized, or scalarized induction). So whenever an induction phi is mapped to a vector value (during vectorization/widening/ scalarization), we also map the respective cast instruction (if exists) to that vector value. (If the phi-update sequence of an induction involves more than one cast, then the above mapping to vector value is relevant only for the last cast of the sequence as we allow only the "last cast" to be used outside the induction update chain itself). This is the last step in addressing PR30654. llvm-svn: 320672
* [LV] Ignore the cost of values that will not appear in the vectorized loopDorit Nuzman2017-12-121-0/+80
| | | | | | | | | VecValuesToIgnore holds values that will not appear in the vectorized loop. We should therefore ignore their cost when VF > 1. Differential Revision: https://reviews.llvm.org/D40883 llvm-svn: 320463
* [SCEV] Fix wrong Equal predicate created in getAddRecForPhiWithCastsDorit Nuzman2017-12-101-2/+3
| | | | | | | | | | | | | | | | | | | CreateAddRecFromPHIWithCastsImpl() adds an IncrementNUSW overflow predicate which allows the PSCEV rewriter to rewrite this scev expression: (zext i8 {0, + , (trunc i32 step to i8)} to i32) into {0, +, (sext i8 (trunc i32 step to i8) to i32)} But then it adds the wrong Equal predicate: %step == (zext i8 (trunc i32 %step to i8) to i32). instead of: %step == (sext i8 (trunc i32 %step to i8) to i32) This is fixed here. Differential Revision: https://reviews.llvm.org/D40641 llvm-svn: 320298
* [LV] Interleaved access vectorization: fix computing new alias infoAdam Nemet2017-12-061-0/+63
| | | | | | | | | | | As a new access is generated spanning across multiple fields, we need to propagate alias info from all the fields to form the most generic alias info. rdar://35602528 Differential Revision: https://reviews.llvm.org/D40617 llvm-svn: 319979
* [LV] Model masking in VPlan, introducing VPInstructionsGil Rapaport2017-11-201-1/+1
| | | | | | | | | | | | | | | This patch adds a new abstraction layer to VPlan and leverages it to model the planned instructions that manipulate masks (AND, OR, NOT), introduced during predication. The new VPValue and VPUser classes model how data flows into, through and out of a VPlan, forming the vertices of a planned Def-Use graph. The new VPInstruction class is a generic single-instruction Recipe that models a planned instruction along with its opcode, operands and users. See VectorizationPlan.rst for more details. Differential Revision: https://reviews.llvm.org/D38676 llvm-svn: 318645
* [LV] Introduce VPBlendRecipe, VPWidenMemoryInstructionRecipeGil Rapaport2017-11-141-0/+38
| | | | | | | | | | | | | | | | | | | | This patch is part of D38676. The patch introduces two new Recipes to handle instructions whose vectorization involves masking. These Recipes take VPlan-level masks in D38676, but still rely on ILV's existing createEdgeMask(), createBlockInMask() in this patch. VPBlendRecipe handles intra-loop phi nodes, which are vectorized as a sequence of SELECTs. Its execute() code is refactored out of ILV::widenPHIInstruction(), which now handles only loop-header phi nodes. VPWidenMemoryInstructionRecipe handles load/store which are to be widened (but are not part of an Interleave Group). In this patch it simply calls ILV::vectorizeMemoryInstruction on execute(). Differential Revision: https://reviews.llvm.org/D39068 llvm-svn: 318149
* Add an @llvm.sideeffect intrinsicDan Gohman2017-11-081-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements Chandler's idea [0] for supporting languages that require support for infinite loops with side effects, such as Rust, providing part of a solution to bug 965 [1]. Specifically, it adds an `llvm.sideeffect()` intrinsic, which has no actual effect, but which appears to optimization passes to have obscure side effects, such that they don't optimize away loops containing it. It also teaches several optimization passes to ignore this intrinsic, so that it doesn't significantly impact optimization in most cases. As discussed on llvm-dev [2], this patch is the first of two major parts. The second part, to change LLVM's semantics to have defined behavior on infinite loops by default, with a function attribute for opting into potential-undefined-behavior, will be implemented and posted for review in a separate patch. [0] http://lists.llvm.org/pipermail/llvm-dev/2015-July/088103.html [1] https://bugs.llvm.org/show_bug.cgi?id=965 [2] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118632.html Differential Revision: https://reviews.llvm.org/D38336 llvm-svn: 317729
* [LV/LAA] Avoid specializing a loop for stride=1 when this predicate implies aDorit Nuzman2017-11-052-2/+125
| | | | | | | | | | | | single-iteration loop This fixes PR34681. Avoid adding the "Stride == 1" predicate when we know that Stride >= Trip-Count. Such a predicate will effectively optimize a single or zero iteration loop, as Trip-Count <= Stride == 1. Differential Revision: https://reviews.llvm.org/D38785 llvm-svn: 317438
* [SimplifyCFG] use pass options and remove the latesimplifycfg passSanjay Patel2017-10-282-2/+2
| | | | | | | | | | | | | | | | | This is no-functional-change-intended. This is repackaging the functionality of D30333 (defer switch-to-lookup-tables) and D35411 (defer folding unconditional branches) with pass parameters rather than a named "latesimplifycfg" pass. Now that we have individual options to control the functionality, we could decouple when these fire (but that's an independent patch if desired). The next planned step would be to add another option bit to disable the sinking transform mentioned in D38566. This should also make it clear that the new pass manager needs to be updated to limit simplifycfg in the same way as the old pass manager. Differential Revision: https://reviews.llvm.org/D38631 llvm-svn: 316835
* Do not add discriminator encoding for debug intrinsics.Dehao Chen2017-10-261-4/+10
| | | | | | | | | | | | | | Summary: There are certain requirements for debug location of debug intrinsics, e.g. the scope of the DILocalVariable should be the same as the scope of its debug location. As a result, we should not add discriminator encoding for debug intrinsics. Reviewers: dblaikie, aprantl Reviewed By: aprantl Subscribers: JDevlieghere, aprantl, bjope, sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D39343 llvm-svn: 316703
* [LV] Fix PR34743 - handle casts that sink after interleaved loadsAyal Zaks2017-10-051-0/+52
| | | | | | | | | | | When ignoring a load that participates in an interleaved group, make sure to move a cast that needs to sink after it. Testcase derived from reproducer of PR34743. Differential Revision: https://reviews.llvm.org/D38338 llvm-svn: 314986
* [LV] Fix PR34711 - widen instruction ranges when sinking castsAyal Zaks2017-10-051-0/+49
| | | | | | | | | | | | | | | | Instead of trying to keep LastWidenRecipe updated after creating each recipe, have tryToWiden() retrieve the last recipe of the current VPBasicBlock and check if it's a VPWidenRecipe when attempting to extend its range. This ensures that such extensions, optimized to maintain the original instruction order, do so only when the instructions are to maintain their relative order. The latter does not always hold, e.g., when a cast needs to sink to unravel first order recurrence (r306884). Testcase derived from reproducer of PR34711. Differential Revision: https://reviews.llvm.org/D38339 llvm-svn: 314981
* [LV] Use correct insertion point when type shrinking reductionsMatthew Simpson2017-09-291-0/+40
| | | | | | | | | | | | | When type shrinking reductions, we should insert the truncations and extends at the end of the loop latch block. Previously, these instructions were inserted at the end of the loop header block. The difference is only a problem for loops with predicated instructions (e.g., conditional stores and instructions that may divide by zero). For these instructions, we create new basic blocks inside the vectorized loop, which cause the loop header and latch to no longer be the same block. This should fix PR34687. Reference: https://bugs.llvm.org/show_bug.cgi?id=34687 llvm-svn: 314542
* [InstCombine] remove extract-of-select vector transform (2nd try)Sanjay Patel2017-09-251-29/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 1st attempt at this: https://reviews.llvm.org/rL314117 was reverted at: https://reviews.llvm.org/rL314118 because of bot fails for clang tests that were checking optimized IR. That should be fixed with: https://reviews.llvm.org/rL314144 ...so try again. Original commit message: The transform to convert an extract-of-a-select-of-vectors was added at: https://reviews.llvm.org/rL194013 And a question about the validity of this transform was raised in the review: https://reviews.llvm.org/D1539: ...but not answered AFAICT> Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with the original commit, but they are not regressing even after we remove the transform in this patch. The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do those transforms as canonicalizations. The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301: https://bugs.llvm.org/show_bug.cgi?id=33301 ...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see. Differential Revision: https://reviews.llvm.org/D38006 llvm-svn: 314147
* revert r314117 because there are bogus clang tests that depend on the optimizerSanjay Patel2017-09-251-29/+29
| | | | llvm-svn: 314118
* [InstCombine] remove extract-of-select vector transformSanjay Patel2017-09-251-29/+29
| | | | | | | | | | | | | | | | | | | | | | | The transform to convert an extract-of-a-select-of-vectors was added at: rL194013 And a question about the validity of this transform was raised in the review: https://reviews.llvm.org/D1539: ...but not answered AFAICT> Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with the original commit, but they are not regressing even after we remove the transform in this patch. The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do those transforms as canonicalizations. The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301: https://bugs.llvm.org/show_bug.cgi?id=33301 ...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see. Differential Revision: https://reviews.llvm.org/D38006 llvm-svn: 314117
* [LV] Fix maximum legal VF calculationAlon Kom2017-09-141-0/+51
| | | | | | | | | | | | | | | | This patch fixes pr34283, which exposed that the computation of maximum legal width for vectorization was wrong, because it relied on MaxInterleaveFactor to obtain the maximum stride used in the loop, however not all strided accesses in the loop have an interleave-group associated with them. Instead of recording the maximum stride in the loop, which can be over conservative (e.g. if the access with the maximum stride is not involved in the dependence limitation), this patch tracks the actual maximum legal width imposed by accesses that are involved in dependencies. Differential Revision: https://reviews.llvm.org/D37507 llvm-svn: 313237
* [LV] Avoid computing the register usage for default VF. NFCAnna Thomas2017-09-131-4/+0
| | | | | | | | | | | These are changes to reduce redundant computations when calculating a feasible vectorization factor: 1. early return when target has no vector registers 2. don't compute register usage for the default VF. Suggested during review for D37702. llvm-svn: 313176
* [LV] Fix PR34523 - avoid generating redundant selectsAyal Zaks2017-09-131-0/+27
| | | | | | | | | | | | | | | | | When converting a PHI into a series of 'select' instructions to combine the incoming values together according their edge masks, initialize the first value to the incoming value In0 of the first predecessor, instead of generating a redundant assignment 'select(Cond[0], In0, In0)'. The latter fails when the Cond[0] mask is null, representing a full mask, which can happen only when there's a single incoming value. No functional changes intended nor expected other than surviving null Cond[0]'s. This fix follows D35725, which introduced using null to represent full masks. Differential Revision: https://reviews.llvm.org/D37619 llvm-svn: 313119
* [LV] Clamp the VF to the trip countAnna Thomas2017-09-121-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: When the MaxVectorSize > ConstantTripCount, we should just clamp the vectorization factor to be the ConstantTripCount. This vectorizes loops where the TinyTripCountThreshold >= TripCount < MaxVF. Earlier we were finding the maximum vector width, which could be greater than the trip count itself. The Loop vectorizer does all the work for generating a vectorizable loop, but in the end we would always choose the scalar loop (since the VF > trip count). This allows us to choose the VF keeping in mind the trip count if available. This is a fix on top of rL312472. Reviewers: Ayal, zvi, hfinkel, dneilson Reviewed by: Ayal Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37702 llvm-svn: 313046
* LoopVectorize: MaxVF should not be larger than the loop trip countZvi Rackover2017-09-041-0/+35
| | | | | | | | | | | | | | | | | | | | | | | Summary: Improve how MaxVF is computed while taking into account that MaxVF should not be larger than the loop's trip count. Other than saving on compile-time by pruning the possible MaxVF candidates, this patch fixes pr34438 which exposed the following flow: 1. Short trip count identified -> Don't bail out, set OptForSize:=True to avoid tail-loop and runtime checks. 2. Compute MaxVF returned 16 on a target supporting AVX512. 3. OptForSize -> choose VF:=MaxVF. 4. Bail out because TripCount = 8, VF = 16, TripCount % VF !=0 means we need a tail loop. With this patch step 2. will choose MaxVF=8 based on TripCount. Reviewers: Ayal, dorit, mkuper, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, llvm-commits Differential Revision: https://reviews.llvm.org/D37425 llvm-svn: 312472
* Canonicalize the representation of empty an expression in ↵Adrian Prantl2017-08-301-3/+3
| | | | | | | | | | | | | | | | DIGlobalVariableExpression This change simplifies code that has to deal with DIGlobalVariableExpression and mirrors how we treat DIExpressions in debug info intrinsics. Before this change there were two ways of representing empty expressions on globals, a nullptr and an empty !DIExpression(). If someone needs to upgrade out-of-tree testcases: perl -pi -e 's/(!DIGlobalVariableExpression\(var: ![0-9]*)\)/\1, expr: !DIExpression())/g' <MYTEST.ll> will catch 95%. llvm-svn: 312144
OpenPOWER on IntegriCloud