path: root/llvm/test/Transforms/LoopVectorize
Commit log (newest first): commit message, author, date, files changed, lines -/+
...
* Convert this sample-based-profiling testcase to use a NoDebug CU.
  Adrian Prantl, 2016-04-15; 1 file changed, -4/+1
  llvm-svn: 266481
* [PR27284] Reverse the ownership between DICompileUnit and DISubprogram.
  Adrian Prantl, 2016-04-15; 11 files changed, -36/+26

  Currently each Function points to a DISubprogram and DISubprogram has a scope field. For member functions the scope is a DICompositeType. DIScopes point to the DICompileUnit to facilitate type uniquing. Distinct DISubprograms (with isDefinition: true) are not part of the type hierarchy and cannot be uniqued. This change removes the subprograms list from DICompileUnit and instead adds a pointer to the owning compile unit to distinct DISubprograms. This would make it easy for ThinLTO to strip unneeded DISubprograms and their transitively referenced debug info.

  Motivation: Materializing DISubprograms is currently the most expensive operation when doing a ThinLTO build of clang. We want the DISubprogram to be stored in a separate Bitcode block (or the same block as the function body) so we can avoid having to expensively deserialize all DISubprograms together with the global metadata. If a function has been inlined into another subprogram we need to store a reference to the block containing the inlined subprogram.

  Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script that updates LLVM IR testcases to the new format.

  http://reviews.llvm.org/D19034
  <rdar://problem/25256815>

  llvm-svn: 266446
* [test] Require 'asserts' for a test which uses -debug-only
  Vedant Kumar, 2016-04-14; 1 file changed, -0/+1

  Without this line, bots which run check-all on Release compilers will break.

  llvm-svn: 266386
* [ARM] Adding IEEE-754 SIMD detection to loop vectorizer
  Renato Golin, 2016-04-14; 1 file changed, -0/+335

  Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON. This patch teaches the loop vectorizer to only allow transformations of loops that either contain no floating-point operations or have enough allowance flags supporting lack of precision (ex. -ffast-math, Darwin).

  For that, the target description now has a method which tells us if the vectorizer is allowed to handle FP math without falling into unsafe representations, plus a check on every FP instruction in the candidate loop to check for the safety flags.

  This commit makes LLVM behave like GCC with respect to ARM NEON support, but it stops short of fixing the underlying problem: sub-normals. Neither GCC nor LLVM have a flag for allowing sub-normal operations. Before this patch, GCC only allows it using unsafe-math flags and LLVM allows it by default with no way to turn it off (short of not using NEON at all). As a first step, we push this change to make it safe and in sync with GCC. The second step is to discuss a new sub-normals flag on both communities and come up with a common solution. The third step is to improve the FastMath flags in LLVM to encode sub-normals and use those flags to restrict NEON FP.

  Fixes PR16275.

  llvm-svn: 266363
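  A minimal C sketch of the kind of loop this change concerns (illustrative only; the function and names are hypothetical, not taken from the commit). Without unsafe-FP-math flags such as -ffast-math, the vectorizer will now leave this loop scalar when targeting ARM NEON:

      /* Plain FP arithmetic in the loop body. NEON is not fully IEEE-754
       * compliant (it flushes denormals), so vectorizing this is only
       * allowed when imprecise FP math is explicitly permitted. */
      void mul_add(float *c, const float *a, const float *b, int n) {
        for (int i = 0; i < n; ++i)
          c[i] = a[i] * b[i] + 1.0f;
      }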
* Revert "Support arbitrary addrspace pointers in masked load/store intrinsics"
  Adam Nemet, 2016-04-14; 1 file changed, -95/+20

  This reverts commit r266086. It breaks the LTO build of gcc in SPEC2000.

  llvm-svn: 266282
* Support arbitrary addrspace pointers in masked load/store intrinsics
  Artur Pilipenko, 2016-04-12; 1 file changed, -20/+95

  This is a resubmission of the r263158 change.

  This patch fixes the problem which occurs when loop-vectorize tries to use the @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with a "Calling a function with a bad signature!" assertion in the CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument, which has the default addrspace. The fix is to add the pointer type as another overloaded type to the @llvm.masked.load/store intrinsics.

  Reviewed By: reames

  Differential Revision: http://reviews.llvm.org/D17270

  llvm-svn: 266086
* [DebugInfo/Test] Add CU as required.
  Davide Italiano, 2016-04-11; 1 file changed, -0/+2
  llvm-svn: 265999
* [LoopUtils, LV] Fix PR27246 (first-order recurrences)
  Matthew Simpson, 2016-04-11; 1 file changed, -0/+41

  This patch ensures that when we detect first-order recurrences, we reject a phi node if its previous value is also a phi node. During vectorization the initial and previous values of the recurrence are shuffled together to create the value for the current iteration. However, phi nodes are not widened like other instructions. This fixes PR27246.

  Differential Revision: http://reviews.llvm.org/D18971

  llvm-svn: 265983
* Loop vectorization with uniform load
  Elena Demikhovsky, 2016-04-10; 1 file changed, -0/+47

  The vectorization cost of a uniform load wasn't correctly calculated. As a result, a simple loop that loads a uniform value wasn't vectorized.

  Differential Revision: http://reviews.llvm.org/D18940

  llvm-svn: 265901
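  A minimal C sketch of such a loop (illustrative only; the names are hypothetical, not from the commit). The load of *src is loop-invariant, i.e. "uniform", and its cost was previously overestimated, which kept the loop scalar:

      /* The value *src does not change across iterations, so the
       * vectorizer can treat the load as uniform instead of widening it. */
      void add_uniform(int *dst, const int *src, int n) {
        for (int i = 0; i < n; ++i)
          dst[i] = dst[i] + *src;   /* uniform load of *src every iteration */
      }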
* [LoopVectorize] Register cloned assumptions
  David Majnemer, 2016-04-08; 1 file changed, -0/+32

  InstCombine cannot effectively remove redundant assumptions without them registered in the assumption cache. The vectorizer can create identical assumptions but doesn't register them with the cache, resulting in slower compile times because InstCombine tries to reason about a lot more assumptions. Fix this by registering the cloned assumptions.

  llvm-svn: 265800
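  A hypothetical C source that puts an assumption inside a vectorized loop body (illustrative only, not the commit's test case); the @llvm.assume call it lowers to gets cloned during vectorization, and this patch makes the clones visible to the AssumptionCache:

      /* __builtin_assume lowers to @llvm.assume in the loop body; when the
       * body is widened, the cloned assume calls are now registered with
       * the assumption cache so InstCombine stays fast. */
      void scale(float *a, int n, int stride) {
        for (int i = 0; i < n; ++i) {
          __builtin_assume(stride > 0);
          a[(long)i * stride] *= 2.0f;
        }
      }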
* Re-commit [SCEV] Introduce a guarded backedge taken count and use it in LAA and LV
  Silviu Baranga, 2016-04-08; 2 files changed, -1/+168

  This re-commits r265535, which was reverted in r265541 because it broke the Windows bots. The problem was that we had a PointerIntPair which took a pointer to a struct allocated with new, and new doesn't provide sufficient alignment guarantees. This pattern was already present before r265535 and it just happened to work. To fix this, we now separate the PointerIntPair from the ExitNotTakenInfo struct into a pointer and a bool.

  Original commit message:

  Summary: When the backedge taken condition is computed from an icmp, SCEV can deduce the backedge taken count only if one of the sides of the icmp is an AddRecExpr. However, due to sign/zero extensions, we sometimes end up with something that is not an AddRecExpr. In that case we can use SCEV predicates to produce a 'guarded' expression. This change adds a method to SCEV to get this expression, and the SCEV predicate associated with it.

  In HowManyGreaterThans and HowManyLessThans we will now add a SCEV predicate associated with the guarded backedge taken count when the analyzed SCEV expression is not an AddRecExpr. Note that we only do this as an alternative to returning a 'CouldNotCompute'.

  We use this new feature in Loop Access Analysis and LoopVectorize to analyze and transform more loops.

  Reviewers: anemet, mzolotukhin, hfinkel, sanjoy

  Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits

  Differential Revision: http://reviews.llvm.org/D17201

  llvm-svn: 265786
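  A hypothetical C loop of the flavor this helps with (illustrative only, not the commit's own test). The narrow induction variable is sign-extended when used, so the backedge-taken count is not a plain AddRecExpr; the guarded count plus a runtime SCEV predicate lets LAA/LV still analyze the loop:

      /* The 16-bit induction variable is widened via sign extension in the
       * exit comparison/addressing, which previously made SCEV give up
       * with CouldNotCompute. */
      void fill(int *a, short k, short n) {
        for (short i = k; i < n; ++i)
          a[i] = 0;
      }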
* Revert r265535 until we know how we can fix the bots
  Silviu Baranga, 2016-04-06; 2 files changed, -168/+1
  llvm-svn: 265541
* [SCEV] Introduce a guarded backedge taken count and use it in LAA and LV
  Silviu Baranga, 2016-04-06; 2 files changed, -1/+168

  Summary: When the backedge taken condition is computed from an icmp, SCEV can deduce the backedge taken count only if one of the sides of the icmp is an AddRecExpr. However, due to sign/zero extensions, we sometimes end up with something that is not an AddRecExpr. In that case we can use SCEV predicates to produce a 'guarded' expression. This change adds a method to SCEV to get this expression, and the SCEV predicate associated with it.

  In HowManyGreaterThans and HowManyLessThans we will now add a SCEV predicate associated with the guarded backedge taken count when the analyzed SCEV expression is not an AddRecExpr. Note that we only do this as an alternative to returning a 'CouldNotCompute'.

  We use this new feature in Loop Access Analysis and LoopVectorize to analyze and transform more loops.

  Reviewers: anemet, mzolotukhin, hfinkel, sanjoy

  Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits

  Differential Revision: http://reviews.llvm.org/D17201

  llvm-svn: 265535
* [SLPVectorizer] Vectorizing the libm sqrt to llvm's sqrt intrinsic requires nnan
  David Majnemer, 2016-04-06; 1 file changed, -1/+25

  To quote the langref: "Unlike sqrt in libm, however, llvm.sqrt has undefined behavior for negative numbers other than -0.0 (which allows for better optimization, because there is no need to worry about errno being set). llvm.sqrt(-0.0) is defined to return -0.0 like IEEE sqrt."

  This means that it's unsafe to replace sqrt with llvm.sqrt unless the call is annotated with nnan.

  Thanks to Hal Finkel for pointing this out!

  llvm-svn: 265521
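  A hypothetical C loop that illustrates the point (not from the commit): the sqrtf libcalls can only be turned into the llvm.sqrt intrinsic, and then vectorized, when the calls are known nnan, e.g. under -ffast-math, because llvm.sqrt is undefined for inputs below -0.0:

      #include <math.h>

      /* Vectorizing these calls via llvm.sqrt requires a no-NaNs guarantee;
       * without it the libcall must be kept as-is. */
      void roots(float *a, int n) {
        for (int i = 0; i < n; ++i)
          a[i] = sqrtf(a[i]);
      }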
* [SLPVectorizer] Vectorize libcalls of sqrt
  David Majnemer, 2016-04-06; 1 file changed, -25/+1

  We didn't realize that we could transform the libcall into a vectorized intrinsic.

  llvm-svn: 265493
* [DebugInfo] Fix tests so that each subprogram belongs to a CU.
  Davide Italiano, 2016-04-05; 2 files changed, -0/+14
  llvm-svn: 265490
* testcase gardening: update the emissionKind enum to the new syntax. (NFC)
  Adrian Prantl, 2016-04-01; 8 files changed, -8/+8
  llvm-svn: 265081
* Move the DebugEmissionKind enum from DIBuilder into DICompileUnit.
  Adrian Prantl, 2016-03-31; 2 files changed, -2/+2

  This mostly cosmetic patch moves the DebugEmissionKind enum from DIBuilder into DICompileUnit. DIBuilder is not the right place for this enum to live in; a metadata consumer should not have to include DIBuilder.h. I also added a Verifier check that checks that the emission kind of a DICompileUnit is actually legal.

  http://reviews.llvm.org/D18612
  <rdar://problem/25427165>

  llvm-svn: 265077
* [LoopVectorize] Don't vectorize loops when everything will be scalarized
  Hal Finkel, 2016-03-30; 2 files changed, -0/+101

  This change prevents the loop vectorizer from vectorizing when all of the vector types it generates will be scalarized. I've run into this problem on the PPC's QPX vector ISA, which only holds floating-point vector types. The loop vectorizer will, however, happily vectorize loops with purely integer computation. Here's an example:

    LV: The Smallest and Widest types: 32 / 32 bits.
    LV: The Widest register is: 256 bits.
    LV: Found an estimated cost of 0 for VF 1 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
    LV: Found an estimated cost of 0 for VF 1 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
    LV: Found an estimated cost of 0 for VF 1 For instruction: %2 = trunc i64 %indvars.iv25 to i32
    LV: Found an estimated cost of 1 for VF 1 For instruction: store i32 %2, i32* %arrayidx, align 4
    LV: Found an estimated cost of 1 for VF 1 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
    LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
    LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body
    LV: Scalar loop costs: 3.
    LV: Found an estimated cost of 0 for VF 2 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
    LV: Found an estimated cost of 0 for VF 2 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
    LV: Found an estimated cost of 0 for VF 2 For instruction: %2 = trunc i64 %indvars.iv25 to i32
    LV: Found an estimated cost of 2 for VF 2 For instruction: store i32 %2, i32* %arrayidx, align 4
    LV: Found an estimated cost of 1 for VF 2 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
    LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
    LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body
    LV: Vector loop of width 2 costs: 2.
    LV: Found an estimated cost of 0 for VF 4 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
    LV: Found an estimated cost of 0 for VF 4 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
    LV: Found an estimated cost of 0 for VF 4 For instruction: %2 = trunc i64 %indvars.iv25 to i32
    LV: Found an estimated cost of 4 for VF 4 For instruction: store i32 %2, i32* %arrayidx, align 4
    LV: Found an estimated cost of 1 for VF 4 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
    LV: Found an estimated cost of 1 for VF 4 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
    LV: Found an estimated cost of 0 for VF 4 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body
    LV: Vector loop of width 4 costs: 1.
    ...
    LV: Selecting VF: 8.
    LV: The target has 32 registers
    LV(REG): Calculating max register usage:
    LV(REG): At #0 Interval # 0
    LV(REG): At #1 Interval # 1
    LV(REG): At #2 Interval # 2
    LV(REG): At #4 Interval # 1
    LV(REG): At #5 Interval # 1
    LV(REG): VF = 8

  The problem is that the cost model here is not wrong, exactly. Since all of these operations are scalarized, their cost (aside from the uniform ones) is indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF picked, the lower the relative overhead from the loop itself (and the induction-variable update and check), and so in a sense, picking the largest VF here is the right thing to do.

  The problem is that vectorizing like this, where all of the vectors will be scalarized in the backend, isn't really vectorizing, but rather interleaving. By itself, this would be okay, but then the vectorizer itself also interleaves, and that's where the problem manifests itself. There aren't actually enough scalar registers to support the normal interleave factor multiplied by a factor of VF (8 in this example). In other words, the problem is that our register-pressure heuristic does not account for scalarization.

  While we might want to improve our register-pressure heuristic, I don't think this is the right motivating case for that work. Here we have a more basic problem: the job of the vectorizer is to vectorize things (interleaving aside), and if the IR it generates won't generate any actual vector code, then something is wrong. Thus, if every type looks like it will be scalarized (i.e. will be split into VF or more parts), then don't consider that VF.

  This is not a problem specific to PPC/QPX, however. The problem comes up under SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's reduced test case from PR26837 to this commit.

  Differential Revision: http://reviews.llvm.org/D18537

  llvm-svn: 264904
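  A hypothetical C loop matching the debug output above (illustrative reconstruction, not taken from the commit): a purely integer loop that, on a target whose vector registers hold only FP types, would be entirely scalarized and is therefore no longer "vectorized":

      /* Stores the truncated induction variable; every vector operation
       * here would be split back into VF scalar operations on a target
       * like PPC/QPX, so picking a vector VF gains nothing. */
      void iota(int a[1600]) {
        for (long i = 0; i < 1600; ++i)
          a[i] = (int)i;
      }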
* [VectorUtils] Don't try and truncate PHIs to a smaller bitwidth
  James Molloy, 2016-03-30; 1 file changed, -0/+58

  We already try not to truncate PHIs in computeMinimalBitwidths. LoopVectorize can't handle it and we really don't need to, because both induction and reduction PHIs are truncated by other means. However, we weren't bailing out in all the places we should have, and we ended up returning a PHI to be truncated, which caused PR27018. This fixes PR27018.

  llvm-svn: 264852
* [Verifier] Reject PHIs using defs from own block.
  Michael Kruse, 2016-03-26; 1 file changed, -1/+1

  Reject the following IR as malformed (assuming that %entry, %next are not in a loop):

    next:
      %y = phi i32 [ 0, %entry ]
      %x = phi i32 [ %y, %entry ]

  Such PHI nodes came up in PR26718. While there was no consensus on whether or not this is valid IR, most opinions on that bug and in a discussion on the llvm-dev mailing list tended towards a "strict interpretation" (term by Joseph Tremoulet) of PHI node uses. Also, the language reference explicitly states that "the use of each incoming value is deemed to occur on the edge from the corresponding predecessor block to the current block", and DominatorTree::dominates(Instruction*, Use&) uses this definition as well.

  For the code mentioned in PR15384, clang does not compile to such PHIs (anymore?). The test case still hangs when replacing %tmp6 with %tmp in revisions before r176366 (where PR15384 has been fixed). The occurrence of %tmp6 therefore was probably unintentional. Its value is not used except in other PHIs.

  Reviewers: majnemer, reames, JosephTremoulet, bkramer, grosser, jdoerfert, kparzysz, sanjoy

  Differential Revision: http://reviews.llvm.org/D18443

  llvm-svn: 264528
* Revert "Support arbitrary addrspace pointers in masked load/store intrinsics"
  Matthias Braun, 2016-03-22; 1 file changed, -95/+20

  This commit broke LTO builds. Reverting it to unbreak the bots while the issue is investigated. See also: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160321/341002.html

  This reverts r263158.

  llvm-svn: 264088
* [LoopVectorize] Annotate versioned loop with noalias metadata
  Adam Nemet, 2016-03-17; 2 files changed, -0/+137

  Summary: Use the new LoopVersioning facility (D16712) to add noalias metadata in the vector loop if we versioned with memchecks. This can enable some optimization opportunities further down the pipeline (see the included test or the benchmark improvement quoted in D16712). The test also covers the bug I had in the initial version in D16712.

  The vectorizer did not previously use LoopVersioning. The reason is that the vectorizer performs its transformations in a single shot. It creates an empty single-block vector loop that it then populates with the widened, if-converted instructions. Thus creating an intermediate versioned scalar loop seems wasteful.

  So this patch (rather than bringing in LoopVersioning fully) adds a special interface to LoopVersioning to allow the vectorizer to add no-alias annotation while still performing its own versioning.

  As the vectorizer propagates metadata from the instructions in the original loop to the vector instructions, we also check the pointer in the original instruction and see if LoopVersioning can add no-alias metadata based on the issued memchecks.

  Reviewers: hfinkel, nadav, mzolotukhin

  Subscribers: mzolotukhin, llvm-commits

  Differential Revision: http://reviews.llvm.org/D17191

  llvm-svn: 263744
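  A minimal C sketch of a loop that gets versioned with runtime memchecks (illustrative only; the function and pointer names are hypothetical). Because a and b may alias, the vector loop is guarded by an overlap check, and inside that versioned loop the accesses can now carry noalias metadata:

      /* If the runtime check proves a and b do not overlap, the vector
       * body may be annotated noalias, helping later passes. */
      void scale_add(int *a, const int *b, int n) {
        for (int i = 0; i < n; ++i)
          a[i] = b[i] * 3 + a[i];
      }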
* [LV] Preserve LoopInfo when store predication is used
  Adam Nemet, 2016-03-15; 1 file changed, -4/+4

  This was a latent bug that got exposed by the change to add LoopSimplify as a dependence to LoopLoadElimination. Since LoopInfo was corrupted after LV, LoopSimplify mis-compiled nbench in the test-suite (more details in the PR).

  The problem was that when we create the blocks for predicated stores we didn't add those to any loops. The original testcase for store predication provides coverage for this, assuming we verify LI on the way out of LV.

  Fixes PR26952.

  llvm-svn: 263565
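  A hypothetical C loop of the kind that triggers store predication (illustrative only, not the commit's test case): the conditional store is if-converted into a predicated store, and the extra blocks created for it must be registered with LoopInfo:

      /* The guarded store becomes a predicated store when the loop is
       * vectorized; the new basic blocks belong to the vector loop. */
      void zero_flagged(int *a, const int *cond, int n) {
        for (int i = 0; i < n; ++i)
          if (cond[i])
            a[i] = 0;
      }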
* Support arbitrary addrspace pointers in masked load/store intrinsics
  Artur Pilipenko, 2016-03-10; 1 file changed, -20/+95

  This patch fixes the problem which occurs when loop-vectorize tries to use the @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with a "Calling a function with a bad signature!" assertion in the CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument, which has the default addrspace. The fix is to add the pointer type as another overloaded type to the @llvm.masked.load/store intrinsics.

  Reviewed By: reames

  Differential Revision: http://reviews.llvm.org/D17270

  llvm-svn: 263158
* [x86] fix cost model inaccuracy for vector memory ops
  Sanjay Patel, 2016-03-09; 1 file changed, -2/+2

  The irony of this patch is that one CPU that is affected is AMD Jaguar, and Jaguar has a completely double-pumped AVX implementation. But getting the cost model to reflect that is a much bigger problem. The small goal here is simply to improve on the lie that !AVX2 == SandyBridge.

  Differential Revision: http://reviews.llvm.org/D18000

  llvm-svn: 263069
* add a test RUN to show unexpected behavior
  Sanjay Patel, 2016-03-09; 1 file changed, -7/+10
  llvm-svn: 263037
* [LoopUtils, LV] Fix PR26734
  Matthew Simpson, 2016-03-03; 1 file changed, -0/+49

  The vectorization of first-order recurrences (r261346) caused PR26734. When detecting these recurrences, we need to ensure that the previous value is actually defined inside the loop. This patch includes the fix and test case.

  llvm-svn: 262624
* Fix tests that used CHECK-NEXT-NOT and CHECK-DAG-NOT.
  Paul Robinson, 2016-02-26; 2 files changed, -2/+6

  FileCheck actually doesn't support combo suffixes.

  Differential Revision: http://reviews.llvm.org/D17588

  llvm-svn: 262054
* Revert r255691 "[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions."
  Hans Wennborg, 2016-02-19; 2 files changed, -72/+1

  It caused PR26509.

  llvm-svn: 261368
* [LV] Vectorize first-order recurrences
  Matthew Simpson, 2016-02-19; 1 file changed, -0/+209

  This patch enables the vectorization of first-order recurrences. A first-order recurrence is a non-reduction recurrence relation in which the value of the recurrence in the current loop iteration equals a value defined in the previous iteration. The load PRE of the GVN pass often creates these recurrences by hoisting loads from within loops.

  In this patch, we add a new recurrence kind for first-order phi nodes and attempt to vectorize them if possible. Vectorization is performed by shuffling the values for the current and previous iterations. The vectorization cost estimate is updated to account for the added shuffle instruction.

  Contributed-by: Matthew Simpson and Chad Rosier <mcrosier@codeaurora.org>

  Differential Revision: http://reviews.llvm.org/D16197

  llvm-svn: 261346
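  A minimal C sketch of a first-order recurrence (illustrative only; the function is hypothetical, and it assumes n >= 1). Each iteration uses the value loaded in the previous iteration, which the vectorizer now handles by shuffling the "previous" and "current" vectors together:

      /* out[i] depends on the value of in[i] from the previous iteration,
       * a non-reduction first-order recurrence. */
      void diff(int *out, const int *in, int n) {
        int prev = in[0];
        for (int i = 1; i < n; ++i) {
          out[i] = in[i] - prev;
          prev = in[i];
        }
      }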
* [LV] Fix PR26600: avoid out of bounds loads for interleaved access vectorization
  Silviu Baranga, 2016-02-19; 2 files changed, -5/+9

  Summary: If we don't have the first and last access of an interleaved load group, the first and last wide load in the loop can do an out of bounds access. Even though we discard results from speculative loads, this can cause problems, since it can technically generate page faults (or worse).

  We now discard interleaved load groups that don't have the first and last load in the group.

  Reviewers: hfinkel, rengolin

  Subscribers: rengolin, llvm-commits, mzolotukhin, anemet

  Differential Revision: http://reviews.llvm.org/D17332

  llvm-svn: 261331
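  A hypothetical C loop with an interleaved access group that has a gap (illustrative only, not the commit's test case). Only every other element is read, so a wide load covering the whole group touches elements the scalar loop never accesses and can fault if the object ends at the last accessed element; such incomplete groups are now rejected:

      /* Stride-2 loads where the second member of each group is never
       * read: the group is missing its last access. */
      int sum_even(const int *a, int n) {
        int s = 0;
        for (int i = 0; i < n; i += 2)
          s += a[i];
        return s;
      }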
* Create masked gather and scatter intrinsics in Loop Vectorizer.
  Elena Demikhovsky, 2016-02-17; 2 files changed, -1/+238

  The loop vectorizer now knows to vectorize GEPs and create masked gather and scatter intrinsics for random memory access. The feature is enabled on the AVX-512 target.

  Differential Revision: http://reviews.llvm.org/D15690

  llvm-svn: 261140
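  A hypothetical C loop with the kind of irregular, conditional access this enables (illustrative only; the names are not from the commit). On AVX-512 the indexed load can become a masked gather instead of blocking vectorization:

      /* in[idx[i]] is a random-access load under a condition, a natural
       * candidate for llvm.masked.gather on targets that support it. */
      void gather_add(float *out, const float *in, const int *idx,
                      const int *trigger, int n) {
        for (int i = 0; i < n; ++i)
          if (trigger[i] > 0)
            out[i] += in[idx[i]];
      }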
* [LV] Add support for insertelt/extractelt processing during type truncation
  Silviu Baranga, 2016-02-15; 1 file changed, -0/+47

  Summary: While shrinking types according to the required bits, we can encounter insert/extract element instructions. This will cause us to reach an llvm_unreachable statement. This change adds support for truncating insert/extract element operations, and adds a regression test.

  Reviewers: jmolloy

  Subscribers: mzolotukhin, llvm-commits

  Differential Revision: http://reviews.llvm.org/D17078

  llvm-svn: 260893
* [SCEV][LAA] Re-commit r260085 and r260086, this time with a fix for the memory sanitizer issue
  Silviu Baranga, 2016-02-08; 1 file changed, -6/+4

  The PredicatedScalarEvolution's copy constructor wasn't copying the Generation value, and was leaving it uninitialized.

  Original commit message: [SCEV][LAA] Add no wrap SCEV predicates and use them to improve strided pointer detection

  Summary: This change adds no wrap SCEV predicates with:
  - support for runtime checking
  - support for expression rewriting:
      sext({x,+,y}) -> {sext(x),+,sext(y)}
      zext({x,+,y}) -> {zext(x),+,sext(y)}

  Note that we are sign extending the increment of the SCEV, even for the zext case. This is needed to cover the fairly common case where y would be a (small) negative integer. In order to do this, this change adds two new flags, nusw and nssw, that are applicable to AddRecExprs and permit the transformations above.

  We also change isStridedPtr in LAA to be able to make use of these predicates. With this feature we should now always be able to work around overflow issues in the dependence analysis.

  Reviewers: mzolotukhin, sanjoy, anemet

  Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel

  Differential Revision: http://reviews.llvm.org/D15412

  llvm-svn: 260112
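  A hypothetical C loop showing the strided-pointer pattern this helps with (illustrative only, not the commit's test). The 32-bit product i * stride is sign-extended into 64-bit addressing; the nusw/nssw predicates let SCEV push the extension inside the recurrence and recognize the access as strided:

      /* Without a no-wrap guarantee on i * stride, SCEV cannot prove the
       * address forms a simple strided recurrence. */
      void scale_strided(double *a, int stride, int n) {
        for (int i = 0; i < n; ++i)
          a[i * stride] *= 2.0;
      }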
* Revert r260086 and r260085. They have broken the memory sanitizer bots.
  Silviu Baranga, 2016-02-08; 1 file changed, -4/+6
  llvm-svn: 260087
* [SCEV][LAA] Add no wrap SCEV predicates and use them to improve strided pointer detection
  Silviu Baranga, 2016-02-08; 1 file changed, -6/+4

  Summary: This change adds no wrap SCEV predicates with:
  - support for runtime checking
  - support for expression rewriting:
      sext({x,+,y}) -> {sext(x),+,sext(y)}
      zext({x,+,y}) -> {zext(x),+,sext(y)}

  Note that we are sign extending the increment of the SCEV, even for the zext case. This is needed to cover the fairly common case where y would be a (small) negative integer. In order to do this, this change adds two new flags, nusw and nssw, that are applicable to AddRecExprs and permit the transformations above.

  We also change isStridedPtr in LAA to be able to make use of these predicates. With this feature we should now always be able to work around overflow issues in the dependence analysis.

  Reviewers: mzolotukhin, sanjoy, anemet

  Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel

  Differential Revision: http://reviews.llvm.org/D15412

  llvm-svn: 260085
* [DemandedBits] Revert r249687 due to PR26071
  James Molloy, 2016-02-03; 1 file changed, -34/+0

  This regresses a test in LoopVectorize, so I'll need to go away and think about how to solve this in a way that isn't broken.

  From the writeup in PR26071: What's happening is that ComputeKnownZeroes is telling us that all bits except the LSB are zero. We're then deciding that only the LSB needs to be demanded from the icmp's inputs. This is where we're wrong: we're assuming that after simplification the bits that were known zero will continue to be known zero. But they're not; during trivialization the upper bits get changed (because an XOR isn't shrunk), so the icmp fails.

  The fault is in demandedbits; its contract does clearly state that a non-demanded bit may either be zero or one.

  llvm-svn: 259649
* AVX1: Enable vector masked_load/store to AVX1.
  Igor Breger, 2016-01-25; 1 file changed, -30/+28

  Use AVX1 FP instructions (vmaskmovps/pd) in place of the AVX2 int instructions (vpmaskmovd/q).

  Differential Revision: http://reviews.llvm.org/D16528

  llvm-svn: 258675
* [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.
  Cong Hou, 2015-12-15; 2 files changed, -1/+72

  (This is the third attempt to check in this patch; the first two are r255454 and r255460. The previously failing test file reg-usage.ll is now moved to the test/Transform/LoopVectorize/X86 directory with target datalayout and target triple indicated.)

  LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instructions, etc. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instructions will be added to the ValuesToIgnore set so that their register usage won't be considered any more.

  Differential revision: http://reviews.llvm.org/D15177

  llvm-svn: 255691
* Revert r255460, which still causes test failures on some platforms.
  Cong Hou, 2015-12-13; 2 files changed, -70/+1

  Further investigation on the failures is ongoing.

  llvm-svn: 255463
* [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.
  Cong Hou, 2015-12-13; 2 files changed, -1/+70

  (This is the second attempt to check in this patch: REQUIRES: asserts is added to reg-usage.ll now.)

  LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instructions, etc. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instructions will be added to the ValuesToIgnore set so that their register usage won't be considered any more.

  Differential revision: http://reviews.llvm.org/D15177

  llvm-svn: 255460
* Revert r255454 as it leads to several test failures on buildbots.
  Cong Hou, 2015-12-13; 2 files changed, -69/+1
  llvm-svn: 255456
* [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.
  Cong Hou, 2015-12-13; 2 files changed, -1/+69

  LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instructions, etc. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instructions will be added to the ValuesToIgnore set so that their register usage won't be considered any more.

  Differential revision: http://reviews.llvm.org/D15177

  llvm-svn: 255454
* [LoopVectorize] Use MapVector rather than DenseMap for MinBWs.
  Charlie Turner, 2015-11-26; 1 file changed, -0/+54

  The order in which instructions are truncated in truncateToMinimalBitwidths affects code generation. Switch to a map with a deterministic order, since the iteration order over a DenseMap is not defined. This code is not hot, so the difference in container performance isn't interesting.

  Many thanks to David Blaikie for making me aware of MapVector!

  Fixes PR25490.

  Differential Revision: http://reviews.llvm.org/D14981

  llvm-svn: 254179
* Pointers in Masked Load, Store, Gather, Scatter intrinsics
  Elena Demikhovsky, 2015-11-19; 1 file changed, -0/+142

  The masked intrinsics support all integer and floating point data types. I added the pointer type to this list. Added tests for CodeGen and for the Loop Vectorizer. Updated the Language Reference.

  Differential Revision: http://reviews.llvm.org/D14150

  llvm-svn: 253544
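  A hypothetical C loop where the masked elements are themselves pointers (illustrative only, not the commit's test): the conditional copy becomes a masked load plus a masked store of pointer-typed vector elements, which the intrinsics now accept:

      /* Conditionally copies pointer values; the element type of the
       * masked load/store is a pointer rather than an integer or float. */
      void copy_ptrs(double **out, double **in, const int *trigger, int n) {
        for (int i = 0; i < n; ++i)
          if (trigger[i] > 0)
            out[i] = in[i];
      }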
* [LoopVectorize] Address post-commit feedback on r250032
  James Molloy, 2015-11-09; 1 file changed, -1/+1

  Implemented as many of Michael's suggestions as were possible:
  * clang-format the added code while it is still fresh.
  * tried to change Value* to Instruction* in many places in computeMinimumValueSizes; unfortunately there are several places where Constants need to be handled so this wasn't possible.
  * Reduce the pass list on loop-vectorization-factors.ll.
  * Fix a bug where we were querying MinBWs for I->getOperand(0) but using MinBWs[I].

  llvm-svn: 252469
* DI: Reverse direction of subprogram -> function edge.
  Peter Collingbourne, 2015-11-05; 10 files changed, -28/+28

  Previously, subprograms contained a metadata reference to the function they described. Because most clients need to get or set a subprogram for a given function rather than the other way around, this created unneeded inefficiency. For example, many passes needed to call the function llvm::makeSubprogramMap() to build a mapping from functions to subprograms, and the IR linker needed to fix up function references in a way that caused quadratic complexity in the IR linking phase of LTO.

  This change reverses the direction of the edge by storing the subprogram as function-level metadata and removing DISubprogram's function field.

  Since this is an IR change, a bitcode upgrade has been provided. Fixes PR23367. An upgrade script for textual IR for out-of-tree clients is attached to the PR.

  Differential Revision: http://reviews.llvm.org/D14265

  llvm-svn: 252219
* LoopVectorizer - skip 'bitcast' between GEP and load.
  Elena Demikhovsky, 2015-11-03; 2 files changed, -86/+125

  Skipping the 'bitcast' in this case allows vectorizing the load:

    %arrayidx = getelementptr inbounds double*, double** %in, i64 %indvars.iv
    %tmp53 = bitcast double** %arrayidx to i64*
    %tmp54 = load i64, i64* %tmp53, align 8

  Differential Revision: http://reviews.llvm.org/D14112

  llvm-svn: 251907
* Add a flag vectorizer-maximize-bandwidth in loop vectorizer to enable using larger vectorization factor.
  Cong Hou, 2015-11-02; 2 files changed, -4/+50

  To be able to maximize the bandwidth during vectorization, this patch provides a new flag, vectorizer-maximize-bandwidth. When it is turned on, the vectorizer will determine the vectorization factor (VF) using the smallest instead of the widest type in the loop. To avoid increasing register pressure too much, estimates of the register usage for different VFs are calculated so that we only choose a VF when its register usage doesn't exceed the number of available registers.

  This is the second attempt to submit this patch. The first attempt got a test failure on ARM. This patch is updated to try to fix the failure (more specifically, by handling the case when VF=1).

  Differential revision: http://reviews.llvm.org/D8943

  llvm-svn: 251850
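  A hypothetical C loop with mixed element widths (illustrative only; the names are not from the commit). The smallest type is i8 and the widest is i32; with the new flag enabled (e.g. via -mllvm -vectorizer-maximize-bandwidth, assuming the usual clang plumbing), the VF is chosen from the i8 type so the narrow operations fill the vector registers:

      /* i8 loads are widened and accumulated into i32 elements; sizing
       * the VF by the i8 type roughly quadruples the lanes per iteration
       * compared to sizing by i32. */
      void accumulate(int *acc, const unsigned char *src, int n) {
        for (int i = 0; i < n; ++i)
          acc[i] += src[i];
      }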