summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms/Vectorize
Commit message (Collapse)AuthorAgeFilesLines
...
* [ValueTracking, VectorUtils] Refactor getIntrinsicIDForCallDavid Majnemer2016-04-192-10/+10
| | | | | | | | | | | | | The functionality contained within getIntrinsicIDForCall is two-fold: it checks if a CallInst's callee is a vectorizable intrinsic. If it isn't an intrinsic, it attempts to map the call's target to a suitable intrinsic. Move the mapping functionality into getIntrinsicForCallSite and rename getIntrinsicIDForCall to getVectorIntrinsicIDForCall while reimplementing it in terms of getIntrinsicForCallSite. llvm-svn: 266801
* Port DemandedBits to the new pass manager.Michael Kuperstein2016-04-182-9/+9
| | | | | | Differential Revision: http://reviews.llvm.org/D18679 llvm-svn: 266699
* [NFC] Header cleanupMehdi Amini2016-04-181-1/+0
| | | | | | | | | | | | | | Removed some unused headers, replaced some headers with forward class declarations. Found using simple scripts like this one: clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap' Patch by Eugene Kosov <claprix@yandex.ru> Differential Revision: http://reviews.llvm.org/D19219 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266595
* [ARM] Adding IEEE-754 SIMD detection to loop vectorizerRenato Golin2016-04-141-6/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON. This patch teaches the loop vectorizer to only allow transformations of loops that either contain no floating-point operations or have enough allowance flags supporting lack of precision (ex. -ffast-math, Darwin). For that, the target description now has a method which tells us if the vectorizer is allowed to handle FP math without falling into unsafe representations, plus a check on every FP instruction in the candidate loop to check for the safety flags. This commit makes LLVM behave like GCC with respect to ARM NEON support, but it stops short of fixing the underlying problem: sub-normals. Neither GCC nor LLVM have a flag for allowing sub-normal operations. Before this patch, GCC only allows it using unsafe-math flags and LLVM allows it by default with no way to turn it off (short of not using NEON at all). As a first step, we push this change to make it safe and in sync with GCC. The second step is to discuss a new sub-normal's flag on both communitues and come up with a common solution. The third step is to improve the FastMath flags in LLVM to encode sub-normals and use those flags to restrict NEON FP. Fixes PR16275. llvm-svn: 266363
* [CodeGen] Teach LLVM how to lower @llvm.{min,max}num to {MIN,MAX}NANDavid Majnemer2016-04-143-7/+32
| | | | | | | | | | | | | | | The behavior of {MIN,MAX}NAN differs from that of {MIN,MAX}NUM when only one of the inputs is NaN: -NUM will return the non-NaN argument while -NAN would return NaN. It is desirable to lower to @llvm.{min,max}num to -NAN if they don't have a native instruction for -NUM. Notably, ARMv7 NEON's vmin has the -NAN semantics. N.B. Of course, it is only safe to do this if the intrinsic call is marked nnan. llvm-svn: 266279
* Loop vectorization with uniform loadElena Demikhovsky2016-04-101-0/+9
| | | | | | | | | Vectorization cost of uniform load wasn't correctly calculated. As a result, a simple loop that loads a uniform value wasn't vectorized. Differential Revision: http://reviews.llvm.org/D18940 llvm-svn: 265901
* [LoopVectorize] Register cloned assumptionsDavid Majnemer2016-04-081-10/+24
| | | | | | | | | | | | InstCombine cannot effectively remove redundant assumptions without them registered in the assumption cache. The vectorizer can create identical assumptions but doesn't register them with the cache, resulting in slower compile times because InstCombine tries to reason about a lot more assumptions. Fix this by registering the cloned assumptions. llvm-svn: 265800
* Re-commit [SCEV] Introduce a guarded backedge taken count and use it in LAA ↵Silviu Baranga2016-04-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | and LV This re-commits r265535 which was reverted in r265541 because it broke the windows bots. The problem was that we had a PointerIntPair which took a pointer to a struct allocated with new. The problem was that new doesn't provide sufficient alignment guarantees. This pattern was already present before r265535 and it just happened to work. To fix this, we now separate the PointerToIntPair from the ExitNotTakenInfo struct into a pointer and a bool. Original commit message: Summary: When the backedge taken codition is computed from an icmp, SCEV can deduce the backedge taken count only if one of the sides of the icmp is an AddRecExpr. However, due to sign/zero extensions, we sometimes end up with something that is not an AddRecExpr. However, we can use SCEV predicates to produce a 'guarded' expression. This change adds a method to SCEV to get this expression, and the SCEV predicate associated with it. In HowManyGreaterThans and HowManyLessThans we will now add a SCEV predicate associated with the guarded backedge taken count when the analyzed SCEV expression is not an AddRecExpr. Note that we only do this as an alternative to returning a 'CouldNotCompute'. We use new feature in Loop Access Analysis and LoopVectorize to analyze and transform more loops. Reviewers: anemet, mzolotukhin, hfinkel, sanjoy Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17201 llvm-svn: 265786
* Revert r265535 until we know how we can fix the bots Silviu Baranga2016-04-061-2/+2
| | | | llvm-svn: 265541
* [SCEV] Introduce a guarded backedge taken count and use it in LAA and LVSilviu Baranga2016-04-061-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: When the backedge taken codition is computed from an icmp, SCEV can deduce the backedge taken count only if one of the sides of the icmp is an AddRecExpr. However, due to sign/zero extensions, we sometimes end up with something that is not an AddRecExpr. However, we can use SCEV predicates to produce a 'guarded' expression. This change adds a method to SCEV to get this expression, and the SCEV predicate associated with it. In HowManyGreaterThans and HowManyLessThans we will now add a SCEV predicate associated with the guarded backedge taken count when the analyzed SCEV expression is not an AddRecExpr. Note that we only do this as an alternative to returning a 'CouldNotCompute'. We use new feature in Loop Access Analysis and LoopVectorize to analyze and transform more loops. Reviewers: anemet, mzolotukhin, hfinkel, sanjoy Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17201 llvm-svn: 265535
* [SLPVectorizer] Don't insert an extractelement before a catchswitchDavid Majnemer2016-04-011-2/+9
| | | | | | | | | | | | | A catchswitch cannot be preceded by another instruction in the same basic block (other than a PHI node). Instead, insert the extract element right after the materialization of the vectorized value. This isn't optimal but is a reasonable compromise given the constraints of WinEH. This fixes PR27163. llvm-svn: 265157
* [LoopVectorize] Don't vectorize loops when everything will be scalarizedHal Finkel2016-03-301-18/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change prevents the loop vectorizer from vectorizing when all of the vector types it generates will be scalarized. I've run into this problem on the PPC's QPX vector ISA, which only holds floating-point vector types. The loop vectorizer will, however, happily vectorize loops with purely integer computation. Here's an example: LV: The Smallest and Widest types: 32 / 32 bits. LV: The Widest register is: 256 bits. LV: Found an estimated cost of 0 for VF 1 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 1 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 1 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 1 for VF 1 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 1 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Scalar loop costs: 3. LV: Found an estimated cost of 0 for VF 2 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 2 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 2 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 2 for VF 2 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 2 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Vector loop of width 2 costs: 2. LV: Found an estimated cost of 0 for VF 4 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 4 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 4 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 4 for VF 4 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 4 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 4 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 4 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Vector loop of width 4 costs: 1. ... LV: Selecting VF: 8. LV: The target has 32 registers LV(REG): Calculating max register usage: LV(REG): At #0 Interval # 0 LV(REG): At #1 Interval # 1 LV(REG): At #2 Interval # 2 LV(REG): At #4 Interval # 1 LV(REG): At #5 Interval # 1 LV(REG): VF = 8 The problem is that the cost model here is not wrong, exactly. Since all of these operations are scalarized, their cost (aside from the uniform ones) are indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF picked, the lower the relative overhead from the loop itself (and the induction-variable update and check), and so in a sense, picking the largest VF here is the right thing to do. The problem is that vectorizing like this, where all of the vectors will be scalarized in the backend, isn't really vectorizing, but rather interleaving. By itself, this would be okay, but then the vectorizer itself also interleaves, and that's where the problem manifests itself. There's aren't actually enough scalar registers to support the normal interleave factor multiplied by a factor of VF (8 in this example). In other words, the problem with this is that our register-pressure heuristic does not account for scalarization. While we might want to improve our register-pressure heuristic, I don't think this is the right motivating case for that work. Here we have a more-basic problem: The job of the vectorizer is to vectorize things (interleaving aside), and if the IR it generates won't generate any actual vector code, then something is wrong. Thus, if every type looks like it will be scalarized (i.e. will be split into VF or more parts), then don't consider that VF. This is not a problem specific to PPC/QPX, however. The problem comes up under SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's reduced test case from PR26837 to this commit. Differential Revision: http://reviews.llvm.org/D18537 llvm-svn: 264904
* Remove HasFnAttribute guards to getFnAttribute callsNirav Dave2016-03-301-3/+2
| | | | | | | | | | | | These checks are redundant and can be removed Reviewers: hans Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D18564 llvm-svn: 264872
* [SLP] Remove unnecessary member variables by using container APIs.Chad Rosier2016-03-211-13/+6
| | | | | | | This changes the debug output, but still retains its usefulness. Differential Revision: http://reviews.llvm.org/D18324 llvm-svn: 263975
* [LoopVectorize] Annotate versioned loop with noalias metadataAdam Nemet2016-03-171-14/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Use the new LoopVersioning facility (D16712) to add noalias metadata in the vector loop if we versioned with memchecks. This can enable some optimization opportunities further down the pipeline (see the included test or the benchmark improvement quoted in D16712). The test also covers the bug I had in the initial version in D16712. The vectorizer did not previously use LoopVersioning. The reason is that the vectorizer performs its transformations in single shot. It creates an empty single-block vector loop that it then populates with the widened, if-converted instructions. Thus creating an intermediate versioned scalar loop seems wasteful. So this patch (rather than bringing in LoopVersioning fully) adds a special interface to LoopVersioning to allow the vectorizer to add no-alias annotation while still performing its own versioning. As the vectorizer propagates metadata from the instructions in the original loop to the vector instructions we also check the pointer in the original instruction and see if LoopVersioning can add no-alias metadata based on the issued memchecks. Reviewers: hfinkel, nadav, mzolotukhin Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17191 llvm-svn: 263744
* [SLP] Make DataLayout a member variable.Chad Rosier2016-03-161-38/+35
| | | | llvm-svn: 263656
* [LV] Preserve LoopInfo when store predication is usedAdam Nemet2016-03-151-1/+1
| | | | | | | | | | | | | | | | | This was a latent bug that got exposed by the change to add LoopSimplify as a dependence to LoopLoadElimination. Since LoopInfo was corrupted after LV, LoopSimplify mis-compiled nbench in the test-suite (more details in the PR). The problem was that when we create the blocks for predicated stores we didn't add those to any loops. The original testcase for store predication provides coverage for this assuming we verify LI on the way out of LV. Fixes PR26952. llvm-svn: 263565
* [SLP] Update comment to reflect reality. NFC.Chad Rosier2016-03-151-2/+2
| | | | llvm-svn: 263548
* [SLPVectorizer] Fix dependency listKeno Fischer2016-03-141-0/+1
| | | | | | | | | | | | | | | Summary: DemandedBits was added to the requirements of SLPVectorizer in rL261212 (and various earlier version of it), but the appropriate initialization statement was accidentally forgotten. Ref [[ https://github.com/JuliaLang/julia/issues/14998 | JuliaLang/julia#14998 ]]. Patch by Yichao Yu. Reviewers: mssimpso Differential Revision: http://reviews.llvm.org/D18152 llvm-svn: 263476
* Remove PreserveNames template parameter from IRBuilderMehdi Amini2016-03-131-2/+2
| | | | | | | | This reapplies r263258, which was reverted in r263321 because of issues on Clang side. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 263393
* Temporarily revert:Eric Christopher2016-03-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit ae14bf6488e8441f0f6d74f00455555f6f3943ac Author: Mehdi Amini <mehdi.amini@apple.com> Date: Fri Mar 11 17:15:50 2016 +0000 Remove PreserveNames template parameter from IRBuilder Summary: Following r263086, we are now relying on a flag on the Context to discard Value names in release builds. Reviewers: chandlerc Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D18023 From: Mehdi Amini <mehdi.amini@apple.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263258 91177308-0d34-0410-b5e6-96231b3b80d8 until we can figure out what to do about clang and Release build testing. This reverts commit 263258. llvm-svn: 263321
* Remove PreserveNames template parameter from IRBuilderMehdi Amini2016-03-111-2/+2
| | | | | | | | | | | | | | | Summary: Following r263086, we are now relying on a flag on the Context to discard Value names in release builds. Reviewers: chandlerc Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D18023 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 263258
* [SLP] Add -slp-min-reg-size command line option.Michael Zolotukhin2016-03-101-9/+21
| | | | | | | | | | MinVecRegSize is currently hardcoded to 128; this patch adds a cl::opt to allow changing it. I tried not to change any existing behavior for the default case. Differential revision: http://reviews.llvm.org/D13278 llvm-svn: 263089
* ADT: Remove == and != comparisons between ilist iterators and pointersDuncan P. N. Exon Smith2016-02-211-1/+1
| | | | | | | | | | | | | | I missed == and != when I removed implicit conversions between iterators and pointers in r252380 since they were defined outside ilist_iterator. Since they depend on getNodePtrUnchecked(), they indirectly rely on UB. This commit removes all uses of these operators. (I'll delete the operators themselves in a separate commit so that it can be easily reverted if necessary.) There should be NFC here. llvm-svn: 261498
* Revert r255691 "[LoopVectorizer] Refine loop vectorizer's register usage ↵Hans Wennborg2016-02-191-106/+31
| | | | | | | | calculator by ignoring specific instructions." It caused PR26509. llvm-svn: 261368
* [LV] Vectorize first-order recurrencesMatthew Simpson2016-02-191-6/+197
| | | | | | | | | | | | | | | | | | This patch enables the vectorization of first-order recurrences. A first-order recurrence is a non-reduction recurrence relation in which the value of the recurrence in the current loop iteration equals a value defined in the previous iteration. The load PRE of the GVN pass often creates these recurrences by hoisting loads from within loops. In this patch, we add a new recurrence kind for first-order phi nodes and attempt to vectorize them if possible. Vectorization is performed by shuffling the values for the current and previous iterations. The vectorization cost estimate is updated to account for the added shuffle instruction. Contributed-by: Matthew Simpson and Chad Rosier <mcrosier@codeaurora.org> Differential Revision: http://reviews.llvm.org/D16197 llvm-svn: 261346
* [LV] Fix PR26600: avoid out of bounds loads for interleaved access vectorizationSilviu Baranga2016-02-191-0/+10
| | | | | | | | | | | | | | | | | | | | Summary: If we don't have the first and last access of an interleaved load group, the first and last wide load in the loop can do an out of bounds access. Even though we discard results from speculative loads, this can cause problems, since it can technically generate page faults (or worse). We now discard interleaved load groups that don't have the first and load in the group. Reviewers: hfinkel, rengolin Subscribers: rengolin, llvm-commits, mzolotukhin, anemet Differential Revision: http://reviews.llvm.org/D17332 llvm-svn: 261331
* Reapply commit r259357 with a fix for PR26629Matthew Simpson2016-02-181-12/+237
| | | | | | | | | | Commit r259357 was reverted because it caused PR26629. We were assuming all roots of a vectorizable tree could be truncated to the same width, which is not the case in general. This commit reapplies the patch along with a fix and a new test case to ensure we don't regress because of this issue again. This should fix PR26629. llvm-svn: 261212
* Create masked gather and scatter intrinsics in Loop Vectorizer.Elena Demikhovsky2016-02-171-96/+184
| | | | | | | | | Loop vectorizer now knows to vectorize GEP and create masked gather and scatter intrinsics for random memory access. The feature is enabled on AVX-512 target. Differential Revision: http://reviews.llvm.org/D15690 llvm-svn: 261140
* Revert "Reapply commit r258404 with fix."David Majnemer2016-02-171-231/+11
| | | | | | This reverts commit r259357, it caused PR26629. llvm-svn: 261137
* [LV] Add support for insertelt/extractelt processing during type truncationSilviu Baranga2016-02-151-0/+14
| | | | | | | | | | | | | | | | | | Summary: While shrinking types according to the required bits, we can encounter insert/extract element instructions. This will cause us to reach an llvm_unreachable statement. This change adds support for truncating insert/extract element operations, and adds a regression test. Reviewers: jmolloy Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17078 llvm-svn: 260893
* [SLP] Add debug output for extract cost (NFC)Matthew Simpson2016-02-111-4/+6
| | | | llvm-svn: 260614
* [SCEV][LAA] Re-commit r260085 and r260086, this time with a fix for the memorySilviu Baranga2016-02-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sanitizer issue. The PredicatedScalarEvolution's copy constructor wasn't copying the Generation value, and was leaving it un-initialized. Original commit message: [SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided pointer detection Summary: This change adds no wrap SCEV predicates with: - support for runtime checking - support for expression rewriting: (sext ({x,+,y}) -> {sext(x),+,sext(y)} (zext ({x,+,y}) -> {zext(x),+,sext(y)} Note that we are sign extending the increment of the SCEV, even for the zext case. This is needed to cover the fairly common case where y would be a (small) negative integer. In order to do this, this change adds two new flags: nusw and nssw that are applicable to AddRecExprs and permit the transformations above. We also change isStridedPtr in LAA to be able to make use of these predicates. With this feature we should now always be able to work around overflow issues in the dependence analysis. Reviewers: mzolotukhin, sanjoy, anemet Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel Differential Revision: http://reviews.llvm.org/D15412 llvm-svn: 260112
* [SLP] Fix placement of debug statement (NFC)Igor Breger2016-02-081-7/+7
| | | | | | | | By Ayal Zaks (ayal.zaks@intel.com) Differential Revision: http://reviews.llvm.org/D16976 llvm-svn: 260094
* Revert r260086 and r260085. They have broken the memorySilviu Baranga2016-02-081-1/+1
| | | | | | sanitizer bots. llvm-svn: 260087
* [SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided ↵Silviu Baranga2016-02-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | pointer detection Summary: This change adds no wrap SCEV predicates with: - support for runtime checking - support for expression rewriting: (sext ({x,+,y}) -> {sext(x),+,sext(y)} (zext ({x,+,y}) -> {zext(x),+,sext(y)} Note that we are sign extending the increment of the SCEV, even for the zext case. This is needed to cover the fairly common case where y would be a (small) negative integer. In order to do this, this change adds two new flags: nusw and nssw that are applicable to AddRecExprs and permit the transformations above. We also change isStridedPtr in LAA to be able to make use of these predicates. With this feature we should now always be able to work around overflow issues in the dependence analysis. Reviewers: mzolotukhin, sanjoy, anemet Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel Differential Revision: http://reviews.llvm.org/D15412 llvm-svn: 260085
* [SCEV] Try to reuse existing value during SCEV expansionWei Mi2016-02-041-4/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current SCEV expansion will expand SCEV as a sequence of operations and doesn't utilize the value already existed. This will introduce redundent computation which may not be cleaned up throughly by following optimizations. This patch introduces an ExprValueMap which is a map from SCEV to the set of equal values with the same SCEV. When a SCEV is expanded, the set of values is checked and reused whenever possible before generating a sequence of operations. The original commit triggered regressions in Polly tests. The regressions exposed two problems which have been fixed in current version. 1. Polly will generate a new function based on the old one. To generate an instruction for the new function, it builds SCEV for the old instruction, applies some tranformation on the SCEV generated, then expands the transformed SCEV and insert the expanded value into new function. Because SCEV expansion may reuse value cached in ExprValueMap, the value in old function may be inserted into new function, which is wrong. In SCEVExpander::expand, there is a logic to check the cached value to be used should dominate the insertion point. However, for the above case, the check always passes. That is because the insertion point is in a new function, which is unreachable from the old function. However for unreachable node, DominatorTreeBase::dominates thinks it will be dominated by any other node. The fix is to simply add a check that the cached value to be used in expansion should be in the same function as the insertion point instruction. 2. When the SCEV is of scConstant type, expanding it directly is cheaper than reusing a normal value cached. Although in the cached value set in ExprValueMap, there is a Constant type value, but it is not easy to find it out -- the cached Value set is not sorted according to the potential cost. Existing reuse logic in SCEVExpander::expand simply chooses the first legal element from the cached value set. The fix is that when the SCEV is of scConstant type, don't try the reuse logic. simply expand it. Differential Revision: http://reviews.llvm.org/D12090 llvm-svn: 259736
* Minor code cleanups. NFC.Junmo Park2016-02-031-18/+18
| | | | llvm-svn: 259725
* Revert r259662, which caused regressions on polly tests.Wei Mi2016-02-031-16/+4
| | | | llvm-svn: 259675
* [SCEV] Try to reuse existing value during SCEV expansionWei Mi2016-02-031-4/+16
| | | | | | | | | | | | | | | | Current SCEV expansion will expand SCEV as a sequence of operations and doesn't utilize the value already existed. This will introduce redundent computation which may not be cleaned up throughly by following optimizations. This patch introduces an ExprValueMap which is a map from SCEV to the set of equal values with the same SCEV. When a SCEV is expanded, the set of values is checked and reused whenever possible before generating a sequence of operations. Differential Revision: http://reviews.llvm.org/D12090 llvm-svn: 259662
* [LV] Rename RdxPHIsToFix to PHIsToFix (NFC)Matthew Simpson2016-02-011-39/+32
| | | | | | | | | | In the future, we will vectorize recurrences other than reductions. This patch renames a few variables and updates their associated comments to enable them to be reused for non-reduction PHI nodes. This change was requested in the review for D16197. llvm-svn: 259364
* Reapply commit r258404 with fix.Matthew Simpson2016-02-011-11/+231
| | | | | | | The previous patch caused PR26364. The fix is to ensure that we don't enter a cycle when iterating over use-def chains. llvm-svn: 259357
* [SLP] Fix printing of debug statement (NFC)Matthew Simpson2016-01-291-1/+1
| | | | llvm-svn: 259212
* Revert "Reapply commit r258404 with fix"David Majnemer2016-01-291-226/+11
| | | | | | This reverts commit r258929, it caused PR26364. llvm-svn: 259148
* Reapply commit r258404 with fixMatthew Simpson2016-01-271-11/+226
| | | | | | | | | | | | | | This patch is the second attempt to reapply commit r258404. There was bug in the initial patch and subsequent fix (mentioned below). The initial patch caused an assertion because we were computing smaller type sizes for instructions that cannot be demoted. The fix first determines the instructions that will be demoted, and then applies the smaller type size to only those instructions. This should fix PR26239 and PR26307. llvm-svn: 258929
* [SLPVectorizer] Swap the checking order of isCommutative and isConsecutiveAccessHaicheng Wu2016-01-271-4/+4
| | | | | | NFC llvm-svn: 258909
* Remove autoconf supportChris Bieneman2016-01-261-15/+0
| | | | | | | | | | | | | | | | Summary: This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html "I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened." - Obi Wan Kenobi Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D16471 llvm-svn: 258861
* Revert "Reapply commit r258404 with fix"Matthew Simpson2016-01-261-226/+11
| | | | | | | | This commit exposes a crash in computeKnownBits on the Chromium buildbots. Reverting to investigate. Reference: https://llvm.org/bugs/show_bug.cgi?id=26307 llvm-svn: 258812
* [LIR] Add support for structs and hand unrolled loopsHaicheng Wu2016-01-261-78/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a recommit of r258620 which causes PR26293. The original message: Now LIR can turn following codes into memset: typedef struct foo { int a; int b; } foo_t; void bar(foo_t *f, unsigned n) { for (unsigned i = 0; i < n; ++i) { f[i].a = 0; f[i].b = 0; } } void test(foo_t *f, unsigned n) { for (unsigned i = 0; i < n; i += 2) { f[i] = 0; f[i+1] = 0; } } llvm-svn: 258777
* Reapply commit r25804 with fixMatthew Simpson2016-01-251-11/+226
| | | | | | | | | | | We were hitting an assertion because we were computing smaller type sizes for instructions that cannot be demoted. The fix first determines the instructions that will be demoted, and then applies the smaller type size to only those instructions. This should fix PR26239. llvm-svn: 258705
OpenPOWER on IntegriCloud