path: root/llvm/test/Transforms/LoopStrengthReduce
Commit message | Author | Age | Files | Lines
...
* LSR: Check more intrinsic pointer operands | Matt Arsenault | 2017-12-11 | 1 | -1/+81
| | | | llvm-svn: 320424
* Revert "[X86] Improvement in CodeGen instruction selection for LEAs."Matt Morehouse2017-12-011-3/+3
| | | | | | This reverts r319543, due to ASan bot breakage. llvm-svn: 319591
* [X86] Improvement in CodeGen instruction selection for LEAs. | Jatin Bhateja | 2017-12-01 | 1 | -3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: 1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to accommodate a similar operand appearing in the DAG, e.g. T1 = A + B T2 = T1 + 10 T3 = T2 + A For the above DAG rooted at T3, X86AddressMode will now look like Base = B, Index = A, Scale = 2, Disp = 10 2/ During OptimizeLEAPass, further down the pipeline, factorization is now performed over LEAs so that, if there is an opportunity, complex LEAs (having 3 operands) can be factored out, e.g. leal 1(%rax,%rcx,1), %rdx leal 1(%rax,%rcx,2), %rcx will be factored as follows leal 1(%rax,%rcx,1), %rdx leal (%rdx,%rcx), %edx 3/ Aggressive operand folding for AM-based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop. 4/ Simplify LEA converts lea (BASE,1,INDEX,0) --> add (BASE, INDEX), which offers better throughput. PR32755 will be taken care of by this patch. Previous patch revisions: r313343, r314886 Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy, jbhateja Reviewed By: lsaba, RKSimon, jbhateja Subscribers: jmolloy, spatel, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 319543
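The Scale promotion in point 1 is ordinary address algebra: T3 = (A + B) + 10 + A = B + 2*A + 10, which is exactly Base = B, Index = A, Scale = 2, Disp = 10. A minimal C++ sketch (illustration only, not part of the patch) that checks the equivalence:

    #include <cassert>
    #include <cstdint>

    int main() {
      int64_t A = 7, B = 3;          // arbitrary sample operands
      int64_t T1 = A + B;
      int64_t T2 = T1 + 10;
      int64_t T3 = T2 + A;           // the DAG rooted at T3
      int64_t lea = B + 2 * A + 10;  // Base + Scale*Index + Disp
      assert(T3 == lea);
      return 0;
    }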
* Revert r314886 "[X86] Improvement in CodeGen instruction selection for LEAs ↵ | Hans Wennborg | 2017-10-04 | 1 | -3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (re-applying post required revision changes.)" It broke the Chromium / SQLite build; see PR34830. > Summary: > 1/ Operand folding during complex pattern matching for LEAs has been > extended, such that it promotes Scale to accommodate similar operand > appearing in the DAG. > e.g. > T1 = A + B > T2 = T1 + 10 > T3 = T2 + A > For above DAG rooted at T3, X86AddressMode will no look like > Base = B , Index = A , Scale = 2 , Disp = 10 > > 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs > so that if there is an opportunity then complex LEAs (having 3 operands) > could be factored out. > e.g. > leal 1(%rax,%rcx,1), %rdx > leal 1(%rax,%rcx,2), %rcx > will be factored as following > leal 1(%rax,%rcx,1), %rdx > leal (%rdx,%rcx) , %edx > > 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, > thus avoiding creation of any complex LEAs within a loop. > > Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy > > Reviewed By: lsaba > > Subscribers: jmolloy, spatel, igorb, llvm-commits > > Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 314919
* [X86] Improvement in CodeGen instruction selection for LEAs (re-applying ↵ | Jatin Bhateja | 2017-10-04 | 1 | -3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | post required revision changes.) Summary: 1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to accommodate similar operand appearing in the DAG. e.g. T1 = A + B T2 = T1 + 10 T3 = T2 + A For above DAG rooted at T3, X86AddressMode will no look like Base = B , Index = A , Scale = 2 , Disp = 10 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity then complex LEAs (having 3 operands) could be factored out. e.g. leal 1(%rax,%rcx,1), %rdx leal 1(%rax,%rcx,2), %rcx will be factored as following leal 1(%rax,%rcx,1), %rdx leal (%rdx,%rcx) , %edx 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop. Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy Reviewed By: lsaba Subscribers: jmolloy, spatel, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 314886
* Revert r313343 "[X86] PR32755 : Improvement in CodeGen instruction selection ↵ | Hans Wennborg | 2017-09-15 | 1 | -3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | for LEAs." This caused PR34629: asserts firing when building Chromium. It also broke some buildbots building test-suite as reported on the commit thread. > Summary: > 1/ Operand folding during complex pattern matching for LEAs has been > extended, such that it promotes Scale to accommodate similar operand > appearing in the DAG. > e.g. > T1 = A + B > T2 = T1 + 10 > T3 = T2 + A > For above DAG rooted at T3, X86AddressMode will no look like > Base = B , Index = A , Scale = 2 , Disp = 10 > > 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs > so that if there is an opportunity then complex LEAs (having 3 operands) > could be factored out. > e.g. > leal 1(%rax,%rcx,1), %rdx > leal 1(%rax,%rcx,2), %rcx > will be factored as following > leal 1(%rax,%rcx,1), %rdx > leal (%rdx,%rcx) , %edx > > 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, > thus avoiding creation of any complex LEAs within a loop. > > Reviewers: lsaba, RKSimon, craig.topper, qcolombet > > Reviewed By: lsaba > > Subscribers: spatel, igorb, llvm-commits > > Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 313376
* [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs. | Jatin Bhateja | 2017-09-15 | 1 | -3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: 1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to accommodate similar operand appearing in the DAG. e.g. T1 = A + B T2 = T1 + 10 T3 = T2 + A For above DAG rooted at T3, X86AddressMode will no look like Base = B , Index = A , Scale = 2 , Disp = 10 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity then complex LEAs (having 3 operands) could be factored out. e.g. leal 1(%rax,%rcx,1), %rdx leal 1(%rax,%rcx,2), %rcx will be factored as following leal 1(%rax,%rcx,1), %rdx leal (%rdx,%rcx) , %edx 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop. Reviewers: lsaba, RKSimon, craig.topper, qcolombet Reviewed By: lsaba Subscribers: spatel, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 313343
* [LSR] Fix Shadow IV in case of integer overflow | Max Kazantsev | 2017-08-29 | 1 | -0/+94
| | | | | | | | | | | | | | | | | | | | | | When LSR processes code like int accumulator = 0; for (int i = 0; i < N; i++) { accumulator += i; use((double) accumulator); } it may decide to replace the integer `accumulator` with a double Shadow IV to get rid of casts. The problem with that is that the `accumulator`'s value may overflow. From that moment on, the behavior of the integer and double accumulators will differ. This patch strengthens the applicability conditions of the Shadow IV mechanism. We only allow it for IVs that are proved to be `AddRec`s with the `nsw`/`nuw` flag. Differential Revision: https://reviews.llvm.org/D37209 llvm-svn: 311986
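A small standalone sketch (my illustration, not the committed test) of why the unguarded Shadow IV is unsound: once the narrow integer accumulator wraps, the integer and double versions diverge. The wrapping uint32_t below stands in for an IV without the nsw/nuw guarantee:

    #include <cstdint>
    #include <cstdio>

    int main() {
      uint32_t acc = 0;   // models the integer IV without nsw/nuw: wraps modulo 2^32
      double shadow = 0;  // the double Shadow IV LSR would substitute
      for (int64_t i = 0; i < 100000; i++) {
        acc += (uint32_t)i;
        shadow += (double)i;
      }
      // 0 + 1 + ... + 99999 = 4999950000 > 2^32, so acc has wrapped while shadow has not.
      printf("int: %u  double: %.0f\n", (unsigned)acc, shadow);
      return 0;
    }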
* [NFC] Fix indents in test | Max Kazantsev | 2017-08-29 | 1 | -1/+1
| | | | llvm-svn: 311982
* [NFC] Refactor ShadowIV test to use FileCheck | Max Kazantsev | 2017-08-29 | 1 | -27/+45
| | | | | | Also get rid of unnamed values that make the test hard to read. llvm-svn: 311980
* [ARM, Thumb1] Prevent ARMTargetLowering::isLegalAddressingMode from ↵ | Evgeny Astigeevich | 2017-08-24 | 1 | -0/+122
| | | | | | | | | | | | | | accepting illegal modes ARMTargetLowering::isLegalAddressingMode can accept illegal addressing modes for the Thumb1 target. This causes generation of redundant code and affects performance. This fixes PR34106: https://bugs.llvm.org/show_bug.cgi?id=34106 Differential Revision: https://reviews.llvm.org/D36467 llvm-svn: 311649
* Reapply fix PR23384 (part 3 of 3) r304824 (was reverted in r305720). | Evgeny Stupachenko | 2017-08-07 | 6 | -22/+26
| | | | | | | | | | | | | | | | The root cause of reverting was fixed - PR33514. Summary: The patch makes instruction count the highest priority for LSR solution for X86 (previously registers had highest priority). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D30562 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 310289
* [SCEV] Preserve NSW information for sext(subtract). | Amara Emerson | 2017-08-04 | 1 | -2/+1
| | | | | | | | | | Pushes the sext onto the operands of a Sub if NSW is present. Also adds support for propagating the nowrap flags of the llvm.ssub.with.overflow intrinsic during analysis. Differential Revision: https://reviews.llvm.org/D35256 llvm-svn: 310117
* Fix PR33514 | Evgeny Stupachenko | 2017-08-04 | 1 | -1/+2
| | | | | | | | | | | | | | Summary: The bug was uncovered after the fix for PR23384 (part 3 of 3). The patch restricts pointer multiplication in SCEV computation for ICmpZero. Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D36170 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 310092
* Remove the obsolete offset parameter from @llvm.dbg.value | Adrian Prantl | 2017-07-28 | 1 | -2/+2
| | | | | | | | | | | | There is no situation where this rarely-used argument cannot be substituted with a DIExpression and removing it allows us to simplify the DWARF backend. Note that this patch does not yet remove any of the newly dead code. rdar://problem/33580047 Differential Revision: https://reviews.llvm.org/D35951 llvm-svn: 309426
* [LSR] Narrow search space by filtering non-optimal formulae with the same ↵ | Wei Mi | 2017-07-06 | 2 | -2/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ScaledReg and Scale. When the formulae search space is huge, LSR uses a series of heuristics to keep pruning the search space until the number of possible solutions is within a certain limit. The big hammer in the series of heuristics is NarrowSearchSpaceByPickingWinnerRegs, which picks the register used by the most LSRUses and deletes the other formulae which don't use that register. This is an effective way to prune the search space, but quite often not a good way to keep the best solution. We have seen cases where this heuristic pruned the best formula candidate out of the search space. To relieve the problem, we introduce a new heuristic called NarrowSearchSpaceByFilterFormulaWithSameScaledReg. The basic idea is that, in order to reduce the search space while keeping the best formula, we want to keep as many formulae with different Scale and ScaledReg as possible. That is because the central idea of LSR is to choose a group of loop induction variables and use those induction variables to represent LSRUses. An induction variable candidate is often represented by the Scale and ScaledReg in a formula. If we have more formulae with different ScaledReg and Scale to choose from, we have a better opportunity to find the best solution. That is why we believe pruning the search space by keeping only the best formula for each Scale and ScaledReg should be more effective than PickingWinnerReg. We use two criteria to choose the best formula with the same Scale and ScaledReg: the first is to select the formula using fewer non-shared registers, and the second is to select the formula with the lower cost obtained from RateFormula. The patch applies the heuristic before NarrowSearchSpaceByPickingWinnerRegs, which remains the last resort. Testing shows we get 1.8% and 2% on two internal benchmarks on x86, and llvm nightly testsuite performance is neutral. We also tried lsr-exp-narrow and it didn't help on the two improved internal cases we saw. Differential Revision: https://reviews.llvm.org/D34583 llvm-svn: 307269
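A rough sketch of the filtering idea described above (the type and field names below are simplified stand-ins, not LSR's actual data structures): group formulae by their (ScaledReg, Scale) pair and keep only the best member of each group, applying the two criteria from the commit message.

    #include <map>
    #include <utility>
    #include <vector>

    struct Formula {
      int ScaledReg;      // stand-in for the scaled register
      int Scale;
      int NonSharedRegs;  // first criterion: fewer non-shared registers
      int Cost;           // second criterion: lower rated cost (cf. RateFormula)
    };

    // Keep one formula per (ScaledReg, Scale) key.
    std::vector<Formula> filterSameScaledReg(const std::vector<Formula> &Formulae) {
      std::map<std::pair<int, int>, Formula> Best;
      for (const Formula &F : Formulae) {
        auto Key = std::make_pair(F.ScaledReg, F.Scale);
        auto It = Best.find(Key);
        if (It == Best.end() ||
            F.NonSharedRegs < It->second.NonSharedRegs ||
            (F.NonSharedRegs == It->second.NonSharedRegs && F.Cost < It->second.Cost))
          Best[Key] = F;
      }
      std::vector<Formula> Out;
      for (const auto &KV : Best)
        Out.push_back(KV.second);
      return Out;
    }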
* Revert r304824 "Fix PR23384 (part 3 of 3)" | Hans Wennborg | 2017-06-19 | 6 | -26/+22
| | | | | | | | | | | | | | | | | This seems to be interacting badly with ASan somehow, causing false reports of heap-buffer overflows: PR33514. > Summary: > The patch makes instruction count the highest priority for > LSR solution for X86 (previously registers had highest priority). > > Reviewers: qcolombet > > Differential Revision: http://reviews.llvm.org/D30562 > > From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 305720
* [SCEV] Teach SCEVExpander to expand BinPow | Max Kazantsev | 2017-06-19 | 1 | -0/+264
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current implementation of SCEVExpander behaves very naively when it deals with power calculation. For example, a SCEV for x^8 looks like (x * x * x * x * x * x * x * x) If we try to expand it, it generates a very straightforward sequence of muls, like: x2 = mul x, x x3 = mul x2, x x4 = mul x3, x ... x8 = mul x7, x This is an inefficient way of doing it. A better way is to generate a binary power calculation sequence. In this case the expanded calculation will look like: x2 = mul x, x x4 = mul x2, x2 x8 = mul x4, x4 In some cases the code size reduction for such SCEVs is dramatic. If we had a loop: x = a; for (int i = 0; i < 3; i++) x = x * x; and this loop has been fully unrolled, we have something like: x = a; x2 = x * x; x4 = x2 * x2; x8 = x4 * x4; The SCEV for x8 is the same as in the example above, and if we for some reason want to expand it, we would naively generate 7 multiplications instead of 3. The BinPow expansion algorithm here keeps the code size reasonable. This patch teaches SCEVExpander to generate a sequence of BinPow multiplications if we have repeating arguments in SCEVMulExpressions. Differential Revision: https://reviews.llvm.org/D34025 llvm-svn: 305663
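The expansion strategy is standard exponentiation by squaring; a hedged sketch of the idea (illustration only, not the SCEVExpander code itself), using O(log n) multiplies for x^n:

    #include <cassert>
    #include <cstdint>

    // Computes x^n by repeated squaring: for x^8 this needs 3 squarings
    // (x2, x4, x8) instead of 7 chained multiplications.
    uint64_t binpow(uint64_t x, unsigned n) {
      uint64_t result = 1;
      while (n) {
        if (n & 1)
          result *= x;  // fold in the current power of x when the bit is set
        x *= x;         // square: x, x^2, x^4, x^8, ...
        n >>= 1;
      }
      return result;
    }

    int main() {
      assert(binpow(3, 8) == 6561);  // 3^8
      return 0;
    }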
* Fix PR23384 (part 3 of 3) | Evgeny Stupachenko | 2017-06-06 | 6 | -22/+26
| | | | | | | | | | | | | Summary: The patch makes instruction count the highest priority for LSR solution for X86 (previously registers had highest priority). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D30562 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 304824
* [X86] Replace 'REQUIRES: x86' in tests with 'REQUIRES: ↵ | Craig Topper | 2017-06-04 | 1 | -1/+1
| | | | | | x86-registered-target' which seems to be the correct way to make them run on an x86 build. llvm-svn: 304682
* [SCEVExpander] Try harder to avoid introducing inttoptr | Keno Fischer | 2017-05-27 | 1 | -0/+45
| | | | | | | | | | | | | | | | Summary: This fixes introduction of an incorrect inttoptr/ptrtoint pair in the included test case which makes use of non-integral pointers. I suspect there are more cases like this left, but this takes care of the one I was seeing at the moment. Reviewers: sanjoy Subscribers: mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D33129 llvm-svn: 304058
* Re-enable "[SCEV] Do not fold dominated SCEVUnknown into AddRecExpr start" | Max Kazantsev | 2017-05-26 | 3 | -14/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch rL303730 was reverted because the test lsr-expand-quadratic.ll failed on many non-X86 configs with this patch. The reason is that the patch makes a correctness fix that changes the optimizer's behavior for this test. Without the change, LSR was making an overconfident simplification based on a wrong SCEV. Apparently it did not need the IV analysis to do this. With the change, it chose a different (less confident) way to simplify, and this way required the IV analysis. Now, following the right execution path, LSR tries to make a transformation relying on the IV Users analysis. This analysis is target-dependent due to this code: // LSR is not APInt clean, do not touch integers bigger than 64-bits. // Also avoid creating IVs of non-native types. For example, we don't want a // 64-bit IV in 32-bit code just because the loop has one 64-bit cast. uint64_t Width = SE->getTypeSizeInBits(I->getType()); if (Width > 64 || !DL.isLegalInteger(Width)) return false; To make the proper transformation in this test case, the type i32 needs to be legal for the specified data layout. When the test runs on some non-X86 configuration (e.g. pure ARM 64), opt gets confused by the specified target and does not use it, rejecting the specified data layout as well. Instead, it uses some default layout that does not treat i32 as a legal type (currently the layout used when none is specified has no legal types at all). As a result, the transformation we expect does not happen for this test. This re-enabling patch has no source code changes compared to the original patch rL303730. The only difference is that the failing test is moved to the X86 directory and now requires running on x86 only, to comply with the specified target triple and data layout. Differential Revision: https://reviews.llvm.org/D33543 llvm-svn: 303971
* Revert "[SCEV] Do not fold dominated SCEVUnknown into AddRecExpr start"Diana Picus2017-05-243-19/+14
| | | | | | This reverts commit r303730 because it broke all the buildbots. llvm-svn: 303747
* [SCEV] Do not fold dominated SCEVUnknown into AddRecExpr start | Max Kazantsev | 2017-05-24 | 3 | -14/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | When folding arguments of an AddExpr or MulExpr with recurrences, we rely on the fact that the loop of our base recurrence is the bottom-most in terms of domination. This assumption may be broken by an expression which is treated as invariant, and which depends on a complex Phi for which a SCEVUnknown was created. If such a Phi is a loop Phi, and this loop is lower than the chosen AddRecExpr's loop, it is invalid to fold our expression with the recurrence. Another reason why it might be invalid to fold a SCEVUnknown into a Phi start value is that, unlike other SCEVs, SCEVUnknowns are sometimes position-bound. For example, here: for (...) { // loop phi = {A,+,B} } X = load ... Folding phi + X into {A+X,+,B}<loop> actually makes no sense, because X does not exist and cannot exist while we are iterating in the loop (that memory may not even be allocated or filled at that point). Such folding is only valid if X is defined before the loop; in that case the recurrence {A+X,+,B}<loop> can exist. This patch prohibits folding of a SCEVUnknown (and of expressions that use it) into the start value of an AddRecExpr if the defining instruction is dominated by the loop. Merging the dominating unknown values is still valid. Some tests that relied on certain SCEVUnknowns being folded into AddRecs are changed so that they no longer expect such behavior. llvm-svn: 303730
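A C-level illustration (my own hypothetical example, not one of the changed tests) of the position-bound case described above: folding the unknown value X into the recurrence start is only sound when X is defined before the loop.

    // phi models the recurrence {A,+,B} of the surrounding loop.
    long f_valid(const long *p, long n, long A, long B) {
      long X = *p;        // X is defined before the loop, so phi + X behaves
      long phi = A;       // exactly like the recurrence {A+X,+,B}<loop>.
      long sum = 0;
      for (long i = 0; i < n; i++) {
        sum += phi + X;
        phi += B;
      }
      return sum;
    }

    long f_invalid(long *p, long n, long A, long B) {
      long phi = A;
      for (long i = 0; i < n; i++) {
        p[i] = phi;       // the loop itself may be what fills this memory
        phi += B;
      }
      long X = *p;        // X only exists after the loop; rewriting the loop's
      return phi + X;     // recurrence as {A+X,+,B}<loop> would be invalid.
    }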
* [LSR] Call canonicalize after we generate a new Formula in ↵ | Wei Mi | 2017-05-18 | 1 | -0/+36
| | | | | | | | | | | | | | GenerateTruncates. Fix PR33077. The testcase in PR33077 generates a LSR Use Formula with two SCEVAddRecExprs for the same loop. Such uncommon formula will become non-canonical after GenerateTruncates adds sign extension to the ScaledReg of the Formula, and it will break the assertion that every Formula to be inserted is canonical. The fix is to call canonicalize for the raw Formula generated by GenerateTruncates before inserting it. llvm-svn: 303361
* ARM: handle post-indexed NEON ops where the offset isn't the access width. | Tim Northover | 2017-04-20 | 1 | -5/+3
| | | | | | | | | | | Before, we assumed that any ConstantInt offset was precisely the access width, so we could use the "[rN]!" form. ISelLowering only ever created that kind, but further simplification during combining could lead to unexpected constants and incorrect codegen. Should fix PR32658. llvm-svn: 300878
* Turn on -addr-sink-using-gep by default. | Eli Friedman | 2017-04-06 | 2 | -3/+0
| | | | | | | | | The new codepath has been in the tree for years, and there isn't any reason to use two codepaths here. Differential Revision: https://reviews.llvm.org/D30596 llvm-svn: 299723
* AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel | Matt Arsenault | 2017-03-21 | 5 | -12/+12
| | | | | | | | | | | | Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444
* Set option enabling LSR alternative way to resolve complex solution to false. | Evgeny Stupachenko | 2017-03-04 | 2 | -2/+2
| | | | | | | Differential Revision: http://reviews.llvm.org/D29862 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 296959
* [LSR] Canonicalize formula and put recursive Reg related with current loop ↵ | Wei Mi | 2017-02-22 | 1 | -0/+65
| | | | | | | | | | | | | | | | in ScaledReg. After rL294814, LSR formula can have multiple SCEVAddRecExprs inside of its BaseRegs. Previous canonicalization will swap the first SCEVAddRecExpr in BaseRegs with ScaledReg. But now we want to swap the SCEVAddRecExpr Reg related with current loop with ScaledReg. Otherwise, we may generate code like this: RegA + lsr.iv + RegB, where loop invariant parts RegA and RegB are not grouped together and cannot be promoted outside of loop. With this patch, it will ensure lsr.iv to be generated later in the expr: RegA + RegB + lsr.iv, so that RegA + RegB can be promoted outside of loop. Differential Revision: https://reviews.llvm.org/D26781 llvm-svn: 295884
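A hypothetical source-level picture (not the committed testcase) of why the grouping matters: with the canonical order RegA + RegB + lsr.iv, the loop-invariant part RegA + RegB is one expression that can be computed once and hoisted out of the loop, whereas RegA + lsr.iv + RegB keeps the invariant operands separated by the induction variable.

    long sum_hoisted(long RegA, long RegB, long n) {
      long base = RegA + RegB;  // invariant part, computed once outside the loop
      long total = 0;
      for (long iv = 0; iv < n; ++iv)
        total += base + iv;     // base + lsr.iv inside the loop
      return total;
    }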
* The patch introduces a new way of narrowing complex (>UINT16 variants) solutions. | Evgeny Stupachenko | 2017-02-21 | 2 | -2/+2
| | | | | | | | | | | | | | | | | The new method is introduced under the "-lsr-exp-narrow" option (currently set to true). Summary: The method is based on the mathematical expectation of the number of registers and should generally be closer to the optimal solution. Please see the details in the comments to the "LSRInstance::NarrowSearchSpaceByDeletingCostlyFormulas()" function (in lib/Transforms/Scalar/LoopStrengthReduce.cpp). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D29862 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 295704
* [LSR] Prevent formula with SCEVAddRecExpr type of Reg from Sibling loops | Wei Mi | 2017-02-16 | 1 | -0/+97
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In rL294814, we allow formula with SCEVAddRecExpr type of Reg from loops other than current loop. This is good for the case when induction variable of outerloop being used in expr in innerloop. But it is very bad to allow such Reg from sibling loop because we may need to add lsr.iv in other sibling loops when scev expanding those SCEVAddRecExpr type exprs. For the testcase below, one loop can be inserted with a bunch of lsr.iv because of LSR for other loops. // The induction variable j from a loop in the middle will have initial // value generated from previous sibling loop and exit value used by its // next sibling loop. void goo(long i, long j); long cond; void foo(long N) { long i = 0; long j = 0; i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); } The fix is to only allow formula with SCEVAddRecExpr type of Reg from current loop or its parents. Differential Revision: https://reviews.llvm.org/D30021 llvm-svn: 295378
* [LSR] Pointers with different address spaces are considered incompatible. | Mikael Holmen | 2017-02-14 | 1 | -0/+31
| | | | | | | | | | | | | | | | | | | | | | Summary: Function isCompatibleIVType is already used as a guard before the call to SE.getMinusSCEV(OperExpr, PrevExpr); in LSRInstance::ChainInstruction. getMinusSCEV requires the expressions to be of the same type, so we now consider two pointers with different address spaces to be incompatible, since it is possible that the pointers in fact have different sizes. Reviewers: qcolombet, eli.friedman Reviewed By: qcolombet Subscribers: nhaehnle, Ka-Ka, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D29885 llvm-svn: 295033
* The patch fixes r294821 | Evgeny Stupachenko | 2017-02-11 | 2 | -6/+6
| | | | | | | | Summary: Update register match for windows testing From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 294825
* Fix PR23384 (under "-lsr-insns-cost" option) | Evgeny Stupachenko | 2017-02-11 | 2 | -0/+110
| | | | | | | | | | | | | Summary: The patch adds instructions number generated by a solution to LSR cost under "-lsr-insns-cost" option. Reviewers: qcolombet, hfinkel Differential Revision: http://reviews.llvm.org/D28307 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 294821
* [LSR] Recommit: Allow formula containing Reg for SCEVAddRecExpr related with ↵ | Wei Mi | 2017-02-11 | 1 | -0/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | outerloop. The recommit includes some changes of testcases. No functional change to the patch. In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr, and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser and dropped. Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only handle inner loop now so only %for.body2 will be handled. Using the logic above, formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related with outerloop. Only formula like reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept because the SCEVAddRecExpr related with outerloop is folded into the initial value of the SCEVAddRecExpr related with current loop. But in some cases, we do need to share the basic induction variable reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction variables used by LSR, so we don't want to drop the formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally. From the existing comment, it tries to avoid considering multiple level loops at the same time. However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other than current loop, it is an invariant and will be simple to handle, and the formula doesn't have to be dropped. Differential Revision: https://reviews.llvm.org/D26429 llvm-svn: 294814
* LSR: Check atomic instruction pointer operands | Matt Arsenault | 2017-02-08 | 1 | -0/+87
| | | | llvm-svn: 294410
* LSR: Don't drop address space when type doesn't match | Matt Arsenault | 2017-01-30 | 1 | -0/+54
| | | | | | | | | | For targets with different addressing modes in each address space, if this is dropped querying isLegalAddressingMode later with this will give a nonsense result, breaking the isLegalUse assertions. This is a candidate for the 4.0 release branch. llvm-svn: 293542
* This test apparently requires an x86 target and is failing on numerous | Chandler Carruth | 2017-01-23 | 1 | -0/+48
| | | | | | | | | | | bots ever since d0k fixed the CHECK lines so that it did something at all. It isn't actually testing SCEV directly but LSR, so move it into LSR and the x86-specific tree of tests that already exists there. Target dependence is common and unavoidable with the current design of LSR. llvm-svn: 292774
* [PM] Clean up the testing for IVUsers, especially with the new PM. | Chandler Carruth | 2017-01-15 | 1 | -54/+0
| | | | | | | | | | First, I've moved a test of IVUsers from the LSR tree to a dedicated IVUsers test directory. I've also simplified its RUN line now that the new pass manager's loop PM is providing analyses on their own. No functionality changed, but it makes subsequent changes cleaner. llvm-svn: 292060
* [LoopStrengthReduce] Don't bother rewriting PHIs in catchswitch blocks | David Majnemer | 2017-01-13 | 1 | -0/+58
| | | | | | | | | The catchswitch instruction cannot be split, don't bother trying to rewrite it. This fixes PR31627. llvm-svn: 291966
* Revert r286999 which caused buildbot test failures. Some testcases need to ↵ | Wei Mi | 2016-11-15 | 1 | -65/+0
| | | | | | be made target specific. llvm-svn: 287014
* [LSR] Allow formula containing Reg for SCEVAddRecExpr related with outerloop. | Wei Mi | 2016-11-15 | 1 | -0/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr, and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser and dropped. Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only handle inner loop now so only %for.body2 will be handled. Using the logic above, formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related with outerloop. Only formula like reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept because the SCEVAddRecExpr related with outerloop is folded into the initial value of the SCEVAddRecExpr related with current loop. But in some cases, we do need to share the basic induction variable reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction variables used by LSR, so we don't want to drop the formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally. From the existing comment, it tries to avoid considering multiple level loops at the same time. However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other than current loop, it is an invariant and will be simple to handle, and the formula doesn't have to be dropped. Differential Revision: https://reviews.llvm.org/D26429 llvm-svn: 286999
* [ARM] Loop Strength Reduction crashes when targeting ARM or Thumb. | Alexandros Lamprineas | 2016-11-09 | 1 | -0/+35
| | | | | | | | | | | Scalar Evolution asserts when not all the operands of an Add Recurrence Expression are loop invariants. Loop Strength Reduction should only create affine Add Recurrences, so that both the start and the step of the expression are loop invariants. Differential Revision: https://reviews.llvm.org/D26185 llvm-svn: 286347
* Fix testcases failing after r284036 | Krzysztof Parzyszek | 2016-10-12 | 1 | -3/+1
| | | | | | The codegen has changed slightly between my tests and the commit. llvm-svn: 284049
* Do not remove implicit defs in BranchFolder | Krzysztof Parzyszek | 2016-10-12 | 1 | -0/+1
| | | | | | | | | | | Branch folder removes implicit defs if they are the only non-branching instructions in a block, and the branches do not use the defined registers. The problem is that in some cases these implicit defs are required for the liveness information to be correct. Differential Revision: https://reviews.llvm.org/D25478 llvm-svn: 284036
* [LSR] Don't try and create post-inc expressions on non-rotated loops | James Molloy | 2016-08-15 | 1 | -0/+43
| | | | | | | | | | | | | | | If a loop is not rotated (for example when optimizing for size), the latch is not the backedge. If we promote an expression to post-inc form, we not only increase register pressure and add a COPY for that IV expression but for all IVs! Motivating testcase: void f(float *a, float *b, float *c, int n) { while (n-- > 0) *c++ = *a++ + *b++; } It's imperative that the pointer increments be located in the latch block and not the header block; if not, we cannot use post-increment loads and stores and we have to keep both the post-inc and pre-inc values around until the end of the latch which bloats register usage. llvm-svn: 278658
* [SCEV] Update interface to handle SCEVExpander insert point motion. | Geoff Berry | 2016-08-11 | 1 | -0/+47
| | | | | | | | | | | | | | | | | | | | Summary: This is an extension of the fix in r271424. That fix dealt with builder insert points being moved by SCEV expansion, but only for the lifetime of the expand call. This change modifies the interface so that LSR can safely call expand multiple times at the same insert point and do the right thing if one of the expansions decides to move the original insert point. This is a fix for PR28719. Reviewers: sanjoy Subscribers: llvm-commits, mcrosier, mzolotukhin Differential Revision: https://reviews.llvm.org/D23342 llvm-svn: 278413
* [PM] Significantly refactor the pass pipeline parsing to be easier to | Chandler Carruth | 2016-08-03 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | reason about and less error prone. The core idea is to fully parse the text without trying to identify passes or structure. This is done with a single state machine. There were various bugs in the logic around this previously that were repeated and scattered across the code. Having a single routine makes it much easier to fix and get correct. For example, this routine doesn't suffer from PR28577. Then the actual pass construction is handled using *much* easier to read code and simple loops, with particular pass manager construction sunk to live with other pass construction. This is especially nice as the pass managers *are* in fact passes. Finally, the "implicit" pass manager synthesis is done much more simply by forming "pre-parsed" structures rather than having to duplicate tons of logic. One of the bugs fixed by this was evident in the tests where we accepted a pipeline that wasn't really well formed. Another bug is PR28577 for which I have added a test case. The code is less efficient than the previous code but I'm really hoping that's not a priority. ;] Thanks to Sean for the review! Differential Revision: https://reviews.llvm.org/D22724 llvm-svn: 277561
* [PM] Convert Loop Strength Reduce pass to new PM | Dehao Chen | 2016-07-18 | 1 | -0/+1
| | | | | | | | | Summary: Convert the Loop Strength Reduce pass to the new PM Reviewers: davidxl, silvas Subscribers: junbuml, sanjoy, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D22468 llvm-svn: 275919