summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/LoopStrengthReduce/X86
Commit message (Collapse)AuthorAgeFilesLines
* Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" ↵Fangrui Song2019-12-241-2/+2
| | | | as cleanups after D56351
* [lit] Delete empty lines at the end of lit.local.cfg NFCFangrui Song2019-06-171-1/+0
| | | | llvm-svn: 363538
* [SCEV] Pass NoWrapFlags when expanding an AddExprSam Parker2019-06-141-1/+1
| | | | | | | | | | | | InsertBinop now accepts NoWrapFlags, so pass them through when expanding a simple add expression. This is the first re-commit of the functional changes from rL362687, which was previously reverted. Differential Revision: https://reviews.llvm.org/D61934 llvm-svn: 363364
* Revert "[SCEV] Use wrap flags in InsertBinop"Benjamin Kramer2019-06-062-2/+2
| | | | | | This reverts commit r362687. Miscompiles llvm-profdata during selfhost. llvm-svn: 362699
* [SCEV] Use wrap flags in InsertBinopSam Parker2019-06-062-2/+2
| | | | | | | | | | If the given SCEVExpr has no (un)signed flags attached to it, transfer these to the resulting instruction or use them to find an existing instruction. Differential Revision: https://reviews.llvm.org/D61934 llvm-svn: 362687
* [X86FixupLEAs] Turn optIncDec into a generic two address LEA optimizer. ↵Craig Topper2019-05-251-4/+4
| | | | | | | | | | | | | | Support LEA64_32r properly. INC/DEC is really a special case of a more generic issue. We should also turn leas into add reg/reg or add reg/imm regardless of the slow lea flags. This also supports LEA64_32 which has 64 bit input registers and 32 bit output registers. So we need to convert the 64 bit inputs to their 32 bit equivalents to check if they are equal to base reg. One thing to note, the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve it in the add reg/reg case if we think its important. Not sure which operand its supposed to go on for the LEA64_32r instruction due to the super reg implicit uses. Though I'm also not sure those are needed since they were probably just created by an INSERT_SUBREG from a 32-bit input. Differential Revision: https://reviews.llvm.org/D61472 llvm-svn: 361691
* Revert "Temporarily Revert "Add basic loop fusion pass.""Eric Christopher2019-04-1725-0/+2622
| | | | | | | | The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552
* Temporarily Revert "Add basic loop fusion pass."Eric Christopher2019-04-1725-2622/+0
| | | | | | | | As it's causing some bot failures (and per request from kbarton). This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda. llvm-svn: 358546
* [X86MacroFusion] Handle branch fusion (AMD CPUs).Clement Courbet2019-03-281-2/+3
| | | | | | | | | | | | | | | | | | Summary: This adds a BranchFusion feature to replace the usage of the MacroFusion for AMD CPUs. See D59688 for context. Reviewers: andreadb, lebedev.ri Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59872 llvm-svn: 357171
* [X86MacroFusion][NFC] Add a bulldozer test.Clement Courbet2019-03-271-1/+23
| | | | llvm-svn: 357099
* [LSR] Update test from rL356256 after rebase.Florian Hahn2019-03-151-6/+6
| | | | llvm-svn: 356257
* [LSR] Check for signed overflow in NarrowSearchSpaceByDetectingSupersets.Florian Hahn2019-03-151-0/+38
| | | | | | | | | | | | | | | | | We are adding a sign extended IR value to an int64_t, which can cause signed overflows, as in the attached test case, where we have a formula with BaseOffset = -1 and a constant with numeric_limits<int64_t>::min(). If the addition would overflow, skip the simplification for this formula. Note that the target triple is required to trigger the failure. Reviewers: qcolombet, gilr, kparzysz, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D59211 llvm-svn: 356256
* [LSR] Check SCEV on isZero() after extend. PR40514Max Kazantsev2019-02-051-0/+57
| | | | | | | | | | | | | | | | | | | | | When LSR first adds SCEVs to BaseRegs, it only does it if `isZero()` has returned false. In the end, in invocation of `InsertFormula`, it asserts that all values there are still not zero constants. However between these two points, it makes some transformations, in particular extends them to wider type. SCEV does not give us guarantee that if `S` is not a constant zero, then `sext(S)` is also not a constant zero. It might have missed some optimizing transforms when it was calculating `S` and then made them when it took `sext`. For example, it may happen if previously optimizing transforms were limited by depth or somehow else. This patch adds a bailout when we may end up with a zero SCEV after extension. Differential Revision: https://reviews.llvm.org/D57565 Reviewed By: samparker llvm-svn: 353136
* [SCEV] Prohibit SCEV transformations for huge SCEVsMax Kazantsev2019-01-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | Currently SCEV attempts to limit transformations so that they do not work with big SCEVs (that may take almost infinite compile time). But for this, it uses heuristics such as recursion depth and number of operands, which do not give us a guarantee that we don't actually have big SCEVs. This situation is still possible, though it is not likely to happen. However, the bug PR33494 showed a bunch of simple corner case tests where we still produce huge SCEVs, even not reaching big recursion depth etc. This patch introduces a concept of 'huge' SCEVs. A SCEV is huge if its expression size (intoduced in D35989) exceeds some threshold value. We prohibit optimizing transformations if any of SCEVs we are dealing with is huge. This gives us a reliable check that we don't spend too much time working with them. As the next step, we can possibly get rid of old limiting mechanisms, such as recursion depth thresholds. Differential Revision: https://reviews.llvm.org/D35990 Reviewed By: reames llvm-svn: 352728
* Generalize MergeBlockIntoPredecessor. Replace uses of ↵Alina Sbirlea2018-06-201-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MergeBasicBlockIntoOnlyPred. Summary: Two utils methods have essentially the same functionality. This is an attempt to merge them into one. 1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred 2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor Prior to the patch: 1. MergeBasicBlockIntoOnlyPred Updates either DomTree or DeferredDominance Moves all instructions from Pred to BB, deletes Pred Asserts BB has single predecessor If address was taken, replace the block address with constant 1 (?) 2. MergeBlockIntoPredecessor Updates DomTree, LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken After the patch: Method 2. MergeBlockIntoPredecessor is attempting to become the new default: Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken Uses of MergeBasicBlockIntoOnlyPred that need to be replaced: 1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp Updated in this patch. No challenges. 2. lib/CodeGen/CodeGenPrepare.cpp Updated in this patch. i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation. ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks Some interesting aspects: - Since Pred is not deleted (BB is), the entry block does not need updating. - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred. - isMergingEmptyBlockProfitable assumes BB is the one to be deleted. - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead. - adding some test owner as subscribers for the interesting tests modified: test/CodeGen/X86/avx-cmp.ll test/CodeGen/AMDGPU/nested-loop-conditions.ll test/CodeGen/AMDGPU/si-annotate-cf.ll test/CodeGen/X86/hoist-spill.ll test/CodeGen/X86/2006-11-17-IllegalMove.ll 3. lib/Transforms/Scalar/JumpThreading.cpp Not covered in this patch. It is the only use case using the DeferredDominance. I would defer to Brian Rzycki to make this replacement. Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits Differential Revision: https://reviews.llvm.org/D48202 llvm-svn: 335183
* [LoopStrengthReduce, x86] don't add cost for a cmp that will be macro-fused ↵Sanjay Patel2018-02-052-35/+32
| | | | | | | | | | | | | | | (PR35681) In the motivating case from PR35681 and represented by the macro-fuse-cmp test: https://bugs.llvm.org/show_bug.cgi?id=35681 ...there's a 37 -> 31 byte size win for the loop because we eliminate the big base address offsets. SPEC2017 on Ryzen shows no significant perf difference. Differential Revision: https://reviews.llvm.org/D42607 llvm-svn: 324289
* [LSR] Don't force bases of foldable formulae to the final type.Mikael Holmen2018-02-012-42/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Before emitting code for scaled registers, we prevent SCEVExpander from hoisting any scaled addressing mode by emitting all the bases first. However, these bases are being forced to the final type, resulting in some odd code. For example, if the type of the base is an integer and the final type is a pointer, we will emit an inttoptr for the base, a ptrtoint for the scale, and then a 'reverse' GEP where the GEP pointer is actually the base integer and the index is the pointer. It's more intuitive to use the pointer as a pointer and the integer as index. Patch by: Bevin Hansson Reviewers: atrick, qcolombet, sanjoy Reviewed By: qcolombet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42103 llvm-svn: 323946
* Followup on Proposal to move MIR physical register namespace to '$' sigil.Puyan Lotfi2018-01-311-2/+2
| | | | | | | | | | | | Discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html In preparation for adding support for named vregs we are changing the sigil for physical registers in MIR to '$' from '%'. This will prevent name clashes of named physical register with named vregs. llvm-svn: 323922
* [LoopStrengthReduce] add test to show potential macro-fusion-based diff ↵Sanjay Patel2018-01-301-0/+126
| | | | | | | | (PR35681); NFC This is the baseline output for the test proposed with D42607. llvm-svn: 323806
* [x86] auto-generate complete checks; NFCSanjay Patel2018-01-263-93/+443
| | | | llvm-svn: 323571
* Revert "[X86] Improvement in CodeGen instruction selection for LEAs."Matt Morehouse2017-12-011-3/+3
| | | | | | This reverts r319543, due to ASan bot breakage. llvm-svn: 319591
* [X86] Improvement in CodeGen instruction selection for LEAs.Jatin Bhateja2017-12-011-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: 1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to accommodate similar operand appearing in the DAG e.g. T1 = A + B T2 = T1 + 10 T3 = T2 + A For above DAG rooted at T3, X86AddressMode will now look like Base = B , Index = A , Scale = 2 , Disp = 10 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity then complex LEAs (having 3 operands) could be factored out e.g. leal 1(%rax,%rcx,1), %rdx leal 1(%rax,%rcx,2), %rcx will be factored as following leal 1(%rax,%rcx,1), %rdx leal (%rdx,%rcx) , %edx 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop. 4/ Simplify LEA converts (lea (BASE,1,INDEX,0) --> add (BASE, INDEX) which offers better through put. PR32755 will be taken care of by this pathc. Previous patch revisions : r313343 , r314886 Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy, jbhateja Reviewed By: lsaba, RKSimon, jbhateja Subscribers: jmolloy, spatel, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 319543
* Revert r314886 "[X86] Improvement in CodeGen instruction selection for LEAs ↵Hans Wennborg2017-10-041-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (re-applying post required revision changes.)" It broke the Chromium / SQLite build; see PR34830. > Summary: > 1/ Operand folding during complex pattern matching for LEAs has been > extended, such that it promotes Scale to accommodate similar operand > appearing in the DAG. > e.g. > T1 = A + B > T2 = T1 + 10 > T3 = T2 + A > For above DAG rooted at T3, X86AddressMode will no look like > Base = B , Index = A , Scale = 2 , Disp = 10 > > 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs > so that if there is an opportunity then complex LEAs (having 3 operands) > could be factored out. > e.g. > leal 1(%rax,%rcx,1), %rdx > leal 1(%rax,%rcx,2), %rcx > will be factored as following > leal 1(%rax,%rcx,1), %rdx > leal (%rdx,%rcx) , %edx > > 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, > thus avoiding creation of any complex LEAs within a loop. > > Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy > > Reviewed By: lsaba > > Subscribers: jmolloy, spatel, igorb, llvm-commits > > Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 314919
* [X86] Improvement in CodeGen instruction selection for LEAs (re-applying ↵Jatin Bhateja2017-10-041-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | post required revision changes.) Summary: 1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to accommodate similar operand appearing in the DAG. e.g. T1 = A + B T2 = T1 + 10 T3 = T2 + A For above DAG rooted at T3, X86AddressMode will no look like Base = B , Index = A , Scale = 2 , Disp = 10 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity then complex LEAs (having 3 operands) could be factored out. e.g. leal 1(%rax,%rcx,1), %rdx leal 1(%rax,%rcx,2), %rcx will be factored as following leal 1(%rax,%rcx,1), %rdx leal (%rdx,%rcx) , %edx 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop. Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy Reviewed By: lsaba Subscribers: jmolloy, spatel, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 314886
* Revert r313343 "[X86] PR32755 : Improvement in CodeGen instruction selection ↵Hans Wennborg2017-09-151-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | for LEAs." This caused PR34629: asserts firing when building Chromium. It also broke some buildbots building test-suite as reported on the commit thread. > Summary: > 1/ Operand folding during complex pattern matching for LEAs has been > extended, such that it promotes Scale to accommodate similar operand > appearing in the DAG. > e.g. > T1 = A + B > T2 = T1 + 10 > T3 = T2 + A > For above DAG rooted at T3, X86AddressMode will no look like > Base = B , Index = A , Scale = 2 , Disp = 10 > > 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs > so that if there is an opportunity then complex LEAs (having 3 operands) > could be factored out. > e.g. > leal 1(%rax,%rcx,1), %rdx > leal 1(%rax,%rcx,2), %rcx > will be factored as following > leal 1(%rax,%rcx,1), %rdx > leal (%rdx,%rcx) , %edx > > 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, > thus avoiding creation of any complex LEAs within a loop. > > Reviewers: lsaba, RKSimon, craig.topper, qcolombet > > Reviewed By: lsaba > > Subscribers: spatel, igorb, llvm-commits > > Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 313376
* [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs.Jatin Bhateja2017-09-151-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: 1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to accommodate similar operand appearing in the DAG. e.g. T1 = A + B T2 = T1 + 10 T3 = T2 + A For above DAG rooted at T3, X86AddressMode will no look like Base = B , Index = A , Scale = 2 , Disp = 10 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity then complex LEAs (having 3 operands) could be factored out. e.g. leal 1(%rax,%rcx,1), %rdx leal 1(%rax,%rcx,2), %rcx will be factored as following leal 1(%rax,%rcx,1), %rdx leal (%rdx,%rcx) , %edx 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop. Reviewers: lsaba, RKSimon, craig.topper, qcolombet Reviewed By: lsaba Subscribers: spatel, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 313343
* [LSR] Fix Shadow IV in case of integer overflowMax Kazantsev2017-08-291-0/+94
| | | | | | | | | | | | | | | | | | | | | | When LSR processes code like int accumulator = 0; for (int i = 0; i < N; i++) { accummulator += i; use((double) accummulator); } It may decide to replace integer `accumulator` with a double Shadow IV to get rid of casts. The problem with that is that the `accumulator`'s value may overflow. Starting from this moment, the behavior of integer and double accumulators will differ. This patch strenghtens up the conditions of Shadow IV mechanism applicability. We only allow it for IVs that are proved to be `AddRec`s with `nsw`/`nuw` flag. Differential Revision: https://reviews.llvm.org/D37209 llvm-svn: 311986
* [NFC] Fix indents in testMax Kazantsev2017-08-291-1/+1
| | | | llvm-svn: 311982
* [NFC] Refactor ShadowIV test to use FileCheckMax Kazantsev2017-08-291-27/+45
| | | | | | Also get rid of unnamed values that make the test hard to read. llvm-svn: 311980
* Reapply fix PR23384 (part 3 of 3) r304824 (was reverted in r305720).Evgeny Stupachenko2017-08-076-22/+26
| | | | | | | | | | | | | | | | The root cause of reverting was fixed - PR33514. Summary: The patch makes instruction count the highest priority for LSR solution for X86 (previously registers had highest priority). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D30562 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 310289
* [LSR] Narrow search space by filtering non-optimal formulae with the same ↵Wei Mi2017-07-061-0/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ScaledReg and Scale. When the formulae search space is huge, LSR uses a series of heuristic to keep pruning the search space until the number of possible solutions are within certain limit. The big hammer of the series of heuristics is NarrowSearchSpaceByPickingWinnerRegs, which picks the register which is used by the most LSRUses and deletes the other formulae which don't use the register. This is a effective way to prune the search space, but quite often not a good way to keep the best solution. We saw cases before that the heuristic pruned the best formula candidate out of search space. To relieve the problem, we introduce a new heuristic called NarrowSearchSpaceByFilterFormulaWithSameScaledReg. The basic idea is in order to reduce the search space while keeping the best formula, we want to keep as many formulae with different Scale and ScaledReg as possible. That is because the central idea of LSR is to choose a group of loop induction variables and use those induction variables to represent LSRUses. An induction variable candidate is often represented by the Scale and ScaledReg in a formula. If we have more formulae with different ScaledReg and Scale to choose, we have better opportunity to find the best solution. That is why we believe pruning search space by only keeping the best formula with the same Scale and ScaledReg should be more effective than PickingWinnerReg. And we use two criteria to choose the best formula with the same Scale and ScaledReg. The first criteria is to select the formula using less non shared registers, and the second criteria is to select the formula with less cost got from RateFormula. The patch implements the heuristic before NarrowSearchSpaceByPickingWinnerRegs, which is the last resort. Testing shows we get 1.8% and 2% on two internal benchmarks on x86. llvm nightly testsuite performance is neutral. We also tried lsr-exp-narrow and it didn't help on the two improved internal cases we saw. Differential Revision: https://reviews.llvm.org/D34583 llvm-svn: 307269
* Revert r304824 "Fix PR23384 (part 3 of 3)"Hans Wennborg2017-06-196-26/+22
| | | | | | | | | | | | | | | | | This seems to be interacting badly with ASan somehow, causing false reports of heap-buffer overflows: PR33514. > Summary: > The patch makes instruction count the highest priority for > LSR solution for X86 (previously registers had highest priority). > > Reviewers: qcolombet > > Differential Revision: http://reviews.llvm.org/D30562 > > From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 305720
* [SCEV] Teach SCEVExpander to expand BinPowMax Kazantsev2017-06-191-0/+264
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current implementation of SCEVExpander demonstrates a very naive behavior when it deals with power calculation. For example, a SCEV for x^8 looks like (x * x * x * x * x * x * x * x) If we try to expand it, it generates a very straightforward sequence of muls, like: x2 = mul x, x x3 = mul x2, x x4 = mul x3, x ... x8 = mul x7, x This is a non-efficient way of doing that. A better way is to generate a sequence of binary power calculation. In this case the expanded calculation will look like: x2 = mul x, x x4 = mul x2, x2 x8 = mul x4, x4 In some cases the code size reduction for such SCEVs is dramatic. If we had a loop: x = a; for (int i = 0; i < 3; i++) x = x * x; And this loop have been fully unrolled, we have something like: x = a; x2 = x * x; x4 = x2 * x2; x8 = x4 * x4; The SCEV for x8 is the same as in example above, and if we for some reason want to expand it, we will generate naively 7 multiplications instead of 3. The BinPow expansion algorithm here allows to keep code size reasonable. This patch teaches SCEV Expander to generate a sequence of BinPow multiplications if we have repeating arguments in SCEVMulExpressions. Differential Revision: https://reviews.llvm.org/D34025 llvm-svn: 305663
* Fix PR23384 (part 3 of 3)Evgeny Stupachenko2017-06-066-22/+26
| | | | | | | | | | | | | Summary: The patch makes instruction count the highest priority for LSR solution for X86 (previously registers had highest priority). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D30562 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 304824
* [X86] Replace 'REQUIRES: x86' in tests with 'REQUIRES: ↵Craig Topper2017-06-041-1/+1
| | | | | | x86-registered-target' which seems to be the correct way to make them run on an x86 build. llvm-svn: 304682
* Re-enable "[SCEV] Do not fold dominated SCEVUnknown into AddRecExpr start"Max Kazantsev2017-05-262-7/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch rL303730 was reverted because test lsr-expand-quadratic.ll failed on many non-X86 configs with this patch. The reason of this is that the patch makes a correctless fix that changes optimizer's behavior for this test. Without the change, LSR was making an overconfident simplification basing on a wrong SCEV. Apparently it did not need the IV analysis to do this. With the change, it chose a different way to simplify (that wasn't so confident), and this way required the IV analysis. Now, following the right execution path, LSR tries to make a transformation relying on IV Users analysis. This analysis is target-dependent due to this code: // LSR is not APInt clean, do not touch integers bigger than 64-bits. // Also avoid creating IVs of non-native types. For example, we don't want a // 64-bit IV in 32-bit code just because the loop has one 64-bit cast. uint64_t Width = SE->getTypeSizeInBits(I->getType()); if (Width > 64 || !DL.isLegalInteger(Width)) return false; To make a proper transformation in this test case, the type i32 needs to be legal for the specified data layout. When the test runs on some non-X86 configuration (e.g. pure ARM 64), opt gets confused by the specified target and does not use it, rejecting the specified data layout as well. Instead, it uses some default layout that does not treat i32 as a legal type (currently the layout that is used when it is not specified does not have legal types at all). As result, the transformation we expect to happen does not happen for this test. This re-enabling patch does not have any source code changes compared to the original patch rL303730. The only difference is that the failing test is moved to X86 directory and now has requirement of running on x86 only to comply with the specified target triple and data layout. Differential Revision: https://reviews.llvm.org/D33543 llvm-svn: 303971
* Revert "[SCEV] Do not fold dominated SCEVUnknown into AddRecExpr start"Diana Picus2017-05-241-5/+7
| | | | | | This reverts commit r303730 because it broke all the buildbots. llvm-svn: 303747
* [SCEV] Do not fold dominated SCEVUnknown into AddRecExpr startMax Kazantsev2017-05-241-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | When folding arguments of AddExpr or MulExpr with recurrences, we rely on the fact that the loop of our base recurrency is the bottom-lost in terms of domination. This assumption may be broken by an expression which is treated as invariant, and which depends on a complex Phi for which SCEVUnknown was created. If such Phi is a loop Phi, and this loop is lower than the chosen AddRecExpr's loop, it is invalid to fold our expression with the recurrence. Another reason why it might be invalid to fold SCEVUnknown into Phi start value is that unlike other SCEVs, SCEVUnknown are sometimes position-bound. For example, here: for (...) { // loop phi = {A,+,B} } X = load ... Folding phi + X into {A+X,+,B}<loop> actually makes no sense, because X does not exist and cannot exist while we are iterating in loop (this memory can be even not allocated and not filled by this moment). It is only valid to make such folding if X is defined before the loop. In this case the recurrence {A+X,+,B}<loop> may be existant. This patch prohibits folding of SCEVUnknown (and those who use them) into the start value of an AddRecExpr, if this instruction is dominated by the loop. Merging the dominating unknown values is still valid. Some tests that relied on the fact that some SCEVUnknown should be folded into AddRec's are changed so that they no longer expect such behavior. llvm-svn: 303730
* [LSR] Call canonicalize after we generate a new Formula in ↵Wei Mi2017-05-181-0/+36
| | | | | | | | | | | | | | GenerateTruncates. Fix PR33077. The testcase in PR33077 generates a LSR Use Formula with two SCEVAddRecExprs for the same loop. Such uncommon formula will become non-canonical after GenerateTruncates adds sign extension to the ScaledReg of the Formula, and it will break the assertion that every Formula to be inserted is canonical. The fix is to call canonicalize for the raw Formula generated by GenerateTruncates before inserting it. llvm-svn: 303361
* Turn on -addr-sink-using-gep by default.Eli Friedman2017-04-061-2/+0
| | | | | | | | | The new codepath has been in the tree for years, and there isn't any reason to use two codepaths here. Differential Revision: https://reviews.llvm.org/D30596 llvm-svn: 299723
* [LSR] Canonicalize formula and put recursive Reg related with current loop ↵Wei Mi2017-02-221-0/+65
| | | | | | | | | | | | | | | | in ScaledReg. After rL294814, LSR formula can have multiple SCEVAddRecExprs inside of its BaseRegs. Previous canonicalization will swap the first SCEVAddRecExpr in BaseRegs with ScaledReg. But now we want to swap the SCEVAddRecExpr Reg related with current loop with ScaledReg. Otherwise, we may generate code like this: RegA + lsr.iv + RegB, where loop invariant parts RegA and RegB are not grouped together and cannot be promoted outside of loop. With this patch, it will ensure lsr.iv to be generated later in the expr: RegA + RegB + lsr.iv, so that RegA + RegB can be promoted outside of loop. Differential Revision: https://reviews.llvm.org/D26781 llvm-svn: 295884
* [LSR] Prevent formula with SCEVAddRecExpr type of Reg from Sibling loopsWei Mi2017-02-161-0/+97
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In rL294814, we allow formula with SCEVAddRecExpr type of Reg from loops other than current loop. This is good for the case when induction variable of outerloop being used in expr in innerloop. But it is very bad to allow such Reg from sibling loop because we may need to add lsr.iv in other sibling loops when scev expanding those SCEVAddRecExpr type exprs. For the testcase below, one loop can be inserted with a bunch of lsr.iv because of LSR for other loops. // The induction variable j from a loop in the middle will have initial // value generated from previous sibling loop and exit value used by its // next sibling loop. void goo(long i, long j); long cond; void foo(long N) { long i = 0; long j = 0; i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); } The fix is to only allow formula with SCEVAddRecExpr type of Reg from current loop or its parents. Differential Revision: https://reviews.llvm.org/D30021 llvm-svn: 295378
* The patch fixes r294821Evgeny Stupachenko2017-02-112-6/+6
| | | | | | | | Summary: Update register match for windows testing From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 294825
* Fix PR23384 (under "-lsr-insns-cost" option)Evgeny Stupachenko2017-02-112-0/+110
| | | | | | | | | | | | | Summary: The patch adds instructions number generated by a solution to LSR cost under "-lsr-insns-cost" option. Reviewers: qcolombet, hfinkel Differential Revision: http://reviews.llvm.org/D28307 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 294821
* [LSR] Recommit: Allow formula containing Reg for SCEVAddRecExpr related with ↵Wei Mi2017-02-111-0/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | outerloop. The recommit includes some changes of testcases. No functional change to the patch. In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr, and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser and dropped. Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only handle inner loop now so only %for.body2 will be handled. Using the logic above, formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related with outerloop. Only formula like reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept because the SCEVAddRecExpr related with outerloop is folded into the initial value of the SCEVAddRecExpr related with current loop. But in some cases, we do need to share the basic induction variable reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction variables used by LSR, so we don't want to drop the formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally. From the existing comment, it tries to avoid considering multiple level loops at the same time. However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other than current loop, it is an invariant and will be simple to handle, and the formula doesn't have to be dropped. Differential Revision: https://reviews.llvm.org/D26429 llvm-svn: 294814
* This test apparently requires an x86 target and is failing on numerousChandler Carruth2017-01-231-0/+48
| | | | | | | | | | | bots ever since d0k fixed the CHECK lines so that it did something at all. It isn't actually testing SCEV directly but LSR, so move it into LSR and the x86-specific tree of tests that already exists there. Target dependence is common and unavoidable with the current design of LSR. llvm-svn: 292774
* Fix testcases failing after r284036Krzysztof Parzyszek2016-10-121-3/+1
| | | | | | The codegen has changed slightly between my tests and the commit. llvm-svn: 284049
* Do not remove implicit defs in BranchFolderKrzysztof Parzyszek2016-10-121-0/+1
| | | | | | | | | | | Branch folder removes implicit defs if they are the only non-branching instructions in a block, and the branches do not use the defined registers. The problem is that in some cases these implicit defs are required for the liveness information to be correct. Differential Revision: https://reviews.llvm.org/D25478 llvm-svn: 284036
* [SCEV] Update interface to handle SCEVExpander insert point motion.Geoff Berry2016-08-111-0/+47
| | | | | | | | | | | | | | | | | | | | Summary: This is an extension of the fix in r271424. That fix dealt with builder insert points being moved by SCEV expansion, but only for the lifetime of the expand call. This change modifies the interface so that LSR can safely call expand multiple times at the same insert point and do the right thing if one of the expansions decides to move the original insert point. This is a fix for PR28719. Reviewers: sanjoy Subscribers: llvm-commits, mcrosier, mzolotukhin Differential Revision: https://reviews.llvm.org/D23342 llvm-svn: 278413
* Don't delete empty preheaders in CodeGenPrepare if it would create a ↵Chuang-Yu Cheng2016-04-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | critical edge Presently, CodeGenPrepare deletes all nearly empty (only phi and branch) basic blocks. This pass can delete loop preheaders which frequently creates critical edges. A preheader can be a convenient place to spill registers to the stack. If the entrance to a loop body is a critical edge, then spills may occur in the loop body rather than immediately before it. This patch protects loop preheaders from deletion in CodeGenPrepare even if they are nearly empty. Since the patch alters the CFG, it affects a large number of test cases. In most cases, the changes are merely cosmetic (basic blocks have different names or instruction orders change slightly). I am somewhat concerned about the test/CodeGen/Mips/brdelayslot.ll test case. If the loop preheader is not deleted, then the MIPS backend does not take advantage of a branch delay slot. Consequently, I would like some close review by a MIPS expert. The patch also partially subsumes D16893 from George Burgess IV. George correctly notes that CodeGenPrepare does not actually preserve the dominator tree. I think the dominator tree was usually not valid when CodeGenPrepare ran, but I am using LoopInfo to mark preheaders, so the dominator tree is now always valid before CodeGenPrepare. Author: Tom Jablin (tjablin) Reviewers: hfinkel george.burgess.iv vkalintiris dsanders kbarton cycheng http://reviews.llvm.org/D16984 llvm-svn: 265397
OpenPOWER on IntegriCloud