path: root/llvm/test/Transforms/LoopStrengthReduce
Commit message | Author | Age | Files | Lines
* Migrate function attribute "no-frame-pointer-elim"="false" to "frame-pointer"="none" as cleanups after D56351 | Fangrui Song | 2019-12-24 | 1 | -1/+1
* Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 | Fangrui Song | 2019-12-24 | 3 | -5/+5
* [NFC][LSR] Avoid undefined grep in pr2570.ll | Hubert Tong | 2019-06-19 | 1 | -1/+1
  The greater-than sign is not a BRE special character. POSIX.1-2017 XBD Section 9.3.2 indicates that the interpretation of `\>` is undefined. This patch replaces the pattern.
  llvm-svn: 363828
* [SCEV] Use NoWrapFlags when expanding a simple mul | Sam Parker | 2019-06-17 | 2 | -3/+3
  Second functional change following on from rL362687. Pass the NoWrapFlags from the MulExpr to InsertBinop when we're generating a shl or mul.
  Differential Revision: https://reviews.llvm.org/D61934
  llvm-svn: 363540
* [lit] Delete empty lines at the end of lit.local.cfg NFC | Fangrui Song | 2019-06-17 | 3 | -3/+0
  llvm-svn: 363538
* [SCEV] Pass NoWrapFlags when expanding an AddExpr | Sam Parker | 2019-06-14 | 1 | -1/+1
  InsertBinop now accepts NoWrapFlags, so pass them through when expanding a simple add expression. This is the first re-commit of the functional changes from rL362687, which was previously reverted.
  Differential Revision: https://reviews.llvm.org/D61934
  llvm-svn: 363364
* [NFC] Updated testcase for D54411/rL363284 | David Bolvansky | 2019-06-13 | 1 | -14/+8
  llvm-svn: 363285
* Revert "[SCEV] Use wrap flags in InsertBinop" | Benjamin Kramer | 2019-06-06 | 4 | -5/+5
  This reverts commit r362687. Miscompiles llvm-profdata during selfhost.
  llvm-svn: 362699
* [SCEV] Use wrap flags in InsertBinop | Sam Parker | 2019-06-06 | 4 | -5/+5
  If the given SCEVExpr has no-(un)signed-wrap flags attached to it, transfer these to the resulting instruction or use them to find an existing instruction.
  Differential Revision: https://reviews.llvm.org/D61934
  llvm-svn: 362687
* [X86FixupLEAs] Turn optIncDec into a generic two address LEA optimizer. Support LEA64_32r properly. | Craig Topper | 2019-05-25 | 1 | -4/+4
  INC/DEC is really a special case of a more generic issue. We should also turn LEAs into add reg/reg or add reg/imm regardless of the slow LEA flags.
  This also supports LEA64_32, which has 64-bit input registers and a 32-bit output register. So we need to convert the 64-bit inputs to their 32-bit equivalents to check if they are equal to the base reg.
  One thing to note: the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve them in the add reg/reg case if we think it's important. Not sure which operand it's supposed to go on for the LEA64_32r instruction due to the super reg implicit uses. Though I'm also not sure those are needed, since they were probably just created by an INSERT_SUBREG from a 32-bit input.
  Differential Revision: https://reviews.llvm.org/D61472
  llvm-svn: 361691
* [SCEV] Add explicit representations of umin/smin | Keno Fischer | 2019-05-07 | 1 | -2/+0
  Summary:
  Currently we express umin as `~umax(~x, ~y)`. However, this becomes a problem for operands in non-integral pointer spaces, because `~x` is not something we can compute for `x` non-integral. However, since comparisons are generally still allowed, we are actually able to express `umin(x, y)` directly as long as we don't try to express it as a umax. Support this by adding an explicit umin/smin representation to SCEV. We do this by factoring the existing getUMax/getSMax functions into a new function that does all four. The previous two functions were largely identical.
  Reviewed By: sanjoy
  Differential Revision: https://reviews.llvm.org/D50167
  llvm-svn: 360159
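The identity behind the old encoding can be checked on plain integers. The sketch below is ordinary C++, not SCEV code, and its function names are hypothetical. Bitwise NOT reverses unsigned order (~x equals UINT64_MAX - x) and signed order (~x equals -x - 1), so the largest complement is the complement of the smallest input. That complement step is exactly what cannot be computed for a non-integral pointer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Plain unsigned maximum (hypothetical helper, not SCEV's getUMaxExpr).
constexpr uint64_t umax(uint64_t x, uint64_t y) { return x > y ? x : y; }

// Old-style encoding: umin(x, y) == ~umax(~x, ~y).
// ~x == UINT64_MAX - x reverses the unsigned order, so the maximum of the
// complements is the complement of the minimum.
constexpr uint64_t umin_via_umax(uint64_t x, uint64_t y) {
  return ~umax(~x, ~y);
}

// Signed analogue: ~x == -x - 1 reverses the signed order as well, and
// never overflows, so smin(x, y) == ~smax(~x, ~y) for every int64_t pair.
constexpr int64_t smin_via_smax(int64_t x, int64_t y) {
  return ~std::max(~x, ~y);
}
```

An explicit umin node avoids the complement entirely, because it only needs a comparison. Comparisons stay available for non-integral pointers.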
* [LSR] Limit the recursion for setup cost | David Green | 2019-04-23 | 2 | -1/+94
  In some circumstances we can end up with setup costs that are very complex to compute, even though the SCEVs are not very complex to create. This can also lead to setup costs that are calculated to be exactly -1, which LSR treats as an invalid cost. This patch puts a limit on the recursion depth for setup cost to prevent them from taking too long. Thanks to @reames for the report and test case.
  Differential Revision: https://reviews.llvm.org/D60944
  llvm-svn: 358958
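A rough illustration of why a depth limit helps, assuming (as one plausible reading of the message above) that the expensive cases are expressions with heavily shared operands. `Expr`, `setupCost` and `doublingChain` are hypothetical stand-ins, not LSR's data structures or its actual cost rules. A chain of N nodes that each reuse the previous node twice is cheap to build, but a recursion that does not memoize visits 2^(N+1) - 1 nodes. Cutting the recursion off at a fixed depth bounds that work.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a SCEV node: leaves (constants/unknowns) have
// no operands; interior nodes (add, mul, ...) have one or more. Operands
// are shared pointers, so one node can be reused many times (a DAG).
struct Expr {
  std::vector<std::shared_ptr<Expr>> ops;
};

// Depth-limited setup cost: each leaf costs 1 and an interior node costs
// the sum of its operands. Once Depth reaches 0 the walk stops and that
// subtree contributes 0. Visits counts calls so the work can be observed.
uint64_t setupCost(const Expr &E, unsigned Depth, uint64_t &Visits) {
  ++Visits;
  if (E.ops.empty())
    return 1;
  if (Depth == 0)
    return 0;
  uint64_t Cost = 0;
  for (const auto &Op : E.ops)
    Cost += setupCost(*Op, Depth - 1, Visits);
  return Cost;
}

// Builds N interior nodes, each using the previous node twice
// (x1 = x0 + x0, x2 = x1 + x1, ...): only N + 1 nodes are created.
std::shared_ptr<Expr> doublingChain(unsigned N) {
  auto Node = std::make_shared<Expr>();  // the leaf
  for (unsigned i = 0; i < N; ++i) {
    auto Next = std::make_shared<Expr>();
    Next->ops = {Node, Node};
    Node = Next;
  }
  return Node;
}
```

With `doublingChain(20)`, an unlimited walk makes 2,097,151 calls, while a depth limit of 7 makes 255. The trade-off is accuracy: anything below the cutoff contributes nothing to the estimate.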
* Revert "Temporarily Revert "Add basic loop fusion pass."" | Eric Christopher | 2019-04-17 | 125 | -0/+10606
  The reversion apparently deleted the test/Transforms directory. Will be re-reverting again.
  llvm-svn: 358552
* Temporarily Revert "Add basic loop fusion pass." | Eric Christopher | 2019-04-17 | 125 | -10606/+0
  As it's causing some bot failures (and per request from kbarton). This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.
  llvm-svn: 358546
* [LSR] Rewrite misses some fixup locations if it splits critical edge | Quentin Colombet | 2019-04-15 | 1 | -0/+101
  If LSR splits a critical edge while rewriting PHI operands and the PHI node has other pending fixup operands, we need to update those pending fixups. Otherwise formulae will not be implemented completely and some instructions will not be eliminated.
  llvm.org/PR41445
  Differential Revision: https://reviews.llvm.org/D60645
  Patch by: Denis Bakhvalov <denis.bakhvalov@intel.com>
  llvm-svn: 358457
* [LSR] Fix signed overflow in GenerateCrossUseConstantOffsets. | Florian Hahn | 2019-03-28 | 1 | -0/+39
  For the attached test case, unchecked addition of immediate starts and ends overflows, as they can be arbitrary i64 constants.
  Proof: https://rise4fun.com/Alive/Plqc
  Reviewers: qcolombet, gilr, efriedma
  Reviewed By: efriedma
  Differential Revision: https://reviews.llvm.org/D59218
  llvm-svn: 357217
* [X86MacroFusion] Handle branch fusion (AMD CPUs). | Clement Courbet | 2019-03-28 | 1 | -2/+3
  Summary:
  This adds a BranchFusion feature to replace the usage of MacroFusion for AMD CPUs. See D59688 for context.
  Reviewers: andreadb, lebedev.ri
  Subscribers: hiraditya, jdoerfert, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D59872
  llvm-svn: 357171
* [X86MacroFusion][NFC] Add a bulldozer test. | Clement Courbet | 2019-03-27 | 1 | -1/+23
  llvm-svn: 357099
* [LSR] Update test from rL356256 after rebase. | Florian Hahn | 2019-03-15 | 1 | -6/+6
  llvm-svn: 356257
* [LSR] Check for signed overflow in NarrowSearchSpaceByDetectingSupersets. | Florian Hahn | 2019-03-15 | 1 | -0/+38
  We are adding a sign-extended IR value to an int64_t, which can cause signed overflows, as in the attached test case, where we have a formula with BaseOffset = -1 and a constant with numeric_limits<int64_t>::min(). If the addition would overflow, skip the simplification for this formula. Note that the target triple is required to trigger the failure.
  Reviewers: qcolombet, gilr, kparzysz, efriedma
  Reviewed By: efriedma
  Differential Revision: https://reviews.llvm.org/D59211
  llvm-svn: 356256
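The guard described above amounts to testing whether an int64_t addition would overflow before performing it, because signed overflow is undefined behavior in C++. The following is a portable sketch. `checkedAdd` is a hypothetical helper and not the code from the patch, which works on LLVM's own types. The case from the commit message, BaseOffset = -1 plus numeric_limits<int64_t>::min(), is rejected.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Returns A + B, or std::nullopt if the exact sum does not fit in int64_t.
// The comparisons use Max - B and Min - B, which cannot themselves
// overflow for the sign of B being tested, so no UB is ever triggered.
std::optional<int64_t> checkedAdd(int64_t A, int64_t B) {
  constexpr int64_t Max = std::numeric_limits<int64_t>::max();
  constexpr int64_t Min = std::numeric_limits<int64_t>::min();
  if ((B > 0 && A > Max - B) || (B < 0 && A < Min - B))
    return std::nullopt;  // caller skips the simplification
  return A + B;
}
```

GCC and Clang also provide `__builtin_add_overflow`, which performs the same check as a single builtin.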
* [ARM] Run ARMParallelDSP in the IRPasses phase | Sam Parker | 2019-03-14 | 1 | -1/+0
  Run EarlyCSE before ParallelDSP and do this in the backend IR opt phase.
  Differential Revision: https://reviews.llvm.org/D59257
  llvm-svn: 356130
* [LSR] Attempt to increase the accuracy of LSR's setup cost | David Green | 2019-03-07 | 1 | -1/+1
  In some loops, we end up generating loop induction variables that look like:
    {(-1 * (zext i16 (%i0 * %i1) to i32))<nsw>,+,1}
  as opposed to the simpler:
    {(zext i16 (%i0 * %i1) to i32),+,-1}
  i.e. we count up from -limit to 0, rather than the simpler counting down from limit to 0. This is because the scores, as LSR calculates them, are the same, and the second is filtered out in favor of the first. We end up with a redundant SUB from 0 in the code.
  This patch tries to make the calculation of the setup cost a little more thorough, recursing into the SCEV members to better approximate the setup required. The cost function for comparing LSR costs is:
    return std::tie(C1.NumRegs, C1.AddRecCost, C1.NumIVMuls, C1.NumBaseAdds,
                    C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
           std::tie(C2.NumRegs, C2.AddRecCost, C2.NumIVMuls, C2.NumBaseAdds,
                    C2.ScaleCost, C2.ImmCost, C2.SetupCost);
  So this will only alter results if none of the other variables turn out to be different.
  Differential Revision: https://reviews.llvm.org/D58770
  llvm-svn: 355597
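The std::tie comparison quoted above is a lexicographic tuple comparison. The self-contained sketch below reproduces it with a simplified, hypothetical `Cost` struct. Its plain unsigned fields are an assumption, and the real LSR class differs. The sketch shows that SetupCost only breaks ties.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical mirror of the fields named in the commit message; the real
// LSR Cost class has more members and different types.
struct Cost {
  unsigned NumRegs = 0;
  unsigned AddRecCost = 0;
  unsigned NumIVMuls = 0;
  unsigned NumBaseAdds = 0;
  unsigned ScaleCost = 0;
  unsigned ImmCost = 0;
  unsigned SetupCost = 0;
};

// std::tie builds tuples of references, and tuple operator< compares them
// element by element from the left, so SetupCost decides the result only
// when every earlier field is equal.
bool isLess(const Cost &C1, const Cost &C2) {
  return std::tie(C1.NumRegs, C1.AddRecCost, C1.NumIVMuls, C1.NumBaseAdds,
                  C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
         std::tie(C2.NumRegs, C2.AddRecCost, C2.NumIVMuls, C2.NumBaseAdds,
                  C2.ScaleCost, C2.ImmCost, C2.SetupCost);
}
```

For example, a candidate with one fewer register wins even if its SetupCost is 100 times larger. This is why refining SetupCost can only change which of two otherwise-equal solutions is kept.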
* Fix invalid target triples in tests. (NFC) | Florian Hahn | 2019-03-04 | 2 | -2/+2
  llvm-svn: 355349
* [LSR] Generate cross iteration indexes | Sam Parker | 2019-02-07 | 1 | -15/+9
  Modify GenerateConstantOffsetsImpl to create offsets that can be used by indexed addressing modes. If formulae can be generated which result in the constant offset being the same size as the recurrence, we can generate a pre-indexed access. This allows the pointer to be updated via the single pre-indexed access so that (hopefully) no add/subs are required to update it for the next iteration. For small cores, this can significantly improve the performance of DSP-like loops.
  Differential Revision: https://reviews.llvm.org/D55373
  llvm-svn: 353403
* [LSR] Check SCEV on isZero() after extend. PR40514 | Max Kazantsev | 2019-02-05 | 1 | -0/+57
  When LSR first adds SCEVs to BaseRegs, it only does so if `isZero()` has returned false. In the end, in the invocation of `InsertFormula`, it asserts that all values there are still not zero constants. However, between these two points, it makes some transformations; in particular, it extends them to a wider type. SCEV does not guarantee that if `S` is not a constant zero, then `sext(S)` is also not a constant zero. It might have missed some optimizing transforms when it was calculating `S` and then made them when it took `sext`. For example, this may happen if the earlier optimizing transforms were limited by depth or in some other way. This patch adds a bailout when we may end up with a zero SCEV after extension.
  Differential Revision: https://reviews.llvm.org/D57565
  Reviewed By: samparker
  llvm-svn: 353136
* [SCEV] Prohibit SCEV transformations for huge SCEVs | Max Kazantsev | 2019-01-31 | 1 | -1/+1
  Currently SCEV attempts to limit transformations so that they do not work with big SCEVs (that may take almost infinite compile time). But for this, it uses heuristics such as recursion depth and number of operands, which do not guarantee that we don't actually have big SCEVs. This situation is still possible, though it is not likely to happen. However, the bug PR33494 showed a number of simple corner-case tests where we still produce huge SCEVs without even reaching a large recursion depth.
  This patch introduces a concept of 'huge' SCEVs. A SCEV is huge if its expression size (introduced in D35989) exceeds some threshold value. We prohibit optimizing transformations if any of the SCEVs we are dealing with is huge. This gives us a reliable check that we don't spend too much time working with them. As the next step, we can possibly get rid of old limiting mechanisms, such as recursion depth thresholds.
  Differential Revision: https://reviews.llvm.org/D35990
  Reviewed By: reames
  llvm-svn: 352728
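A minimal sketch of the 'huge' check follows. It assumes expression size is counted as one for the node plus the sizes of its operands. `Node`, `makeNode`, `isHuge`, `mayTransform` and the threshold of 1000 are illustrative choices, not LLVM's definitions. Each node's size is computed once when the node is built, so testing whether an expression is huge is a constant-time field read rather than a recursive walk. The addition saturates so that deeply shared expressions cannot wrap the counter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

struct Node;
using NodePtr = std::shared_ptr<const Node>;

// Hypothetical expression node; its size (1 for the node itself plus the
// sizes of its operands, counted per use) is computed once at creation.
struct Node {
  std::vector<NodePtr> ops;
  std::size_t size = 1;
  explicit Node(std::vector<NodePtr> Ops) : ops(std::move(Ops)) {
    for (const NodePtr &Op : ops)  // saturating add: never wraps around
      size = (Op->size > SIZE_MAX - size) ? SIZE_MAX : size + Op->size;
  }
};

NodePtr makeNode(std::vector<NodePtr> Ops = {}) {
  return std::make_shared<const Node>(std::move(Ops));
}

// Illustrative threshold only; not LLVM's value.
constexpr std::size_t HugeExprThreshold = 1000;

// O(1): reads the cached size instead of walking the expression.
bool isHuge(const Node &N) { return N.size > HugeExprThreshold; }

// A transformation guarded as described: refuse if any input is huge.
bool mayTransform(const Node &A, const Node &B) {
  return !isHuge(A) && !isHuge(B);
}
```

A chain of 12 nodes that each reuse the previous one twice already has size 8191 under this counting, even though only 13 nodes exist. This is the kind of input that depth-based heuristics can miss.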
* [LoopStrengthReduce] ComplexityLimit as an option | Sam Parker | 2018-11-29 | 2 | -0/+120
  Convert ComplexityLimit into a command line value.
  Differential Revision: https://reviews.llvm.org/D54899
  llvm-svn: 347843
* [LSR] Combine unfolded offset into invariant register | Gil Rapaport | 2018-11-08 | 2 | -55/+75
  LSR reassociates constants as unfolded offsets when the constants fit as immediate add operands, which currently prevents such constants from being combined later with loop invariant registers. This patch modifies GenerateCombinations() to generate a second formula which includes the unfolded offset in the combined loop-invariant register.
  This commit fixes a bug in the original patch (committed at r345114, reverted at r345123).
  Differential Revision: https://reviews.llvm.org/D51861
  llvm-svn: 346390
* Revert r345114 | Gil Rapaport | 2018-10-24 | 1 | -20/+55
  Investigating failures.
  llvm-svn: 345123
* [LSR] Combine unfolded offset into invariant register | Gil Rapaport | 2018-10-24 | 1 | -55/+20
  LSR reassociates constants as unfolded offsets when the constants fit as immediate add operands, which currently prevents such constants from being combined later with loop invariant registers. This patch modifies GenerateCombinations() to generate a second formula which includes the unfolded offset in the combined loop-invariant register.
  Differential Revision: https://reviews.llvm.org/D51861
  llvm-svn: 345114
* AMDGPU: Fix some outdated datalayouts in tests | Matt Arsenault | 2018-09-13 | 4 | -4/+4
  llvm-svn: 342131
* [LSR] Add tests for small constants; NFC | Gil Rapaport | 2018-09-10 | 1 | -0/+151
  LSR reassociates small constants that fit into add immediate operands as unfolded offsets. Since an unfolded offset is not combined with loop-invariant registers, LSR does not consider solutions that bump invariant registers by these constants outside the loop.
  llvm-svn: 341835
* SCEVExpander::expandAddRecExprLiterally(): check before casting as Instruction | Roman Lebedev | 2018-06-29 | 1 | -0/+43
  Summary:
  An alternative to D48597. Fixes PR37936 (https://bugs.llvm.org/show_bug.cgi?id=37936).
  The problem is as follows:
  1. `indvars` marks `%dec` as `NUW`.
  2. `loop-instsimplify` runs `instsimplify`, which constant-folds `%dec` to -1 (D47908).
  3. `loop-reduce` tries to do some further modification, but crashes with a type assertion in cast, because `%dec` is no longer an `Instruction`.
  If the run line is split into two, i.e. you first run `-indvars -loop-instsimplify`, store that into a file, and then run `-loop-reduce`, there is no crash. So it looks like the problem is due to `-loop-instsimplify` not discarding SCEV. But in this case we can just not crash if it's not an `Instruction`. This is just a local fix, unlike D48597, so there may very well be other problems.
  Reviewers: mkazantsev, uabelho, sanjoy, silviu.baranga, wmi
  Reviewed By: mkazantsev
  Subscribers: evstupac, javed.absar, spatel, llvm-commits
  Differential Revision: https://reviews.llvm.org/D48599
  llvm-svn: 335950
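The crash is an unchecked downcast of a value that is no longer an Instruction. In LLVM, `cast<>` asserts on a type mismatch, while `dyn_cast<>` returns null. The sketch below shows the same check-before-cast pattern with standard C++ `dynamic_cast` on a hypothetical miniature hierarchy. `Value`, `Instruction`, `ConstantInt` and `hasNoUnsignedWrap` are stand-ins, not LLVM's classes or API.

```cpp
#include <cassert>

// Hypothetical miniature hierarchy standing in for LLVM's Value,
// Instruction and ConstantInt; these are NOT the real LLVM classes.
struct Value {
  virtual ~Value() = default;  // polymorphic base, so dynamic_cast works
};
struct Instruction : Value {
  bool NUW = false;  // e.g. an add marked 'nuw' by indvars
};
struct ConstantInt : Value {
  long long Val = 0;  // e.g. %dec after being folded to -1
};

// Checked access: dynamic_cast yields nullptr for a non-Instruction, so a
// value that was constant-folded is handled instead of tripping an assert.
bool hasNoUnsignedWrap(const Value *V) {
  if (const auto *I = dynamic_cast<const Instruction *>(V))
    return I->NUW;
  return false;  // not an Instruction: no flags to read
}
```

The design choice mirrors the commit: rather than guaranteeing that SCEV's cached values are never folded away (the broader fix explored in D48597), the consumer tolerates the folded case locally.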
* Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred. | Alina Sbirlea | 2018-06-20 | 2 | -2/+3
  Summary:
  Two utility methods have essentially the same functionality. This is an attempt to merge them into one.
  1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred
  2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor
  Prior to the patch:
  1. MergeBasicBlockIntoOnlyPred
     - Updates either DomTree or DeferredDominance
     - Moves all instructions from Pred to BB, deletes Pred
     - Asserts BB has a single predecessor
     - If the address was taken, replaces the block address with constant 1 (?)
  2. MergeBlockIntoPredecessor
     - Updates DomTree, LoopInfo and MemoryDependenceResults
     - Moves all instructions from BB to Pred, deletes BB
     - Returns if BB doesn't have a single predecessor
     - Returns if BB's address was taken
  After the patch:
  Method 2, MergeBlockIntoPredecessor, is attempting to become the new default:
     - Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults
     - Moves all instructions from BB to Pred, deletes BB
     - Returns if BB doesn't have a single predecessor
     - Returns if BB's address was taken
  Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:
  1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp
     Updated in this patch. No challenges.
  2. lib/CodeGen/CodeGenPrepare.cpp
     Updated in this patch.
     i. eliminateFallThrough is straightforward, but I added a temporary array to avoid iterator invalidation.
     ii. The eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks.
     Some interesting aspects:
     - Since Pred is not deleted (BB is), the entry block does not need updating.
     - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added an assert to make it obvious that BB=SinglePred.
     - isMergingEmptyBlockProfitable assumes BB is the one to be deleted.
     - eliminateMostlyEmptyBlock(BB) does not delete BB on one path; it deletes its unique predecessor instead.
     - Adding some test owners as subscribers for the interesting tests modified:
       test/CodeGen/X86/avx-cmp.ll
       test/CodeGen/AMDGPU/nested-loop-conditions.ll
       test/CodeGen/AMDGPU/si-annotate-cf.ll
       test/CodeGen/X86/hoist-spill.ll
       test/CodeGen/X86/2006-11-17-IllegalMove.ll
  3. lib/Transforms/Scalar/JumpThreading.cpp
     Not covered in this patch. It is the only use case using DeferredDominance. I would defer to Brian Rzycki to make this replacement.
  Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar
  Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits
  Differential Revision: https://reviews.llvm.org/D48202
  llvm-svn: 335183
* reapply r334209 with fixes for harfbuzz in Chromium | Daniil Fukalov | 2018-06-08 | 1 | -1/+42
  r334209 description:
  [LSR] Check yet more intrinsic pointer operands
  The patch fixes another assertion in isLegalUse().
  Differential Revision: https://reviews.llvm.org/D47794
  llvm-svn: 334300
* Revert r334209 "[LSR] Check yet more intrinsic pointer operands" | Reid Kleckner | 2018-06-08 | 1 | -42/+1
  This causes cast failures when compiling harfbuzz in Chromium. Reproducer on the way.
  llvm-svn: 334254
* [LSR] Check yet more intrinsic pointer operands | Daniil Fukalov | 2018-06-07 | 1 | -1/+42
  The patch fixes another assertion in isLegalUse().
  Differential Revision: https://reviews.llvm.org/D47794
  llvm-svn: 334209
* [AMDGPU] Move lsr test. NFC. | Stanislav Mekhanoshin | 2018-05-17 | 1 | -0/+37
  llvm-svn: 332562
* Fix LSR compile time hang. | Evgeny Stupachenko | 2018-05-16 | 1 | -0/+1336
  Summary:
  Limit the number of reassociations in GenerateReassociationsImpl.
  Reviewers: qcolombet, mkazantsev
  Differential Revision: https://reviews.llvm.org/D46039
  From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com>
  llvm-svn: 332426
* Revert "[PowerPC] LSR tunings for PowerPC" | Stefan Pintilie | 2018-03-09 | 1 | -57/+0
  Revert the rest of the LSR tune commit. It seems that the LSR tune commit breaks internal tests. Reverting the commit.
  llvm-svn: 327143
* Revert "[PowerPC] Move test to correct location." | Stefan Pintilie | 2018-03-09 | 1 | -0/+57
  Revert part of the LSR tune commit.
  llvm-svn: 327142
* [PowerPC] Move test to correct location. | Stefan Pintilie | 2018-03-07 | 1 | -57/+0
  The test was added in r326906 to an incorrect location. Moving the test to the PPC CodeGen directory, as the test is PPC-specific.
  llvm-svn: 326923
* [PowerPC] LSR tunings for PowerPC | Stefan Pintilie | 2018-03-07 | 1 | -0/+57
  The purpose of this patch is to have LSR generate better code on Power. This is done by overriding isLSRCostLess.
  Differential Revision: https://reviews.llvm.org/D40855
  llvm-svn: 326906
* [LoopStrengthReduce, x86] don't add cost for a cmp that will be macro-fused (PR35681) | Sanjay Patel | 2018-02-05 | 2 | -35/+32
  In the motivating case from PR35681 and represented by the macro-fuse-cmp test:
  https://bugs.llvm.org/show_bug.cgi?id=35681
  ...there's a 37 -> 31 byte size win for the loop because we eliminate the big base address offsets. SPEC2017 on Ryzen shows no significant perf difference.
  Differential Revision: https://reviews.llvm.org/D42607
  llvm-svn: 324289
* [AMDGPU] Switch to the new addr space mapping by default | Yaxun Liu | 2018-02-02 | 1 | -3/+2
  This requires a corresponding clang change.
  Differential Revision: https://reviews.llvm.org/D40955
  llvm-svn: 324101
* [LSR] Don't force bases of foldable formulae to the final type. | Mikael Holmen | 2018-02-01 | 2 | -42/+34
  Summary:
  Before emitting code for scaled registers, we prevent SCEVExpander from hoisting any scaled addressing mode by emitting all the bases first. However, these bases are being forced to the final type, resulting in some odd code. For example, if the type of the base is an integer and the final type is a pointer, we will emit an inttoptr for the base, a ptrtoint for the scale, and then a 'reverse' GEP where the GEP pointer is actually the base integer and the index is the pointer. It's more intuitive to use the pointer as a pointer and the integer as index.
  Patch by: Bevin Hansson
  Reviewers: atrick, qcolombet, sanjoy
  Reviewed By: qcolombet
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D42103
  llvm-svn: 323946
* Followup on Proposal to move MIR physical register namespace to '$' sigil. | Puyan Lotfi | 2018-01-31 | 1 | -2/+2
  Discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html
  In preparation for adding support for named vregs, we are changing the sigil for physical registers in MIR to '$' from '%'. This will prevent name clashes of named physical registers with named vregs.
  llvm-svn: 323922
* [LoopStrengthReduce] add test to show potential macro-fusion-based diff (PR35681); NFC | Sanjay Patel | 2018-01-30 | 1 | -0/+126
  This is the baseline output for the test proposed with D42607.
  llvm-svn: 323806
* [x86] auto-generate complete checks; NFC | Sanjay Patel | 2018-01-26 | 3 | -93/+443
  llvm-svn: 323571
* [SCEV] Do not cache S -> V if S is not equivalent of V | Serguei Katkov | 2018-01-09 | 1 | -2/+3
  SCEV tracks the correspondence of a created SCEV to the original instruction. However, during creation of a SCEV it is possible that nuw/nsw/exact flags are lost. As a result, during expansion of the SCEV, the instruction with nuw/nsw/exact will be used where it was expected and we produce poison incorrectly.
  Reviewers: sanjoy, mkazantsev, sebpop, jbhateja
  Reviewed By: sanjoy
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D41578
  llvm-svn: 322058
OpenPOWER on IntegriCloud