summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/LoopUnroll
Commit message (Collapse)AuthorAgeFilesLines
...
* [LoopUnrollRuntime] NFC: Refactored safety checks of unrolling multi-exit loopAnna Thomas2017-07-121-0/+68
| | | | | | | | | | | Refactored the code and separated out a function `canSafelyUnrollMultiExitLoop` to reduce redundant checks and make it easier to add profitability heuristics later. Added tests to runtime unrolling to make sure that unrolling for multi-exit loops is not done unless the option -unroll-runtime-multi-exit is true. llvm-svn: 307843
* [LoopUnrollRuntime] Avoid multi-exit nested loop with epilog generationAnna Thomas2017-07-111-0/+44
| | | | | | | | | | The loop structure for the outer loop does not contain the epilog preheader when we try to unroll inner loop with multiple exits and epilog code is generated. For now, we just bail out in such cases. Added a test case that shows the problem. Without this bailout, we would trip on assert saying LCSSA form is incorrect for outer loop. llvm-svn: 307676
* [LoopUnrollRuntime] Remove strict assert about VMap requirementAnna Thomas2017-07-101-1/+42
| | | | | | | | | | | | | When unrolling under multiple exits which is under off-by-default option, the assert that checks for VMap entry in loop exit values is too strong. (assert if VMap entry did not exist, the value should be a constant). However, values derived from constants or from values outside loop, does not have a VMap entry too. Removed the assert and added a testcase showcasing the property for non-constant values. llvm-svn: 307542
* [LoopUnrollRuntime] Support multiple exit blocks unrolling when prolog ↵Anna Thomas2017-07-071-79/+165
| | | | | | | | | | | | | | | remainder generated With the NFC refactoring in rL307417 (git SHA 987dd01), all the logic is in place to support multiple exit/exiting blocks when prolog remainder is generated. This patch removed the assert that multiple exit blocks unrolling is only supported when epilog remainder is generated. Also, added test runs and checks with PROLOG prefix in runtime-loop-multiple-exits.ll test cases. llvm-svn: 307435
* [LoopUnrollRuntime] Bailout when multiple exiting blocks to the unique latch ↵Anna Thomas2017-07-061-0/+30
| | | | | | | | | | | | | | exit block Currently, we do not support multiple exiting blocks to the latch exit block. However, this bailout wasn't triggered when we had a unique exit block (which is the latch exit), with multiple exiting blocks to that unique exit. Moved the bailout so that it's triggered in both cases and added testcase. llvm-svn: 307291
* [RuntimeUnrolling] Add logic for loops with multiple exit blocksAnna Thomas2017-06-301-0/+279
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Runtime unrolling is done for loops with a single exit block and a single exiting block (and this exiting block should be the latch block). This patch adds logic to support unrolling in the presence of multiple exit blocks (which also means multiple exiting blocks). Currently this is under an off-by-default option and is supported when epilog code is generated. Support in presence of prolog code will be in a future patch (we just need to add more tests, and update comments). This patch is essentially an implementation patch. I have not added any heuristic (in terms of branches added or code size) to decide when this should be enabled. Reviewers: mkuper, sanjoy, reames, evstupac Reviewed by: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33001 llvm-svn: 306846
* [AArch64][Falkor] Try to avoid exhausting HW prefetcher resources when ↵Geoff Berry2017-06-281-0/+169
| | | | | | | | | | | | unrolling. Reviewers: t.p.northover, mcrosier Subscribers: aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34533 llvm-svn: 306584
* [LoopUnroll] Fix bug in computeUnrollCount causing it to not honor MaxCountGeoff Berry2017-06-281-0/+31
| | | | | | | | | | Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper Subscribers: mcrosier, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D34532 llvm-svn: 306564
* [LoopUnroll] Fix a test. REQUIRE should be REQUIRES.Davide Italiano2017-05-121-1/+1
| | | | | | Found by inspection. llvm-svn: 302909
* [LoopUnroll] Don't try to unroll non canonical loops.Davide Italiano2017-04-241-0/+26
| | | | | | | | | | | | The current Loop Unroll implementation works with loops having a single latch that contains a conditional branch to a block outside the loop (the other successor is, by defition of latch, the header). If this precondition doesn't hold, avoid unrolling the loop as the code is not ready to handle such circumstances. Differential Revision: https://reviews.llvm.org/D32261 llvm-svn: 301239
* [LoopPeeling] Get rid of Phis that become invariant after N stepsMax Kazantsev2017-04-171-3/+146
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is a generalization of the improvement introduced in rL296898. Previously, we were able to peel one iteration of a loop to get rid of a Phi that becomes an invariant on the 2nd iteration. In more general case, if a Phi becomes invariant after N iterations, we can peel N times and turn it into invariant. In order to do this, we for every Phi in loop's header we define the Invariant Depth value which is calculated as follows: Given %x = phi <Inputs from above the loop>, ..., [%y, %back.edge]. If %y is a loop invariant, then Depth(%x) = 1. If %y is a Phi from the loop header, Depth(%x) = Depth(%y) + 1. Otherwise, Depth(%x) is infinite. Notice that if we peel a loop, all Phis with Depth = 1 become invariants, and all other Phis with finite depth decrease the depth by 1. Thus, peeling N first iterations allows us to turn all Phis with Depth <= N into invariants. Reviewers: reames, apilipenko, mkuper, skatkov, anna, sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31613 llvm-svn: 300446
* [LoopPeeling] Fix condition for phi-eliminating peelingMax Kazantsev2017-04-172-1/+29
| | | | | | | | | | | | | | | | | When peeling loops basing on phis becoming invariants, we make a wrong loop size check. UP.Threshold should be compared against the total numbers of instructions after the transformation, which is equal to 2 * LoopSize in case of peeling one iteration. We should also check that the maximum allowed number of peeled iterations is not zero. Reviewers: sanjoy, anna, reames, mkuper Reviewed By: mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31753 llvm-svn: 300441
* [LoopUnroll] Remap references in peeled iterationSerge Pavlov2017-03-261-0/+61
| | | | | | | | | References in cloned blocks must be remapped prior to dominator calculation. Differential Revision: https://reviews.llvm.org/D31281 llvm-svn: 298811
* AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernelMatt Arsenault2017-03-212-5/+5
| | | | | | | | | | | | Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444
* [LoopUnroll] Don't peel loops where the latch isn't the exiting blockMichael Kuperstein2017-03-161-0/+36
| | | | | | | | | Peeling assumed this doesn't happen, but didn't check it. This fixes PR32178. Differential Revision: https://reviews.llvm.org/D30757 llvm-svn: 297993
* [LoopUnrolling] Fix loop size check for peelingSanjoy Das2017-03-071-1/+29
| | | | | | | | | | | | | | | | | | Summary: We should check if loop size allows us to peel at least one iteration before we do so. Patch by Max Kazantsev! Reviewers: sanjoy, mkuper, efriedma Reviewed By: mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30632 llvm-svn: 297122
* Add test missed in r296770.Evgeny Stupachenko2017-03-041-0/+65
| | | | | | | Differential Revision: http://reviews.llvm.org/D27004 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 296962
* [LoopUnrolling] Peel loops with invariant backedge Phi inputSanjoy Das2017-03-031-0/+25
| | | | | | | | | | | | | | | | | | | | | Summary: If a loop contains a Phi node which has an invariant input from back edge, it is profitable to peel such loops (rather than unroll them) to use the advantage that this Phi is always invariant starting from 2nd iteration. After the 1st iteration is peeled, other optimizations can potentially simplify calculations with this invariant. Patch by Max Kazantsev! Reviewers: sanjoy, apilipenko, igor-laevsky, anna, mkuper, reames Reviewed By: mkuper Subscribers: mkuper, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D30161 llvm-svn: 296898
* The patch fixes r296770Evgeny Stupachenko2017-03-021-2/+2
| | | | | | | | | | Summary: Extend -unroll-partial-threshold to 200 for runtime-loop3.ll test as epilogue unroll initially add 1 more IV to the loop. From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 296803
* The patch turns on epilogue unroll for loops with constant recurency start.Evgeny Stupachenko2017-03-024-20/+21
| | | | | | | | | | | | | | | | | Summary: Set unroll remainder to epilog if a loop contains a phi with constant parameter: loop: pn = phi [Const, PreHeader], [pn.next, Latch] ... Reviewer: hfinkel Differential Revision: http://reviews.llvm.org/D27004 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 296770
* [LoopUnroll] Enable PGO-based loop peeling by default.Michael Kuperstein2017-02-221-1/+1
| | | | | | | | | This enables peeling of loops with low dynamic iteration count by default, when profile information is available. Differential Revision: https://reviews.llvm.org/D27734 llvm-svn: 295796
* AMDGPU: Don't unroll for private with dynamic allocasMatt Arsenault2017-02-031-0/+34
| | | | | | | This won't be elimnated, so this will just bloat code if/when these are ever used/supported. llvm-svn: 294030
* [AMDGPU] Unroll preferences improvementsStanislav Mekhanoshin2017-02-031-0/+120
| | | | | | | | | | | Exit loop analysis early if suitable private access found. Do not account for GEPs which are invariant to loop induction variable. Do not account for Allocas which are too big to fit into register file anyway. Add option for tuning: -amdgpu-unroll-threshold-private. Differential Revision: https://reviews.llvm.org/D29473 llvm-svn: 293991
* [PM] Simplify the new PM interface to the loop unroller and expose twoChandler Carruth2017-01-2610-8/+26
| | | | | | | | | | | | | | factory functions for the two modes the loop unroller is actually used in in-tree: simplified full-unrolling and the entire thing including partial unrolling. I've also wired these up to nice names so you can express both of these being in a pipeline easily. This is a precursor to actually enabling these parts of the O2 pipeline. Differential Revision: https://reviews.llvm.org/D28897 llvm-svn: 293136
* [LoopUnroll] Properly update loopinfo for runtime unrolling by 2Michael Kuperstein2017-01-262-2/+44
| | | | | | | | | | | Even when we don't create a remainder loop (that is, when we unroll by 2), we may duplicate nested loops into the remainder. This is complicated by the fact the remainder may itself be either inserted into an outer loop, or at the top level. In the latter case, we may need to create new top-level loops. Differential Revision: https://reviews.llvm.org/D29156 llvm-svn: 293124
* [PM] Teach LoopUnroll to update the LPM infrastructure as it unrollsChandler Carruth2017-01-2517-1/+174
| | | | | | | | | | | | | | | | | | | loops. We do this by reconstructing the newly added loops after the unroll completes to avoid threading pass manager details through all the mess of the unrolling infrastructure. I've enabled some extra assertions in the LPM to try and catch issues here and enabled a bunch of unroller tests to try and make sure this is sane. Currently, I'm manually running loop-simplify when needed. That should go away once it is folded into the LPM infrastructure. Differential Revision: https://reviews.llvm.org/D28848 llvm-svn: 293011
* Update domtree incrementally in loop peeling.Serge Pavlov2017-01-241-1/+1
| | | | | | | | | With this change dominator tree remains in sync after each step of loop peeling. Differential Revision: https://reviews.llvm.org/D29029 llvm-svn: 292895
* [LoopUnroll] First form LCSSA, then loop-simplifyMichael Kuperstein2017-01-231-0/+55
| | | | | | | | | | | | | Running non-LCSSA-preserving LoopSimplify followed by LCSSA on (roughly) the same loop is incorrect, since LoopSimplify may break LCSSA arbitrarily higher in the loop nest. Instead, run LCSSA first, and then run LCSSA-preserving LoopSimplify on the result. This fixes PR31718. Differential Revision: https://reviews.llvm.org/D29055 llvm-svn: 292854
* Introduce -unroll-partial-threshold to separate PartialThreshold from ↵Dehao Chen2017-01-172-3/+3
| | | | | | | | | | | | | | | | Threshold in loop unorller. Summary: Partial unrolling should have separate threshold with full unrolling. Reviewers: efriedma, mzolotukhin Reviewed By: efriedma, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28831 llvm-svn: 292293
* Add test that verifies we don't peel loops in optsize functions. NFC.Michael Kuperstein2017-01-111-0/+39
| | | | llvm-svn: 291708
* Make sure total loop body weight is preserved in loop peelingXin Tong2017-01-021-1/+1
| | | | | | | | | | | | | | | Summary: Regardless how the loop body weight is distributed, we should preserve total loop body weight. i.e. we should have same weight reaching the body of the loop or its duplicates in peeled and unpeeled case. Reviewers: mkuper, davidxl, anemet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28179 llvm-svn: 290833
* Use continuous boosting factor for complete unroll.Dehao Chen2016-12-309-18/+15
| | | | | | | | | | | | | | | | | | | | Summary: The current loop complete unroll algorithm checks if unrolling complete will reduce the runtime by a certain percentage. If yes, it will apply a fixed boosting factor to the threshold (by discounting cost). The problem for this approach is that the threshold abruptly. This patch makes the boosting factor a function of runtime reduction percentage, capped by a fixed threshold. In this way, the threshold changes continuously. The patch also simplified the code by reducing one parameter in UP. The patch only affects code-gen of two speccpu2006 benchmark: 445.gobmk binary size decreases 0.08%, no performance change. 464.h264ref binary size increases 0.24%, no performance change. Reviewers: mzolotukhin, chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26989 llvm-svn: 290737
* [LoopUnroll] Implement profile-based loop peelingMichael Kuperstein2016-11-302-0/+143
| | | | | | | | | | | | | | | | | | | This implements PGO-driven loop peeling. The basic idea is that when the average dynamic trip-count of a loop is known, based on PGO, to be low, we can expect a performance win by peeling off the first several iterations of that loop. Unlike unrolling based on a known trip count, or a trip count multiple, this doesn't save us the conditional check and branch on each iteration. However, it does allow us to simplify the straight-line code we get (constant-folding, etc.). This is important given that we know that we will usually only hit this code, and not the actual loop. This is currently disabled by default. Differential Revision: https://reviews.llvm.org/D25963 llvm-svn: 288274
* Use profile info to adjust loop unroll threshold.Dehao Chen2016-11-171-0/+59
| | | | | | | | | | | | | | Summary: For flat loop, even if it is hot, it is not a good idea to unroll in runtime, thus we set a lower partial unroll threshold. For hot loop, we set a higher unroll threshold and allows expensive tripcount computation to allow more aggressive unrolling. Reviewers: davidxl, mzolotukhin Subscribers: sanjoy, mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D26527 llvm-svn: 287186
* [LoopUnroll] Keep the loop test only on the first iteration of max-or-zero loopsJohn Brawn2016-10-211-0/+207
| | | | | | | | | | | | | | | | When we have a loop with a known upper bound on the number of iterations, and furthermore know that either the number of iterations will be either exactly that upper bound or zero, then we can fully unroll up to that upper bound keeping only the first loop test to check for the zero iteration case. Most of the work here is in plumbing this 'max-or-zero' information from the part of scalar evolution where it's detected through to loop unrolling. I've also gone for the safe default of 'false' everywhere but howManyLessThans which could probably be improved. Differential Revision: https://reviews.llvm.org/D25682 llvm-svn: 284818
* Reapply "[LoopUnroll] Use the upper bound of the loop trip count to fullly ↵Haicheng Wu2016-10-121-0/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | unroll a loop" Reappy r284044 after revert in r284051. Krzysztof fixed the error in r284049. The original summary: This patch tries to fully unroll loops having break statement like this for (int i = 0; i < 8; i++) { if (a[i] == value) { found = true; break; } } GCC can fully unroll such loops, but currently LLVM cannot because LLVM only supports loops having exact constant trip counts. The upper bound of the trip count can be obtained from calling ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the refactoring work in SCEV to prevent duplicating code. The feature of using the upper bound is enabled under the same circumstance when runtime unrolling is enabled since both are used to unroll loops without knowing the exact constant trip count. llvm-svn: 284053
* Revert "[LoopUnroll] Use the upper bound of the loop trip count to fullly ↵Haicheng Wu2016-10-121-43/+0
| | | | | | | | unroll a loop" This reverts commit r284044. llvm-svn: 284051
* [LoopUnroll] Use the upper bound of the loop trip count to fullly unroll a loopHaicheng Wu2016-10-121-0/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch tries to fully unroll loops having break statement like this for (int i = 0; i < 8; i++) { if (a[i] == value) { found = true; break; } } GCC can fully unroll such loops, but currently LLVM cannot because LLVM only supports loops having exact constant trip counts. The upper bound of the trip count can be obtained from calling ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the refactoring work in SCEV to prevent duplicating code. The feature of using the upper bound is enabled under the same circumstance when runtime unrolling is enabled since both are used to unroll loops without knowing the exact constant trip count. Differential Revision: https://reviews.llvm.org/D24790 llvm-svn: 284044
* Revert test change in r282894 as it's broken in some platforms.Dehao Chen2016-09-301-40/+2
| | | | llvm-svn: 282903
* Update loop unroller cost model to make sure debug info does not affect ↵Dehao Chen2016-09-301-2/+40
| | | | | | | | | | | | | | optimization decisions. Summary: Debug info should *not* affect optimization decisions. This patch updates loop unroller cost model to make it not affected by debug info. Reviewers: davidxl, mzolotukhin Subscribers: haicheng, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D25098 llvm-svn: 282894
* [LoopSimplify] When simplifying phis in loop-simplify, do it only if it ↵Michael Zolotukhin2016-09-271-0/+37
| | | | | | preserves LCSSA form. llvm-svn: 282541
* Revert "[LoopUnroll] Properly update loop-info when cloning prologues and ↵Michael Zolotukhin2016-09-081-44/+0
| | | | | | | | | | epilogues." This reverts commit r280901. This caused a bunch of failures, reverting it until I investigate them. llvm-svn: 280905
* [LoopUnroll] Properly update loop-info when cloning prologues and epilogues.Michael Zolotukhin2016-09-081-0/+44
| | | | | | | | | | | | | | | | | | | Summary: When cloning blocks for prologue/epilogue we need to replicate the loop structure from the original loop. It wasn't a problem for the innermost loops, but it led to an incorrect loop info when we unrolled a loop with a child loop - in this case created prologue-loop had a child loop, but loop info didn't reflect that. This fixes PR28888. Reviewers: chandlerc, sanjoy, hfinkel Subscribers: llvm-commits, silvas Differential Revision: https://reviews.llvm.org/D24203 llvm-svn: 280901
* [LoopUnroll] Fix a PowerPC test broken by r277524.Michael Zolotukhin2016-08-021-1/+1
| | | | llvm-svn: 277527
* [LoopUnroll] Switch the default value of -unroll-runtime-epilog back to its ↵Michael Zolotukhin2016-08-029-49/+25
| | | | | | | | | | original value. As agreed in post-commit review of r265388, I'm switching the flag to its original value until the 90% runtime performance regression on SingleSource/Benchmarks/Stanford/Bubblesort is addressed. llvm-svn: 277524
* [LoopUnroll] Ensure we create prolog loops in simplified form.Michael Zolotukhin2016-08-021-1/+1
| | | | llvm-svn: 277502
* [LoopUnroll] Include hotness of region in opt remarkAdam Nemet2016-07-292-1/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | LoopUnroll is a loop pass, so the analysis of OptimizationRemarkEmitter is added to the common function analysis passes that loop passes depend on. The BFI and indirectly BPI used in this pass is computed lazily so no overhead should be observed unless -pass-remarks-with-hotness is used. This is how the patch affects the O3 pipeline: Dominator Tree Construction Natural Loop Information Canonicalize natural loops Loop-Closed SSA Form Pass Basic Alias Analysis (stateless AA impl) Function Alias Analysis Results Scalar Evolution Analysis + Lazy Branch Probability Analysis + Lazy Block Frequency Analysis + Optimization Remark Emitter Loop Pass Manager Rotate Loops Loop Invariant Code Motion Unswitch loops Simplify the CFG Dominator Tree Construction Basic Alias Analysis (stateless AA impl) Function Alias Analysis Results Combine redundant instructions Natural Loop Information Canonicalize natural loops Loop-Closed SSA Form Pass Scalar Evolution Analysis + Lazy Branch Probability Analysis + Lazy Block Frequency Analysis + Optimization Remark Emitter Loop Pass Manager Induction Variable Simplification Recognize loop idioms Delete dead loops Unroll loops ... llvm-svn: 277203
* [PM] Port LoopUnroll.Sean Silva2016-07-191-0/+1
| | | | | | | | | We just set PreserveLCSSA to always true since we don't have an analogous method `mustPreserveAnalysisID(LCSSA)`. Also port LoopInfo verifier pass to test LoopUnrollPass. llvm-svn: 276063
* [LoopUnrollAnalyzer] Fix a bug in UnrolledInstAnalyzer::visitLoad.Michael Zolotukhin2016-06-231-0/+18
| | | | | | | | | | | When simplifying a load we need to make sure that the type of the simplified value matches the type of the instruction we're processing. In theory, we can handle casts here as we deal with constant data, but since it's not implemented at the moment, we at least need to bail out. This fixes PR28262. llvm-svn: 273562
* [SCEV] Fix incorrect trip count computationSanjoy Das2016-06-181-32/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The way we elide max expressions when computing trip counts is incorrect -- it breaks cases like this: ``` static int wrapping_add(int a, int b) { return (int)((unsigned)a + (unsigned)b); } void test() { volatile int end_buf = 2147483548; // INT_MIN - 100 int end = end_buf; unsigned counter = 0; for (int start = wrapping_add(end, 200); start < end; start++) counter++; print(counter); } ``` Note: the `NoWrap` variable that was being tested has little to do with the values flowing into the max expression; it is a property of the induction variable. test/Transforms/LoopUnroll/nsw-tripcount.ll was added to solely test functionality I'm reverting in this change, so I've deleted the test fully. llvm-svn: 273079
OpenPOWER on IntegriCloud