summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms/Utils/LoopUnroll.cpp
Commit message (Collapse)AuthorAgeFilesLines
* Add a comment for a todo in LoopUnroll post cleanupPhilip Reames2016-12-301-0/+5
| | | | llvm-svn: 290769
* [LoopUnroll] Modify a comment to clarify the usage of TripCount. NFC.Haicheng Wu2016-12-201-8/+8
| | | | | | | | | Make it clear that TripCount is the upper bound of the iteration on which control exits LatchBlock. Differential Revision: https://reviews.llvm.org/D26675 llvm-svn: 290199
* Revert @llvm.assume with operator bundles (r289755-r289757)Daniel Jasper2016-12-191-5/+12
| | | | | | | This creates non-linear behavior in the inliner (see more details in r289755's commit thread). llvm-svn: 290086
* Remove the AssumptionCacheHal Finkel2016-12-151-12/+5
| | | | | | | | | After r289755, the AssumptionCache is no longer needed. Variables affected by assumptions are now found by using the new operand-bundle-based scheme. This new scheme is more computationally efficient, and also we need much less code... llvm-svn: 289756
* [LoopUnroll] Implement profile-based loop peelingMichael Kuperstein2016-11-301-10/+26
| | | | | | | | | | | | | | | | | | | This implements PGO-driven loop peeling. The basic idea is that when the average dynamic trip-count of a loop is known, based on PGO, to be low, we can expect a performance win by peeling off the first several iterations of that loop. Unlike unrolling based on a known trip count, or a trip count multiple, this doesn't save us the conditional check and branch on each iteration. However, it does allow us to simplify the straight-line code we get (constant-folding, etc.). This is important given that we know that we will usually only hit this code, and not the actual loop. This is currently disabled by default. Differential Revision: https://reviews.llvm.org/D25963 llvm-svn: 288274
* [LoopUnroll] Keep the loop test only on the first iteration of max-or-zero loopsJohn Brawn2016-10-211-6/+7
| | | | | | | | | | | | | | | | When we have a loop with a known upper bound on the number of iterations, and furthermore know that either the number of iterations will be either exactly that upper bound or zero, then we can fully unroll up to that upper bound keeping only the first loop test to check for the zero iteration case. Most of the work here is in plumbing this 'max-or-zero' information from the part of scalar evolution where it's detected through to loop unrolling. I've also gone for the safe default of 'false' everywhere but howManyLessThans which could probably be improved. Differential Revision: https://reviews.llvm.org/D25682 llvm-svn: 284818
* Reapply "[LoopUnroll] Use the upper bound of the loop trip count to fullly ↵Haicheng Wu2016-10-121-9/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | unroll a loop" Reappy r284044 after revert in r284051. Krzysztof fixed the error in r284049. The original summary: This patch tries to fully unroll loops having break statement like this for (int i = 0; i < 8; i++) { if (a[i] == value) { found = true; break; } } GCC can fully unroll such loops, but currently LLVM cannot because LLVM only supports loops having exact constant trip counts. The upper bound of the trip count can be obtained from calling ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the refactoring work in SCEV to prevent duplicating code. The feature of using the upper bound is enabled under the same circumstance when runtime unrolling is enabled since both are used to unroll loops without knowing the exact constant trip count. llvm-svn: 284053
* Revert "[LoopUnroll] Use the upper bound of the loop trip count to fullly ↵Haicheng Wu2016-10-121-18/+9
| | | | | | | | unroll a loop" This reverts commit r284044. llvm-svn: 284051
* [LoopUnroll] Use the upper bound of the loop trip count to fullly unroll a loopHaicheng Wu2016-10-121-9/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch tries to fully unroll loops having break statement like this for (int i = 0; i < 8; i++) { if (a[i] == value) { found = true; break; } } GCC can fully unroll such loops, but currently LLVM cannot because LLVM only supports loops having exact constant trip counts. The upper bound of the trip count can be obtained from calling ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the refactoring work in SCEV to prevent duplicating code. The feature of using the upper bound is enabled under the same circumstance when runtime unrolling is enabled since both are used to unroll loops without knowing the exact constant trip count. Differential Revision: https://reviews.llvm.org/D24790 llvm-svn: 284044
* [LoopUnroll] Port to the new streaming interface for opt remarks.Adam Nemet2016-09-301-10/+13
| | | | llvm-svn: 282834
* [LoopUnroll] Don't clear out the AssumptionCache on each loopDavid Majnemer2016-08-161-6/+8
| | | | | | | | | | | Clearing out the AssumptionCache can cause us to rescan the entire function for assumes. If there are many loops, then we are scanning over the entire function many times. Instead of clearing out the AssumptionCache, register all cloned assumes. llvm-svn: 278854
* Use range algorithms instead of unpacking begin/endDavid Majnemer2016-08-111-2/+3
| | | | | | No functionality change is intended. llvm-svn: 278417
* [LoopUnroll] Simplify loops created by unrolling.Michael Zolotukhin2016-08-081-0/+19
| | | | | | | | | | | | | | Summary: Currently loop-unrolling doesn't preserve loop-simplified form. This patch fixes it by resimplifying affected loops. Reviewers: chandlerc, sanjoy, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23148 llvm-svn: 278038
* [LoopUnroll] Switch the default value of -unroll-runtime-epilog back to its ↵Michael Zolotukhin2016-08-021-1/+1
| | | | | | | | | | original value. As agreed in post-commit review of r265388, I'm switching the flag to its original value until the 90% runtime performance regression on SingleSource/Benchmarks/Stanford/Bubblesort is addressed. llvm-svn: 277524
* [LoopUnroll] Include hotness of region in opt remarkAdam Nemet2016-07-291-12/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | LoopUnroll is a loop pass, so the analysis of OptimizationRemarkEmitter is added to the common function analysis passes that loop passes depend on. The BFI and indirectly BPI used in this pass is computed lazily so no overhead should be observed unless -pass-remarks-with-hotness is used. This is how the patch affects the O3 pipeline: Dominator Tree Construction Natural Loop Information Canonicalize natural loops Loop-Closed SSA Form Pass Basic Alias Analysis (stateless AA impl) Function Alias Analysis Results Scalar Evolution Analysis + Lazy Branch Probability Analysis + Lazy Block Frequency Analysis + Optimization Remark Emitter Loop Pass Manager Rotate Loops Loop Invariant Code Motion Unswitch loops Simplify the CFG Dominator Tree Construction Basic Alias Analysis (stateless AA impl) Function Alias Analysis Results Combine redundant instructions Natural Loop Information Canonicalize natural loops Loop-Closed SSA Form Pass Scalar Evolution Analysis + Lazy Branch Probability Analysis + Lazy Block Frequency Analysis + Optimization Remark Emitter Loop Pass Manager Induction Variable Simplification Recognize loop idioms Delete dead loops Unroll loops ... llvm-svn: 277203
* [PM] Port LoopSimplify to the new pass manager.Davide Italiano2016-07-091-0/+1
| | | | | | | | | While here move simplifyLoop() function to the new header, as suggested by Chandler in the review. Differential Revision: http://reviews.llvm.org/D21404 llvm-svn: 274959
* Reinstate r273711David Majnemer2016-06-251-6/+5
| | | | | | | | | | r273711 was reverted by r273743. The inliner needs to know about any call sites in the inlined function. These were obscured if we replaced a call to undef with an undef but kept the call around. This fixes PR28298. llvm-svn: 273753
* Revert r273711, it caused PR28298.Nico Weber2016-06-241-5/+6
| | | | llvm-svn: 273743
* SimplifyInstruction does not imply DCEDavid Majnemer2016-06-241-6/+5
| | | | | | | We cannot remove an instruction with no uses just because SimplifyInstruction succeeds. It may have side effects. llvm-svn: 273711
* [LoopUnroll] Check that DT is available before trying to verify it.Michael Zolotukhin2016-06-081-1/+1
| | | | llvm-svn: 272221
* The patch refactors unroll pass.Evgeny Stupachenko2016-05-271-3/+7
| | | | | | | | | | | | | | | | Summary: Unroll factor (Count) calculations moved to a new function. Early exits on pragma and "-unroll-count" defined factor added. New type of unrolling "Force" introduced (previously used implicitly). New unroll preference "AllowRemainder" introduced and set "true" by default. (should be set to false for architectures that suffers from it). Reviewers: hfinkel, mzolotukhin, zzheng Differential Revision: http://reviews.llvm.org/D19553 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 271071
* Minor formatting fixes in LoopUnroll.cpp.Justin Lebar2016-05-101-4/+2
| | | | llvm-svn: 268995
* Follow-up for r265605: don't mutate vector we're iterating.Michael Zolotukhin2016-04-071-4/+6
| | | | llvm-svn: 265625
* [LoopUnroll] Fix the way we update DT after complete unrolling.Michael Zolotukhin2016-04-061-10/+15
| | | | | | | | Updating dominators for exit-blocks of the unrolled loops is not enough, as shown in PR27157. The proper way is to update dominators for all dominance-children of original loop blocks. llvm-svn: 265605
* Adds the ability to use an epilog remainder loop during loop unrolling and makesDavid L Kreitzer2016-04-051-4/+10
| | | | | | | | | | this the default behavior. Patch by Evgeny Stupachenko (evstupac@gmail.com). Differential Revision: http://reviews.llvm.org/D18158 llvm-svn: 265388
* Use some braces to format this a little better.Eric Christopher2016-03-151-5/+9
| | | | llvm-svn: 263527
* Fix llvm/llvm/lib/Transforms/Utils/LoopUnroll.cpp:285:53: error: suggestEric Christopher2016-03-151-11/+9
| | | | | | parentheses around '&&' within '||' [-Werror=parentheses]. llvm-svn: 263525
* [LoopUnroll] Respect the convergent attribute.Justin Lebar2016-03-141-1/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Specifically, when we perform runtime loop unrolling of a loop that contains a convergent op, we can only unroll k times, where k divides the loop trip multiple. Without this change, we'll happily unroll e.g. the following loop for (int i = 0; i < N; ++i) { if (i == 0) convergent_op(); foo(); } into int i = 0; if (N % 2 == 1) { convergent_op(); foo(); ++i; } for (; i < N - 1; i += 2) { if (i == 0) convergent_op(); foo(); foo(); }. This is unsafe, because we've just added a control-flow dependency to the convergent op in the prelude. In general, runtime unrolling loops that contain convergent ops is safe only if we don't have emit a prelude, which occurs when the unroll count divides the trip multiple. Reviewers: resistor Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D17526 llvm-svn: 263509
* rangify, fix function names; NFCISanjay Patel2016-03-081-27/+22
| | | | llvm-svn: 262940
* don't repeat function names in documentation comments; NFCSanjay Patel2016-03-081-4/+4
| | | | llvm-svn: 262937
* Follow up for r261597: Add the * to the auto.Michael Zolotukhin2016-02-231-1/+1
| | | | llvm-svn: 261600
* Follow-up for r261595: use range loop.Michael Zolotukhin2016-02-231-4/+2
| | | | llvm-svn: 261597
* [LoopUnroll] Avoid unnecessary DT recomputation.Michael Zolotukhin2016-02-231-8/+54
| | | | | | | | | | | | | | | | | | | Summary: When we completely unroll a loop, it's pretty easy to update DT in-place and thus avoid rebuilding it. DT recalculation is one of the most time-consuming tasks in loop-unroll, so avoiding it at least in case of full unroll should be beneficial. On some extreme (but still real-world) tests this patch improves compile time by ~2x. Reviewers: escha, jmolloy, hfinkel, sanjoy, chandlerc Subscribers: joker.eph, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D17473 llvm-svn: 261595
* [LoopUnrolling] Fix a bug introduced in r259869 (PR26688).Michael Zolotukhin2016-02-221-2/+6
| | | | | | | | The issue was that we only required LCSSA rebuilding if the immediate parent-loop had values used outside of it. The fix is to enaable the same logic for all outer loops, not only immediate parent. llvm-svn: 261575
* [LoopUnrolling] Try harder to avoid rebuilding LCSSA when possible.Michael Zolotukhin2016-02-051-7/+56
| | | | | | | | | | | | | In r255133 (reapplied r253126) we started to avoid redundant recomputation of LCSSA after loop-unrolling. This patch moves one step further in this direction - now we can avoid it for much wider range of loops, as we start to look at IR and try to figure out if the transformation actually breaks LCSSA phis or makes it necessary to insert new ones. Differential Revision: http://reviews.llvm.org/D16838 llvm-svn: 259869
* LoopInfo: Simplify ownership of Loop objectsJustin Bogner2016-01-081-2/+2
| | | | | | | | | | | It's strange that LoopInfo mostly owns the Loop objects, but that it defers deleting them to the loop pass manager. Instead, change the oddly named "updateUnloop" to "markAsRemoved" and have it queue the Loop object for deletion. We can't delete the Loop immediately when we remove it, since we need its pointer identity still, so we'll mark the object as "invalid" so that clients can see what's going on. llvm-svn: 257191
* LPM: Make callers of LPM.deleteLoopFromQueue update LoopInfo directly. NFCJustin Bogner2015-12-161-8/+5
| | | | | | | | | | | As of r255720, the loop pass manager will DTRT when passes update the loop info for removed loops, so they no longer need to reach into LPPassManager APIs to do this kind of transformation. This change very nearly removes the need for the LPPassManager to even be passed into loop passes - the only remaining pass that uses the LPM argument is LoopUnswitch. llvm-svn: 255797
* LPM: Stop threading `Pass *` through all of the loop utility APIs. NFCJustin Bogner2015-12-151-41/+32
| | | | | | | | | | | | | | | | | | | | | | A large number of loop utility functions take a `Pass *` and reach into it to find out which analyses to preserve. There are a number of problems with this: - The APIs have access to pretty well any Pass state they want, so it's hard to tell what they may or may not do. - Other APIs have copied these and pass around a `Pass *` even though they don't even use it. Some of these just hand a nullptr to the API since the callers don't even have a pass available. - Passes in the new pass manager don't work like the current ones, so the APIs can't be used as is there. Instead, we should explicitly thread the analysis results that we actually care about through these APIs. This is both simpler and more reusable. llvm-svn: 255669
* Revert "Revert r253253 and r253126: "Don't recompute LCSSA after ↵Michael Zolotukhin2015-12-091-2/+12
| | | | | | | | | | | loop-unrolling when possible."" The bug in IndVarSimplify was fixed in r254976, r254977, so I'm reapplying the original patch for avoiding redundant LCSSA recomputation. This reverts commit ffe3b434e505e403146aff00be0c177bb6d13466. llvm-svn: 255133
* Revert r253253 and r253126: "Don't recompute LCSSA after loop-unrolling when ↵Michael Zolotukhin2015-11-191-12/+2
| | | | | | | | | | | | possible." The change exposed a bug in IndVarSimplify (PR25578), which led to a failure (PR25538). When the bug is fixed, this patch can be reapplied. The tests are kept in tree, as they're useful anyway, and will not break with this revert. llvm-svn: 253596
* [PR25538]: Fix a failure caused by r253126.Michael Zolotukhin2015-11-161-2/+2
| | | | | | | | | | | | | | | | In r253126 we stopped to recompute LCSSA after loop unrolling in all cases, except the unrolling is full and at least one of the loop exits is outside the parent loop. In other cases the transformation should not break LCSSA, but it turned out, that we also call SimplifyLoop on the parent loop, which might break LCSSA by itself. This fix just triggers LCSSA recomputation in this case as well. I'm committing it without a test case for now, but I'll try to invent one. It's a bit tricky because in an isolated test LoopSimplify would be scheduled before LoopUnroll, and thus will change the test and hide the problem. llvm-svn: 253253
* Don't recompute LCSSA after loop-unrolling when possible.Michael Zolotukhin2015-11-141-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Currently we always recompute LCSSA for outer loops after unrolling an inner loop. That leads to compile time problem when we have big loop nests, and we can solve it by avoiding unnecessary work. For instance, if w eonly do partial unrolling, we don't break LCSSA, so we don't need to rebuild it. Also, if all exits from the inner loop are inside the enclosing loop, then complete unrolling won't break LCSSA either. I replaced unconditional LCSSA recomputation with conditional recomputation + unconditional assert and added several tests, which were failing when I experimented with it. Soon I plan to follow up with a similar patch for recalculation of dominators tree. Reviewers: hfinkel, dexonsmith, bogner, joker.eph, chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14526 llvm-svn: 253126
* TransformUtils: Remove implicit ilist iterator conversions, NFCDuncan P. N. Exon Smith2015-10-131-2/+2
| | | | | | | | | | | Continuing the work from last week to remove implicit ilist iterator conversions. First related commit was probably r249767, with some more motivation in r249925. This edition gets LLVMTransformUtils compiling without the implicit conversions. No functional change intended. llvm-svn: 250142
* [IndVars] Don't break dominance in `eliminateIdentitySCEV`Sanjoy Das2015-10-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Summary: After r249211, `getSCEV(X) == getSCEV(Y)` does not guarantee that X and Y are related in the dominator tree, even if X is an operand to Y (I've included a toy example in comments, and a real example as a test case). This commit changes `SimplifyIndVar` to require a `DominatorTree`. I don't think this is a problem because `ScalarEvolution` requires it anyway. Fixes PR25051. Depends on D13459. Reviewers: atrick, hfinkel Subscribers: joker.eph, llvm-commits, sanjoy Differential Revision: http://reviews.llvm.org/D13460 llvm-svn: 249471
* [Unroll] When completely unrolling the loop, replace conditinal branches ↵Michael Zolotukhin2015-09-231-2/+3
| | | | | | | | | | | | | | | with unconditional. Nothing is expected to change, except we do less redundant work in clean-up. Reviewers: hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12951 llvm-svn: 248444
* [PM] Port ScalarEvolution to the new pass manager.Chandler Carruth2015-08-171-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change makes ScalarEvolution a stand-alone object and just produces one from a pass as needed. Making this work well requires making the object movable, using references instead of overwritten pointers in a number of places, and other refactorings. I've also wired it up to the new pass manager and added a RUN line to a test to exercise it under the new pass manager. This includes basic printing support much like with other analyses. But there is a big and somewhat scary change here. Prior to this patch ScalarEvolution was never *actually* invalidated!!! Re-running the pass just re-wired up the various other analyses and didn't remove any of the existing entries in the SCEV caches or clear out anything at all. This might seem OK as everything in SCEV that can uses ValueHandles to track updates to the values that serve as SCEV keys. However, this still means that as we ran SCEV over each function in the module, we kept accumulating more and more SCEVs into the cache. At the end, we would have a SCEV cache with every value that we ever needed a SCEV for in the entire module!!! Yowzers. The releaseMemory routine would dump all of this, but that isn't realy called during normal runs of the pipeline as far as I can see. To make matters worse, there *is* actually a key that we don't update with value handles -- there is a map keyed off of Loop*s. Because LoopInfo *does* release its memory from run to run, it is entirely possible to run SCEV over one function, then over another function, and then lookup a Loop* from the second function but find an entry inserted for the first function! Ouch. To make matters still worse, there are plenty of updates that *don't* trip a value handle. It seems incredibly unlikely that today GVN or another pass that invalidates SCEV can update values in *just* such a way that a subsequent run of SCEV will incorrectly find lookups in a cache, but it is theoretically possible and would be a nightmare to debug. With this refactoring, I've fixed all this by actually destroying and recreating the ScalarEvolution object from run to run. Technically, this could increase the amount of malloc traffic we see, but then again it is also technically correct. ;] I don't actually think we're suffering from tons of malloc traffic from SCEV because if we were, the fact that we never clear the memory would seem more likely to have come up as an actual problem before now. So, I've made the simple fix here. If in fact there are serious issues with too much allocation and deallocation, I can work on a clever fix that preserves the allocations (while clearing the data) between each run, but I'd prefer to do that kind of optimization with a test case / benchmark that shows why we need such cleverness (and that can test that we actually make it faster). It's possible that this will make some things faster by making the SCEV caches have higher locality (due to being significantly smaller) so until there is a clear benchmark, I think the simple change is best. Differential Revision: http://reviews.llvm.org/D12063 llvm-svn: 245193
* [PM/AA] Remove all of the dead AliasAnalysis pointers being threadedChandler Carruth2015-07-221-1/+1
| | | | | | | | | | through APIs that are no longer necessary now that the update API has been removed. This will make changes to the AA interfaces significantly less disruptive (I hope). Either way, it seems like a really nice cleanup. llvm-svn: 242882
* [LoopUnrollRuntime] Avoid high-cost trip count computation.Sanjoy Das2015-04-141-3/+12
| | | | | | | | | | | | | | | | | Summary: Runtime unrolling of loops needs to emit an expression to compute the loop's runtime trip-count. Avoid runtime unrolling if this computation will be expensive. Depends on D8993. Reviewers: atrick Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8994 llvm-svn: 234846
* Re-sort includes with sort-includes.py and insert raw_ostream.h where it's used.Benjamin Kramer2015-03-231-1/+1
| | | | llvm-svn: 232998
* DataLayout is mandatory, update the API to reflect it with references.Mehdi Amini2015-03-101-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Now that the DataLayout is a mandatory part of the module, let's start cleaning the codebase. This patch is a first attempt at doing that. This patch is not exactly NFC as for instance some places were passing a nullptr instead of the DataLayout, possibly just because there was a default value on the DataLayout argument to many functions in the API. Even though it is not purely NFC, there is no change in the validation. I turned as many pointer to DataLayout to references, this helped figuring out all the places where a nullptr could come up. I had initially a local version of this patch broken into over 30 independant, commits but some later commit were cleaning the API and touching part of the code modified in the previous commits, so it seemed cleaner without the intermediate state. Test Plan: Reviewers: echristo Subscribers: llvm-commits From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 231740
OpenPOWER on IntegriCloud