path: root/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
Commit message | Author | Age | Files | Lines
* [LoopUnroll] Enable option to peel remainder loop | Sam Parker | 2017-08-14 | 1 | -7/+24
  On some targets, the penalty of executing runtime unrolling checks and then not the unrolled loop can be significantly detrimental to performance. This results in the need to be more conservative with the unroll count: keeping a trip count of 2 reduces the overhead as well as increasing the chance of the unrolled body being executed. But being conservative leaves performance gains on the table.

  This patch enables the unrolling of the remainder loop introduced by runtime unrolling. This can help reduce the overhead of misunrolled loops because the cost of non-taken branches is much less than the cost of the backedge that would normally be executed in the remainder loop. This allows larger unroll factors to be used without suffering performance losses with smaller iteration counts.

  Differential Revision: https://reviews.llvm.org/D36309
  llvm-svn: 310824
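The effect described in the commit above can be pictured at source level. The following is a minimal sketch in plain C++, not the pass's code and not LLVM IR: the function names and the unroll factor of 4 are assumptions chosen for illustration. The first function handles the leftover iterations with a remainder loop (which has a backedge); the second handles them with straight-line code guarded by forward branches only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical baseline: main loop unrolled by 4, leftover iterations
// handled by a remainder loop that still has a backedge.
std::uint64_t sumRemainderLoop(const std::uint32_t *a, std::size_t n) {
  assert(a != nullptr || n == 0);
  std::uint64_t s = 0;
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4)
    s += std::uint64_t(a[i]) + a[i + 1] + a[i + 2] + a[i + 3];
  for (; i < n; ++i) // remainder loop: at most 3 iterations
    s += a[i];
  return s;
}

// Hypothetical variant with the remainder loop itself unrolled: the at
// most 3 leftover iterations become nested forward branches, no backedge.
std::uint64_t sumRemainderUnrolled(const std::uint32_t *a, std::size_t n) {
  assert(a != nullptr || n == 0);
  std::uint64_t s = 0;
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4)
    s += std::uint64_t(a[i]) + a[i + 1] + a[i + 2] + a[i + 3];
  if (i < n) {
    s += a[i++];
    if (i < n) {
      s += a[i++];
      if (i < n)
        s += a[i++];
    }
  }
  return s;
}
```

Both functions compute the same sum for every `n`; they differ only in the shape of the control flow that runs when the trip count is not a multiple of 4.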
* [RuntimeUnroll] NFC: Add a profitability function for multiexit loop | Anna Thomas | 2017-07-21 | 1 | -9/+24
  Separated out the profitability from the safety analysis for multiexit loop unrolling. Currently, this is an NFC because profitability is true only if the unroll-runtime-multi-exit is set to true (off-by-default). This is to ease adding the profitability heuristic up for review at D35380.
  llvm-svn: 308753
* Fix unused variable warning on EXPENSIVE_CHECKS release builds. NFCI. | Simon Pilgrim | 2017-07-13 | 1 | -1/+1
  llvm-svn: 307929
* [RuntimeUnrolling] Update DomTree correctly when exit blocks have successors | Anna Thomas | 2017-07-13 | 1 | -2/+28
  Summary: When we runtime unroll with multiple exit blocks, we also need to update the immediate dominators of the immediate successors of the exit blocks.
  Reviewers: reames, mkuper, mzolotukhin, apilipenko
  Reviewed by: mzolotukhin
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D35304
  llvm-svn: 307909
* [LoopUnrollRuntime] NFC: Refactored safety checks of unrolling multi-exit loop | Anna Thomas | 2017-07-12 | 1 | -47/+58
  Refactored the code and separated out a function `canSafelyUnrollMultiExitLoop` to reduce redundant checks and make it easier to add profitability heuristics later. Added tests to runtime unrolling to make sure that unrolling for multi-exit loops is not done unless the option -unroll-runtime-multi-exit is true.
  llvm-svn: 307843
* [LoopUnrollRuntime] NFC: Add some debugging trace messages for why loop wasn't unrolled. | Anna Thomas | 2017-07-11 | 1 | -8/+30
  llvm-svn: 307705
* [LoopUnrollRuntime] Avoid multi-exit nested loop with epilog generation | Anna Thomas | 2017-07-11 | 1 | -2/+10
  The loop structure for the outer loop does not contain the epilog preheader when we try to unroll an inner loop with multiple exits and epilog code is generated. For now, we just bail out in such cases. Added a test case that shows the problem. Without this bailout, we would trip on an assert saying LCSSA form is incorrect for the outer loop.
  llvm-svn: 307676
* [LoopUnrollRuntime] Remove strict assert about VMap requirement | Anna Thomas | 2017-07-10 | 1 | -4/+3
  When unrolling with multiple exits, which is under an off-by-default option, the assert that checks for a VMap entry in loop exit values is too strong (it asserted that if a VMap entry did not exist, the value should be a constant). However, values derived from constants or from values outside the loop do not have a VMap entry either. Removed the assert and added a testcase showcasing the property for non-constant values.
  llvm-svn: 307542
* [LoopUnrollRuntime] Support multiple exit blocks unrolling when prolog remainder generated | Anna Thomas | 2017-07-07 | 1 | -2/+0
  With the NFC refactoring in rL307417 (git SHA 987dd01), all the logic is in place to support multiple exit/exiting blocks when prolog remainder is generated. This patch removed the assert that multiple exit blocks unrolling is only supported when epilog remainder is generated. Also, added test runs and checks with PROLOG prefix in runtime-loop-multiple-exits.ll test cases.
  llvm-svn: 307435
* [LoopUnrollRuntime] NFC: use the precomputed loop exit in ConnectProlog | Anna Thomas | 2017-07-07 | 1 | -11/+11
  Minor refactoring to use the preexisting loop exit that's already calculated. We do not need to recompute the loop exit in ConnectProlog. Apart from avoiding redundant computation, this is required for supporting multiple loop exits when Prolog remainder loops are generated.
  llvm-svn: 307417
* [LoopUnrollRuntime] Bailout when multiple exiting blocks to the unique latch exit block | Anna Thomas | 2017-07-06 | 1 | -4/+5
  Currently, we do not support multiple exiting blocks to the latch exit block. However, this bailout wasn't triggered when we had a unique exit block (which is the latch exit), with multiple exiting blocks to that unique exit. Moved the bailout so that it's triggered in both cases and added testcase.
  llvm-svn: 307291
* [RuntimeUnrolling] Add logic for loops with multiple exit blocks | Anna Thomas | 2017-06-30 | 1 | -23/+101
  Summary: Runtime unrolling is done for loops with a single exit block and a single exiting block (and this exiting block should be the latch block). This patch adds logic to support unrolling in the presence of multiple exit blocks (which also means multiple exiting blocks). Currently this is under an off-by-default option and is supported when epilog code is generated. Support in presence of prolog code will be in a future patch (we just need to add more tests, and update comments).

  This patch is essentially an implementation patch. I have not added any heuristic (in terms of branches added or code size) to decide when this should be enabled.

  Reviewers: mkuper, sanjoy, reames, evstupac
  Reviewed by: reames
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D33001
  llvm-svn: 306846
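To make "multiple exit blocks" concrete, here is a small source-level sketch in plain C++. It is an illustration only, assuming a hypothetical search loop and an unroll factor of 2; it does not reproduce the pass's IR transformation. The loop has two exits: the early "found" return and the normal exit through the latch. In the unrolled version each copy of the body keeps its own early exit, and an epilog handles the odd leftover iteration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical original loop: an early exit (key found) plus the latch exit.
long findFirst(const int *a, std::size_t n, int key) {
  assert(a != nullptr || n == 0);
  for (std::size_t i = 0; i < n; ++i)
    if (a[i] == key)
      return static_cast<long>(i); // early exit
  return -1;                       // exit through the latch
}

// Hypothetical runtime-unrolled-by-2 version with an epilog. Every copy
// of the body retains its early exit; only the latch test between the two
// copies is dropped.
long findFirstUnrolled(const int *a, std::size_t n, int key) {
  assert(a != nullptr || n == 0);
  const std::size_t evenN = n - (n % 2); // iterations covered by main loop
  std::size_t i = 0;
  for (; i < evenN; i += 2) {
    if (a[i] == key)
      return static_cast<long>(i);
    if (a[i + 1] == key)
      return static_cast<long>(i + 1);
  }
  if (i < n && a[i] == key) // epilog: at most one leftover iteration
    return static_cast<long>(i);
  return -1;
}
```

The sketch shows why the number of exiting branches grows with the unroll factor, which is the kind of cost the commit message says no heuristic accounts for yet.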
* [LoopUnrollRuntime] Use SCEV exit count for calculating trip count. NFCI | Anna Thomas | 2017-06-27 | 1 | -1/+5
  Instead of getBackEdgeTakenCount, use getExitCount on the latch exiting block (which is proven to be the only exiting block in the loop to be unrolled).
  llvm-svn: 306410
* [RuntimeLoopUnrolling] Rename exit block and move assert earlier. NFC | Anna Thomas | 2017-06-23 | 1 | -23/+23
  The single exit block allowed in runtime unrolling is guaranteed to be the Latch's successor, so rename it as LatchExitBlock.
  llvm-svn: 306105
* Sort the remaining #include lines in include/... and lib/.... | Chandler Carruth | 2017-06-06 | 1 | -1/+1
  I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days.

  I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch.

  This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files.

  Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again).
  llvm-svn: 304787
* Avoid warning of unused variable in release builds. NFC | Anna Thomas | 2017-05-03 | 1 | -0/+2
  llvm-svn: 302068
* Fix PPC64 warning for missing parentheses. NFC. | Anna Thomas | 2017-05-03 | 1 | -3/+4
  llvm-svn: 302061
* [RuntimeLoopUnroller] Add assert that we don't unroll non-rotated loops | Anna Thomas | 2017-05-03 | 1 | -0/+7
  Summary: Cloning basic blocks in the loop for the runtime loop unroller depends on the loop being in rotated form (i.e. loop latch target is the exit block). Assert that this is true, so that callers of the runtime loop unroller pass in canonical loops. The single caller of this function has that check recently added: https://reviews.llvm.org/rL301239
  Reviewers: davide
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D32801
  llvm-svn: 302058
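As a source-level picture of "rotated form", the sketch below contrasts a top-tested loop with its rotated equivalent in plain C++. This is an analogy under assumptions (hypothetical function names, a trivial countdown body), not LLVM IR: in the rotated version the exit test sits at the bottom of the loop, so the block that ends the iteration (the latch) is also the block that branches to the exit.

```cpp
// Hypothetical top-tested loop: the exit test is in the header, and the
// latch branches unconditionally back to it.
unsigned countDownTopTested(unsigned n) {
  unsigned steps = 0;
  while (n != 0) {
    --n;
    ++steps;
  }
  return steps;
}

// Hypothetical rotated form: a guard checks the zero-trip case once, then
// the loop tests at the bottom, so the latch either takes the backedge or
// leaves to the exit.
unsigned countDownRotated(unsigned n) {
  unsigned steps = 0;
  if (n != 0) {
    do {
      --n;
      ++steps;
    } while (n != 0);
  }
  return steps;
}
```

Both functions return the same value for every input; only the position of the exit test changes.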
* [LoopUnroll] Use addClonedBlockToLoopInfo to clone the top level loop (NFC) | Florian Hahn | 2017-01-31 | 1 | -14/+6
  Summary: rL293124 added the necessary infrastructure to properly add the cloned top level loop to LoopInfo, which means we do not have to do it manually in CloneLoopBlocks. @mkuper sorry for not pointing this out during my review of D29156, I just realized that today.
  Reviewers: mzolotukhin, chandlerc, mkuper
  Reviewed By: mkuper
  Subscribers: llvm-commits, mkuper
  Differential Revision: https://reviews.llvm.org/D29173
  llvm-svn: 293615
* [LoopUnroll] Properly update loopinfo for runtime unrolling by 2 | Michael Kuperstein | 2017-01-26 | 1 | -5/+10
  Even when we don't create a remainder loop (that is, when we unroll by 2), we may duplicate nested loops into the remainder. This is complicated by the fact the remainder may itself be either inserted into an outer loop, or at the top level. In the latter case, we may need to create new top-level loops.
  Differential Revision: https://reviews.llvm.org/D29156
  llvm-svn: 293124
* Preserve domtree and loop-simplify for runtime unrolling. | Eli Friedman | 2017-01-18 | 1 | -3/+29
  Mostly straightforward changes; we just didn't do the computation before. One sort of interesting change in LoopUnroll.cpp: we weren't handling dominance for children of the loop latch correctly, but foldBlockIntoPredecessor hid the problem for complete unrolling.

  Currently punting on loop peeling; made some minor changes to isolate that problem to LoopUnrollPeel.cpp.

  Adds a flag -unroll-verify-domtree; it verifies the domtree immediately after we finish updating it. This is on by default for +Asserts builds.

  Differential Revision: https://reviews.llvm.org/D28073
  llvm-svn: 292447
* [loop-unroll] Properly populate LoopInfo for loops cloned in LoopUnrollRuntime. | Florian Hahn | 2017-01-10 | 1 | -3/+5
  Summary: This fixes Transforms/LoopUnroll/runtime-loop3.ll which failed with EXTENSIVE_DEBUG, because the cloned basic blocks were not added to the correct sub-loops in LoopUnrollRuntime.cpp.
  Reviewers: dexonsmith, mzolotukhin
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D28482
  llvm-svn: 291619
* Revert "[LoopUnroll] Properly update loop-info when cloning prologues and epilogues." | Michael Zolotukhin | 2016-09-08 | 1 | -54/+11
  This reverts commit r280901. This caused a bunch of failures, reverting it until I investigate them.
  llvm-svn: 280905
* [LoopUnroll] Properly update loop-info when cloning prologues and epilogues. | Michael Zolotukhin | 2016-09-08 | 1 | -11/+54
  Summary: When cloning blocks for prologue/epilogue we need to replicate the loop structure from the original loop. It wasn't a problem for the innermost loops, but it led to an incorrect loop info when we unrolled a loop with a child loop: in this case the created prologue-loop had a child loop, but loop info didn't reflect that. This fixes PR28888.
  Reviewers: chandlerc, sanjoy, hfinkel
  Subscribers: llvm-commits, silvas
  Differential Revision: https://reviews.llvm.org/D24203
  llvm-svn: 280901
* [UNROLL] Postpone ScalarEvolution::forgetLoop after TripCountSC is expanded when unroll runtime iteration loop. | Wei Mi | 2016-08-25 | 1 | -5/+6
  In llvm::UnrollRuntimeLoopRemainder, if the loop to be unrolled is the inner loop inside a loop nest, the scalar evolution needs to be dropped for its parent loop which is done by ScalarEvolution::forgetLoop. However, we can postpone forgetLoop to the end of UnrollRuntimeLoopRemainder so TripCountSC expansion can still reuse existing value.
  Differential Revision: https://reviews.llvm.org/D23572
  llvm-svn: 279748
* [LoopUnroll] Ensure we create prolog loops in simplified form. | Michael Zolotukhin | 2016-08-02 | 1 | -0/+12
  llvm-svn: 277502
* The patch fixes PR27392. | Evgeny Stupachenko | 2016-04-27 | 1 | -9/+10
  Summary: It is incorrect to compare TripCount (which is BECount + 1) with extraiters (or Count) to check if we should enter the unrolled loop or not, because TripCount can potentially overflow (when BECount is max unsigned integer). Comparing BECount with (Count - 1), by contrast, is overflow safe and therefore correct.
  Reviewer: hfinkel
  Differential Revision: http://reviews.llvm.org/D19256
  From: Evgeny Stupachenko <evstupac@gmail.com>
  llvm-svn: 267662
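The overflow described in this commit is easy to reproduce with fixed-width unsigned arithmetic. The following is a minimal sketch in plain C++, assuming a 32-bit counter and hypothetical function names; the exact predicate the pass emits is not reproduced here. The first check forms `BECount + 1`, which wraps to 0 when `BECount` is the maximum value; the second compares `BECount` against `Count - 1` and never wraps as long as `Count` is at least 1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical buggy check: TripCount = BECount + 1 wraps to 0 when
// BECount is the maximum unsigned value, so a very long loop looks as if
// it had zero iterations.
bool enterUnrolledLoopBuggy(std::uint32_t beCount, std::uint32_t count) {
  std::uint32_t tripCount = beCount + 1; // may wrap around to 0
  return tripCount >= count;
}

// Hypothetical overflow-safe check: Count - 1 cannot wrap for Count >= 1,
// and BECount is compared as-is.
bool enterUnrolledLoopSafe(std::uint32_t beCount, std::uint32_t count) {
  assert(count >= 1);
  return beCount >= count - 1;
}
```

For ordinary values the two checks agree (for example `beCount = 3`, `count = 4`), but with `beCount = UINT32_MAX` the buggy version wrongly skips the unrolled loop.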
* Transforms: Fix bootstrap after r266565 | Duncan P. N. Exon Smith | 2016-04-17 | 1 | -4/+4
  Apparently there isn't test coverage for all of these. I'd appreciate if someone with could reproduce and send me something to reduce, but for now I've just looked for users of RemapInstruction and MapValue and ensured they don't accidentally insert nullptr.

  Here is one of the bootstraps that caught: http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/11494
  llvm-svn: 266567
* test commit | Evgeny Stupachenko | 2016-04-08 | 1 | -2/+2
  llvm-svn: 265840
* IR: RF_IgnoreMissingValues => RF_IgnoreMissingLocals, NFC | Duncan P. N. Exon Smith | 2016-04-07 | 1 | -1/+1
  Clarify what this RemapFlag actually means.

  - Change the flag name to match its intended behaviour.
  - Clearly document that it's not supposed to affect globals.
  - Add a host of FIXMEs to indicate how to fix the behaviour to match the intent of the flag.

  RF_IgnoreMissingLocals should only affect the behaviour of RemapInstruction for function-local operands; namely, for operands of type Argument, Instruction, and BasicBlock. Currently, it is *only* passed into RemapInstruction calls (and the transitive MapValue calls that it makes).

  When I split Metadata from Value I didn't understand the flag, and I used it in a bunch of places for "global" metadata.

  This commit doesn't have any functionality change, but prepares to cleanup MapMetadata and MapValue.
  llvm-svn: 265628
* Adds the ability to use an epilog remainder loop during loop unrolling and makes this the default behavior. | David L Kreitzer | 2016-04-05 | 1 | -74/+326
  Patch by Evgeny Stupachenko (evstupac@gmail.com).
  Differential Revision: http://reviews.llvm.org/D18158
  llvm-svn: 265388
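The difference between a prolog and an epilog remainder is only where the leftover iterations run. The sketch below, in plain C++ with hypothetical names and an assumed unroll factor of 4, records one character per iteration: `r` for an iteration executed by the remainder loop and `u` for one executed by the unrolled body.

```cpp
#include <cstddef>
#include <string>

// Hypothetical prolog layout: the n % 4 leftover iterations run first,
// then the unrolled loop covers the rest in groups of 4.
std::string prologOrder(std::size_t n) {
  std::string trace;
  const std::size_t extra = n % 4;
  std::size_t i = 0;
  for (; i < extra; ++i)
    trace += 'r';
  for (; i < n; i += 4) // n - extra is a multiple of 4
    trace += "uuuu";
  return trace;
}

// Hypothetical epilog layout: the unrolled loop runs first, and the
// leftover iterations run afterwards.
std::string epilogOrder(std::size_t n) {
  std::string trace;
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4)
    trace += "uuuu";
  for (; i < n; ++i)
    trace += 'r';
  return trace;
}
```

For `n = 6`, `prologOrder` yields `rruuuu` and `epilogOrder` yields `uuuurr`; when `n` is a multiple of 4 both yield only `u` characters.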
* Enable non-power-of-2 #pragma unroll counts. | David L Kreitzer | 2016-03-25 | 1 | -21/+31
  Patch by Evgeny Stupachenko.
  Differential Revision: http://reviews.llvm.org/D18202
  llvm-svn: 264407
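One arithmetic detail that non-power-of-2 counts bring up is how to compute the number of leftover iterations, `TripCount mod Count`, when `TripCount = BECount + 1` may wrap. The sketch below is an assumption-laden illustration in plain C++, using 8-bit counters so the wrap is easy to exercise; it is not claimed to be the code this patch added. For a power-of-2 count, masking the wrapped trip count still gives the right answer because the wrap happens at a multiple of the count. For an arbitrary count, reducing `BECount` first keeps the `+ 1` from wrapping.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical power-of-2 case: mask the (possibly wrapped) trip count.
// Wrapping at 2^8 is harmless because 2^8 is a multiple of Count.
std::uint8_t extraItersPow2(std::uint8_t beCount, std::uint8_t count) {
  assert(count != 0 && (count & (count - 1)) == 0);
  const std::uint8_t tripCount = static_cast<std::uint8_t>(beCount + 1);
  return static_cast<std::uint8_t>(tripCount & (count - 1));
}

// Hypothetical general case: reduce BECount modulo Count before adding 1,
// so the intermediate value stays below Count + 1 and cannot wrap.
std::uint8_t extraItersAny(std::uint8_t beCount, std::uint8_t count) {
  assert(count != 0);
  return static_cast<std::uint8_t>(((beCount % count) + 1) % count);
}
```

With `beCount = 255` the true trip count is 256: `extraItersPow2(255, 4)` gives 0 and `extraItersAny(255, 3)` gives 1, matching 256 mod 4 and 256 mod 3.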
* [SCEVExpander] Make findExistingExpansion smarter | Junmo Park | 2016-02-16 | 1 | -3/+5
  Summary: Extending findExistingExpansion can use existing value in ExprValueMap. This patch gives 0.3~0.5% performance improvements on benchmarks (test-suite, spec2000, spec2006, commercial benchmark).
  Reviewers: mzolotukhin, sanjoy, zzheng
  Differential Revision: http://reviews.llvm.org/D15559
  llvm-svn: 260938
* Fix typo in comment. | Justin Lebar | 2016-02-12 | 1 | -1/+1
  llvm-svn: 260731
* rangify; NFC | Sanjay Patel | 2016-02-08 | 1 | -7/+5
  llvm-svn: 260151
* fix typos; NFC | Sanjay Patel | 2016-02-08 | 1 | -17/+17
  llvm-svn: 260130
* Minor code formatting cleanup. NFC. | Junmo Park | 2016-01-28 | 1 | -1/+1
  llvm-svn: 259010
* LPM: Stop threading `Pass *` through all of the loop utility APIs. NFC | Justin Bogner | 2015-12-15 | 1 | -17/+10
  A large number of loop utility functions take a `Pass *` and reach into it to find out which analyses to preserve. There are a number of problems with this:

  - The APIs have access to pretty well any Pass state they want, so it's hard to tell what they may or may not do.
  - Other APIs have copied these and pass around a `Pass *` even though they don't even use it. Some of these just hand a nullptr to the API since the callers don't even have a pass available.
  - Passes in the new pass manager don't work like the current ones, so the APIs can't be used as is there.

  Instead, we should explicitly thread the analysis results that we actually care about through these APIs. This is both simpler and more reusable.
  llvm-svn: 255669
* TransformUtils: Remove implicit ilist iterator conversions, NFC | Duncan P. N. Exon Smith | 2015-10-13 | 1 | -5/+5
  Continuing the work from last week to remove implicit ilist iterator conversions. First related commit was probably r249767, with some more motivation in r249925. This edition gets LLVMTransformUtils compiling without the implicit conversions.

  No functional change intended.
  llvm-svn: 250142
* Fix Clang-tidy modernize-use-nullptr warnings in source directories and generated files; other minor cleanups. | Hans Wennborg | 2015-10-06 | 1 | -1/+1
  Patch by Eugene Zelenko!
  Differential Revision: http://reviews.llvm.org/D13321
  llvm-svn: 249482
* [PM] Port ScalarEvolution to the new pass manager. | Chandler Carruth | 2015-08-17 | 1 | -6/+7
  This change makes ScalarEvolution a stand-alone object and just produces one from a pass as needed. Making this work well requires making the object movable, using references instead of overwritten pointers in a number of places, and other refactorings.

  I've also wired it up to the new pass manager and added a RUN line to a test to exercise it under the new pass manager. This includes basic printing support much like with other analyses.

  But there is a big and somewhat scary change here. Prior to this patch ScalarEvolution was never *actually* invalidated!!! Re-running the pass just re-wired up the various other analyses and didn't remove any of the existing entries in the SCEV caches or clear out anything at all. This might seem OK as everything in SCEV that can uses ValueHandles to track updates to the values that serve as SCEV keys. However, this still means that as we ran SCEV over each function in the module, we kept accumulating more and more SCEVs into the cache. At the end, we would have a SCEV cache with every value that we ever needed a SCEV for in the entire module!!! Yowzers. The releaseMemory routine would dump all of this, but that isn't really called during normal runs of the pipeline as far as I can see.

  To make matters worse, there *is* actually a key that we don't update with value handles -- there is a map keyed off of Loop*s. Because LoopInfo *does* release its memory from run to run, it is entirely possible to run SCEV over one function, then over another function, and then lookup a Loop* from the second function but find an entry inserted for the first function! Ouch.

  To make matters still worse, there are plenty of updates that *don't* trip a value handle. It seems incredibly unlikely that today GVN or another pass that invalidates SCEV can update values in *just* such a way that a subsequent run of SCEV will incorrectly find lookups in a cache, but it is theoretically possible and would be a nightmare to debug.

  With this refactoring, I've fixed all this by actually destroying and recreating the ScalarEvolution object from run to run. Technically, this could increase the amount of malloc traffic we see, but then again it is also technically correct. ;] I don't actually think we're suffering from tons of malloc traffic from SCEV because if we were, the fact that we never clear the memory would seem more likely to have come up as an actual problem before now. So, I've made the simple fix here. If in fact there are serious issues with too much allocation and deallocation, I can work on a clever fix that preserves the allocations (while clearing the data) between each run, but I'd prefer to do that kind of optimization with a test case / benchmark that shows why we need such cleverness (and that can test that we actually make it faster). It's possible that this will make some things faster by making the SCEV caches have higher locality (due to being significantly smaller) so until there is a clear benchmark, I think the simple change is best.

  Differential Revision: http://reviews.llvm.org/D12063
  llvm-svn: 245193
* [PM/AA] Remove all of the dead AliasAnalysis pointers being threaded through APIs that are no longer necessary now that the update API has been removed. | Chandler Carruth | 2015-07-22 | 1 | -5/+5
  This will make changes to the AA interfaces significantly less disruptive (I hope). Either way, it seems like a really nice cleanup.
  llvm-svn: 242882
* [LoopUnroll] Use undef for phis with no value live | David Majnemer | 2015-07-01 | 1 | -1/+1
  We would create a phi node with a zero initialized operand instead of undef in the case where no value was originally available. This was problematic for x86_mmx which has no null value.
  llvm-svn: 241143
* [LoopUnroll] Use IRBuilder to create branch instructions. | Alexey Samsonov | 2015-06-11 | 1 | -10/+9
  Use IRBuilder::Create(Cond)?Br instead of constructing instructions manually with BranchInst::Create(). It's consistent with other uses of IRBuilder in this pass, and has an additional important benefit:

  Using IRBuilder will ensure that the new branch instruction will get the same debug location as the original terminator instruction it will eventually replace.

  For now I'm not adding a testcase, as currently the original terminator instruction also lacks a debug location due to missing debug location propagation in BasicBlock::splitBasicBlock. That is, the testcase will accompany the fix for the latter I'm going to mail soon.
  llvm-svn: 239550
* [LoopUnrollRuntime] Avoid high-cost trip count computation. | Sanjoy Das | 2015-04-14 | 1 | -4/+8
  Summary: Runtime unrolling of loops needs to emit an expression to compute the loop's runtime trip-count. Avoid runtime unrolling if this computation will be expensive.
  Depends on D8993.
  Reviewers: atrick
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D8994
  llvm-svn: 234846
* [LoopUnrollRuntime] Clean up a predicate. | Sanjoy Das | 2015-04-12 | 1 | -3/+2
  Clean up a predicate I added in r229731, fix the relevant comment and add a test case. The earlier version is confusing to read and was also buggy (probably not a coincidence) till Alexey fixed it in r233881.
  llvm-svn: 234701
* Fix a bug indicated by -fsanitize=shift-exponent. | Alexey Samsonov | 2015-04-02 | 1 | -1/+1
  llvm-svn: 233881
* DataLayout is mandatory, update the API to reflect it with references. | Mehdi Amini | 2015-03-10 | 1 | -1/+3
  Summary: Now that the DataLayout is a mandatory part of the module, let's start cleaning the codebase. This patch is a first attempt at doing that.

  This patch is not exactly NFC as for instance some places were passing a nullptr instead of the DataLayout, possibly just because there was a default value on the DataLayout argument to many functions in the API. Even though it is not purely NFC, there is no change in the validation.

  I turned as many pointers to DataLayout into references as I could; this helped in figuring out all the places where a nullptr could come up.

  I had initially a local version of this patch broken into over 30 independent commits, but some later commits were cleaning the API and touching part of the code modified in the previous commits, so it seemed cleaner without the intermediate state.

  Test Plan:
  Reviewers: echristo
  Subscribers: llvm-commits
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 231740
* Revert r231630 - Run LICM pass after loop unrolling pass. | Kevin Qin | 2015-03-09 | 1 | -3/+6
  As it broke llvm bootstrap.
  llvm-svn: 231635
* Run LICM pass after loop unrolling pass. | Kevin Qin | 2015-03-09 | 1 | -6/+3
  Runtime unrolling will introduce a runtime check in the loop prologue. If the unrolled loop is an inner loop, then the prologue will be inside the outer loop. The LICM pass can help to promote the runtime check out if the checked value is loop invariant.
  llvm-svn: 231630