path: root/llvm/test/Transforms/LoopUnroll
* [LoopUnroll] Honor '#pragma unroll' even with -fno-unroll-loops.  (Michael Kruse, 2018-12-18; 1 file, -0/+30)
  When using clang with `-fno-unroll-loops` (implicitly added with `-O1`), the LoopUnrollPass is not added to the (legacy) pass pipeline. This also means that it will not process any loop metadata such as llvm.loop.unroll.enable (which is generated by #pragma unroll), and WarnMissedTransformationsPass emits a warning that a forced transformation has not been applied (see https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610833.html). Such explicit transformations should take precedence over disabling heuristics.
  This patch unconditionally adds LoopUnrollPass to the optimizing pipeline (that is, it is still not added with `-O0`), but passes a flag indicating whether automatic unrolling is dis-/enabled. This is the same approach as LoopVectorize uses.
  The new pass manager's pipeline builder has no option to disable unrolling, hence the problem does not apply there.
  Differential Revision: https://reviews.llvm.org/D55716
  llvm-svn: 349509
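  For illustration only (not part of the commit), a minimal C sketch of the case this change targets; the function name and loop bound are made up:

      /* Hypothetical example: compile with
       *   clang -O1 -fno-unroll-loops example.c
       * -O1 disables the unrolling heuristics, but the explicit pragma
       * attaches llvm.loop.unroll.enable metadata, which the unroller
       * is now expected to honor. */
      void scale8(float *a) {
        #pragma unroll
        for (int i = 0; i < 8; i++)
          a[i] *= 2.0f;
      }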
* [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.  (Michael Kruse, 2018-12-12; 5 files, -0/+184)
  When multiple loop transformations are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance,
      #pragma clang loop unroll_and_jam(enable)
      #pragma clang loop distribute(enable)
  is the same as
      #pragma clang loop distribute(enable)
      #pragma clang loop unroll_and_jam(enable)
  and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after the UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also, the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used.
  This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,
      !0 = !{!0, !1, !2}
      !1 = !{!"llvm.loop.unroll_and_jam.enable"}
      !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
      !3 = !{!"llvm.loop.distribute.enable"}
  defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.
  Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account.
  For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patch introduces a WarnMissedTransformations pass, to warn about orphaned transformations.
  Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated.
  To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.
  With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).
  Reviewed By: hfinkel, dmgreen
  Differential Revision: https://reviews.llvm.org/D49281
  Differential Revision: https://reviews.llvm.org/D55288
  llvm-svn: 348944
* [LoopUnroll] allow customization for new-pass-manager version of LoopUnroll  (Fedor Sergeev, 2018-10-31; 2 files, -1/+35)
  Unlike its legacy counterpart, the new pass manager's LoopUnrollPass does not provide any means to select which flavors of unroll to run (runtime, peeling, partial), relying on global defaults.
  In some cases, having the ability to run a restricted LoopUnroll that does more than LoopFullUnroll is needed.
  Introduced LoopUnrollOptions to select optional unroll behaviors. Added 'unroll<peeling>' to PassRegistry mainly for the sake of testing.
  Reviewers: chandlerc, tejohnson
  Differential Revision: https://reviews.llvm.org/D53440
  llvm-svn: 345723
* [LoopUnroll] NFC. Factor out runtime-loop.ll common test behavior.  (Fedor Sergeev, 2018-10-29; 1 file, -19/+24)
  Adding a COMMON prefix to get the common part handled there. Needed to simplify test changes for D53440.
  llvm-svn: 345538
* [ARM] Use Cortex-A57 sched model for Cortex-A72  (Sam Parker, 2018-10-25; 1 file, -0/+1)
  This mirrors what we already do for AArch64 as the cores are similar. As discussed in the review, enabling the machine scheduler causes more variations in performance changes so it is not enabled for now. This patch improves LNT scores by a geomean of 1.57% at -O3.
  Differential Revision: https://reviews.llvm.org/D53562
  llvm-svn: 345272
* Remove LoopID metadata from the branch instruction that follows the peeled iterations.  (Vyacheslav Zakharin, 2018-09-26; 2 files, -15/+57)
  Differential Revision: https://reviews.llvm.org/D52176
  llvm-svn: 343054
* [LoopUnroll] Add check to Latch's terminator in UnrollRuntimeLoopRemainder  (David Green, 2018-09-25; 1 file, -0/+27)
  In this patch, I'm adding an extra check to the Latch's terminator in llvm::UnrollRuntimeLoopRemainder, similar to how it is already done in llvm::UnrollLoop. The compiler would crash if this function is called with a malformed loop.
  Patch by Rodrigo Caetano Rocha!
  Differential Revision: https://reviews.llvm.org/D51486
  llvm-svn: 342958
* AMDGPU: Fix tests using old number for constant address space  (Matt Arsenault, 2018-09-10; 1 file, -2/+2)
  llvm-svn: 341770
* [X86] Make Feature64Bit useful  (Craig Topper, 2018-08-30; 1 file, -1/+1)
  We now only add +64bit to the CPU string for the "generic" CPU. All other CPU names are assumed to have the feature flag already set if they support 64-bit.
  I've removed the implies from CMPXCHG8 so that Feature64Bit only comes in via CPUs or the user passing -mattr=+64bit.
  I've changed the assert to a report_fatal_error so it's not lost in Release builds.
  The test updates are to fix things that tripped the new error.
  Differential Revision: https://reviews.llvm.org/D51231
  llvm-svn: 341022
* LoopUnroll: Allow analyzing intrinsic call costs  (Matt Arsenault, 2018-06-26; 1 file, -0/+77)
  I'm not sure why the code here is skipping calls since TTI does try to do something for general calls, but it at least should allow intrinsics. Skip intrinsics that should not be omitted as calls, which is by far the most common case on AMDGPU.
  llvm-svn: 335645
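  As a rough illustration (not from the commit), a C loop whose body contains a call that lowers to an intrinsic; assuming fabsf becomes llvm.fabs.f32, the unroll cost analysis can now model the body instead of bailing out on the call:

      #include <math.h>

      /* Hypothetical example: fabsf typically lowers to the llvm.fabs.f32
       * intrinsic, so the loop body is no longer treated as an opaque
       * call when estimating the unrolled cost. */
      float sum_abs(const float *a) {
        float s = 0.0f;
        for (int i = 0; i < 16; i++)
          s += fabsf(a[i]);
        return s;
      }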
* [DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.  (Shiva Chen, 2018-05-09; 3 files, -5/+5)
  In order to set breakpoints on labels and list source code around labels, we need to collect debug information for labels, i.e., the label name, the function the label belongs to, the line number in the file, and the address where the label is located. In order to keep this information in LLVM IR and to allow the backend to generate debug information correctly, we create a new kind of metadata for labels, DILabel. The format of DILabel is
      !DILabel(scope: !1, name: "foo", file: !2, line: 3)
  We hope to keep debug information as much as possible even when the code is optimized. So, we create a new kind of intrinsic for label metadata to avoid the metadata being eliminated with the basic block. The intrinsic will keep existing if we keep it from being optimized out. The format of the intrinsic is
      llvm.dbg.label(metadata !1)
  It has only one argument, that is the DILabel metadata. The intrinsic will follow the label immediately. The backend can get the label metadata through the intrinsic's parameter.
  We also create DIBuilder API for labels to be used by the Frontend. The Frontend can use createLabel() to allocate DILabel objects, and use insertLabel() to insert the llvm.dbg.label intrinsic in LLVM IR.
  Differential Revision: https://reviews.llvm.org/D45024
  Patch by Hsiangkai Wang.
  llvm-svn: 331841
* [LoopUnroll] Only peel if a predicate becomes known in the loop body.  (Florian Hahn, 2018-04-18; 1 file, -128/+141)
  If a predicate does not become known after peeling, peeling is unlikely to be beneficial.
  Reviewers: mcrosier, efriedma, mkazantsev, junbuml
  Reviewed By: mkazantsev
  Differential Revision: https://reviews.llvm.org/D44983
  llvm-svn: 330250
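  As a made-up counter-example (not from the commit), a condition that peeling can never resolve, so peeling is not expected to be attempted for it:

      /* Hypothetical example: the branch depends on (i & 1), which never
       * becomes a compile-time constant no matter how many leading
       * iterations are peeled, so peeling brings no benefit here. */
      int sum_even_indices(const int *a, int n) {
        int s = 0;
        for (int i = 0; i < n; i++)
          if ((i & 1) == 0)
            s += a[i];
        return s;
      }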
* [LoopUnroll] Make LoopPeeling respect the AllowPeeling preference.  (Chad Rosier, 2018-04-06; 1 file, -0/+3)
  The SimpleLoopUnrollPass isn't supposed to perform loop peeling.
  Differential Revision: https://reviews.llvm.org/D45334
  llvm-svn: 329395
* peel loops with runtime small trip counts  (Ikhlas Ajbar, 2018-04-03; 1 file, -0/+37)
  For Hexagon, peeling loops with small runtime trip count is beneficial for our benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use the PeelCount set by the target for computing the desired peel count.
  Differential Revision: https://reviews.llvm.org/D44880
  llvm-svn: 329042
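  A hypothetical sketch (not from the commit) of what peeling by a target-chosen PeelCount of 2 means for a loop whose runtime trip count is usually small:

      /* Hypothetical illustration: with n usually <= 2, the peeled
       * copies handle the common case without a backedge, and the
       * remaining loop is rarely entered. */
      void accum(int *dst, const int *src, int n) {
        int i = 0;
        if (i < n) { dst[i] += src[i]; i++; }  /* peeled iteration 1 */
        if (i < n) { dst[i] += src[i]; i++; }  /* peeled iteration 2 */
        for (; i < n; i++)                     /* rarely-taken rest  */
          dst[i] += src[i];
      }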
* Revert "peel loops with runtime small trip counts"Krzysztof Parzyszek2018-03-301-37/+0
| | | | | | This reverts commit r328854, it breaks some Hexagon tests. llvm-svn: 328875
* [Hexagon] add missing lit config file  (Ikhlas Ajbar, 2018-03-30; 1 file, -0/+3)
  llvm-svn: 328855
* peel loops with runtime small trip counts  (Ikhlas Ajbar, 2018-03-30; 1 file, -0/+37)
  For Hexagon, peeling loops with small runtime trip count is beneficial for our benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use the PeelCount set by the target for computing the desired peel count.
  Differential Revision: https://reviews.llvm.org/D44880
  llvm-svn: 328854
* [LoopUnroll] Fix dangling pointers in SCEV  (Max Kazantsev, 2018-03-26; 1 file, -0/+51)
  The current logic of loop SCEV invalidation in the Loop Unroller implicitly relies on the fact that the exit count of outer loops cannot rely on exiting blocks of inner loops, which is true in the current implementation of backedge taken count calculation but is wrong in general. As a result, when we only forget the loop that we have just unrolled, we may still have cached data for its outer loops (in particular, exit counts) which keeps references to blocks of the inner loop that could have been changed or even deleted. The attached test demonstrates a situation where, after unrolling of the innermost loop, the outermost loop contains a dangling pointer to a non-existent block.
  The problem shows up when we apply patch https://reviews.llvm.org/D44677 that makes SCEV smarter about exit count calculation. I am not sure if the bug exists without this patch; it appears that now it is accidentally correct just because in practice the exact backedge taken count for outer loops with complex control flow inside is never calculated. But when SCEV learns to do so, this problem shows up.
  This patch replaces the existing logic of SCEV loop invalidation with a correct one, which happens to be invalidation of the outermost loop (which also leads to invalidation of all loops inside of it). It is the only way to ensure that no outer loop keeps dangling pointers to removed blocks, or just outdated information that has changed after unrolling.
  Differential Revision: https://reviews.llvm.org/D44818
  Reviewed By: samparker
  llvm-svn: 328483
* [LoopUnroll] Simplify induction variables after peeling too.  (Florian Hahn, 2018-03-23; 1 file, -15/+10)
  Loop peeling also has an impact on the induction variables, so we should benefit from induction variable simplification after peeling too.
  Reviewers: sanjoy, bogner, mzolotukhin, efriedma
  Reviewed By: efriedma
  Differential Revision: https://reviews.llvm.org/D43878
  llvm-svn: 328301
* [LoopUnroll] Peel off iterations if it makes conditions true/false.  (Florian Hahn, 2018-03-15; 2 files, -1/+614)
  If the loop body contains conditions of the form IndVar < #constant, we can remove the checks by peeling off #constant iterations. This improves codegen for PR34364.
  Reviewers: mkuper, mkazantsev, efriedma
  Reviewed By: mkazantsev
  Differential Revision: https://reviews.llvm.org/D43876
  llvm-svn: 327671
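  As an illustrative sketch (mine, not taken from the patch or its tests), a loop where the IndVar < #constant condition disappears once two iterations are peeled:

      /* Hypothetical example: (i < 2) is true only in iterations 0 and 1.
       * Peeling those two iterations leaves a loop body with no branch
       * at all. */
      void fill(int *a, int n) {
        for (int i = 0; i < n; i++) {
          if (i < 2)
            a[i] = 0;          /* only the first two iterations */
          else
            a[i] = a[i - 2];   /* every remaining iteration     */
        }
      }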
* [LoopUnroll] Ignore ephemeral values when checking full unroll profitability.  (Andrei Elovikov, 2018-03-15; 1 file, -0/+119)
  Summary:
  Before this patch the call graph is like this in the LoopUnrollPass:
      tryToUnrollLoop
        ApproximateLoopSize
          collectEphemeralValues
          /* Use collected ephemeral values */
        computeUnrollCount
          analyzeLoopUnrollCost
            /* Bail out from the analysis if loop contains CallInst */
  This patch moves collection of the ephemeral values to the tryToUnrollLoop function and passes the collected values into both ApproximateLoopSize (as before) and additionally starts using them in analyzeLoopUnrollCost:
      tryToUnrollLoop
        collectEphemeralValues
        ApproximateLoopSize(EphValues)
          /* Use EphValues */
        computeUnrollCount(EphValues)
          analyzeLoopUnrollCost(EphValues)
            /* Ignore ephemeral values - they don't contribute to the final cost */
            /* Bail out from the analysis if loop contains CallInst */
  Reviewers: mzolotukhin, evstupac, sanjoy
  Reviewed By: evstupac
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D43931
  llvm-svn: 327617
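  As a rough illustration (not from the patch), an ephemeral value in C: the comparison below only feeds __builtin_assume, so it should not count toward the estimated cost of fully unrolling the loop:

      /* Hypothetical example: the a[i] >= 0 comparison exists only to
       * carry the assumption and is ephemeral; with this change it no
       * longer inflates the full-unroll cost estimate. */
      int sum16(const int *a) {
        int s = 0;
        for (int i = 0; i < 16; i++) {
          __builtin_assume(a[i] >= 0);
          s += a[i];
        }
        return s;
      }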
* LoopUnroll: respect pragma unroll when AllowRemainder is disabled  (Yaxun Liu, 2018-03-02; 2 files, -26/+133)
  Currently, when AllowRemainder is disabled, the pragma unroll count is not respected even though there is no remainder. This bug causes a loop to be fully unrolled in many cases even though the user specifies an unroll count. It especially affects OpenCL/CUDA since in many cases a loop contains convergent instructions and currently AllowRemainder is disabled for such loops.
  Differential Revision: https://reviews.llvm.org/D43826
  llvm-svn: 326585
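  A minimal sketch (my example, not the patch's test case) of a pragma-specified unroll count that needs no remainder, so it can be honored even with AllowRemainder disabled:

      /* Hypothetical example: the trip count (64) is divisible by the
       * requested factor (4), so unrolling by 4 produces no remainder
       * loop and the pragma count can be respected. */
      void saxpy64(float *y, const float *x, float a) {
        #pragma unroll 4
        for (int i = 0; i < 64; i++)
          y[i] += a * x[i];
      }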
* [SimplifyCFG] Re-apply Relax restriction for folding unconditional branches  (Serguei Katkov, 2018-02-08; 1 file, -9/+4)
  The commit rL308422 introduces a restriction for folding unconditional branches: specifically, if an empty block with an unconditional branch leads to the header of a loop, then elimination of this basic block is prohibited. However, it seems this condition is redundantly strict. If elimination of this basic block does not introduce more back edges, then we can eliminate this block.
  The patch implements this relaxation of the restriction.
  The test profile/Linux/counter_promo_nest.c in the compiler-rt project is updated to meet this change.
  Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl
  Reviewed By: pacxx
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D42691
  llvm-svn: 324572
* Revert [SimplifyCFG] Relax restriction for folding unconditional branches  (Serguei Katkov, 2018-02-05; 1 file, -4/+9)
  The patch causes the failure of the test compiler-rt/test/profile/Linux/counter_promo_nest.c
  To unblock the buildbot, revert the patch while investigation is in progress.
  Differential Revision: https://reviews.llvm.org/D42691
  llvm-svn: 324214
* [SimplifyCFG] Relax restriction for folding unconditional branches  (Serguei Katkov, 2018-02-05; 1 file, -9/+4)
  The commit rL308422 introduces a restriction for folding unconditional branches: specifically, if an empty block with an unconditional branch leads to the header of a loop, then elimination of this basic block is prohibited. However, it seems this condition is redundantly strict. If elimination of this basic block does not introduce more back edges, then we can eliminate this block.
  The patch implements this relaxation of the restriction.
  Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl
  Reviewed By: pacxx
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D42691
  llvm-svn: 324208
* [AMDGPU] Switch to the new addr space mapping by default  (Yaxun Liu, 2018-02-02; 1 file, -21/+21)
  This requires a corresponding clang change.
  Differential Revision: https://reviews.llvm.org/D40955
  llvm-svn: 324101
* [Unroll][DebugInfo] Propagate loop body's debug location to epilog preheader  (Zhaoshi Zheng, 2017-12-26; 2 files, -2/+132)
  NewExit and the epilog PreHeader should have the same debug loc as the original loop body, instead of the original loop exit.
  llvm-svn: 321465
* loop-unroll: teach remapInstruction to update dbg.value intrinsics.  (Adrian Prantl, 2017-11-01; 1 file, -0/+75)
  Fixes PR35112.
  https://bugs.llvm.org/show_bug.cgi?id=35112
  llvm-svn: 317138
* [LoopUnroll] Clean up remarks for unroll remainder  (David Green, 2017-10-31; 1 file, -1/+24)
  The optimisation remarks for loop unrolling with an unrolled remainder look something like:
      test.c:7:18: remark: completely unrolled loop with 3 iterations [-Rpass=loop-unroll]
            C[i] += A[i*N+j];
                    ^
      test.c:6:9: remark: unrolled loop by a factor of 4 with run-time trip count [-Rpass=loop-unroll]
            for(int j = 0; j < N; j++)
            ^
  This removes the first of the two messages.
  Differential revision: https://reviews.llvm.org/D38725
  llvm-svn: 316986
* [ARM] Allow unrolling of multi-block loops.  (Sam Parker, 2017-10-23; 1 file, -0/+316)
  Before, loop unrolling was only enabled for loops with a single block. This restriction has been removed and replaced by:
  - allow a maximum of two exiting blocks,
  - a four basic block limit for cores with a branch predictor.
  Differential Revision: https://reviews.llvm.org/D38952
  llvm-svn: 316313
* [ValueTracking] Enabling ValueTracking patch by default  (Nikolai Bozhenov, 2017-10-20; 1 file, -1/+1)
  (recommit #2 after checking for timeout issue).
  The original patch was an improvement to IR ValueTracking on non-negative integers. It has been checked in to trunk (D18777, r284022) but was disabled by default due to performance regressions. The perf impact has since improved, so the patch is now enabled by default.
  Reviewers: reames, hfinkel
  Differential Revision: https://reviews.llvm.org/D34101
  Patch by: Olga Chupina <olga.chupina@intel.com>
  llvm-svn: 316208
* The cost of splitting a large vector instruction is not being taken into account by the getUserCost function  (Graham Yiu, 2017-10-19; 1 file, -0/+74)
  This was leading to some loops being over-unrolled. The cost of a vector instruction is now being multiplied by the cost of the type legalization. This will return a more accurate cost.
  Committing on behalf of Brad Nemanich (brad.nemanich@ibm.com)
  Differential Revision: https://reviews.llvm.org/D38961
  llvm-svn: 316174
* Use a BumpPtrAllocator for Loop objects  (Sanjoy Das, 2017-09-28; 1 file, -2/+2)
  Summary:
  And now that we no longer have to explicitly free() the Loop instances, we can (with more ease) use the destructor of LoopBase to do what LoopBase::clear() was doing.
  Reviewers: chandlerc
  Subscribers: mehdi_amini, mcrosier, llvm-commits
  Differential Revision: https://reviews.llvm.org/D38201
  llvm-svn: 314375
* Tighten the invariants around LoopBase::invalidate  (Sanjoy Das, 2017-09-20; 1 file, -2/+2)
  Summary:
  With this change:
  - Methods in LoopBase trip an assert if the receiver has been invalidated
  - LoopBase::clear frees up the memory held by the LoopBase instance
  This change also shuffles things around as necessary to work with this stricter invariant.
  Reviewers: chandlerc
  Subscribers: mehdi_amini, mcrosier, llvm-commits
  Differential Revision: https://reviews.llvm.org/D38055
  llvm-svn: 313708
* [RuntimeUnroll] Add heuristic for unrolling multi-exit loop  (Anna Thomas, 2017-09-15; 1 file, -0/+94)
  Add a profitability heuristic to enable runtime unrolling of multi-exit loops: there can be at most two unique exit blocks for the loop, and the second exit block should be a deoptimizing block. Also, there can be one other exiting block besides the latch exiting block. The reason for the latter is so that we limit the number of branches in the unrolled code to being at most the unroll factor. Deoptimizing blocks are rarely taken, so the additional branches created by the unrolling are predictable, since one of their targets is the deopt block.
  Reviewers: apilipenko, reames, evstupac, mkuper
  Subscribers: llvm-commits
  Reviewed by: reames
  Differential Revision: https://reviews.llvm.org/D35380
  llvm-svn: 313363
* [RuntimeUnrolling] Populate the VMap entry correctly when default generated through lookup  (Anna Thomas, 2017-09-15; 1 file, -0/+44)
  During runtime unrolling on loops with multiple exits, we update the exit blocks with the correct phi values from both the original and remainder loop.
  In this process, we look up the VMap for the mapped incoming phi values, but did not update the VMap if a default entry was generated in the VMap during the lookup. This default value is generated when constants or values outside the current loop are looked up.
  This patch fixes the assertion failure when null entries are present in the VMap because of this lookup. Added a testcase that showcases the problem.
  llvm-svn: 313358
* [LoopUnroll][DebugInfo] Don't add metadata to unrolled remainder loop  (Sam Parker, 2017-09-04; 1 file, -2/+2)
  Debug information can be, and was, corrupted when the runtime remainder loop was fully unrolled. This is because a !null node can be created instead of a unique one describing the loop. In this case, the original node gets incorrectly updated with the NewLoopID metadata. In the case when the remainder loop is going to be quickly fully unrolled, there isn't the need to add loop metadata for it anyway.
  Differential Revision: https://reviews.llvm.org/D37338
  llvm-svn: 312471
* [LoopUnroll] Make the test for PR33437 actually useful.  (Davide Italiano, 2017-08-29; 1 file, -14/+27)
  I forgot to specify -unroll-loop-peel, making this test not really effective. While here, adjust some details (naming and run line). Thanks to Sanjoy and Michael Z. for pointing out in their post-commit reviews.
  llvm-svn: 312015
* [LoopUnroll] Properly update loop structure in case of successful peeling.  (Davide Italiano, 2017-08-28; 1 file, -0/+30)
  When peeling kicks in, it updates the loop preheader. Later, a successful full unroll of the loop needs to update a PHI whose i-th argument comes from the loop preheader, so it'd better look at the correct block.
  Fixes PR33437.
  Differential Revision: https://reviews.llvm.org/D37153
  llvm-svn: 311922
* Changed basic cost of store operation on X86  (Elena Demikhovsky, 2017-08-20; 1 file, -0/+104)
  A store operation takes 2 UOps on X86 processors. The exact cost calculation affects several optimization passes including loop unrolling. This change compensates for the performance degradation caused by https://reviews.llvm.org/D34458 and shows improvements on some benchmarks.
  Differential Revision: https://reviews.llvm.org/D35888
  llvm-svn: 311285
* [ARM] Improve loop unrolling for Cortex-M  (Sam Parker, 2017-08-16; 1 file, -92/+128)
  - Set the default runtime unroll count to 4 and use the newly added UnrollRemainder option.
  - Create loop cost and force unroll for a cost less than 12.
  - Disable unrolling on Thumb1 only targets.
  Differential Revision: https://reviews.llvm.org/D36134
  llvm-svn: 310997
* [LoopUnroll] Enable option to peel remainder loop  (Sam Parker, 2017-08-14; 1 file, -0/+74)
  On some targets, the penalty of executing runtime unrolling checks and then not the unrolled loop can be significantly detrimental to performance. This results in the need to be more conservative with the unroll count; keeping a trip count of 2 reduces the overhead as well as increasing the chance of the unrolled body being executed. But being conservative leaves performance gains on the table.
  This patch enables the unrolling of the remainder loop introduced by runtime unrolling. This can help reduce the overhead of misunrolled loops because the cost of non-taken branches is much less than the cost of the backedge that would normally be executed in the remainder loop. This allows larger unroll factors to be used without suffering performance losses with smaller iteration counts.
  Differential Revision: https://reviews.llvm.org/D36309
  llvm-svn: 310824
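  A hand-written sketch (not from the patch) of the shape this aims for: the main loop is runtime-unrolled by 4, and the up-to-three leftover iterations are emitted straight-line instead of as a small loop with a backedge:

      /* Hypothetical illustration of a runtime-unrolled loop whose
       * remainder is itself unrolled into conditional straight-line code. */
      void scale(float *a, int n) {
        int i = 0;
        for (; i + 3 < n; i += 4) {     /* main loop, unrolled by 4 */
          a[i]     *= 2.0f;
          a[i + 1] *= 2.0f;
          a[i + 2] *= 2.0f;
          a[i + 3] *= 2.0f;
        }
        if (i < n) a[i++] *= 2.0f;      /* unrolled remainder: at most */
        if (i < n) a[i++] *= 2.0f;      /* three non-taken branches    */
        if (i < n) a[i++] *= 2.0f;      /* instead of a backedge       */
      }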
* [PM] Relax the spelling of a pass name slightly in this test.  (Chandler Carruth, 2017-08-08; 1 file, -1/+1)
  I forgot that MSVC doesn't preserve this typedef, my bad.
  llvm-svn: 310334
* [PM] Fix new LoopUnroll function pass by invalidating loop analysis results when a loop is completely removed.  (Chandler Carruth, 2017-08-08; 1 file, -0/+107)
  This is very hard to manifest as a visible bug. You need to arrange for there to be a subsequent allocation of a 'Loop' object which gets the exact same address as the one which the unroll deleted, and you need the LoopAccessAnalysis results to be significant in the way that they're stale. And you need a million other things to align. But when it does, you get a deeply mysterious crash due to actually finding a stale analysis result. This fixes the issue and tests for it by directly checking we successfully invalidate things. I have not been able to get *any* test case to reliably trigger this. Changes to LLVM itself caused the only test case I ever had to cease to crash.
  I've looked pretty extensively at less brittle ways of fixing this and they are actually very, very hard to do. This is a somewhat strange and unusual case as we have a pass which is deleting an IR unit, but is not running within that IR unit's pass framework (which is what handles this cleanly for the normal loop unroll). And where there isn't a definitive way to clear *all* of the stale cache entries. And where the pass *is* updating the core analysis that provides the IR units! For example, we don't have any of these problems with Function analyses because it is easy to clear out function analyses when the functions themselves may have been deleted -- we clear an entire module's worth! But that is too heavy of a hammer down here in the LoopAnalysisManager layer.
  A better long-term solution IMO is to require that AnalysisManagers make their keys durable to this kind of thing. Specifically, when caching an analysis for one IR unit that is conceptually "owned" by a higher level IR unit, the AnalysisManager should incorporate this into its data structures so that we can reliably clear these results without having to teach each and every pass to do so manually as we do here. But that is a change for another day as it will be a fairly invasive change to the AnalysisManager infrastructure. Until then, this fortunately seems to be quite rare.
  llvm-svn: 310333
* Use profile summary to disable peeling for huge working sets  (Teresa Johnson, 2017-08-03; 1 file, -18/+37)
  Summary:
  Detect when the working set size of a profiled application is huge, by comparing the number of counts required to reach the hot percentile in the profile summary to a large threshold*.
  When the working set size is determined to be huge, disable peeling to avoid bloating the working set further.
  *Note that the selected threshold (15K) is significantly larger than the largest working set value in SPEC cpu2006 (which is gcc at around 11K).
  Reviewers: davidxl
  Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits
  Differential Revision: https://reviews.llvm.org/D36288
  llvm-svn: 310005
* Disable loop peeling during full unrolling pass.  (Teresa Johnson, 2017-08-03; 1 file, -0/+5)
  Summary:
  Peeling should not occur during the full unrolling invocation early in the pipeline, but rather later with partial and runtime loop unrolling. The later loop unrolling invocation will also eventually utilize profile summary and branch frequency information, which we would like to use to control peeling. And for ThinLTO we want to delay peeling until the backend (post thin link) phase, just as we do for most types of unrolling.
  Ensure peeling doesn't occur during the full unrolling invocation by adding a parameter to the shared implementation function, similar to the way partial and runtime loop unrolling are disabled.
  Performance results for ThinLTO suggest this has a neutral to positive effect on some internal benchmarks.
  Reviewers: chandlerc, davidxl
  Subscribers: mzolotukhin, llvm-commits, mehdi_amini
  Differential Revision: https://reviews.llvm.org/D36258
  llvm-svn: 309966
* [PM] Split LoopUnrollPass and make partial unroller a function pass  (Teresa Johnson, 2017-08-02; 11 files, -58/+58)
  Summary:
  This is largely NFC*, in preparation for utilizing ProfileSummaryInfo and BranchFrequencyInfo analyses.
  In this patch I am only doing the splitting for the New PM, but I can do the same for the legacy PM as a follow-on if this looks good.
  *Not NFC since for partial unrolling we lose the updates done to the loop traversal (adding new sibling and child loops) - according to Chandler this is not very useful for partial unrolling, but it also means that the debugging flag -unroll-revisit-child-loops no longer works for partial unrolling.
  Reviewers: chandlerc
  Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits
  Differential Revision: https://reviews.llvm.org/D36157
  llvm-svn: 309886
* [ARM] Enable partial and runtime unrolling  (Sam Parker, 2017-07-25; 2 files, -0/+213)
  Enable runtime and partial loop unrolling of simple loops without calls on M-class cores. The thresholds are calculated based on whether the target is Thumb or Thumb-2.
  Differential Revision: https://reviews.llvm.org/D34619
  llvm-svn: 308956
* [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure.  (Balaram Makam, 2017-07-19; 1 file, -4/+8)
  Summary:
  When simplifying unconditional branches from empty blocks, we pre-test if the BB belongs to a set of loop headers and keep the block to prevent passes from destroying canonical loop structure. However, the current algorithm fails if the destination of the branch is a loop header. Especially when such a loop's latch block is folded into the loop header, it results in additional backedges, and LoopSimplify turns it into a nested loop, which prevents later optimizations from being applied (e.g., loop unrolling and loop interleaving).
  This patch augments the existing algorithm by further checking if the destination of the branch belongs to a set of loop headers and, if so, deferring its elimination to LateSimplifyCFG.
  Fixes PR33605: https://bugs.llvm.org/show_bug.cgi?id=33605
  Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl
  Reviewed By: efriedma
  Subscribers: ashutosh.nema, gberry, javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D35411
  llvm-svn: 308422
* [RuntimeUnrolling] Update DomTree correctly when exit blocks have successors  (Anna Thomas, 2017-07-13; 1 file, -0/+126)
  Summary:
  When we runtime unroll with multiple exit blocks, we also need to update the immediate dominators of the immediate successors of the exit blocks.
  Reviewers: reames, mkuper, mzolotukhin, apilipenko
  Reviewed by: mzolotukhin
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D35304
  llvm-svn: 307909