summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
Commit message (Collapse)AuthorAgeFilesLines
* Remove trailing spaceFangrui Song2018-07-301-2/+2
| | | | | | sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h} llvm-svn: 338293
* [UnrollAndJam] New Unroll and Jam passDavid Green2018-07-011-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a simple implementation of the unroll-and-jam classical loop optimisation. The basic idea is that we take an outer loop of the form: for i.. ForeBlocks(i) for j.. SubLoopBlocks(i, j) AftBlocks(i) Instead of doing normal inner or outer unrolling, we unroll as follows: for i... i+=2 ForeBlocks(i) ForeBlocks(i+1) for j.. SubLoopBlocks(i, j) SubLoopBlocks(i+1, j) AftBlocks(i) AftBlocks(i+1) Remainder Loop So we have unrolled the outer loop, then jammed the two inner loops into one. This can lead to a simpler inner loop if memory accesses can be shared between the now jammed loops. To do this we have to prove that this is all safe, both for the memory accesses (using dependence analysis) and that ForeBlocks(i+1) can move before AftBlocks(i) and SubLoopBlocks(i, j). Differential Revision: https://reviews.llvm.org/D41953 llvm-svn: 336062
* LoopUnroll: Allow analyzing intrinsic call costsMatt Arsenault2018-06-261-2/+7
| | | | | | | | | | | I'm not sure why the code here is skipping calls since TTI does try to do something for general calls, but it at least should allow intrinsics. Skip intrinsics that should not be omitted as calls, which is by far the most common case on AMDGPU. llvm-svn: 335645
* [NFC] fix trivial typos in commentsHiroshi Inoue2018-06-141-3/+3
| | | | llvm-svn: 334687
* Revert 333358 as it's failing on some builders.David Green2018-05-271-10/+9
| | | | | | I'm guessing the tests reply on the ARM backend being built. llvm-svn: 333359
* [UnrollAndJam] Add a new Unroll and Jam passDavid Green2018-05-271-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a simple implementation of the unroll-and-jam classical loop optimisation. The basic idea is that we take an outer loop of the form: for i.. ForeBlocks(i) for j.. SubLoopBlocks(i, j) AftBlocks(i) Instead of doing normal inner or outer unrolling, we unroll as follows: for i... i+=2 ForeBlocks(i) ForeBlocks(i+1) for j.. SubLoopBlocks(i, j) SubLoopBlocks(i+1, j) AftBlocks(i) AftBlocks(i+1) Remainder So we have unrolled the outer loop, then jammed the two inner loops into one. This can lead to a simpler inner loop if memory accesses can be shared between the now-jammed loops. To do this we have to prove that this is all safe, both for the memory accesses (using dependence analysis) and that ForeBlocks(i+1) can move before AftBlocks(i) and SubLoopBlocks(i, j). Differential Revision: https://reviews.llvm.org/D41953 llvm-svn: 333358
* Rename DEBUG macro to LLVM_DEBUG.Nicola Zaghen2018-05-141-32/+36
| | | | | | | | | | | | | | | | The DEBUG() macro is very generic so it might clash with other projects. The renaming was done as follows: - git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g' - git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM - Manual change to APInt - Manually chage DOCS as regex doesn't match it. In the transition period the DEBUG() macro is still present and aliased to the LLVM_DEBUG() one. Differential Revision: https://reviews.llvm.org/D43624 llvm-svn: 332240
* Remove \brief commands from doxygen comments.Adrian Prantl2018-05-011-3/+3
| | | | | | | | | | | | | | | | We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272
* [Hexagon] peel loops with runtime small trip countsIkhlas Ajbar2018-04-031-3/+0
| | | | | | | | Move the check canPeel() to Hexagon Target before setting PeelCount. Differential Revision: https://reviews.llvm.org/D44880 llvm-svn: 329129
* peel loops with runtime small trip countsIkhlas Ajbar2018-04-031-0/+3
| | | | | | | | | | For Hexagon, peeling loops with small runtime trip count is beneficial for our benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set by the target for computing the desired peel count. Differential Revision: https://reviews.llvm.org/D44880 llvm-svn: 329042
* Transforms: Introduce Transforms/Utils.h rather than spreading the ↵David Blaikie2018-03-281-0/+1
| | | | | | | | | declarations amongst Scalar.h and IPO.h Fixes layering - Transforms/Utils shouldn't depend on including a Scalar or IPO header, because Scalar and IPO depend on Utils. llvm-svn: 328717
* [LoopUnroll] Peel off iterations if it makes conditions true/false.Florian Hahn2018-03-151-1/+1
| | | | | | | | | | | | | | | If the loop body contains conditions of the form IndVar < #constant, we can remove the checks by peeling off #constant iterations. This improves codegen for PR34364. Reviewers: mkuper, mkazantsev, efriedma Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D43876 llvm-svn: 327671
* [LoopUnroll] Ignore ephemeral values when checking full unroll profitability.Andrei Elovikov2018-03-151-20/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Before this patch call graph is like this in the LoopUnrollPass: tryToUnrollLoop ApproximateLoopSize collectEphemeralValues /* Use collected ephemeral values */ computeUnrollCount analyzeLoopUnrollCost /* Bail out from the analysis if loop contains CallInst */ This patch moves collection of the ephemeral values to the tryToUnrollLoop function and passes the collected values into both ApproximateLoopsize (as before) and additionally starts using them in analyzeLoopUnrollCost: tryToUnrollLoop collectEphemeralValues ApproximateLoopSize(EphValues) /* Use EphValues */ computeUnrollCount(EphValues) analyzeLoopUnrollCost(EphValues) /* Ignore ephemeral values - they don't contribute to the final cost */ /* Bail out from the analysis if loop contains CallInst */ Reviewers: mzolotukhin, evstupac, sanjoy Reviewed By: evstupac Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43931 llvm-svn: 327617
* LoopUnroll: respect pragma unroll when AllowRemainder is disabledYaxun Liu2018-03-021-1/+1
| | | | | | | | | | | | | Currently when AllowRemainder is disabled, pragma unroll count is not respected even though there is no remainder. This bug causes a loop fully unrolled in many cases even though the user specifies a unroll count. Especially it affects OpenCL/CUDA since in many cases a loop contains convergent instructions and currently AllowRemainder is disabled for such loops. Differential Revision: https://reviews.llvm.org/D43826 llvm-svn: 326585
* Add hasProfileData() to check if a function has profile data. NFC.Easwaran Raman2017-12-221-1/+1
| | | | | | | | | | | | | | | | | | | Summary: This replaces calls to getEntryCount().hasValue() with hasProfileData that does the same thing. This refactoring is useful to do before adding synthetic function entry counts but also a useful cleanup IMO even otherwise. I have used hasProfileData instead of hasRealProfileData as David had earlier suggested since I think profile implies "real" and I use the phrase "synthetic entry count" and not "synthetic profile count" but I am fine calling it hasRealProfileData if you prefer. Reviewers: davidxl, silvas Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41461 llvm-svn: 321331
* Fix MSVC signed/unsigned comparison warningSimon Pilgrim2017-10-191-1/+1
| | | | llvm-svn: 316161
* [Transforms] Fix some Clang-tidy modernize and Include What You Use ↵Eugene Zelenko2017-10-181-29/+64
| | | | | | warnings; other minor fixes (NFC). llvm-svn: 316128
* [LoopInfo][Refactor] Make SetLoopAlreadyUnrolled a member function of the ↵Hongbin Zheng2017-10-151-38/+1
| | | | | | | | | | Loop Pass, NFC. This avoid code duplication and allow us to add the disable unroll metadata elsewhere. Differential Revision: https://reviews.llvm.org/D38928 llvm-svn: 315850
* [NFC] Convert OptimizationRemarkEmitter old emit() calls to new closureVivek Pandya2017-10-111-27/+37
| | | | | | | | | | | | | | parameterized emit() calls Summary: This is not functional change to adopt new emit() API added in r313691. Reviewed By: anemet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38285 llvm-svn: 315476
* Rename OptimizationDiagnosticInfo.* to OptimizationRemarkEmitter.*Adam Nemet2017-10-091-1/+1
| | | | | | | Sync it up with the name of the class actually defined here. This has been bothering me for a while... llvm-svn: 315249
* [LoopUnroll] Fix use after poison.Benjamin Kramer2017-09-281-1/+3
| | | | llvm-svn: 314418
* Use a BumpPtrAllocator for Loop objectsSanjoy Das2017-09-281-8/+15
| | | | | | | | | | | | | | | Summary: And now that we no longer have to explicitly free() the Loop instances, we can (with more ease) use the destructor of LoopBase to do what LoopBase::clear() was doing. Reviewers: chandlerc Subscribers: mehdi_amini, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D38201 llvm-svn: 314375
* Fix -Wunused-variable for Release build.Rui Ueyama2017-09-271-1/+1
| | | | llvm-svn: 314353
* Return the LoopUnrollResult from tryToUnrollLoop; NFCSanjoy Das2017-09-271-31/+22
| | | | | | I will use this in a later change. llvm-svn: 314352
* Rename LoopUnrollStatus to LoopUnrollResult; NFCSanjoy Das2017-09-271-3/+3
| | | | | | A "Result" suffix is more appropriate here llvm-svn: 314350
* Tighten the invariants around LoopBase::invalidateSanjoy Das2017-09-201-6/+7
| | | | | | | | | | | | | | | | | Summary: With this change: - Methods in LoopBase trip an assert if the receiver has been invalidated - LoopBase::clear frees up the memory held the LoopBase instance This change also shuffles things around as necessary to work with this stricter invariant. Reviewers: chandlerc Subscribers: mehdi_amini, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D38055 llvm-svn: 313708
* [LoopUnroll] Add a cl::opt to force peeling, for testing purposes.Davide Italiano2017-08-281-0/+6
| | | | | | Will be used to test the patch proposed in D37153. llvm-svn: 311915
* [LoopUnroll] Enable option to peel remainder loopSam Parker2017-08-141-1/+9
| | | | | | | | | | | | | | | | | | | | On some targets, the penalty of executing runtime unrolling checks and then not the unrolled loop can be significantly detrimental to performance. This results in the need to be more conservative with the unroll count, keeping a trip count of 2 reduces the overhead as well as increasing the chance of the unrolled body being executed. But being conservative leaves performance gains on the table. This patch enables the unrolling of the remainder loop introduced by runtime unrolling. This can help reduce the overhead of misunrolled loops because the cost of non-taken branches is much less than the cost of the backedge that would normally be executed in the remainder loop. This allows larger unroll factors to be used without suffering performance loses with smaller iteration counts. Differential Revision: https://reviews.llvm.org/D36309 llvm-svn: 310824
* [PM] Fix new LoopUnroll function pass by invalidating loop analysisChandler Carruth2017-08-081-2/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | results when a loop is completely removed. This is very hard to manifest as a visible bug. You need to arrange for there to be a subsequent allocation of a 'Loop' object which gets the exact same address as the one which the unroll deleted, and you need the LoopAccessAnalysis results to be significant in the way that they're stale. And you need a million other things to align. But when it does, you get a deeply mysterious crash due to actually finding a stale analysis result. This fixes the issue and tests for it by directly checking we successfully invalidate things. I have not been able to get *any* test case to reliably trigger this. Changes to LLVM itself caused the only test case I ever had to cease to crash. I've looked pretty extensively at less brittle ways of fixing this and they are actually very, very hard to do. This is a somewhat strange and unusual case as we have a pass which is deleting an IR unit, but is not running within that IR unit's pass framework (which is what handles this cleanly for the normal loop unroll). And where there isn't a definitive way to clear *all* of the stale cache entries. And where the pass *is* updating the core analysis that provides the IR units! For example, we don't have any of these problems with Function analyses because it is easy to clear out function analyses when the functions themselves may have been deleted -- we clear an entire module's worth! But that is too heavy of a hammer down here in the LoopAnalysisManager layer. A better long-term solution IMO is to require that AnalysisManager's make their keys durable to this kind of thing. Specifically, when caching an analysis for one IR unit that is conceptually "owned" by a higher level IR unit, the AnalysisManager should incorporate this into its data structures so that we can reliably clear these results without having to teach each and every pass to do so manually as we do here. But that is a change for another day as it will be a fairly invasive change to the AnalysisManager infrastructure. Until then, this fortunately seems to be quite rare. llvm-svn: 310333
* Use profile summary to disable peeling for huge working setsTeresa Johnson2017-08-031-6/+18
| | | | | | | | | | | | | | | | | | | | | Summary: Detect when the working set size of a profiled application is huge, by comparing the number of counts required to reach the hot percentile in the profile summary to a large threshold*. When the working set size is determined to be huge, disable peeling to avoid bloating the working set further. *Note that the selected threshold (15K) is significantly larger than the largest working set value in SPEC cpu2006 (which is gcc at around 11K). Reviewers: davidxl Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D36288 llvm-svn: 310005
* Disable loop peeling during full unrolling pass.Teresa Johnson2017-08-031-20/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Peeling should not occur during the full unrolling invocation early in the pipeline, but rather later with partial and runtime loop unrolling. The later loop unrolling invocation will also eventually utilize profile summary and branch frequency information, which we would like to use to control peeling. And for ThinLTO we want to delay peeling until the backend (post thin link) phase, just as we do for most types of unrolling. Ensure peeling doesn't occur during the full unrolling invocation by adding a parameter to the shared implementation function, similar to the way partial and runtime loop unrolling are disabled. Performance results for ThinLTO suggest this has a neutral to positive effect on some internal benchmarks. Reviewers: chandlerc, davidxl Subscribers: mzolotukhin, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D36258 llvm-svn: 309966
* [PM] Split LoopUnrollPass and make partial unroller a function passTeresa Johnson2017-08-021-26/+96
| | | | | | | | | | | | | | | | | | | | | | Summary: This is largely NFC*, in preparation for utilizing ProfileSummaryInfo and BranchFrequencyInfo analyses. In this patch I am only doing the splitting for the New PM, but I can do the same for the legacy PM as a follow-on if this looks good. *Not NFC since for partial unrolling we lose the updates done to the loop traversal (adding new sibling and child loops) - according to Chandler this is not very useful for partial unrolling, but it also means that the debugging flag -unroll-revisit-child-loops no longer works for partial unrolling. Reviewers: chandlerc Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D36157 llvm-svn: 309886
* [LoopUnroll] Fix bug in computeUnrollCount causing it to not honor MaxCountGeoff Berry2017-06-281-0/+2
| | | | | | | | | | Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper Subscribers: mcrosier, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D34532 llvm-svn: 306564
* [LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI.Geoff Berry2017-06-281-14/+14
| | | | | | | | | | Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D34531 llvm-svn: 306554
* [IR] Redesign the case iterator in SwitchInst to actually be an iteratorChandler Carruth2017-04-121-1/+1
| | | | | | | | | | | | | | | | and to expose a handle to represent the actual case rather than having the iterator return a reference to itself. All of this allows the iterator to be used with common STL facilities, standard algorithms, etc. Doing this exposed some missing facilities in the iterator facade that I've fixed and required some work to the actual iterator to fully support the necessary API. Differential Revision: https://reviews.llvm.org/D31548 llvm-svn: 300032
* [LoopUnrolling] Re-prioritize Peeling and Partial unrollingSanjoy Das2017-03-031-9/+9
| | | | | | | | | | | | | | | | | | | | | | | Summary: In current implementation the loop peeling happens after trip-count based partial unrolling and may sometimes not happen at all due to it (for example, if trip count is known, but UP.Partial = false). This is generally bad, the more than there are some situations where peeling is profitable even if the partial unrolling is disabled. This patch is a NFC which reorders peeling and partial unrolling application and prepares the code for implementation of the said optimizations. Patch by Max Kazantsev! Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper Reviewed By: mkuper Subscribers: mkuper, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D30243 llvm-svn: 296897
* [LoopUnroll] Enable PGO-based loop peeling by default.Michael Kuperstein2017-02-221-2/+2
| | | | | | | | | This enables peeling of loops with low dynamic iteration count by default, when profile information is available. Differential Revision: https://reviews.llvm.org/D27734 llvm-svn: 295796
* Increases full-unroll threshold.Dehao Chen2017-02-181-23/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The default threshold for fully unroll is too conservative. This patch doubles the full-unroll threshold This change will affect the following speccpu2006 benchmarks (performance numbers were collected from Intel Sandybridge): Performance: 403 0.11% 433 0.51% 445 0.48% 447 3.50% 453 1.49% 464 0.75% Code size: 403 0.56% 433 0.96% 445 2.16% 447 2.96% 453 0.94% 464 8.02% The compiler time overhead is similar with code size. Reviewers: davidxl, mkuper, mzolotukhin, hfinkel, chandlerc Reviewed By: hfinkel, chandlerc Subscribers: mehdi_amini, zzheng, efriedma, haicheng, hfinkel, llvm-commits Differential Revision: https://reviews.llvm.org/D28368 llvm-svn: 295538
* [PM] Simplify the new PM interface to the loop unroller and expose twoChandler Carruth2017-01-261-3/+10
| | | | | | | | | | | | | | factory functions for the two modes the loop unroller is actually used in in-tree: simplified full-unrolling and the entire thing including partial unrolling. I've also wired these up to nice names so you can express both of these being in a pipeline easily. This is a precursor to actually enabling these parts of the O2 pipeline. Differential Revision: https://reviews.llvm.org/D28897 llvm-svn: 293136
* [LoopUnroll] Properly update loopinfo for runtime unrolling by 2Michael Kuperstein2017-01-261-1/+2
| | | | | | | | | | | Even when we don't create a remainder loop (that is, when we unroll by 2), we may duplicate nested loops into the remainder. This is complicated by the fact the remainder may itself be either inserted into an outer loop, or at the top level. In the latter case, we may need to create new top-level loops. Differential Revision: https://reviews.llvm.org/D29156 llvm-svn: 293124
* [PM] Teach LoopUnroll to update the LPM infrastructure as it unrollsChandler Carruth2017-01-251-1/+74
| | | | | | | | | | | | | | | | | | | loops. We do this by reconstructing the newly added loops after the unroll completes to avoid threading pass manager details through all the mess of the unrolling infrastructure. I've enabled some extra assertions in the LPM to try and catch issues here and enabled a bunch of unroller tests to try and make sure this is sane. Currently, I'm manually running loop-simplify when needed. That should go away once it is folded into the LPM infrastructure. Differential Revision: https://reviews.llvm.org/D28848 llvm-svn: 293011
* Introduce -unroll-partial-threshold to separate PartialThreshold from ↵Dehao Chen2017-01-171-5/+9
| | | | | | | | | | | | | | | | Threshold in loop unorller. Summary: Partial unrolling should have separate threshold with full unrolling. Reviewers: efriedma, mzolotukhin Reviewed By: efriedma, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28831 llvm-svn: 292293
* [PM] Introduce an analysis set used to preserve all analyses overChandler Carruth2017-01-151-1/+1
| | | | | | | | | | | | | | | a function's CFG when that CFG is unchanged. This allows transformation passes to simply claim they preserve the CFG and analysis passes to check for the CFG being preserved to remove the fanout of all analyses being listed in all passes. I've gone through and removed or cleaned up as many of the comments reminding us to do this as I could. Differential Revision: https://reviews.llvm.org/D28627 llvm-svn: 292054
* [PM] Separate the LoopAnalysisManager from the LoopPassManager and moveChandler Carruth2017-01-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | the latter to the Transforms library. While the loop PM uses an analysis to form the IR units, the current plan is to have the PM itself establish and enforce both loop simplified form and LCSSA. This would be a layering violation in the analysis library. Fundamentally, the idea behind the loop PM is to *transform* loops in addition to running passes over them, so it really seemed like the most natural place to sink this was into the transforms library. We can't just move *everything* because we also have loop analyses that rely on a subset of the invariants. So this patch splits the the loop infrastructure into the analysis management that has to be part of the analysis library, and the transform-aware pass manager. This also required splitting the loop analyses' printer passes out to the transforms library, which makes sense to me as running these will transform the code into LCSSA in theory. I haven't split the unittest though because testing one component without the other seems nearly intractable. Differential Revision: https://reviews.llvm.org/D28452 llvm-svn: 291662
* [PM] Rewrite the loop pass manager to use a worklist and augmented runChandler Carruth2017-01-111-27/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | arguments much like the CGSCC pass manager. This is a major redesign following the pattern establish for the CGSCC layer to support updates to the set of loops during the traversal of the loop nest and to support invalidation of analyses. An additional significant burden in the loop PM is that so many passes require access to a large number of function analyses. Manually ensuring these are cached, available, and preserved has been a long-standing burden in LLVM even with the help of the automatic scheduling in the old pass manager. And it made the new pass manager extremely unweildy. With this design, we can package the common analyses up while in a function pass and make them immediately available to all the loop passes. While in some cases this is unnecessary, I think the simplicity afforded is worth it. This does not (yet) address loop simplified form or LCSSA form, but those are the next things on my radar and I have a clear plan for them. While the patch is very large, most of it is either mechanically updating loop passes to the new API or the new testing for the loop PM. The code for it is reasonably compact. I have not yet updated all of the loop passes to correctly leverage the update mechanisms demonstrated in the unittests. I'll do that in follow-up patches along with improved FileCheck tests for those passes that ensure things work in more realistic scenarios. In many cases, there isn't much we can do with these until the loop simplified form and LCSSA form are in place. Differential Revision: https://reviews.llvm.org/D28292 llvm-svn: 291651
* Use continuous boosting factor for complete unroll.Dehao Chen2016-12-301-75/+32
| | | | | | | | | | | | | | | | | | | | Summary: The current loop complete unroll algorithm checks if unrolling complete will reduce the runtime by a certain percentage. If yes, it will apply a fixed boosting factor to the threshold (by discounting cost). The problem for this approach is that the threshold abruptly. This patch makes the boosting factor a function of runtime reduction percentage, capped by a fixed threshold. In this way, the threshold changes continuously. The patch also simplified the code by reducing one parameter in UP. The patch only affects code-gen of two speccpu2006 benchmark: 445.gobmk binary size decreases 0.08%, no performance change. 464.h264ref binary size increases 0.24%, no performance change. Reviewers: mzolotukhin, chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26989 llvm-svn: 290737
* Revert @llvm.assume with operator bundles (r289755-r289757)Daniel Jasper2016-12-191-7/+15
| | | | | | | This creates non-linear behavior in the inliner (see more details in r289755's commit thread). llvm-svn: 290086
* Remove the AssumptionCacheHal Finkel2016-12-151-15/+7
| | | | | | | | | After r289755, the AssumptionCache is no longer needed. Variables affected by assumptions are now found by using the new operand-bundle-based scheme. This new scheme is more computationally efficient, and also we need much less code... llvm-svn: 289756
* Change LoopUnrollPass cost from int to unsigned to make it consistent. (NFC)Dehao Chen2016-12-021-5/+5
| | | | llvm-svn: 288463
* [LoopUnroll] Implement profile-based loop peelingMichael Kuperstein2016-11-301-14/+35
| | | | | | | | | | | | | | | | | | | This implements PGO-driven loop peeling. The basic idea is that when the average dynamic trip-count of a loop is known, based on PGO, to be low, we can expect a performance win by peeling off the first several iterations of that loop. Unlike unrolling based on a known trip count, or a trip count multiple, this doesn't save us the conditional check and branch on each iteration. However, it does allow us to simplify the straight-line code we get (constant-folding, etc.). This is important given that we know that we will usually only hit this code, and not the actual loop. This is currently disabled by default. Differential Revision: https://reviews.llvm.org/D25963 llvm-svn: 288274
OpenPOWER on IntegriCloud