bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	Use continuous boosting factor for complete unroll.	Dehao Chen	2016-12-30	1	-75/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The current loop complete unroll algorithm checks whether complete unrolling will reduce the runtime by a certain percentage. If so, it applies a fixed boosting factor to the threshold (by discounting cost). The problem with this approach is that the threshold changes abruptly. This patch makes the boosting factor a function of the runtime reduction percentage, capped by a fixed threshold. In this way, the threshold changes continuously. The patch also simplifies the code by removing one parameter from UP. The patch only affects code-gen of two speccpu2006 benchmarks: 445.gobmk binary size decreases 0.08%, no performance change. 464.h264ref binary size increases 0.24%, no performance change. Reviewers: mzolotukhin, chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26989 llvm-svn: 290737
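The idea of a continuous boost can be sketched as follows. This is only an illustration under stated assumptions, not the patch's code: the names `boostingFactorPercent` and `shouldFullyUnroll` and all their parameters are hypothetical, and the boost is assumed to be the ratio of rolled to unrolled cost, expressed in percent and capped at `maxPercentBoost`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not LLVM's API. Returns a boost in percent
// (100 means "no boost") that grows smoothly with the estimated saving
// and is capped at maxPercentBoost.
unsigned boostingFactorPercent(unsigned rolledDynamicCost,
                               unsigned unrolledCost,
                               unsigned maxPercentBoost) {
  assert(maxPercentBoost >= 100 && "a cap below 100% would shrink the threshold");
  if (unrolledCost == 0)
    return maxPercentBoost; // everything folded away: use the full cap
  // Assumption: boost = rolled cost / unrolled cost, in percent.
  // 64-bit arithmetic avoids overflow in the multiplication.
  std::uint64_t ratio = 100ull * rolledDynamicCost / unrolledCost;
  return static_cast<unsigned>(std::min<std::uint64_t>(ratio, maxPercentBoost));
}

// Hypothetical decision: compare the unrolled cost against the boosted threshold.
bool shouldFullyUnroll(unsigned threshold, unsigned rolledDynamicCost,
                       unsigned unrolledCost, unsigned maxPercentBoost) {
  std::uint64_t boosted =
      static_cast<std::uint64_t>(threshold) *
      boostingFactorPercent(rolledDynamicCost, unrolledCost, maxPercentBoost) / 100;
  return unrolledCost < boosted;
}
```

With a threshold of 150, a loop whose cost drops from 300 (rolled) to 200 (unrolled) gets a 150% boost in this sketch and passes, while a loop with no saving gets no boost and is rejected. The sketch does not clamp the factor from below, so a ratio under 100% would lower the threshold.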
*	Revert @llvm.assume with operator bundles (r289755-r289757)	Daniel Jasper	2016-12-19	1	-7/+15
\| \| \| \| \| \| \|	This creates non-linear behavior in the inliner (see more details in r289755's commit thread). llvm-svn: 290086
*	Remove the AssumptionCache	Hal Finkel	2016-12-15	1	-15/+7
\| \| \| \| \| \| \| \| \|	After r289755, the AssumptionCache is no longer needed. Variables affected by assumptions are now found by using the new operand-bundle-based scheme. This new scheme is more computationally efficient, and also we need much less code... llvm-svn: 289756
*	Change LoopUnrollPass cost from int to unsigned to make it consistent. (NFC)	Dehao Chen	2016-12-02	1	-5/+5
\| \| \| \|	llvm-svn: 288463
*	[LoopUnroll] Implement profile-based loop peeling	Michael Kuperstein	2016-11-30	1	-14/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This implements PGO-driven loop peeling. The basic idea is that when the average dynamic trip-count of a loop is known, based on PGO, to be low, we can expect a performance win by peeling off the first several iterations of that loop. Unlike unrolling based on a known trip count, or a trip count multiple, this doesn't save us the conditional check and branch on each iteration. However, it does allow us to simplify the straight-line code we get (constant-folding, etc.). This is important given that we know that we will usually only hit this code, and not the actual loop. This is currently disabled by default. Differential Revision: https://reviews.llvm.org/D25963 llvm-svn: 288274
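What peeling does to a loop can be shown with a hand-written source-level sketch. The functions `sumRolled` and `sumPeeledTwice` are hypothetical, and the peel count of two is an arbitrary choice for illustration; the pass itself works on LLVM IR and picks the count from profile data.

```cpp
#include <cassert>

// Original loop: the trip count n is only known at run time.
int sumRolled(const int *a, int n) {
  assert(n >= 0);
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += a[i];
  return sum;
}

// The same loop with the first two iterations peeled off. Each peeled
// copy keeps its own exit test, so no conditional check is saved, but
// the peeled bodies are straight-line code that later passes can simplify.
int sumPeeledTwice(const int *a, int n) {
  assert(n >= 0);
  int sum = 0;
  int i = 0;
  if (i < n) {            // peeled iteration 1
    sum += a[i];
    ++i;
    if (i < n) {          // peeled iteration 2
      sum += a[i];
      ++i;
      for (; i < n; ++i)  // residual loop, rarely reached if n is usually small
        sum += a[i];
    }
  }
  return sum;
}
```

When the average trip count is low, most executions finish inside the peeled copies and never enter the residual loop.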
*	[LoopUnroll] Move code to exit early. NFC.	Haicheng Wu	2016-11-23	1	-10/+8
\| \| \| \| \| \| \| \|	Just to save some compilation time. Differential Revision: https://reviews.llvm.org/D26784 llvm-svn: 287800
*	Use profile info to adjust loop unroll threshold.	Dehao Chen	2016-11-17	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: For a flat loop, even if it is hot, runtime unrolling is not a good idea, so we set a lower partial unroll threshold. For a hot loop, we set a higher unroll threshold and allow expensive trip-count computation to enable more aggressive unrolling. Reviewers: davidxl, mzolotukhin Subscribers: sanjoy, mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D26527 llvm-svn: 287186
*	Minor unroll pass refactoring.	Evgeny Stupachenko	2016-11-09	1	-35/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Unrolled Loop Size calculations moved to a function. Constant representing number of optimized instructions when "back edge" becomes "fall through" replaced with variable. Some comments added. Reviewers: mzolotukhin Differential Revision: http://reviews.llvm.org/D21719 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 286389
*	[LoopUnroll] Check partial unrolling is enabled before initialization. NFC.	Haicheng Wu	2016-10-27	1	-2/+2
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D23891 llvm-svn: 285330
*	Fix 80-char violations. NFC.	Michael Kuperstein	2016-10-25	1	-5/+10
\| \| \| \|	llvm-svn: 285092
*	[LoopUnroll] Keep the loop test only on the first iteration of max-or-zero loops	John Brawn	2016-10-21	1	-10/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we have a loop with a known upper bound on the number of iterations, and furthermore know that the number of iterations will be either exactly that upper bound or zero, then we can fully unroll up to that upper bound keeping only the first loop test to check for the zero iteration case. Most of the work here is in plumbing this 'max-or-zero' information from the part of scalar evolution where it's detected through to loop unrolling. I've also gone for the safe default of 'false' everywhere but howManyLessThans which could probably be improved. Differential Revision: https://reviews.llvm.org/D25682 llvm-svn: 284818
*	Reapply "[LoopUnroll] Use the upper bound of the loop trip count to fully ↵	Haicheng Wu	2016-10-12	1	-27/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	unroll a loop" Reapply r284044 after revert in r284051. Krzysztof fixed the error in r284049. The original summary: This patch tries to fully unroll loops having break statement like this for (int i = 0; i < 8; i++) { if (a[i] == value) { found = true; break; } } GCC can fully unroll such loops, but currently LLVM cannot because LLVM only supports loops having exact constant trip counts. The upper bound of the trip count can be obtained from calling ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the refactoring work in SCEV to prevent duplicating code. The feature of using the upper bound is enabled under the same circumstance when runtime unrolling is enabled since both are used to unroll loops without knowing the exact constant trip count. llvm-svn: 284053
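The loop from the summary above and a fully unrolled equivalent can be compared side by side. This is a hand-written source-level sketch with hypothetical function names (`findRolled`, `findUnrolled`); the actual transformation happens on LLVM IR, and the code the unroller emits will differ in shape.

```cpp
#include <cassert>

// Rolled form, as in the commit message. The loop runs at most 8 times,
// but the break means the exact trip count is unknown at compile time.
bool findRolled(const int *a, int value) {
  assert(a != nullptr);
  bool found = false;
  for (int i = 0; i < 8; i++) {
    if (a[i] == value) {
      found = true;
      break;
    }
  }
  return found;
}

// Fully unrolled by the upper bound 8: one copy of the body per possible
// iteration, each keeping its early exit. No loop counter or backedge remains.
bool findUnrolled(const int *a, int value) {
  assert(a != nullptr);
  if (a[0] == value) return true;
  if (a[1] == value) return true;
  if (a[2] == value) return true;
  if (a[3] == value) return true;
  if (a[4] == value) return true;
  if (a[5] == value) return true;
  if (a[6] == value) return true;
  if (a[7] == value) return true;
  return false;
}
```

Both functions return the same result for every input; only the upper bound of 8, not an exact trip count, is needed to write the second form.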
*	Revert "[LoopUnroll] Use the upper bound of the loop trip count to fully ↵	Haicheng Wu	2016-10-12	1	-73/+27
\| \| \| \| \| \| \| \|	unroll a loop" This reverts commit r284044. llvm-svn: 284051
*	[LoopUnroll] Use the upper bound of the loop trip count to fully unroll a loop	Haicheng Wu	2016-10-12	1	-27/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch tries to fully unroll loops having break statement like this for (int i = 0; i < 8; i++) { if (a[i] == value) { found = true; break; } } GCC can fully unroll such loops, but currently LLVM cannot because LLVM only supports loops having exact constant trip counts. The upper bound of the trip count can be obtained from calling ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the refactoring work in SCEV to prevent duplicating code. The feature of using the upper bound is enabled under the same circumstance when runtime unrolling is enabled since both are used to unroll loops without knowing the exact constant trip count. Differential Revision: https://reviews.llvm.org/D24790 llvm-svn: 284044
*	Update loop unroller cost model to make sure debug info does not affect ↵	Dehao Chen	2016-09-30	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	optimization decisions. Summary: Debug info should not affect optimization decisions. This patch updates loop unroller cost model to make it not affected by debug info. Reviewers: davidxl, mzolotukhin Subscribers: haicheng, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D25098 llvm-svn: 282894
*	[LoopUnroll] Port to the new streaming interface for opt remarks.	Adam Nemet	2016-09-30	1	-21/+28
\| \| \| \|	llvm-svn: 282834
*	[SystemZ] Implementation of getUnrollingPreferences().	Jonas Paulsson	2016-09-28	1	-6/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit enables more unrolling for SystemZ by implementing the SystemZTargetTransformInfo::getUnrollingPreferences() method. It has been found that it is better to only unroll moderately, so the DefaultUnrollRuntimeCount has been moved into UnrollingPreferences in order to set this to a lower value for SystemZ (4). Reviewers: Evgeny Stupachenko, Ulrich Weigand. https://reviews.llvm.org/D24451 llvm-svn: 282570
*	[LoopUnroll] Correct a debug message. NFC.	Haicheng Wu	2016-09-07	1	-1/+1
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D24299 llvm-svn: 280865
*	[LoopUnroll] Use OptimizationRemarkEmitter directly not via the analysis pass	Adam Nemet	2016-08-26	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can't mark ORE (a function pass) preserved as required by the loop passes because that is how we ensure that the required passes like LazyBFI are all available any time ORE is used. See the new comments in the patch. Instead we use it directly just like the inliner does in D22694. As expected there is some additional overhead after removing the caching provided by analysis passes. The worst case I measured was LNT/CINT2006_ref/401.bzip2, which regresses by 12%. As before, this only affects -Rpass-with-hotness and not default compilation. llvm-svn: 279829
*	[LoopUnroll] By default disable unrolling when optimizing for size.	Michael Zolotukhin	2016-08-23	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In clang commit r268509 we started to invoke loop-unroll pass from the driver even under -Os. However, we happen to not initialize optsize thresholds properly, which is fixed with this change. r268509 led to some big compile time regressions, because we started to unroll some loops that we didn't unroll before. With this change I hope to recover most of the regressions. We still are slightly slower than before, because we do some checks here and there in loop-unrolling before we bail out, but at least the slowdown is not that huge now. Reviewers: hfinkel, chandlerc Subscribers: mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D23388 llvm-svn: 279585
*	[LoopUnroll] Move a simple check earlier. NFC.	Haicheng Wu	2016-08-17	1	-5/+5
\| \| \| \| \| \| \| \|	Move the check of CallInst earlier to skip expensive recursive operations. Differential Revision: https://reviews.llvm.org/D23611 llvm-svn: 278998
*	Consistently use LoopAnalysisManager	Sean Silva	2016-08-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	One exception here is LoopInfo which must forward-declare it (because the typedef is in LoopPassManager.h which depends on LoopInfo). Also, some includes for LoopPassManager.h were needed since that file provides the typedef. Besides a general consistency benefit, the extra layer of indirection allows the mechanical part of https://reviews.llvm.org/D23256 that requires touching every transformation and analysis to be factored out cleanly. Thanks to David for the suggestion. llvm-svn: 278079
*	[LoopUnroll] Include hotness of region in opt remark	Adam Nemet	2016-07-29	1	-22/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	LoopUnroll is a loop pass, so the analysis of OptimizationRemarkEmitter is added to the common function analysis passes that loop passes depend on. The BFI and indirectly BPI used in this pass is computed lazily so no overhead should be observed unless -pass-remarks-with-hotness is used. This is how the patch affects the O3 pipeline: Dominator Tree Construction Natural Loop Information Canonicalize natural loops Loop-Closed SSA Form Pass Basic Alias Analysis (stateless AA impl) Function Alias Analysis Results Scalar Evolution Analysis + Lazy Branch Probability Analysis + Lazy Block Frequency Analysis + Optimization Remark Emitter Loop Pass Manager Rotate Loops Loop Invariant Code Motion Unswitch loops Simplify the CFG Dominator Tree Construction Basic Alias Analysis (stateless AA impl) Function Alias Analysis Results Combine redundant instructions Natural Loop Information Canonicalize natural loops Loop-Closed SSA Form Pass Scalar Evolution Analysis + Lazy Branch Probability Analysis + Lazy Block Frequency Analysis + Optimization Remark Emitter Loop Pass Manager Induction Variable Simplification Recognize loop idioms Delete dead loops Unroll loops ... llvm-svn: 277203
*	[PM] Port LoopUnroll.	Sean Silva	2016-07-19	1	-0/+33
\| \| \| \| \| \| \| \| \|	We just set PreserveLCSSA to always true since we don't have an analogous method `mustPreserveAnalysisID(LCSSA)`. Also port LoopInfo verifier pass to test LoopUnrollPass. llvm-svn: 276063
*	[LoopUnroll] Don't crash trying to unroll loop with EH pad exit	David Majnemer	2016-06-15	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We do not support splitting cleanuppad or catchswitches. This is problematic for passes which assume that a loop is in loop simplify form (the loop would have a dedicated exit block instead of sharing it). While it isn't great that we don't support this for cleanups, we still cannot make loop-simplify form an assertable precondition because indirectbr will also disable these sorts of CFG cleanups. This fixes PR28132. llvm-svn: 272739
*	The patch set unroll disable pragma when unroll	Evgeny Stupachenko	2016-06-08	1	-11/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	with user specified count has been applied. Summary: Previously SetLoopAlreadyUnrolled() set the disable pragma only if there was some loop metadata. Now it sets the pragma in all cases. This helps to prevent multiple unrolling when -unroll-count=N is given. Reviewers: mzolotukhin Differential Revision: http://reviews.llvm.org/D20765 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 272195
*	[LoopUnroll] Set correct thresholds for new recently enabled unrolling ↵	Michael Zolotukhin	2016-06-03	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	heuristic. In r270478, where I enabled the new heuristic I posted testing results, which I got when explicitly passed the thresholds values via CL options. However, setting the CL options init-values is not enough to change the default values of thresholds, so I'm changing them in another place now. llvm-svn: 271615
*	The patch fixes r271071	Evgeny Stupachenko	2016-05-28	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \|	Summary: unused variables in Release mode: BasicBlock *Header unsigned OrigCount put under DEBUG From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 271076
*	The patch refactors unroll pass.	Evgeny Stupachenko	2016-05-27	1	-201/+244
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Unroll factor (Count) calculations moved to a new function. Early exits on pragma and "-unroll-count" defined factor added. New type of unrolling "Force" introduced (previously used implicitly). New unroll preference "AllowRemainder" introduced and set "true" by default (should be set to false for architectures that suffer from it). Reviewers: hfinkel, mzolotukhin, zzheng Differential Revision: http://reviews.llvm.org/D19553 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 271071
*	Apply clang-tidy's misc-move-constructor-init throughout LLVM.	Benjamin Kramer	2016-05-27	1	-2/+4
\| \| \| \| \| \|	No functionality change intended, maybe a tiny performance improvement. llvm-svn: 270997
*	[LoopUnrollAnalyzer] Fix a crash in analyzeLoopUnrollCost.	Michael Zolotukhin	2016-05-26	1	-20/+16
\| \| \| \| \| \| \| \| \|	Condition might be simplified to a Constant, but it doesn't have to be ConstantInt, so we should dyn_cast, instead of cast. This fixes PR27886. llvm-svn: 270924
*	Re-enable "[LoopUnroll] Enable advanced unrolling analysis by default" one ↵	Michael Zolotukhin	2016-05-24	1	-3/+3
\| \| \| \| \| \| \| \|	more time. This reverts commit r270577. llvm-svn: 270630
*	Revert r270518, which re-enabled "[LoopUnroll] Enable advanced unrolling ↵	Hans Wennborg	2016-05-24	1	-3/+3
\| \| \| \| \| \| \| \|	analysis by default. Chromium builds are still hitting the assert in PR27874. llvm-svn: 270577
*	Revert "Revert r270478 "[LoopUnroll] Enable advanced unrolling analysis by ↵	Michael Zolotukhin	2016-05-24	1	-3/+3
\| \| \| \| \| \| \| \| \|	default."" This reverts commit r270512 and reapplies r270478. Originally it caused PR27847, but it was fixed in r270517. llvm-svn: 270518
*	Revert r270478 "[LoopUnroll] Enable advanced unrolling analysis by default."	Hans Wennborg	2016-05-23	1	-3/+3
\| \| \| \| \| \|	This caused PR27847. llvm-svn: 270512
*	[LoopUnroll] Enable advanced unrolling analysis by default.	Michael Zolotukhin	2016-05-23	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch turns on LoopUnrollAnalyzer by default. To mitigate compile time regressions, I chose very conservative thresholds for now. Later we can make them more aggressive, but it might require being smarter in which loops we're optimizing. E.g. currently the biggest issue is that with more aggressive thresholds we unroll many cold loops, which increases compile time for no performance benefit (performance of those loops is improved, but it doesn't matter since they are cold). Test results for compile time (using 4 samples to reduce noise): ``` MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes 5.19% SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect 4.19% MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow 3.39% MultiSource/Applications/JM/lencod/lencod 1.47% MultiSource/Benchmarks/Fhourstones-3_1/fhourstones3_1 -6.06% ``` I didn't see any performance changes in the testsuite, but it improves some internal tests. Reviewers: hfinkel, chandlerc Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D20482 llvm-svn: 270478
*	[LoopUnrollAnalyzer] Take into account cost of instructions controlling ↵	Michael Zolotukhin	2016-05-18	1	-0/+1
\| \| \| \| \| \| \| \| \|	branches, along with their operands. Previously, we didn't add their cost and their operands' cost, which could've resulted in unrolling loops for no actual benefit. llvm-svn: 269985
*	Revert "Revert "[Unroll] Implement a conservative and monotonically ↵	Michael Zolotukhin	2016-05-13	1	-13/+178
\| \| \| \| \| \| \| \| \| \|	increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the..."" This reverts commit r269395. Try to reapply with a fix from chapuni. llvm-svn: 269486
*	Revert "[Unroll] Implement a conservative and monotonically increasing cost ↵	Michael Zolotukhin	2016-05-13	1	-178/+13
\| \| \| \| \| \| \| \| \| \| \|	tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the..." This reverts commit r269388. It caused some bots to fail, I'm reverting it until I investigate the issue. llvm-svn: 269395
*	[Unroll] Implement a conservative and monotonically increasing cost tracking ↵	Michael Zolotukhin	2016-05-13	1	-13/+178
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the... Summary: ...loop after the last iteration. This is really hard to do correctly. The core problem is that we need to model liveness through the induction PHIs from iteration to iteration in order to get the correct results, and we need to correctly de-duplicate the common subgraphs of instructions feeding some subset of the induction PHIs. All of this can be driven either from a side effect at some iteration or from the loop values used after the loop finishes. This patch implements this by storing the forward-propagating analysis of each instruction in a cache to recall whether it was free and whether it has become live and thus counted toward the total unroll cost. Then, at each sink for a value in the loop, we recursively walk back through every value that feeds the sink, including looping back through the iterations as needed, until we have marked the entire input graph as live. Because we cache this, we never visit instructions more than twice -- once when we analyze them and put them into the cache, and once when we count their cost towards the unrolled loop. Also, because the cache is only two bits and because we are dealing with relatively small iteration counts, we can store all of this very densely in memory to avoid this from becoming an excessively slow analysis. The code here is still pretty gross. I would appreciate suggestions about better ways to factor or split this up, I've stared too long at the algorithmic side to really have a good sense of what the design should probably look at. Also, it might seem like we should do all of this bottom-up, but I think that is a red herring. 
Specifically, the simplification power is much greater working top-down. We can forward propagate very effectively, even across strange and interesting recurrences around the backedge. Because we use data to propagate, this doesn't cause a state space explosion. Doing this level of constant folding, etc, would be very expensive to do bottom-up because it wouldn't be until the last moment that you could collapse everything. The current solution is essentially a top-down simplification with a bottom-up cost accounting which seems to get the best of both worlds. It makes the simplification incremental and powerful while leaving everything dead until we know it is needed. Finally, a core property of this approach is its monotonicity. At all times, the current UnrolledCost is a conservatively low estimate. This ensures that we will never early-exit from the analysis due to exceeding a threshold when, if we had continued, the cost would have gone back below the threshold. These kinds of bugs can cause incredibly hard to track down random changes to behavior. We could use a similar (but much simpler) technique within the inliner as well to avoid considering speculated code in the inline cost. Reviewers: chandlerc Subscribers: sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D11758 llvm-svn: 269388
*	Loop unroller: set thresholds for optsize and minsize functions to zero	Hans Wennborg	2016-05-10	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before r268509, Clang would disable the loop unroll pass when optimizing for size. That commit enabled it to be able to support unroll pragmas in -Os builds. However, this regressed binary size in one of Chromium's DLLs by ~100 KB. This restores the original behaviour of no unrolling at -Os, but doing it in LLVM instead of Clang makes more sense, and also allows the pragmas to keep working. Differential revision: http://reviews.llvm.org/D20115 llvm-svn: 269124
*	clang-format some files in preparation of coming patch reviews.	Dehao Chen	2016-05-05	1	-30/+31
\| \| \| \|	llvm-svn: 268583
*	Re-commit optimization bisect support (r267022) without new pass manager ↵	Andrew Kaylor	2016-04-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	support. The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling). Differential Revision: http://reviews.llvm.org/D19172 llvm-svn: 267231
*	Revert "Initial implementation of optimization bisect support."	Vedant Kumar	2016-04-22	1	-1/+1
\| \| \| \| \| \| \| \|	This reverts commit r267022, due to an ASan failure: http://lab.llvm.org:8080/green/job/clang-stage2-cmake-RgSan_check/1549 llvm-svn: 267115
*	Initial implementation of optimization bisect support.	Andrew Kaylor	2016-04-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	This patch implements an optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations. The bisection is enabled using a new command line option (-opt-bisect-limit). Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit. A finer level of control (disabling individual transformations) can be managed through an additional OptBisect method, but this is not yet used. The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check. Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute. A new function call has been added for module and SCC passes that behaves in a similar way. Differential Revision: http://reviews.llvm.org/D19172 llvm-svn: 267022
*	Loop Unroll: add options and tweak to make Partial unrolling more useful	Fiona Glaser	2016-04-06	1	-3/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. Add FullUnrollMaxCount option that works like MaxCount, but also limits the unroll count for fully unrolled loops. So if a loop has an iteration count over this, it won't fully unroll. 2. Add CLI options for MaxCount and the new option, so they can be tested (plus a test). 3. Make partial unrolling obey MaxCount. An example use-case (the out of tree one this is originally designed for) is a target’s TTI can analyze a loop and decide on a max unroll count separate from the size threshold, e.g. based on register pressure, then constrain LoopUnroll to not exceed that, regardless of the size of the unrolled loop. llvm-svn: 265562
*	LoopUnroll: only allow non-modulo Partial unrolling when Runtime=true	Fiona Glaser	2016-04-06	1	-2/+4
\| \| \| \| \| \|	Patch by Evgeny Stupachenko <evstupac@gmail.com>. llvm-svn: 265558
*	Enable unroll for constant bound loops when TripCount is not modulo of ↵	Zia Ansari	2016-04-04	1	-0/+10
\| \| \| \| \| \| \| \| \| \|	unroll factor, reducing it to maximum power-of-2 that satisfies threshold limit. Commit for Evgeny Stupachenko (evstupac@gmail.com) Differential Revision: http://reviews.llvm.org/D18290 llvm-svn: 265337
*	Enable non-power-of-2 #pragma unroll counts.	David L Kreitzer	2016-03-25	1	-5/+4
\| \| \| \| \| \| \| \|	Patch by Evgeny Stupachenko. Differential Revision: http://reviews.llvm.org/D18202 llvm-svn: 264407
*	[LoopUnroll] Respect the convergent attribute.	Justin Lebar	2016-03-14	1	-4/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Specifically, when we perform runtime loop unrolling of a loop that contains a convergent op, we can only unroll k times, where k divides the loop trip multiple. Without this change, we'll happily unroll e.g. the following loop for (int i = 0; i < N; ++i) { if (i == 0) convergent_op(); foo(); } into int i = 0; if (N % 2 == 1) { convergent_op(); foo(); ++i; } for (; i < N - 1; i += 2) { if (i == 0) convergent_op(); foo(); foo(); }. This is unsafe, because we've just added a control-flow dependency to the convergent op in the prelude. In general, runtime unrolling loops that contain convergent ops is safe only if we don't have to emit a prelude, which occurs when the unroll count divides the trip multiple. Reviewers: resistor Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D17526 llvm-svn: 263509
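The rule stated in the summary above, that the unroll count must divide the trip multiple when a convergent op is present, can be written as a small predicate. This is a minimal sketch: `runtimeUnrollCountIsSafe` and its parameters are hypothetical names, not LLVM's API, and the real pass may adjust the count rather than simply reject it.

```cpp
#include <cassert>

// Hypothetical legality check. tripMultiple is a value known to divide
// the run-time trip count (1 if nothing better is known); count is the
// proposed unroll factor.
bool runtimeUnrollCountIsSafe(bool hasConvergentOp, unsigned tripMultiple,
                              unsigned count) {
  assert(tripMultiple > 0 && count > 0);
  if (!hasConvergentOp)
    return true; // a prelude for leftover iterations is harmless here
  // With a convergent op, unrolling is safe only when no prelude is needed,
  // i.e. when the unroll count divides the trip multiple.
  return tripMultiple % count == 0;
}
```

For the loop in the summary, nothing is known about `N`, so the trip multiple is 1 and an unroll count of 2 is rejected by this check.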