bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	LoopUnroll: Allow analyzing intrinsic call costs	Matt Arsenault	2018-06-26	1	-0/+77
	I'm not sure why the code here is skipping calls since TTI does try to do something for general calls, but it at least should allow intrinsics. Skip intrinsics that should not be omitted as calls, which is by far the most common case on AMDGPU. llvm-svn: 335645
*	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.	Shiva Chen	2018-05-09	3	-5/+5
	In order to set breakpoints on labels and list source code around labels, we need to collect debug information for labels, i.e., the label name, the function the label belongs to, the line number in the file, and the address where the label is located. To keep this information in LLVM IR and to allow the backend to generate debug information correctly, we create a new kind of metadata for labels, DILabel. The format of DILabel is !DILabel(scope: !1, name: "foo", file: !2, line: 3). We hope to keep as much debug information as possible even when the code is optimized. So, we create a new kind of intrinsic for label metadata to keep the metadata from being eliminated along with its basic block. The intrinsic will continue to exist as long as we keep it from being optimized out. The format of the intrinsic is llvm.dbg.label(metadata !1). It has only one argument, the DILabel metadata. The intrinsic immediately follows the label. The backend can get the label metadata through the intrinsic's parameter. We also create a DIBuilder API for labels to be used by the frontend. The frontend can use createLabel() to allocate DILabel objects and insertLabel() to insert the llvm.dbg.label intrinsic into LLVM IR. Differential Revision: https://reviews.llvm.org/D45024 Patch by Hsiangkai Wang. llvm-svn: 331841
*	[LoopUnroll] Only peel if a predicate becomes known in the loop body.	Florian Hahn	2018-04-18	1	-128/+141
	If a predicate does not become known after peeling, peeling is unlikely to be beneficial. Reviewers: mcrosier, efriedma, mkazantsev, junbuml Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D44983 llvm-svn: 330250
*	[LoopUnroll] Make LoopPeeling respect the AllowPeeling preference.	Chad Rosier	2018-04-06	1	-0/+3
	The SimpleLoopUnrollPass isn't supposed to perform loop peeling. Differential Revision: https://reviews.llvm.org/D45334 llvm-svn: 329395
*	peel loops with runtime small trip counts	Ikhlas Ajbar	2018-04-03	1	-0/+37
	For Hexagon, peeling loops with small runtime trip count is beneficial for our benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set by the target for computing the desired peel count. Differential Revision: https://reviews.llvm.org/D44880 llvm-svn: 329042
*	Revert "peel loops with runtime small trip counts"	Krzysztof Parzyszek	2018-03-30	1	-37/+0
	This reverts commit r328854; it breaks some Hexagon tests. llvm-svn: 328875
*	[Hexagon] add missing lit config file	Ikhlas Ajbar	2018-03-30	1	-0/+3
	llvm-svn: 328855
*	peel loops with runtime small trip counts	Ikhlas Ajbar	2018-03-30	1	-0/+37
	For Hexagon, peeling loops with small runtime trip count is beneficial for our benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set by the target for computing the desired peel count. Differential Revision: https://reviews.llvm.org/D44880 llvm-svn: 328854
*	[LoopUnroll] Fix dangling pointers in SCEV	Max Kazantsev	2018-03-26	1	-0/+51
	Current logic of loop SCEV invalidation in Loop Unroller implicitly relies on the fact that the exit count of outer loops cannot rely on exiting blocks of inner loops, which is true in the current implementation of backedge taken count calculation but is wrong in general. As a result, when we only forget the loop that we have just unrolled, we may still have cached data for its outer loops (in particular, exit counts) which keeps references to blocks of the inner loop that could have been changed or even deleted. The attached test demonstrates a situation in which, after unrolling of the innermost loop, the outermost loop contains a dangling pointer to a non-existent block. The problem shows up when we apply patch https://reviews.llvm.org/D44677 that makes SCEV smarter about exit count calculation. I am not sure if the bug exists without this patch; it appears that now it is accidentally correct just because in practice the exact backedge taken count for outer loops with complex control flow inside is never calculated. But when SCEV learns to do so, this problem shows up. This patch replaces the existing logic of SCEV loop invalidation with a correct one, which happens to be invalidation of the outermost loop (which also leads to invalidation of all loops inside of it). It is the only way to ensure that no outer loop keeps dangling pointers to removed blocks, or just outdated information that has changed after unrolling. Differential Revision: https://reviews.llvm.org/D44818 Reviewed By: samparker llvm-svn: 328483
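The invalidation strategy described in this message can be pictured with a toy model. The sketch below is not LLVM code: `ToyLoop`, `ExitCountCache`, `outermost`, `forgetNest` and `invalidateAfterUnroll` are hypothetical names, and the cache stands in for whatever per-loop data SCEV keeps. It only illustrates the idea of walking up to the outermost loop and dropping cached data for the whole nest, rather than for the unrolled loop alone.

```cpp
#include <unordered_map>
#include <vector>

// Toy stand-in for a loop nest (hypothetical, not llvm::Loop).
struct ToyLoop {
  ToyLoop *Parent = nullptr;
  std::vector<ToyLoop *> SubLoops;
};

// Toy stand-in for cached per-loop exit counts.
using ExitCountCache = std::unordered_map<const ToyLoop *, unsigned>;

// Walk up the parent chain to the outermost enclosing loop.
const ToyLoop *outermost(const ToyLoop *L) {
  while (L->Parent)
    L = L->Parent;
  return L;
}

// Drop cached data for L and, recursively, for every loop nested inside it.
void forgetNest(const ToyLoop *L, ExitCountCache &Cache) {
  Cache.erase(L);
  for (const ToyLoop *Sub : L->SubLoops)
    forgetNest(Sub, Cache);
}

// After unrolling `Unrolled`, invalidate the whole nest it belongs to, so
// no outer loop keeps stale information about blocks of the inner loop.
void invalidateAfterUnroll(const ToyLoop *Unrolled, ExitCountCache &Cache) {
  forgetNest(outermost(Unrolled), Cache);
}
```

Forgetting only `Unrolled` would leave the entries for its parents in the cache, which is the stale-data situation the message describes.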
*	[LoopUnroll] Simplify induction variables after peeling too.	Florian Hahn	2018-03-23	1	-15/+10
	Loop peeling also has an impact on the induction variables, so we should benefit from induction variable simplification after peeling too. Reviewers: sanjoy, bogner, mzolotukhin, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D43878 llvm-svn: 328301
*	[LoopUnroll] Peel off iterations if it makes conditions true/false.	Florian Hahn	2018-03-15	2	-1/+614
	If the loop body contains conditions of the form IndVar < #constant, we can remove the checks by peeling off #constant iterations. This improves codegen for PR34364. Reviewers: mkuper, mkazantsev, efriedma Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D43876 llvm-svn: 327671
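As a rough source-level illustration of this transformation (not code from the patch), the sketch below shows a loop whose body tests the induction variable against the constant 2, and a hand-peeled equivalent in which the first two iterations are split off so the check is no longer needed inside the loop. The function names `sumOriginal` and `sumPeeled` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Original loop: the body tests the induction variable against a constant.
int sumOriginal(const std::vector<int> &A) {
  int Sum = 0;
  for (std::size_t I = 0; I < A.size(); ++I) {
    if (I < 2) // true only for the first two iterations
      Sum += 10 * A[I];
    else
      Sum += A[I];
  }
  return Sum;
}

// Hand-peeled version: two iterations are peeled off in front of the loop,
// so "I < 2" is known to be false in the remaining loop and disappears.
int sumPeeled(const std::vector<int> &A) {
  int Sum = 0;
  std::size_t I = 0;
  if (I < A.size()) { Sum += 10 * A[I]; ++I; } // peeled iteration 0
  if (I < A.size()) { Sum += 10 * A[I]; ++I; } // peeled iteration 1
  for (; I < A.size(); ++I)
    Sum += A[I];
  return Sum;
}
```

Both functions compute the same result for any input length, including vectors shorter than the peel count.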
*	[LoopUnroll] Ignore ephemeral values when checking full unroll profitability.	Andrei Elovikov	2018-03-15	1	-0/+119
	Summary: Before this patch call graph is like this in the LoopUnrollPass:
	tryToUnrollLoop
	  ApproximateLoopSize
	    collectEphemeralValues
	    /* Use collected ephemeral values */
	  computeUnrollCount
	    analyzeLoopUnrollCost
	      /* Bail out from the analysis if loop contains CallInst */
	This patch moves collection of the ephemeral values to the tryToUnrollLoop function and passes the collected values into both ApproximateLoopsize (as before) and additionally starts using them in analyzeLoopUnrollCost:
	tryToUnrollLoop
	  collectEphemeralValues
	  ApproximateLoopSize(EphValues)
	    /* Use EphValues */
	  computeUnrollCount(EphValues)
	    analyzeLoopUnrollCost(EphValues)
	      /* Ignore ephemeral values - they don't contribute to the final cost */
	      /* Bail out from the analysis if loop contains CallInst */
	Reviewers: mzolotukhin, evstupac, sanjoy Reviewed By: evstupac Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43931 llvm-svn: 327617
*	LoopUnroll: respect pragma unroll when AllowRemainder is disabled	Yaxun Liu	2018-03-02	2	-26/+133
	Currently when AllowRemainder is disabled, pragma unroll count is not respected even though there is no remainder. This bug causes a loop to be fully unrolled in many cases even though the user specifies an unroll count. Especially it affects OpenCL/CUDA since in many cases a loop contains convergent instructions and currently AllowRemainder is disabled for such loops. Differential Revision: https://reviews.llvm.org/D43826 llvm-svn: 326585
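The decision this message describes can be sketched as a small predicate. This is a simplified assumption about the intended behaviour, not the logic of LLVM's computeUnrollCount: `pickPragmaCount` and its parameters are hypothetical, and a return value of 0 is used here to mean "do not use the pragma count, fall through to other heuristics".

```cpp
// Hypothetical helper: decide whether a user-requested unroll count can be
// honoured. TripMultiple is a value the trip count is known to be a
// multiple of; PragmaCount == 0 means no count was requested.
unsigned pickPragmaCount(unsigned PragmaCount, unsigned TripMultiple,
                         bool AllowRemainder) {
  if (PragmaCount == 0)
    return 0; // nothing requested
  // If the count divides the trip multiple, unrolling leaves no remainder,
  // so the count can be respected even when remainders are not allowed.
  if (AllowRemainder || TripMultiple % PragmaCount == 0)
    return PragmaCount;
  return 0; // a remainder loop would be needed but is not allowed
}
```

For example, a requested count of 4 on a loop whose trip count is a multiple of 8 is honoured with AllowRemainder disabled, while the same count on a multiple of 6 is not.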
*	[SimplifyCFG] Re-apply Relax restriction for folding unconditional branches	Serguei Katkov	2018-02-08	1	-9/+4
	The commit rL308422 introduces a restriction for folding unconditional branches. Specifically, if an empty block with an unconditional branch leads to the header of the loop, then elimination of this basic block is prohibited. However, it seems this condition is redundantly strict. If elimination of this basic block does not introduce more back edges then we can eliminate this block. The patch implements this relaxation of the restriction. The test profile/Linux/counter_promo_nest.c in compiler-rt project is updated to meet this change. Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl Reviewed By: pacxx Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42691 llvm-svn: 324572
*	Revert [SimplifyCFG] Relax restriction for folding unconditional branches	Serguei Katkov	2018-02-05	1	-4/+9
	The patch causes the failure of the test compiler-rt/test/profile/Linux/counter_promo_nest.c To unblock buildbot, revert the patch while investigation is in progress. Differential Revision: https://reviews.llvm.org/D42691 llvm-svn: 324214
*	[SimplifyCFG] Relax restriction for folding unconditional branches	Serguei Katkov	2018-02-05	1	-9/+4
	The commit rL308422 introduces a restriction for folding unconditional branches. Specifically, if an empty block with an unconditional branch leads to the header of the loop, then elimination of this basic block is prohibited. However, it seems this condition is redundantly strict. If elimination of this basic block does not introduce more back edges then we can eliminate this block. The patch implements this relaxation of the restriction. Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl Reviewed By: pacxx Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42691 llvm-svn: 324208
*	[AMDGPU] Switch to the new addr space mapping by default	Yaxun Liu	2018-02-02	1	-21/+21
	This requires a corresponding clang change. Differential Revision: https://reviews.llvm.org/D40955 llvm-svn: 324101
*	[Unroll][DebugInfo] Propagate loop body's debug location to epilog preheader	Zhaoshi Zheng	2017-12-26	2	-2/+132
	NewExit and epilog PreHeader should have the same debug loc as the original loop body, instead of original loop exit. llvm-svn: 321465
*	loop-unroll: teach remapInstruction to update dbg.value intrinsics.	Adrian Prantl	2017-11-01	1	-0/+75
	Fixes PR35112. https://bugs.llvm.org/show_bug.cgi?id=35112 llvm-svn: 317138
*	[LoopUnroll] Clean up remarks for unroll remainder	David Green	2017-10-31	1	-1/+24
	The optimisation remarks for loop unrolling with an unrolled remainder look something like: test.c:7:18: remark: completely unrolled loop with 3 iterations [-Rpass=loop-unroll] C[i] += A[i*N+j]; ^ test.c:6:9: remark: unrolled loop by a factor of 4 with run-time trip count [-Rpass=loop-unroll] for(int j = 0; j < N; j++) ^ This removes the first of the two messages. Differential revision: https://reviews.llvm.org/D38725 llvm-svn: 316986
*	[ARM] Allow unrolling of multi-block loops.	Sam Parker	2017-10-23	1	-0/+316
	Before, loop unrolling was only enabled for loops with a single block. This restriction has been removed and replaced by: - allow a maximum of two exiting blocks, - a four basic block limit for cores with a branch predictor. Differential Revision: https://reviews.llvm.org/D38952 llvm-svn: 316313
*	[ValueTracking] Enabling ValueTracking patch by default	Nikolai Bozhenov	2017-10-20	1	-1/+1
	(recommit #2 after checking for timeout issue). The original patch was an improvement to IR ValueTracking on non-negative integers. It has been checked in to trunk (D18777, r284022) but was disabled by default due to performance regressions. The performance impact has since improved, so the patch is now enabled by default. Reviewers: reames, hfinkel Differential Revision: https://reviews.llvm.org/D34101 Patch by: Olga Chupina <olga.chupina@intel.com> llvm-svn: 316208
*	The cost of splitting a large vector instruction is not being taken into account by the getUserCost function.	Graham Yiu	2017-10-19	1	-0/+74
	This was leading to some loops being over-unrolled. The cost of a vector instruction is now being multiplied by the cost of the type legalization. This will return a more accurate cost. Committing on behalf of Brad Nemanich (brad.nemanich@ibm.com) Differential Revision: https://reviews.llvm.org/D38961 llvm-svn: 316174
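A back-of-the-envelope version of this cost adjustment is sketched below. It is an assumption-laden illustration, not the getUserCost implementation: `legalizationFactor` and `scaledUserCost` are hypothetical helpers, and the legalization cost is modelled simply as the number of legal-width pieces a wide vector has to be split into.

```cpp
// Hypothetical helper: number of legal-width pieces needed to hold a vector
// of VectorBits bits when the widest legal vector is LegalBits bits wide.
// LegalBits must be non-zero.
unsigned legalizationFactor(unsigned VectorBits, unsigned LegalBits) {
  return (VectorBits + LegalBits - 1) / LegalBits; // ceiling division
}

// Scale the per-instruction cost by the legalization factor, so one wide
// vector instruction is charged for every piece it is split into.
unsigned scaledUserCost(unsigned BaseCost, unsigned VectorBits,
                        unsigned LegalBits) {
  return BaseCost * legalizationFactor(VectorBits, LegalBits);
}
```

Under this model a 512-bit operation on a target with 128-bit vectors costs four times the base cost, which makes the unroller see the loop body as larger and unroll it less aggressively.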
*	Use a BumpPtrAllocator for Loop objects	Sanjoy Das	2017-09-28	1	-2/+2
	Summary: And now that we no longer have to explicitly free() the Loop instances, we can (with more ease) use the destructor of LoopBase to do what LoopBase::clear() was doing. Reviewers: chandlerc Subscribers: mehdi_amini, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D38201 llvm-svn: 314375
*	Tighten the invariants around LoopBase::invalidate	Sanjoy Das	2017-09-20	1	-2/+2
	Summary: With this change: - Methods in LoopBase trip an assert if the receiver has been invalidated - LoopBase::clear frees up the memory held by the LoopBase instance This change also shuffles things around as necessary to work with this stricter invariant. Reviewers: chandlerc Subscribers: mehdi_amini, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D38055 llvm-svn: 313708
*	[RuntimeUnroll] Add heuristic for unrolling multi-exit loop	Anna Thomas	2017-09-15	1	-0/+94
	Add a profitability heuristic to enable runtime unrolling of multi-exit loop: There can be at most two unique exit blocks for the loop and the second exit block should be a deoptimizing block. Also, there can be one other exiting block other than the latch exiting block. The reason for the latter is so that we limit the number of branches in the unrolled code to being at most the unroll factor. Deoptimizing blocks are rarely taken so these additional number of branches created due to the unrolling are predictable, since one of their targets is the deopt block. Reviewers: apilipenko, reames, evstupac, mkuper Subscribers: llvm-commits Reviewed by: reames Differential Revision: https://reviews.llvm.org/D35380 llvm-svn: 313363
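The conditions listed in this message can be written down as a small predicate. The sketch below is one reading of the prose, not the code from the patch: `MultiExitShape` and `looksProfitableForRuntimeUnroll` are hypothetical, and "one other exiting block" is interpreted here as "at most one".

```cpp
// Hypothetical summary of a loop's exit structure.
struct MultiExitShape {
  unsigned UniqueExitBlocks;      // distinct blocks the loop can exit to
  unsigned NonLatchExitingBlocks; // exiting blocks other than the latch
  bool SecondExitIsDeopt;         // the non-latch exit block deoptimizes
};

// One reading of the heuristic: at most two unique exit blocks, at most one
// exiting block besides the latch, and a second exit (if present) that is a
// rarely taken deoptimizing block.
bool looksProfitableForRuntimeUnroll(const MultiExitShape &S) {
  if (S.UniqueExitBlocks > 2)
    return false;
  if (S.NonLatchExitingBlocks > 1)
    return false;
  if (S.UniqueExitBlocks == 2 && !S.SecondExitIsDeopt)
    return false;
  return true;
}
```

With at most one extra exiting block, each unrolled copy of the body adds at most one branch, which is how the number of added branches stays bounded by the unroll factor.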
*	[RuntimeUnrolling] Populate the VMap entry correctly when default generated through lookup	Anna Thomas	2017-09-15	1	-0/+44
	During runtime unrolling on loops with multiple exits, we update the exit blocks with the correct phi values from both original and remainder loop. In this process, we look up the VMap for the mapped incoming phi values, but did not update the VMap if a default entry was generated in the VMap during the lookup. This default value is generated when constants or values outside the current loop are looked up. This patch fixes the assertion failure when null entries are present in the VMap because of this lookup. Added a testcase that showcases the problem. llvm-svn: 313358
*	[LoopUnroll][DebugInfo] Don't add metadata to unrolled remainder loop	Sam Parker	2017-09-04	1	-2/+2
	Debug information can be, and was, corrupted when the runtime remainder loop was fully unrolled. This is because a !null node can be created instead of a unique one describing the loop. In this case, the original node gets incorrectly updated with the NewLoopID metadata. In the case when the remainder loop is going to be quickly fully unrolled, there isn't the need to add loop metadata for it anyway. Differential Revision: https://reviews.llvm.org/D37338 llvm-svn: 312471
*	[LoopUnroll] Make the test for PR33437 actually useful.	Davide Italiano	2017-08-29	1	-14/+27
	I forgot to specify -unroll-loop-peel, making this test not really effective. While here, adjust some details (naming and run line). Thanks to Sanjoy and Michael Z. for pointing out in their post-commit reviews. llvm-svn: 312015
*	[LoopUnroll] Properly update loop structure in case of successful peeling.	Davide Italiano	2017-08-28	1	-0/+30
	When peeling kicks in, it updates the loop preheader. Later, a successful full unroll of the loop needs to update a PHI whose i-th argument comes from the loop preheader, so it'd better look at the correct block. Fixes PR33437. Differential Revision: https://reviews.llvm.org/D37153 llvm-svn: 311922
*	Changed basic cost of store operation on X86	Elena Demikhovsky	2017-08-20	1	-0/+104
	Store operation takes 2 UOps on X86 processors. The exact cost calculation affects several optimization passes including loop unrolling. This change compensates performance degradation caused by https://reviews.llvm.org/D34458 and shows improvements on some benchmarks. Differential Revision: https://reviews.llvm.org/D35888 llvm-svn: 311285
*	[ARM] Improve loop unrolling for Cortex-M	Sam Parker	2017-08-16	1	-92/+128
	- Set the default runtime unroll count to 4 and use the newly added UnrollRemainder option. - Create loop cost and force unroll for a cost less than 12. - Disable unrolling on Thumb1 only targets. Differential Revision: https://reviews.llvm.org/D36134 llvm-svn: 310997
*	[LoopUnroll] Enable option to peel remainder loop	Sam Parker	2017-08-14	1	-0/+74
	On some targets, the penalty of executing runtime unrolling checks and then not the unrolled loop can be significantly detrimental to performance. This results in the need to be more conservative with the unroll count; keeping a trip count of 2 reduces the overhead as well as increasing the chance of the unrolled body being executed. But being conservative leaves performance gains on the table. This patch enables the unrolling of the remainder loop introduced by runtime unrolling. This can help reduce the overhead of misunrolled loops because the cost of non-taken branches is much less than the cost of the backedge that would normally be executed in the remainder loop. This allows larger unroll factors to be used without suffering performance losses with smaller iteration counts. Differential Revision: https://reviews.llvm.org/D36309 llvm-svn: 310824
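To make the difference concrete, here is a hand-written source-level sketch (not compiler output and not code from the patch) of a loop unrolled by 4 at run time, once with a conventional remainder loop and once with the remainder itself unrolled into straight-line conditional code. The function names are hypothetical, and the actual pass works on IR and may place the remainder before or after the main loop.

```cpp
#include <cstddef>

// Unrolled by 4 with a conventional remainder loop.
long sumRemainderLoop(const int *A, std::size_t N) {
  long Sum = 0;
  std::size_t I = 0;
  for (; I + 4 <= N; I += 4)
    Sum += static_cast<long>(A[I]) + A[I + 1] + A[I + 2] + A[I + 3];
  for (; I < N; ++I) // remainder: up to 3 iterations, each takes a backedge
    Sum += A[I];
  return Sum;
}

// Same main loop, but the remainder is unrolled: at most 3 copies of the
// body guarded by forward branches, with no backedge.
long sumUnrolledRemainder(const int *A, std::size_t N) {
  long Sum = 0;
  std::size_t I = 0;
  for (; I + 4 <= N; I += 4)
    Sum += static_cast<long>(A[I]) + A[I + 1] + A[I + 2] + A[I + 3];
  if (I < N) {
    Sum += A[I++];
    if (I < N) {
      Sum += A[I++];
      if (I < N)
        Sum += A[I++];
    }
  }
  return Sum;
}
```

For short trip counts (N below 4) the second version executes only a few forward branches, which is the case the message argues is cheaper than iterating a remainder loop.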
*	[PM] Relax the spelling of a pass name slightly in this test.	Chandler Carruth	2017-08-08	1	-1/+1
	I forgot that MSVC doesn't preserve this typedef, my bad. llvm-svn: 310334
*	[PM] Fix new LoopUnroll function pass by invalidating loop analysis results when a loop is completely removed.	Chandler Carruth	2017-08-08	1	-0/+107
	This is very hard to manifest as a visible bug. You need to arrange for there to be a subsequent allocation of a 'Loop' object which gets the exact same address as the one which the unroll deleted, and you need the LoopAccessAnalysis results to be significant in the way that they're stale. And you need a million other things to align. But when it does, you get a deeply mysterious crash due to actually finding a stale analysis result. This fixes the issue and tests for it by directly checking we successfully invalidate things. I have not been able to get any test case to reliably trigger this. Changes to LLVM itself caused the only test case I ever had to cease to crash. I've looked pretty extensively at less brittle ways of fixing this and they are actually very, very hard to do. This is a somewhat strange and unusual case as we have a pass which is deleting an IR unit, but is not running within that IR unit's pass framework (which is what handles this cleanly for the normal loop unroll). And where there isn't a definitive way to clear all of the stale cache entries. And where the pass is updating the core analysis that provides the IR units! For example, we don't have any of these problems with Function analyses because it is easy to clear out function analyses when the functions themselves may have been deleted -- we clear an entire module's worth! But that is too heavy of a hammer down here in the LoopAnalysisManager layer. A better long-term solution IMO is to require that AnalysisManager's make their keys durable to this kind of thing. Specifically, when caching an analysis for one IR unit that is conceptually "owned" by a higher level IR unit, the AnalysisManager should incorporate this into its data structures so that we can reliably clear these results without having to teach each and every pass to do so manually as we do here. But that is a change for another day as it will be a fairly invasive change to the AnalysisManager infrastructure. Until then, this fortunately seems to be quite rare. llvm-svn: 310333
*	Use profile summary to disable peeling for huge working sets	Teresa Johnson	2017-08-03	1	-18/+37
	Summary: Detect when the working set size of a profiled application is huge, by comparing the number of counts required to reach the hot percentile in the profile summary to a large threshold. When the working set size is determined to be huge, disable peeling to avoid bloating the working set further. Note that the selected threshold (15K) is significantly larger than the largest working set value in SPEC cpu2006 (which is gcc at around 11K). Reviewers: davidxl Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D36288 llvm-svn: 310005
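The check described in this summary can be approximated in a few lines. The sketch below is a simplified model rather than the ProfileSummaryInfo implementation: `countsToReach` and `hasHugeWorkingSet` are hypothetical names, the hot percentile is left as a parameter because the message does not state its value, and whether the comparison against the 15K threshold is strict is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Number of counters, taken hottest first, needed to cover the given
// fraction (0.0 to 1.0) of the total execution count.
std::size_t countsToReach(std::vector<std::uint64_t> Counts,
                          double Percentile) {
  std::sort(Counts.begin(), Counts.end(), std::greater<std::uint64_t>());
  std::uint64_t Total = 0;
  for (std::uint64_t C : Counts)
    Total += C;
  std::uint64_t Running = 0;
  std::size_t N = 0;
  for (std::uint64_t C : Counts) {
    if (static_cast<double>(Running) >= Percentile * static_cast<double>(Total))
      break; // the hottest N counters already cover the percentile
    Running += C;
    ++N;
  }
  return N;
}

// Treat the working set as huge when more counters than the threshold are
// needed to reach the hot percentile (15000 mirrors the 15K in the text).
bool hasHugeWorkingSet(const std::vector<std::uint64_t> &Counts,
                       double Percentile, std::size_t Threshold = 15000) {
  return countsToReach(Counts, Percentile) > Threshold;
}
```

A caller would skip peeling whenever `hasHugeWorkingSet` returns true, on the reasoning given above that peeling would only grow an already large working set.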
*	Disable loop peeling during full unrolling pass.	Teresa Johnson	2017-08-03	1	-0/+5
	Summary: Peeling should not occur during the full unrolling invocation early in the pipeline, but rather later with partial and runtime loop unrolling. The later loop unrolling invocation will also eventually utilize profile summary and branch frequency information, which we would like to use to control peeling. And for ThinLTO we want to delay peeling until the backend (post thin link) phase, just as we do for most types of unrolling. Ensure peeling doesn't occur during the full unrolling invocation by adding a parameter to the shared implementation function, similar to the way partial and runtime loop unrolling are disabled. Performance results for ThinLTO suggest this has a neutral to positive effect on some internal benchmarks. Reviewers: chandlerc, davidxl Subscribers: mzolotukhin, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D36258 llvm-svn: 309966
*	[PM] Split LoopUnrollPass and make partial unroller a function pass	Teresa Johnson	2017-08-02	11	-58/+58
	Summary: This is largely NFC, in preparation for utilizing ProfileSummaryInfo and BranchFrequencyInfo analyses. In this patch I am only doing the splitting for the New PM, but I can do the same for the legacy PM as a follow-on if this looks good. Not NFC since for partial unrolling we lose the updates done to the loop traversal (adding new sibling and child loops) - according to Chandler this is not very useful for partial unrolling, but it also means that the debugging flag -unroll-revisit-child-loops no longer works for partial unrolling. Reviewers: chandlerc Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D36157 llvm-svn: 309886
*	[ARM] Enable partial and runtime unrolling	Sam Parker	2017-07-25	2	-0/+213
	Enable runtime and partial loop unrolling of simple loops without calls on M-class cores. The thresholds are calculated based on whether the target is Thumb or Thumb-2. Differential Revision: https://reviews.llvm.org/D34619 llvm-svn: 308956
*	[SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure.	Balaram Makam	2017-07-19	1	-4/+8
	Summary: When simplifying unconditional branches from empty blocks, we pre-test if the BB belongs to a set of loop headers and keep the block to prevent passes from destroying canonical loop structure. However, the current algorithm fails if the destination of the branch is a loop header. Especially when such a loop's latch block is folded into the loop header, it results in additional backedges and LoopSimplify turns it into a nested loop, which prevents later optimizations from being applied (e.g., loop unrolling and loop interleaving). This patch augments the existing algorithm by further checking if the destination of the branch belongs to a set of loop headers and, if so, deferring its elimination to LateSimplifyCFG. Fixes PR33605: https://bugs.llvm.org/show_bug.cgi?id=33605 Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl Reviewed By: efriedma Subscribers: ashutosh.nema, gberry, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D35411 llvm-svn: 308422
*	[RuntimeUnrolling] Update DomTree correctly when exit blocks have successors	Anna Thomas	2017-07-13	1	-0/+126
	Summary: When we runtime unroll with multiple exit blocks, we also need to update the immediate dominators of the immediate successors of the exit blocks. Reviewers: reames, mkuper, mzolotukhin, apilipenko Reviewed by: mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35304 llvm-svn: 307909
*	[LoopUnrollRuntime] NFC: Refactored safety checks of unrolling multi-exit loop	Anna Thomas	2017-07-12	1	-0/+68
	Refactored the code and separated out a function `canSafelyUnrollMultiExitLoop` to reduce redundant checks and make it easier to add profitability heuristics later. Added tests to runtime unrolling to make sure that unrolling for multi-exit loops is not done unless the option -unroll-runtime-multi-exit is true. llvm-svn: 307843
*	[LoopUnrollRuntime] Avoid multi-exit nested loop with epilog generation	Anna Thomas	2017-07-11	1	-0/+44
	The loop structure for the outer loop does not contain the epilog preheader when we try to unroll inner loop with multiple exits and epilog code is generated. For now, we just bail out in such cases. Added a test case that shows the problem. Without this bailout, we would trip on assert saying LCSSA form is incorrect for outer loop. llvm-svn: 307676
*	[LoopUnrollRuntime] Remove strict assert about VMap requirement	Anna Thomas	2017-07-10	1	-1/+42
	When unrolling under multiple exits, which is under an off-by-default option, the assert that checks for a VMap entry in loop exit values is too strong (the assert says that if a VMap entry did not exist, the value should be a constant). However, values derived from constants or from values outside the loop do not have a VMap entry either. Removed the assert and added a testcase showcasing the property for non-constant values. llvm-svn: 307542
*	[LoopUnrollRuntime] Support multiple exit blocks unrolling when prolog remainder generated	Anna Thomas	2017-07-07	1	-79/+165
	With the NFC refactoring in rL307417 (git SHA 987dd01), all the logic is in place to support multiple exit/exiting blocks when prolog remainder is generated. This patch removed the assert that multiple exit blocks unrolling is only supported when epilog remainder is generated. Also, added test runs and checks with PROLOG prefix in runtime-loop-multiple-exits.ll test cases. llvm-svn: 307435
*	[LoopUnrollRuntime] Bailout when multiple exiting blocks to the unique latch exit block	Anna Thomas	2017-07-06	1	-0/+30
	Currently, we do not support multiple exiting blocks to the latch exit block. However, this bailout wasn't triggered when we had a unique exit block (which is the latch exit), with multiple exiting blocks to that unique exit. Moved the bailout so that it's triggered in both cases and added testcase. llvm-svn: 307291
*	[RuntimeUnrolling] Add logic for loops with multiple exit blocks	Anna Thomas	2017-06-30	1	-0/+279
	Summary: Runtime unrolling is done for loops with a single exit block and a single exiting block (and this exiting block should be the latch block). This patch adds logic to support unrolling in the presence of multiple exit blocks (which also means multiple exiting blocks). Currently this is under an off-by-default option and is supported when epilog code is generated. Support in presence of prolog code will be in a future patch (we just need to add more tests, and update comments). This patch is essentially an implementation patch. I have not added any heuristic (in terms of branches added or code size) to decide when this should be enabled. Reviewers: mkuper, sanjoy, reames, evstupac Reviewed by: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33001 llvm-svn: 306846
*	[AArch64][Falkor] Try to avoid exhausting HW prefetcher resources when unrolling.	Geoff Berry	2017-06-28	1	-0/+169
	Reviewers: t.p.northover, mcrosier Subscribers: aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34533 llvm-svn: 306584
*	[LoopUnroll] Fix bug in computeUnrollCount causing it to not honor MaxCount	Geoff Berry	2017-06-28	1	-0/+31
	Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper Subscribers: mcrosier, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D34532 llvm-svn: 306564
*	[LoopUnroll] Fix a test. REQUIRE should be REQUIRES.	Davide Italiano	2017-05-12	1	-1/+1
	Found by inspection. llvm-svn: 302909