summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/LoopUnroll
Commit message (Collapse)AuthorAgeFilesLines
* Don't recompute LCSSA after loop-unrolling when possible.Michael Zolotukhin2015-11-141-0/+119
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Currently we always recompute LCSSA for outer loops after unrolling an inner loop. That leads to compile time problem when we have big loop nests, and we can solve it by avoiding unnecessary work. For instance, if w eonly do partial unrolling, we don't break LCSSA, so we don't need to rebuild it. Also, if all exits from the inner loop are inside the enclosing loop, then complete unrolling won't break LCSSA either. I replaced unconditional LCSSA recomputation with conditional recomputation + unconditional assert and added several tests, which were failing when I experimented with it. Soon I plan to follow up with a similar patch for recalculation of dominators tree. Reviewers: hfinkel, dexonsmith, bogner, joker.eph, chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14526 llvm-svn: 253126
* DI: Reverse direction of subprogram -> function edge.Peter Collingbourne2015-11-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | Previously, subprograms contained a metadata reference to the function they described. Because most clients need to get or set a subprogram for a given function rather than the other way around, this created unneeded inefficiency. For example, many passes needed to call the function llvm::makeSubprogramMap() to build a mapping from functions to subprograms, and the IR linker needed to fix up function references in a way that caused quadratic complexity in the IR linking phase of LTO. This change reverses the direction of the edge by storing the subprogram as function-level metadata and removing DISubprogram's function field. Since this is an IR change, a bitcode upgrade has been provided. Fixes PR23367. An upgrade script for textual IR for out-of-tree clients is attached to the PR. Differential Revision: http://reviews.llvm.org/D14265 llvm-svn: 252219
* Revert "[IndVarSimplify] Rewrite loop exit values with their initial values ↵Tobias Grosser2015-11-031-1/+1
| | | | | | | | | | | | | | | | | | | from loop preheader" Commit 251839 triggers miscompiles on some bots: http://lab.llvm.org:8011/builders/perf-x86_64-penryn-O3-polly-fast/builds/13723 (The commit is listed in 13722, but due to an existing failure introduced in 13721 and reverted in 13723 the failure is only visible in 13723) To verify r251839 is indeed the only change that triggered the buildbot failures and to ensure the buildbots remain green while investigating I temporarily revert this commit. At the current state it is unclear if this commit introduced some miscompile or if it only exposed code to Polly that is subsequently miscompiled by Polly. llvm-svn: 251901
* [IndVarSimplify] Rewrite loop exit values with their initial values from ↵Chen Li2015-11-021-1/+1
| | | | | | | | | | | | | | | | | loop preheader Summary: This patch adds support to check if a loop has loop invariant conditions which lead to loop exits. If so, we know that if the exit path is taken, it is at the first loop iteration. If there is an induction variable used in that exit path whose value has not been updated, it will keep its initial value passing from loop preheader. We can therefore rewrite the exit value with its initial value. This will help remove phis created by LCSSA and enable other optimizations like loop unswitch. Reviewers: sanjoy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13974 llvm-svn: 251839
* Revert r251492 "[IndVarSimplify] Rewrite loop exit values with their Chen Li2015-10-281-1/+1
| | | | | | initial values from loop preheader", because it broke some bots. llvm-svn: 251498
* [IndVarSimplify] Rewrite loop exit values with their initial values from ↵Chen Li2015-10-281-1/+1
| | | | | | | | | | | | | | | | | loop preheader Summary: This patch adds support to check if a loop has loop invariant conditions which lead to loop exits. If so, we know that if the exit path is taken, it is at the first loop iteration. If there is an induction variable used in that exit path whose value has not been updated, it will keep its initial value passing from loop preheader. We can therefore rewrite the exit value with its initial value. This will help remove phis created by LCSSA and enable other optimizations like loop unswitch. Reviewers: sanjoy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13974 llvm-svn: 251492
* [Tests] Add one more case to LoopUnroll/pr18861.ll for better coverage.Michael Zolotukhin2015-10-021-0/+31
| | | | llvm-svn: 249174
* [Tests] Give meaningful names to blocks in LoopUnroll/pr18861.ll, add a ↵Michael Zolotukhin2015-10-021-13/+37
| | | | | | description of what's going on. llvm-svn: 249173
* [Tests] Slightly reduce test LoopUnroll/pr18861.ll.Michael Zolotukhin2015-10-021-16/+4
| | | | llvm-svn: 249172
* [Unroll] Do not crash trying to propagate a value to vector load.Michael Zolotukhin2015-09-221-0/+19
| | | | llvm-svn: 248333
* [Unroll] Follow-up for r247769: fix a bug in UnrolledInstAnalyzer::visitLoad.Michael Zolotukhin2015-09-221-2/+30
| | | | | | | | Apart from checking that GlobalVariable is a constant, we should check that it's not a weak constant, in which case we can't propagate its value. llvm-svn: 248327
* [Unroll] Fix a bug in UnrolledInstAnalyzer::visitLoad.Michael Zolotukhin2015-09-161-0/+29
| | | | | | | | We only checked that a global is initialized with constants, which is incorrect. We should be checking that GlobalVariable *is* a constant, not just initialized with it. llvm-svn: 247769
* DI: Require subprogram definitions to be distinctDuncan P. N. Exon Smith2015-08-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | As a follow-up to r246098, require `DISubprogram` definitions (`isDefinition: true`) to be 'distinct'. Specifically, add an assembler check, a verifier check, and bitcode upgrading logic to combat testcase bitrot after the `DIBuilder` change. While working on the testcases, I realized that test/Linker/subprogram-linkonce-weak-odr.ll isn't relevant anymore. Its purpose was to check for a corner case in PR22792 where two subprogram definitions match exactly and share the same metadata node. The new verifier check, requiring that subprogram definitions are 'distinct', precludes that possibility. I updated almost all the IR with the following script: git grep -l -E -e '= !DISubprogram\(.* isDefinition: true' | grep -v test/Bitcode | xargs sed -i '' -e 's/= \(!DISubprogram(.*, isDefinition: true\)/= distinct \1/' Likely some variant of would work for out-of-tree testcases. llvm-svn: 246327
* Add new llvm.loop.unroll.enable metadata.Mark Heffernan2015-08-101-0/+66
| | | | | | | | | | | | | This change adds the unroll metadata "llvm.loop.unroll.enable" which directs the optimizer to unroll a loop fully if the trip count is known at compile time, and unroll partially if the trip count is not known at compile time. This differs from "llvm.loop.unroll.full" which explicitly does not unroll a loop if the trip count is not known at compile time. The "llvm.loop.unroll.enable" is intended to be added for loops annotated with "#pragma unroll". llvm-svn: 244466
* [Unroll] Improve the brute force loop unroll estimate by propagatingChandler Carruth2015-08-031-0/+23
| | | | | | | | | | | | | | | | | | | | | | | through PHI nodes across iterations. This patch teaches the new advanced loop unrolling heuristics to propagate constants into the loop from the preheader and around the backedge after simulating each iteration. This lets us brute force solve simple recurrances that aren't modeled effectively by SCEV. It also makes it more clear why we need to process the loop in-order rather than bottom-up which might otherwise make much more sense (for example, for DCE). This came out of an attempt I'm making to develop a principled way to account for dead code in the unroll estimation. When I implemented a forward-propagating version of that it produced incorrect results due to failing to propagate *cost* between loop iterations through the PHI nodes, and it occured to me we really should at least propagate simplifications across those edges, and it is quite easy thanks to the loop being in canonical and LCSSA form. Differential Revision: http://reviews.llvm.org/D11706 llvm-svn: 243900
* [Unroll] Handle SwitchInst properly.Michael Zolotukhin2015-07-291-0/+24
| | | | | | Previously successor selection was simply wrong. llvm-svn: 243545
* [Unroll] Don't crash when simplified branch condition is undef.Michael Zolotukhin2015-07-291-0/+25
| | | | llvm-svn: 243544
* Rename test full-unroll-bad-geps.ll to full-unroll-crashers.ll.Michael Zolotukhin2015-07-291-0/+0
| | | | | | | No reason to limit it only to GEP-related crashes. More tests are to come here. llvm-svn: 243543
* Roll forward r243250Jingyue Wu2015-07-261-3/+6
| | | | | | | | | r243250 appeared to break clang/test/Analysis/dead-store.c on one of the build slaves, but I couldn't reproduce this failure locally. Probably a false positive as I saw this test was broken by r243246 or r243247 too but passed later without people fixing anything. llvm-svn: 243253
* Revert r243250Jingyue Wu2015-07-261-6/+3
| | | | | | breaks tests llvm-svn: 243251
* [TTI/CostModel] improve TTI::getGEPCost and use it in ↵Jingyue Wu2015-07-261-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CostModel::getInstructionCost Summary: This patch updates TargetTransformInfoImplCRTPBase::getGEPCost to consider addressing modes. It now returns TCC_Free when the GEP can be completely folded to an addresing mode. I started this patch as I refactored SLSR. Function isGEPFoldable looks common and is indeed used by some WIP of mine. So I extracted that logic to getGEPCost. Furthermore, I noticed getGEPCost wasn't directly tested anywhere. The best testing bed seems CostModel, but its getInstructionCost method invokes getAddressComputationCost for GEPs which provides very coarse estimation. So this patch also makes getInstructionCost call the updated getGEPCost for GEPs. This change inevitably breaks some tests because the cost model changes, but nothing looks seriously wrong -- if we believe the new cost model is the right way to go, these tests should be updated. This patch is not perfect yet -- the comments in some tests need to be updated. I want to know whether this is a right approach before fixing those details. Reviewers: chandlerc, hfinkel Subscribers: aschwaighofer, llvm-commits, aemerson Differential Revision: http://reviews.llvm.org/D9819 llvm-svn: 243250
* Handle resolvable branches in complete loop unroll heuristic.Michael Zolotukhin2015-07-241-0/+207
| | | | | | | | | | | | | | Summary: Resolving a branch allows us to ignore blocks that won't be executed, and thus make our estimate more accurate. This patch is intended to be applied after D10205 (though it could be applied independently). Reviewers: chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10206 llvm-svn: 243084
* Tidy-up test case from r242257.Michael Zolotukhin2015-07-151-5/+8
| | | | llvm-svn: 242268
* [LoopUnrolling] Handle cast instructions.Michael Zolotukhin2015-07-151-0/+94
| | | | | | | | | During estimation of unrolling effect we should be able to propagate constants through casts. Differential Revision: http://reviews.llvm.org/D10207 llvm-svn: 242257
* Enable runtime unrolling with unroll pragma metadataMark Heffernan2015-07-131-14/+13
| | | | | | | | | | Enable runtime unrolling for loops with unroll count metadata ("#pragma unroll N") and a runtime trip count. Also, do not unroll loops with unroll full metadata if the loop has a runtime loop count. Previously, such loops would be unrolled with a very large threshold (pragma-unroll-threshold) if runtime unrolled happened to be enabled resulting in a very large (and likely unwise) unroll factor. llvm-svn: 242047
* [LoopUnroll] Use undef for phis with no value liveDavid Majnemer2015-07-011-0/+24
| | | | | | | | We would create a phi node with a zero initialized operand instead of undef in the case where no value was originally available. This was problematic for x86_mmx which has no null value. llvm-svn: 241143
* Set proper debug location for branch added in BasicBlock::splitBasicBlock().Alexey Samsonov2015-06-111-12/+33
| | | | | | | | | | | | | | | | This improves debug locations in passes that do a lot of basic block transformations. Important case is LoopUnroll pass, the test for correct debug locations accompanies this change. Test Plan: regression test suite Reviewers: dblaikie, sanjoy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10367 llvm-svn: 239551
* [LoopUnroll] Fix truncation bug in canUnrollCompletely.Sanjoy Das2015-06-061-0/+58
| | | | | | | | | | | | | | | | | Summary: canUnrollCompletely takes `unsigned` values for `UnrolledCost` and `RolledDynamicCost` but is passed in `uint64_t`s that are silently truncated. Because of this, when `UnrolledSize` is a large integer that has a small remainder with UINT32_MAX, LLVM tries to completely unroll loops with high trip counts. Reviewers: mzolotukhin, chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10293 llvm-svn: 239218
* [Unroll] Rework the naming and structure of the new unroll heuristics.Chandler Carruth2015-06-052-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new naming is (to me) much easier to understand. Here is a summary of the new state of the world: - '*Threshold' is the threshold for full unrolling. It is measured against the estimated unrolled cost as computed by getUserCost in TTI (or CodeMetrics, etc). We will exceed this threshold when unrolling loops where unrolling exposes a significant degree of simplification of the logic within the loop. - '*PercentDynamicCostSavedThreshold' is the percentage of the loop's estimated dynamic execution cost which needs to be saved by unrolling to apply a discount to the estimated unrolled cost. - '*DynamicCostSavingsDiscount' is the discount applied to the estimated unrolling cost when the dynamic savings are expected to be high. When actually analyzing the loop, we now produce both an estimated unrolled cost, and an estimated rolled cost. The rolled cost is notably a dynamic estimate based on our analysis of the expected execution of each iteration. While we're still working to build up the infrastructure for making these estimates, to me it is much more clear *how* to make them better when they have reasonably descriptive names. For example, we may want to apply estimated (from heuristics or profiles) dynamic execution weights to the *dynamic* cost estimates. If we start doing that, we would also need to track the static unrolled cost and the dynamic unrolled cost, as only the latter could reasonably be weighted by profile information. This patch is sadly not without functionality change for the new unroll analysis logic. Buried in the heuristic management were several things that surprised me. For example, we never subtracted the optimized instruction count off when comparing against the unroll heursistics! I don't know if this just got lost somewhere along the way or what, but with the new accounting of things, this is much easier to keep track of and we use the post-simplification cost estimate to compare to the thresholds, and use the dynamic cost reduction ratio to select whether we can exceed the baseline threshold. The old values of these flags also don't necessarily make sense. My impression is that none of these thresholds or discounts have been tuned yet, and so they're just arbitrary placehold numbers. As such, I've not bothered to adjust for the fact that this is now a discount and not a tow-tier threshold model. We need to tune all these values once the logic is ready to be enabled. Differential Revision: http://reviews.llvm.org/D9966 llvm-svn: 239164
* [PPC/LoopUnrollRuntime] Don't avoid high-cost trip count computation on the ↵Hal Finkel2015-05-211-0/+27
| | | | | | | | | | | | | PPC/A2 On X86 (and similar OOO cores) unrolling is very limited, and even if the runtime unrolling is otherwise profitable, the expense of a division to compute the trip count could greatly outweigh the benefits. On the A2, we unroll a lot, and the benefits of unrolling are more significant (seeing a 5x or 6x speedup is not uncommon), so we're more able to tolerate the expense, on average, of a division to compute the trip count. llvm-svn: 237947
* Add another InstCombine pass after LoopUnroll.Wei Mi2015-05-141-0/+85
| | | | | | | | This is to cleanup some redundency generated by LoopUnroll pass. Such redundency may not be cleaned up by existing passes after LoopUnroll. Differential Revision: http://reviews.llvm.org/D9777 llvm-svn: 237395
* Reimplement heuristic for estimating complete-unroll optimization effects.Michael Zolotukhin2015-05-122-2/+36
| | | | | | | | | | | | | | | | | | | | Summary: This patch reimplements heuristic that tries to estimate optimization beneftis from complete loop unrolling. In this patch I kept the minimal changes - e.g. I removed code handling branches and folding compares. That's a promising area, but now there are too many questions to discuss before we can enable it. Test Plan: Tests are included in the patch. Reviewers: hfinkel, chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8816 llvm-svn: 237156
* [LoopUnrollRuntime] Avoid high-cost trip count computation.Sanjoy Das2015-04-143-1/+31
| | | | | | | | | | | | | | | | | Summary: Runtime unrolling of loops needs to emit an expression to compute the loop's runtime trip-count. Avoid runtime unrolling if this computation will be expensive. Depends on D8993. Reviewers: atrick Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8994 llvm-svn: 234846
* [LoopUnrollRuntime] Clean up a predicate.Sanjoy Das2015-04-121-0/+45
| | | | | | | | Clean up a predicate I added in r229731, fix the relevant comment and add a test case. The earlier version is confusing to read and was also buggy (probably not a coincidence) till Alexey fixed it in r233881. llvm-svn: 234701
* Reapply 'Run LICM pass after loop unrolling pass.'Kevin Qin2015-03-121-0/+43
| | | | | | | | | It's firstly committed at r231630, and reverted at r231635. Function pass InstructionSimplifier is inserted as barrier to make sure loop unroll pass won't affect on LICM pass. llvm-svn: 232011
* Revert r231630 - Run LICM pass after loop unrolling pass.Kevin Qin2015-03-091-43/+0
| | | | | | As it broke llvm bootstrap. llvm-svn: 231635
* [AArch64] Enable partial & runtime unrolling on cortex-a57Kevin Qin2015-03-093-0/+112
| | | | | | | | For inner one of nested loops, it is more likely to be a hot loop, and the runtime check can be promoted out from patch 0001, so the overhead is less, we can try a doubled threshold to unroll more loops. llvm-svn: 231632
* Introduce runtime unrolling disable matadata and use it to mark the scalar ↵Kevin Qin2015-03-091-0/+33
| | | | | | | | | | | loop from vectorization. Runtime unrolling is an expensive optimization which can bring benefit only if the loop is hot and iteration number is relatively large enough. For some loops, we know they are not worth to be runtime unrolled. The scalar loop from vectorization is one of the cases. llvm-svn: 231631
* Run LICM pass after loop unrolling pass.Kevin Qin2015-03-091-0/+43
| | | | | | | | | Runtime unrollng will introduce a runtime check in loop prologue. If the unrolled loop is a inner loop, then the proglogue will be inside the outer loop. LICM pass can help to promote the runtime check out if the checked value is loop invariant. llvm-svn: 231630
* [opaque pointer type] Add textual IR support for explicit type parameter to ↵David Blaikie2015-02-2715-41/+41
| | | | | | | | | | | | | | | | | | | | | | | | load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794
* [opaque pointer type] Add textual IR support for explicit type parameter to ↵David Blaikie2015-02-2717-55/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | getelementptr instruction One of several parallel first steps to remove the target type of pointers, replacing them with a single opaque pointer type. This adds an explicit type parameter to the gep instruction so that when the first parameter becomes an opaque pointer type, the type to gep through is still available to the instructions. * This doesn't modify gep operators, only instructions (operators will be handled separately) * Textual IR changes only. Bitcode (including upgrade) and changing the in-memory representation will be in separate changes. * geps of vectors are transformed as: getelementptr <4 x float*> %x, ... ->getelementptr float, <4 x float*> %x, ... Then, once the opaque pointer type is introduced, this will ultimately look like: getelementptr float, <4 x ptr> %x with the unambiguous interpretation that it is a vector of pointers to float. * address spaces remain on the pointer, not the type: getelementptr float addrspace(1)* %x ->getelementptr float, float addrspace(1)* %x Then, eventually: getelementptr float, ptr addrspace(1) %x Importantly, the massive amount of test case churn has been automated by same crappy python code. I had to manually update a few test cases that wouldn't fit the script's model (r228970,r229196,r229197,r229198). The python script just massages stdin and writes the result to stdout, I then wrapped that in a shell script to handle replacing files, then using the usual find+xargs to migrate all the files. update.py: import fileinput import sys import re ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))") normrep = re.compile( r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))") def conv(match, line): if not match: return line line = match.groups()[0] if len(match.groups()[5]) == 0: line += match.groups()[2] line += match.groups()[3] line += ", " line += match.groups()[1] line += "\n" return line for line in sys.stdin: if line.find("getelementptr ") == line.find("getelementptr inbounds"): if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("): line = conv(re.match(ibrep, line), line) elif line.find("getelementptr ") != line.find("getelementptr ("): line = conv(re.match(normrep, line), line) sys.stdout.write(line) apply.sh: for name in "$@" do python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name" rm -f "$name.tmp" done The actual commands: From llvm/src: find test/ -name *.ll | xargs ./apply.sh From llvm/src/tools/clang: find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}" From llvm/src/tools/polly: find test/ -name *.ll | xargs ./apply.sh After that, check-all (with llvm, clang, clang-tools-extra, lld, compiler-rt, and polly all checked out). The extra 'rm' in the apply.sh script is due to a few files in clang's test suite using interesting unicode stuff that my python script was throwing exceptions on. None of those files needed to be migrated, so it seemed sufficient to ignore those cases. Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7636 llvm-svn: 230786
* Partial fix for bug 22589Sanjoy Das2015-02-183-15/+20
| | | | | | | | | | | Don't spend the entire iteration space in the scalar loop prologue if computing the trip count overflows. This change also gets rid of the backedge check in the prologue loop and the extra check for overflowing trip-count. Differential Revision: http://reviews.llvm.org/D7715 llvm-svn: 229731
* [unroll] Concede defeat and disable the unroll analyzer for now.Chandler Carruth2015-02-131-4/+4
| | | | | | | | | | The issues with the new unroll analyzer are more fundamental than code cleanup, algorithm, or data structure changes. I've sent an email to the original commit thread with details and a proposal for how to redesign things. I'm disabling this for now so that we don't spend time debugging issues with it in its current state. llvm-svn: 229064
* Testcase for r228988.Michael Zolotukhin2015-02-131-0/+3
| | | | llvm-svn: 228995
* Add a test case for new unrolling heuristics.Michael Zolotukhin2015-02-101-0/+59
| | | | | | THe heuristics were added in r228265 and r228434. llvm-svn: 228713
* Use a smaller pragma unroll threshold to reduce test execution time.Alexander Potapenko2015-01-211-2/+2
| | | | | | | When opt is compiled with AddressSanitizer it takes more than 30 seconds to unroll the loop in unroll_1M(). llvm-svn: 226660
* [PowerPC] Readjust the loop unrolling thresholdHal Finkel2015-01-101-2/+50
| | | | | | | | Now that the way that the partial unrolling threshold for small loops is used to compute the unrolling factor as been corrected, a slightly smaller threshold is preferable. This is expected; other targets may need to re-tune as well. llvm-svn: 225566
* [LoopUnroll] Fix the partial unrolling threshold for small loop sizesHal Finkel2015-01-102-2/+19
| | | | | | | | | | | | | When we compute the size of a loop, we include the branch on the backedge and the comparison feeding the conditional branch. Under normal circumstances, these don't get replicated with the rest of the loop body when we unroll. This led to the somewhat surprising behavior that really small loops would not get unrolled enough -- they could be unrolled more and the resulting loop would be below the threshold, because we were assuming they'd take (LoopSize * UnrollingFactor) instructions after unrolling, instead of (((LoopSize-2) * UnrollingFactor)+2) instructions. This fixes that computation. llvm-svn: 225565
* [PowerPC] Enable late partial unrolling on the POWER7Hal Finkel2015-01-091-0/+51
| | | | | | | | | | | | | | | | | | | | | | | | The P7 benefits from not have really-small loops so that we either have multiple dispatch groups in the loop and/or the ability to form more-full dispatch groups during scheduling. Setting the partial unrolling threshold to 44 seems good, empirically, for the P7. Compared to using no late partial unrolling, this yields the following test-suite speedups: SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding -66.3253% +/- 24.1975% SingleSource/Benchmarks/Misc-C++/oopack_v1p8 -44.0169% +/- 29.4881% SingleSource/Benchmarks/Misc/pi -27.8351% +/- 12.2712% SingleSource/Benchmarks/Stanford/Bubblesort -30.9898% +/- 22.4647% I've speculatively added a similar setting for the P8. Also, I've noticed that the unroller does not quite calculate the unrolling factor correctly for really tiny loops because it neglects to account for the fact that not every loop body replicant contains an ending branch and counter increment. I'll fix that later. llvm-svn: 225522
* IR: Add 'distinct' MDNodes to bitcode and assemblyDuncan P. N. Exon Smith2015-01-082-6/+6
| | | | | | | | | | | | | | | | | | Propagate whether `MDNode`s are 'distinct' through the other types of IR (assembly and bitcode). This adds the `distinct` keyword to assembly. Currently, no one actually calls `MDNode::getDistinct()`, so these nodes only get created for: - self-references, which are never uniqued, and - nodes whose operands are replaced that hit a uniquing collision. The concept of distinct nodes is still not quite first-class, since distinct-ness doesn't yet survive across `MapMetadata()`. Part of PR22111. llvm-svn: 225474
OpenPOWER on IntegriCloud