summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms/Utils
diff options
context:
space:
mode:
authorVedant Kumar <vsk@apple.com>2018-10-24 22:15:41 +0000
committerVedant Kumar <vsk@apple.com>2018-10-24 22:15:41 +0000
commitc29900687983824ca66fbd83491390873523339e (patch)
treef84721b45033a99a5e70d7186d53471788b318ae /llvm/lib/Transforms/Utils
parent444363e931d427719ddaa91819a9b133b8ac70e8 (diff)
downloadbcm5719-llvm-c29900687983824ca66fbd83491390873523339e.tar.gz
bcm5719-llvm-c29900687983824ca66fbd83491390873523339e.zip
[HotColdSplitting] Identify larger cold regions using domtree queries
The current splitting algorithm works in three stages: 1) Identify cold blocks, then 2) Use forward/backward propagation to mark hot blocks, then 3) Grow a SESE region of blocks *outside* of the set of hot blocks and start outlining. While testing this pass on Apple internal frameworks I noticed that some kinds of control flow (e.g. loops) are never outlined, even though they unconditionally lead to / follow cold blocks. I noticed two other issues related to how cold regions are identified: - An inconsistency can arise in the internal state of the hotness propagation stage, as a block may end up in both the ColdBlocks set and the HotBlocks set. Further inconsistencies can arise as these sets do not match what's in ProfileSummaryInfo. - It isn't necessary to limit outlining to single-exit regions. This patch teaches the splitting algorithm to identify maximal cold regions and outline them. A maximal cold region is defined as the set of blocks post-dominated by a cold sink block, or dominated by that sink block. This approach can successfully outline loops in the cold path. As a side benefit, it maintains less internal state than the current approach. Due to a limitation in CodeExtractor, blocks within the maximal cold region which aren't dominated by a single entry point (a so-called "max ancestor") are filtered out. Results: - X86 (LNT + -Os + externals): 134KB of TEXT were outlined compared to 47KB pre-patch, or a ~3x improvement. Did not see a performance impact across two runs. - AArch64 (LNT + -Os + externals + Apple-internal benchmarks): 149KB of TEXT were outlined. Ditto re: performance impact. - Outlining results improve marginally in the internal frameworks I tested. Follow-ups: - Outline more than once per function, outline large single basic blocks, & try to remove unconditional branches in outlined functions. Differential Revision: https://reviews.llvm.org/D53627 llvm-svn: 345209
Diffstat (limited to 'llvm/lib/Transforms/Utils')
-rw-r--r--llvm/lib/Transforms/Utils/CodeExtractor.cpp40
1 files changed, 24 insertions, 16 deletions
diff --git a/llvm/lib/Transforms/Utils/CodeExtractor.cpp b/llvm/lib/Transforms/Utils/CodeExtractor.cpp
index 328fe1fac65..462dc588cd5 100644
--- a/llvm/lib/Transforms/Utils/CodeExtractor.cpp
+++ b/llvm/lib/Transforms/Utils/CodeExtractor.cpp
@@ -1273,24 +1273,32 @@ Function *CodeExtractor::extractCodeRegion() {
// Look at all successors of the codeReplacer block. If any of these blocks
// had PHI nodes in them, we need to update the "from" block to be the code
// replacer, not the original block in the extracted region.
- std::vector<BasicBlock *> Succs(succ_begin(codeReplacer),
- succ_end(codeReplacer));
- for (unsigned i = 0, e = Succs.size(); i != e; ++i)
- for (BasicBlock::iterator I = Succs[i]->begin(); isa<PHINode>(I); ++I) {
- PHINode *PN = cast<PHINode>(I);
- std::set<BasicBlock*> ProcessedPreds;
- for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i)
- if (Blocks.count(PN->getIncomingBlock(i))) {
- if (ProcessedPreds.insert(PN->getIncomingBlock(i)).second)
- PN->setIncomingBlock(i, codeReplacer);
- else {
- // There were multiple entries in the PHI for this block, now there
- // is only one, so remove the duplicated entries.
- PN->removeIncomingValue(i, false);
- --i; --e;
- }
+ for (BasicBlock *SuccBB : successors(codeReplacer)) {
+ for (PHINode &PN : SuccBB->phis()) {
+ Value *IncomingCodeReplacerVal = nullptr;
+ SmallVector<unsigned, 2> IncomingValsToRemove;
+ for (unsigned I = 0, E = PN.getNumIncomingValues(); I != E; ++I) {
+ BasicBlock *IncomingBB = PN.getIncomingBlock(I);
+
+ // Ignore incoming values from outside of the extracted region.
+ if (!Blocks.count(IncomingBB))
+ continue;
+
+ // Ensure that there is only one incoming value from codeReplacer.
+ if (!IncomingCodeReplacerVal) {
+ PN.setIncomingBlock(I, codeReplacer);
+ IncomingCodeReplacerVal = PN.getIncomingValue(I);
+ } else {
+ assert(IncomingCodeReplacerVal == PN.getIncomingValue(I) &&
+ "PHI has two incompatbile incoming values from codeRepl");
+ IncomingValsToRemove.push_back(I);
}
+ }
+
+ for (unsigned I : reverse(IncomingValsToRemove))
+ PN.removeIncomingValue(I, /*DeletePHIIfEmpty=*/false);
}
+ }
// Erase debug info intrinsics. Variable updates within the new function are
// invisible to debuggers. This could be improved by defining a DISubprogram
OpenPOWER on IntegriCloud