diff options
author | Vedant Kumar <vsk@apple.com> | 2018-10-24 22:15:41 +0000 |
---|---|---|
committer | Vedant Kumar <vsk@apple.com> | 2018-10-24 22:15:41 +0000 |
commit | c29900687983824ca66fbd83491390873523339e (patch) | |
tree | f84721b45033a99a5e70d7186d53471788b318ae /llvm/lib/Transforms/Utils | |
parent | 444363e931d427719ddaa91819a9b133b8ac70e8 (diff) | |
download | bcm5719-llvm-c29900687983824ca66fbd83491390873523339e.tar.gz bcm5719-llvm-c29900687983824ca66fbd83491390873523339e.zip |
[HotColdSplitting] Identify larger cold regions using domtree queries
The current splitting algorithm works in three stages:
1) Identify cold blocks, then
2) Use forward/backward propagation to mark hot blocks, then
3) Grow a SESE region of blocks *outside* of the set of hot blocks and
start outlining.
While testing this pass on Apple internal frameworks I noticed that some
kinds of control flow (e.g. loops) are never outlined, even though they
unconditionally lead to / follow cold blocks. I noticed two other issues
related to how cold regions are identified:
- An inconsistency can arise in the internal state of the hotness
propagation stage, as a block may end up in both the ColdBlocks set
and the HotBlocks set. Further inconsistencies can arise as these sets
do not match what's in ProfileSummaryInfo.
- It isn't necessary to limit outlining to single-exit regions.
This patch teaches the splitting algorithm to identify maximal cold
regions and outline them. A maximal cold region is defined as the set of
blocks post-dominated by a cold sink block, or dominated by that sink
block. This approach can successfully outline loops in the cold path. As
a side benefit, it maintains less internal state than the current
approach.
Due to a limitation in CodeExtractor, blocks within the maximal cold
region which aren't dominated by a single entry point (a so-called "max
ancestor") are filtered out.
Results:
- X86 (LNT + -Os + externals): 134KB of TEXT were outlined compared to
47KB pre-patch, or a ~3x improvement. Did not see a performance impact
across two runs.
- AArch64 (LNT + -Os + externals + Apple-internal benchmarks): 149KB
of TEXT were outlined. Ditto re: performance impact.
- Outlining results improve marginally in the internal frameworks I
tested.
Follow-ups:
- Outline more than once per function, outline large single basic
blocks, & try to remove unconditional branches in outlined functions.
Differential Revision: https://reviews.llvm.org/D53627
llvm-svn: 345209
Diffstat (limited to 'llvm/lib/Transforms/Utils')
-rw-r--r-- | llvm/lib/Transforms/Utils/CodeExtractor.cpp | 40 |
1 files changed, 24 insertions, 16 deletions
diff --git a/llvm/lib/Transforms/Utils/CodeExtractor.cpp b/llvm/lib/Transforms/Utils/CodeExtractor.cpp index 328fe1fac65..462dc588cd5 100644 --- a/llvm/lib/Transforms/Utils/CodeExtractor.cpp +++ b/llvm/lib/Transforms/Utils/CodeExtractor.cpp @@ -1273,24 +1273,32 @@ Function *CodeExtractor::extractCodeRegion() { // Look at all successors of the codeReplacer block. If any of these blocks // had PHI nodes in them, we need to update the "from" block to be the code // replacer, not the original block in the extracted region. - std::vector<BasicBlock *> Succs(succ_begin(codeReplacer), - succ_end(codeReplacer)); - for (unsigned i = 0, e = Succs.size(); i != e; ++i) - for (BasicBlock::iterator I = Succs[i]->begin(); isa<PHINode>(I); ++I) { - PHINode *PN = cast<PHINode>(I); - std::set<BasicBlock*> ProcessedPreds; - for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i) - if (Blocks.count(PN->getIncomingBlock(i))) { - if (ProcessedPreds.insert(PN->getIncomingBlock(i)).second) - PN->setIncomingBlock(i, codeReplacer); - else { - // There were multiple entries in the PHI for this block, now there - // is only one, so remove the duplicated entries. - PN->removeIncomingValue(i, false); - --i; --e; - } + for (BasicBlock *SuccBB : successors(codeReplacer)) { + for (PHINode &PN : SuccBB->phis()) { + Value *IncomingCodeReplacerVal = nullptr; + SmallVector<unsigned, 2> IncomingValsToRemove; + for (unsigned I = 0, E = PN.getNumIncomingValues(); I != E; ++I) { + BasicBlock *IncomingBB = PN.getIncomingBlock(I); + + // Ignore incoming values from outside of the extracted region. + if (!Blocks.count(IncomingBB)) + continue; + + // Ensure that there is only one incoming value from codeReplacer. + if (!IncomingCodeReplacerVal) { + PN.setIncomingBlock(I, codeReplacer); + IncomingCodeReplacerVal = PN.getIncomingValue(I); + } else { + assert(IncomingCodeReplacerVal == PN.getIncomingValue(I) && + "PHI has two incompatbile incoming values from codeRepl"); + IncomingValsToRemove.push_back(I); } + } + + for (unsigned I : reverse(IncomingValsToRemove)) + PN.removeIncomingValue(I, /*DeletePHIIfEmpty=*/false); } + } // Erase debug info intrinsics. Variable updates within the new function are // invisible to debuggers. This could be improved by defining a DISubprogram |