author Florian Hahn <florian.hahn@arm.com> 2017-12-06 20:27:33 +0000
committer Florian Hahn <florian.hahn@arm.com> 2017-12-06 20:27:33 +0000
commit 001c3dd202b6dde7a745092ce44b817d4378f052
tree 9feb7c4175cc31cc590f3e20afe6aa089452b2be /llvm/lib/CodeGen/MachineCombiner.cpp
parent 9e776fb0dc697c83951ea4f8eb6291aba34ed86e
[MachineCombiner] Add up latencies of all instructions in new pattern.

Summary:
When calculating the RootLatency, we add up all the latencies of the
deleted instructions. But for NewRootLatency we only add the latency of the
new root instruction, ignoring the latencies of the other instructions
inserted. This leads the combiner to underestimate the cost of patterns
which add multiple instructions. This patch fixes that by summing up
the latencies of all new instructions. For NewRootNode, the more complex
getLatency function is used.

Note that we may be slightly more precise than just summing up all
latencies. For example, consider a pattern like

  r1 = INS1 ..
  r2 = INS2 ..
  r3 = INS3 r1, r2

I think in some other places, the total latency of the pattern would be
estimated as lat(INS3) + max(lat(INS1), lat(INS2)). If you consider
that worth changing, I think it would be best to do in a follow-up
patch.

Reviewers: Gerolf, sebpop, spop, fhahn

Reviewed By: fhahn

Subscribers: evandro, llvm-commits

Differential Revision: https://reviews.llvm.org/D40307

llvm-svn: 319951
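
To make the three estimates in the commit message concrete, here is a minimal sketch that computes each of them for the INS1/INS2/INS3 pattern. The struct, the function names, and any latency values passed in are hypothetical and not part of MachineCombiner; the sketch only mirrors the arithmetic described above.

```cpp
#include <algorithm>

// Hypothetical per-instruction latencies in cycles for the example pattern.
// These are illustration-only inputs, not values from a real scheduling model.
struct PatternLatencies {
  unsigned Ins1; // r1 = INS1 ..
  unsigned Ins2; // r2 = INS2 ..
  unsigned Ins3; // r3 = INS3 r1, r2 (the new root)
};

// Estimate before the patch: only the new root instruction is counted.
unsigned rootOnlyLatency(const PatternLatencies &P) { return P.Ins3; }

// Estimate after the patch: the latencies of all inserted instructions are
// summed, which treats them as one dependent chain.
unsigned summedLatency(const PatternLatencies &P) {
  return P.Ins1 + P.Ins2 + P.Ins3;
}

// Critical-path estimate mentioned as a possible follow-up:
// lat(INS3) + max(lat(INS1), lat(INS2)), since INS1 and INS2 are independent.
unsigned criticalPathLatency(const PatternLatencies &P) {
  return P.Ins3 + std::max(P.Ins1, P.Ins2);
}
```

For assumed latencies of 2, 3 and 4 cycles, the root-only estimate is 4, the summed estimate is 9, and the critical-path estimate is 7. The summed value can therefore overestimate when the inserted instructions are independent, which is the imprecision the commit message defers to a follow-up patch.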
Diffstat (limited to 'llvm/lib/CodeGen/MachineCombiner.cpp')
-rw-r--r-- llvm/lib/CodeGen/MachineCombiner.cpp 11
1 files changed, 9 insertions, 2 deletions
diff --git a/llvm/lib/CodeGen/MachineCombiner.cpp b/llvm/lib/CodeGen/MachineCombiner.cpp
index f61db309ed7..26bee98c9aa 100644
--- a/llvm/lib/CodeGen/MachineCombiner.cpp
+++ b/llvm/lib/CodeGen/MachineCombiner.cpp
@@ -282,9 +282,16 @@ bool MachineCombiner::improvesCriticalPathLen(
   // of the original code sequence. This may allow the transform to proceed
   // even if the instruction depths (data dependency cycles) become worse.
-  unsigned NewRootLatency = getLatency(Root, NewRoot, BlockTrace);
-  unsigned RootLatency = 0;
+  // Account for the latency of the inserted and deleted instructions by
+  // adding up their latencies. This assumes that the inserted and deleted
+  // instructions are dependent instruction chains, which might not hold
+  // in all cases.
+  unsigned NewRootLatency = 0;
+  for (unsigned i = 0; i < InsInstrs.size() - 1; i++)
+    NewRootLatency += TSchedModel.computeInstrLatency(InsInstrs[i]);
+  NewRootLatency += getLatency(Root, NewRoot, BlockTrace);
+  unsigned RootLatency = 0;
   for (auto I : DelInstrs)
     RootLatency += TSchedModel.computeInstrLatency(I);
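
The new loop visits every inserted instruction except the last one, whose latency is added separately through getLatency. The sketch below reproduces that "all but the last" summation on a plain vector of latencies. The function name and the vector-of-latencies representation are hypothetical stand-ins, not LLVM API. One detail worth noting: a bound written as `size() - 1` is evaluated in an unsigned type and would wrap around for an empty container. Whether InsInstrs can be empty at this point is not visible in this diff, so the sketch simply uses a bound that is safe either way.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the patch's loop: each element is the latency of
// one inserted instruction, with the new root stored last.
unsigned sumAllButLast(const std::vector<unsigned> &Latencies) {
  unsigned Sum = 0;
  // `I + 1 < size()` selects the same elements as `I < size() - 1` for a
  // non-empty vector, but does not wrap around when the vector is empty.
  for (std::size_t I = 0; I + 1 < Latencies.size(); ++I)
    Sum += Latencies[I];
  return Sum;
}
```

With this form, an empty vector or a vector holding only the root contributes 0, and the root's latency would then be added on top, as the patch does with getLatency.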