[MachineCombiner] Add up latencies of all instructions in new pattern.

Summary:
When calculating the RootLatency, we add up all the latencies of the
deleted instructions. But for NewRootLatency we only add the latency of the
new root instructions, ignoring the latencies of the other instructions
inserted. This leads the combiner to underestimate the cost of patterns
which add multiple instructions. This patch fixes that by summing up the
latencies of all new instructions. For NewRootNode, the more complex
getLatency function is used.

Note that we may be slightly more precise than just summing up all
latencies. For example, consider a pattern like

  r1 = INS1 ..
  r2 = INS2 ..
  r3 = INS3 r1, r2

I think in some other places, the total latency of the pattern would be
estimated as lat(INS3) + max(lat(INS1), lat(INS2)). If you consider that
worth changing, I think it would be best to do in a follow-up patch.

Reviewers: Gerolf, sebpop, spop, fhahn

Reviewed By: fhahn

Subscribers: evandro, llvm-commits

Differential Revision: https://reviews.llvm.org/D40307

llvm-svn: 319951
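
To make the difference between the two estimates in the summary concrete, here is a minimal standalone sketch. It is not LLVM code: the struct, the function names, and the latency values are hypothetical and only illustrate the arithmetic for the INS1/INS2/INS3 pattern.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-instruction latencies (in cycles) for the pattern
//   r1 = INS1 ..
//   r2 = INS2 ..
//   r3 = INS3 r1, r2
struct PatternLatencies {
  unsigned Ins1;
  unsigned Ins2;
  unsigned Ins3;
};

// Additive estimate: treats all three instructions as one dependent chain,
// which is the assumption the patch makes when it sums latencies.
unsigned additiveLatency(const PatternLatencies &P) {
  return P.Ins1 + P.Ins2 + P.Ins3;
}

// Critical-path estimate: INS1 and INS2 do not depend on each other, so only
// the slower of the two delays INS3.
unsigned criticalPathLatency(const PatternLatencies &P) {
  return P.Ins3 + std::max(P.Ins1, P.Ins2);
}
```

With assumed latencies of 3, 4 and 1 cycles for INS1, INS2 and INS3, the additive estimate is 8 cycles while the critical-path estimate is 5, so plain summation can overestimate patterns with independent inputs.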
author: Florian Hahn <florian.hahn@arm.com> 2017-12-06 20:27:33 +0000
committer: Florian Hahn <florian.hahn@arm.com> 2017-12-06 20:27:33 +0000
commit: 001c3dd202b6dde7a745092ce44b817d4378f052 (patch)
tree: 9feb7c4175cc31cc590f3e20afe6aa089452b2be /llvm/lib/CodeGen/MachineCombiner.cpp
parent: 9e776fb0dc697c83951ea4f8eb6291aba34ed86e (diff)
1 files changed, 9 insertions, 2 deletions
diff --git a/llvm/lib/CodeGen/MachineCombiner.cpp b/llvm/lib/CodeGen/MachineCombiner.cpp
index f61db309ed7..26bee98c9aa 100644
--- a/llvm/lib/CodeGen/MachineCombiner.cpp
+++ b/llvm/lib/CodeGen/MachineCombiner.cpp
@@ -282,9 +282,16 @@ bool MachineCombiner::improvesCriticalPathLen(
   // of the original code sequence. This may allow the transform to proceed
   // even if the instruction depths (data dependency cycles) become worse.
 
-  unsigned NewRootLatency = getLatency(Root, NewRoot, BlockTrace);
-  unsigned RootLatency = 0;
+  // Account for the latency of the inserted and deleted instructions by
+  // adding up their latencies. This assumes that the inserted and deleted
+  // instructions are dependent instruction chains, which might not hold
+  // in all cases.
+  unsigned NewRootLatency = 0;
+  for (unsigned i = 0; i < InsInstrs.size() - 1; i++)
+    NewRootLatency += TSchedModel.computeInstrLatency(InsInstrs[i]);
+  NewRootLatency += getLatency(Root, NewRoot, BlockTrace);
 
+  unsigned RootLatency = 0;
   for (auto I : DelInstrs)
     RootLatency += TSchedModel.computeInstrLatency(I);
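
The summation added in this hunk can be modelled outside LLVM as follows. This is a hedged sketch, not the pass itself: `sumNewPatternLatency` is a hypothetical helper, the vector of plain `unsigned` values stands in for `InsInstrs`, and `TraceAwareRootLatency` stands in for the result of `getLatency(Root, NewRoot, BlockTrace)`. The loop bound is written as `i + 1 < size()` because `size() - 1` on an unsigned type wraps around when the container is empty. Returning 0 for an empty list is an assumption of this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// InsLatencies holds one latency per inserted instruction, with the new root
// last. The root's own entry is skipped and replaced by the trace-aware value,
// mirroring how the patch handles the last element of InsInstrs.
unsigned sumNewPatternLatency(const std::vector<unsigned> &InsLatencies,
                              unsigned TraceAwareRootLatency) {
  // Assumption of this sketch: no inserted instructions means no new root.
  if (InsLatencies.empty())
    return 0;

  unsigned Total = 0;
  // Sum every inserted instruction except the last one (the new root).
  for (std::size_t i = 0; i + 1 < InsLatencies.size(); ++i)
    Total += InsLatencies[i];

  return Total + TraceAwareRootLatency;
}
```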