summaryrefslogtreecommitdiffstats
path: root/llvm/docs/CommandGuide
diff options
context:
space:
mode:
authorAndrea Di Biagio <Andrea_DiBiagio@sn.scee.net>2018-08-29 17:56:39 +0000
committerAndrea Di Biagio <Andrea_DiBiagio@sn.scee.net>2018-08-29 17:56:39 +0000
commita2eee4745083e805a7c869d250d5c787b827b75b (patch)
treedb1fe3376f305f3cd8b6164a6428a6495118b161 /llvm/docs/CommandGuide
parent5221e17fd62b465e94631bf522d6789dd13d5854 (diff)
downloadbcm5719-llvm-a2eee4745083e805a7c869d250d5c787b827b75b.tar.gz
bcm5719-llvm-a2eee4745083e805a7c869d250d5c787b827b75b.zip
[llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView.
This patch adds two new fields to the perf report generated by the SummaryView. Fields are now logically organized into two small groups; only the second group contains throughput indicators. Example: ``` Iterations: 100 Instructions: 300 Total Cycles: 414 Total uOps: 700 Dispatch Width: 4 uOps Per Cycle: 1.69 IPC: 0.72 Block RThroughput: 4.0 ``` This patch also updates the docs for llvm-mca. Due to the nature of this change, several tests in the tools/llvm-mca directory were affected, and had to be updated using script `update_mca_test_checks.py`. llvm-svn: 340946
Diffstat (limited to 'llvm/docs/CommandGuide')
-rw-r--r--llvm/docs/CommandGuide/llvm-mca.rst53
1 files changed, 33 insertions, 20 deletions
diff --git a/llvm/docs/CommandGuide/llvm-mca.rst b/llvm/docs/CommandGuide/llvm-mca.rst
index 5dcd97fb113..43e64c329c9 100644
--- a/llvm/docs/CommandGuide/llvm-mca.rst
+++ b/llvm/docs/CommandGuide/llvm-mca.rst
@@ -238,7 +238,10 @@ the following command using the example located at
Iterations: 300
Instructions: 900
Total Cycles: 610
+ Total uOps: 900
+
Dispatch Width: 2
+ uOps Per Cycle: 1.48
IPC: 1.48
Block RThroughput: 2.0
@@ -285,35 +288,45 @@ the following command using the example located at
- - - 1.00 - 1.00 - - - - - - - - vhaddps %xmm3, %xmm3, %xmm4
According to this report, the dot-product kernel has been executed 300 times,
-for a total of 900 dynamically executed instructions.
+for a total of 900 simulated instructions. The total number of simulated micro
+opcodes (uOps) is also 900.
The report is structured in three main sections. The first section collects a
few performance numbers; the goal of this section is to give a very quick
-overview of the performance throughput. In this example, the two important
-performance indicators are **IPC** and **Block RThroughput** (Block Reciprocal
+overview of the performance throughput. Important performance indicators are
+**IPC**, **uOps Per Cycle**, and **Block RThroughput** (Block Reciprocal
Throughput).
IPC is computed dividing the total number of simulated instructions by the total
-number of cycles. A delta between Dispatch Width and IPC is an indicator of a
-performance issue. In the absence of loop-carried data dependencies, the
+number of cycles. In the absence of loop-carried data dependencies, the
observed IPC tends to a theoretical maximum which can be computed by dividing
the number of instructions of a single iteration by the *Block RThroughput*.
-IPC is bounded from above by the dispatch width. That is because the dispatch
-width limits the maximum size of a dispatch group. IPC is also limited by the
-amount of hardware parallelism. The availability of hardware resources affects
-the resource pressure distribution, and it limits the number of instructions
-that can be executed in parallel every cycle. A delta between Dispatch
-Width and the theoretical maximum IPC is an indicator of a performance
-bottleneck caused by the lack of hardware resources. In general, the lower the
-Block RThroughput, the better.
-
-In this example, ``Instructions per iteration/Block RThroughput`` is 1.50. Since
-there are no loop-carried dependencies, the observed IPC is expected to approach
-1.50 when the number of iterations tends to infinity. The delta between the
-Dispatch Width (2.00), and the theoretical maximum IPC (1.50) is an indicator of
-a performance bottleneck caused by the lack of hardware resources, and the
-*Resource pressure view* can help to identify the problematic resource usage.
+Field 'uOps Per Cycle' is computed dividing the total number of simulated micro
+opcodes by the total number of cycles. A delta between Dispatch Width and this
+field is an indicator of a performance issue. In the absence of loop-carried
+data dependencies, the observed 'uOps Per Cycle' should tend to a theoretical
+maximum throughput which can be computed by dividing the number of uOps of a
+single iteration by the *Block RThroughput*.
+
+Field *uOps Per Cycle* is bounded from above by the dispatch width. That is
+because the dispatch width limits the maximum size of a dispatch group. Both IPC
+and 'uOps Per Cycle' are limited by the amount of hardware parallelism. The
+availability of hardware resources affects the resource pressure distribution,
+and it limits the number of instructions that can be executed in parallel every
+cycle. A delta between Dispatch Width and the theoretical maximum uOps per
+Cycle (computed by dividing the number of uOps of a single iteration by the
+*Block RTrhoughput*) is an indicator of a performance bottleneck caused by the
+lack of hardware resources.
+In general, the lower the Block RThroughput, the better.
+
+In this example, ``uOps per iteration/Block RThroughput`` is 1.50. Since there
+are no loop-carried dependencies, the observed *uOps Per Cycle* is expected to
+approach 1.50 when the number of iterations tends to infinity. The delta between
+the Dispatch Width (2.00), and the theoretical maximum uOp throughput (1.50) is
+an indicator of a performance bottleneck caused by the lack of hardware
+resources, and the *Resource pressure view* can help to identify the problematic
+resource usage.
The second section of the report shows the latency and reciprocal
throughput of every instruction in the sequence. That section also reports
OpenPOWER on IntegriCloud