author: Andrea Di Biagio <Andrea_DiBiagio@sn.scee.net> 2019-03-04 11:52:34 +0000
committer: Andrea Di Biagio <Andrea_DiBiagio@sn.scee.net> 2019-03-04 11:52:34 +0000
commit: be3281a281e36c416df469ed81a4e398132da953
tree: 27b3b7c0c61326410c4a693836e43068bc0f77db /llvm/test/tools/llvm-mca
parent: 09d8ea5282505251a3da5cebb6ec7c7e0e685db2
[MCA] Highlight kernel bottlenecks in the summary view.

This patch adds a new flag named -bottleneck-analysis to print out information about throughput bottlenecks.

MCA knows how to identify and classify dynamic dispatch stalls. However, it doesn't know how to analyze and highlight kernel bottlenecks. The goal of this patch is to teach MCA how to correlate increases in backend pressure to backend stalls (and therefore, the loss of throughput).

From a Scheduler point of view, backend pressure is a function of the scheduler buffer usage (i.e. how the number of uOps in the scheduler buffers changes over time). Backend pressure increases (or decreases) when there is a mismatch between the number of opcodes dispatched and the number of opcodes issued in the same cycle. Since buffer resources are limited, continuous increases in backend pressure eventually lead to dispatch stalls. So, there is a strong correlation between dispatch stalls and how backend pressure changed over time.

This patch teaches MCA how to identify situations where backend pressure increases due to:
 - unavailable pipeline resources.
 - data dependencies.

Data dependencies may delay execution of instructions and therefore increase the time that uOps have to spend in the scheduler buffers. That often translates to an increase in backend pressure which may eventually lead to a bottleneck. Contention on pipeline resources may also delay execution of instructions, and lead to a temporary increase in backend pressure.

Internally, the Scheduler classifies instructions based on whether register / memory operands are available or not. An instruction is marked as "ready to execute" only if data dependencies are fully resolved. Every cycle, the Scheduler attempts to execute all instructions that are ready to execute. If an instruction cannot execute because of unavailable pipeline resources, then the Scheduler internally updates a BusyResourceUnits mask with the ID of each unavailable resource.

ExecuteStage is responsible for tracking changes in backend pressure. If backend pressure increases during a cycle because of contention on pipeline resources, then ExecuteStage sends a "backend pressure" event to the listeners. That event contains information about the instructions delayed by resource pressure, as well as the BusyResourceUnits mask. Note that ExecuteStage also knows how to identify situations where backend pressure increased because of delays introduced by data dependencies.

The SummaryView observes "backend pressure" events and prints out a "bottleneck report".

Example of bottleneck report:

```
Cycles with backend pressure increase [ 99.89% ]
Throughput Bottlenecks:
  Resource Pressure       [ 0.00% ]
  Data Dependencies:      [ 99.89% ]
   - Register Dependencies [ 0.00% ]
   - Memory Dependencies   [ 99.89% ]
```

A bottleneck report is printed out only if increases in backend pressure eventually caused backend stalls.

About the time complexity:

Time complexity is linear in the number of instructions in the Scheduler::PendingSet. The average slowdown tends to be in the range of ~5-6%. For memory intensive kernels, the slowdown can be significant if flag -noalias=false is specified. In the worst case scenario I have observed a slowdown of ~30% when flag -noalias=false was specified.

We can definitely recover part of that slowdown if we optimize class LSUnit (by doing extra bookkeeping to speed up queries). For now, this new analysis is disabled by default, and it can be enabled via flag -bottleneck-analysis. Users of MCA as a library can enable the generation of pressure events through the constructor of ExecuteStage.

This patch partially addresses https://bugs.llvm.org/show_bug.cgi?id=37494

Differential Revision: https://reviews.llvm.org/D58728

llvm-svn: 355308
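
The accounting described in the commit message can be illustrated with a small toy model. This is only a sketch, not MCA's implementation: the `CycleSample` record, the `summarize` and `format_report` helpers, and the ten-cycle trace are all hypothetical names and made-up data. It assumes that a cycle counts as a "pressure increase" when more uOps are dispatched than issued, and that each such cycle is attributed to whichever causes were flagged for it.

```python
from dataclasses import dataclass


@dataclass
class CycleSample:
    """Hypothetical per-cycle record; not an MCA data structure."""
    dispatched: int               # uOps entering the scheduler buffers
    issued: int                   # uOps leaving the buffers for execution
    resource_stall: bool = False  # a ready instruction found its pipeline busy
    register_dep: bool = False    # an instruction waited on a register operand
    memory_dep: bool = False      # an instruction waited on a memory operand


def summarize(samples):
    """Return the percentage of cycles attributed to each pressure cause."""
    total = len(samples)
    tally = {"pressure": 0, "resource": 0, "data": 0, "register": 0, "memory": 0}
    for s in samples:
        if s.dispatched <= s.issued:
            continue  # buffer usage did not grow: no pressure increase
        tally["pressure"] += 1
        tally["resource"] += s.resource_stall
        tally["register"] += s.register_dep
        tally["memory"] += s.memory_dep
        tally["data"] += s.register_dep or s.memory_dep
    return {k: (100.0 * v / total if total else 0.0) for k, v in tally.items()}


def format_report(pct):
    """Lay the percentages out like the bottleneck report in the commit message."""
    return "\n".join([
        f"Cycles with backend pressure increase [ {pct['pressure']:.2f}% ]",
        "Throughput Bottlenecks:",
        f"  Resource Pressure       [ {pct['resource']:.2f}% ]",
        f"  Data Dependencies:      [ {pct['data']:.2f}% ]",
        f"   - Register Dependencies [ {pct['register']:.2f}% ]",
        f"   - Memory Dependencies   [ {pct['memory']:.2f}% ]",
    ])


# Made-up trace: 8 cycles where a register dependency holds uOps back,
# followed by 2 cycles where dispatch and issue are balanced.
trace = [CycleSample(2, 1, register_dep=True)] * 8 + [CycleSample(1, 1)] * 2
report = format_report(summarize(trace))
print(report)
```

In this toy model a single cycle may count toward several causes at once, which is why the per-cause percentages are tallied independently rather than forced to sum to the headline figure.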
Diffstat (limited to 'llvm/test/tools/llvm-mca')
-rw-r--r-- llvm/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-1.s | 85
-rw-r--r-- llvm/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-2.s | 72
-rw-r--r-- llvm/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-3.s | 106
3 files changed, 263 insertions, 0 deletions
diff --git a/llvm/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-1.s b/llvm/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-1.s
new file mode 100644
index 00000000000..16577cf8b39
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-1.s
@@ -0,0 +1,85 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -timeline -timeline-max-iterations=1 -bottleneck-analysis < %s | FileCheck %s
+
+add %eax, %ebx
+add %ebx, %ecx
+add %ecx, %edx
+add %edx, %eax
+
+# CHECK: Iterations: 100
+# CHECK-NEXT: Instructions: 400
+# CHECK-NEXT: Total Cycles: 403
+# CHECK-NEXT: Total uOps: 400
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.99
+# CHECK-NEXT: IPC: 0.99
+# CHECK-NEXT: Block RThroughput: 2.0
+
+# CHECK: Cycles with backend pressure increase [ 94.04% ]
+# CHECK-NEXT: Throughput Bottlenecks:
+# CHECK-NEXT: Resource Pressure [ 0.00% ]
+# CHECK-NEXT: Data Dependencies: [ 94.04% ]
+# CHECK-NEXT: - Register Dependencies [ 94.04% ]
+# CHECK-NEXT: - Memory Dependencies [ 0.00% ]
+
+# CHECK: Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
+# CHECK-NEXT: 1 1 0.50 addl %eax, %ebx
+# CHECK-NEXT: 1 1 0.50 addl %ebx, %ecx
+# CHECK-NEXT: 1 1 0.50 addl %ecx, %edx
+# CHECK-NEXT: 1 1 0.50 addl %edx, %eax
+
+# CHECK: Resources:
+# CHECK-NEXT: [0] - JALU0
+# CHECK-NEXT: [1] - JALU1
+# CHECK-NEXT: [2] - JDiv
+# CHECK-NEXT: [3] - JFPA
+# CHECK-NEXT: [4] - JFPM
+# CHECK-NEXT: [5] - JFPU0
+# CHECK-NEXT: [6] - JFPU1
+# CHECK-NEXT: [7] - JLAGU
+# CHECK-NEXT: [8] - JMul
+# CHECK-NEXT: [9] - JSAGU
+# CHECK-NEXT: [10] - JSTC
+# CHECK-NEXT: [11] - JVALU0
+# CHECK-NEXT: [12] - JVALU1
+# CHECK-NEXT: [13] - JVIMUL
+
+# CHECK: Resource pressure per iteration:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
+# CHECK-NEXT: 2.00 2.00 - - - - - - - - - - - -
+
+# CHECK: Resource pressure by instruction:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
+# CHECK-NEXT: - 1.00 - - - - - - - - - - - - addl %eax, %ebx
+# CHECK-NEXT: 1.00 - - - - - - - - - - - - - addl %ebx, %ecx
+# CHECK-NEXT: - 1.00 - - - - - - - - - - - - addl %ecx, %edx
+# CHECK-NEXT: 1.00 - - - - - - - - - - - - - addl %edx, %eax
+
+# CHECK: Timeline view:
+# CHECK-NEXT: Index 0123456
+
+# CHECK: [0,0] DeER .. addl %eax, %ebx
+# CHECK-NEXT: [0,1] D=eER.. addl %ebx, %ecx
+# CHECK-NEXT: [0,2] .D=eER. addl %ecx, %edx
+# CHECK-NEXT: [0,3] .D==eER addl %edx, %eax
+
+# CHECK: Average Wait times (based on the timeline view):
+# CHECK-NEXT: [0]: Executions
+# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
+# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
+# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
+
+# CHECK: [0] [1] [2] [3]
+# CHECK-NEXT: 0. 1 1.0 1.0 0.0 addl %eax, %ebx
+# CHECK-NEXT: 1. 1 2.0 0.0 0.0 addl %ebx, %ecx
+# CHECK-NEXT: 2. 1 2.0 0.0 0.0 addl %ecx, %edx
+# CHECK-NEXT: 3. 1 3.0 0.0 0.0 addl %edx, %eax
diff --git a/llvm/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-2.s b/llvm/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-2.s
new file mode 100644
index 00000000000..83444d422ad
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-2.s
@@ -0,0 +1,72 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=100 -timeline -timeline-max-iterations=1 -bottleneck-analysis < %s | FileCheck %s
+
+vhaddps %xmm0, %xmm0, %xmm1
+
+# CHECK: Iterations: 100
+# CHECK-NEXT: Instructions: 100
+# CHECK-NEXT: Total Cycles: 106
+# CHECK-NEXT: Total uOps: 100
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.94
+# CHECK-NEXT: IPC: 0.94
+# CHECK-NEXT: Block RThroughput: 1.0
+
+# CHECK: Cycles with backend pressure increase [ 76.42% ]
+# CHECK-NEXT: Throughput Bottlenecks:
+# CHECK-NEXT: Resource Pressure [ 76.42% ]
+# CHECK-NEXT: - JFPA [ 76.42% ]
+# CHECK-NEXT: - JFPU0 [ 76.42% ]
+# CHECK-NEXT: Data Dependencies: [ 0.00% ]
+# CHECK-NEXT: - Register Dependencies [ 0.00% ]
+# CHECK-NEXT: - Memory Dependencies [ 0.00% ]
+
+# CHECK: Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
+# CHECK-NEXT: 1 4 1.00 vhaddps %xmm0, %xmm0, %xmm1
+
+# CHECK: Resources:
+# CHECK-NEXT: [0] - JALU0
+# CHECK-NEXT: [1] - JALU1
+# CHECK-NEXT: [2] - JDiv
+# CHECK-NEXT: [3] - JFPA
+# CHECK-NEXT: [4] - JFPM
+# CHECK-NEXT: [5] - JFPU0
+# CHECK-NEXT: [6] - JFPU1
+# CHECK-NEXT: [7] - JLAGU
+# CHECK-NEXT: [8] - JMul
+# CHECK-NEXT: [9] - JSAGU
+# CHECK-NEXT: [10] - JSTC
+# CHECK-NEXT: [11] - JVALU0
+# CHECK-NEXT: [12] - JVALU1
+# CHECK-NEXT: [13] - JVIMUL
+
+# CHECK: Resource pressure per iteration:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
+# CHECK-NEXT: - - - 1.00 - 1.00 - - - - - - - -
+
+# CHECK: Resource pressure by instruction:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
+# CHECK-NEXT: - - - 1.00 - 1.00 - - - - - - - - vhaddps %xmm0, %xmm0, %xmm1
+
+# CHECK: Timeline view:
+# CHECK-NEXT: Index 0123456
+
+# CHECK: [0,0] DeeeeER vhaddps %xmm0, %xmm0, %xmm1
+
+# CHECK: Average Wait times (based on the timeline view):
+# CHECK-NEXT: [0]: Executions
+# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
+# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
+# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
+
+# CHECK: [0] [1] [2] [3]
+# CHECK-NEXT: 0. 1 1.0 1.0 0.0 vhaddps %xmm0, %xmm0, %xmm1
diff --git a/llvm/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-3.s b/llvm/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-3.s
new file mode 100644
index 00000000000..6cd613a52fc
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-3.s
@@ -0,0 +1,106 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=1500 -noalias=false -timeline -timeline-max-iterations=1 -bottleneck-analysis < %s | FileCheck %s
+
+vmovaps (%rsi), %xmm0
+vmovaps %xmm0, (%rdi)
+vmovaps 16(%rsi), %xmm0
+vmovaps %xmm0, 16(%rdi)
+vmovaps 32(%rsi), %xmm0
+vmovaps %xmm0, 32(%rdi)
+vmovaps 48(%rsi), %xmm0
+vmovaps %xmm0, 48(%rdi)
+
+# CHECK: Iterations: 1500
+# CHECK-NEXT: Instructions: 12000
+# CHECK-NEXT: Total Cycles: 36003
+# CHECK-NEXT: Total uOps: 12000
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.33
+# CHECK-NEXT: IPC: 0.33
+# CHECK-NEXT: Block RThroughput: 4.0
+
+# CHECK: Cycles with backend pressure increase [ 99.89% ]
+# CHECK-NEXT: Throughput Bottlenecks:
+# CHECK-NEXT: Resource Pressure [ 0.00% ]
+# CHECK-NEXT: Data Dependencies: [ 99.89% ]
+# CHECK-NEXT: - Register Dependencies [ 0.00% ]
+# CHECK-NEXT: - Memory Dependencies [ 99.89% ]
+
+# CHECK: Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
+# CHECK-NEXT: 1 5 1.00 * vmovaps (%rsi), %xmm0
+# CHECK-NEXT: 1 1 1.00 * vmovaps %xmm0, (%rdi)
+# CHECK-NEXT: 1 5 1.00 * vmovaps 16(%rsi), %xmm0
+# CHECK-NEXT: 1 1 1.00 * vmovaps %xmm0, 16(%rdi)
+# CHECK-NEXT: 1 5 1.00 * vmovaps 32(%rsi), %xmm0
+# CHECK-NEXT: 1 1 1.00 * vmovaps %xmm0, 32(%rdi)
+# CHECK-NEXT: 1 5 1.00 * vmovaps 48(%rsi), %xmm0
+# CHECK-NEXT: 1 1 1.00 * vmovaps %xmm0, 48(%rdi)
+
+# CHECK: Resources:
+# CHECK-NEXT: [0] - JALU0
+# CHECK-NEXT: [1] - JALU1
+# CHECK-NEXT: [2] - JDiv
+# CHECK-NEXT: [3] - JFPA
+# CHECK-NEXT: [4] - JFPM
+# CHECK-NEXT: [5] - JFPU0
+# CHECK-NEXT: [6] - JFPU1
+# CHECK-NEXT: [7] - JLAGU
+# CHECK-NEXT: [8] - JMul
+# CHECK-NEXT: [9] - JSAGU
+# CHECK-NEXT: [10] - JSTC
+# CHECK-NEXT: [11] - JVALU0
+# CHECK-NEXT: [12] - JVALU1
+# CHECK-NEXT: [13] - JVIMUL
+
+# CHECK: Resource pressure per iteration:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
+# CHECK-NEXT: - - - 2.00 2.00 4.00 4.00 4.00 - 4.00 4.00 - - -
+
+# CHECK: Resource pressure by instruction:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
+# CHECK-NEXT: - - - - 1.00 1.00 - 1.00 - - - - - - vmovaps (%rsi), %xmm0
+# CHECK-NEXT: - - - - - - 1.00 - - 1.00 1.00 - - - vmovaps %xmm0, (%rdi)
+# CHECK-NEXT: - - - 1.00 - 1.00 - 1.00 - - - - - - vmovaps 16(%rsi), %xmm0
+# CHECK-NEXT: - - - - - - 1.00 - - 1.00 1.00 - - - vmovaps %xmm0, 16(%rdi)
+# CHECK-NEXT: - - - - 1.00 1.00 - 1.00 - - - - - - vmovaps 32(%rsi), %xmm0
+# CHECK-NEXT: - - - - - - 1.00 - - 1.00 1.00 - - - vmovaps %xmm0, 32(%rdi)
+# CHECK-NEXT: - - - 1.00 - 1.00 - 1.00 - - - - - - vmovaps 48(%rsi), %xmm0
+# CHECK-NEXT: - - - - - - 1.00 - - 1.00 1.00 - - - vmovaps %xmm0, 48(%rdi)
+
+# CHECK: Timeline view:
+# CHECK-NEXT: 0123456789
+# CHECK-NEXT: Index 0123456789 0123456
+
+# CHECK: [0,0] DeeeeeER . . . .. vmovaps (%rsi), %xmm0
+# CHECK-NEXT: [0,1] D=====eER . . . .. vmovaps %xmm0, (%rdi)
+# CHECK-NEXT: [0,2] .D=====eeeeeER . . .. vmovaps 16(%rsi), %xmm0
+# CHECK-NEXT: [0,3] .D==========eER. . .. vmovaps %xmm0, 16(%rdi)
+# CHECK-NEXT: [0,4] . D==========eeeeeER. .. vmovaps 32(%rsi), %xmm0
+# CHECK-NEXT: [0,5] . D===============eER .. vmovaps %xmm0, 32(%rdi)
+# CHECK-NEXT: [0,6] . D===============eeeeeER. vmovaps 48(%rsi), %xmm0
+# CHECK-NEXT: [0,7] . D====================eER vmovaps %xmm0, 48(%rdi)
+
+# CHECK: Average Wait times (based on the timeline view):
+# CHECK-NEXT: [0]: Executions
+# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
+# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
+# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
+
+# CHECK: [0] [1] [2] [3]
+# CHECK-NEXT: 0. 1 1.0 1.0 0.0 vmovaps (%rsi), %xmm0
+# CHECK-NEXT: 1. 1 6.0 0.0 0.0 vmovaps %xmm0, (%rdi)
+# CHECK-NEXT: 2. 1 6.0 0.0 0.0 vmovaps 16(%rsi), %xmm0
+# CHECK-NEXT: 3. 1 11.0 0.0 0.0 vmovaps %xmm0, 16(%rdi)
+# CHECK-NEXT: 4. 1 11.0 0.0 0.0 vmovaps 32(%rsi), %xmm0
+# CHECK-NEXT: 5. 1 16.0 0.0 0.0 vmovaps %xmm0, 32(%rdi)
+# CHECK-NEXT: 6. 1 16.0 0.0 0.0 vmovaps 48(%rsi), %xmm0
+# CHECK-NEXT: 7. 1 21.0 0.0 0.0 vmovaps %xmm0, 48(%rdi)
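
The summary figures in the three tests above can be cross-checked with simple arithmetic. The sketch below is not part of the patch: the `throughput` and `resource_bound_ipc` helpers are hypothetical, and the numbers are copied from the CHECK lines. It assumes that "uOps Per Cycle" and "IPC" are plain ratios over "Total Cycles", and that dividing the instructions per iteration by "Block RThroughput" gives the IPC the block could reach if only pipeline resources limited it.

```python
def throughput(instructions, uops, cycles):
    """Return (uOps per cycle, IPC) as ratios over the simulated cycle count."""
    return uops / cycles, instructions / cycles


def resource_bound_ipc(insts_per_iteration, block_rthroughput):
    """IPC ceiling if only pipeline resources limited the block (assumption)."""
    return insts_per_iteration / block_rthroughput


# (instructions, uOps, total cycles, instructions per iteration, Block RThroughput)
# copied from the CHECK lines of each test.
summaries = {
    "bottleneck-hints-1.s": (400, 400, 403, 4, 2.0),
    "bottleneck-hints-2.s": (100, 100, 106, 1, 1.0),
    "bottleneck-hints-3.s": (12000, 12000, 36003, 8, 4.0),
}

lines = []
for name, (insts, uops, cycles, per_iter, rthroughput) in summaries.items():
    upc, ipc = throughput(insts, uops, cycles)
    bound = resource_bound_ipc(per_iter, rthroughput)
    lines.append(
        f"{name}: uOps Per Cycle {upc:.2f}, IPC {ipc:.2f}, "
        f"resource-bound IPC {bound:.2f}"
    )
print("\n".join(lines))
# bottleneck-hints-1.s: uOps Per Cycle 0.99, IPC 0.99, resource-bound IPC 2.00
# bottleneck-hints-2.s: uOps Per Cycle 0.94, IPC 0.94, resource-bound IPC 1.00
# bottleneck-hints-3.s: uOps Per Cycle 0.33, IPC 0.33, resource-bound IPC 2.00
```

The gap between the measured IPC and the resource-bound figure is largest in the first and third tests, which is consistent with their reports attributing the pressure to data dependencies rather than to resource pressure.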