| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
As disscused in https://bugs.llvm.org/show_bug.cgi?id=43219,
i believe it may be somewhat useful to show //some// aggregates
over all the sea of statistics provided.
Example:
```
Average Wait times (based on the timeline view):
[0]: Executions
[1]: Average time spent waiting in a scheduler's queue
[2]: Average time spent waiting in a scheduler's queue while ready
[3]: Average time elapsed from WB until retire stage
[0] [1] [2] [3]
0. 3 1.0 1.0 4.7 vmulps %xmm0, %xmm1, %xmm2
1. 3 2.7 0.0 2.3 vhaddps %xmm2, %xmm2, %xmm3
2. 3 6.0 0.0 0.0 vhaddps %xmm3, %xmm3, %xmm4
3 3.2 0.3 2.3 <total>
```
I.e. we average the averages.
Reviewers: andreadb, mattd, RKSimon
Reviewed By: andreadb
Subscribers: gbedwell, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68714
llvm-svn: 374361
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Single operand MUL instructions that implicitly set EAX have the following
latency/throughput profile (see below):
imul %cl # latency: 3cy - uOPs: 1 - 1 JMul
imul %cx # latency: 3cy - uOPs: 3 - 3 JMul
imul %ecx # latency: 3cy - uOPs: 2 - 2 JMul
imul %rcx # latency: 6cy - uOPs: 2 - 4 JMul
mul %cl # latency: 3cy - uOPs: 1 - 1 JMul
mul %cx # latency: 3cy - uOPs: 3 - 3 JMul
mul %ecx # latency: 3cy - uOPs: 2 - 2 JMul
mul %rcx # latency: 6cy - uOPs: 2 - 4 JMul
Excluding the 64bit variant, which has a latency of 6cy, every other instruction
has a latency of 3cy. However, the number of decoded macro-opcodes (as well as
the resource cyles) depend on the MUL size.
The two operand MULs have a more predictable profile (see below):
imul %dx, %dx # latency: 3cy - uOPs: 1 - 1 JMul
imul %edx, %edx # latency: 3cy - uOPs: 1 - 1 JMul
imul %rdx, %rdx # latency: 6cy - uOPs: 1 - 4 JMul
imul $3, %dx, %dx # latency: 4cy - uOPs: 2 - 2 JMul
imul $3, %ecx, %ecx # latency: 3cy - uOPs: 1 - 1 JMul
imul $3, %rdx, %rdx # latency: 6cy - uOPs: 1 - 4 JMul
This patch updates the values in the Jaguar scheduling model and regenerates
llvm-mca tests.
Differential Revision: https://reviews.llvm.org/D66547
llvm-svn: 369661
|
|
This patch fixes an invalid memory read introduced by r346487.
Before this patch, partial register write had to query the latency of the
dependent full register write by calling a method on the full write descriptor.
However, if the full write is from an already retired instruction, chances are
that the EntryStage already reclaimed its memory.
In some parial register write tests, valgrind was reporting an invalid
memory read.
This change fixes the invalid memory access problem. Writes are now responsible
for tracking dependent partial register writes, and notify them in the event of
instruction issued.
That means, partial register writes no longer need to query their associated
full write to check when they are ready to execute.
Added test X86/BtVer2/partial-reg-update-7.s
llvm-svn: 347459
|