Summary:
As discussed in https://bugs.llvm.org/show_bug.cgi?id=43219,
I believe it may be somewhat useful to show some aggregates
over the whole sea of statistics provided.
Example:
```
Average Wait times (based on the timeline view):
[0]: Executions
[1]: Average time spent waiting in a scheduler's queue
[2]: Average time spent waiting in a scheduler's queue while ready
[3]: Average time elapsed from WB until retire stage
[0] [1] [2] [3]
0. 3 1.0 1.0 4.7 vmulps %xmm0, %xmm1, %xmm2
1. 3 2.7 0.0 2.3 vhaddps %xmm2, %xmm2, %xmm3
2. 3 6.0 0.0 0.0 vhaddps %xmm3, %xmm3, %xmm4
3 3.2 0.3 2.3 <total>
```
I.e. we average the averages.
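For illustration, here is a minimal sketch of how such an aggregate row could be derived from the per-instruction averages. The struct and function names are hypothetical and only loosely follow the timeline view's columns; this is not the actual llvm-mca implementation.
```
#include <cstdio>
#include <vector>

// Each row holds the per-instruction averages already computed by the
// timeline view; the <total> row is the arithmetic mean of each column.
struct WaitTimeEntry {
  double CyclesInSchedulerQueue;       // column [1]
  double CyclesInQueueWhileReady;      // column [2]
  double CyclesFromWBToRetire;         // column [3]
};

static WaitTimeEntry computeAggregate(const std::vector<WaitTimeEntry> &Rows) {
  WaitTimeEntry Sum{0.0, 0.0, 0.0};
  for (const WaitTimeEntry &E : Rows) {
    Sum.CyclesInSchedulerQueue += E.CyclesInSchedulerQueue;
    Sum.CyclesInQueueWhileReady += E.CyclesInQueueWhileReady;
    Sum.CyclesFromWBToRetire += E.CyclesFromWBToRetire;
  }
  double N = Rows.empty() ? 1.0 : double(Rows.size());
  return {Sum.CyclesInSchedulerQueue / N, Sum.CyclesInQueueWhileReady / N,
          Sum.CyclesFromWBToRetire / N};
}

int main() {
  // Per-instruction averages taken from the example output above.
  std::vector<WaitTimeEntry> Rows = {{1.0, 1.0, 4.7}, {2.7, 0.0, 2.3}, {6.0, 0.0, 0.0}};
  WaitTimeEntry T = computeAggregate(Rows);
  std::printf("<total> %.1f %.1f %.1f\n", T.CyclesInSchedulerQueue,
              T.CyclesInQueueWhileReady, T.CyclesFromWBToRetire);
  return 0;
}
```
Running this on the three rows above reproduces the <total> line (3.2, 0.3, 2.3).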
Reviewers: andreadb, mattd, RKSimon
Reviewed By: andreadb
Subscribers: gbedwell, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68714
llvm-svn: 374361
bottleneck analysis.
This patch introduces a cut-off threshold for dependency edge frequencies with
the goal of simplifying the critical sequence computation. This patch also
removes the cost normalization for loop carried dependencies. We didn't really
need to artificially amplify the cost of loop-carried dependencies, since it is
already computed as the integral over time of the delay (in cycles).
In the absence of backend stalls there is no need for computing a critical
sequence. With this patch we early exit from the critical sequence computation
if no bottleneck was reported during the simulation.
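A minimal sketch of these two ideas follows. The names (DepEdge, pruneEdges) and the threshold value are assumptions made for illustration; this is not the actual BottleneckAnalysis code.
```
#include <algorithm>
#include <cstdio>
#include <vector>

struct DepEdge {
  unsigned From, To;
  unsigned Frequency; // number of iterations in which the dependency was observed
  unsigned Cost;      // accumulated delay, in cycles
};

// Returns the edges that survive the frequency cut-off, or nothing at all if
// the simulation reported no backend stalls (early exit: no bottleneck means
// there is no critical sequence to compute).
std::vector<DepEdge> pruneEdges(bool SeenStalls, std::vector<DepEdge> Edges,
                                unsigned Iterations) {
  if (!SeenStalls)
    return {};
  const unsigned MinFrequency = Iterations / 10; // hypothetical threshold
  Edges.erase(std::remove_if(Edges.begin(), Edges.end(),
                             [&](const DepEdge &E) {
                               return E.Frequency < MinFrequency;
                             }),
              Edges.end());
  return Edges;
}

int main() {
  std::vector<DepEdge> Edges = {{0, 1, 95, 240}, {1, 2, 3, 12}};
  for (const DepEdge &E : pruneEdges(/*SeenStalls=*/true, Edges, /*Iterations=*/100))
    std::printf("edge %u -> %u kept (freq=%u, cost=%u)\n", E.From, E.To,
                E.Frequency, E.Cost);
}
```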
llvm-svn: 372337
D66424 adds the base support for LOCK, so we should be able to add special-case support for all of these cases in future patches.
llvm-svn: 369367
resources-x86_64.s files. NFC
In D66424 it was requested to move all the new tests added by r369278 into
resources-x86_64.s, because only the 8b/16b ops should be tested by
resources-cmpxchg.s. This partially reverts r369278.
llvm-svn: 369288
Addresses a review comment in D66424
llvm-svn: 369279
r364045. NFC
This should unbreak the ppc64 buildbots.
llvm-svn: 364048
instructions based on the simulation.
This patch teaches the bottleneck analysis how to identify and print the most
expensive sequence of instructions according to the simulation. Fixes PR37494.
The goal is to help users identify the sequence of instruction which is most
critical for performance.
A dependency graph is internally used by the bottleneck analysis to describe
data dependencies and processor resource interferences between instructions.
There is one node in the graph for every instruction in the input assembly
sequence. The number of nodes in the graph is independent of the number of
iterations simulated by the tool. This means that a single node of the graph
represents all the possible instances of the same instruction contributed by the
simulated iterations.
Edges are dynamically "discovered" by the bottleneck analysis by observing
instruction state transitions and "backend pressure increase" events generated
by the Execute stage. Information from the events is used to identify critical
dependencies, and materialize edges in the graph. A dependency edge is uniquely
identified by a pair of node identifiers plus an instance of struct
DependencyEdge::Dependency (which provides more details about the actual
dependency kind).
The bottleneck analysis internally ranks dependency edges based on their impact
on the runtime (see field DependencyEdge::Dependency::Cost). To this end, each
edge of the graph has an associated cost. By default, the cost of an edge is a
function of its latency (in cycles). In practice, the cost of an edge is also a
function of the number of cycles where the dependency has been seen as
'contributing to backend pressure increases'. The idea is that the higher the
cost of an edge, the higher the impact of the dependency on performance. Put
another way, the cost of an edge is a measure of its criticality for
performance.
Note that the same edge may be found in multiple iterations of the simulated loop.
The logic that adds new edges to the graph checks if an equivalent dependency
already exists (duplicate edges are not allowed). If an equivalent dependency
edge is found, field DependencyEdge::Frequency of that edge is incremented by
one, and the new cost is cumulatively added to the existing edge cost.
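The following sketch illustrates the edge bookkeeping described above. Names such as DependencyKind, EdgeKey, and addDependencyEdge are assumptions made for illustration; only DependencyEdge, Frequency, and Cost come from the description, and the real data structures differ.
```
#include <cstdint>
#include <map>
#include <tuple>

enum class DependencyKind { RegisterDep, MemoryDep, ResourceDep };

struct Dependency {
  DependencyKind Kind;
  uint64_t ResourceOrRegID; // which register or processor resource is involved
  unsigned Cost;            // contribution to backend pressure, in cycles
};

struct DependencyEdge {
  unsigned FromID, ToID; // node identifiers (one node per input instruction)
  Dependency Dep;
  unsigned Frequency;    // how many times this edge was observed
};

// An edge is uniquely identified by the node pair plus the dependency kind
// and the register/resource it refers to.
using EdgeKey = std::tuple<unsigned, unsigned, DependencyKind, uint64_t>;

void addDependencyEdge(std::map<EdgeKey, DependencyEdge> &Edges,
                       unsigned FromID, unsigned ToID, const Dependency &Dep) {
  EdgeKey Key{FromID, ToID, Dep.Kind, Dep.ResourceOrRegID};
  auto It = Edges.find(Key);
  if (It == Edges.end()) {
    // First time this dependency is seen: materialize a new edge.
    Edges.emplace(Key, DependencyEdge{FromID, ToID, Dep, 1});
    return;
  }
  // Duplicate edges are not allowed: bump the frequency and accumulate cost.
  It->second.Frequency += 1;
  It->second.Dep.Cost += Dep.Cost;
}
```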
At the end of simulation, costs are propagated to nodes through the edges of the
graph. The goal is to identify a critical sequence from a node of the root set
(composed of the nodes of the graph with no predecessors) to a 'sink node' with no
successors. Note that the graph is intentionally kept acyclic to minimize the
complexity of the critical sequence computation algorithm (complexity is
currently linear in the number of nodes in the graph).
The critical path is finally computed as a sequence of dependency edges. For
edges describing processor resource interferences, the view also prints a
so-called "interference probability" value (by dividing field
DependencyEdge::Frequency by the total number of iterations).
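A simplified sketch of these last two steps, assuming the nodes are kept in topological order; all names are illustrative rather than the actual implementation.
```
#include <cstdio>
#include <vector>

struct Edge {
  unsigned To;
  unsigned Cost;      // accumulated cost of the dependency, in cycles
  unsigned Frequency; // number of iterations in which it was observed
};

struct Node {
  std::vector<Edge> Succs;
  unsigned PathCost = 0; // cost of the most expensive path reaching this node
  int BestPred = -1;     // predecessor on that path (-1 for root-set nodes)
};

// The graph is acyclic and nodes are assumed to be in topological order, so a
// single linear pass propagates costs from the root set to the sinks.
void propagateCosts(std::vector<Node> &Nodes) {
  for (unsigned I = 0, E = Nodes.size(); I != E; ++I)
    for (const Edge &Ed : Nodes[I].Succs)
      if (Nodes[I].PathCost + Ed.Cost > Nodes[Ed.To].PathCost) {
        Nodes[Ed.To].PathCost = Nodes[I].PathCost + Ed.Cost;
        Nodes[Ed.To].BestPred = int(I);
      }
}

// For resource interference edges, the view prints Frequency / Iterations.
double interferenceProbability(const Edge &Ed, unsigned Iterations) {
  return Iterations ? double(Ed.Frequency) / double(Iterations) : 0.0;
}

int main() {
  // Three instructions: edges 0 -> 1 -> 2, plus a weaker direct edge 0 -> 2.
  std::vector<Node> Nodes(3);
  Nodes[0].Succs = {{1, 30, 90}, {2, 5, 10}};
  Nodes[1].Succs = {{2, 25, 80}};
  propagateCosts(Nodes);
  std::printf("critical path cost to node 2: %u cycles\n", Nodes[2].PathCost);
  std::printf("interference probability of edge 1->2: %.2f\n",
              interferenceProbability(Nodes[1].Succs[0], /*Iterations=*/100));
}
```
The critical sequence itself can then be recovered by walking the BestPred links back from the sink with the largest PathCost.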
Examples of critical sequence computations can be found in tests added/modified
by this patch.
On output streams that support colored output, instructions from the critical
sequence are rendered with a different color.
Strictly speaking the analysis conducted by the bottleneck analysis view is not
a critical path analysis. The cost of an edge doesn't only depend on the
dependency latency. More importantly, the cost of the same edge may be computed
differently in different iterations.
Dependencies are discovered dynamically based on the events generated by the
simulator, and their number is not fixed. This is
especially true for edges that model processor resource interferences; an
interference may not occur in every iteration. For that reason, it makes sense
to also print out a "probability of interference".
By construction, the accuracy of this analysis (as always) is strongly dependent
on the simulation (and therefore the quality of the information available in the
scheduling model).
That being said, the critical sequence effectively identifies a performance
criticality. Instructions from that sequence are expected to have a very big
impact on performance. So, users can take advantage of this information to focus
their attention on specific interactions between instructions.
In my experience, it works quite well in practice, and produces useful
output (in a reasonable amount of time).
Differential Revision: https://reviews.llvm.org/D63543
llvm-svn: 364045
Summary:
llvm.x86.sse.stmxcsr only writes to memory.
llvm.x86.sse.ldmxcsr only reads from memory, and might generate an FPE.
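For reference, these are the intrinsics that _mm_getcsr()/_mm_setcsr() from <xmmintrin.h> typically lower to; a small, purely illustrative usage example:
```
#include <cstdio>
#include <xmmintrin.h>

int main() {
  // _mm_getcsr() typically lowers to llvm.x86.sse.stmxcsr: MXCSR is stored to
  // a stack slot and reloaded, i.e. it only writes to (then reads) memory.
  unsigned Csr = _mm_getcsr();
  std::printf("MXCSR = %#x\n", Csr);

  // _mm_setcsr() typically lowers to llvm.x86.sse.ldmxcsr: MXCSR is loaded
  // from memory, and (as noted above) the operation might generate an FPE.
  _mm_setcsr(Csr | 0x8000); // set the FTZ (flush-to-zero) bit
  return 0;
}
```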
Reviewers: craig.topper, RKSimon
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62896
llvm-svn: 363773
models. Add 256-bit fp xor to sandybridge zero idioms
This copies the Sandy Bridge zero idiom support to later CPUs, adding the AVX2 and AVX512F/VL instructions as appropriate.
Differential Revision: https://reviews.llvm.org/D62360
llvm-svn: 361690
This pre-commits tests for D62360
llvm-svn: 361689
printing.
We require d/q suffixes on the memory form of these instructions to disambiguate the memory size.
We don't require it on the register forms, but need to support parsing both with and without it.
Previously we always printed the d/q suffix on the register forms, but it's redundant and
inconsistent with gcc and objdump.
After this patch we should support the d/q suffix for parsing, but not print it when it's unneeded.
llvm-svn: 360085
This is defined as part of SSE1; XMM PMOVMSKB doesn't appear until SSE2.
llvm-svn: 359477
llvm-svn: 359397
custom printing and custom parsing to achieve the same result and more
Similar to the previous change done for VPCOM and VPCMP.
Differential Revision: https://reviews.llvm.org/D59468
llvm-svn: 356384
Broadwell, and Skylake.
Rotate with explicit immediate is a single uop from Haswell on. An immediate of 1 has a dependency on the previous writer of flags, but the other immediate values do not.
The implicit rotate-by-1 instruction is 2 uops, but the flags are merged after the rotate uop, so the data result does not see the flag dependency. I don't think we have any way of modeling that.
RORX is 1 uop without the load. 2 uops with the load. We currently model these with WriteShift/WriteShiftLd.
Differential Revision: https://reviews.llvm.org/D59077
llvm-svn: 355636
Some targets have fast-path handling for these patterns that we should model.
llvm-svn: 355498
Skylake.
Summary:
The AX/EAX/RAX with immediate forms are 2 uops just like the AL with immediate.
The modrm form with r8 and immediate is a single uop just like r16/r32/r64 with immediate.
Reviewers: RKSimon, andreadb
Reviewed By: RKSimon
Subscribers: gbedwell, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58581
llvm-svn: 354754
arguments where one is %st.
All of these instructions consume one encoded register and the other register is %st. They either write the result to %st or to the encoded register. Previously we printed both arguments when the encoded register was written, and we printed one argument when the result was written to %st. For the stack-popping forms the encoded register is always the destination, and we didn't print both operands. This was inconsistent with gcc and objdump and just made the output assembly code harder to read.
This patch changes things to always print both operands, making us consistent with gcc and objdump. The parser should still be able to handle the single register forms just as it did before. This also matches the GNU assembler behavior.
llvm-svn: 353061
printing it as %st(0) when it's encoded in the instruction.
This is a step back from the change I made in r352985. This appears to be more consistent with gcc and objdump behavior.
llvm-svn: 353015
as the clobber name to make MS inline asm work correctly"
Looking into gcc and objdump behavior more, this was overly aggressive. If the register is encoded in the instruction we should print %st(0); if it's implicit we should print %st.
I'll be making a more directed change in a future patch.
llvm-svn: 353013
name to make MS inline asm work correctly
Summary:
When calculating clobbers for MS-style inline assembly, we fail if the asm clobbers the stack top, because we print st(0) and try to pass it through the gcc register name check. This was found when I attempted to make emms/femms clobber all ST registers. If you use emms/femms in MS inline asm, we would try to use st(0) as the clobber name, but clang would think that wasn't a valid clobber name.
This also matches what objdump disassembly prints. It's also what is printed by gcc -S.
Reviewers: RKSimon, rnk, efriedma, spatel, andreadb, lebedev.ri
Reviewed By: rnk
Subscribers: eraman, gbedwell, lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D57621
llvm-svn: 352985
We're getting pretty close to matching/exceeding test coverage of the test\CodeGen\X86\*-schedule.ll files, which should allow us to get rid of -print-schedule and fix PR37160
llvm-svn: 351836
and rdtsc/rdtscp tests
llvm-svn: 351835
llvm-svn: 351831
These technically should be under a MONITOR cpuid bit, but we tag them as SSE3 so I've done that here as well.
llvm-svn: 351829
llvm-svn: 351828
llvm-svn: 351827
Add missing non-VEX instructions
llvm-svn: 348623
llvm-svn: 348622
Summary:
Starting from SNB, VZEROUPPER is handled by the renamer and uses no proc resources.
After HSW, it also has zero latency.
This fixes PR35606.
To reproduce:
Uops:
```
llvm-exegesis -mode=uops -opcode-name=VZEROUPPER
```
Latency:
```
echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper' | /tmp/llvm-exegesis -mode=latency -snippets-file=-
echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper\naddps %xmm0, %xmm1' | /tmp/llvm-exegesis -mode=latency -snippets-file=-
```
Reviewers: RKSimon, craig.topper, andreadb
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D54107
llvm-svn: 346482
llvm-svn: 343421
Split WriteIMul by size and also by IMUL multiply-by-imm and multiply-by-reg cases.
This removes all the scheduler overrides for gpr multiplies and stops WriteMULH being ignored for BMI2 MULX instructions.
llvm-svn: 342892
Confirmed with Craig Topper - fix a typo that was missing a Port4 uop for ROR*mCL instructions on some Intel models.
Yet another step on the scheduler model cleanup marathon......
llvm-svn: 342846
Differential Revision: https://reviews.llvm.org/D49912
llvm-svn: 339145
I've put CMPXCHG8B/CMPXCHG16B in the same file; even though technically they are under separate CPUID bits, all targets seem to support both (or neither).
llvm-svn: 338595
These aren't just available via 3DNow! so test for them separately as well.
llvm-svn: 338584
Renamed the btver2 file that already contained them - the other targets were only testing the AVX versions
llvm-svn: 338583
llvm-svn: 338576
We already added these to btver2; now add them to other targets, even though none of their models treat them specially (yet).
llvm-svn: 338565
CPUID, IN/OUT, INS/OUTS, INT, PAUSE, SCAS, UD2, XLAT
llvm-svn: 338563
llvm-svn: 338550
llvm-svn: 338532
llvm-svn: 338514
These aren't exhaustive, but cover some instructions that are only available in 32-bit mode (where would we be without good BCD math performance?).
llvm-svn: 338404
llvm-svn: 337586
x86_64 resource tests
llvm-svn: 337306
SNB doesn't support MOVBE, but the numbers in Generic (which uses the SNB model) look sane.
llvm-svn: 337305
llvm-svn: 337302
Before revision 336728, the "mayLoad" flag for instruction (V)MOVLPSrm was
inferred directly from the "default" pattern associated with the instruction
definition.
r336728 removed special node X86Movlps, and all the patterns associated to it.
Now instruction (V)MOVLPSrm doesn't have a pattern associated to it, and the
'mayLoad/hasSideEffects' flags are left unset.
When the instruction info is emitted by tablegen, method
CodeGenDAGPatterns::InferInstructionFlags() sees that (V)MOVLPSrm doesn't have a
pattern, and flags are undefined. So, it conservatively sets the
"hasSideEffects" flag for it.
As a consequence, we were losing the 'mayLoad' flag and gaining a
'hasSideEffects' flag in its place.
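The conservative behavior described above can be summarized with the following sketch. It is illustrative only: the real logic lives in TableGen's CodeGenDAGPatterns and is structured quite differently, and the function and parameter names here are invented for the example.
```
struct InstFlags {
  bool MayLoad = false;
  bool MayStore = false;
  bool HasSideEffects = false;
};

InstFlags inferFlags(bool HasPattern, bool PatternReadsMemory,
                     bool PatternWritesMemory) {
  InstFlags Flags;
  if (!HasPattern) {
    // No pattern to analyze: conservatively assume unmodeled side effects.
    // This is what happened to (V)MOVLPSrm after r336728 removed its pattern.
    Flags.HasSideEffects = true;
    return Flags;
  }
  Flags.MayLoad = PatternReadsMemory;
  Flags.MayStore = PatternWritesMemory;
  return Flags;
}
```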
This patch fixes the issue (originally reported by Michael Holmen).
The mca tests show the differences in the instruction info flags. Instructions
that were affected by this problem were: MOVLPSrm/VMOVLPSrm/VMOVLPSZ128rm.
Differential Revision: https://reviews.llvm.org/D49182
llvm-svn: 336818
in the Instruction Info View. NFC
This makes it easier to identify changes in the instruction info flags. It also
helps spotting potential regressions similar to the one recently introduced at
r336728.
Using the same character to mark MayLoad/MayStore/HasSideEffects is problematic
for llvm-lit. When pattern matching substrings, llvm-lit consumes tabs and
spaces. A change in position of the flag marker may not trigger a test failure.
This patch only changes the character used for flag `hasSideEffects`. The reason
I didn't touch the other flags is that I want to avoid spamming the mailing list
with the massive diff due to the numerous tests affected by this change.
In the future, each instruction flag should be associated with a different character
in the Instruction Info View.
llvm-svn: 336797