| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
| |
This change adds a couple of tests to verify the change introduced by revision
328823 ([X86] Correct the placement of ReadAfterLd in BEXTR and BZHI).
llvm-svn: 328859
|
|
|
|
|
|
|
|
|
|
|
|
| |
The tool was passing the wrong operand index to method
MCSubtargetInfo::getReadAdvanceCycles(). That method requires a "UseIdx", and
not the operand index. This was found when testing X86 code where instructions
had a memory folded operand.
This patch fixes the issue and adds test read-advance-1.s to ensure that
the ReadAfterLd (a ReadAdvance of 3cy) information is correctly used.
llvm-svn: 328790
|
|
|
|
|
|
|
|
|
|
|
| |
instructions.
Similar to r328694. The number of micro opcodes should be 2 for those
instructions.
This was found when testing AVX code for BtVer2 using llvm-mca.
llvm-svn: 328698
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Jaguar backend natively supports 128-bit data types. Operations on YMM
registers are split into two COPs (complex operations). Each COP consumes a slot
in the dispatch group, and in the reorder buffer.
The scheduling model for Jaguar should mark those instructions as `let
NumMicroOps = 2`.
This was found when testing AVX code for BtVer2 using llvm-mca.
llvm-svn: 328694
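
A minimal sketch of the splitting rule described above, assuming the operation count is simply the operand width divided by the 128-bit native width, rounded up. The function name `num_cops` and the generalization to arbitrary widths are illustrative assumptions, not part of the scheduling model.

```python
def num_cops(width_bits, native_bits=128):
    """Number of native-width operations needed for one vector operation.

    Assumption: an operation wider than the native datapath is split into
    equal native-width pieces (ceiling division).
    """
    if width_bits <= 0 or native_bits <= 0:
        raise ValueError("widths must be positive")
    return -(-width_bits // native_bits)  # ceiling division


xmm_ops = num_cops(128)  # XMM operation fits the native width
ymm_ops = num_cops(256)  # YMM operation is split in two
```

Under this assumption a YMM operation yields 2, which matches the `NumMicroOps = 2` setting mentioned in the commit message.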
|
|
|
|
|
|
|
|
|
| |
We were incorrectly initializing the array of used registers in method checkRAT.
As a consequence, the number of register file stalls was misreported.
Added a test to cover this case.
llvm-svn: 328629
|
|
|
|
|
|
| |
Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write)
llvm-svn: 328573
|
|
|
|
|
|
| |
We still need to account for how Jaguar passes data from GPR -> XMM, which isn't as clean as XMM -> GPR.
llvm-svn: 328551
|
|
|
|
| |
llvm-svn: 328541
|
|
|
|
|
|
| |
JALU0 for GPR PRF write)
llvm-svn: 328536
|
|
|
|
|
|
|
|
| |
Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write)
This also adds missing vcvttss2si tests
llvm-svn: 328505
|
|
|
|
|
|
| |
These should match the YMM MOVDUP/PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults.
llvm-svn: 328501
|
|
|
|
|
|
|
| |
This should fix the stack-use-after-scope reported by the asan buildbots after
revision 328493.
llvm-svn: 328499
|
|
|
|
|
|
| |
The xmm sd/pd versions were using the WriteFSQRT default which is modelled on sqrtss/sqrtps
llvm-svn: 328497
|
|
|
|
|
|
| |
info view.
llvm-svn: 328493
|
|
|
|
|
|
| |
For 256-bit instructions, both the AGUs and the scheduler pipes are double pumped, as are the functional units, which we already model.
llvm-svn: 328491
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pressure distribution for instructions (PR36874)
The goal of this patch is to address most of PR36874. To fully fix PR36874 we
need to split the "InstructionInfo" view from the "SummaryView". That would make
it easy to check the latency and rthroughput as well.
The patch reuses all the logic from ResourcePressureView to print out the
"instruction tables".
We have an entry for every instruction in the input sequence. Each entry reports
the theoretical resource pressure distribution. Resource pressure is uniformly
distributed across all the processor resource units of a group.
At the moment, the backend pipeline is not configurable, so the only way to fix
this is by creating a different driver that simply sends instruction events to
the resource pressure view. That means we don't use the Backend interface.
Instead, it is simpler to just have a different code-path for when flag
-instruction-tables is specified.
Once Clement addresses bug 36663, then we can port the "instruction tables"
logic into a stage of our configurable pipeline.
Updated the BtVer2 test cases (thanks Simon for the help). Now we pass flag
-instruction-tables to each modified test.
Differential Revision: https://reviews.llvm.org/D44839
llvm-svn: 328487
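
A minimal sketch of the uniform distribution rule described above, not the tool's actual code. The function name `distribute_pressure` and the unit names in the example are illustrative assumptions.

```python
from fractions import Fraction


def distribute_pressure(cycles, group_units):
    """Split `cycles` of resource usage evenly across the units of a group.

    Fractions are used so that the per-unit shares always sum back to
    exactly `cycles`.
    """
    if not group_units:
        raise ValueError("a resource group needs at least one unit")
    share = Fraction(cycles, len(group_units))
    return {unit: share for unit in group_units}


# Hypothetical example: one cycle consumed on a group of two units.
pressure = distribute_pressure(1, ["JALU0", "JALU1"])
```

Each unit in the example is charged half a cycle, which is the kind of theoretical figure an "instruction tables" entry would report for a group resource.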
|
|
|
|
|
|
| |
unit
llvm-svn: 328343
|
|
|
|
|
|
| |
Add missing non-VEX and (V)PMOVMSKB instructions to the pattern
llvm-svn: 328338
|
|
|
|
|
|
| |
function unit
llvm-svn: 328331
|
|
|
|
|
|
| |
JSAGU/JSTC function units
llvm-svn: 328328
|
|
|
|
| |
llvm-svn: 328324
|
|
|
|
|
|
| |
This was due to a misunderstanding: what llvm calls a micro-op (retirement unit) is actually called a macro-op on the AMD/Jaguar target. Folded loads don't affect the number of macro-ops.
llvm-svn: 328320
|
|
|
|
|
|
|
|
| |
correctly use JFPU1 scheduler pipe followed by JLAGU/JSAGU/JFPA/JVALU function units
Fixes throughput to match Agner/Fam16h-SoG as well.
llvm-svn: 328318
|
|
|
|
|
|
| |
pipe and JFPX/JVALU function unit as well as the AGUs
llvm-svn: 328304
|
|
|
|
|
|
| |
Change pblendvb/blendvps/blendvpd to use WriteFVarBlend
llvm-svn: 328294
|
|
|
|
| |
llvm-svn: 328293
|
|
|
|
|
|
| |
scheduled through the JFPU1 pipe
llvm-svn: 328226
|
|
|
|
|
|
| |
The ymm instructions are double pumped as well.
llvm-svn: 328222
|
|
|
|
|
|
| |
pipe
llvm-svn: 328217
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
BackendStatistics view.
With this patch, the "instruction dispatched" event now provides information
related to the number of microarchitectural registers used in each register
file. Similarly, the "instruction retired" event is now able to tell how many
registers are freed in each register file.
Currently, the BackendStatistics view is the only consumer of register
usage/pressure information. BackendStatistics uses that info to print out a few
general statistics (i.e. max number of mappings used; total mappings created).
Before this patch, the BackendStatistics was forced to query the Backend to
obtain the register pressure information.
This patch removes that dependency. Now views are completely independent from
the Backend. As a consequence, it should be easier to address PR36663 and
further modularize the pipeline.
Added a couple of test cases in the BtVer2 specific directory.
llvm-svn: 328129
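
A minimal sketch of a view that derives register statistics purely from events, assuming each event carries one register count per register file. The class and method names are hypothetical and do not mirror the llvm-mca API.

```python
class RegisterFileStats:
    """Tracks register mappings using only dispatch/retire events."""

    def __init__(self, num_register_files):
        self.current = [0] * num_register_files        # mappings in use now
        self.max_used = [0] * num_register_files       # peak mappings in use
        self.total_created = [0] * num_register_files  # mappings ever created

    def on_dispatched(self, used_per_file):
        # A dispatched instruction allocates registers in each file.
        for i, n in enumerate(used_per_file):
            self.current[i] += n
            self.total_created[i] += n
            self.max_used[i] = max(self.max_used[i], self.current[i])

    def on_retired(self, freed_per_file):
        # A retired instruction releases registers in each file.
        for i, n in enumerate(freed_per_file):
            self.current[i] -= n


# Hypothetical event stream for two register files.
stats = RegisterFileStats(2)
stats.on_dispatched([2, 0])
stats.on_dispatched([1, 1])
stats.on_retired([2, 0])
stats.on_dispatched([1, 0])
```

Because the counters are updated only from event payloads, a view like this never needs to query the component that produced the events.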
|
|
|
|
|
|
| |
The default is currently FAdd for some reason
llvm-svn: 327807
|
|
|
|
| |
llvm-svn: 327805
|
|
|
|
|
|
| |
functional pipe
llvm-svn: 327804
|
|
|
|
|
|
| |
(JFPA/JFPM) functional pipes
llvm-svn: 327803
|
|
|
|
| |
llvm-svn: 327801
|
|
|
|
|
|
| |
JVALU0/JVALU1 functional pipes
llvm-svn: 327794
|
|
|
|
|
|
| |
float cluster (FPA/FPM) not the integer.
llvm-svn: 327793
|
|
|
|
|
|
|
|
|
|
| |
unit for JWriteResFpuPair defs
Jaguar's FPU has 2 scheduler pipes (JFPU0/JFPU1), each of which forwards to multiple functional sub-units. We need to model that a micro-op will consume both a scheduler pipe and a functional unit.
This patch just handles the ops defined through JWriteResFpuPair, I'll go through the custom cases later.
llvm-svn: 327791
|
|
|
|
|
|
|
|
| |
instructions
Hopefully these tests can be easily reused should any other subtarget get in-depth llvm-mca coverage (we can either copy the tests or move them into a common dir and run them with multiple prefixes).
llvm-svn: 327788
|
|
|
|
|
|
| |
It gives us a better view of pipe usage in the timeline which is what the test is trying to show.
llvm-svn: 327685
|
|
|
|
|
|
|
|
| |
YMM FDiv/FSqrt are dispatched on pipe JFPU1 but should be performed on the JFPM unit - that is where most of the cycles are spent.
This matches the pipes for WriteFSqrt/WriteFDiv definitions.
llvm-svn: 327682
|
|
|
|
|
|
|
|
| |
pipes/units
Try to demonstrate the scheduling from fpu0/fpu1 pipes to the valu0/vimul/fpa or valu1/stc/fpm functional units
llvm-svn: 327676
|
|
|
|
|
|
| |
no scheduler resources were consumed.
llvm-svn: 327215
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes a problem found when testing zero latency instructions on
target AArch64 -mcpu=exynos-m3 / -mcpu=exynos-m1.
On Exynos-m3/m1, direct branches are zero-latency instructions that don't consume
any processor resources. The DispatchUnit marks zero-latency instructions as
"executed", so that no scheduling is required. The "instruction executed" event
is then sent to all the listeners, and the reorder buffer (managed
by the RetireControlUnit) is updated. In particular, the entry associated with the
zero-latency instruction in the reorder buffer is marked as executed.
Before this patch, the DispatchUnit forgot to assign a retire control unit token
(RCUToken) to the zero-latency instruction. As a consequence, the RCUToken was
used uninitialized. This was causing a crash in the RetireControlUnit logic.
Fixes PR36650.
llvm-svn: 327056
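
A toy sketch of the ordering issue described above, assuming a reorder buffer that hands out an index as its token. The names below are hypothetical and heavily simplified; this is not the llvm-mca implementation.

```python
class RetireControlUnit:
    """Toy reorder buffer: one entry per dispatched instruction."""

    def __init__(self):
        self.entries = []

    def reserve_slot(self):
        # The returned index plays the role of the RCU token.
        self.entries.append({"executed": False})
        return len(self.entries) - 1

    def on_executed(self, token):
        self.entries[token]["executed"] = True


def dispatch(rcu, latency):
    # Reserve the token first, so the zero-latency fast path below
    # never refers to an unassigned token.
    token = rcu.reserve_slot()
    if latency == 0:
        rcu.on_executed(token)  # no scheduling needed
    return token


rcu = RetireControlUnit()
branch_token = dispatch(rcu, 0)  # zero-latency instruction
add_token = dispatch(rcu, 3)     # ordinary instruction, still pending
```

The point of the sketch is only the ordering: the token is assigned on every path before the entry is marked as executed.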
|
|
|
|
|
|
|
| |
In future, both the summary information and the 'instruction info' table should
be moved into a separate "Summary" view.
llvm-svn: 327010
|
|
llvm-mca is an LLVM based performance analysis tool that can be used to
statically measure the performance of code, and to help triage potential
problems with target scheduling models.
llvm-mca uses information which is already available in LLVM (e.g. scheduling
models) to statically measure the performance of machine code on a specific CPU.
Performance is measured in terms of throughput as well as processor resource
consumption. The tool currently works for processors with an out-of-order
backend, for which there is a scheduling model available in LLVM.
The main goal of this tool is not just to predict the performance of the code
when run on the target, but also help with diagnosing potential performance
issues.
Given an assembly code sequence, llvm-mca estimates the IPC (instructions per
cycle), as well as hardware resource pressure. The analysis and reporting style
were mostly inspired by the IACA tool from Intel.
This patch is related to the RFC on llvm-dev visible at this link:
http://lists.llvm.org/pipermail/llvm-dev/2018-March/121490.html
Differential Revision: https://reviews.llvm.org/D43951
llvm-svn: 326998
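
A minimal sketch of the IPC figure mentioned above, assuming it is computed as the total number of executed instructions divided by the total number of simulated cycles. The function name and the numbers in the example are illustrative, not output from the tool.

```python
def ipc(instructions_per_iteration, iterations, total_cycles):
    """Instructions per cycle for a repeated code sequence."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return (instructions_per_iteration * iterations) / total_cycles


# Hypothetical run: 3 instructions, 100 iterations, 150 cycles.
example_ipc = ipc(3, 100, 150)
```

In this hypothetical run, 300 instructions over 150 cycles give an IPC of 2.0.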
|