| Commit message | Author | Age | Files | Lines |
| |
Also, update a few tests to minimize the diff in D45369.
No functional change intended.
llvm-svn: 329403
| |
This patch adds the ability to describe properties of the hardware retire
control unit.
Tablegen class RetireControlUnit has been added for this purpose (see
TargetSchedule.td).
A RetireControlUnit specifies the size of the reorder buffer, as well as the
maximum number of opcodes that can be retired every cycle.
A zero (or negative) value for the reorder buffer size means: "the size is
unknown". If the size is unknown, then llvm-mca defaults it to the value of
field SchedMachineModel::MicroOpBufferSize. A zero or negative number of
opcodes retired per cycle means: "there is no restriction on the number of
instructions that can be retired every cycle".
Models can optionally specify an instance of RetireControlUnit. There can be
at most one RetireControlUnit definition per scheduling model.
Information related to the RCU (RetireControlUnit) is stored in two new fields
of MCExtraProcessorInfo. llvm-mca loads that information when it initializes
the DispatchUnit / RetireControlUnit (see Dispatch.h/Dispatch.cpp).
This patch fixes PR36661.
Differential Revision: https://reviews.llvm.org/D45259
llvm-svn: 329304
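A minimal C++ sketch of the defaulting rules above, assuming the two RCU fields are signed integers as described. The names `RCUDesc`, `ResolvedRCU`, `resolveRCU` and `retireThisCycle` are hypothetical; they do not mirror the actual MCExtraProcessorInfo layout or the llvm-mca DispatchUnit code. The sketch only shows how "zero or negative" maps to "unknown size" and "no retire limit".

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical mirror of the two RCU fields described in the commit message.
struct RCUDesc {
  int ReorderBufferSize; // <= 0 means "the size is unknown"
  int MaxRetirePerCycle; // <= 0 means "no restriction"
};

// RCU parameters after the defaulting rules have been applied.
struct ResolvedRCU {
  unsigned ReorderBufferSize = 0;
  std::optional<unsigned> MaxRetirePerCycle; // empty: unlimited
};

// An unknown reorder buffer size falls back to MicroOpBufferSize.
ResolvedRCU resolveRCU(const RCUDesc &D, unsigned MicroOpBufferSize) {
  ResolvedRCU R;
  R.ReorderBufferSize = D.ReorderBufferSize > 0
                            ? static_cast<unsigned>(D.ReorderBufferSize)
                            : MicroOpBufferSize;
  if (D.MaxRetirePerCycle > 0)
    R.MaxRetirePerCycle = static_cast<unsigned>(D.MaxRetirePerCycle);
  return R;
}

// How many of the instructions that are ready to retire actually retire.
unsigned retireThisCycle(const ResolvedRCU &R, unsigned ReadyToRetire) {
  if (!R.MaxRetirePerCycle)
    return ReadyToRetire;
  return std::min(ReadyToRetire, *R.MaxRetirePerCycle);
}
```

For example, `resolveRCU({0, -1}, 64)` yields a 64-entry reorder buffer with no per-cycle retire limit.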
| |
llvm-svn: 329192
| |
view. NFCI
Before this patch, the "BackendStatistics" view was responsible for printing the
register file usage (as well as many other statistics).
Now users can enable register file usage statistics using the command line flag
`-register-file-stats`. By default, the tool doesn't print register file
statistics.
llvm-svn: 329083
| |
scheduling model for llvm-mca
This patch allows the description of register files in processor scheduling
models. This addresses PR36662.
A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td.
Targets can optionally describe register files for their processors using that
class. In particular, class RegisterFile allows specifying:
- The total number of physical registers.
- Which target registers are accessible through the register file.
- The cost of allocating a register at register renaming stage.
Example (from this patch - see file X86/X86ScheduleBtVer2.td)
def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>;
Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar
(btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM
register definitions only cost 1 physical register.
The syntax allows specifying an empty set of register classes. An empty set of
register classes means: this register file models all the registers specified by
the Target. For each register class, users can specify an optional register
cost; register costs default to 1. A value of 0 for the number of
physical registers means: "this register file has an unbounded number of
physical registers".
This patch is structured in two parts.
* Part 1 - MC/Tablegen *
The first part adds the tablegen definition of RegisterFile, and teaches the
SubtargetEmitter how to emit information related to register files.
Information about register files is accessible through an instance of
MCExtraProcessorInfo.
The idea behind this design is to logically separate the part of the processor
description that is only used by external tools (like llvm-mca) from the
processor information used by the LLVM machine schedulers.
I think that this design makes it easier for targets to get rid of the extra
processor information if they don't want it.
* Part 2 - llvm-mca related *
The second part of this patch is related to changes to llvm-mca.
The main differences are:
1) class RegisterFile now needs to take into account the "cost of a register"
when allocating physical registers at register renaming stage.
2) Point 1. triggered a minor refactoring which led to the removal of the
"maximum 32 register files" restriction.
3) The BackendStatistics view has been updated so that we can print out extra
details related to each register file implemented by the processor.
The effect of point 3. is also visible in tests register-files-[1..5].s.
Differential Revision: https://reviews.llvm.org/D44980
llvm-svn: 329067
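The cost-based allocation described in point 1 can be sketched as follows. This is a hypothetical model, not the llvm-mca RegisterFile class: `RegisterFileModel`, `tryAllocate` and `release` are illustrative names, a register file is reduced to a counter of physical registers in use, and classes without an explicit cost default to 1. With the FpuPRF parameters from the example (72 registers, YMM cost 2), at most 36 YMM definitions fit in the file at once.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical register file: a pool of physical registers plus an
// allocation cost per register class.
class RegisterFileModel {
  unsigned NumPhysRegs; // 0 means "unbounded number of physical registers"
  unsigned InUse = 0;
  std::map<std::string, unsigned> Costs;

public:
  RegisterFileModel(unsigned Size, std::map<std::string, unsigned> ClassCosts)
      : NumPhysRegs(Size), Costs(std::move(ClassCosts)) {}

  // Classes without an explicit cost default to 1.
  unsigned costOf(const std::string &RegClass) const {
    auto It = Costs.find(RegClass);
    return It == Costs.end() ? 1u : It->second;
  }

  // Register renaming: reserve physical registers for one definition.
  // Returns false (a register file stall) if there is not enough room.
  bool tryAllocate(const std::string &RegClass) {
    unsigned Cost = costOf(RegClass);
    if (NumPhysRegs != 0 && InUse + Cost > NumPhysRegs)
      return false;
    InUse += Cost;
    return true;
  }

  // Retirement: give the physical registers back.
  void release(const std::string &RegClass) {
    unsigned Cost = costOf(RegClass);
    assert(InUse >= Cost && "releasing more registers than allocated");
    InUse -= Cost;
  }

  unsigned used() const { return InUse; }
};
```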
| |
ReadAdvance entries.
Before, the instruction builder incorrectly assumed that only explicit reads
could be associated with ReadAdvance entries.
This patch fixes the issue and adds a test to verify it.
llvm-svn: 328972
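A minimal sketch of why implicit reads matter here, assuming reads are numbered by a UseIdx with explicit uses first and implicit uses after them (an assumption of this sketch). A ReadAdvance entry of N cycles lets the read observe the producer's result N cycles early; the sketch clamps the wait at zero. `ReadDesc` and `cyclesUntilReady` are hypothetical names, not llvm-mca's InstrBuilder API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical read descriptor.
struct ReadDesc {
  unsigned UseIdx; // position among all uses (explicit first, then implicit)
  bool IsImplicit; // deliberately not consulted: implicit reads get
                   // ReadAdvance exactly like explicit ones
};

// AdvanceByUseIdx[i] holds the ReadAdvance cycles for the read with
// UseIdx i; reads past the end of the table have no ReadAdvance entry.
// Returns how many cycles the read waits after the producer issues.
unsigned cyclesUntilReady(const ReadDesc &R, unsigned ProducerLatency,
                          const std::vector<int> &AdvanceByUseIdx) {
  int Advance =
      R.UseIdx < AdvanceByUseIdx.size() ? AdvanceByUseIdx[R.UseIdx] : 0;
  int Wait = static_cast<int>(ProducerLatency) - Advance;
  return static_cast<unsigned>(std::max(Wait, 0));
}
```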
| |
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies, often using a multi-uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.
This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.
I also left a FIXME in the SandyBridge model because using it for the "generic" model is too optimistic for the 256/512-bit versions, since those are multiple uops on all known CPUs.
Reviewers: RKSimon, GGanesh, courbet
Reviewed By: RKSimon
Subscribers: gchatelet, gbedwell, andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D44972
llvm-svn: 328914
| |
VSQRT instructions.
There were still a few AVX instructions with an incorrect number of opcodes.
These should be fixed now.
llvm-svn: 328892
| |
llvm-svn: 328886
| |
most vector logic instructions.
Fixed a few InstRW that forgot to specify a ReadAfterLd for the register input
operand.
llvm-svn: 328867
| |
instructions.
In the Btver2 model, there are a few InstRW overrides that don't specify a
ReadAfterLd for the register input operand.
As a result, a few AVX variants of horizontal operations and most vector logic
operations with a folded memory operand don't have ReadAdvance info associated
with their input register operands.
llvm-svn: 328865
| |
Verify that the ReadAfterLd is correctly applied to FMA and 4-ops variable blend
instructions.
As Craig pointed out in D44726, some Intel models still have to be fixed.
llvm-svn: 328861
| |
This change adds a couple of tests to verify the change introduced by revision
328823 ([X86] Correct the placement of ReadAfterLd in BEXTR and BZHI).
llvm-svn: 328859
| |
The tool was passing the wrong operand index to method
MCSubtargetInfo::getReadAdvanceCycles(). That method requires a "UseIdx", and
not the operand index. This was found when testing X86 code where instructions
had a memory folded operand.
This patch fixes the issue and adds test read-advance-1.s to ensure that
the ReadAfterLd (a ReadAdvance of 3cy) information is correctly used.
llvm-svn: 328790
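The difference between an operand index and a UseIdx can be sketched as below. `OpKind` and `operandToUseIdx` are hypothetical helpers, and the sketch assumes a UseIdx simply counts the use operands that precede the given operand; it does not reproduce how llvm-mca or MCSubtargetInfo compute it.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical operand kinds for an MCInst-like operand list.
enum class OpKind { Def, Use };

// Map an operand index to its UseIdx: the number of use operands that
// precede it. Definitions (and out-of-range indices) have no UseIdx.
std::optional<unsigned> operandToUseIdx(const std::vector<OpKind> &Ops,
                                        unsigned OpIdx) {
  if (OpIdx >= Ops.size() || Ops[OpIdx] != OpKind::Use)
    return std::nullopt;
  unsigned UseIdx = 0;
  for (unsigned I = 0; I < OpIdx; ++I)
    if (Ops[I] == OpKind::Use)
      ++UseIdx;
  return UseIdx;
}
```

With the operand list {Def, Use, Use, Use}, operand 1 is UseIdx 0, so passing 1 directly would select the ReadAdvance entry of the next use instead.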
| |
instructions.
Similar to r328694. The number of micro opcodes should be 2 for those
instructions.
This was found when testing AVX code for BtVer2 using llvm-mca.
llvm-svn: 328698
| |
The Jaguar backend natively supports 128-bit data types. Operations on YMM
registers are split into two COPs (complex operations). Each COP consumes a slot
in the dispatch group, and in the reorder buffer.
The scheduling model for Jaguar should mark those instructions as `let
NumMicroOps = 2`.
This was found when testing AVX code for BtVer2 using llvm-mca.
llvm-svn: 328694
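The effect of `NumMicroOps = 2` on dispatch can be illustrated with a small C++ model, assuming a dispatch width of 2 and that all the micro-ops of one instruction must fit in the same dispatch group. Both are assumptions of this sketch, not statements about llvm-mca's dispatch logic, and `dispatchCycles` is a hypothetical helper.

```cpp
#include <cassert>
#include <vector>

// Cycles needed to dispatch a sequence of instructions, where each
// instruction needs NumMicroOps slots in the same dispatch group.
// Assumes every instruction fits in one group (1 <= NumMicroOps <= Width).
unsigned dispatchCycles(const std::vector<unsigned> &NumMicroOps,
                        unsigned Width) {
  unsigned Cycles = 0, SlotsLeft = 0;
  for (unsigned N : NumMicroOps) {
    assert(N >= 1 && N <= Width && "instruction must fit in one group");
    if (N > SlotsLeft) { // not enough room: start a new dispatch group
      ++Cycles;
      SlotsLeft = Width;
    }
    SlotsLeft -= N;
  }
  return Cycles;
}
```

Under these assumptions, four XMM operations dispatch in 2 cycles while four YMM operations need 4.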
| |
We were incorrectly initializing the array of used registers in method checkRAT.
As a consequence, the number of register file stalls was misreported.
Added a test to cover this case.
llvm-svn: 328629
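A hypothetical sketch of the kind of check described above: for each register file, count how many physical registers the instruction needs and compare with what is still free. `RegFileState` and `checkAvailability` are illustrative names, not the actual checkRAT code, and each definition is assumed to cost one register. The per-file counters are zero-initialized on every call; starting from stale or uninitialized values is the class of bug that misreports register file stalls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-register-file state.
struct RegFileState {
  unsigned NumPhysRegs; // 0 means unbounded
  unsigned InUse;
};

// DefFileIdx lists, for each register written by the instruction, the
// index of the register file that must allocate it.
// Returns true if every register file can satisfy the new definitions.
bool checkAvailability(const std::vector<RegFileState> &Files,
                       const std::vector<unsigned> &DefFileIdx) {
  // Fresh, zeroed counters on every call.
  std::vector<unsigned> Needed(Files.size(), 0);
  for (unsigned Idx : DefFileIdx) {
    assert(Idx < Files.size() && "unknown register file");
    ++Needed[Idx];
  }
  for (std::size_t I = 0; I < Files.size(); ++I) {
    const RegFileState &F = Files[I];
    if (F.NumPhysRegs != 0 && F.InUse + Needed[I] > F.NumPhysRegs)
      return false; // register file stall
  }
  return true;
}
```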
| |
Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write)
llvm-svn: 328573
| |
We still need to account for how Jaguar passes data from GPR -> XMM, which isn't as clean as XMM -> GPR.
llvm-svn: 328551
| |
llvm-svn: 328541
| |
JALU0 for GPR PRF write)
llvm-svn: 328536
| |
Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write)
This also adds missing vcvttss2si tests
llvm-svn: 328505
| |
These should match the YMM MOVDUP/PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults.
llvm-svn: 328501
| |
This should fix the stack-use-after-scope reported by the asan buildbots after
revision 328493.
llvm-svn: 328499
| |
The xmm sd/pd versions were using the WriteFSQRT default which is modelled on sqrtss/sqrtps
llvm-svn: 328497
| |
info view.
llvm-svn: 328493
| |
Both the AGUs and schedule pipes are double pumped for 256-bit instructions as well as the functional units which we already model.
llvm-svn: 328491
| |
pressure distribution for instructions (PR36874)
The goal of this patch is to address most of PR36874. To fully fix PR36874 we
need to split the "InstructionInfo" view from the "SummaryView". That would make
it easy to check the latency and rthroughput as well.
The patch reuses all the logic from ResourcePressureView to print out the
"instruction tables".
We have an entry for every instruction in the input sequence. Each entry reports
the theoretical resource pressure distribution. Resource pressure is uniformly
distributed across all the processor resource units of a group.
At the moment, the backend pipeline is not configurable, so the only way to fix
this is by creating a different driver that simply sends instruction events to
the resource pressure view. That means we don't use the Backend interface.
Instead, it is simpler to just have a different code-path for when flag
-instruction-tables is specified.
Once Clement addresses bug 36663, then we can port the "instruction tables"
logic into a stage of our configurable pipeline.
Updated the BtVer2 test cases (thanks Simon for the help). Now we pass flag
-instruction-tables to each modified test.
Differential Revision: https://reviews.llvm.org/D44839
llvm-svn: 328487
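The uniform distribution of resource pressure can be sketched as follows, assuming each resource usage is described as a group of interchangeable units plus a number of cycles. `ResourceUse` and `pressureDistribution` are hypothetical names, and the unit names are only examples taken from the Jaguar model.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical resource usage: a group of interchangeable units that the
// instruction occupies for a number of cycles.
struct ResourceUse {
  std::vector<std::string> Units; // e.g. {"JALU0", "JALU1"}
  double Cycles;
};

// Theoretical per-unit pressure: each usage is spread uniformly over all
// the units of its group.
std::map<std::string, double>
pressureDistribution(const std::vector<ResourceUse> &Uses) {
  std::map<std::string, double> Pressure;
  for (const ResourceUse &U : Uses) {
    if (U.Units.empty())
      continue;
    double Share = U.Cycles / static_cast<double>(U.Units.size());
    for (const std::string &Unit : U.Units)
      Pressure[Unit] += Share;
  }
  return Pressure;
}
```

A 1-cycle use of the two-unit group {JALU0, JALU1} therefore shows up as 0.5 on each unit.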
| |
unit
llvm-svn: 328343
| |
Add missing non-VEX and (V)PMOVMSKB instructions to the pattern
llvm-svn: 328338
| |
function unit
llvm-svn: 328331
| |
JSAGU/JSTC function units
llvm-svn: 328328
| |
llvm-svn: 328324
| |
This was due to a misunderstanding: what LLVM calls a micro-op (retirement unit) is actually called a macro-op on the AMD/Jaguar target. Folded loads don't affect the number of macro-ops.
llvm-svn: 328320
| |
correctly use JFPU1 scheduler pipe followed by JLAGU/JSAGU/JFPA/JVALU function units
Fixes throughput to match Agner/Fam16h-SoG as well.
llvm-svn: 328318
| |
pipe and JFPX/JVALU function unit as well as the AGUs
llvm-svn: 328304
| |
Change pblendvb/blendvps/blendvpd to use WriteFVarBlend
llvm-svn: 328294
| |
llvm-svn: 328293
| |
scheduled through the JFPU1 pipe
llvm-svn: 328226
| |
The ymm instructions are double pumped as well.
llvm-svn: 328222
| |
pipe
llvm-svn: 328217
| |
BackendStatistics view.
With this patch, the "instruction dispatched" event now provides information
related to the number of microarchitectural registers used in each register
file. Similarly, the "instruction retired" event is now able to tell how many
registers are freed in each register file.
Currently, the BackendStatistics view is the only consumer of register
usage/pressure information. BackendStatistics uses that info to print out a few
general statistics (i.e. max number of mappings used; total mappings created).
Before this patch, the BackendStatistics view was forced to query the Backend to
obtain the register pressure information.
This patch removes that dependency. Now views are completely independent from
the Backend. As a consequence, it should be easier to address PR36663 and
further modularize the pipeline.
Added a couple of test cases in the BtVer2 specific directory.
llvm-svn: 328129
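A minimal sketch of a view that derives the two statistics purely from event payloads, without querying the Backend. `RegisterUsageView`, `onDispatched` and `onRetired` are hypothetical names, and each event is assumed to carry one counter per register file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical view fed by "instruction dispatched" / "instruction retired"
// events that report, per register file, how many registers were
// allocated or freed.
class RegisterUsageView {
  std::vector<unsigned> Current, MaxUsed, TotalCreated;

public:
  explicit RegisterUsageView(unsigned NumFiles)
      : Current(NumFiles, 0), MaxUsed(NumFiles, 0), TotalCreated(NumFiles, 0) {}

  void onDispatched(const std::vector<unsigned> &UsedPerFile) {
    assert(UsedPerFile.size() == Current.size());
    for (std::size_t I = 0; I < Current.size(); ++I) {
      Current[I] += UsedPerFile[I];
      TotalCreated[I] += UsedPerFile[I];
      MaxUsed[I] = std::max(MaxUsed[I], Current[I]);
    }
  }

  void onRetired(const std::vector<unsigned> &FreedPerFile) {
    assert(FreedPerFile.size() == Current.size());
    for (std::size_t I = 0; I < Current.size(); ++I)
      Current[I] -= FreedPerFile[I];
  }

  unsigned maxMappings(unsigned File) const { return MaxUsed[File]; }
  unsigned totalMappings(unsigned File) const { return TotalCreated[File]; }
};
```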
| |
The default is currently FAdd for some reason
llvm-svn: 327807
| |
llvm-svn: 327805
| |
functional pipe
llvm-svn: 327804
| |
(JFPA/JFPM) functional pipes
llvm-svn: 327803
| |
llvm-svn: 327801
| |
JVALU0/JVALU1 functional pipes
llvm-svn: 327794
| |
float cluster (FPA/FPM) not the integer.
llvm-svn: 327793
| |
unit for JWriteResFpuPair defs
Jaguar's FPU has 2 scheduler pipes (JFPU0/JFPU1) which each forward to multiple functional sub-units. We need to model that a micro-op consumes both the scheduler pipe and a functional unit.
This patch just handles the ops defined through JWriteResFpuPair, I'll go through the custom cases later.
llvm-svn: 327791
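A simplified C++ sketch of the constraint described above: a micro-op issues only when both its scheduler pipe and its functional unit are free in that cycle. `MicroOp` and `issueOneCycle` are hypothetical, each resource is assumed to be busy for exactly one cycle, and the pipe/unit pairs are illustrative examples rather than a statement of which units each Jaguar pipe feeds.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical micro-op that needs one scheduler pipe and one functional unit.
struct MicroOp {
  std::string Pipe; // e.g. "JFPU0"
  std::string Unit; // e.g. "JFPA"
};

// Simulates one cycle: every candidate whose pipe and unit are both still
// free issues and reserves them; the others wait for a later cycle.
// Returns the number of micro-ops issued.
unsigned issueOneCycle(const std::vector<MicroOp> &Ops) {
  std::set<std::string> Busy;
  unsigned Issued = 0;
  for (const MicroOp &Op : Ops) {
    if (Busy.count(Op.Pipe) || Busy.count(Op.Unit))
      continue; // pipe or functional unit already taken this cycle
    Busy.insert(Op.Pipe);
    Busy.insert(Op.Unit);
    ++Issued;
  }
  return Issued;
}
```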