| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
| |
Added Intel Atom tests to verify that the tool correctly generates instruction
tables even if the CPU is in-order.
Fixes PR37282.
llvm-svn: 331169
|
|
|
|
| |
llvm-svn: 331144
|
|
|
|
| |
llvm-svn: 331140
|
|
|
|
| |
llvm-svn: 331109
|
|
|
|
|
|
| |
I intend to add further instruction tests to the resources-x86_64.s test file as required, but this initial commit is to help remove a load of unnecessary InstRW overrides in a future patch
llvm-svn: 331108
|
|
|
|
| |
llvm-svn: 330822
|
|
|
|
| |
llvm-svn: 330809
|
|
|
|
|
|
| |
This also fixes Jaguar's schedule which was treating it as the WriteVecIMul default.
llvm-svn: 330756
|
|
|
|
|
|
| |
Fixes the classification of VCVTPS2PHmr/VCVTPS2PHYmr which were tagged as WriteCvtF2FLd_WriteRMW (PR36887)
llvm-svn: 330737
|
|
|
|
|
|
| |
These are stores, not loads, so don't need to account for load latency.
llvm-svn: 330735
|
|
|
|
|
|
| |
Note this is IvyBridge (which shares the model) NOT SandyBridge.
llvm-svn: 330734
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the input asm dialect.
The instruction printer used by llvm-mca to generate the performance report now
defaults the output assembly format to the format used for the input assembly
file.
On x86, the asm format can be either AT&T or Intel, depending on the
presence/absence of directive `.intel_syntax`.
Users can still specify a different assembly dialect with the command line flag
-output-asm-variant=<uint>.
llvm-svn: 330733
|
|
|
|
| |
llvm-svn: 330720
|
|
|
|
|
|
|
|
|
|
|
|
| |
Split off pinsr/pextr and extractps instructions.
(Mostly) fixes PR36887.
Note: It might be worth adding a WriteFInsertLd class as well in the future.
Differential Revision: https://reviews.llvm.org/D45929
llvm-svn: 330714
|
|
|
|
|
|
| |
The SandyBridge BMI tests are actually run on IvyBridge as that's the first lowest CPU that actually support the ISAs (but still use the SandyBridge model).
llvm-svn: 330556
|
|
|
|
|
|
| |
This also fixes some of the ReadAfterLd issues due to InstRW.
llvm-svn: 330544
|
|
|
|
| |
llvm-svn: 330540
|
|
|
|
| |
llvm-svn: 330512
|
|
|
|
| |
llvm-svn: 330506
|
|
|
|
| |
llvm-svn: 330502
|
|
|
|
| |
llvm-svn: 330499
|
|
|
|
| |
llvm-svn: 330486
|
|
|
|
| |
llvm-svn: 330371
|
|
|
|
| |
llvm-svn: 330366
|
|
|
|
|
|
|
|
|
|
| |
I've copied and regenerated a resource file from btver2 to every x86 scheduler model supported by llvm-mca so we have at least some basic coverage.
For most this has been the avx1 tests, but for silvermont I've used sse42 as thats the latest it supports.
More will be added later.
llvm-svn: 330352
|
|
|
|
|
|
| |
Useful to see scheduler class deltas against xmm equivalents
llvm-svn: 330335
|
|
|
|
|
|
| |
Move PABS instructions incorrectly tested under SSE2
llvm-svn: 330295
|
|
|
|
|
|
|
|
|
|
|
| |
This script can be used to regenerate tests in the
test/tools/llvm-mca directory (PR36904).
Regenerated a number of tests using the pattern: test/tools/llvm-mca/*/*/*.s
Differential Revision: https://reviews.llvm.org/D45369
llvm-svn: 330246
|
|
|
|
| |
llvm-svn: 330204
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
issued in the right order.
Normally, the Scheduler prioritizes older instructions over younger instructions
during the instruction issue stage. In one particular case where a dependent
instruction had a schedule read-advance associated to one of the input operands,
this rule was not correctly applied.
This patch fixes the issue and adds a test to verify that we don't regress that
particular case.
llvm-svn: 330032
|
|
|
|
|
|
| |
Also, removed flag -verbose in favor of flag -retire-stats.
llvm-svn: 329794
|
|
|
|
|
|
|
|
| |
BackendStatistics to its own view.
Added flag -scheduler-stats to print scheduler related statistics.
llvm-svn: 329792
|
|
|
|
|
|
|
|
|
|
|
| |
BackendStatistics to its own view.
This patch moves the logic that collects and analyzes dispatch events to the
DispatchStatistics view.
Added flag -dispatch-stats to print statistics related to the dispatch logic.
llvm-svn: 329708
|
|
|
|
| |
llvm-svn: 329694
|
|
|
|
|
|
|
|
| |
timeline view."
This reapplies r329403 with a fix for the floating point rounding issue.
llvm-svn: 329680
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch teaches llvm-mca how to parse code comments in search for special
"markers" used to select regions of code.
Example:
# LLVM-MCA-BEGIN My Code Region
....
# LLVM-MCA-END
The MCAsmLexer now delegates to an object of class MCACommentParser (i.e. an
AsmCommentConsumer) the parsing of code comments to search for begin/end code
region markers.
A comment starting with substring "LLVM-MCA-BEGIN" marks the beginning of a new
region of code. A comment starting with substring "LLVM-MCA-END" marks the end
of the last region.
This implementation doesn't allow regions to overlap. Each region can have a
optional description; internally, each region is identified by a range of source
code locations (SMLoc).
MCInst objects are added to a region R only if the source location for the
MCInst is in the range of locations specified by R.
By default, the tool allocates an implicit "Default" code region which contains
every source location. See new tests llvm-mca-marker-*.s for a few examples.
A new Backend object is created for every region. So, the analysis is conducted
on every parsed code region. The final report is the union of the reports
generated for every code region. Note that empty regions are skipped.
Special "[#] Code Region - ..." strings are used in the report to mark the
portion which is specific to a code region only. For example, see
llvm-mca-markers-5.s.
Differential Revision: https://reviews.llvm.org/D45433
llvm-svn: 329590
|
|
|
|
|
|
|
|
|
|
|
|
| |
timeline view."
This made AArch64/CortexA57/direct-branch.s fail on Windows, e.g.
http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/11251
> Also, update a few tests to minimize the diff in D45369.
> No functional change intended.
llvm-svn: 329569
|
|
|
|
| |
llvm-svn: 329524
|
|
|
|
|
|
|
| |
Also, update a few tests to minimize the diff in D45369.
No functional change intended.
llvm-svn: 329403
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the ability to describe properties of the hardware retire
control unit.
Tablegen class RetireControlUnit has been added for this purpose (see
TargetSchedule.td).
A RetireControlUnit specifies the size of the reorder buffer, as well as the
maximum number of opcodes that can be retired every cycle.
A zero (or negative) value for the reorder buffer size means: "the size is
unknown". If the size is unknown, then llvm-mca defaults it to the value of
field SchedMachineModel::MicroOpBufferSize. A zero or negative number of
opcodes retired per cycle means: "there is no restriction on the number of
instructions that can be retired every cycle".
Models can optionally specify an instance of RetireControlUnit. There can only
be up-to one RetireControlUnit definition per scheduling model.
Information related to the RCU (RetireControlUnit) is stored in (two new fields
of) MCExtraProcessorInfo. llvm-mca loads that information when it initializes
the DispatchUnit / RetireControlUnit (see Dispatch.h/Dispatch.cpp).
This patch fixes PR36661.
Differential Revision: https://reviews.llvm.org/D45259
llvm-svn: 329304
|
|
|
|
| |
llvm-svn: 329192
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
view. NFCI
Before this patch, the "BackendStatistics" view was responsible for printing the
register file usage (as well as many other statistics).
Now users can enable register file usage statistics using the command line flag
`-register-file-stats`. By default, the tool doesn't print register file
statistics.
llvm-svn: 329083
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
scheduling model for llvm-mca
This patch allows the description of register files in processor scheduling
models. This addresses PR36662.
A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td.
Targets can optionally describe register files for their processors using that
class. In particular, class RegisterFile allows to specify:
- The total number of physical registers.
- Which target registers are accessible through the register file.
- The cost of allocating a register at register renaming stage.
Example (from this patch - see file X86/X86ScheduleBtVer2.td)
def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>
Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar
(btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM
register definitions only cost 1 physical register.
The syntax allows to specify an empty set of register classes. An empty set of
register classes means: this register file models all the registers specified by
the Target. For each register class, users can specify an optional register
cost. By default, register costs default to 1. A value of 0 for the number of
physical registers means: "this register file has an unbounded number of
physical registers".
This patch is structured in two parts.
* Part 1 - MC/Tablegen *
A first part adds the tablegen definition of RegisterFile, and teaches the
SubtargetEmitter how to emit information related to register files.
Information about register files is accessible through an instance of
MCExtraProcessorInfo.
The idea behind this design is to logically partition the processor description
which is only used by external tools (like llvm-mca) from the processor
information used by the llvm machine schedulers.
I think that this design would make easier for targets to get rid of the extra
processor information if they don't want it.
* Part 2 - llvm-mca related *
The second part of this patch is related to changes to llvm-mca.
The main differences are:
1) class RegisterFile now needs to take into account the "cost of a register"
when allocating physical registers at register renaming stage.
2) Point 1. triggered a minor refactoring which lef to the removal of the
"maximum 32 register files" restriction.
3) The BackendStatistics view has been updated so that we can print out extra
details related to each register file implemented by the processor.
The effect of point 3. is also visible in tests register-files-[1..5].s.
Differential Revision: https://reviews.llvm.org/D44980
llvm-svn: 329067
|
|
|
|
|
|
|
|
|
|
| |
ReadAdvance entries.
Before, the instruction builder incorrectly assumed that only explicit reads
could have been associated with ReadAdvance entries.
This patch fixes the issue and adds a test to verify it.
llvm-svn: 328972
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.
This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.
I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs.
Reviewers: RKSimon, GGanesh, courbet
Reviewed By: RKSimon
Subscribers: gchatelet, gbedwell, andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D44972
llvm-svn: 328914
|
|
|
|
|
|
|
|
|
| |
VSQRT instructions.
There were still a few AVX instructions with an incorrect number of opcodes.
These should be fixed now.
llvm-svn: 328892
|
|
|
|
| |
llvm-svn: 328886
|
|
|
|
|
|
|
|
|
| |
most vector logic instructions.
Fixed a few InstRW that forgot to specify a ReadAfterLd for the register input
operand.
llvm-svn: 328867
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instructions.
In the Btver2 model, there are a few InstRW overrides that don't specify a
ReadAfterLd for the register input operand.
As a result, a few AVX variants of horizontal operations and most vector logic
operations with a folded memory operand don't have a ReadAdvance info associated
to their input register operands.
llvm-svn: 328865
|
|
|
|
|
|
|
|
|
| |
Verify that the ReadAfterLd is correctly applied to FMA and 4-ops variable blend
instructions.
As Craig pointed out in D44726, some Intel models still have to be fixed.
llvm-svn: 328861
|