| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
It's super irritating.
[properly configured] git client then complains about that double-newline,
and you have to use `--force` to ignore the warning, since even if you
fix it manually, it will be reintroduced the very next runtime :/
Reviewers: RKSimon, andreadb, courbet, craig.topper, javed.absar, gbedwell
Reviewed By: gbedwell
Subscribers: javed.absar, tschuett, gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D47697
llvm-svn: 333887
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
{FLDL2E, FLDL2T, FLDLG2, FLDLN2, FLDPI} were using WriteMicrocoded.
- I've measured the values for Broadwell, Haswell, SandyBridge, Skylake.
- For ZnVer1 and Atom, values were transferred form InstRWs.
- For SLM and BtVer2, I've guessed some values :(
Reviewers: RKSimon, craig.topper, andreadb
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D47585
llvm-svn: 333656
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- I've measured the values for Broadwell, Haswell, SandyBridge, Skylake.
- For ZnVer1 and Atom, values were transferred form `InstRW`s.
- For SLM and BtVer2, values are from Agner.
This is split off from https://reviews.llvm.org/D47377
Reviewers: RKSimon, andreadb
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D47523
llvm-svn: 333642
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
After SNB, Intel CPUs can rename CF independently of other EFLAGS,
so the renamer can zero it for free. Note that STC still consumes resources.
To reproduce: `$ llvm-exegesis -mode=uops -opcode-name=CLC`
On SNB:
```
---
key:
opcode_name: CLC
mode: uops
config: ''
cpu_name: sandybridge
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { key: '3', value: 0.0014, debug_string: SBPort0 }
- { key: '4', value: 0.0013, debug_string: SBPort1 }
- { key: '5', value: 0.0003, debug_string: SBPort4 }
- { key: '6', value: 0.0029, debug_string: SBPort5 }
- { key: '10', value: 0.0003, debug_string: SBPort23 }
error: ''
info: 'instruction is serial, repeating a random one.
Snippet:
CLC
'
...
```
On HSW:
```
---
key:
opcode_name: CLC
mode: uops
config: ''
cpu_name: haswell
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { key: '3', value: 0.001, debug_string: HWPort0 }
- { key: '4', value: 0.0009, debug_string: HWPort1 }
- { key: '5', value: 0.0004, debug_string: HWPort2 }
- { key: '6', value: 0.0006, debug_string: HWPort3 }
- { key: '7', value: 0.0002, debug_string: HWPort4 }
- { key: '8', value: 0.0012, debug_string: HWPort5 }
- { key: '9', value: 0.0022, debug_string: HWPort6 }
- { key: '10', value: 0.0001, debug_string: HWPort7 }
error: ''
info: 'instruction is serial, repeating a random one.
Snippet:
CLC
'
...
```
Reviewers: craig.topper, RKSimon
Subscribers: gchatelet, llvm-commits
Differential Revision: https://reviews.llvm.org/D47362
llvm-svn: 333392
|
|
|
|
|
|
|
|
| |
As confirmed by llvm-exegesis, there is no scheduler difference between MOVDQA/MOVDQU and VMOVDQA/VMOVDQU xmm reg-reg moves
Another chapter in the never ending crusade to remove useless InstRW overrides from the x86 scheduler models......
llvm-svn: 333271
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously update_mca_test_checks worked entirely at "block" level where
a block is some sequence of lines delimited by at least one empty line.
This generally worked well, but could sometimes lead to excessive
repetition of check lines for various prefixes if some block was almost
identical between prefixes, but not quite (for example, due to a
different dispatch width in the otherwise identical summary views).
This new analyis attempts to split blocks further in the case where the
following conditions are met:
a) There is some prefix common to every RUN line (typically 'ALL').
b) The first line of the block is common to the output with every prefix.
c) The block has the same number of lines for the output with every prefix.
Also, regenerated all llvm-mca test files with the following command:
update_mca_test_checks.py "../test/tools/llvm-mca/*/*.s" "../test/tools/llvm-mca/*/*/*.s"
The new analysis showed a "multiple lines not disambiguated by prefixes" warning
for test "AArch64/Exynos/scheduler-queue-usage.s" so I've also added some
explicit prefixes to each of the RUN lines in that test.
Differential Revision: https://reviews.llvm.org/D47321
llvm-svn: 333204
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements the "block reciprocal throughput" computation in the
SummaryView.
The block reciprocal throughput is computed as the MAX of:
- NumMicroOps / DispatchWidth
- Resource Cycles / #Units (for every resource consumed).
The block throughput is bounded from above by the hardware dispatch throughput.
That is because the DispatchWidth is an upper bound on how many opcodes can be part
of a single dispatch group.
The block throughput is also limited by the amount of hardware parallelism. The
number of available resource units affects how the resource pressure is
distributed, and also how many blocks can be delivered every cycle.
llvm-svn: 333095
|
|
|
|
|
|
| |
Also, regenerate all tests.
llvm-svn: 332853
|
|
|
|
|
|
|
| |
This is to keep the Jaguar model's naming convention. Processor resources all
have a 'J' prefix in the BtVer2 scheduling model.
llvm-svn: 332851
|
|
|
|
|
|
|
|
|
|
| |
BtVer2 - fix NumMicroOp and account for the Lat+6cy GPR->XMM and Lat+1cy XMm->GPR delays (see rL332737)
The high number of MOVD/MOVQ equivalent instructions meant that there were a number of missed patterns in SNB/Znver1:
SNB - add missing GPR<->MMX costs (taken from Agner / Intel AOM)
Znver1 - add missing GPR<->XMM MOVQ costs (taken from Agner)
llvm-svn: 332745
|
|
|
|
|
|
| |
Include the 6cy delay transferring from the GPR to FPU.
llvm-svn: 332737
|
|
|
|
| |
llvm-svn: 332722
|
|
|
|
|
|
|
|
|
|
| |
WriteVecLoad/WriteVecStore scheduler classes
Retag some instructions that were missed when we split off vector load/store/moves - MOVQ/MOVD etc.
Fixes BtVer2/SLM which have different behaviours for GPR stores.
llvm-svn: 332718
|
|
|
|
|
|
|
|
|
|
| |
classes
Retag some instructions that were missed when we split off vector load/store/moves - MOVSS/MOVSD/MOVHPD/MOVHPD/MOVLPD/MOVLPS etc.
Fixes BtVer2/SLM which have different behaviours for GPR stores.
llvm-svn: 332714
|
|
|
|
| |
llvm-svn: 332622
|
|
|
|
| |
llvm-svn: 332616
|
|
|
|
|
|
|
| |
Flag -all-views enables all the views.
Flag -all-stats enables all the views that print hardware statistics.
llvm-svn: 332602
|
|
|
|
| |
llvm-svn: 332595
|
|
|
|
| |
llvm-svn: 332510
|
|
|
|
| |
llvm-svn: 332447
|
|
|
|
|
|
|
|
| |
BtVer2 - Fixes schedules for (V)CVTPS2PD instructions
A lot of the Intel models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first
llvm-svn: 332376
|
|
|
|
|
|
|
|
|
| |
Btver2 - VCVTPH2PSYrm needs to double pump the AGU
Broadwell - missing VCVTPS2PH*mr stores extra latency
Allows us to remove the WriteCvtF2FSt conversion store class
llvm-svn: 332357
|
|
|
|
| |
llvm-svn: 332347
|
|
|
|
| |
llvm-svn: 332270
|
|
|
|
|
|
| |
MMX was missing and YMM was tagged as a fp nt store
llvm-svn: 332269
|
|
|
|
| |
llvm-svn: 332262
|
|
|
|
| |
llvm-svn: 332257
|
|
|
|
|
|
| |
We still need to handle mmx/xmm moves as 'decode-only' no-pipe instructions
llvm-svn: 332109
|
|
|
|
|
|
| |
Fixes an issue on SLM/Btver2 where we had instructions were being treated as scalar loads/stores
llvm-svn: 332104
|
|
|
|
|
|
|
|
| |
Confirmed by both Agner and Intel's AOM - the IEC/FPC are not required for pure load/stores (even if its a partial update).
Can't fix WriteStore until all RMW instructions are cleaned up though....
llvm-svn: 332096
|
|
|
|
|
|
| |
Fixes a SNB issue that was missing vlddqu/vmovntdqa ymm instructions
llvm-svn: 332094
|
|
|
|
| |
llvm-svn: 332002
|
|
|
|
|
|
|
|
| |
MOVNTPD/MOVNTPS should be WriteFStore
Standardized BDW/HSW/SKL/SKX WriteFStore/WriteVecStore - fixes some missed instregex patterns. (V)MASKMOVDQU was already using the default, its costs gets increased but is still nowhere near the real cost of that nasty instruction....
llvm-svn: 331864
|
|
|
|
|
|
|
| |
This fixes a couple of BtVer2 missing instructions that weren't been handled in the override.
NOTE: There are still a lot of overrides that still need cleaning up!
llvm-svn: 331770
|
|
|
|
| |
llvm-svn: 331769
|
|
|
|
|
|
|
| |
I've created the necessary classes but there are still a lot of overrides that need cleaning up.
NOTE: The Znver1 model was missing some div/idiv variants in the instregex patterns and wasn't setting the resource cycles at all in the overrides.
llvm-svn: 331767
|
|
|
|
| |
llvm-svn: 331765
|
|
|
|
| |
llvm-svn: 331678
|
|
|
|
|
|
|
|
| |
Split to support single/double for scalar, XMM and YMM/ZMM instructions - removing InstrRW overrides for these instructions.
Fixes Atom ADDSUBPD instruction and reclassifies VFPCLASS as WriteFCmp which is closer in behaviour.
llvm-svn: 331672
|
|
|
|
|
|
|
|
| |
These are more like cross-lane shuffles than regular shuffles - we already do this for AVX512 equivalents.
Differential Revision: https://reviews.llvm.org/D46229
llvm-svn: 331659
|
|
|
|
|
|
| |
Fixes x87 schedules to more closely match Agner - AMD doesn't tend to "special case" x87 instructions as much as Intel.
llvm-svn: 331645
|
|
|
|
|
|
|
|
| |
and YMM/ZMM instructions.
This removes all InstrRW overrides for these instructions - some x87 overrides remain but most use default (and realistic) values.
llvm-svn: 331643
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
WriteFRcp/WriteFRsqrt are split to support scalar, XMM and YMM/ZMM instructions.
WriteFSqrt is split into single/double/long-double sizes and scalar, XMM, YMM and ZMM instructions.
This removes all InstrRW overrides for these instructions.
NOTE: There were a couple of typos in the Znver1 model - notably a 1cy throughput for SQRT that is highly unlikely and doesn't tally with Agner.
NOTE: I had to add Agner's numbers for several targets for WriteFSqrt80.
llvm-svn: 331629
|
|
|
|
|
|
| |
Filled in the missing values from Btver2 SoG or Agner
llvm-svn: 331546
|
|
|
|
|
|
|
|
| |
Split off from SchedWriteFAdd for fp rounding/bit-manipulation instructions.
Fixes an issue on btver2 which only had the ymm version using the JSTC pipe instead of JFPA.
llvm-svn: 331515
|
|
|
|
|
|
|
|
| |
Avoids extra entries in the class tables.
Found a typo that missed the MMX_PHSUBSW instruction.
llvm-svn: 331488
|
|
|
|
|
|
| |
The entries were being bound to the wrong class.
llvm-svn: 331388
|
|
|
|
|
|
| |
https://reviews.llvm.org/D46356
llvm-svn: 331356
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes PR37293.
We can have scheduling classes with no write latency entries, that still consume
processor resources. We don't want to treat those instructions as zero-latency
instructions; they still have to be issued to the underlying pipelines, so they
still consume resource cycles.
This is likely to be a regression which I have accidentally introduced at
revision 330807. Now, if an instruction has a non-empty set of write processor
resources, we conservatively treat it as a normal (i.e. non zero-latency)
instruction.
llvm-svn: 331193
|
|
|
|
|
|
| |
Before this change, it wrongly specified -mcpu=slm instead of -mcpu=atom.
llvm-svn: 331170
|