| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
| |
llvm-svn: 330760
|
|
|
|
|
|
| |
This also fixes Jaguar's schedule which was treating it as the WriteVecIMul default.
llvm-svn: 330756
|
|
|
|
|
|
| |
Fixes the classification of VCVTPS2PHmr/VCVTPS2PHYmr which were tagged as WriteCvtF2FLd_WriteRMW (PR36887)
llvm-svn: 330737
|
|
|
|
| |
llvm-svn: 330720
|
|
|
|
|
|
|
|
|
|
|
|
| |
Split off pinsr/pextr and extractps instructions.
(Mostly) fixes PR36887.
Note: It might be worth adding a WriteFInsertLd class as well in the future.
Differential Revision: https://reviews.llvm.org/D45929
llvm-svn: 330714
|
|
|
|
|
|
| |
We have test coverage for these with resources-sse*/avx*
llvm-svn: 330662
|
|
|
|
|
|
| |
We have test coverage for these with resources-bmi2.s
llvm-svn: 330659
|
|
|
|
| |
llvm-svn: 330648
|
|
|
|
| |
llvm-svn: 330611
|
|
|
|
|
|
| |
Note - noticed this as the STAC case as it was unintentionally matching against *STACK* pseudo instructions.
llvm-svn: 330588
|
|
|
|
| |
llvm-svn: 330581
|
|
|
|
|
|
| |
Fixed a lot of the default classes which were being completely overridden.
llvm-svn: 330554
|
|
|
|
| |
llvm-svn: 330553
|
|
|
|
| |
llvm-svn: 330549
|
|
|
|
|
|
| |
to remove unnecessary instrw overrides.
llvm-svn: 330546
|
|
|
|
|
|
| |
This also fixes some of the ReadAfterLd issues due to InstRW.
llvm-svn: 330544
|
|
|
|
|
|
| |
instrw overrides.
llvm-svn: 330542
|
|
|
|
|
|
| |
scheduler models.
llvm-svn: 330527
|
|
|
|
|
|
| |
models.
llvm-svn: 330523
|
|
|
|
|
|
| |
pack/unpack instruction instrw overrides from scheduler models.
llvm-svn: 330521
|
|
|
|
| |
llvm-svn: 330517
|
|
|
|
|
|
|
|
| |
from scheduler models.
The required the default skylake schedules to be updated - these were being completely overriden by the InstRW and the existing values not used at all.
llvm-svn: 330510
|
|
|
|
|
|
| |
scheduler models.
llvm-svn: 330508
|
|
|
|
| |
llvm-svn: 330503
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Split the fp and integer vector logical instruction scheduler classes - older CPUs especially often handled these on different pipes.
This unearthed a couple of things that are also handled in this patch:
(1) We were tagging avx512 fp logic ops as WriteFAdd, probably because of the lack of WriteFLogic
(2) SandyBridge had integer logic ops only using Port5, when afaict they can use Ports015.
(3) Cleaned up x86 FCHS/FABS scheduling as they are typically treated as fp logic ops.
Differential Revision: https://reviews.llvm.org/D45629
llvm-svn: 330480
|
|
|
|
|
|
|
|
|
|
|
|
| |
Intel CPUs.
The XCHG16rr/XCHG32rr/XCHG64rr instructions should be 3 uops just like XCHG8rr. I believe they're just implemented as 3 move uops with a temporary register.
XADD is probably 2 moves and an add also using a temporary register.
Change the latency for both from 2 cycles to 3 cycles. Only 2 of the uops are serialized in their execution, the move into the temporary and the move out of the temporary. The move from one GPR to the other should be able to go in parallel with this if there are ALU resources available.
llvm-svn: 330349
|
|
|
|
|
|
| |
There's a lot more but I'd prefer focussing on removing unnecessary InstRWs first.
llvm-svn: 330347
|
|
|
|
|
|
| |
These are all already handled identically by WriteFMA.
llvm-svn: 330319
|
|
|
|
|
|
| |
This removes a bunch of unnecessary InstRW overrides. It also cleans up the missing information from the Sandy Bridge model. Other fixes to other models.
llvm-svn: 330308
|
|
|
|
| |
llvm-svn: 330204
|
|
|
|
| |
llvm-svn: 330203
|
|
|
|
|
|
|
|
| |
Split VCMP/VMAX/VMIN instructions off to WriteFCmp and VCOMIS instructions off to WriteFCom instead of assuming they match WriteFAdd
Differential Revision: https://reviews.llvm.org/D45656
llvm-svn: 330179
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Split variable index shuffles from immediate index shuffles
WriteFVarShuffle - variable 'in-lane' shuffles (VPERMILPS/VPERMIL2PS etc.)
WriteVarShuffle - variable 'in-lane' shuffles (PSHUFB/VPPERM etc.)
WriteFVarShuffle256 - variable 'cross-lane' shuffles (VPERMPS etc.)
WriteVarShuffle256 - variable 'cross-lane' shuffles (VPERMD etc.)
Differential Revision: https://reviews.llvm.org/D45404
llvm-svn: 329806
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Cmov and setcc previously used WriteALU, but on Intel processors at least they are more restricted than basic ALU ops.
This patch adds new SchedWrites for them and removes the InstRWs. I had to leave some InstRWs for CMOVA/CMOVBE and SETA/SETBE because those have an extra uop relative to the other condition codes on Intel CPUs.
The test changes are due to fixing a missing ZnAGU dependency on the memory form of setcc.
Reviewers: RKSimon, andreadb, GGanesh
Reviewed By: RKSimon
Subscribers: GGanesh, llvm-commits
Differential Revision: https://reviews.llvm.org/D45380
llvm-svn: 329539
|
|
|
|
|
|
| |
ADC/SBB.
llvm-svn: 329424
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Haswell/Broadwell/Skylake scheduler models without InstRWs
Summary:
This patch removes InstRW overrides for basic arithmetic/logic instructions. To do this I've added the store address port to RMW. And used a WriteSequence to make the latency additive. It does not cover ADC/SBB because they have different latency.
Apparently we were inconsistent about whether the store has latency or not thus the test changes.
I've also left out Sandy Bridge because the load latency there is currently 4 cycles and should be 5.
Reviewers: RKSimon, andreadb
Reviewed By: andreadb
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45351
llvm-svn: 329416
|
|
|
|
|
|
|
|
| |
Bridge/Broadwell/Haswell/Skylake scheduler model.
Even those the address was calculated for the load, its calculated again for the store.
llvm-svn: 329415
|
|
|
|
| |
llvm-svn: 329386
|
|
|
|
|
|
| |
According to Agner's data, CDQE is closer to CWDE.
llvm-svn: 329354
|
|
|
|
|
|
|
|
|
|
| |
LEAVE64. Make LEAVE/LEAVE64 more correct on Sandy Bridge.
This is the 32-bit mode version of LEAVE64. It should be at least somewhat similar to LEAVE64.
The Sandy Bridge version was missing a load port use.
llvm-svn: 329347
|
|
|
|
|
|
| |
We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion.
llvm-svn: 329339
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's failing on the bots and I'm not sure why.
This reverts:
[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents.
[X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256.
[X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
[X86] Auto-generate complete checks. NFC
llvm-svn: 329256
|
|
|
|
| |
llvm-svn: 329251
|
|
|
|
|
|
|
|
| |
SandyBridge/Haswell/Broadwell/Skylake scheduler models.
The BSWAP64r version is 2 uops and BSWAP32r is only 1 uop. The regular expressions also looked for a non-existant BSWAP16r.
llvm-svn: 329211
|
|
|
|
|
|
| |
Data taken from the AVX512_SKX_PortAssign spreadsheet at http://instlatx64.atw.hu/
llvm-svn: 328961
|
|
|
|
|
|
|
|
| |
Bridge/Haswell/Broadwell/Skylake scheduler models.
Fixes most of PR36898. Still need to fix the 512-bit instructions, but Agner's tables don't have those.
llvm-svn: 328960
|
|
|
|
|
|
|
|
| |
ADC8/16/32/64mr and SBB8/16/32/64mi.
It doesn't make a lot of sense that it would be different.
llvm-svn: 328946
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.
This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.
I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs.
Reviewers: RKSimon, GGanesh, courbet
Reviewed By: RKSimon
Subscribers: gchatelet, gbedwell, andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D44972
llvm-svn: 328914
|
|
|
|
|
|
|
|
|
|
| |
SchedRW for BEXTR/BZHI.
These instructions have the memory operand before the register operand. So we need to put ReadDefault for all the load ops first. Then the ReadAfterLd
Differential Revision: https://reviews.llvm.org/D44838
llvm-svn: 328823
|
|
|
|
|
|
|
|
|
|
| |
scheduler regexs.
Most of these were optional matches at the end of the strings, but since the strings themselves are prefix matches by default you don't need to check for something optional at the end.
I've left the 'b' on memory instructions where it means 'broadcast' because I'm not sure those really have the same load latency and we may need to split them explicitly in the future.
llvm-svn: 328730
|