| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
| |
to remove unnecessary instrw overrides.
llvm-svn: 330546
|
|
|
|
|
|
| |
This also fixes some of the ReadAfterLd issues due to InstRW.
llvm-svn: 330544
|
|
|
|
|
|
| |
instrw overrides.
llvm-svn: 330542
|
|
|
|
|
|
| |
latency.
llvm-svn: 330541
|
|
|
|
|
|
| |
VPERM2I128/VINSERTI128
llvm-svn: 330522
|
|
|
|
| |
llvm-svn: 330517
|
|
|
|
| |
llvm-svn: 330505
|
|
|
|
| |
llvm-svn: 330501
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Split the fp and integer vector logical instruction scheduler classes - older CPUs especially often handled these on different pipes.
This unearthed a couple of things that are also handled in this patch:
(1) We were tagging avx512 fp logic ops as WriteFAdd, probably because of the lack of WriteFLogic
(2) SandyBridge had integer logic ops only using Port5, when afaict they can use Ports015.
(3) Cleaned up x86 FCHS/FABS scheduling as they are typically treated as fp logic ops.
Differential Revision: https://reviews.llvm.org/D45629
llvm-svn: 330480
|
|
|
|
| |
llvm-svn: 330465
|
|
|
|
|
|
|
|
|
|
|
|
| |
Intel CPUs.
The XCHG16rr/XCHG32rr/XCHG64rr instructions should be 3 uops just like XCHG8rr. I believe they're just implemented as 3 move uops with a temporary register.
XADD is probably 2 moves and an add also using a temporary register.
Change the latency for both from 2 cycles to 3 cycles. Only 2 of the uops are serialized in their execution, the move into the temporary and the move out of the temporary. The move from one GPR to the other should be able to go in parallel with this if there are ALU resources available.
llvm-svn: 330349
|
|
|
|
|
|
| |
There's a lot more but I'd prefer focussing on removing unnecessary InstRWs first.
llvm-svn: 330347
|
|
|
|
|
|
| |
This removes a bunch of unnecessary InstRW overrides. It also cleans up the missing information from the Sandy Bridge model. Other fixes to other models.
llvm-svn: 330308
|
|
|
|
| |
llvm-svn: 330204
|
|
|
|
| |
llvm-svn: 330203
|
|
|
|
|
|
|
|
| |
Split VCMP/VMAX/VMIN instructions off to WriteFCmp and VCOMIS instructions off to WriteFCom instead of assuming they match WriteFAdd
Differential Revision: https://reviews.llvm.org/D45656
llvm-svn: 330179
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Split variable index shuffles from immediate index shuffles
WriteFVarShuffle - variable 'in-lane' shuffles (VPERMILPS/VPERMIL2PS etc.)
WriteVarShuffle - variable 'in-lane' shuffles (PSHUFB/VPPERM etc.)
WriteFVarShuffle256 - variable 'cross-lane' shuffles (VPERMPS etc.)
WriteVarShuffle256 - variable 'cross-lane' shuffles (VPERMD etc.)
Differential Revision: https://reviews.llvm.org/D45404
llvm-svn: 329806
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Cmov and setcc previously used WriteALU, but on Intel processors at least they are more restricted than basic ALU ops.
This patch adds new SchedWrites for them and removes the InstRWs. I had to leave some InstRWs for CMOVA/CMOVBE and SETA/SETBE because those have an extra uop relative to the other condition codes on Intel CPUs.
The test changes are due to fixing a missing ZnAGU dependency on the memory form of setcc.
Reviewers: RKSimon, andreadb, GGanesh
Reviewed By: RKSimon
Subscribers: GGanesh, llvm-commits
Differential Revision: https://reviews.llvm.org/D45380
llvm-svn: 329539
|
|
|
|
|
|
| |
ADC/SBB.
llvm-svn: 329424
|
|
|
|
|
|
|
|
| |
scheduler model.
We can get this right through WriteALU and friends now.
llvm-svn: 329417
|
|
|
|
|
|
|
|
| |
Bridge/Broadwell/Haswell/Skylake scheduler model.
Even those the address was calculated for the load, its calculated again for the store.
llvm-svn: 329415
|
|
|
|
|
|
| |
Noticed this during D44654
llvm-svn: 329389
|
|
|
|
|
|
|
|
|
|
|
|
| |
As mentioned on D44647, this patch increases the default memory latency to +5cy , which more closely matches what most custom cases are doing for reg-mem instructions.
I've bumped LoadLatency, ReadAfterLd and WriteLoad values to 5cy to be consistent.
As Sandy Bridge is currently our default generic model, this affects a lot of scheduling tests...
Differential Revision: https://reviews.llvm.org/D44654
llvm-svn: 329388
|
|
|
|
|
|
| |
According to Agner's data, CDQE is closer to CWDE.
llvm-svn: 329354
|
|
|
|
| |
llvm-svn: 329351
|
|
|
|
|
|
|
|
|
|
| |
LEAVE64. Make LEAVE/LEAVE64 more correct on Sandy Bridge.
This is the 32-bit mode version of LEAVE64. It should be at least somewhat similar to LEAVE64.
The Sandy Bridge version was missing a load port use.
llvm-svn: 329347
|
|
|
|
|
|
| |
We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion.
llvm-svn: 329339
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's failing on the bots and I'm not sure why.
This reverts:
[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents.
[X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256.
[X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
[X86] Auto-generate complete checks. NFC
llvm-svn: 329256
|
|
|
|
| |
llvm-svn: 329251
|
|
|
|
|
|
|
|
| |
SandyBridge/Haswell/Broadwell/Skylake scheduler models.
The BSWAP64r version is 2 uops and BSWAP32r is only 1 uop. The regular expressions also looked for a non-existant BSWAP16r.
llvm-svn: 329211
|
|
|
|
|
|
|
|
| |
Bridge/Haswell/Broadwell/Skylake scheduler models.
Fixes most of PR36898. Still need to fix the 512-bit instructions, but Agner's tables don't have those.
llvm-svn: 328960
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.
This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.
I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs.
Reviewers: RKSimon, GGanesh, courbet
Reviewed By: RKSimon
Subscribers: gchatelet, gbedwell, andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D44972
llvm-svn: 328914
|
|
|
|
|
|
|
|
|
|
| |
SchedRW for BEXTR/BZHI.
These instructions have the memory operand before the register operand. So we need to put ReadDefault for all the load ops first. Then the ReadAfterLd
Differential Revision: https://reviews.llvm.org/D44838
llvm-svn: 328823
|
|
|
|
|
|
|
|
| |
Currently MOVMSK instructions use the WriteVecLogic class, which is a very poor choice given that MOVMSK involves a SSE->GPR transfer.
Differential Revision: https://reviews.llvm.org/D44924
llvm-svn: 328664
|
|
|
|
|
|
|
|
| |
Currently CRC32 instructions use the WriteFAdd class, this patch splits them off into their own, at the moment it is still mostly just a duplicate of WriteFAdd but it can now be tweaked on a target by target basis.
Differential Revision: https://reviews.llvm.org/D44647
llvm-svn: 328582
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR36881)
Give the bit count instructions their own scheduler classes instead of forcing them into existing classes.
These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar).
Differential Revision: https://reviews.llvm.org/D44879
llvm-svn: 328566
|
|
|
|
|
|
|
|
| |
SandyBridge/Haswell/Broadwell/Skylake scheduler models.
I've used Agner's data as best I could to get the values to converge on.
llvm-svn: 328473
|
|
|
|
|
|
|
|
|
|
| |
add 1uop for memory folds for Intel models
The Intel models need an extra 1uop for memory folded instructions, plus a lot of instructions take a non-default memory latency which should allow us to use the multiclass a lot more to tidy things up.
Differential Revision: https://reviews.llvm.org/D44840
llvm-svn: 328446
|
|
|
|
|
|
| |
regex matches to reduce compile time
llvm-svn: 328431
|
|
|
|
|
|
|
|
| |
of 2 instregex entries
Found while updating D44687
llvm-svn: 328308
|
|
|
|
|
|
| |
Agner's data. Add missing MMX multiplies.
llvm-svn: 328295
|
|
|
|
|
|
| |
The VMOVMSKBrr was in a separate InstRW with a lower latency, but I assume they should be the same and the higher latency matches Agners table so I'm going with that.
llvm-svn: 328291
|
|
|
|
|
|
|
|
|
|
| |
VROUNDPDY*. Fix itinerary mistake on all memory forms of VROUNDPD
This makes the Y position consistent with other instructions.
This should have been NFC, but while refactoring the multiclass I noticed that VROUNDPD memory forms were using the register itinerary.
llvm-svn: 328254
|
|
|
|
|
|
| |
to as best as I understand how they are implemented.
llvm-svn: 328231
|
|
|
|
|
|
| |
Models were completely overriding all SSE42 strins instructions when the default classes could be used for exactly the same coverage.
llvm-svn: 328203
|
|
|
|
|
|
|
|
|
| |
Models were completely overriding all AES instructions when the WriteAES default classes could be used for exactly the same coverage.
Removes 6 unnecessary scheduler classes from every model.
Note: Still looking for a way for tblgen to warn when this is happening - often the override is more complete than the default.
llvm-svn: 328192
|
|
|
|
| |
llvm-svn: 328110
|
|
|
|
|
|
|
|
| |
SchedWriteRes group (NFCI) (PR35955)
I've also merged some VEX/non-VEX instregex strings with a (V?) prefix - there are still a lot more of these to do.
llvm-svn: 327974
|
|
|
|
|
|
|
|
| |
Sandybridge/Haswell/Broadwell/Skylake scheduler models.
Move it from a load+store group on SNB to a load only group, the same group as CMP.
llvm-svn: 327944
|
|
|
|
|
|
| |
I assume these match the generic immediate version like they do in the other models.
llvm-svn: 327943
|