| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
| |
Split to support single/double for scalar, XMM and YMM/ZMM instructions - removing InstrRW overrides for these instructions.
Fixes Atom ADDSUBPD instruction and reclassifies VFPCLASS as WriteFCmp which is closer in behaviour.
llvm-svn: 331672
|
|
|
|
|
|
|
|
| |
and YMM/ZMM instructions.
This removes all InstrRW overrides for these instructions - some x87 overrides remain but most use default (and realistic) values.
llvm-svn: 331643
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
WriteFRcp/WriteFRsqrt are split to support scalar, XMM and YMM/ZMM instructions.
WriteFSqrt is split into single/double/long-double sizes and scalar, XMM, YMM and ZMM instructions.
This removes all InstrRW overrides for these instructions.
NOTE: There were a couple of typos in the Znver1 model - notably a 1cy throughput for SQRT that is highly unlikely and doesn't tally with Agner.
NOTE: I had to add Agner's numbers for several targets for WriteFSqrt80.
llvm-svn: 331629
|
|
|
|
|
|
| |
Rename scalar and XMM versions, this is to match/simplify an upcoming change to split MUL/DIV/SQRT scalar/xmm/ymm/zmm classes.
llvm-svn: 331531
|
|
|
|
|
|
|
|
| |
Split off from SchedWriteFAdd for fp rounding/bit-manipulation instructions.
Fixes an issue on btver2 which only had the ymm version using the JSTC pipe instead of JFPA.
llvm-svn: 331515
|
|
|
|
|
|
| |
not SchedWriteVecALU.
llvm-svn: 331473
|
|
|
|
|
|
|
|
| |
scheduler classes
This took a bit of extra work as on Intel targets the old (V)PSLLDrr/(V)PSLLDrm style instructions act differently - I ended up creating WriteVecShiftImm classes for XMM/YMM/ZMM vector shift by immediate and retaining WriteVecShift as the default (used only by MMX) plus WriteVecShiftX/WriteVecShiftY. X86SchedWriteWidths hides most of this thank goodness.
llvm-svn: 331472
|
|
|
|
| |
llvm-svn: 331446
|
|
|
|
|
|
|
|
| |
YMM/ZMM scheduler classes
Also retagged VDBPSADBW instructions as SchedWritePSADBW instead of SchedWriteVecIMul which matches the behaviour on SkylakeServer (the only thing that supports it...)
llvm-svn: 331445
|
|
|
|
|
|
|
|
| |
classes to X86SchedWriteWidths.
We've dealt with the majority already.
llvm-svn: 331353
|
|
|
|
|
|
| |
classes
llvm-svn: 331290
|
|
|
|
|
|
| |
Removes more WriteFCmp InstRW overrides
llvm-svn: 331283
|
|
|
|
|
|
| |
In preparation of splitting WriteFAdd by vector width.
llvm-svn: 331273
|
|
|
|
|
|
| |
In preparation of splitting WriteFShuffle by vector width.
llvm-svn: 331262
|
|
|
|
|
|
| |
In preparation of splitting WriteVecLogic by vector width.
llvm-svn: 331256
|
|
|
|
|
|
| |
Although they are encoded similar to bit shifts, the byte shifts behave like shuffles from a scheduling point of view.
llvm-svn: 331253
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
widths.
We need to split most of the scheduler classes by vector width to remove more of the InstRW overrides, this patch should make this easier/tidier by allowing us to pass the X86SchedWriteWidths wrapper to multi-width multiclasses and then split as required.
I've included fields for Scl (scalar float/double), MMX (MMX integer), XMM, YMM and ZMM widths. These fields mostly share the same classes but it should give us the flexibility that we may need in the future.
This patch has replaced a set of example SSE/AVX512 instruction cases but isn't exhaustive as it gets very noisy before we really need the functionality.
Differential Revision: https://reviews.llvm.org/D46266
llvm-svn: 331208
|
|
|
|
|
|
|
|
|
|
|
|
| |
syntax. NFCI
Many of these aliases exist to give one syntax or the other a slightly different mnemonic and the other variant gets a duplicate of its normal mnemonic
This patch restricts a lot of these to only one variant so we don't get the duplication.
This removes a lot of duplicate entries from the matcher table. It also reduces the number of warnings printed when you enable the ambiguous match warning in tablegen.
llvm-svn: 331117
|
|
|
|
|
|
|
|
| |
scheduler classes
This removes all the WriteFBlend/WriteFVarBlend InstRW overrides - some WriteFVarShuffle remain to be fixed.
llvm-svn: 331065
|
|
|
|
|
|
| |
This removes all the AND/ANDN/OR/XOR PS/PD InstRW overrides.
llvm-svn: 331051
|
|
|
|
|
|
|
|
| |
This removes all the FMA InstRW overrides.
If we ever get PR36924, then we can remove many of these declarations from models.
llvm-svn: 330820
|
|
|
|
|
|
| |
These variants all take an immediate shuffle mask value and should be scheduled as such.
llvm-svn: 330747
|
|
|
|
|
|
| |
Fixes the classification of VCVTPS2PHmr/VCVTPS2PHYmr which were tagged as WriteCvtF2FLd_WriteRMW (PR36887)
llvm-svn: 330737
|
|
|
|
|
|
|
|
|
|
|
|
| |
Split off pinsr/pextr and extractps instructions.
(Mostly) fixes PR36887.
Note: It might be worth adding a WriteFInsertLd class as well in the future.
Differential Revision: https://reviews.llvm.org/D45929
llvm-svn: 330714
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Split the fp and integer vector logical instruction scheduler classes - older CPUs especially often handled these on different pipes.
This unearthed a couple of things that are also handled in this patch:
(1) We were tagging avx512 fp logic ops as WriteFAdd, probably because of the lack of WriteFLogic
(2) SandyBridge had integer logic ops only using Port5, when afaict they can use Ports015.
(3) Cleaned up x86 FCHS/FABS scheduling as they are typically treated as fp logic ops.
Differential Revision: https://reviews.llvm.org/D45629
llvm-svn: 330480
|
|
|
|
| |
llvm-svn: 330204
|
|
|
|
|
|
|
|
| |
Split VCMP/VMAX/VMIN instructions off to WriteFCmp and VCOMIS instructions off to WriteFCom instead of assuming they match WriteFAdd
Differential Revision: https://reviews.llvm.org/D45656
llvm-svn: 330179
|
|
|
|
|
|
| |
WriteFAdd
llvm-svn: 330023
|
|
|
|
| |
llvm-svn: 330022
|
|
|
|
| |
llvm-svn: 330013
|
|
|
|
|
|
| |
Was being used to move around empty/unused itineraries...
llvm-svn: 329970
|
|
|
|
|
|
| |
This removes the last of the x86 schedule itineraries, I'm intending to cleanup the remaining uses of NoItinerary/OpndItins/etc. before resolving PR37093.
llvm-svn: 329967
|
|
|
|
| |
llvm-svn: 329953
|
|
|
|
| |
llvm-svn: 329945
|
|
|
|
| |
llvm-svn: 329940
|
|
|
|
|
|
| |
(PR37093)
llvm-svn: 329912
|
|
|
|
| |
llvm-svn: 329906
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Split variable index shuffles from immediate index shuffles
WriteFVarShuffle - variable 'in-lane' shuffles (VPERMILPS/VPERMIL2PS etc.)
WriteVarShuffle - variable 'in-lane' shuffles (PSHUFB/VPPERM etc.)
WriteFVarShuffle256 - variable 'cross-lane' shuffles (VPERMPS etc.)
WriteVarShuffle256 - variable 'cross-lane' shuffles (VPERMD etc.)
Differential Revision: https://reviews.llvm.org/D45404
llvm-svn: 329806
|
|
|
|
|
|
|
|
| |
equivalents.
Mostly vector load, store, and move instructions.
llvm-svn: 329330
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's failing on the bots and I'm not sure why.
This reverts:
[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents.
[X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256.
[X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
[X86] Auto-generate complete checks. NFC
llvm-svn: 329256
|
|
|
|
|
|
|
|
| |
equivalents.
Mostly vector load, store, and move instructions.
llvm-svn: 329254
|
|
|
|
|
|
| |
These both use a 16-bit load, but one used loadi16_anyext and the other used extloadi32i16. The only difference between them is that loadi16_anyext checked that the load was at least 2 byte aligned and non-volatile. But the alignment doesn't matter here. Just use extloadi32i16 for both.
llvm-svn: 329154
|
|
|
|
|
|
| |
It was being inadvertently defaulted to an FADD scheduler class.
llvm-svn: 328959
|
|
|
|
|
|
| |
versions.
llvm-svn: 328958
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.
This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.
I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs.
Reviewers: RKSimon, GGanesh, courbet
Reviewed By: RKSimon
Subscribers: gchatelet, gbedwell, andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D44972
llvm-svn: 328914
|
|
|
|
|
|
|
|
|
|
| |
This better able to detect undef and zeros pieces in the concat. Or cases when only one subvector is non-zero. This allows us to avoid silly things like double inserts into progressively larger undefs.
This still builds 512 bit concats of 128 bits by building up through 256 bits first. But I don't know if that's best.
We probably want to merge this with the vXi1 concat code since they are very similar.
llvm-svn: 327454
|
|
|
|
|
|
|
|
|
|
|
|
| |
of vXi32.
This instruction can be thought of as reading either the even elements of a vXi32 input or the lower half of each element of a vXi64 input. We currently use the vXi32 interpretation, but vXi64 matches better with its broadcast behavior in EVEX.
I'm looking at moving MULDQ/MULUDQ creation to a DAG combine so we can do it when AVX512DQ is enabled without having to go through Custom lowering. But in some of the test cases we failed to use a broadcast load due to the size difference. This should help with that.
I'm also wondering if we can model these instructions in native IR and remove the intrinsics and I think using a vXi64 type will work better with that.
llvm-svn: 326991
|
|
|
|
| |
llvm-svn: 326679
|
|
|
|
|
|
|
|
| |
legalization if AVX512DQ is not supported.
We were previously doing this with isel patterns. Moving it to op legalization gives us chance to see the required bitcast earlier. And it lets us remove some isel patterns.
llvm-svn: 326669
|
|
|
|
|
|
|
|
| |
and extending/truncating.
This is equivalent to what isel was doing anyway but by canonicalizing earlier we can remove some patterns.
llvm-svn: 326375
|