path: root/llvm/lib/Target/X86/X86InstrAVX512.td
Commit message | Author | Age | Files | Lines
...
* [X86] Split WriteFAdd/WriteFCmp/WriteFMul schedule classes | Simon Pilgrim | 2018-05-07 | 1 | -2/+1
  Split to support single/double for scalar, XMM and YMM/ZMM instructions - removing InstrRW overrides for these instructions. Fixes Atom ADDSUBPD instruction and reclassifies VFPCLASS as WriteFCmp which is closer in behaviour.
  llvm-svn: 331672
* [X86] Split WriteFDiv schedule classes to support single/double scalar, XMM and YMM/ZMM instructions. | Simon Pilgrim | 2018-05-07 | 1 | -47/+47
  This removes all InstrRW overrides for these instructions - some x87 overrides remain but most use default (and realistic) values.
  llvm-svn: 331643
* [X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt schedule classes | Simon Pilgrim | 2018-05-07 | 1 | -18/+22
  WriteFRcp/WriteFRsqrt are split to support scalar, XMM and YMM/ZMM instructions. WriteFSqrt is split into single/double/long-double sizes and scalar, XMM, YMM and ZMM instructions. This removes all InstrRW overrides for these instructions.
  NOTE: There were a couple of typos in the Znver1 model - notably a 1cy throughput for SQRT that is highly unlikely and doesn't tally with Agner.
  NOTE: I had to add Agner's numbers for several targets for WriteFSqrt80.
  llvm-svn: 331629
* [X86] Cleanup SchedWriteFMA classes and use X86SchedWriteWidths directly. | Simon Pilgrim | 2018-05-04 | 1 | -5/+5
  Rename scalar and XMM versions, this is to match/simplify an upcoming change to split MUL/DIV/SQRT scalar/xmm/ymm/zmm classes.
  llvm-svn: 331531
* [X86] Add SchedWriteFRnd fp rounding scheduler classes | Simon Pilgrim | 2018-05-04 | 1 | -12/+12
  Split off from SchedWriteFAdd for fp rounding/bit-manipulation instructions. Fixes an issue on btver2 which only had the ymm version using the JSTC pipe instead of JFPA.
  llvm-svn: 331515
* [X86][AVX512] VPLZCNT instructions match SchedWriteVecIMul scheduling class not SchedWriteVecALU. | Simon Pilgrim | 2018-05-03 | 1 | -2/+1
  llvm-svn: 331473
* [X86] Split WriteVecShift/WriteVarVecShift into MMX, XMM and YMM/ZMM scheduler classes | Simon Pilgrim | 2018-05-03 | 1 | -8/+8
  This took a bit of extra work as on Intel targets the old (V)PSLLDrr/(V)PSLLDrm style instructions act differently - I ended up creating WriteVecShiftImm classes for XMM/YMM/ZMM vector shift by immediate and retaining WriteVecShift as the default (used only by MMX) plus WriteVecShiftX/WriteVecShiftY. X86SchedWriteWidths hides most of this thank goodness.
  llvm-svn: 331472
* [X86][AVX512] VPAVG instructions should be tagged as SchedWriteVecALU | Simon Pilgrim | 2018-05-03 | 1 | -1/+1
  llvm-svn: 331446
* [X86] Split WriteVecIMul/WriteVecPMULLD/WriteMPSAD/WritePSADBW into XMM and YMM/ZMM scheduler classes | Simon Pilgrim | 2018-05-03 | 1 | -1/+1
  Also retagged VDBPSADBW instructions as SchedWritePSADBW instead of SchedWriteVecIMul which matches the behaviour on SkylakeServer (the only thing that supports it...)
  llvm-svn: 331445
* [X86] Convert most remaining AVX512 uses of X86SchedWritePair scheduler classes to X86SchedWriteWidths. | Simon Pilgrim | 2018-05-02 | 1 | -245/+277
  We've dealt with the majority already.
  llvm-svn: 331353
* [X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt into XMM and YMM/ZMM scheduler classes | Simon Pilgrim | 2018-05-01 | 1 | -33/+40
  llvm-svn: 331290
* [X86] Split WriteFCmp into XMM and YMM/ZMM scheduler classes | Simon Pilgrim | 2018-05-01 | 1 | -18/+17
  Removes more WriteFCmp InstRW overrides
  llvm-svn: 331283
* [X86] Convert all uses of WriteFAdd to X86SchedWriteWidths. | Simon Pilgrim | 2018-05-01 | 1 | -95/+108
  In preparation of splitting WriteFAdd by vector width.
  llvm-svn: 331273
* [X86] Convert all uses of WriteFShuffle to X86SchedWriteWidths. | Simon Pilgrim | 2018-05-01 | 1 | -65/+85
  In preparation of splitting WriteFShuffle by vector width.
  llvm-svn: 331262
* [X86] Convert all uses of WriteFLogic/WriteVecLogic to X86SchedWriteWidths. | Simon Pilgrim | 2018-05-01 | 1 | -84/+94
  In preparation of splitting WriteVecLogic by vector width.
  llvm-svn: 331256
* [X86] Tag PSLLDQ/PSRLDQ as WriteShuffle scheduler classes instead of shifts. | Simon Pilgrim | 2018-05-01 | 1 | -12/+12
  Although they are encoded similar to bit shifts, the byte shifts behave like shuffles from a scheduling point of view.
  llvm-svn: 331253
* [X86] Introduce X86SchedWriteWidths schedule wrapper for different vector widths. | Simon Pilgrim | 2018-04-30 | 1 | -55/+58
  We need to split most of the scheduler classes by vector width to remove more of the InstRW overrides, this patch should make this easier/tidier by allowing us to pass the X86SchedWriteWidths wrapper to multi-width multiclasses and then split as required.
  I've included fields for Scl (scalar float/double), MMX (MMX integer), XMM, YMM and ZMM widths. These fields mostly share the same classes but it should give us the flexibility that we may need in the future.
  This patch has replaced a set of example SSE/AVX512 instruction cases but isn't exhaustive as it gets very noisy before we really need the functionality.
  Differential Revision: https://reviews.llvm.org/D46266
  llvm-svn: 331208
* [X86] Restrict many of the InstAliases to either to only att or intel syntax. NFCI | Craig Topper | 2018-04-28 | 1 | -18/+18
  Many of these aliases exist to give one syntax or the other a slightly different mnemonic and the other variant gets a duplicate of its normal mnemonic
  This patch restricts a lot of these to only one variant so we don't get the duplication. This removes a lot of duplicate entries from the matcher table. It also reduces the number of warnings printed when you enable the ambiguous match warning in tablegen.
  llvm-svn: 331117
* [X86] Split WriteFBlend/WriteFVarBlend/WriteFVarShuffle into XMM and YMM/ZMM scheduler classes | Simon Pilgrim | 2018-04-27 | 1 | -22/+33
  This removes all the WriteFBlend/WriteFVarBlend InstRW overrides - some WriteFVarShuffle remain to be fixed.
  llvm-svn: 331065
* [X86][AVX] Split WriteFLogic into XMM and YMM/ZMM scheduler classes | Simon Pilgrim | 2018-04-27 | 1 | -19/+31
  This removes all the AND/ANDN/OR/XOR PS/PD InstRW overrides.
  llvm-svn: 331051
* [X86] Split WriteFMA into XMM, Scalar and YMM/ZMM scheduler classes | Simon Pilgrim | 2018-04-25 | 1 | -32/+50
  This removes all the FMA InstRW overrides.
  If we ever get PR36924, then we can remove many of these declarations from models.
  llvm-svn: 330820
* [AVX512] VPERMQ/VPERMPD/VPERMIL single op shuffles are not variable shuffles | Simon Pilgrim | 2018-04-24 | 1 | -4/+5
  These variants all take an immediate shuffle mask value and should be scheduled as such.
  llvm-svn: 330747
* [X86][F16C] Add WriteCvtF2FSt scheduling class | Simon Pilgrim | 2018-04-24 | 1 | -17/+13
  Fixes the classification of VCVTPS2PHmr/VCVTPS2PHYmr which were tagged as WriteCvtF2FLd_WriteRMW (PR36887)
  llvm-svn: 330737
* [X86] Add vector element insertion/extraction scheduler classes | Simon Pilgrim | 2018-04-24 | 1 | -11/+11
  Split off pinsr/pextr and extractps instructions.
  (Mostly) fixes PR36887.
  Note: It might be worth adding a WriteFInsertLd class as well in the future.
  Differential Revision: https://reviews.llvm.org/D45929
  llvm-svn: 330714
* [X86] Add WriteFSign/WriteFLogic scheduler classes | Simon Pilgrim | 2018-04-20 | 1 | -4/+4
  Split the fp and integer vector logical instruction scheduler classes - older CPUs especially often handled these on different pipes.
  This unearthed a couple of things that are also handled in this patch:
  (1) We were tagging avx512 fp logic ops as WriteFAdd, probably because of the lack of WriteFLogic
  (2) SandyBridge had integer logic ops only using Port5, when afaict they can use Ports015.
  (3) Cleaned up x86 FCHS/FABS scheduling as they are typically treated as fp logic ops.
  Differential Revision: https://reviews.llvm.org/D45629
  llvm-svn: 330480
* [X86] Add separate scheduling class for PSADBW instruction. | Craig Topper | 2018-04-17 | 1 | -1/+1
  llvm-svn: 330204
* [X86] Add FP comparison scheduler classes | Simon Pilgrim | 2018-04-17 | 1 | -28/+28
  Split VCMP/VMAX/VMIN instructions off to WriteFCmp and VCOMIS instructions off to WriteFCom instead of assuming they match WriteFAdd
  Differential Revision: https://reviews.llvm.org/D45656
  llvm-svn: 330179
* [X86][AVX512] UNPCKL/H PS and PD should be scheduled with WriteFShuffle not WriteFAdd | Simon Pilgrim | 2018-04-13 | 1 | -2/+2
  llvm-svn: 330023
* [X86] Remove remaining OpndItins/SizeItins from all instruction defs (PR37093) | Simon Pilgrim | 2018-04-13 | 1 | -1333/+1298
  llvm-svn: 330022
* [X86] Remove OpndItins/SizeItins from all sse instruction defs (PR37093) | Simon Pilgrim | 2018-04-13 | 1 | -8/+8
  llvm-svn: 330013
* [X86] Remove unused MoveLoadStoreItins/ShiftOpndItins schedule class wrappers. | Simon Pilgrim | 2018-04-12 | 1 | -14/+14
  Was being used to move around empty/unused itineraries...
  llvm-svn: 329970
* [X86] Remove x86 InstrItinClass entries (PR37093) | Simon Pilgrim | 2018-04-12 | 1 | -12/+12
  This removes the last of the x86 schedule itineraries, I'm intending to cleanup the remaining uses of NoItinerary/OpndItins/etc. before resolving PR37093.
  llvm-svn: 329967
* [X86] Remove InstrItinClass entries from all x86 instruction defs (PR37093) | Simon Pilgrim | 2018-04-12 | 1 | -99/+82
  llvm-svn: 329953
* [X86] Remove InstrItinClass entries from SSE/AVX instructions defs (PR37093) | Simon Pilgrim | 2018-04-12 | 1 | -343/+331
  llvm-svn: 329945
* [X86] Remove explicit SSE/AVX schedule itineraries from defs (PR37093) | Simon Pilgrim | 2018-04-12 | 1 | -100/+98
  llvm-svn: 329940
* [X86] Remove AES/CLMUL/CRC32/LDDQU/MOVNT/POPCNT/SHA schedule itineraries (PR37093) | Simon Pilgrim | 2018-04-12 | 1 | -3/+2
  llvm-svn: 329912
* [X86] Remove remaining system/special schedule itineraries (PR37093) | Simon Pilgrim | 2018-04-12 | 1 | -2/+2
  llvm-svn: 329906
* [X86] Add variable shuffle schedule classes | Simon Pilgrim | 2018-04-11 | 1 | -3/+3
  Split variable index shuffles from immediate index shuffles
  WriteFVarShuffle - variable 'in-lane' shuffles (VPERMILPS/VPERMIL2PS etc.)
  WriteVarShuffle - variable 'in-lane' shuffles (PSHUFB/VPPERM etc.)
  WriteFVarShuffle256 - variable 'cross-lane' shuffles (VPERMPS etc.)
  WriteVarShuffle256 - variable 'cross-lane' shuffles (VPERMD etc.)
  Differential Revision: https://reviews.llvm.org/D45404
  llvm-svn: 329806
* [X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. | Craig Topper | 2018-04-05 | 1 | -67/+82
  Mostly vector load, store, and move instructions.
  llvm-svn: 329330
* [X86] Revert r329251-329254 | Craig Topper | 2018-04-05 | 1 | -82/+67
  It's failing on the bots and I'm not sure why.
  This reverts:
  [X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents.
  [X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256.
  [X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
  [X86] Auto-generate complete checks. NFC
  llvm-svn: 329256
* [X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. | Craig Topper | 2018-04-05 | 1 | -67/+82
  Mostly vector load, store, and move instructions.
  llvm-svn: 329254
* [X86] Use the same predicate for the load for PMOVSXBQ and PMOVZXBQ. | Craig Topper | 2018-04-04 | 1 | -4/+4
  These both use a 16-bit load, but one used loadi16_anyext and the other used extloadi32i16. The only difference between them is that loadi16_anyext checked that the load was at least 2 byte aligned and non-volatile. But the alignment doesn't matter here. Just use extloadi32i16 for both.
  llvm-svn: 329154
* [X86] Fix the SchedRW for AVX512 shift instructions. | Craig Topper | 2018-04-02 | 1 | -8/+13
  It was being inadvertently defaulted to an FADD scheduler class.
  llvm-svn: 328959
* [X86] Give the AVX512 VEXTRACT instructions the same SchedRWs as the SSE/AVX versions. | Craig Topper | 2018-04-02 | 1 | -29/+19
  llvm-svn: 328958
* [X86] Add SchedRW for PMULLD | Craig Topper | 2018-03-31 | 1 | -1/+1
  Summary: It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.
  This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.
  I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs.
  Reviewers: RKSimon, GGanesh, courbet
  Reviewed By: RKSimon
  Subscribers: gchatelet, gbedwell, andreadb, llvm-commits
  Differential Revision: https://reviews.llvm.org/D44972
  llvm-svn: 328914
* [X86] Rewrite LowerAVXCONCAT_VECTORS similar to how we handle vXi1 concats. | Craig Topper | 2018-03-13 | 1 | -12/+4
  This better able to detect undef and zeros pieces in the concat. Or cases when only one subvector is non-zero. This allows us to avoid silly things like double inserts into progressively larger undefs.
  This still builds 512 bit concats of 128 bits by building up through 256 bits first. But I don't know if that's best. We probably want to merge this with the vXi1 concat code since they are very similar.
  llvm-svn: 327454
* [X86] Change X86::PMULDQ/PMULUDQ opcodes to take vXi64 type as input instead of vXi32. | Craig Topper | 2018-03-08 | 1 | -6/+4
  This instruction can be thought of as reading either the even elements of a vXi32 input or the lower half of each element of a vXi64 input. We currently use the vXi32 interpretation, but vXi64 matches better with its broadcast behavior in EVEX.
  I'm looking at moving MULDQ/MULUDQ creation to a DAG combine so we can do it when AVX512DQ is enabled without having to go through Custom lowering. But in some of the test cases we failed to use a broadcast load due to the size difference. This should help with that.
  I'm also wondering if we can model these instructions in native IR and remove the intrinsics and I think using a vXi64 type will work better with that.
  llvm-svn: 326991
* [X86] Add a DAG combine to turn stores of vXi1 constants into scalar stores. | Craig Topper | 2018-03-04 | 1 | -4/+0
  llvm-svn: 326679
* [X86] Lower v1i1/v2i1/v4i1/v8i1 load/stores to i8 load/store during op legalization if AVX512DQ is not supported. | Craig Topper | 2018-03-04 | 1 | -27/+2
  We were previously doing this with isel patterns. Moving it to op legalization gives us chance to see the required bitcast earlier. And it lets us remove some isel patterns.
  llvm-svn: 326669
* [X86] Lower extract_element from k-registers by bitcasting from v16i1 to i16 and extending/truncating. | Craig Topper | 2018-02-28 | 1 | -3/+0
  This is equivalent to what isel was doing anyway but by canonicalizing earlier we can remove some patterns.
  llvm-svn: 326375