path: root/llvm/lib/Target/X86/X86InstrAVX512.td
Commit message | Author | Age | Files | Lines
...
* [X86] VRNDSCALE* folding from masked and scalar ffloor and fceil patterns | Mikhail Dvoretckii | 2018-06-19 | 1 | -2/+84
  This patch handles back-end folding of generic patterns created by lowering the X86 rounding intrinsics to native IR in cases where the instruction isn't a straightforward packed values rounding operation, but a masked operation or a scalar operation.
  Differential Revision: https://reviews.llvm.org/D45203
  llvm-svn: 335037
* [X86] Add the ability to force an EVEX2VEX mapping table entry from the .td files. Remove remaining manual table entries from the tablegen emitter. | Craig Topper | 2018-06-19 | 1 | -64/+119
  This adds an EVEX2VEXOverride string to the X86 instruction class in X86InstrFormats.td. If this field is set it will add a manual entry in the EVEX->VEX tables that doesn't check the encoding information.
  Then use this mechanism to map VMOVDU/A8/16, 128-bit VALIGN, and VPSHUFF/I instructions to VEX instructions. Finally, remove the manual table from the emitter.
  This has the bonus of fully sorting the autogenerated EVEX->VEX tables by their EVEX instruction enum value. We may be able to use this to do a binary search for the conversion and get rid of the need to create a DenseMap.
  llvm-svn: 335018
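  The binary-search idea mentioned in the commit above can be sketched as follows. This is only an illustrative Python model, not the LLVM implementation: the table contents, the opcode numbers, and the name lookup_vex are all hypothetical. It relies on nothing more than the table being sorted by the EVEX value.

```python
from bisect import bisect_left

# Hypothetical (EVEX enum value, VEX enum value) pairs, sorted by the EVEX value.
# The real values come from the autogenerated tables; these are made up.
EVEX_TO_VEX = [
    (101, 11),
    (205, 27),
    (310, 42),
]

# Keys extracted once so that bisect can search them directly.
EVEX_KEYS = [evex for evex, _ in EVEX_TO_VEX]


def lookup_vex(evex_opcode):
    """Return the VEX opcode mapped to evex_opcode, or None if there is no entry."""
    i = bisect_left(EVEX_KEYS, evex_opcode)
    if i < len(EVEX_KEYS) and EVEX_KEYS[i] == evex_opcode:
        return EVEX_TO_VEX[i][1]
    return None
```

  Because the table is sorted, a lookup needs no auxiliary hash map, which is the saving the commit message hints at.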
* [X86] Add a new VEX_WPrefix encoding to tag EVEX instructions that have VEX.W==1, but can be converted to their VEX equivalent that uses VEX.W==0. | Craig Topper | 2018-06-19 | 1 | -7/+7
  EVEX makes heavy use of the VEX.W bit to indicate 64-bit element vs 32-bit elements. Many of the VEX instructions were split into 2 versions with different masking granularity.
  The EVEX->VEX table generator can collapse the two versions if the VEX version is tagged as VEX_WIG. But if the VEX version is instead marked VEX.W==0 we can't combine them because we don't know if there is also a VEX version with VEX.W==1.
  This patch adds a new VEX_W1X tag that indicates the EVEX instruction encodes with VEX.W==1, but is safe to convert to a VEX instruction with VEX.W==0.
  This allows us to remove a bunch of manual EVEX->VEX table entries. We may want to look into splitting up the VEX_WPrefix field which would simplify the disassembler.
  llvm-svn: 335017
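  The collapsing rule described in the commit above can be modelled roughly as below. This is a simplified sketch under stated assumptions, not the emitter's actual logic: the tag strings "W0", "W1", "W1X" and "WIG" and the function can_convert are hypothetical stand-ins for the VEX_WPrefix values, and every other encoding check the real generator performs is ignored.

```python
def can_convert(evex_w, vex_w):
    """Decide whether an EVEX entry may map to a VEX entry, looking only at VEX.W tags.

    evex_w: one of "W0", "W1", "W1X", "WIG" (hypothetical tag names)
    vex_w:  one of "W0", "W1", "WIG"
    """
    # A VEX instruction that ignores W can absorb either EVEX version.
    if vex_w == "WIG":
        return True
    # W1X: the EVEX form encodes W==1 but is known to be safe with a W==0 VEX form.
    if evex_w == "W1X":
        return vex_w == "W0"
    # Otherwise the W bits have to agree.
    return evex_w == vex_w
```

  Under this model a plain W==1 EVEX instruction is not paired with a W==0 VEX instruction, while one tagged W1X is.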
* [X86] Remove ReadAfterLd from avx512_shift_rmbi multiclass. | Craig Topper | 2018-06-18 | 1 | -1/+1
  The instructions that use this class don't have another source register. So I think this was just marking one of the address operands as ReadAfterLd?
  llvm-svn: 334994
* [X86] Encode the EVEX2VEX exception list information in .td files instead of the emitter source. | Craig Topper | 2018-06-18 | 1 | -15/+34
  Rather than having an exclusion list in tablegen sources, add a flag to the X86 instruction records that can be used to suppress checking for convertibility.
  llvm-svn: 334971
* [X86] Add '.s' aliases to the assembler for the various redundant move encodings to match gas and our EVEX instructions. | Craig Topper | 2018-06-18 | 1 | -3/+0
  We already have these aliases for EVEX encoded instructions, but not for the GPR, MMX, SSE, and VEX versions.
  Also remove the vpextrw.s EVEX alias. That's not something gas implements.
  llvm-svn: 334922
* [X86] Move the 'vmovq.s' and similar assembly strings for EVEX vector moves with reversed operands to InstAliases. | Craig Topper | 2018-06-18 | 1 | -45/+80
  The .s assembly strings allow the reversed forms to be targeted from assembly which matches gas behavior. But when printing the instructions we should print them without the .s to match other tooling like objdump. By using InstAliases we can use the normal string in the instruction and just hide it from the assembly parser.
  Ideally we'd add the .s versions to the legacy SSE and VEX versions as well for full compatibility with gas. Not sure how we got to a state where only EVEX was supported.
  llvm-svn: 334920
* [X86] More additions to the load folding tables based on the autogenerated tables. Including more additions for NotMemoryFoldable to remove some entries from the autogenerated table. | Craig Topper | 2018-06-16 | 1 | -17/+22
  llvm-svn: 334898
* [X86] Fix an inconsistency between AVX512 and AVX/SSE version on a couple instructions. | Craig Topper | 2018-06-16 | 1 | -2/+2
  VMOVPQIto64Zmr is not a 64-bit mode only instruction. But I don't know how to test this because VMOVPQIto64mr should always have priority over it in 32-bit mode since its only advantage is XMM16-XMM31 which aren't usable in 32-bit mode.
  VMOVPQIto64Zrr is a 64-bit mode only instruction, but we don't need to explicitly mark it as such because it uses a GR64 register which won't parse in 32-bit mode.
  llvm-svn: 334896
* [X86] Lowering sqrt intrinsics to native IR | Tomasz Krupa | 2018-06-15 | 1 | -13/+10
  Summary: Complementary patch to lowering sqrt intrinsics in Clang.
  Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k
  Reviewed By: craig.topper
  Subscribers: tkrupa, mike.dvoretsky, llvm-commits
  Differential Revision: https://reviews.llvm.org/D41599
  llvm-svn: 334849
* [X86] Add 'Z' to the internal names of various EVEX instructions for overall consistency. | Craig Topper | 2018-06-15 | 1 | -36/+36
  llvm-svn: 334785
* [X86] Remove '128' from the internal name of some scalar FP instructions to be consistent with other scalar instructions. | Craig Topper | 2018-06-14 | 1 | -8/+8
  llvm-svn: 334727
* [X86] Remove NotMemoryFoldable from some AVX/AVX512 scalar instructions. | Craig Topper | 2018-06-14 | 1 | -9/+8
  Some of these instructions are already in the manual folding table so we should have them in the auto table too.
  llvm-svn: 334725
* [x86] fix mappings of cvttp2si/cvttp2ui x86 intrinsics to x86-specific nodes and isel patterns (PR37551) | Craig Topper | 2018-06-14 | 1 | -15/+121
  Summary:
  The tests in:
  https://bugs.llvm.org/show_bug.cgi?id=37751
  ...show miscompiles because we wrongly mapped and folded x86-specific intrinsics into generic DAG nodes.
  This patch corrects the mappings in X86IntrinsicsInfo.h and adds isel matching corresponding to the new patterns. The complete tests for the failure cases should be in avx-cvttp2si.ll and sse-cvttp2si.ll and avx512-cvttp2i.ll
  Reviewers: RKSimon, gbedwell, spatel
  Reviewed By: spatel
  Subscribers: mcrosier, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47993
  llvm-svn: 334685
* [X86] Mark all instructions that have masked store semantics with NotMemoryFoldable. Remove dependency on SchedRW from memory table autogenerator. | Craig Topper | 2018-06-13 | 1 | -6/+8
  Previously we were whitelisting instructions based on their SchedRW value. With the masked store instructions explicitly removed via NotMemoryFoldable, we don't seem to need this check anymore.
  llvm-svn: 334563
* [X86] Remove VPCOMPRESSB/W from the autogenerated load folding table. | Craig Topper | 2018-06-13 | 1 | -2/+4
  llvm-svn: 334562
* [X86] Remove mayLoad flag from AVX512 truncating store instructions. | Craig Topper | 2018-06-12 | 1 | -2/+1
  llvm-svn: 334529
* [X86] Add NotMemoryFoldable to the VPCOMPRESS instructions. | Craig Topper | 2018-06-12 | 1 | -4/+4
  llvm-svn: 334481
* [X86] Add isel patterns for folding loads when creating ROUND instructions from ffloor/fnearbyint/fceil/frint/ftrunc. | Craig Topper | 2018-06-12 | 1 | -0/+66
  We were missing packed isel folding patterns for all of sse41, avx, and avx512.
  For some reason avx512 had scalar load folding patterns under optsize (due to partial/undef reg update), but we didn't have the equivalent sse41 and avx patterns.
  Sometimes we would get load folding due to peephole pass anyway, but we're also missing avx512 instructions from the load folding table. I'll try to fix that in another patch.
  Some of this was spotted in the review for D47993.
  This patch adds all the folds to isel, adds a few spot tests, and disables the peephole pass on a few tests to ensure we're testing some of these patterns.
  llvm-svn: 334460
* [X86][AVX512] Tag AVX5124FMAPS/AVX5124VNNIW with missing scheduler classes | Simon Pilgrim | 2018-06-11 | 1 | -6/+12
  Necessary for D46276 as even though btver2 doesn't use these instructions, it's now flagged as complete so complains if ANY instruction isn't tagged.
  UnsupportedFeatures wouldn't help here as these instructions don't appear to have a feature predicate (like a lot of AVX512).
  llvm-svn: 334423
* [X86] Fix skylake server scheduling info. | Clement Courbet | 2018-06-11 | 1 | -4/+4
  Summary:
  This fixes most of the scheduling info for SKX vector operations. I had to split a lot of the YMM/ZMM classes into separate classes for YMM and ZMM.
  The before/after llvm-exegesis analyses are in the phabricator diff.
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D47721
  llvm-svn: 334407
* [X86] Rename vy512mem->vy512xmem and vz256xmem->vz256mem. | Craig Topper | 2018-06-06 | 1 | -12/+12
  The index size is represented by the letter after the 'v'. The number represents the memory size. If an 'x' appears after the number it means the index register can be from VR128X/VR256X instead of VR128/VR256.
  As vy512mem uses a VR256X index it should have an x. And vz256mem uses a VR512 index so it shouldn't have an x.
  I admit these names kind of suck and are confusing.
  llvm-svn: 334120
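  The naming scheme described in the commit above can be decoded mechanically. The Python sketch below is illustrative only; the function parse_vsib_operand and the result keys are hypothetical. The mapping of 'y' to a 256-bit index and 'z' to a 512-bit index follows the commit text, while 'x' meaning a 128-bit index is an assumption by analogy.

```python
import re

# Index-register width implied by the letter after 'v'.
# 'y' and 'z' follow the commit message; 'x' -> 128 is assumed by analogy.
INDEX_BITS = {"x": 128, "y": 256, "z": 512}

# v<index letter><memory size>[x]mem, for example vy512xmem or vz256mem.
_NAME = re.compile(r"^v([xyz])(\d+)(x?)mem$")


def parse_vsib_operand(name):
    """Split an operand name such as 'vy512xmem' into its three components."""
    m = _NAME.match(name)
    if m is None:
        raise ValueError("not a recognised vector-index memory operand: %r" % name)
    letter, mem_bits, ext = m.groups()
    return {
        "index_bits": INDEX_BITS[letter],  # size of the index register
        "mem_bits": int(mem_bits),         # size of the memory access
        "extended_index": ext == "x",      # index may come from VR128X/VR256X
    }
```

  For instance, parse_vsib_operand("vy512xmem") reports a 256-bit index, a 512-bit memory size, and an extended index register class.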
* TableGen: Streamline the semantics of NAME | Nicolai Haehnle | 2018-06-04 | 1 | -157/+172
  Summary:
  The new rules are straightforward. The main rules to keep in mind are:
  1. NAME is an implicit template argument of class and multiclass, and will be substituted by the name of the instantiating def/defm.
  2. The name of a def/defm in a multiclass must contain a reference to NAME. If such a reference is not present, it is automatically prepended.
  And for some additional subtleties, consider these:
  3. defm with no name generates a unique name but has no special behavior otherwise.
  4. def with no name generates an anonymous record, whose name is unique but undefined. In particular, the name won't contain a reference to NAME.
  Keeping rules 1&2 in mind should allow a predictable behavior of name resolution that is simple to follow.
  The old "rules" were rather surprising: sometimes (but not always), NAME would correspond to the name of the toplevel defm. They were also plain bonkers when you pushed them to their limits, as the old version of the TableGen test case shows.
  Having NAME correspond to the name of the toplevel defm introduces "spooky action at a distance" and breaks composability: refactoring the upper layers of a hierarchy of nested multiclass instantiations can cause unexpected breakage by changing the value of NAME at a lower level of the hierarchy. The new rules don't suffer from this problem.
  Some existing .td files have to be adjusted because they ended up depending on the details of the old implementation.
  Change-Id: I694095231565b30f563e6fd0417b41ee01a12589
  Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm, javed.absar
  Subscribers: wdng, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47430
  llvm-svn: 333900
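  Rules 1 and 2 from the commit above can be modelled in a few lines. This Python sketch is a toy model, not TableGen itself: the function resolve_def_name is hypothetical, and a literal "NAME" substring stands in for a real TableGen reference to NAME. The instruction-like names in the usage note are made up for illustration.

```python
def resolve_def_name(inner, instantiating_name):
    """Model how a def/defm name inside a multiclass is resolved.

    inner: the name as written inside the multiclass; a literal "NAME"
           stands in for a reference to the implicit template argument.
    instantiating_name: the name of the def/defm that instantiates it.
    """
    if "NAME" in inner:
        # Rule 1: NAME is substituted by the name of the instantiating def/defm.
        return inner.replace("NAME", instantiating_name)
    # Rule 2: with no reference to NAME, it is automatically prepended.
    return instantiating_name + inner
```

  Nested instantiations compose locally under this model: resolving "Z128" under "VFOO" gives "VFOOZ128", and resolving "rr" under that result gives "VFOOZ128rr". Each level sees only the name of its immediate instantiating defm, which is the composability property the commit message argues for.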
* [X86] Add tied source operand to AVX5124FMAPS and AVX5124VNNIW instructions. | Craig Topper | 2018-06-02 | 1 | -20/+33
  This doesn't affect the assembly or disassembly, but is more accurate.
  llvm-svn: 333822
* [X86] Add encoding information for the AVX5124FMAPS and AVX5124VNNIW instructions so they can be assembled and disassembled. | Craig Topper | 2018-06-02 | 1 | -0/+44
  These instructions are unusual in that they operate on 4 consecutive registers so supporting them in codegen will be more difficult than normal.
  Includes an assembler check to warn if the source register is not the first register of a 4 register group.
  llvm-svn: 333812
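  The assembler check mentioned in the commit above amounts to an alignment test on the register number. The sketch below is a hypothetical Python model rather than the assembler's code: it assumes that groups are aligned blocks of four registers (0-3, 4-7, and so on), and the names GROUP_SIZE, group_base and check_source_register, as well as the warning wording, are invented for illustration.

```python
GROUP_SIZE = 4  # assumption: a group is an aligned block of four registers


def group_base(reg_index):
    """Return the index of the first register of the group containing reg_index."""
    return reg_index - (reg_index % GROUP_SIZE)


def check_source_register(reg_index):
    """Return a warning string if reg_index is not the first register of its group."""
    base = group_base(reg_index)
    if reg_index == base:
        return None
    return ("warning: source register %d is not the first register of "
            "group %d-%d" % (reg_index, base, base + GROUP_SIZE - 1))
```

  With these assumptions, register 4 passes silently while register 6 produces a warning naming the group 4-7.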
* [X86] Add isel patterns to use vexpand with zero masking when the passthru value is a zero vector. | Craig Topper | 2018-06-01 | 1 | -0/+4
  llvm-svn: 333800
* [X86] Remove some of the extractelts from the new MOVSS+FMA patterns. | Craig Topper | 2018-05-29 | 1 | -33/+40
  We only need the extractelt that corresponds to the register we're trying to insert back into. We can't guarantee the others haven't been optimized out depending on how those operands were produced.
  So instead just look for an FR32/FR64 input and emit a COPY_TO_REGCLASS to VR128 in the output pattern. This matches what we do for ADD/SUB/MUL/DIV.
  llvm-svn: 333473
* [X86] Use VR128X instead of VR128 in EVEX instruction patterns. | Craig Topper | 2018-05-29 | 1 | -23/+23
  llvm-svn: 333464
* [X86] Rename the operands in the recently introduced MOVSS+FMA patterns so that the operand names in the output pattern are always in 1, 2, 3 order since those are the operand names in the instruction. | Craig Topper | 2018-05-29 | 1 | -20/+20
  The order should be controlled in the input pattern.
  llvm-svn: 333463
* [X86] Scalar mask and scalar move optimizations | Alexander Ivchenko | 2018-05-29 | 1 | -49/+127
  1. Introduction of mask scalar TableGen patterns.
  2. Introduction of new scalar move TableGen patterns and refactoring of existing ones.
  3. Folding of pattern created by introducing scalar masking in Clang header files.
  Patch by tkrupa
  Differential Revision: https://reviews.llvm.org/D47012
  llvm-svn: 333419
* [X86] Converge X86ISD::VPERMV3 and X86ISD::VPERMIV3 to a single opcode. | Craig Topper | 2018-05-28 | 1 | -14/+19
  These do the same thing with the first and second sources swapped. They previously came from separate intrinsics that specified different masking behavior. But we can cover that with isel patterns and a single node.
  This is a step towards reducing the number of intrinsics needed.
  A bunch of tests change because we are now biased to choosing VPERMT over VPERMI when there is nothing to signal that commuting is beneficial.
  llvm-svn: 333383
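  The relationship between the two nodes in the commit above can be illustrated with a small model of a two-source permute. This Python sketch is an approximation for illustration and not the DAG node definitions: the function names and the exact operand orders are assumptions based on the statement that the nodes differ only in which of the first two operands carries the indices, and masking is left out entirely.

```python
def permute2(src_a, idx, src_b):
    """Pick each result element from the 2n-element concatenation of src_a and src_b."""
    n = len(src_a)
    assert len(src_b) == n and len(idx) == n
    out = []
    for i in idx:
        sel = i % (2 * n)  # only the low bits of each index are used
        out.append(src_a[sel] if sel < n else src_b[sel - n])
    return out


def vpermv3(src_a, idx, src_b):
    """Assumed operand order: first source, indices, second source."""
    return permute2(src_a, idx, src_b)


def vpermiv3(idx, src_a, src_b):
    """Assumed operand order: indices, first source, second source."""
    return permute2(src_a, idx, src_b)
```

  Under this model the two functions compute the same result once the first two operands are swapped, which is why a single opcode plus isel patterns can cover both.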
* [X86] Stop forcing X86VPermi2X node index operand to match destination type to make masking pattern matching easier. Add extra patterns with bitcasts instead. | Craig Topper | 2018-05-28 | 1 | -30/+87
  This basically reverts r280696 in favor of using extra patterns as mentioned as an alternative in that commit message. For now I've only added the cases we have test cases for, but it should be easy to add more in the future.
  This will help to convert VPERMI2PS/VPERMT2PS intrinsics to use a single ISD node opcode. And hopefully allow some intrinsics to be removed.
  llvm-svn: 333365
* [X86][MIPS][ARM] New machine instruction property 'isMoveReg' | Petar Jovanovic | 2018-05-23 | 1 | -1/+3
  This property is needed in order to follow value movement between registers. This property is used in TII to implement a method that returns true if a simple copy-like instruction is recognized, along with source and destination machine operands.
  Patch by Nikola Prica.
  Differential Revision: https://reviews.llvm.org/D45204
  llvm-svn: 333093
* [X86] Add GPR<->XMM Schedule Tags | Simon Pilgrim | 2018-05-18 | 1 | -9/+9
  BtVer2 - fix NumMicroOp and account for the Lat+6cy GPR->XMM and Lat+1cy XMM->GPR delays (see rL332737)
  The high number of MOVD/MOVQ equivalent instructions meant that there were a number of missed patterns in SNB/Znver1:
  SNB - add missing GPR<->MMX costs (taken from Agner / Intel AOM)
  Znver1 - add missing GPR<->XMM MOVQ costs (taken from Agner)
  llvm-svn: 332745
* [X86][SSE] Ensure vector partial load/stores use the WriteVecLoad/WriteVecStore scheduler classes | Simon Pilgrim | 2018-05-18 | 1 | -10/+10
  Retag some instructions that were missed when we split off vector load/store/moves - MOVQ/MOVD etc.
  Fixes BtVer2/SLM which have different behaviours for GPR stores.
  llvm-svn: 332718
* [X86][SSE] Ensure float load/stores use the WriteFLoad/WriteFStore scheduler classes | Simon Pilgrim | 2018-05-18 | 1 | -16/+16
  Retag some instructions that were missed when we split off vector load/store/moves - MOVSS/MOVSD/MOVHPD/MOVHPD/MOVLPD/MOVLPS etc.
  Fixes BtVer2/SLM which have different behaviours for GPR stores.
  llvm-svn: 332714
* [X86] Add OptForSize to a couple load folding patterns. Remove some bad FIXME comments. | Craig Topper | 2018-05-17 | 1 | -1/+1
  The FIXME comments were about preventing load folding to avoid a partial xmm update. But these instructions use GPR as input when the load isn't folded. This won't help prevent a partial xmm update.
  llvm-svn: 332573
* [X86] Split WriteCvtI2F/WriteCvtF2I into I<->F32 and I<->F64 scheduler classes | Simon Pilgrim | 2018-05-16 | 1 | -107/+107
  A lot of the models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first
  llvm-svn: 332451
* [X86] Split WriteCvtF2F into F32->F64 and F64->F32 scheduler classes | Simon Pilgrim | 2018-05-15 | 1 | -15/+15
  BtVer2 - Fixes schedules for (V)CVTPS2PD instructions
  A lot of the Intel models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first
  llvm-svn: 332376
* [X86] Split off F16C WriteCvtPH2PS/WriteCvtPS2PH scheduler classes | Simon Pilgrim | 2018-05-15 | 1 | -13/+17
  Btver2 - VCVTPH2PSYrm needs to double pump the AGU
  Broadwell - missing VCVTPS2PH*mr stores extra latency
  Allows us to remove the WriteCvtF2FSt conversion store class
  llvm-svn: 332357
* [X86] Add NT load/store scheduler classes | Simon Pilgrim | 2018-05-14 | 1 | -3/+3
  llvm-svn: 332274
* [X86] Remove and autoupgrade avx512.vbroadcast.ss/avx512.vbroadcast.sd intrinsics. | Craig Topper | 2018-05-14 | 1 | -5/+0
  llvm-svn: 332271
* [X86] Remove and autoupgrade the cvtusi2sd intrinsic. Use uitofp+insertelement instead. | Craig Topper | 2018-05-14 | 1 | -7/+0
  llvm-svn: 332206
* [X86] Add patterns for combining movss+uint_to_fp into the intrinsic instructions under AVX512. | Craig Topper | 2018-05-13 | 1 | -0/+40
  This matches what we do for sint_to_fp.
  llvm-svn: 332205
* [X86] Add some load folding patterns for cvtsi2ss/sd into intrinsic instructions. | Craig Topper | 2018-05-13 | 1 | -0/+20
  llvm-svn: 332189
* [X86] Remove and autoupgrade cvtsi2ss/cvtsi2sd intrinsics to match what clang has used for a very long time. | Craig Topper | 2018-05-12 | 1 | -16/+0
  llvm-svn: 332186
* [X86] Added scheduler helper classes to split move/load/store by size | Simon Pilgrim | 2018-05-11 | 1 | -102/+100
  Nothing uses this yet but this will allow us to specialize MMX/XMM/YMM/ZMM vector moves.
  llvm-svn: 332090
* [X86] Remove and autoupgrade the avx512.mask.store.ss intrinsic. | Craig Topper | 2018-05-11 | 1 | -4/+0
  llvm-svn: 332079
* [X86] Add new patterns for masked scalar load/store to match clang's codegen from r331958. | Craig Topper | 2018-05-10 | 1 | -0/+117
  Clang's codegen now uses 128-bit masked load/store intrinsics in IR. The backend will widen to 512-bits on AVX512F targets.
  So this patch adds patterns to detect codegen's widening and patterns for AVX512VL that don't get widened.
  We may be able to drop some of the old patterns, but I leave that for a future patch.
  llvm-svn: 332049
* [X86] Split WriteVecALU/WriteVecLogic/WriteShuffle/WriteVarShuffle/WritePSADBW/WritePHAdd scheduler classes | Simon Pilgrim | 2018-05-10 | 1 | -4/+6
  Split off XMM classes from the default (MMX) classes.
  llvm-svn: 331999