path: root/llvm/lib/Target/X86/X86InstrAVX512.td
Commit message | Author | Age | Files | Lines
...
* [X86][AVX512] Add itinerary argument to all AVX512_maskable_* wrappers. NFCI | Simon Pilgrim | 2017-11-29 | 1 | -44/+49
  All default to NoItinerary
  llvm-svn: 319326
* [X86][AVX512] Tag VPERMILV instruction scheduler class | Simon Pilgrim | 2017-11-29 | 1 | -17/+21
  llvm-svn: 319316
* [X86][AVX512] Setup unary (PABS/VPLZCNT/VPOPCNT/VPCONFLICT/VMOV*DUP) instruction scheduler classes | Simon Pilgrim | 2017-11-29 | 1 | -55/+78
  llvm-svn: 319312
* [X86] Mark ISD::FP_TO_UINT v16i8/v16i16 as Promote under AVX512 instead of legal. Fix infinite loop in op legalization when promotion requires 2 steps. | Craig Topper | 2017-11-28 | 1 | -5/+0
  Previously we had an isel pattern to add the truncate. Instead use Promote to add the truncate to the DAG before isel.
  The Promote legalization code had to be updated to prevent an infinite loop if promotion took multiple steps because it wasn't remembering the previously tried value.
  llvm-svn: 319259
* [X86] Remove some unused pattern fragments from td file. NFC | Craig Topper | 2017-11-28 | 1 | -10/+0
  llvm-svn: 319143
* [X86][AVX512] Tag AVX512 PACKSS/PACKUS/PMADDWD/PMADDUBSW instructions with SSE_PACK/SSE_PMADD schedule classes | Simon Pilgrim | 2017-11-27 | 1 | -20/+24
  llvm-svn: 319065
* [X86][AVX512] Tag AVX512 sqrt instructions with SSE_SQRT schedule classes | Simon Pilgrim | 2017-11-27 | 1 | -29/+32
  llvm-svn: 319045
* [X86][FMA] Tag all FMA/FMA4 instructions with WriteFMA schedule class | Simon Pilgrim | 2017-11-27 | 1 | -17/+21
  As mentioned on PR17367, many instructions are missing scheduling tags, preventing us from setting 'CompleteModel = 1' for better instruction analysis. This patch deals with FMA/FMA4, which is one of the bigger offenders (along with AVX512 in general).
  Annoyingly all scheduler models need to define WriteFMA (now that it's actually used), even for older targets without FMA/FMA4 support, but that is an existing problem shared by other schedule classes.
  Differential Revision: https://reviews.llvm.org/D40351
  llvm-svn: 319016
* [x86][icelake]GFNI | Coby Tayree | 2017-11-26 | 1 | -0/+52
  galois field arithmetic (GF(2^8)) insns:
  gf2p8affineinvqb
  gf2p8affineqb
  gf2p8mulb
  Differential Revision: https://reviews.llvm.org/D40373
  llvm-svn: 318993
* [x86][icelake]BITALG | Coby Tayree | 2017-11-23 | 1 | -0/+27
  2/3 vpshufbitqmb encoding
  3/3 vpshufbitqmb intrinsics
  Differential Revision: https://reviews.llvm.org/D40222
  llvm-svn: 318904
* [X86] Lower all ISD::MGATHER nodes to X86ISD::MGATHER. | Craig Topper | 2017-11-22 | 1 | -1/+1
  Now we consistently represent the mask result without relying on isel ignoring it.
  We now have a more general SDNode and type constraints to represent these nodes in isel patterns. This allows us to present both vXi1 and XMM/YMM mask types with a single set of constraints.
  llvm-svn: 318821
* [x86][icelake]BITALG | Coby Tayree | 2017-11-21 | 1 | -0/+11
  vpopcnt{b,w}
  Differential Revision: https://reviews.llvm.org/D40213
  llvm-svn: 318748
* [x86][icelake]VNNI | Coby Tayree | 2017-11-21 | 1 | -0/+44
  Introducing Vector Neural Network Instructions, consisting of:
  vpdpbusd{s}
  vpdpwssd{s}
  Differential Revision: https://reviews.llvm.org/D40208
  llvm-svn: 318746
* [x86][icelake]vbmi2 | Coby Tayree | 2017-11-21 | 1 | -10/+107
  introducing vbmi2, consisting of
  vpcompress{b,w}
  vpexpand{b,w}
  vpsh{l,r}d{w,d,q}
  vpsh{l,r}dv{w,d,q}
  Differential Revision: https://reviews.llvm.org/D40206
  llvm-svn: 318745
* [x86][icelake]vpclmulqdq introduction | Coby Tayree | 2017-11-21 | 1 | -0/+23
  an icelake promotion of pclmulqdq
  Differential Revision: https://reviews.llvm.org/D40101
  llvm-svn: 318741
* [x86][icelake]VAES introduction | Coby Tayree | 2017-11-21 | 1 | -0/+27
  an icelake promotion of AES
  Differential Revision: https://reviews.llvm.org/D40078
  llvm-svn: 318740
* [X86] Add test cases for rndscaless/sd intrinsics. | Craig Topper | 2017-11-19 | 1 | -1/+1
  Also fix the memop in the ins for these instructions. Not sure what effect this has.
  llvm-svn: 318624
* [X86] Improve load folding of scalar rcp28 and rsqrt28 instructions using sse_load_f32/f64. | Craig Topper | 2017-11-19 | 1 | -3/+2
  llvm-svn: 318623
* [X86] Redefine the 128-bit version of VPGATHERQD and VGATHERQPS to use a VK2 mask instead of a VK4 mask. | Craig Topper | 2017-11-15 | 1 | -6/+8
  This allows us to remove extra extend creation during lowering and more accurately reflects the semantics of the instruction.
  While there add an extra output VT to X86 masked gather node to better match the isel pattern predicate. Currently we're exploiting the fact that the isel table doesn't count how many output results a node actually has if the result type of any can be inferred from the first result and the type constraints defined in tablegen. I think we might ultimately want to lower all MGATHER/MSCATTER to an X86ISD node with the extra mask result and stop relying on this hole in the isel checking.
  llvm-svn: 318278
* [X86] Use sse_load_f32/f64 to improve load folding of scalar vfscalefss/sd, vrcp14ss/sd, rsqrt14ss/sd instructions. | Craig Topper | 2017-11-13 | 1 | -5/+4
  llvm-svn: 318022
* [X86] Use sse_load_f32/f64 to improve load folding for scalar VFPCLASS intrinsics. | Craig Topper | 2017-11-13 | 1 | -4/+4
  llvm-svn: 318019
* [X86] Fix SQRTSS/SQRTSD/RCPSS/RCPSD intrinsics to use sse_load_f32/sse_load_f64 to increase load folding opportunities. | Craig Topper | 2017-11-13 | 1 | -4/+3
  llvm-svn: 318016
* [X86] Use sse_load_f32/f64 in patterns for the memory forms of VRNDSCALESS/SD. | Craig Topper | 2017-11-13 | 1 | -3/+2
  llvm-svn: 318009
* [X86] Split VRNDSCALE/VREDUCE/VGETMANT/VRANGE ISD nodes into versions with and without the rounding operand. NFCI | Craig Topper | 2017-11-13 | 1 | -40/+46
  I want to reuse the VRNDSCALE node for the legacy SSE rounding intrinsics so that those intrinsics can use EVEX instructions. All of these nodes share tablegen multiclasses so I split them all so that they all remain similar in their implementations.
  llvm-svn: 318007
* [X86] Add an X86ISD::RANGES opcode to use for the scalar intrinsics. | Craig Topper | 2017-11-12 | 1 | -2/+2
  This fixes a bug where we selected packed instructions for scalar intrinsics.
  llvm-svn: 317999
* [X86] Use vrndscaleps/pd for 128/256 ffloor/ftrunc/fceil/fnearbyint/frint when avx512vl is enabled. | Craig Topper | 2017-11-11 | 1 | -0/+46
  This matches what we do for scalar and 512-bit types.
  llvm-svn: 317991
* [X86] Add scalar register class versions of VRNDSCALE instructions and rename the existing versions to _Int. | Craig Topper | 2017-11-11 | 1 | -34/+50
  This is consistent with our normal implementation of scalar instructions.
  While there, disable load folding for the patterns with IMPLICIT_DEF unless optimizing for size, which is also our standard practice.
  llvm-svn: 317977
* [X86] Inline some SDNode operand multiclass operands that don't vary. NFC | Craig Topper | 2017-11-11 | 1 | -33/+28
  llvm-svn: 317975
* [X86] Set the execution domain for VFPCLASS to SSEPackedSingle/Double. | Craig Topper | 2017-11-11 | 1 | -1/+3
  llvm-svn: 317974
* [X86] Set the execution domain for vptest instruction to the integer domain. | Craig Topper | 2017-11-11 | 1 | -0/+3
  llvm-svn: 317973
* [X86] Give priority to EVEX FMA instructions over FMA4 instructions. | Craig Topper | 2017-11-09 | 1 | -2/+2
  No existing processor has both so it doesn't really matter what we do here. But we were previously just relying on pattern order which gave FMA4 priority.
  llvm-svn: 317775
* [X86] Add patterns to fold EVEX store with EVEX encoded vcvtps2ph instructions. Remove bad pattern that had v4f32 vcvtps2ph storing 128-bits. | Craig Topper | 2017-11-08 | 1 | -11/+23
  llvm-svn: 317662
* [X86] Add patterns to fold a 64-bit load into the EVEX vcvtph2ps instructions. | Craig Topper | 2017-11-07 | 1 | -7/+16
  llvm-svn: 317548
* [X86] Add support for using EVEX instructions for the legacy vcvtph2ps intrinsics. | Craig Topper | 2017-11-07 | 1 | -12/+13
  Looks like there's some missed load folding opportunities for i64 loads.
  llvm-svn: 317544
* [X86] Use IMPLICIT_DEF in VEX/EVEX vcvtss2sd/vcvtsd2ss patterns instead of a COPY_TO_REGCLASS. | Craig Topper | 2017-11-07 | 1 | -2/+2
  ExeDepsFix pass should take care of making the registers match.
  llvm-svn: 317542
* [X86] Make FeatureAVX512 imply FeatureF16C. | Craig Topper | 2017-11-06 | 1 | -29/+0
  The EVEX to VEX pass is already assuming this is true under AVX512VL. We had special patterns to use zmm instructions if VLX and F16C weren't available.
  Instead just make AVX512 imply F16C to make the EVEX to VEX behavior explicitly legal and remove the extra patterns.
  All known CPUs with AVX512 have F16C so this should be safe for now.
  llvm-svn: 317521
* [X86] Add scalar FMA ISD nodes without rounding mode. NFC | Craig Topper | 2017-11-06 | 1 | -23/+26
  Next step is to use them for the legacy FMA scalar intrinsics as well. This will enable the legacy intrinsics to use EVEX encoded opcodes and the extended registers.
  llvm-svn: 317453
* [X86] Use EVEX encoded instructions for legacy scalar sqrt intrinsics. | Craig Topper | 2017-11-06 | 1 | -5/+18
  Fixes PR35161.
  llvm-svn: 317445
* [X86] Add missing predicate to a pattern. NFC | Craig Topper | 2017-11-05 | 1 | -0/+2
  Other patterns had higher priority so this wasn't noticed. But we shouldn't be dependent on pattern order.
  llvm-svn: 317442
* [X86] Remove some more RCP and RSQRT patterns from InstrAVX512.td that I missed in r317413. | Craig Topper | 2017-11-05 | 1 | -13/+0
  llvm-svn: 317441
* [X86] Don't use RCP14 and RSQRT14 for reciprocal estimations or for legacy SSE rcp/rsqrt intrinsics when AVX512 features are enabled. | Craig Topper | 2017-11-04 | 1 | -6/+6
  Summary:
  AVX512 added RCP14 and RSQRT14 instructions which improve accuracy over the legacy RCP and RSQRT instructions, but not enough accuracy to remove the need for a Newton Raphson refinement.
  Currently we use these new instructions for the legacy packed SSE intrinsics, but not the scalar intrinsics. And we use them for fast math optimization of division and reciprocal sqrt.
  I think switching the legacy intrinsics may be surprising to the user since it changes the answer based on which processor you're using regardless of any fastmath settings. It's also weird that we did something different between scalar and packed.
  As far as the reciprocal estimation, I think it creates unnecessary deltas in our output behavior (and prevents EVEX->VEX). A little playing around with gcc and icc and godbolt suggests they don't change which instructions they use here.
  This patch adds new X86ISD nodes for the RCP14/RSQRT14 and uses those for the new intrinsics, leaving the old intrinsics to use the old instructions.
  Going forward I think our focus should be on
  - Supporting 512-bit vectors, which will have to use the RCP14/RSQRT14.
  - Using RSQRT28/RCP28 to remove the Newton Raphson step on processors with AVX512ER
  - Supporting double precision.
  Reviewers: zvi, DavidKreitzer, RKSimon
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D39583
  llvm-svn: 317413
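  The accuracy argument in the commit above can be illustrated with a small numeric sketch. It assumes a reciprocal estimate can be modeled as the exact value perturbed by a relative error of about 2^-14 (a simplification suggested by the RCP14 name, not the instruction's exact specification), and it applies one textbook Newton Raphson step. The helper names are hypothetical and nothing here calls the hardware instruction or LLVM.

```python
# Illustrative only: models a reciprocal estimate by its relative error.
# The 2**-14 figure is an assumption based on the RCP14 name; this code
# does not execute RCP14 or reproduce its exact results.

def approx_reciprocal(a: float, rel_error: float) -> float:
    """Hypothetical stand-in for a hardware estimate of 1/a."""
    return (1.0 / a) * (1.0 + rel_error)

def newton_raphson_step(a: float, x: float) -> float:
    """One Newton Raphson refinement of x toward 1/a: x' = x * (2 - a*x)."""
    return x * (2.0 - a * x)

def relative_error(a: float, x: float) -> float:
    """Relative error of x as an approximation of 1/a."""
    return abs(a * x - 1.0)

a = 3.0
x0 = approx_reciprocal(a, 2.0 ** -14)   # estimate with ~14 good bits
x1 = newton_raphson_step(a, x0)         # one step roughly squares the error

print(f"estimate error: {relative_error(a, x0):.3e}")
print(f"refined error:  {relative_error(a, x1):.3e}")
```

  Under this model the error after one step is the square of the starting error, so a 14-bit estimate still needs the refinement to reach single precision (24 significand bits), which is consistent with the commit's remark that the step cannot be dropped.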
* [X86] Remove PALIGNR/VALIGN handling from combineBitcastForMaskedOp and move to isel patterns instead. Prefer 128-bit VALIGND/VALIGNQ over PALIGNR during lowering when possible. | Craig Topper | 2017-11-03 | 1 | -0/+117
  llvm-svn: 317299
* [X86] Add more type qualifiers to INSERT_SUBREG operations in rotate patterns so they don't get created with a v64i8 type. | Craig Topper | 2017-11-01 | 1 | -8/+8
  Not sure why tablegen didn't error on this.
  Fixes PR35158.
  llvm-svn: 317079
* [AVX512] Adding new patterns for extract_subvector of vXi1 | Michael Zuckerman | 2017-10-31 | 1 | -14/+42
  Extracting a subvector of vXi1 from vYi1 is poorly supported by LLVM and most of the time ends in an assertion. This patch fixes the issue by adding new patterns to the TD file.
  Reviewers: 1. guyblank 2. igorb 3. zvi 4. ayman 5. craig.topper
  Differential Revision: https://reviews.llvm.org/D39292
  Change-Id: Ideb4d7e946c8d40cfce2920891f2d89fe64c58f8
  llvm-svn: 316981
* [X86][SSE] Remove AssertZext stage from PEXTRW/PEXTRB lowering. NFCI. | Simon Pilgrim | 2017-10-23 | 1 | -3/+2
  Remove AssertZext and instead add PEXTRW/PEXTRB support to computeKnownBitsForTargetNode to simplify instruction selection.
  Differential Revision: https://reviews.llvm.org/D39169
  llvm-svn: 316336
* Strip trailing whitespace. NFCI. | Simon Pilgrim | 2017-10-22 | 1 | -2/+2
  llvm-svn: 316296
* [X86] Add VEX_WIG to applicable AVX512 instructions. | Craig Topper | 2017-10-22 | 1 | -41/+43
  This should be NFC. Will be used in future patches to fix disassembler bugs.
  llvm-svn: 316284
* [AVX512] Don't mark EXTLOAD as legal with AVX512. Continue using custom lowering. | Craig Topper | 2017-10-15 | 1 | -40/+0
  Summary:
  This was impeding our ability to combine the extending shuffles with other shuffles as you can see from the test changes.
  There's one special case that needed to be added to use VZEXT directly for v8i8->v8i64 since the custom lowering requires v64i8.
  Reviewers: RKSimon, zvi, delena
  Reviewed By: delena
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D38714
  llvm-svn: 315860
* [X86] Add patterns for vzmovl+cvtpd2dq/cvttpd2dq with a load. | Craig Topper | 2017-10-14 | 1 | -0/+6
  llvm-svn: 315802
* [X86] Add patterns for vzmovl+cvtpd2ps with a load. | Craig Topper | 2017-10-14 | 1 | -4/+8
  llvm-svn: 315800