path: root/llvm/lib/Target/X86/X86InstrAVX512.td
Commit message | Author | Age | Files | Lines
...
* [X86] Change the masked FPCLASS implementation to use AND instead of OR to combine the mask results.
  Craig Topper | 2018-02-28 | 1 | -5/+5
  While the description for the instruction does mention OR, it's talking about how the individual classification test results are ORed together. The incoming mask is used as a zeroing write mask. If the bit is 1, the classification is written to the output. If the bit is 0, the output is 0. This is equivalent to an AND. Here is pseudocode from the intrinsics guide:
    FOR j := 0 to 1
      i := j*64
      IF k1[j]
        k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0])
      ELSE
        k[j] := 0
      FI
    ENDFOR
    k[MAX:2] := 0
  llvm-svn: 326306
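The intrinsics-guide pseudocode in this entry can be modeled in a few lines of Python. This is an illustrative sketch, not LLVM or Intel code: `check_fp_class` and `masked_fpclass_pd` are hypothetical names, and only four of the eight imm8 categories (bit 1 = +0, bit 2 = -0, bit 3 = +Inf, bit 4 = -Inf) are modeled. The point is that a clear write-mask bit forces the result bit to 0, so the masked result equals `k1 & unmasked_result` rather than `k1 | ...`.

```python
import math


def check_fp_class(x: float, imm8: int) -> int:
    """Hypothetical, partial stand-in for CheckFPClass_FP64.

    Only four imm8 categories are modeled: bit 1 = +0, bit 2 = -0,
    bit 3 = +Inf, bit 4 = -Inf. The selected tests are ORed together.
    """
    negative = math.copysign(1.0, x) < 0
    tests = {
        1: x == 0.0 and not negative,
        2: x == 0.0 and negative,
        3: math.isinf(x) and not negative,
        4: math.isinf(x) and negative,
    }
    return int(any(hit for bit, hit in tests.items() if imm8 >> bit & 1))


def masked_fpclass_pd(k1: int, a: list, imm8: int) -> int:
    """Two-lane VFPCLASSPD with k1 used as a zeroing write mask."""
    k = 0
    for j, x in enumerate(a):
        if k1 >> j & 1:  # mask bit set: the classification result is written
            k |= check_fp_class(x, imm8) << j
        # mask bit clear: k[j] stays 0, so k1 is never ORed into the result
    return k
```

For every `k1`, `masked_fpclass_pd(k1, a, imm8) == k1 & masked_fpclass_pd(0b11, a, imm8)`, which is the AND combination the commit switched to.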
* [X86] Use SDNode instead of SDPatternOperator. NFC
  Craig Topper | 2018-02-25 | 1 | -7/+7
  llvm-svn: 326048
* [X86] Remove checks for '(scalar_to_vector (i8 (trunc GR32:)))' from scalar masked move patterns.
  Craig Topper | 2018-02-24 | 1 | -4/+4
  This portion can be matched by other patterns. We don't need it to make the larger pattern valid. It's sufficient to have a v1i1 mask input without caring where it came from.
  llvm-svn: 325999
* [X86] Add assembler/disassembler support for blendm with zero masking and broadcast.
  Craig Topper | 2018-02-23 | 1 | -0/+8
  Fixes PR31617
  llvm-svn: 325957
* [X86] Add DAG combine to remove (and X, 1) from in front of a v1i1 scalar to vector.
  Craig Topper | 2018-02-23 | 1 | -4/+4
  These can be created by type legalization promoting the inputs to select to match scalar boolean contents. We were trying to pattern match them away during isel, but it's better to just remove them from the DAG. I've cleaned up some patterns to not check for this 'and' anymore, but I suspect this has also opened up opportunities for pattern removal.
  llvm-svn: 325949
* [X86] Make a helper function for commuting AVX512 VPCMP immediates since we do it in two places.
  Craig Topper | 2018-02-20 | 1 | -12/+1
  llvm-svn: 325546
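The commit does not show the helper, so here is a hedged sketch of what commuting a VPCMP immediate involves. It assumes the standard AVX-512 predicate encoding in imm8[2:0] (0 EQ, 1 LT, 2 LE, 3 FALSE, 4 NE, 5 NLT, 6 NLE, 7 TRUE); `swap_vpcmp_imm` and `vpcmp` are hypothetical names, not the LLVM helper.

```python
# AVX-512 VPCMP predicate immediates (imm8[2:0]).
EQ, LT, LE, FALSE, NE, NLT, NLE, TRUE = range(8)


def swap_vpcmp_imm(imm: int) -> int:
    """Predicate that gives the same result with the operands swapped.

    Hypothetical helper: vpcmp(a, b, imm) == vpcmp(b, a, swap_vpcmp_imm(imm)).
    """
    imm &= 7
    swapped = {LT: NLE, LE: NLT, NLT: LE, NLE: LT}
    return swapped.get(imm, imm)  # EQ, NE, FALSE and TRUE are symmetric


def vpcmp(a: int, b: int, imm: int) -> bool:
    """Scalar model of one VPCMP lane (the same table serves signed and unsigned)."""
    return {
        EQ: a == b, LT: a < b, LE: a <= b, FALSE: False,
        NE: a != b, NLT: not a < b, NLE: not a <= b, TRUE: True,
    }[imm & 7]
```

Only the four ordered predicates change: a < b is the same as b > a, which VPCMP spells NLE.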
* [X86] Reduce the number of isel pattern variations needed for VPTESTM/VPTESTNM matching.
  Craig Topper | 2018-02-19 | 1 | -16/+24
  Canonicalize EQ/NE PCMPM to have build vector all zeros on the RHS so we don't have to pattern match it in both locations. This significantly reduces the number of isel patterns needed since we also had to multiply it out with loads being in either operand of the 'and' input node and in the 'and' masking node. This removes over 24000 bytes from the isel table.
  llvm-svn: 325526
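A toy model of why moving the all-zeros operand to the RHS halves the patterns. Nodes are plain tuples and `canonicalize`, `select`, and `vptestm` are hypothetical stand-ins, not SelectionDAG APIs; the model also omits the load-folding and masking variants the commit mentions. VPTESTM sets a mask bit when `a & b` is non-zero and VPTESTNM when it is zero, which is why NE and EQ map to them.

```python
ZEROS = ("zeros",)  # stands in for an all-zeros build_vector


def canonicalize(node):
    """Move an all-zeros operand of an EQ/NE compare to the RHS.

    EQ and NE are symmetric, so swapping their operands never changes the result.
    """
    op, cc, lhs, rhs = node
    if op == "pcmpm" and cc in ("eq", "ne") and lhs == ZEROS:
        return (op, cc, rhs, lhs)
    return node


def select(node):
    """After canonicalization, one rule per predicate covers both operand orders."""
    op, cc, lhs, rhs = canonicalize(node)
    if op == "pcmpm" and cc in ("eq", "ne") and rhs == ZEROS and lhs[0] == "and":
        # (a & b) != 0 is VPTESTM; (a & b) == 0 is VPTESTNM.
        return ("vptestm" if cc == "ne" else "vptestnm", lhs[1], lhs[2])
    return (op, cc, lhs, rhs)


def vptestm(a, b):
    """Lane semantics: mask bit j is set when a[j] & b[j] is non-zero."""
    return sum(((x & y) != 0) << j for j, (x, y) in enumerate(zip(a, b)))
```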
* [X86] Make masked pcmpeq commutable during isel so we can fold loads in the other operand to the shorter encoding.
  Craig Topper | 2018-02-18 | 1 | -2/+2
  Previously we used the immediate encoding if the load was in operand 0 and the short encoding if the load was in operand 1. This added an insane number of bytes to the size of the isel table. I'm wondering if we should always use the immediate form during isel and change to the short form during emission. This would remove the need to pattern match every combination for both the immediate form and the short form during isel. We could do the same with vpcmpgt.
  llvm-svn: 325456
* [X86][AVX512] Add missing scheduling class tag for KMOVB/KMOVW/KMOVD/KMOVQ moves/loads/stores.
  Simon Pilgrim | 2018-02-12 | 1 | -3/+5
  We only tagged it with the itinerary class, so completeness checks were erroneously passed (PR35639).
  llvm-svn: 324905
* [X86][AVX512] Add missing scheduling class tag for VMOVQ/VMOVHLPS/VMOVLHPS/VMOVHPD/VMOVHPS/VMOVLPD/VMOVLPS
  Simon Pilgrim | 2018-02-12 | 1 | -3/+7
  Tag AVX512 variants to match SSE/AVX originals. We only tagged it with the itinerary class, so completeness checks were erroneously passed (PR35639).
  llvm-svn: 324901
* [X86] Add KADD X86ISD opcode instead of reusing ISD::ADD.
  Craig Topper | 2018-02-12 | 1 | -1/+1
  ISD::ADD implies individual vector element addition with no carries between elements. But for a vXi1 type that would be the same as XOR. And we already turn ISD::ADD into ISD::XOR for all vXi1 types during lowering. So the ISD::ADD pattern would never be able to match anyway. KADD is different: it adds the elements but also propagates a carry between them. This is just a way of doing an add in a k-register without bitcasting to the scalar domain. There's still no way to match the pattern, but at least it's not obviously wrong.
  llvm-svn: 324861
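The distinction the commit draws can be checked with two small Python functions that treat an n-lane vXi1 value as an n-bit integer. `add_v_i1` and `kadd` are illustrative names: lane-wise ISD::ADD on 1-bit lanes collapses to XOR, while KADD carries between bits like an ordinary integer add.

```python
def add_v_i1(a: int, b: int, n: int) -> int:
    """ISD::ADD on vNi1: each 1-bit lane wraps on its own, which is XOR."""
    result = 0
    for j in range(n):
        lane = ((a >> j & 1) + (b >> j & 1)) & 1  # the lane's carry-out is dropped
        result |= lane << j
    return result


def kadd(a: int, b: int, n: int) -> int:
    """KADD: the n mask bits are added as one n-bit integer, so carries propagate."""
    return (a + b) & ((1 << n) - 1)
```

For example, with n = 4, `add_v_i1(0b0011, 0b0001, 4)` gives 0b0010 while `kadd(0b0011, 0b0001, 4)` gives 0b0100.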
* [X86] Add isel patterns for selecting masked SUBV_BROADCAST with bitcasts.
  Craig Topper | 2018-02-05 | 1 | -0/+107
  Remove combineBitcastForMaskedOp. Add test cases for the merge masked versions to make sure we have all those covered.
  llvm-svn: 324210
* [X86] Remove X86ISD::SHUF128 from combineBitcastForMaskedOp. Use isel patterns instead.
  Craig Topper | 2018-02-05 | 1 | -11/+46
  We always created X86ISD::SHUF128 with a 64-bit element type so we can use isel patterns to detect a bitconvert to 32-bit to handle masking. The test changes are because we also match the bitconvert even if there is no masking. This leads to an unnecessary isel pattern, but getting rid of it requires more multiclass hackery in tablegen.
  llvm-svn: 324205
* [X86] Use VMOVDQA64 for aligned vXi32 stores.
  Craig Topper | 2018-01-29 | 1 | -3/+12
  I meant to do this with the unaligned stores in r322820, but it looks like I missed it.
  llvm-svn: 323708
* [X86] Remove VPTESTM/VPTESTNM ISD opcodes. Use isel patterns matching cmpm eq/ne with immallzeros.
  Craig Topper | 2018-01-28 | 1 | -31/+68
  llvm-svn: 323612
* [X86] Add patterns for using masked vptestnmd for 256-bit vectors without VLX.
  Craig Topper | 2018-01-27 | 1 | -13/+24
  We can widen the mask and extract it back down.
  llvm-svn: 323610
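A sketch of the widen-and-extract idea, assuming 32-bit lanes held as Python ints; `vptestnmd_512` and `vptestnmd_256_without_vlx` are hypothetical names, and masking with 0xFF stands in for however the real patterns narrow the mask. The upper eight lanes of the widened operands are undefined, but they only affect the upper eight mask bits, which are discarded.

```python
def vptestnmd_512(a, b):
    """16-lane VPTESTNMD: mask bit j is set when a[j] & b[j] == 0."""
    assert len(a) == len(b) == 16
    return sum(((x & y) == 0) << j for j, (x, y) in enumerate(zip(a, b)))


def vptestnmd_256_without_vlx(a, b):
    """8-lane (256-bit) VPTESTNMD emulated with the 512-bit instruction."""
    assert len(a) == len(b) == 8
    undef = [0xDEADBEEF] * 8  # upper lanes are undefined; any value works
    wide_mask = vptestnmd_512(a + undef, b + undef)
    return wide_mask & 0xFF  # keep only the v8i1 part of the v16i1 result
```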
* [X86] Remove X86ISD::PCMPGTM/PCMPEQM and instead just use X86ISD::PCMPM and pattern match the immediate value during isel.
  Craig Topper | 2018-01-27 | 1 | -7/+12
  Legalization is still biased to turn LT compares into GT by swapping operands to avoid needing extra isel patterns to commute. I'm hoping to remove TESTM/TESTNM next, and this should simplify that by making EQ/NE more similar.
  llvm-svn: 323604
* [X86] Use vpternlog to implement vector not under AVX512.
  Craig Topper | 2018-01-26 | 1 | -0/+36
  Previously we had to materialize all 1s in a register using vpternlog or pcmpeq and then xor with that. By using vpternlog directly we can do it in one operation. This is implemented using isel patterns, but we should maybe consider creating a generalized vpternlog combiner.
  llvm-svn: 323572
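VPTERNLOG evaluates an arbitrary three-input boolean function bit by bit, using imm8 as an 8-entry truth table indexed by `(a << 2) | (b << 1) | c`, where `a` is the first (destination) operand. In this sketch `vpternlog` is a hypothetical one-lane model. The commit does not state the immediate it uses; 0x0F (output 1 exactly when the first operand's bit is 0) is one truth table that gives NOT, and 0xFF gives the all-ones value that previously had to be materialized separately.

```python
def vpternlog(a: int, b: int, c: int, imm8: int, width: int = 64) -> int:
    """Bitwise model of one VPTERNLOG lane.

    For each bit position, the bits of a, b and c form the index
    (a << 2) | (b << 1) | c into the 8-entry truth table imm8.
    """
    out = 0
    for i in range(width):
        idx = (a >> i & 1) << 2 | (b >> i & 1) << 1 | (c >> i & 1)
        out |= (imm8 >> idx & 1) << i
    return out
```

With all three operands set to the same register, only table entries 0 and 7 are ever consulted, so NOT needs no second register.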
* [X86] Fix some inconsistencies in the itineraries and Sched for (V)PEXTRW/(V)PINSRW
  Craig Topper | 2018-01-24 | 1 | -2/+2
  The weirdest being that PEXTRWrr was tagged as a memory operation.
  llvm-svn: 323353
* [X86] Use ISD::SIGN_EXTEND instead of X86ISD::VSEXT for mask to xmm/ymm/zmm conversion
  Craig Topper | 2018-01-24 | 1 | -1/+11
  There are a couple of tricky things with this patch.
  I had to add an override of isVectorLoadExtDesirable to stop DAG combine from combining sign_extend with loads after legalization since we legalize sextload using a load+sign_extend. Overriding this hook actually prevents a lot of sextloads from being created in the first place.
  I also had to add isel patterns because DAG combine blindly combines sign_extend+truncate to a smaller sign_extend, which defeats what legalization was trying to do.
  Differential Revision: https://reviews.llvm.org/D42407
  llvm-svn: 323301
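Sign-extending an i1 lane yields either 0 or all-ones, which is exactly the vector form a mask needs. A minimal sketch with hypothetical names, treating each element as an unsigned `elt_bits`-wide integer:

```python
def sign_extend_mask(mask: int, lanes: int, elt_bits: int) -> list:
    """Model of (sign_extend vNi1 -> vNiM): each mask bit is replicated across
    its element, so a set bit becomes all-ones (-1 when read as signed)."""
    all_ones = (1 << elt_bits) - 1
    return [all_ones if mask >> j & 1 else 0 for j in range(lanes)]


def to_signed(value: int, bits: int) -> int:
    """Reinterpret an unsigned bits-wide value as two's complement."""
    return value - (1 << bits) if value >> (bits - 1) & 1 else value
```

Truncating each element back to its low bit recovers the original mask, which is the round trip the commit wants DAG combine not to fold away.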
* [X86] Move 'Int_' to the end of the name of the VCOMISS/VUCOMISS and instructions to get them picked up by the scheduler model regexs.
  Craig Topper | 2018-01-23 | 1 | -13/+13
  All other intrinsic instructions put the _Int on the end. This makes these instructions consistent and gets the prefix instregexs in the scheduler models to pick them up.
  llvm-svn: 323261
* [X86] Various vXi1 insertion improvements.
  Craig Topper | 2018-01-23 | 1 | -12/+5
  Add missing patterns for inserting v1i1 into a zero vector. Use insert_subvector to zero upper bits before inserting an element into a vXi1 vector. Replace kshift based isel pattern with insert_subvector based pattern now that code that caused the pattern has been fixed to emit insert_subvector.
  llvm-svn: 323173
* Separate ExecutionDepsFix into 4 parts:
  Marina Yatsina | 2018-01-22 | 1 | -1/+1
  1. ReachingDefsAnalysis - Identifies, for each instruction, the “closest” reaching def of a given register. Used by BreakFalseDeps (for clearance calculation) and ExecutionDomainFix (for arbitrating conflicting domains).
  2. ExecutionDomainFix - Changes the variant of the instructions in order to minimize domain crossings.
  3. BreakFalseDeps - Breaks false dependencies.
  4. LoopTraversal - Creates a traversal order of the basic blocks that is optimal for loops (introduced in revision L293571). Both ExecutionDomainFix and ReachingDefsAnalysis use this to determine the order they will traverse the basic blocks.
  This also included the following changes to the original ExecutionDepsFix logic:
  1. BreakFalseDeps and ReachingDefsAnalysis logic is no longer restricted by a register class.
  2. ReachingDefsAnalysis tracks liveness of reg units instead of reg indices into a given reg class.
  Additional changes in affected files:
  1. X86 and ARM targets now inherit from ExecutionDomainFix instead of ExecutionDepsFix. BreakFalseDeps was also added to the passes they activate.
  2. Comments and references to ExecutionDepsFix were replaced with ExecutionDomainFix and BreakFalseDeps, as appropriate.
  Additional refactoring changes will follow. This commit is (almost) NFC. The only functional change is that now BreakFalseDeps will break dependency for all register classes. Since no additional instructions were added to the list of instructions that have false dependencies, there is no actual change yet. In a future commit several instructions (and tests) will be added.
  This is the first of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869. Most of the patches are intended to refactor the existing code.
  Additional relevant reviews: https://reviews.llvm.org/D40331 https://reviews.llvm.org/D40332 https://reviews.llvm.org/D40333 https://reviews.llvm.org/D40334
  Differential Revision: https://reviews.llvm.org/D40330
  Change-Id: Icaeb75e014eff96a8f721377783f9a3e6c679275
  llvm-svn: 323087
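A minimal sketch of the "closest reaching def" and clearance ideas for a single straight-line basic block. It is not the ReachingDefsAnalysis implementation: `Instr`, `reaching_defs`, and `clearance` are hypothetical, registers are plain names rather than reg units, and there is no LoopTraversal across blocks.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Instr:
    """Hypothetical instruction record: a name and the registers it writes."""
    name: str
    defs: Set[str] = field(default_factory=set)


def reaching_defs(block: List[Instr], reg: str) -> List[Optional[int]]:
    """Index of the closest earlier def of `reg` for each instruction (None if none)."""
    result: List[Optional[int]] = []
    last: Optional[int] = None
    for i, instr in enumerate(block):
        result.append(last)  # the def that reaches instr i before it executes
        if reg in instr.defs:
            last = i
    return result


def clearance(block: List[Instr], reg: str) -> List[Optional[int]]:
    """Number of instructions since `reg` was last written (None if never in this block)."""
    return [None if d is None else i - d
            for i, d in enumerate(reaching_defs(block, reg))]
```

A small clearance means a recent write to the register, which is the quantity the entry says BreakFalseDeps consults when deciding whether to break a false dependency.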
* [X86] Use vmovdqu64/vmovdqa64 for unmasked integer vector stores for consistency with loads.
  Craig Topper | 2018-01-18 | 1 | -15/+21
  Previously we used 64 for vXi64 stores and 32 for everything else. This change uses 64 for everything, just like we do for loads.
  llvm-svn: 322820
* [X86] Remove isel patterns for using unmasked vmovdqa32/vmovdqu32 for integer vector loads.
  Craig Topper | 2018-01-18 | 1 | -9/+10
  These patterns were just looking for a vXi64 bitcasted to vXi32, but there is no advantage to using vmovdqa32 over vmovdqa64.
  llvm-svn: 322819
* [X86] Add missing predicates for VRNDSCALES{D,S}{m,r}
  Clement Courbet | 2018-01-15 | 1 | -1/+1
  Summary: This is similar to https://reviews.llvm.org/D41983.
  Reviewers: gchatelet
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D42069
  llvm-svn: 322486
* [X86] Fix missing HasAVX512 predicates in avx512_sqrt_scalar.
  Clement Courbet | 2018-01-15 | 1 | -38/+39
  Summary: For example, VSQRTSDZr and VSQRTSSZr were missing the predicate. Also fix indentation and braces for consistency.
  Reviewers: craig.topper, RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D41983
  llvm-svn: 322478
* [X86] Use ISD::TRUNCATE instead of X86ISD::VTRUNC when input and output types have the same number of elements.
  Craig Topper | 2018-01-14 | 1 | -48/+56
  llvm-svn: 322455
* [X86] Improve legalization of vXi16/vXi8 selects.
  Craig Topper | 2018-01-14 | 1 | -0/+8
  Extend vXi1 conditions of vXi8/vXi16 selects even before type legalization gets a chance to split wide vectors. Previously we would only extend 128 and 256 bit vectors. But if we start with a 512 bit vector or wider that needs to be split, we wouldn't extend until after the split had taken place. By extending early we improve the results of type legalization.
  Don't widen the condition of 128/256 bit vXi16/vXi8 selects when we have BWI but not VLX. We can still use a mask register by widening the select to 512-bits instead. This is similar to what we do for compares already.
  llvm-svn: 322450
* [X86] Remove unused isel pattern for zero extend from v16i1/v8i1 to v16i32/v8i64.
  Craig Topper | 2018-01-12 | 1 | -5/+0
  We have custom lowering on vzext that produces a vselect and a build vector. So zext never gets to isel.
  llvm-svn: 322381
* [X86] Optimize v2i32/v2f32 scatters.
  Craig Topper | 2018-01-11 | 1 | -6/+8
  If the index is v2i64 we can use the scatter instruction that has a v4i32/v4f32 data register, v2i64 index, and v2i1 mask. The same was already done for gather.
  Implement custom widening for v2i32 data to remove the code that reverses type legalization during lowering.
  llvm-svn: 322254
* [X86] Remove unnecessary isel pattern that is a combination of two other patterns.
  Craig Topper | 2018-01-09 | 1 | -2/+0
  The pattern was this:
    def : Pat<(i32 (zext (i8 (bitconvert (v8i1 VK8:$src))))),
              (MOVZX32rr8 (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS VK8:$src, GR32)), sub_8bit))>,
          Requires<[NoDQI]>;
  But if you just let (i32 (zext X)) match by itself you'll get MOVZX32rr8. And if you let (i8 (bitconvert (v8i1 VK8:$src))) match by itself you'll get (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS VK8:$src, GR32)), sub_8bit). So we can just let isel do the two patterns naturally.
  llvm-svn: 322049
* [X86] Replace CVT2MASK ISD opcode with PCMPGTM compared to zero.
  Craig Topper | 2018-01-08 | 1 | -2/+2
  CVT2MASK is just checking the sign bit, which can be represented with a comparison with zero.
  llvm-svn: 321985
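The equivalence is that a lane's sign bit is set exactly when `0 > x` under a signed compare. `cvt2mask` and `pcmpgtm` are illustrative names; `cvt2mask` models the sign-bit extraction that the VPMOV*2M instructions perform, and `pcmpgtm` a signed greater-than compare that writes a mask.

```python
def cvt2mask(v: list) -> int:
    """Mask bit j is the sign bit of signed element j."""
    return sum((x < 0) << j for j, x in enumerate(v))


def pcmpgtm(a: list, b: list) -> int:
    """Signed greater-than compare producing a mask: bit j = a[j] > b[j]."""
    return sum((x > y) << j for j, (x, y) in enumerate(zip(a, b)))
```

So `cvt2mask(v) == pcmpgtm([0] * len(v), v)`, with zero as the first operand.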
* [X86] Add patterns to allow 512-bit BWI compare instructions to be used for 128/256-bit compares when VLX is not available.
  Craig Topper | 2018-01-08 | 1 | -0/+26
  llvm-svn: 321984
* [X86] Make v2i1 and v4i1 legal types without VLX
  Craig Topper | 2018-01-07 | 1 | -31/+69
  Summary: There are a few oddities that occur due to v1i1, v8i1, v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX, particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since we need to fill the upper bits of the mask, we have to fill with 0s at the promoted type.
  It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway; everything is done on a larger register.
  This also fixes an issue that we couldn't implement a masked vextractf32x4 from zmm to xmm properly.
  We now have to support widening more compares to 512-bit to get a mask result out, so new tablegen patterns got added.
  I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try to create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then extract that and don't sign extend it at all.
  There's definitely room for improvement with some follow up patches.
  Reviewers: RKSimon, zvi, guyblank
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D41560
  llvm-svn: 321967
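Why the upper lanes must be zero when a v2i1 mask is widened for a 512-bit masked operation can be seen in a small model. `widen_mask_with_zeros` and `masked_store` are hypothetical names, and a k-register is modeled as an int whose bits above the narrow mask may hold junk. With the upper bits cleared, the widened store touches only the original two lanes; a stray set bit would index past the two-element buffer and raise IndexError.

```python
def widen_mask_with_zeros(kreg: int, narrow_lanes: int) -> int:
    """Widen a v2i1/v4i1 mask held in a k-register to a wider mask type by
    clearing every bit above the narrow mask (those bits may hold junk)."""
    return kreg & ((1 << narrow_lanes) - 1)


def masked_store(memory: list, data: list, mask: int) -> None:
    """Store lane j of data to memory[j] only when mask bit j is set."""
    for j, x in enumerate(data):
        if mask >> j & 1:
            memory[j] = x
```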
* [X86] Remove memory forms of EVEX encoded vcvttss2si/vcvttsd2si from asm matcher table.
  Craig Topper | 2018-01-06 | 1 | -7/+20
  This is also needed to fix PR35837.
  llvm-svn: 321946
* [X86] Add load folding pattern to EVEX vcvttss2si/vcvtsd2si.
  Craig Topper | 2018-01-06 | 1 | -6/+7
  llvm-svn: 321945
* [X86] Remove an unnecessary VCVTTSD2SIrrb/VCVTSS2SIrrb instruction with no isel pattern that only existed for the assembler.
  Craig Topper | 2018-01-06 | 1 | -27/+23
  Use VCVTTSD2SIrrb_Int instead. For consistency use the _Int version of VCVTTSD2SIrr_Int and VCVTTSD2SIrm_Int for the assembler as well.
  llvm-svn: 321944
* [X86] Remove memory forms of EVEX encoded vcvtsd2si/vcvtss2si from the assembler matcher table
  Craig Topper | 2018-01-06 | 1 | -5/+17
  We should always prefer the VEX encoded version of these instructions. There is no advantage to the EVEX version.
  Fixes PR35837.
  llvm-svn: 321939
* [X86] Rename the EVEX encoded GFNI instructions to start with a 'V'. NFC
  Craig Topper | 2018-01-06 | 1 | -8/+8
  This makes the names consistent with the mnemonics like every other instruction.
  llvm-svn: 321931
* [X86] Add vcvtsd2sil/vcvtsd2siq etc. InstAliases to the EVEX-encoded instructions.
  Craig Topper | 2018-01-05 | 1 | -10/+18
  This matches their VEX equivalents.
  llvm-svn: 321912
* [X86] Add InstAliases for 'vmovd' with GR64 registers to select EVEX encoded instructions as well.
  Craig Topper | 2018-01-05 | 1 | -0/+6
  Without this we allow "vmovd %rax, %xmm0", but not "vmovd %rax, %xmm16".
  This alias exists for compatibility with a silly bug where really old versions of the GNU assembler required movd instead of movq on these instructions. This compatibility hack then crept forward to the AVX version too, but we didn't propagate it to AVX512.
  llvm-svn: 321903
* [X86] Add missing NoVLX predicate around some patterns that use zmm registers to implement 128/256-bit operations without VLX.
  Craig Topper | 2018-01-01 | 1 | -1/+1
  llvm-svn: 321613
* [X86] Add patterns for using zmm registers for v8i32/v8f32 vselect with the false input being zero.
  Craig Topper | 2018-01-01 | 1 | -19/+24
  We can use zmm move with zero masking for this. We already had patterns for using a masked move, but we didn't check for the zero masking case separately.
  llvm-svn: 321612
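A `vselect` whose false operand is all zeros behaves exactly like a zero-masking move, so no zero vector needs to be materialized for it. Hypothetical list-based models of both operations, where `passthru` stands for the old destination contents that merge masking would keep:

```python
def vselect(mask: int, t: list, f: list) -> list:
    """Lane j takes t[j] when mask bit j is set, otherwise f[j]."""
    return [x if mask >> j & 1 else y for j, (x, y) in enumerate(zip(t, f))]


def masked_move(src: list, passthru: list, mask: int, zeroing: bool) -> list:
    """Masked vector move: lanes with a set mask bit take src.

    Clear lanes keep passthru (merge masking) or become 0 (zero masking).
    """
    return [x if mask >> j & 1 else (0 if zeroing else p)
            for j, (x, p) in enumerate(zip(src, passthru))]
```

Only the zero-masking form matches `vselect(mask, t, zeros)` regardless of what the destination register held before.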
* [X86] Prevent combining (v8i1 (bitconvert (i8 load)))->(v8i1 load) if we don't have DQI.
  Craig Topper | 2017-12-31 | 1 | -0/+2
  We end up using an i8 load via an isel pattern from v8i1 anyway. This just makes it more explicit. This seems to improve codegen in some cases and I'd like to kill off some of the load patterns.
  llvm-svn: 321598
* [X86] Remove patterns for load/store of vXi with bitcasts to/from integer.
  Craig Topper | 2017-12-31 | 1 | -19/+0
  This is better handled by a DAG combine if it's not already being done. No lit tests fail from the removal of these patterns.
  llvm-svn: 321597
* [X86] Remove AND32ri8 from pattern for v1i1 load.
  Craig Topper | 2017-12-31 | 1 | -1/+1
  I don't think anything would actually expect the other bits to be zero.
  llvm-svn: 321596
* [X86] Remove isel patterns for kshifts with types that don't support kshift natively.
  Craig Topper | 2017-12-30 | 1 | -17/+0
  We should only be creating natively supported kshifts now.
  llvm-svn: 321577
* [X86] Custom legalize vXi1 extract_subvector with KSHIFTR.
  Craig Topper | 2017-12-30 | 1 | -43/+0
  This allows us to remove some isel patterns. This is mostly NFC, but we now use KSHIFTB instead of KSHIFTW with DQI.
  llvm-svn: 321576
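Extracting a subvector from a mask amounts to shifting the wanted lanes down to bit 0 and reading the result as the narrower type. A sketch with a hypothetical function name, assuming `idx` is a multiple of the result width. KSHIFTRB itself requires DQI (KSHIFTRW needs only AVX512F), which fits the KSHIFTB-instead-of-KSHIFTW note in this entry.

```python
def extract_subvector_mask(mask: int, src_lanes: int, dst_lanes: int, idx: int) -> int:
    """extract_subvector on a vXi1 mask, modeled as KSHIFTR by idx.

    After the right shift the wanted lanes sit in the low dst_lanes bits;
    the narrower result type simply ignores everything above them.
    """
    assert idx % dst_lanes == 0 and idx + dst_lanes <= src_lanes
    shifted = mask >> idx                    # KSHIFTR by idx lanes
    return shifted & ((1 << dst_lanes) - 1)  # view as v{dst_lanes}i1
```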
* [X86] Remove unnecessary patterns for sign extending vXi1 without VLX.
  Craig Topper | 2017-12-28 | 1 | -16/+0
  The custom lowering already widens the result type to 512-bits if VLX isn't supported.
  llvm-svn: 321533