summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86/X86InstrAVX512.td
Commit message (Collapse)AuthorAgeFilesLines
...
* [AVX512] Use alignedstore256 in a pattern that's emitting a 256-bit movaps ↵Craig Topper2017-08-191-2/+2
| | | | | | from an extract subvector operation. llvm-svn: 311263
* [AVX512] Don't switch unmasked subvector insert/extract instructions when ↵Craig Topper2017-08-171-36/+91
| | | | | | | | | | | | | | AVX512DQI is enabled. There's no reason to switch instructions with and without DQI. It just creates extra isel patterns and test divergences. There is however value in enabling the masked version of the instructions with DQI. This required introducing some new multiclasses to enabling this splitting. Differential Revision: https://reviews.llvm.org/D36661 llvm-svn: 311091
* [X86] Remove patterns for PALIGNR with non-vXi8 types.Craig Topper2017-08-171-18/+0
| | | | llvm-svn: 311058
* [X86] Put multiclass closer to its use and simplify slightly. NFCCraig Topper2017-08-161-10/+11
| | | | llvm-svn: 311055
* [AVX512] Make the itinerary parameter actually pass through the the ↵Craig Topper2017-08-141-2/+2
| | | | | | | | | | | | | | | | AVX512_maskable_common multiclass Summary: This looks to have been disconnected about 3 years ago in r219358. Reviewers: gadi.haber, RKSimon, zvi Reviewed By: gadi.haber Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36658 llvm-svn: 310844
* [AVX-512] Add hasSideEffects = 0 to the 8-bit and 16-bit register broadcasts.Craig Topper2017-08-141-1/+1
| | | | llvm-svn: 310813
* [X86] Remove unused argument from the vextract_for_size multiclass. NFCCraig Topper2017-08-141-14/+7
| | | | llvm-svn: 310812
* [AVX512] Remove comment I should have removed in r310808. NFCCraig Topper2017-08-141-3/+0
| | | | llvm-svn: 310811
* [AVX512] Simplify the instruction defintion for VEXTRACT. NFCICraig Topper2017-08-141-33/+16
| | | | | | The comment about why we couldn't use avx512_maskable appears to have been incorrect. llvm-svn: 310808
* [X86][AVX512] Choose correct registers in vpbroadcastb/wGuy Blank2017-08-091-12/+43
| | | | | | | | | | | | Fixes the vpbroadcastb/w instructions which use GPRs as source operands, to use the correct registers. The full GPR should be used, and not the subregister, as it happens before the patch. Fixes pr33795 Differential Revision: https://reviews.llvm.org/D36479 llvm-svn: 310498
* [AVX-512] Don't use unmasked VMOVDQU8/16 for 8-bit or 16-bit element stores ↵Craig Topper2017-08-011-13/+29
| | | | | | | | | | | | | | even when BWI instructions are supported. Always use VMOVDQA32/VMOVDQU32. We were already using the 32 bit element opcode if BWI isn't enabled, but there's no reason to change opcode if we have BWI. We will still use the 8/16 opcodes for masked stores though. This allows us to use the aligned opcode when we can which makes our test output more consistent between different modes. It also reduces the number of isel patterns we need. This is a slight inconsistency with loads which default to 64 bit element opcodes. I'll probably rectify that in a future patch. Differential Revision: https://reviews.llvm.org/D35978 llvm-svn: 309693
* [AVX-512] Remove patterns that select vmovdqu8/16 for unmasked loads. Prefer ↵Craig Topper2017-07-311-11/+18
| | | | | | | | | | | | | | vmovdqa64/vmovdqu64 instead. These were taking priority over the aligned load instructions since there is no vmovda8/16. I don't think there is really a difference between aligned and unaligned on newer cpus so I don't think it matters which instructions we use. But with this change we reduce the size of the isel table a little and we allow the aligned information to pass through to the evex->vec pass and produce the same output has avx/avx2 in some cases. I also generally dislike patterns rooted in a bitcast which these were. Differential Revision: https://reviews.llvm.org/D35977 llvm-svn: 309589
* [X86][AVX512] Add masked MOVS[S|D] patternsGuy Blank2017-07-311-0/+16
| | | | | | | | | | | Added patterns to recognize AND 1 on the mask of a scalar masked move is not needed since only the lower bit is relevant for the instruction. Differential Revision: https://reviews.llvm.org/D35897 llvm-svn: 309546
* [X86][AVX512] Add patterns for masked AVX512 floating point compare ↵Ayman Musa2017-07-241-1/+52
| | | | | | | | | | | instructions that were missing. patterns were missed by D33188. Adding for completion. +Updating test. Differential Revesion: https://reviews.llvm.org/D35179 llvm-svn: 308868
* [X86] Add some hasSideEffects=0 flags.Craig Topper2017-07-231-0/+1
| | | | llvm-svn: 308835
* [AVX-512] Fix a bug that prevented some non-temporal loads from using the ↵Craig Topper2017-07-211-3/+3
| | | | | | | | movntdqa instruction. The bitconverts here had an input type of 128-bits and an output type of 256 bits. The input type should also have been 256 bits. llvm-svn: 308702
* [X86][AVX512] Add lowering of vXi32/vXi64 ISD::ROTL/ISD::ROTRSimon Pilgrim2017-07-171-0/+103
| | | | | | | | Add support for lowering to ISD::ROTL/ISD::ROTR, including rotate by immediate Differential Revision: https://reviews.llvm.org/D35463 llvm-svn: 308177
* Strip trailing whitespace. NFCISimon Pilgrim2017-07-161-64/+64
| | | | llvm-svn: 308143
* Recommitting rL305465 after fixing bug in TableGen in rL306251 & rL306371Ayman Musa2017-06-271-25/+620
| | | | | | | | | | | | | | | [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions). AVX512 compare instructions return v*i1 types. In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type. Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes. The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class. When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction. Differential Revision: https://reviews.llvm.org/D33188 llvm-svn: 306402
* [X86][AVX-512] Don't raise inexact in ceil, floor, round, trunc.Ahmed Bougacha2017-06-261-12/+12
| | | | | | | | | | | | The non-AVX-512 behavior was changed in r248266 to match N1778 (C bindings for IEEE-754 (2008)), which defined the four functions to not raise the inexact exception ("rint" is still defined as raising it). Update the AVX-512 lowering of these functions to match that: it should not be different. llvm-svn: 306299
* AVX-512: Lowering Masked Gather intrinsic - fixed a bugElena Demikhovsky2017-06-221-1/+1
| | | | | | | | | | | | Masked gather for vector length 2 is lowered incorrectly for element type i32. The type <2 x i32> was automatically extended to <2 x i64> and we generated VPGATHERQQ instead of VPGATHERQD. The type <2 x float> is extended to <4 x float>, so there is no bug for this type, but the sequence may be more optimal. In this patch I'm fixing <2 x i32>bug and optimizing <2 x float> sequence for GATHERs only. The same fix should be done for Scatters as well. Differential revision: https://reviews.llvm.org/D34343 llvm-svn: 305987
* Revert r305465: [X86][AVX512] Improve lowering of AVX512 compare intrinsics ↵Simon Pilgrim2017-06-151-620/+25
| | | | | | | | (remove redundant shift left+right instructions). This is causing windows buildbot failures llvm-svn: 305470
* [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove ↵Ayman Musa2017-06-151-25/+620
| | | | | | | | | | | | | | | redundant shift left+right instructions). AVX512 compare instructions return v*i1 types. In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type. Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes. The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class. When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction. Differential Revision: https://reviews.llvm.org/D33188 llvm-svn: 305465
* [AVX-512] Mark masked VPCMP instructions as commutable.Craig Topper2017-06-131-0/+1
| | | | llvm-svn: 305276
* [AVX-512] Mark masked version of vpcmpeq as being commutable.Craig Topper2017-06-131-0/+1
| | | | llvm-svn: 305275
* [X86] Adding FoldGenRegForm helper field (for memory folding tables tableGen ↵Ayman Musa2017-05-281-35/+82
| | | | | | | | | | | | | | | | | | | | | | backend) to X86Inst class and set its value for the relevant instructions. Some register-register instructions can be encoded in 2 different ways, this happens when 2 register operands can be folded (separately). For example if we look at the MOV8rr and MOV8rr_REV, both instructions perform exactly the same operation, but are encoded differently. Here is the relevant information about these instructions from Intel's 64-ia-32-architectures-software-developer-manual: Opcode Instruction Op/En 64-Bit Mode Compat/Leg Mode Description 8A /r MOV r8,r/m8 RM Valid Valid Move r/m8 to r8. 88 /r MOV r/m8,r8 MR Valid Valid Move r8 to r/m8. Here we can see that in order to enable the folding of the output and input registers, we had to define 2 "encodings", and as a result we got 2 move 8-bit register-register instructions. In the X86 backend, we define both of these instructions, usually one has a regular name (MOV8rr) while the other has "_REV" suffix (MOV8rr_REV), must be marked with isCodeGenOnly flag and is not emitted from CodeGen. Automatically generating the memory folding tables relies on matching encodings of instructions, but in these cases where we want to map both memory forms of the mov 8-bit (MOV8rm & MOV8mr) to MOV8rr (not to MOV8rr_REV) we have to somehow point from the MOV8rr_REV to the "regular" appropriate instruction which in this case is MOV8rr. This field enable this "pointing" mechanism - which is used in the TableGen backend for generating memory folding tables. Differential Revision: https://reviews.llvm.org/D32683 llvm-svn: 304087
* [X86] Adding vpopcntd and vpopcntq instructionsOren Ben Simhon2017-05-251-0/+35
| | | | | | | | | AVX512_VPOPCNTDQ is a new feature set that was published by Intel. The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq). Differential Revision: https://reviews.llvm.org/D33169 llvm-svn: 303858
* [X86][AVX512] Make i1 illegal in the CodeGenGuy Blank2017-05-191-81/+51
| | | | | | | | | | This patch defines the i1 type as illegal in the X86 backend for AVX512. For DAG operations on <N x i1> types (build vector, extract vector element, ...) i8 is used, and should be truncated/extended. This should produce better scalar code for i1 types since GPRs will be used instead of mask registers. Differential Revision: https://reviews.llvm.org/D32273 llvm-svn: 303421
* [X86][AVX512] Move v2i64/v4i64 VPABS lowering to tablegenSimon Pilgrim2017-05-061-0/+14
| | | | | | Extend NoVLX targets to use the 512-bit versions llvm-svn: 302359
* [X86][AVX512CDI] Move v2i64/v4i64 and v4i32/v8i32 VPLZCNT lowering to tablegenSimon Pilgrim2017-05-051-0/+25
| | | | | | Extend NoVLX targets to use the 512-bit versions llvm-svn: 302229
* [X86][SSE] Update MOVNTDQA non-temporal loads to generic implementation (LLVM)Simon Pilgrim2017-04-141-6/+3
| | | | | | | | | | MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics. Clang companion patch: D31766. Differential Revision: https://reviews.llvm.org/D31767 llvm-svn: 300325
* [X86] Added missing mayLoad/mayStore attributes to some X86 instructions.Ayman Musa2017-04-131-0/+2
| | | | | | | | | Throughout the effort of automatically generating the X86 memory folding tables these missing information were encountered. This is a preparation work for a future patch including the automation of these tables. Differential Revision: https://reviews.llvm.org/D31714 llvm-svn: 300190
* [AVX-512] Remove explicit KMOVWrk from isel patterns. COPY_TO_REGCLASS to ↵Craig Topper2017-03-291-8/+8
| | | | | | GR32 is enough. llvm-svn: 298985
* [AVX-512] Remove explicit KMOVWrk/KMOVWKr instructions from patterns where ↵Craig Topper2017-03-291-16/+12
| | | | | | | | we can just use COPY_TO_REGCLASS instead. This will result in a KMOVW or KMOVD being emitted during register allocation. And in at least some cases this might allow the register coalescer to remove the copy all together. llvm-svn: 298984
* [AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registersCraig Topper2017-03-281-18/+61
| | | | | | | | | | | | | | | | We've had several bugs(PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register. This ultimately occurs because we create COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests are demonstrating, its possible for the GR8 register to be a high register and we end up doing an accidental extra or insert from bits 15:8. I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/8 or vice versa. Unfortunately, this complicates fastisel a bit more now to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repitition. This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This caused a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI. Differential Revision: https://reviews.llvm.org/D30968 llvm-svn: 298928
* [X86][AVX512F] Fix reg class for VMOVSSZrr/VMOVSSZrrk and VMOVSDZrr/VMOVSDZrrkSimon Pilgrim2017-03-261-11/+10
| | | | | | | | | | Fixed -verify-machineinstrs errors in fast-isel-select-sse.ll (one of many in PR27481) The VMOVSSZrr/VMOVSSZrrk and VMOVSDZrr/VMOVSDZrrk instructions were assuming both source registers were V128X when the second is actually supposed to be FR32X/FR64X Differential Revision: https://reviews.llvm.org/D31200 llvm-svn: 298805
* [X86][TD][vpmovm2 ] New TD pattern for the vpmovm2 instructionMichael Zuckerman2017-03-231-11/+19
| | | | | | | | | | | Up until now, vpmovm2 instruction described its destination operand size by the source operand size. This patch adds new pattern for the vpmovm2 instruction. The node describes new expansion of the destination (from {128|256} to 512). Differential Revision: https://reviews.llvm.org/D30654 llvm-svn: 298586
* [AVX-512] Handle kor/kand/kandn/kxor/kxnor/knot intrinsics at lowering time ↵Craig Topper2017-03-191-24/+0
| | | | | | | | | | | | | | | | | | | instead of isel Summary: Currently we handle these intrinsics at isel with special patterns. But as they just map to normal logic operations, we should just handle them at lowering. This will expose them to DAG combine optimizations. Right now the kor-sequence test generates a bunch of regclass copies between GR16 and VK16 that the peephole optimizer and/or register coallescing are removing to keep everything in the mask domain. By handling the logic op intrinsics earlier, these copies become bitcasts in the DAG and get removed by DAG combine which seems more robust. This should help enable my plan to stop copying between K registers and GR8/GR16. The peephole optimizer can't remove a chain of copies between K and GR32 with insert_subreg/extract_subreg present in the chain so the kor-sequence test break. But this patch should dodge the problem entirely. Reviewers: zvi, delena, RKSimon, igorb Reviewed By: igorb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31056 llvm-svn: 298228
* [X86] Cleanup the AddedComplexity values on move immediate instructions. NFCCraig Topper2017-03-171-3/+5
| | | | | | This makes the values a little more consistent between similar instruction and reduces the values some. This results in better grouping in the isel table saving a few bytes. llvm-svn: 298043
* [SelectionDAG] Add a signed integer absolute ISD nodeSimon Pilgrim2017-03-141-60/+1
| | | | | | | | | | | | Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering. ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns. At the moment this patch doesn't attempt legalization as we only create an ABS node if its legal/custom. Differential Revision: https://reviews.llvm.org/D29639 llvm-svn: 297780
* [AVX-512] Use iPTR instead of i64 in patterns for ↵Craig Topper2017-03-141-6/+6
| | | | | | extract_subvector/insert_subvector index. llvm-svn: 297707
* [AVX-512] Use sse_loadf32/f64 for vcvtss2sd and vcvtsd2ss intrinsic patterns.Craig Topper2017-03-131-3/+2
| | | | llvm-svn: 297600
* [AVX-512] Use sse_load_f64/f32 in VCVTSS2SI/VCVTSD2SI patterns.Craig Topper2017-03-131-10/+10
| | | | llvm-svn: 297599
* [AVX-512] Remove unused field in X86VectorVTInfo tablegen class.Craig Topper2017-03-121-7/+0
| | | | llvm-svn: 297572
* [X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG opsSimon Pilgrim2017-03-051-55/+55
| | | | | | | | | | | | As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT. We're missing a couple of shuffle combines that will be added in a future patch for review. Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases. Differential Revision: https://reviews.llvm.org/D30549 llvm-svn: 296985
* [AVX-512] Fix execution domain for scalar commutable min/max instructions.Craig Topper2017-02-261-1/+1
| | | | llvm-svn: 296292
* [AVX-512] Fix execution domain for vmovhpd/lpd/hps/lps.Craig Topper2017-02-261-0/+1
| | | | llvm-svn: 296291
* [AVX-512] Fix the execution domain for AVX-512 integer broadcasts.Craig Topper2017-02-261-0/+1
| | | | llvm-svn: 296290
* [AVX-512] Disable the redundant patterns in the VPBROADCASTBr_Alt and ↵Craig Topper2017-02-261-14/+16
| | | | | | VPBROADCASTWr_Alt instructions. NFC llvm-svn: 296289
* [AVX-512] Fix execution domain for VPMADD52 instructions.Craig Topper2017-02-261-0/+2
| | | | llvm-svn: 296288
OpenPOWER on IntegriCloud