path: root/llvm/lib/Target/X86/X86InstrFragmentsSIMD.td
Commit message | Author | Age | Files | Lines
...
* [X86] Strengthen some of the SD type constraints in X86InstrFragmentsSIMD.td | Craig Topper | 2017-09-18 | 1 | -37/+32
  This affects the vector shifts and rotates as well as some of the vector compares. The changes to the shifts by immediates allow a few hundred bytes to be removed by removing type checks on the size of the immediate containing the shift/rotate amount.
  llvm-svn: 313512
* [X86] Mark the FMA nodes as commutable so tablegen will auto generate the patterns. | Craig Topper | 2017-09-04 | 1 | -17/+17
  This uses the capability introduced in r312464 to make SDNode patterns commutable on the first two operands. This allows us to remove some of the extra FMA patterns that have to put loads and mask operands in different places to cover all cases. This even includes patterns that were missing to support matching a load in the first operand with FMA4, and non-broadcast loads with masking for AVX512.
  I believe this is causing us to generate some duplicate patterns because tablegen's isomorphism checks don't catch isomorphism between the patterns as written in the td. It only detects isomorphism in the commuted variants it tries to create. The unmasked 231 and 132 memory forms are isomorphic as written in the td file, so we end up keeping both. I think we can precommute the 132 pattern to fix this.
  We also need a follow-up patch to go back to the legacy FMA3 instructions and add patterns for the 231 and 132 forms, which we currently don't have.
  llvm-svn: 312469
* [X86] Remove X86ISD::FMADD in favor of ISD::FMA | Craig Topper | 2017-08-23 | 1 | -1/+1
  There's no reason to have a target-specific node with the same semantics as a target-independent opcode. This should simplify D36335 so that it doesn't need to touch X86ISelDAGToDAG.cpp.
  Differential Revision: https://reviews.llvm.org/D36983
  llvm-svn: 311568
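  ISD::FMA is the target-independent fused multiply-add. It computes a*b+c with a single rounding, which is why a separate X86ISD::FMADD with the same semantics was redundant. The minimal C++ sketch below shows what the single rounding buys, using std::fma from <cmath>. The operand values and the helper name `fusedResidual` are illustrative choices and do not come from the commit.

  ```cpp
  #include <cassert>
  #include <cmath>

  // Returns fma(a, a, c) for operands chosen so that the exact product
  // a * a = 1 + 2^-29 + 2^-60 is not representable as a double: rounding the
  // product on its own would drop the 2^-60 term (doubles near 1.0 are 2^-52
  // apart).
  double fusedResidual() {
    const double a = 1.0 + std::ldexp(1.0, -30);
    const double c = -(1.0 + std::ldexp(1.0, -29));
    // A single rounding at the end keeps the exact result, 2^-60.
    return std::fma(a, a, c);
  }
  ```

  Evaluating the product first and then adding `c` rounds twice. For these inputs that would typically give 0.0, unless the compiler itself contracts the expression into an FMA.
  
  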
* [X86] Merge all of the vecload and alignedload predicates into single predicates. | Craig Topper | 2017-08-19 | 1 | -48/+29
  We can read the memory VT and check for natural alignment. This also adds a new preferNonTemporalLoad helper that checks the correct subtarget feature based on the load size. This shrinks the isel table by at least 5000 bytes by allowing more reordering and combining to occur.
  llvm-svn: 311266
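  This entry combines two ideas: one natural-alignment test shared by every vector width, and a non-temporal check that depends on the load size. Both can be sketched in plain C++. `MemAccess`, `SubtargetFeatures`, `isNaturallyAligned` and the `preferNonTemporalLoad` below are hypothetical stand-ins, not the LLVM code. The feature mapping assumes (V)MOVNTDQA, which needs SSE4.1 for 16-byte, AVX2 for 32-byte and AVX-512 for 64-byte loads. Requiring alignment in the non-temporal check is also an assumption of this sketch.

  ```cpp
  #include <cassert>
  #include <cstdint>

  // Hypothetical, simplified view of what the predicates query on a load:
  // the memory VT's store size, the known alignment, and the non-temporal hint.
  struct MemAccess {
    uint32_t storeSizeBytes;  // e.g. 16 for v4f32, 32 for v8f32, 64 for v16f32
    uint32_t alignmentBytes;  // known alignment of the address
    bool nonTemporal;         // load carries a non-temporal hint
  };

  struct SubtargetFeatures {
    bool hasSSE41, hasAVX2, hasAVX512;
  };

  // Natural alignment: aligned to at least the access size, so a single
  // predicate covers 128-, 256- and 512-bit loads.
  bool isNaturallyAligned(const MemAccess &m) {
    return m.alignmentBytes >= m.storeSizeBytes;
  }

  // Size-dependent non-temporal check (sketch, not the LLVM helper).
  bool preferNonTemporalLoad(const MemAccess &m, const SubtargetFeatures &f) {
    if (!m.nonTemporal || !isNaturallyAligned(m))
      return false;
    switch (m.storeSizeBytes) {
    case 16: return f.hasSSE41;
    case 32: return f.hasAVX2;
    case 64: return f.hasAVX512;
    default: return false;
    }
  }
  ```
  
  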
* [X86] Converge alignedstore/alignedstore256/alignedstore512 to a single predicate. | Craig Topper | 2017-08-19 | 1 | -14/+3
  We can read the memoryVT and get its store size directly from the SDNode to check its alignment.
  llvm-svn: 311265
* [X86] Remove memopmmx pattern fragment | Craig Topper | 2017-08-17 | 1 | -9/+0
  Summary: Just like the FIXME says, there is no alignment requirement for MMX.
  Reviewers: RKSimon, zvi, igorb
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D36815
  llvm-svn: 311090
* [X86] Remove patterns for PALIGNR with non-vXi8 types. | Craig Topper | 2017-08-17 | 1 | -1/+5
  llvm-svn: 311058
* [X86] Remove unused pattern fragment that referenced MVT::i1. NFC | Craig Topper | 2017-08-13 | 1 | -5/+0
  llvm-svn: 310799
* [X86] Prevent selecting masked aligned load instructions if the load should be non-temporal | Craig Topper | 2017-07-26 | 1 | -3/+6
  Summary: The aligned load predicates don't suppress themselves if the load is non-temporal the way the unaligned predicates do. For the most part this isn't a problem, because the aligned predicates are mostly used for instructions that only load, and the non-temporal loads have priority over those. The exception is masked loads.
  Reviewers: RKSimon, zvi
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D35712
  llvm-svn: 309079
* [X86][SSE] Remove unused memopfsf32_128/memopfsf64_128 scalar memops | Simon Pilgrim | 2017-06-25 | 1 | -10/+0
  The 'scalar' SIMD bitops were dropped a while ago.
  llvm-svn: 306248
* Strip trailing whitespace. NFCI. | Simon Pilgrim | 2017-06-25 | 1 | -1/+1
  llvm-svn: 306247
* AVX-512: Lowering Masked Gather intrinsic - fixed a bug | Elena Demikhovsky | 2017-06-22 | 1 | -0/+12
  Masked gather for vector length 2 is lowered incorrectly for element type i32. The type <2 x i32> was automatically extended to <2 x i64>, and we generated VPGATHERQQ instead of VPGATHERQD. The type <2 x float> is extended to <4 x float>, so there is no bug for this type, but the sequence can be improved. In this patch I'm fixing the <2 x i32> bug and optimizing the <2 x float> sequence for GATHERs only. The same fix should be done for scatters as well.
  Differential revision: https://reviews.llvm.org/D34343
  llvm-svn: 305987
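  The bug comes down to which element widths reach instruction selection. Gather mnemonics encode the index width and the data width as VPGATHER<index><data>, with D for dword and Q for qword. The sketch below uses a hypothetical helper, `gatherMnemonic`, to show why promoting the i32 elements of <2 x i32> to i64 selects the wrong instruction. It assumes the two 64-bit pointers act as qword indices.

  ```cpp
  #include <cassert>
  #include <string>

  // Build the integer gather mnemonic from the index and data element widths
  // (hypothetical helper for illustration).
  std::string gatherMnemonic(unsigned indexBits, unsigned dataBits) {
    assert((indexBits == 32 || indexBits == 64) &&
           (dataBits == 32 || dataBits == 64));
    std::string name = "VPGATHER";
    name += (indexBits == 64) ? 'Q' : 'D';  // index element width
    name += (dataBits == 64) ? 'Q' : 'D';   // gathered data element width
    return name;
  }
  ```

  With 64-bit indices and 32-bit data the correct choice is `gatherMnemonic(64, 32)`, which is VPGATHERQD. Once the elements had been promoted to i64, the data width became 64 and the lowering effectively asked for `gatherMnemonic(64, 64)`, which is VPGATHERQQ.
  
  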
* [X86][SSE] Change memop fragment to inherit from vec128load with local alignment controls | Simon Pilgrim | 2017-06-12 | 1 | -8/+4
  First possible step towards merging SSE/AVX memory folding pattern fragments. Also allows us to remove the duplicate non-temporal load logic.
  Differential Revision: https://reviews.llvm.org/D33902
  llvm-svn: 305184
* [X86][SSE41] Non-temporal loads shouldn't be folded if it can be avoided (PR32743) | Simon Pilgrim | 2017-06-05 | 1 | -2/+6
  Missed SSE41 non-temporal load case in previous commit.
  Differential Revision: https://reviews.llvm.org/D33728
  llvm-svn: 304722
* [X86][SSE] Non-temporal loads shouldn't be folded if it can be avoided (PR32743) | Simon Pilgrim | 2017-06-05 | 1 | -9/+24
  Differential Revision: https://reviews.llvm.org/D33728
  llvm-svn: 304717
* [X86][AVX512] Make i1 illegal in the CodeGen | Guy Blank | 2017-05-19 | 1 | -3/+3
  This patch defines the i1 type as illegal in the X86 backend for AVX512. For DAG operations on <N x i1> types (build vector, extract vector element, ...), i8 is used and should be truncated/extended. This should produce better scalar code for i1 types, since GPRs will be used instead of mask registers.
  Differential Revision: https://reviews.llvm.org/D32273
  llvm-svn: 303421
* [SelectionDAG] Add a signed integer absolute ISD node | Simon Pilgrim | 2017-03-14 | 1 | -1/+0
  Reduced version of D26357: based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS, I've reduced that patch to just the ABS ISD node (with x86/SSE support) to improve basic combines and lowering. ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions, allowing us to make this a generic opcode and move away from the hard-coded tablegen patterns, which make it tricky to match more complex patterns. At the moment this patch doesn't attempt legalization, as we only create an ABS node if it's legal/custom.
  Differential Revision: https://reviews.llvm.org/D29639
  llvm-svn: 297780
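  ISD::ABS does not saturate: the absolute value of INT_MIN is INT_MIN. This matches what the SSSE3 PABS* instructions produce. The scalar sketch below shows that semantics for i32 using the well-known sign-mask xor/subtract expansion. That expansion is one possible lowering for targets without an abs instruction, not necessarily the one LLVM emits. The helper name `absWrap` is hypothetical.

  ```cpp
  #include <cassert>
  #include <cstdint>

  // Signed absolute value with wrap-around: absWrap(INT32_MIN) == INT32_MIN.
  // Unsigned arithmetic avoids signed-overflow undefined behaviour.
  int32_t absWrap(int32_t x) {
    uint32_t ux = static_cast<uint32_t>(x);
    uint32_t sign = 0u - (ux >> 31);  // all ones if x is negative, else 0
    uint32_t r = (ux ^ sign) - sign;  // conditional two's-complement negate
    return static_cast<int32_t>(r);
  }
  ```
  
  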
* [X86][MMX] Fix folding of shift value loads to cover whole 64-bits | Simon Pilgrim | 2017-03-13 | 1 | -2/+0
  rL230225 made the assumption that only the lower 32 bits of an MMX register load are used as a shift value, when in fact the whole 64 bits are reloaded and treated as an i64 to determine the shift value. This patch reverts rL230225 to ensure that the whole 64 bits of memory are folded, and ensures that the upper 32 bits are zeroed for cases where the shift value has come from a scalar source. Found during fuzz testing.
  Differential Revision: https://reviews.llvm.org/D30833
  llvm-svn: 297667
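  The distinction matters because an MMX shift such as PSRLQ reads its entire 64-bit count operand, and any count above 63 clears the result. The sketch below models this with two hypothetical helpers. `psrlq` uses the full count. `psrlqLow32Only` mimics a fold that only loaded the low 32 bits (with the upper half implicitly zero), which is the assumption the commit says was wrong.

  ```cpp
  #include <cassert>
  #include <cstdint>

  // PSRLQ with a register/m64 count: the whole 64-bit count is significant,
  // and counts greater than 63 produce zero.
  uint64_t psrlq(uint64_t value, uint64_t count) {
    return count > 63 ? 0 : value >> count;
  }

  // What a fold that loaded only the low 32 bits of the count would compute.
  uint64_t psrlqLow32Only(uint64_t value, uint64_t count) {
    return psrlq(value, static_cast<uint32_t>(count));
  }
  ```

  A count of 0x100000001 shifts everything out (result 0) under the real semantics, but acts as a shift by 1 if only the low 32 bits are loaded.
  
  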
* [X86] Remove unused SDTypeProfile. NFC | Craig Topper | 2017-03-12 | 1 | -2/+0
  llvm-svn: 297594
* [X86] Lower SSE/AVX cmpps/pd intrinsics directly to X86ISD::CMPP SDNodes. | Craig Topper | 2017-03-12 | 1 | -3/+3
  This allows us to remove a duplicate set of patterns.
  llvm-svn: 297593
* [AVX-512] Separate the fadd/fsub/fmul/fdiv/fmax/fmin with rounding mode ISD opcodes into separate packed and scalar opcodes. | Craig Topper | 2017-02-24 | 1 | -2/+8
  This is more consistent with the rest of the ISD opcodes. NFC
  llvm-svn: 296094
* [AVX-512] Allow legacy scalar min/max intrinsics to select EVEX instructions when available | Craig Topper | 2017-02-22 | 1 | -0/+2
  This patch introduces new X86ISD::FMAXS and X86ISD::FMINS opcodes. The legacy intrinsics now lower to these nodes, as do the AVX-512 masked intrinsics when the rounding mode is CUR_DIRECTION.
  I've merged a copy of the tablegen multiclass avx512_fp_scalar into avx512_fp_scalar_sae. avx512_fp_scalar still needs to support CUR_DIRECTION appearing as a rounding mode for X86ISD::FADD_ROUND and others.
  Differential revision: https://reviews.llvm.org/D30186
  llvm-svn: 295810
* [X86] Fix EXTRACT_VECTOR_ELT with variable index from v32i16 and v64i8 vector. | Igor Breger | 2017-02-20 | 1 | -3/+0
  It's more profitable to go through memory (1 cycle throughput) than to use a VMOVD + VPERMV/PSHUFB sequence (2/3 cycles throughput) to implement EXTRACT_VECTOR_ELT with a variable index. The IACA tool was used to get performance estimates (https://software.intel.com/en-us/articles/intel-architecture-code-analyzer). For example, for the var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test from vector-shuffle-variable-128.ll, I get 26 cycles vs 79 cycles.
  This also removes the VINSERT node, which we don't need any more.
  Differential Revision: https://reviews.llvm.org/D29690
  llvm-svn: 295660
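  Going through memory amounts to a vector store to a stack slot followed by one scalar load at base + index * element size. The C++ sketch below shows that idea. `extractViaMemory` is a hypothetical helper. Masking the index so the access stays inside the slot is an assumption of this sketch, not something the commit describes.

  ```cpp
  #include <array>
  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <cstring>

  // Variable-index element extract done through memory: copy the vector into
  // a stack temporary, then perform one scalar load at base + idx * sizeof(T).
  template <typename T, std::size_t N>
  T extractViaMemory(const std::array<T, N> &vec, std::size_t idx) {
    static_assert(N != 0 && (N & (N - 1)) == 0,
                  "element count must be a power of two");
    unsigned char slot[sizeof(T) * N];            // the stack temporary
    std::memcpy(slot, vec.data(), sizeof(slot));  // the vector store
    idx &= N - 1;  // sketch assumption: wrap out-of-range indices
    T elt;
    std::memcpy(&elt, slot + idx * sizeof(T), sizeof(T));  // the scalar load
    return elt;
  }
  ```
  
  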
* [X86] Tighten up some of the SDNode type constraints. | Craig Topper | 2017-02-19 | 1 | -26/+43
  llvm-svn: 295588
* [X86][XOP] Reduce the size of a multiclass by moving more stuff to parameters instead of doing 128-bit and 256-bit simultaneously. | Craig Topper | 2017-02-18 | 1 | -0/+1
  This requires some instructions to be renamed to move the Y earlier in the instruction name. The new names are more consistent with other instructions.
  llvm-svn: 295579
* [AVX-512] Don't reuse VSHLI/VSRLI for mask register shifts. | Craig Topper | 2017-01-30 | 1 | -0/+9
  VSHLI/VSRLI shift within elements, while KSHIFT moves whole elements.
  llvm-svn: 293448
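  The difference is easy to see with a v16i1 mask held in a 16-bit k-register, where bit i is element i. The sketch below uses two hypothetical helpers. `kshiftlw` models KSHIFTLW, which moves whole elements. `perElementShl` models what a VSHLI-style shift would mean for 1-bit elements: any nonzero shift pushes every bit out of its own element.

  ```cpp
  #include <cassert>
  #include <cstdint>

  // KSHIFTLW: element i moves to position i + n; counts above 15 clear the mask.
  uint16_t kshiftlw(uint16_t k, unsigned n) {
    return n > 15 ? 0 : static_cast<uint16_t>(k << n);
  }

  // A shift applied within each 1-bit element: a nonzero amount shifts every
  // bit out of its own element, so the whole mask becomes zero.
  uint16_t perElementShl(uint16_t k, unsigned n) {
    return n == 0 ? k : 0;
  }
  ```
  
  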
* Added a template for building target specific memory node in DAG. | Elena Demikhovsky | 2016-12-21 | 1 | -0/+72
  I added an API for creating a target-specific memory node in the DAG. Today, all memory nodes are common to all targets and their constructors are located in SelectionDAG.cpp. There are some cases in X86 where we need to create a special node: truncation-with-saturation store and float-to-half store. In the current patch I added truncation-with-saturation nodes and I'm using them for intrinsics. In the future I plan to implement DAG lowering for the truncation-with-saturation pattern.
  Differential Revision: https://reviews.llvm.org/D27899
  llvm-svn: 290250
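  The truncation-with-saturation stores mentioned here correspond to AVX-512 down-converting moves such as VPMOVSDB (signed saturation) and VPMOVUSDB (unsigned saturation). The scalar sketch below shows the per-element semantics for 32-bit to 8-bit. `truncSSat` and `truncUSat` are hypothetical names.

  ```cpp
  #include <algorithm>
  #include <cassert>
  #include <cstdint>

  // Signed saturation (VPMOVSDB-style): clamp to [-128, 127], then narrow.
  int8_t truncSSat(int32_t x) {
    return static_cast<int8_t>(
        std::max<int32_t>(INT8_MIN, std::min<int32_t>(x, INT8_MAX)));
  }

  // Unsigned saturation (VPMOVUSDB-style): the source is treated as unsigned
  // and clamped to [0, 255].
  uint8_t truncUSat(uint32_t x) {
    return static_cast<uint8_t>(std::min<uint32_t>(x, UINT8_MAX));
  }
  ```

  For comparison, a plain truncation of 300 to 8 bits keeps the low byte, 44. The saturating forms give 127 and 255 instead.
  
  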
* [AVX-512] Correctly preserve the passthru semantics of the FMA scalar intrinsics | Craig Topper | 2016-12-09 | 1 | -0/+12
  Summary: Scalar intrinsics have specific semantics about which input's upper bits are passed through to the output. The same input is also supposed to be the input we use for the lower element when the mask bit is 0 in a masked operation. We aren't currently keeping these semantics with instruction selection.
  This patch corrects this by introducing new scalar FMA ISD nodes that indicate whether operand 1 (one of the multiply inputs) or operand 3 (the addition/subtraction input) should pass through its upper bits. We use this information to select the 213/132 form for the operand 1 version and the 231 form for the operand 3 version.
  We also use this information to suppress combining FNEG operations on the passthru input, since semantically the passthru bits aren't negated. This is stronger than the earlier check added for a user being SELECTS, so we can remove that.
  This fixes PR30913.
  Reviewers: delena, zvi, v_klochkov
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D27144
  llvm-svn: 289190
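  The digits in the FMA3 mnemonics say which operands are multiplied and which is added. In the scalar forms, the destination (operand 1) supplies the upper elements, and it also supplies element 0 when the mask bit is clear. The sketch below uses hypothetical helpers on 4-element float vectors and models merge masking of element 0 with a single bool. It shows why a passthru value that is a multiply input must sit in operand 1 of the 213 or 132 form, while a passthru addend must sit in operand 1 of the 231 form.

  ```cpp
  #include <array>
  #include <cassert>
  #include <cmath>

  using V4 = std::array<float, 4>;

  // Scalar FMA3 forms (vfmadd132ss / 213ss / 231ss): only element 0 is
  // computed. Elements 1..3 always come from op1, and so does element 0
  // when the mask bit k is clear (merge masking).
  V4 fmadd132ss(V4 op1, const V4 &op2, const V4 &op3, bool k = true) {
    if (k) op1[0] = std::fma(op1[0], op3[0], op2[0]);  // op1*op3 + op2
    return op1;
  }

  V4 fmadd213ss(V4 op1, const V4 &op2, const V4 &op3, bool k = true) {
    if (k) op1[0] = std::fma(op2[0], op1[0], op3[0]);  // op2*op1 + op3
    return op1;
  }

  V4 fmadd231ss(V4 op1, const V4 &op2, const V4 &op3, bool k = true) {
    if (k) op1[0] = std::fma(op2[0], op3[0], op1[0]);  // op2*op3 + op1
    return op1;
  }
  ```

  All three forms can compute a*b+c in element 0. Only the operand placed in op1 determines whose upper elements survive.
  
  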
* [X86] Remove scalar logical op alias instructions. Just use COPY_FROM/TO_REGCLASS and the normal packed instructions instead | Craig Topper | 2016-12-06 | 1 | -9/+0
  Summary: This patch removes the scalar logical operation alias instructions. We can just use reg class copies and use the normal packed instructions instead. This removes the need for putting these instructions in the execution domain fixing tables as was done recently.
  I removed the loadf64_128 and loadf32_128 patterns, as DAG combine creates a narrower load for (extractelt (loadv4f32)) before we ever get to isel.
  I plan to add similar patterns for AVX512DQ in a future commit to allow use of the larger register class when available.
  Reviewers: spatel, delena, zvi, RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D27401
  llvm-svn: 288771
* [X86] Generalize CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes. NFCI | Simon Pilgrim | 2016-11-24 | 1 | -13/+12
  Replace the CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes with general versions.
  This is an initial step towards similar FP_TO_SINT/FP_TO_UINT and SINT_TO_FP/UINT_TO_FP lowering to AVX512 CVTTPS2QQ/CVTTPS2UQQ and CVTQQ2PS/CVTUQQ2PS with illegal types.
  Differential Revision: https://reviews.llvm.org/D27072
  llvm-svn: 287870
* [X86][AVX512] Add patterns for all variants of VMOVSS/VMOVSD instructions. | Ayman Musa | 2016-11-13 | 1 | -0/+4
  Differential Revision: https://reviews.llvm.org/D26022
  llvm-svn: 286758
* [AVX-512] Add lowering to cvttpd2udq/cvttps2udq for fptoui v2f64/2f32 to 2i32 | Craig Topper | 2016-11-09 | 1 | -0/+3
  This patch adds support for fptoui to 2i32 from both 2f64 and 2f32, building on Simon's change for the signed version in r284459 and using AVX-512 instructions. If we don't have VLX support, we need to use a 512-bit operation for v2f64->v2i32 and extract the result.
  It also recognises that cvttpd2udq zeroes the upper 64 bits of the xmm result.
  Differential Revision: https://reviews.llvm.org/D26331
  llvm-svn: 286345
* Expandload and Compressstore intrinsics | Elena Demikhovsky | 2016-11-03 | 1 | -11/+6
  Two new intrinsics covering AVX-512 compress/expand functionality. This implementation includes syntax, DAG builder, operation lowering and tests. It does not include handling of illegal data types, the codegen prepare pass, or the cost model.
  llvm-svn: 285876
* [X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32 | Simon Pilgrim | 2016-10-18 | 1 | -0/+3
  As discussed on PR28461, we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.
  This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.
  It also recognises that cvttpd2dq zeroes the upper 64 bits of the xmm result (similar to D23797). We still don't do this for the cvttpd2dq/cvttps2dq intrinsics; that can be done in a future patch.
  Differential Revision: https://reviews.llvm.org/D23808
  llvm-svn: 284459
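  A scalar model of CVTTPD2DQ on an XMM source shows both properties this entry relies on. Lanes 0 and 1 hold the truncating conversions (the fptosi result), and lanes 2 and 3 are zero. NaN and out-of-range inputs give the x86 "integer indefinite" value 0x80000000. `cvttpd2dq` here is a hypothetical helper, not an LLVM or intrinsics API.

  ```cpp
  #include <array>
  #include <cassert>
  #include <cmath>
  #include <cstddef>
  #include <cstdint>

  // Model of CVTTPD2DQ xmm, xmm: two doubles are converted to int32 with
  // truncation toward zero into lanes 0-1, and lanes 2-3 are zeroed.
  std::array<int32_t, 4> cvttpd2dq(const std::array<double, 2> &src) {
    std::array<int32_t, 4> dst{};  // upper 64 bits of the result are zero
    for (std::size_t i = 0; i < 2; ++i) {
      double t = std::trunc(src[i]);
      // Comparisons with NaN are false, so NaN also takes the indefinite path.
      bool inRange = t >= -2147483648.0 && t <= 2147483647.0;
      dst[i] = inRange ? static_cast<int32_t>(t) : INT32_MIN;
    }
    return dst;
  }
  ```
  
  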
* DAG: Setting Masked-Expand-Load as a variant of Masked-Load node | Elena Demikhovsky | 2016-10-09 | 1 | -13/+19
  The masked-expand-load node represents a load operation that loads a variable number of elements from memory according to the number of "true" bits in the mask, and expands the loaded elements according to their positions in the mask vector. Right now, the node is used in intrinsics for VEXPAND* instructions. This work is a step towards implementing the masked.expandload and masked.compressstore intrinsics.
  Differential Revision: https://reviews.llvm.org/D25322
  llvm-svn: 283694
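  The expand semantics described here, and the compress-store inverse mentioned at the end of this entry, can be modeled with two hypothetical helpers. Each takes a lane vector and a bit mask with one bit per lane (at most 32 lanes in this sketch). `expandLoad` takes consecutive elements from memory, places them in the set lanes, and keeps passthru values elsewhere. `compressStore` returns the elements that would be written contiguously to memory.

  ```cpp
  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Expand-load: consecutive memory elements go, in order, into the lanes
  // whose mask bit is set; other lanes keep their passthru value.
  std::vector<int> expandLoad(const std::vector<int> &mem, uint32_t mask,
                              std::vector<int> passthru) {
    assert(passthru.size() <= 32);
    std::size_t next = 0;
    for (std::size_t lane = 0; lane < passthru.size(); ++lane)
      if (mask & (1u << lane))
        passthru[lane] = mem.at(next++);
    return passthru;
  }

  // Compress-store: the selected lanes, in order, as they would be written
  // contiguously to memory.
  std::vector<int> compressStore(uint32_t mask, const std::vector<int> &lanes) {
    assert(lanes.size() <= 32);
    std::vector<int> written;
    for (std::size_t lane = 0; lane < lanes.size(); ++lane)
      if (mask & (1u << lane))
        written.push_back(lanes[lane]);
    return written;
  }
  ```
  
  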
* [X86] Remove unused PatFrags. NFC | Craig Topper | 2016-10-07 | 1 | -5/+0
  llvm-svn: 283523
* Target: Remove unused patterns and transforms. NFC. | Peter Collingbourne | 2016-10-07 | 1 | -4/+0
  llvm-svn: 283515
* [X86][avx512] Fix bug in masked compress store. | Ayman Musa | 2016-09-26 | 1 | -1/+7
  Differential Revision: https://reviews.llvm.org/D23984
  llvm-svn: 282381
* [AVX-512] Split scalar version of X86ISD::SELECT into a separate opcode because isel is not robust with multiple type profiles for the same opcode. | Craig Topper | 2016-09-24 | 1 | -1/+1
  llvm-svn: 282340
* [AVX-512] Remove the patterns for selecting scalar VCOMI/VUCOMI instructions with SAE as there is no way to create the pattern. | Craig Topper | 2016-09-24 | 1 | -2/+0
  llvm-svn: 282339
* [AVX-512] Split X86ISD::VFPROUND and X86ISD::VFPEXT into separate opcodes for each type constraint. | Craig Topper | 2016-09-23 | 1 | -15/+4
  This revealed that scalar intrinsics could create nodes with a rounding mode of FROUND_CUR_DIRECTION, but the patterns didn't check for it. It just worked because isel doesn't check operand count and we had a pattern without the rounding mode argument at all.
  llvm-svn: 282231
* [AVX-512] Add separate ISD opcodes for each form of CVT instructions. | Craig Topper | 2016-09-23 | 1 | -14/+14
  Don't reuse non-X86 ISD opcodes with extra X86-specific arguments.
  llvm-svn: 282230
* [AVX-512] Use different ISD opcodes for some of the scalar intrinsic lowering. | Craig Topper | 2016-09-23 | 1 | -7/+7
  Isel is not very robust against using the same ISD opcode with different numbers of operands, so it's better to separate them.
  llvm-svn: 282229
* [AVX-512] Split the 3 different usages of the X86ISD::FSETCC opcode into 3 different opcodes. | Craig Topper | 2016-09-21 | 1 | -2/+2
  It turns out isel is really not robust against having different type profiles for the same opcode. If you put an illegal rounding mode (i.e. not CUR_DIRECTION or NO_EXC) on a comiss intrinsic, we would generate the FSETCC form with the rounding mode added, but then pattern match to an instruction with ROUND_CUR_DIRECTION.
  We can probably get away with just one FSETCCM opcode that always contains the rounding mode and explicitly put ROUND_CUR_DIRECTION in the pattern, but I'll leave that for future work.
  With this change, the clang tests for the comiss intrinsics that used an incorrect rounding mode of 3 properly fail isel instead of silently doing the wrong thing. Those clang tests will be fixed in a follow-up commit, and I also plan to add rounding mode checking to clang.
  llvm-svn: 282055
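  For a comiss-style intrinsic, the only meaningful values of the rounding operand are CUR_DIRECTION and NO_EXC. In the Intel intrinsics headers these are 0x04 (`_MM_FROUND_CUR_DIRECTION`) and 0x08 (`_MM_FROUND_NO_EXC`). The hypothetical validity check below sketches the kind of checking this entry plans to add to clang. It is not the clang implementation.

  ```cpp
  #include <cassert>

  // Values used by the Intel intrinsics headers for the rounding/SAE operand.
  constexpr int kFroundCurDirection = 0x04;  // _MM_FROUND_CUR_DIRECTION
  constexpr int kFroundNoExc = 0x08;         // _MM_FROUND_NO_EXC

  // Only "use the current direction" or "suppress all exceptions" make sense
  // here; anything else (such as 3) should be rejected rather than silently
  // matched to the CUR_DIRECTION form.
  bool isValidSaeOperand(int rounding) {
    return rounding == kFroundCurDirection || rounding == kFroundNoExc;
  }
  ```
  
  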
* [AVX-512] Don't add an additional rounding mode operand to the avx512 vcvtps2ph intrinsic lowering. | Craig Topper | 2016-09-21 | 1 | -3/+2
  There was no way to control its value, so it was always FROUND_CURRENT, making it unnecessary. The true rounding mode is encoded in the immediate operand of the instruction.
  This also removes the pattern from the rb form of the instructions, since there is no way to specify the FROUND_NO_EXC rounding mode it required.
  llvm-svn: 282052
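  Because the real rounding mode lives in the instruction's immediate, decoding it is a small bit test. The sketch below uses hypothetical names and assumes the F16C/AVX-512 encoding: imm8 bit 2 selects MXCSR.RC, and otherwise bits 1:0 choose the mode (00 nearest-even, 01 down, 10 up, 11 toward zero). Bits 7:3 are not modeled.

  ```cpp
  #include <cassert>
  #include <cstdint>

  enum class RoundingControl { Nearest, Down, Up, TowardZero, UseMXCSR };

  // Decode the rounding selection from a VCVTPS2PH immediate (sketch).
  RoundingControl decodeCvtps2phImm(uint8_t imm) {
    if (imm & 0x4)
      return RoundingControl::UseMXCSR;  // bit 2: take the mode from MXCSR.RC
    switch (imm & 0x3) {                 // bits 1:0: explicit rounding mode
    case 0: return RoundingControl::Nearest;
    case 1: return RoundingControl::Down;
    case 2: return RoundingControl::Up;
    default: return RoundingControl::TowardZero;
    }
  }
  ```
  
  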
* [AVX-512] Don't lower avx512 vcvtps2ph/vcvtph2ps nodes to ISD::FP16_TO_FP/ISD::FP_TO_FP16 with an extra x86 specific rounding mode operand. | Craig Topper | 2016-09-21 | 1 | -3/+3
  We should use a target-specific ISD opcode.
  llvm-svn: 282046
* AVX-512: Fix for PR28175 - Scalar code optimization. | Elena Demikhovsky | 2016-09-13 | 1 | -0/+5
  Optimized (truncate (assertzext x) to i1) and anyext i1 to i8/16/32. Optimizing these patterns is one more step towards i1 optimization on AVX-512.
  Differential Revision: https://reviews.llvm.org/D24456
  llvm-svn: 281302
* [AVX-512] Fix masked VPERMI2PS isel when the index comes from a bitcast. | Craig Topper | 2016-09-06 | 1 | -3/+5
  We need to bitcast the index operand to a floating-point type so that it matches the result type. If not, then the passthru part of the DAG will be a bitcast from the index's original type to the destination type. This makes it very difficult to match. The other option would be to add 5 sets of patterns for every other possible type.
  llvm-svn: 280696
* [X86] Remove FsVMOVAPSrm/FsVMOVAPDrm/FsMOVAPSrm/FsMOVAPDrm. | Craig Topper | 2016-09-05 | 1 | -5/+0
  Due to their placement in the td file, they had lower precedence than (V)MOVSS/SD and could almost never be selected.
  The only way to select them was in AVX512 mode, because EVEX VMOVSS/SD was below them and the patterns weren't qualified properly for AVX only. So if you happened to have an aligned FR32/FR64 load in AVX512, you could get a VEX-encoded VMOVAPS/VMOVAPD.
  I tried to search back through history, and it seems like these instructions were probably unselectable for at least 5 years, at least since the time the VEX versions were added. But I can't prove they ever were selectable.
  llvm-svn: 280644
* [X86] Strengthen some SDNode type constraints. | Craig Topper | 2016-09-02 | 1 | -3/+4
  llvm-svn: 280463