path: root/llvm/lib/Target/X86/X86InstrFragmentsSIMD.td
Commit message | Author | Date | Files | Lines (-/+)
...
* Recommit r344877 "[X86] Stop promoting integer loads to vXi64" (Craig Topper, 2018-10-22, 1 file, -13/+41)
I've included a fix to DAGCombiner::ForwardStoreValueToDirectLoad that I believe will prevent the previous miscompile.

Original commit message:

Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to remove the bitcast.

I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.

I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size.

I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion.

Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits, RKSimon
Differential Revision: https://reviews.llvm.org/D53306

llvm-svn: 344965
* Revert r344877 "[X86] Stop promoting integer loads to vXi64" (Craig Topper, 2018-10-22, 1 file, -41/+13)
Sam McCall reported miscompiles in some tensorflow code. Reverting while I try to figure it out.

llvm-svn: 344921
* [X86] Stop promoting integer loads to vXi64 (Craig Topper, 2018-10-21, 1 file, -13/+41)
Summary:
Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to remove the bitcast.

I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.

I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size.

I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion.

Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits, RKSimon
Differential Revision: https://reviews.llvm.org/D53306

llvm-svn: 344877
* [X86] Prevent non-temporal loads from folding into instructions by blocking them in X86DAGToDAGISel::IsProfitableToFold rather than with a predicate (Craig Topper, 2018-10-10, 1 file, -30/+23)
Remove tryFoldVecLoad since tryFoldLoad would call IsProfitableToFold and pick up the new check.

This saves about 5K out of ~600K on the generated isel table.

llvm-svn: 344189
* [X86] Change legacy SSE scalar fp to integer intrinsics to use specific ISD opcodes instead of keeping as intrinsics. Unify SSE and AVX512 isel patterns (Craig Topper, 2018-08-15, 1 file, -2/+8)
AVX512 added new versions of these intrinsics that take a rounding mode. If the rounding mode is 4 the new intrinsics are equivalent to the old intrinsics.

The AVX512 intrinsics were being lowered to ISD opcodes, but the legacy SSE intrinsics were left as intrinsics. This resulted in the AVX512 instructions needing separate patterns for the ISD opcodes and the legacy SSE intrinsics.

Now we convert SSE intrinsics and AVX512 intrinsics with rounding mode 4 to the same ISD opcode so we can share the isel patterns.

llvm-svn: 339749
* [X86] Remove isel patterns for MOVSS/MOVSD ISD opcodes with integer types. (Craig Topper, 2018-07-20, 1 file, -4/+6)
Ideally our ISD node types going into the isel table would have types consistent with their instruction domain. This prevents us having to duplicate patterns with different types for the same instruction.

Unfortunately, it seems our shuffle combining is currently relying on this a little to remove some bitcasts. This seems to enable some switching between shufps and shufd. Hopefully there's some way we can address this in the combining.

Differential Revision: https://reviews.llvm.org/D49280

llvm-svn: 337590
* [X86] Remove patterns and ISD nodes for the old scalar FMA intrinsic lowering. (Craig Topper, 2018-07-12, 1 file, -23/+0)
We now use llvm.fma.f32/f64 or llvm.x86.fmadd.f32/f64 intrinsics that use scalar types rather than vector types. So we don't need these special ISD nodes that operate on the lowest element of a vector.

llvm-svn: 336883
* [X86] Remove X86ISD::MOVLPS and X86ISD::MOVLPD. NFCI (Craig Topper, 2018-07-10, 1 file, -3/+0)
These ISD nodes try to select the MOVLPS and MOVLPD instructions which are special load only instructions. They load data and merge it into the lower 64-bits of an XMM register. They are logically equivalent to our MOVSD node plus a load.

There was only one place in X86ISelLowering that used MOVLPD and no places that selected MOVLPS. The one place that selected MOVLPD had to choose between it and MOVSD based on whether there was a load. But lowering is too early to tell if the load can really be folded. So in isel we have patterns that use MOVSD for MOVLPD if we can't find a load.

We also had patterns that select the MOVLPD instruction for a MOVSD if we can find a load, but didn't choose the MOVLPD ISD opcode for some reason.

So it seems better to just standardize on MOVSD ISD opcode and manage MOVSD vs MOVLPD instruction with isel patterns.

llvm-svn: 336728
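To make the load-and-merge behavior described above concrete, here is a minimal C++ sketch of the semantics (illustrative only; the struct and function names are invented and are not LLVM or ISA code):

```cpp
#include <cstdint>
#include <cstring>

// Models the MOVLPS/MOVLPD behavior on a 128-bit value: load 64 bits from
// memory into the low half of the register and pass the high half through.
struct Xmm { uint64_t lo, hi; };

Xmm movlp(Xmm dst, const void *mem) {
  std::memcpy(&dst.lo, mem, sizeof dst.lo);  // replace bits [63:0]
  return dst;                                // bits [127:64] unchanged
}
```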
* [X86] Remove dead SDNode object from X86InstrFragmentsSIMD.td. NFC (Craig Topper, 2018-07-10, 1 file, -1/+0)
It points to an opcode that doesn't exist.

llvm-svn: 336720
* [X86] Remove FMA4 scalar intrinsics. Use llvm.fma intrinsic instead. (Craig Topper, 2018-07-06, 1 file, -6/+0)
The intrinsics can be implemented with an f32/f64 llvm.fma intrinsic and an insert into a zero vector.

There are a couple regressions here due to SelectionDAG not being able to pull an fneg through an extract_vector_elt. I'm not super worried about this though as InstCombine should be able to do it before we get to SelectionDAG.

llvm-svn: 336416
* [X86] Use PatFrag with hardcoded numbers for FROUND_NO_EXC/FROUND_CURRENT instead of ImmLeafs with predicates where one of the two numbers was hardcoded (Craig Topper, 2018-06-28, 1 file, -4/+2)
This is more efficient for the isel table generator since we can use CheckChildInteger instead of MoveChild, CheckPredicate, MoveParent. This reduced the table size by 1-2K.

I wish there was a way to share the values with X86BaseInfo.h and still use a PatFrag like this. These numbers are fixed by the X86 intrinsic spec going back many years and we should never need to change them. So we shouldn't waste table bytes to support sharing.

llvm-svn: 335806
* [X86] Use setcc ISD opcode for AVX512 integer comparisons all the way to isel (Craig Topper, 2018-06-20, 1 file, -1/+0)
I don't believe there is any real reason to have separate X86 specific opcodes for vector compares. Setcc has the same behavior, it just uses a different encoding for the condition code.

I had to change the CondCodeAction for SETLT and SETLE to prevent some transforms from changing SETGT lowering.

Differential Revision: https://reviews.llvm.org/D43608

llvm-svn: 335173
* [X86] Pass the parent SDNode to X86DAGToDAGISel::selectScalarSSELoad to simplify the hasSingleUseFromRoot handling (Craig Topper, 2018-06-17, 1 file, -2/+2)
Some of the calls to hasSingleUseFromRoot were passing the load itself. If the load's chain result has a user this would count against that. By getting the true parent of the match and ensuring any intermediate nodes between the match and the load have a single use we can avoid this case. isLegalToFold will take care of checking users of the load's data output.

This fixed at least fma-scalar-memfold.ll to succeed without the peephole pass.

llvm-svn: 334908
* [X86] Reorder some type constraints to force things to be vectors and integer/fp before forcing them to be the same size (Craig Topper, 2018-06-11, 1 file, -4/+4)
This may be needed by another patch that I'm working on. It should have no effect on any of the generated outputs.

llvm-svn: 334430
* [X86] Converge X86ISD::VPERMV3 and X86ISD::VPERMIV3 to a single opcode. (Craig Topper, 2018-05-28, 1 file, -7/+0)
These do the same thing with the first and second sources swapped. They previously came from separate intrinsics that specified different masking behavior. But we can cover that with isel patterns and a single node.

This is a step towards reducing the number of intrinsics needed.

A bunch of tests change because we are now biased to choosing VPERMT over VPERMI when there is nothing to signal that commuting is beneficial.

llvm-svn: 333383
* [X86] Stop forcing X86VPermi2X node index operand to match destination type to make masking pattern matching easier. Add extra patterns with bitcasts instead (Craig Topper, 2018-05-28, 1 file, -5/+3)
This basically reverts r280696 in favor of using extra patterns as mentioned as an alternative in that commit message. For now I've only added the cases we have test cases for, but it should be easy to add more in the future.

This will help to convert VPERMI2PS/VPERMT2PS intrinsics to use a single ISD node opcode. And hopefully allow some intrinsics to be removed.

llvm-svn: 333365
* [X86] Make the STTNI flag intrinsics use the flags from pcmpestrm/pcmpistrm if the mask intrinsics are also used in the same basic block (Craig Topper, 2018-04-27, 1 file, -11/+0)
Summary:
Previously the flag intrinsics always used the index instructions even if a mask instruction also exists.

To fix this I've created a single ISD node type that returns index, mask, and flags. The SelectionDAG CSE process will merge all flavors of intrinsics with the same inputs to a single node. Then during isel we just have to look at which results are used to know what instruction to generate. If both mask and index are used we'll need to emit two instructions. But for all other cases we can emit a single instruction.

Since I had to do manual isel anyway, I've removed the pseudo instructions and custom inserter code that was working around tablegen limitations with multiple implicit defs.

I've also renamed the recently added sse42.ll test case to sttni.ll since it focuses on that subset of the sse4.2 instructions.

Reviewers: chandlerc, RKSimon, spatel
Reviewed By: chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46202

llvm-svn: 331091
* [X86] Change X86::PMULDQ/PMULUDQ opcodes to take vXi64 type as input instead of vXi32 (Craig Topper, 2018-03-08, 1 file, -4/+2)
This instruction can be thought of as reading either the even elements of a vXi32 input or the lower half of each element of a vXi64 input. We currently use the vXi32 interpretation, but vXi64 matches better with its broadcast behavior in EVEX.

I'm looking at moving MULDQ/MULUDQ creation to a DAG combine so we can do it when AVX512DQ is enabled without having to go through Custom lowering. But in some of the test cases we failed to use a broadcast load due to the size difference. This should help with that.

I'm also wondering if we can model these instructions in native IR and remove the intrinsics and I think using a vXi64 type will work better with that.

llvm-svn: 326991
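A rough C++ sketch of the two equivalent views of PMULUDQ mentioned above, for the 128-bit case (illustrative only; lane numbering assumes the usual little-endian element order, and the signed PMULDQ variant would sign-extend instead):

```cpp
#include <cstdint>

// vXi32 view: multiply the even 32-bit lanes (0 and 2) into 64-bit products.
void pmuludq_v4i32(const uint32_t a[4], const uint32_t b[4], uint64_t out[2]) {
  out[0] = (uint64_t)a[0] * b[0];
  out[1] = (uint64_t)a[2] * b[2];
}

// vXi64 view: multiply the low 32 bits of each 64-bit lane.
// Both views produce identical results.
void pmuludq_v2i64(const uint64_t a[2], const uint64_t b[2], uint64_t out[2]) {
  out[0] = (uint64_t)(uint32_t)a[0] * (uint32_t)b[0];
  out[1] = (uint64_t)(uint32_t)a[1] * (uint32_t)b[1];
}
```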
* [X86] Lower extract_element from k-registers by bitcasting from v16i1 to i16 and extending/truncating (Craig Topper, 2018-02-28, 1 file, -4/+0)
This is equivalent to what isel was doing anyway but by canonicalizing earlier we can remove some patterns.

llvm-svn: 326375
* [X86] Make masked pcmpeq commutable during isel so we can fold loads in the other operand to the shorter encoding (Craig Topper, 2018-02-18, 1 file, -0/+2)
Previously we used the immediate encoding if the load was in operand 0 and the short encoding if the load was in operand 1.

This added an insane number of bytes to the size of the isel table. I'm wondering if we should always use the immediate form during isel and change to the short form during emission. This would remove the need to pattern match every combination for both the immediate form and the short form during isel. We could do the same with vpcmpgt.

llvm-svn: 325456
* [X86] Add KADD X86ISD opcode instead of reusing ISD::ADD. (Craig Topper, 2018-02-12, 1 file, -0/+2)
ISD::ADD implies individual vector element addition with no carries between elements. But for a vXi1 type that would be the same as XOR. And we already turn ISD::ADD into ISD::XOR for all vXi1 types during lowering. So the ISD::ADD pattern would never be able to match anyway.

KADD is different, it adds the elements but also propagates a carry between them. This is just a way of doing an add in a k-register without bitcasting to the scalar domain. There's still no way to match the pattern, but at least it's not obviously wrong.

llvm-svn: 324861
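The XOR equivalence argued above can be checked with a few lines of C++ (a standalone illustration; packing the vXi1 value into a byte is just for the example):

```cpp
#include <cassert>
#include <cstdint>

int main() {
  // Two hypothetical 8-lane vXi1 values packed into bytes.
  uint8_t a = 0b10110110, b = 0b01100110;

  // Per-element add on 1-bit lanes: 1+1 = 0 with the carry discarded,
  // so every lane behaves exactly like XOR.
  uint8_t lanewise = a ^ b;

  // KADD-style add: treat the whole mask register as one integer,
  // letting carries ripple between bit positions.
  uint8_t kadd = (uint8_t)(a + b);

  assert(lanewise != kadd);  // with these inputs the carries make a difference
  return 0;
}
```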
* [X86] Add isel patterns for selecting masked SUBV_BROADCAST with bitcasts. Remove combineBitcastForMaskedOp. (Craig Topper, 2018-02-05, 1 file, -0/+1)
Add test cases for the merge masked versions to make sure we have all those covered.

llvm-svn: 324210
* [X86] Remove VPTESTM/VPTESTNM ISD opcodes. Use isel patterns matching cmpm eq/ne with immallzeros. (Craig Topper, 2018-01-28, 1 file, -6/+0)
llvm-svn: 323612
* [X86] Remove X86ISD::PCMPGTM/PCMPEQM and instead just use X86ISD::PCMPM and pattern match the immediate value during isel (Craig Topper, 2018-01-27, 1 file, -6/+0)
Legalization is still biased to turn LT compares into GT by swapping operands to avoid needing extra isel patterns to commute.

I'm hoping to remove TESTM/TESTNM next and this should simplify that by making EQ/NE more similar.

llvm-svn: 323604
* [X86] Replace CVT2MASK ISD opcode with PCMPGTM compared to zero. (Craig Topper, 2018-01-08, 1 file, -2/+0)
CVT2MASK is just checking the sign bit which can be represented with a comparison with zero.

llvm-svn: 321985
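A small C++ check of the equivalence the message relies on, namely that taking the sign bit and a greater-than compare against zero (0 > x, i.e. x < 0) give the same mask bit (illustrative only, not LLVM code):

```cpp
#include <cassert>
#include <cstdint>

bool signbit_i32(int32_t x) { return (uint32_t)x >> 31; }  // the sign bit
bool pcmpgt_zero(int32_t x) { return 0 > x; }              // compare with zero

int main() {
  for (int32_t x : {INT32_MIN, -7, -1, 0, 1, 42, INT32_MAX})
    assert(signbit_i32(x) == pcmpgt_zero(x));
  return 0;
}
```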
* [X86] Use extract_vector_elt instead of X86ISD::VEXTRACT for isel of vXi1 extractions (Craig Topper, 2017-12-17, 1 file, -3/+4)
llvm-svn: 320937
* [x86][icelake] GFNI (Coby Tayree, 2017-11-26, 1 file, -0/+5)
Galois field arithmetic (GF(2^8)) instructions: gf2p8affineinvqb, gf2p8affineqb, gf2p8mulb

Differential Revision: https://reviews.llvm.org/D40373

llvm-svn: 318993
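For context, gf2p8mulb multiplies bytes as GF(2^8) elements. Below is a minimal C++ sketch of such a multiply, assuming the AES reduction polynomial x^8 + x^4 + x^3 + x + 1 (to the best of my knowledge the one this instruction uses); it illustrates the math, not the instruction's implementation:

```cpp
#include <cstdint>

// Multiply two bytes as elements of GF(2^8), reducing by the polynomial
// x^8 + x^4 + x^3 + x + 1 (0x11B), via a plain shift-and-xor loop.
uint8_t gf2p8_mul(uint8_t a, uint8_t b) {
  uint8_t p = 0;
  for (int i = 0; i < 8; ++i) {
    if (b & 1)
      p ^= a;           // "add" (XOR) the current multiple of a
    bool carry = a & 0x80;
    a <<= 1;
    if (carry)
      a ^= 0x1B;        // reduce modulo the field polynomial
    b >>= 1;
  }
  return p;
}
```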
* [X86] Add separate intrinsics for scalar FMA4 instructions. (Craig Topper, 2017-11-25, 1 file, -0/+6)
Summary:
These instructions zero the non-scalar part of the lower 128-bits which makes them different than the FMA3 instructions which pass through the non-scalar part of the lower 128-bits.

I've only added fmadd because we should be able to derive all other variants using operand negation in the intrinsic header like we do for AVX512.

I think there are still some missed negate folding opportunities with the FMA4 instructions in light of this behavior difference that I hadn't noticed before.

I've split the tests so that we can use different intrinsics for scalar testing between the two. I just copied the tests, split the RUN lines, and changed out the scalar intrinsics. fma4-fneg-combine.ll is a new test to make sure we negate the fma4 intrinsics correctly though there are a couple TODOs in it.

Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39851

llvm-svn: 318984
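A hedged C++ sketch of the behavioral difference described in the summary, using a plain four-float struct in place of an XMM register (the names are invented, and which source the FMA3 form passes through is shown as `a` purely for illustration):

```cpp
#include <cmath>

struct V4f { float e[4]; };

// Scalar FMA on lane 0. The difference lies in lanes 1..3 of the result:
//   FMA4-style: they are zeroed.
//   FMA3-style: they pass through from one of the sources (here, `a`).
V4f fma4_scalar(V4f a, V4f b, V4f c) {
  return {{std::fma(a.e[0], b.e[0], c.e[0]), 0.0f, 0.0f, 0.0f}};
}

V4f fma3_scalar(V4f a, V4f b, V4f c) {
  return {{std::fma(a.e[0], b.e[0], c.e[0]), a.e[1], a.e[2], a.e[3]}};
}
```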
* [X86][SSE] Use (V)PHMINPOSUW for vXi16 SMAX/SMIN/UMAX/UMIN horizontal reductions (PR32841) (Simon Pilgrim, 2017-11-23, 1 file, -0/+3)
(V)PHMINPOSUW determines the UMIN element in a v8i16 input; with suitable bit flipping it can also be used for SMAX/SMIN/UMAX cases as well.

This patch matches vXi16 SMAX/SMIN/UMAX/UMIN horizontal reductions and reduces the input down to a v8i16 vector before calling (V)PHMINPOSUW.

A later patch will use this for v16i8 reductions as well (PR32841).

Differential Revision: https://reviews.llvm.org/D39729

llvm-svn: 318917
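The bit-flipping tricks mentioned above can be sketched in C++ as follows (illustrative only; `umin8` stands in for (V)PHMINPOSUW's minimum result, and the 0x8000/0x7FFF constants are the standard masks that map signed order onto unsigned order):

```cpp
#include <algorithm>
#include <cstdint>

// Stand-in for (V)PHMINPOSUW: unsigned minimum of eight u16 lanes.
uint16_t umin8(const uint16_t v[8]) {
  uint16_t m = v[0];
  for (int i = 1; i < 8; ++i) m = std::min(m, v[i]);
  return m;
}

// UMAX via UMIN: invert all bits, take the minimum, invert back.
uint16_t umax8(const uint16_t v[8]) {
  uint16_t t[8];
  for (int i = 0; i < 8; ++i) t[i] = ~v[i];
  return (uint16_t)~umin8(t);
}

// SMIN via UMIN: flipping the sign bit maps signed order onto unsigned order.
int16_t smin8(const int16_t v[8]) {
  uint16_t t[8];
  for (int i = 0; i < 8; ++i) t[i] = (uint16_t)v[i] ^ 0x8000;
  return (int16_t)(umin8(t) ^ 0x8000);
}

// SMAX via UMIN: combine both tricks (flip sign bit and invert = XOR 0x7FFF).
int16_t smax8(const int16_t v[8]) {
  uint16_t t[8];
  for (int i = 0; i < 8; ++i) t[i] = (uint16_t)v[i] ^ 0x7FFF;
  return (int16_t)(umin8(t) ^ 0x7FFF);
}
```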
* [x86][icelake] BITALG (Coby Tayree, 2017-11-23, 1 file, -0/+7)
2/3: vpshufbitqmb encoding
3/3: vpshufbitqmb intrinsics

Differential Revision: https://reviews.llvm.org/D40222

llvm-svn: 318904
* [X86] Add an X86ISD::MSCATTER node for consistency with the X86ISD::MGATHER. (Craig Topper, 2017-11-22, 1 file, -12/+19)
This makes the fact that X86 needs an explicit mask output not part of the type constraint for the ISD::MSCATTER.

This also gives the X86ISD::MGATHER/MSCATTER nodes a common base class simplifying the address selection code in X86ISelDAGToDAG.cpp.

llvm-svn: 318823
* [X86] Lower all ISD::MGATHER nodes to X86ISD::MGATHER. (Craig Topper, 2017-11-22, 1 file, -95/+18)
Now we consistently represent the mask result without relying on isel ignoring it.

We now have a more general SDNode and type constraints to represent these nodes in isel patterns. This allows us to present both vXi1 and XMM/YMM mask types with a single set of constraints.

llvm-svn: 318821
* [x86][icelake] VNNI (Coby Tayree, 2017-11-21, 1 file, -0/+9)
Introducing Vector Neural Network Instructions, consisting of: vpdpbusd{s}, vpdpwssd{s}

Differential Revision: https://reviews.llvm.org/D40208

llvm-svn: 318746
* [x86][icelake] vbmi2 (Coby Tayree, 2017-11-21, 1 file, -0/+13)
Introducing VBMI2, consisting of: vpcompress{b,w}, vpexpand{b,w}, vpsh{l,r}d{w,d,q}, vpsh{l,r}dv{w,d,q}

Differential Revision: https://reviews.llvm.org/D40206

llvm-svn: 318745
* [X86] Simplify type constraints for AVX2 masked gather. (Craig Topper, 2017-11-21, 1 file, -19/+14)
We don't need separate 32 and 64 node types. We can use SDTCisInt and SDTCisSameSizeAs to ensure the mask is the same size as the result type and is integer.

llvm-svn: 318732
* [X86] Simplify the predicates for avx2 masked gather patterns. (Craig Topper, 2017-11-21, 1 file, -33/+17)
We don't need a dyn_cast and we only need to check the type of the index. The base ptr is guaranteed to be scalar.

llvm-svn: 318730
* [LV][X86] Support of AVX2 Gathers code generation and update the LV with this (Mohammed Agabaria, 2017-11-20, 1 file, -0/+88)
This patch depends on: https://reviews.llvm.org/D35348

Support of pattern selection of masked gathers of AVX2 (X86\AVX2 code gen).
Update LoopVectorize to generate gathers for AVX2 processors.

Reviewers: delena, zvi, RKSimon, craig.topper, aaboud, igorb
Reviewed By: delena, RKSimon
Differential Revision: https://reviews.llvm.org/D35772

llvm-svn: 318641
* [X86] Simplify the gather/scatter isel predicates. (Craig Topper, 2017-11-18, 1 file, -54/+27)
We don't need a dyn_cast; the predicate already specified the base node. We only need to check the type of the index; the base ptr is guaranteed to be scalar.

llvm-svn: 318596
* [X86] Redefine the 128-bit version of VPGATHERQD and VGATHERQPS to use a VK2 mask instead of a VK4 mask (Craig Topper, 2017-11-15, 1 file, -1/+10)
This allows us to remove extra extend creation during lowering and more accurately reflects the semantics of the instruction.

While there, add an extra output VT to the X86 masked gather node to better match the isel pattern predicate. Currently we're exploiting the fact that the isel table doesn't count how many output results a node actually has if the result type of any can be inferred from the first result and the type constraints defined in tablegen. I think we might ultimately want to lower all MGATHER/MSCATTER to an X86ISD node with the extra mask result and stop relying on this hole in the isel checking.

llvm-svn: 318278
* [X86] Split VRNDSCALE/VREDUCE/VGETMANT/VRANGE ISD nodes into versions with and without the rounding operand. NFCI (Craig Topper, 2017-11-13, 1 file, -8/+23)
I want to reuse the VRNDSCALE node for the legacy SSE rounding intrinsics so that those intrinsics can use EVEX instructions. All of these nodes share tablegen multiclasses so I split them all so that they all remain similar in their implementations.

llvm-svn: 318007
* [X86] Add an X86ISD::RANGES opcode to use for the scalar intrinsics. (Craig Topper, 2017-11-12, 1 file, -0/+1)
This fixes a bug where we selected packed instructions for scalar intrinsics.

llvm-svn: 317999
* [X86] Make X86ISD::FMADDS3 isel patterns commutable. (Craig Topper, 2017-11-09, 1 file, -4/+4)
This was missed when FMADDS3 was split from X86ISD::FMADDS3_RND.

llvm-svn: 317769
* [X86] Add support for using EVEX instructions for the legacy vcvtph2ps intrinsics (Craig Topper, 2017-11-07, 1 file, -0/+5)
Looks like there are some missed load folding opportunities for i64 loads.

llvm-svn: 317544
* [X86] Add scalar FMA ISD nodes without rounding mode. NFC (Craig Topper, 2017-11-06, 1 file, -0/+11)
Next step is to use them for the legacy FMA scalar intrinsics as well. This will enable the legacy intrinsics to use EVEX encoded opcodes and the extended registers.

llvm-svn: 317453
* [X86] Don't use RCP14 and RSQRT14 for reciprocal estimations or for legacy SSE rcp/rsqrt intrinsics when AVX512 features are enabled (Craig Topper, 2017-11-04, 1 file, -2/+4)
Summary:
AVX512 added RCP14 and RSQRT14 instructions which improve accuracy over the legacy RCP and RSQRT instructions, but not enough accuracy to remove the need for a Newton Raphson refinement.

Currently we use these new instructions for the legacy packed SSE intrinsics, but not the scalar intrinsics. And we use it for fast math optimization of division and reciprocal sqrt.

I think switching the legacy intrinsics may be surprising to the user since it changes the answer based on which processor you're using regardless of any fastmath settings. It's also weird that we did something different between scalar and packed.

As far as the reciprocal estimation, I think it creates unnecessary deltas in our output behavior (and prevents EVEX->VEX). A little playing around with gcc and icc and godbolt suggests they don't change which instructions they use here.

This patch adds new X86ISD nodes for the RCP14/RSQRT14 and uses those for the new intrinsics. Leaving the old intrinsics to use the old instructions.

Going forward I think our focus should be on:
- Supporting 512-bit vectors, which will have to use the RCP14/RSQRT14.
- Using RSQRT28/RCP28 to remove the Newton Raphson step on processors with AVX512ER.
- Supporting double precision.

Reviewers: zvi, DavidKreitzer, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39583

llvm-svn: 317413
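For reference, a single Newton-Raphson refinement step of the kind the message refers to looks like this in C++ (a textbook sketch, not the code added by the patch):

```cpp
// One Newton-Raphson step refining a hardware reciprocal estimate x ~= 1/a:
//   x' = x * (2 - a*x)
float refine_rcp(float a, float x) { return x * (2.0f - a * x); }

// One step refining a reciprocal-square-root estimate x ~= 1/sqrt(a):
//   x' = 0.5 * x * (3 - a*x*x)
float refine_rsqrt(float a, float x) { return 0.5f * x * (3.0f - a * x * x); }

int main() {
  float a = 3.0f;
  float est = 0.3f;          // stand-in for a coarse RCP-style estimate
  est = refine_rcp(a, est);  // one step brings it noticeably closer to 1/3
  return 0;
}
```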
* [X86][XOP] Merge rotation opcodes with AVX512 equivalents. NFCI. (Simon Pilgrim, 2017-09-26, 1 file, -3/+0)
The XOP rotations act as ROTL with +ve values and ROTR with -ve values, which means that we can treat them all as ROTL with unsigned modulo. We already check that we're only trying to lower as ROTL for XOP rotations.

Differential Revision: https://reviews.llvm.org/D37949

llvm-svn: 314207
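A small C++ illustration of the "ROTL with unsigned modulo" observation: a negative rotate amount, reduced modulo the element width, becomes the equivalent left rotate (standalone example, not LLVM code):

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 16-bit lane left by `amt`, taken modulo 16 as an unsigned value.
uint16_t rotl16(uint16_t x, int amt) {
  unsigned r = (unsigned)amt & 15;
  return (uint16_t)((x << r) | (x >> ((16 - r) & 15)));
}

int main() {
  uint16_t x = 0x1234;
  // A "rotate right by 3" (a negative amount in the XOP encoding) is the
  // same as a rotate left by (-3 mod 16) == 13.
  assert(rotl16(x, -3) == (uint16_t)((x >> 3) | (x << 13)));
  return 0;
}
```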
* [X86] Make IFMA instructions commutable during isel so we can fold broadcast loads. (Craig Topper, 2017-09-24, 1 file, -2/+2)
This required changing the ISD opcode for these instructions to have the commutable operands first and the addend last. This way tablegen can autogenerate the additional patterns for us.

llvm-svn: 314083
* [X86] Move the getInsertVINSERTImmediate and getExtractVEXTRACTImmediate helper functions over to X86ISelDAGToDAG.cpp (Craig Topper, 2017-09-23, 1 file, -4/+4)
Redefine them to call getI8Imm and return that directly.

llvm-svn: 314059
* [X86] Remove the isVINSERT*Index/isVEXTRACT*Index predicates from isel. (Craig Topper, 2017-09-23, 1 file, -13/+8)
The only insert_subvector/extract_subvector nodes that make it to isel are guaranteed to match.

llvm-svn: 314058
* [X86] Convert X86ISD::SELECT to ISD::VSELECT just before instruction selection to avoid duplicate patterns (Craig Topper, 2017-09-19, 1 file, -6/+0)
Similar to what we do for X86ISD::SHRUNKBLEND, just turn X86ISD::SELECT into ISD::VSELECT. This allows us to remove the duplicated TRUNC patterns.

Differential Revision: https://reviews.llvm.org/D38022

llvm-svn: 313644