path: root/llvm/lib/Target/X86/X86InstrAVX512.td
Commit message · Author · Age · Files · Lines
...
* [X86] Use SADDSAT/SSUBSAT instead of ADDS/SUBS (Nikita Popov, 2018-12-18; 1 file, -2/+2)
  Migrate the X86 backend from X86ISD opcodes ADDS and SUBS to generic ISD opcodes SADDSAT and SSUBSAT. This also improves codegen for the @llvm.sadd.sat() and @llvm.ssub.sat() intrinsics. This is a followup to D55787 and part of PR40056.
  Differential Revision: https://reviews.llvm.org/D55833
  llvm-svn: 349520
* [X86] Use UADDSAT/USUBSAT instead of ADDUS/SUBUS (Nikita Popov, 2018-12-18; 1 file, -2/+2)
  Replace the X86ISD opcodes ADDUS and SUBUS with generic ISD opcodes UADDSAT and USUBSAT. As a side effect, this also makes codegen for the @llvm.uadd.sat and @llvm.usub.sat intrinsics reasonable. This only replaces their use in the X86 backend, and does not move any of the ADDUS/SUBUS X86 specific combines into generic codegen.
  Differential Revision: https://reviews.llvm.org/D55787
  llvm-svn: 349481
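  A minimal LLVM IR sketch of the saturating intrinsics these two commits route through common ISD opcodes (the v8i16 width here is just one of several legal choices):

    declare <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16>, <8 x i16>)
    declare <8 x i16> @llvm.usub.sat.v8i16(<8 x i16>, <8 x i16>)

    ; Saturating ops clamp to the type's range instead of wrapping; on x86
    ; these can select to PADDSW/PSUBUSW-family instructions.
    define <8 x i16> @sadd(<8 x i16> %a, <8 x i16> %b) {
      %r = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> %a, <8 x i16> %b)
      ret <8 x i16> %r
    }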
* [AVX512] Update typo in comment (Cameron McInally, 2018-12-10; 1 file, -1/+1)
  Should be "Sae" for "Suppress All Exceptions". NFC
  llvm-svn: 348763
* [SelectionDAG][X86] Relax restriction on the width of an input to *_EXTEND_VECTOR_INREG. Use them and regular *_EXTEND to replace the X86 specific VSEXT/VZEXT opcodes (Craig Topper, 2018-11-13; 1 file, -72/+130)
  Previously, the extend_vector_inreg opcodes required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output. This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior, so it will still create them with matching input and output sizes.
  X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector, and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw, where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type, so type legalization should ignore the node and doesn't need to know about the relaxed restriction.
  We are no longer allowed to use the default expansion for these nodes during vector op legalization, since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code. I believe DAG combine works correctly with the relaxed restriction because it doesn't check the number of input elements.
  The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel, since SimplifyDemandedBits can create them at any time, so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in SimplifyDemandedBits.
  Differential Revision: https://reviews.llvm.org/D54346
  llvm-svn: 346784
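  As a rough illustration (not taken from the patch itself), the kind of IR whose lowering produces a *_EXTEND_VECTOR_INREG node, where a full-width register feeds a narrower set of result elements:

    ; Sign-extend only the low 8 of 16 i16 elements to i32. On x86 this maps
    ; to a VPMOVSXWD-style instruction: the input is a full 128-bit register,
    ; but only the low elements contribute to the result.
    define <8 x i32> @sext_low_half(<16 x i16> %v) {
      %lo = shufflevector <16 x i16> %v, <16 x i16> undef,
            <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
      %e = sext <8 x i16> %lo to <8 x i32>
      ret <8 x i32> %e
    }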
* [X86] Move the promotion of v16i16->v16i8 for avx512f but not avx512bw from lowering to isel. Change to use vpmovzx instead of vpmovsx. (Craig Topper, 2018-11-09; 1 file, -0/+12)
  With avx512f but not avx512bw we need to extend to v16i32 then truncate that to v16i8. Previously we emitted both nodes during lowering, but I'm trying to switch to using target independent nodes, and with that switch the extend+truncate […]. This patch changes the implementation to what will be necessary with that patch, which helps minimize test diffs.
  llvm-svn: 346552
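  In IR terms, the operation being legalized is just a vector truncate; a sketch:

    ; With AVX512F but no AVX512BW there is no direct v16i16 -> v16i8
    ; truncate instruction, so the commit describes extending to v16i32
    ; first and then truncating with a VPMOVDB-style instruction.
    define <16 x i8> @trunc16(<16 x i16> %v) {
      %t = trunc <16 x i16> %v to <16 x i8>
      ret <16 x i8> %t
    }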
* [X86] Don't turn any_extend from a mask register into a sign_extend during lowering. Add patterns to match any_extend during isel instead. (Craig Topper, 2018-11-05; 1 file, -0/+12)
  SimplifyDemandedBits can turn a sign_extend back into an any_extend and trigger an infinite loop. So instead legalize it the same way as a sign_extend, but preserve the opcode. Then just pattern match it the same as sign_extend during isel. I don't have a reduced test case for such an infinite loop yet.
  llvm-svn: 346170
* [X86] Stop promoting vector and/or/xor/andn to vXi64. (Craig Topper, 2018-10-26; 1 file, -203/+476)
  These promotions add additional bitcasts to the SelectionDAG that can pessimize computeKnownBits/computeNumSignBits. It also seems to interfere with broadcast formation.
  This patch removes the promotion and adds isel patterns instead.
  The increased table size is more than I would like, but hopefully we can find some canonicalizations or other tricks to start pruning out patterns going forward.
  Differential Revision: https://reviews.llvm.org/D53268
  llvm-svn: 345408
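  A sketch of the kind of IR affected; before this change, an op like the one below would have been wrapped in bitcasts to and from v4i64:

    ; Logic ops are bitwise, so bitcasting to v4i64 is legal, but the extra
    ; bitcasts obscure known-bits reasoning and broadcast formation.
    define <32 x i8> @and_v32i8(<32 x i8> %a, <32 x i8> %b) {
      %r = and <32 x i8> %a, %b
      ret <32 x i8> %r
    }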
* [X86] Correct a bad isel predicate. Though I don't think it can be exposed. (Craig Topper, 2018-10-24; 1 file, -1/+1)
  The B/W VPTEST instructions are only available with AVX512BW. But lowering should prevent any byte or word elements from getting to isel, so this can't be exposed.
  llvm-svn: 345112
* Recommit r344877 "[X86] Stop promoting integer loads to vXi64" (Craig Topper, 2018-10-22; 1 file, -98/+145)
  I've included a fix to DAGCombiner::ForwardStoreValueToDirectLoad that I believe will prevent the previous miscompile.
  Original commit message:
  Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to remove the bitcast.
  I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.
  I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size.
  I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion.
  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits, RKSimon
  Differential Revision: https://reviews.llvm.org/D53306
  llvm-svn: 344965
* Revert r344877 "[X86] Stop promoting integer loads to vXi64" (Craig Topper, 2018-10-22; 1 file, -145/+98)
  Sam McCall reported miscompiles in some tensorflow code. Reverting while I try to figure out.
  llvm-svn: 344921
* [X86] Add patterns for vector and/or/xor/andn with other types than vXi64. (Craig Topper, 2018-10-22; 1 file, -0/+88)
  This makes fast isel treat all legal vector types the same way. Previously only vXi64 was in the fast-isel tables.
  This unfortunately prevents matching of andn by fast-isel for these types, since that requires SelectionDAG. But we already had this issue for vXi64. So at least we're consistent now.
  Interestingly, it looks like fast-isel can't handle instructions with constant vector arguments, so the not part of the andn patterns is selected with SelectionDAG. This explains why VPTERNLOG shows up in some of the tests.
  This is a subset of D53268. As I make progress on that, I will try to reduce the number of lines in the tablegen files.
  llvm-svn: 344884
* [X86] Stop promoting integer loads to vXi64 (Craig Topper, 2018-10-21; 1 file, -98/+145)
  Summary: Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to remove the bitcast.
  I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.
  I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size.
  I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion.
  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits, RKSimon
  Differential Revision: https://reviews.llvm.org/D53306
  llvm-svn: 344877
* [X86] Remove some isel patterns that shouldn't be possible. (Craig Topper, 2018-10-15; 1 file, -2/+0)
  These included a bitcast of a load from v4f32 to v2f64, but DAG combine should have already changed the type of the load to remove the cast.
  llvm-svn: 344573
* [X86] Fix a bad bitcast in the load form of vXi16 uniform shift patterns for EVEX encoded instructions. (Craig Topper, 2018-10-15; 1 file, -9/+10)
  llvm-svn: 344563
* [X86] Move ReadAfterLd functionality into X86FoldableSchedWrite (PR36957) (Simon Pilgrim, 2018-10-05; 1 file, -119/+119)
  Currently we hardcode instructions with ReadAfterLd if the register operands don't need to be available until the folded load has completed. This doesn't take into account the different load latencies of different memory operands (PR36957).
  This patch adds a ReadAfterFold def into X86FoldableSchedWrite to replace ReadAfterLd, allowing us to specify the load latency at a scheduler class level.
  I've added ReadAfterVec*Ld classes that match the XMM/Scl, XMM and YMM/ZMM WriteVecLoad classes that we currently use; we can tweak these values in future patches once this infrastructure is in place.
  Differential Revision: https://reviews.llvm.org/D52886
  llvm-svn: 343868
* [X86] Add isel pattern for (v8i16 (sext (v8i1))) with DQI and no BWI. (Craig Topper, 2018-09-23; 1 file, -0/+5)
  Our lowering that tries to avoid this sign extend can be defeated by the DAG combine folding it with a truncate. The pattern needs to extend to a v8i32 then truncate back down to v8i16.
  llvm-svn: 342830
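  A hypothetical IR shape that produces such a sign extend of a v8i1 mask (the compare source type is chosen for illustration):

    ; With AVX512 the icmp result lives in a k-register; without BWI the
    ; sign extend to v8i16 has to go through v8i32 and truncate back down.
    define <8 x i16> @sext_mask(<8 x i32> %a, <8 x i32> %b) {
      %m = icmp sgt <8 x i32> %a, %b
      %r = sext <8 x i1> %m to <8 x i16>
      ret <8 x i16> %r
    }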
* [SelectionDAG][X86] Reorder the operands of the MaskedStoreSDNode to put the value first. (Craig Topper, 2018-08-25; 1 file, -12/+11)
  Summary: Previously the value being stored was the last operand of the SDNode. This caused the type legalizer to visit the mask operand before the value operand. The type legalizer was more complicated because of this, since we want the type of the value to drive the decisions.
  This patch moves the value to be the first operand so we visit it first during type legalization. It also simplifies the type legalization code accordingly.
  X86 is currently the only in tree target that uses this SDNode. Not sure if there are any users out of tree.
  Reviewers: RKSimon, delena, hfinkel, eli.friedman
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D50402
  llvm-svn: 340689
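  For reference, the SDNode corresponds to the IR masked-store intrinsic, shown here with the typed-pointer mangling of this era; in IR the operand order is value, pointer, alignment, mask:

    declare void @llvm.masked.store.v8i32.p0v8i32(<8 x i32>, <8 x i32>*, i32, <8 x i1>)

    define void @store_masked(<8 x i32> %v, <8 x i32>* %p, <8 x i1> %m) {
      ; Only lanes whose mask bit is set are written to memory.
      call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> %v, <8 x i32>* %p, i32 32, <8 x i1> %m)
      ret void
    }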
* [X86] Change legacy SSE scalar fp to integer intrinsics to use specific ISD opcodes instead of keeping as intrinsics. Unify SSE and AVX512 isel patterns. (Craig Topper, 2018-08-15; 1 file, -86/+47)
  AVX512 added new versions of these intrinsics that take a rounding mode. If the rounding mode is 4, the new intrinsics are equivalent to the old intrinsics. The AVX512 intrinsics were being lowered to ISD opcodes, but the legacy SSE intrinsics were left as intrinsics. This resulted in the AVX512 instructions needing separate patterns for the ISD opcodes and the legacy SSE intrinsics.
  Now we convert SSE intrinsics and AVX512 intrinsics with rounding mode 4 to the same ISD opcode so we can share the isel patterns.
  llvm-svn: 339749
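  One of the legacy intrinsics in question, as a sketch; with rounding mode 4 (current direction), the AVX512 variants behave like this legacy form:

    declare i32 @llvm.x86.sse.cvtss2si(<4 x float>)

    ; Converts the low float element to i32 using the current rounding mode
    ; (the CVTSS2SI instruction).
    define i32 @cvt(<4 x float> %v) {
      %r = call i32 @llvm.x86.sse.cvtss2si(<4 x float> %v)
      ret i32 %r
    }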
* [X86] Remove isel patterns for MOVSS/MOVSD ISD opcodes with integer types. (Craig Topper, 2018-07-20; 1 file, -10/+0)
  Ideally our ISD node types going into the isel table would have types consistent with their instruction domain. This prevents us having to duplicate patterns with different types for the same instruction.
  Unfortunately, it seems our shuffle combining is currently relying on this a little to remove some bitcasts. This seems to enable some switching between shufps and shufd. Hopefully there's some way we can address this in the combining.
  Differential Revision: https://reviews.llvm.org/D49280
  llvm-svn: 337590
* [X86] Enable commuting of VUNPCKHPD to VMOVLHPS to enable load folding by using VMOVLPS with a modified address. (Craig Topper, 2018-07-18; 1 file, -11/+20)
  This required an annoying amount of tablegen multiclass changes to make only VUNPCKHPDZ128rr commutable.
  llvm-svn: 337357
* [X86] Remove patterns that mix X86ISD::MOVLHPS/MOVHLPS with v2i64/v2f64 types. (Craig Topper, 2018-07-18; 1 file, -7/+0)
  The X86ISD::MOVLHPS/MOVHLPS nodes should now only be emitted with SSE1. This means that the v2i64/v2f64 types would be illegal, thus we don't need these patterns.
  llvm-svn: 337349
* [X86] Add patterns for folding full vector load into MOVHPS and MOVLPS with SSE1 only. (Craig Topper, 2018-07-17; 1 file, -1/+3)
  llvm-svn: 337320
* [X86] Remove some standalone patterns in favor of the patterns in the MOVLPD instruction definitions. (Craig Topper, 2018-07-17; 1 file, -5/+1)
  Previously we passed 'null_frag' into the instruction definition. The multiclass is shared with MOVHPD, which doesn't use null_frag. It turns out that by passing X86Movsd it produces patterns equivalent to some standalone patterns.
  llvm-svn: 337299
* [X86] Add full set of patterns for turning ceil/floor/trunc/rint/nearbyint into rndscale with loads, broadcast, and masking. (Craig Topper, 2018-07-17; 1 file, -178/+197)
  This amounts to a pretty ridiculous number of patterns. Ideally we'd canonicalize the X86ISD::VRNDSCALE earlier to reuse those patterns. I briefly looked into doing that, but some strict FP operations could still get converted to rint and nearbyint during isel. It's probably still worthwhile to look into. This patch is meant as a starting point to work from.
  llvm-svn: 337234
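  A sketch of the unmasked form of what these patterns match; the masked variants additionally select between this result and a passthru value per lane:

    declare <16 x float> @llvm.floor.v16f32(<16 x float>)

    ; floor/ceil/trunc/rint/nearbyint all map to VRNDSCALE with different
    ; immediate bits; the masked form below keeps %pt lanes where %m is 0.
    define <16 x float> @floor_masked(<16 x float> %v, <16 x float> %pt, <16 x i1> %m) {
      %f = call <16 x float> @llvm.floor.v16f32(<16 x float> %v)
      %r = select <16 x i1> %m, <16 x float> %f, <16 x float> %pt
      ret <16 x float> %r
    }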
* [X86] Merge the FR128 and VR128 regclass since they have identical spill and alignment characteristics. (Craig Topper, 2018-07-16; 1 file, -138/+152)
  This unfortunately requires a bunch of bitcasts to be added to SUBREG_TO_REG, COPY_TO_REGCLASS, and instructions in output patterns. Otherwise tablegen seems to default to picking f128, and then we fail when something tries to get the register class for f128, which isn't always valid.
  The test changes are because we were previously mixing fr128 and vr128 due to constrainRegClass finding FR128 first, and passes like live range shrinking weren't handling that well.
  llvm-svn: 337147
* [x86/SLH] Teach speculative load hardening to correctly harden the indices used by AVX2 and AVX-512 gather instructions. (Chandler Carruth, 2018-07-16; 1 file, -1/+1)
  The index vector is hardened by broadcasting the predicate state into a vector register and then or-ing. We don't even have to worry about EFLAGS here.
  I've added a test for all of the gather intrinsics to make sure that we don't miss one. A particularly interesting creation is the gather prefetch, which needs to be marked as potentially "loading" to get the correct behavior. It's a memory access in many ways, and is actually relevant for SLH. Based on discussion with Craig in review, I've moved it to be `mayLoad` and `mayStore` rather than generic side effects. This matches how we model other prefetch instructions.
  Many thanks to Craig for the review here.
  Differential Revision: https://reviews.llvm.org/D49336
  llvm-svn: 337144
* [X86] Use 128-bit blends instead of vmovss/vmovsd for 512-bit vzmovl patterns to match AVX. (Craig Topper, 2018-07-15; 1 file, -12/+39)
  llvm-svn: 337135
* [X86] Prefer blendi over movss/sd when avx512 is enabled unless optimizing for size. (Craig Topper, 2018-07-14; 1 file, -8/+15)
  AVX512 doesn't have an immediate controlled blend instruction. But blend throughput is still better than movss/sd on SKX.
  This commit changes AVX512 to use the AVX blend instructions instead of MOVSS/MOVSD. This constrains the register allocation since it won't be able to use XMM16-31, but hopefully the increased throughput and reduced port 5 pressure makes up for that.
  llvm-svn: 337083
* [X86] Remove isel patterns that turn packed add/sub/mul/div+movss/sd into scalar intrinsic instructions. (Craig Topper, 2018-07-13; 1 file, -18/+18)
  This is not an optimization we should be doing in isel. This is more suitable for a DAG combine.
  My main concern is a future time when we support more FPENV. Changing a packed op to a scalar op could cause us to miss some exceptions that should have occurred if we had done a packed op. A DAG combine would be better able to manage this.
  llvm-svn: 336971
* [X86] Add AVX512 equivalents of some isel patterns so we get EVEX instructions. (Craig Topper, 2018-07-12; 1 file, -5/+36)
  These are the patterns for matching fceil, ffloor, and sqrt to intrinsic instructions if they have a MOVSS/SD.
  llvm-svn: 336954
* Revert r336950 and r336951 "[X86] Add AVX512 equivalents of some isel patterns so we get EVEX instructions." and "foo" (Craig Topper, 2018-07-12; 1 file, -36/+5)
  One of them had a bad title and they should have been squashed.
  llvm-svn: 336953
* [X86] Add AVX512 equivalents of some isel patterns so we get EVEX instructions. (Craig Topper, 2018-07-12; 1 file, -0/+31)
  These are the patterns for matching fceil, ffloor, and sqrt to intrinsic instructions if they have a MOVSS/SD.
  llvm-svn: 336951
* foo (Craig Topper, 2018-07-12; 1 file, -5/+5)
  llvm-svn: 336950
* [X86] Remove patterns and ISD nodes for the old scalar FMA intrinsic lowering. (Craig Topper, 2018-07-12; 1 file, -40/+12)
  We now use llvm.fma.f32/f64 or llvm.x86.fmadd.f32/f64 intrinsics that use scalar types rather than vector types. So we don't need these special ISD nodes that operate on the lowest element of a vector.
  llvm-svn: 336883
* [X86] Add patterns to use VMOVSS/SD zero masking for scalar f32/f64 select with zero. (Craig Topper, 2018-07-12; 1 file, -0/+8)
  These showed up in some of the upgraded FMA code. We really need to improve these test cases more, but this helps for now.
  llvm-svn: 336875
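  Roughly the IR shape these new patterns target (a hand-written sketch, not a test from the patch):

    ; A scalar select against +0.0 whose result lands in a vector's low
    ; element can use VMOVSS with zero-masking ({z}) instead of a blend.
    define <4 x float> @sel_zero(<4 x float> %v, i1 %m) {
      %x = extractelement <4 x float> %v, i64 0
      %s = select i1 %m, float %x, float 0.000000e+00
      %r = insertelement <4 x float> %v, float %s, i64 0
      ret <4 x float> %r
    }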
* [X86] Remove and autoupgrade the scalar fma intrinsics with masking. (Craig Topper, 2018-07-12; 1 file, -0/+22)
  This converts them to what clang is now using for codegen. Unfortunately, there seem to be a few kinks to work out still. I'll try to address them with follow up patches.
  llvm-svn: 336871
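  Approximately what the autoupgrade produces for a masked scalar FMA such as _mm_mask_fmadd_ss; this is a simplified sketch of the clang-style expansion, not the exact upgrade output:

    declare float @llvm.fma.f32(float, float, float)

    define <4 x float> @mask_fmadd_ss(<4 x float> %a, <4 x float> %b,
                                      <4 x float> %c, i1 %m) {
      %a0 = extractelement <4 x float> %a, i64 0
      %b0 = extractelement <4 x float> %b, i64 0
      %c0 = extractelement <4 x float> %c, i64 0
      %f  = call float @llvm.fma.f32(float %a0, float %b0, float %c0)
      ; Masked: keep the original low element when the mask bit is clear.
      %s  = select i1 %m, float %f, float %a0
      %r  = insertelement <4 x float> %a, float %s, i64 0
      ret <4 x float> %r
    }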
* [X86] Fix MayLoad/HasSideEffect flag for (V)MOVLPSrm instructions. (Andrea Di Biagio, 2018-07-11; 1 file, -1/+1)
  Before revision 336728, the "mayLoad" flag for instruction (V)MOVLPSrm was inferred directly from the "default" pattern associated with the instruction definition.
  r336728 removed special node X86Movlps, and all the patterns associated to it. Now instruction (V)MOVLPSrm doesn't have a pattern associated to it, and the 'mayLoad/hasSideEffects' flags are left unset.
  When the instruction info is emitted by tablegen, method CodeGenDAGPatterns::InferInstructionFlags() sees that (V)MOVLPSrm doesn't have a pattern, and the flags are undefined. So, it conservatively sets the "hasSideEffects" flag for it.
  As a consequence, we were losing the 'mayLoad' flag, and we were gaining a 'hasSideEffects' flag in its place. This patch fixes the issue (originally reported by Michael Holmen).
  The mca tests show the differences in the instruction info flags. Instructions that were affected by this problem were: MOVLPSrm/VMOVLPSrm/VMOVLPSZ128rm.
  Differential Revision: https://reviews.llvm.org/D49182
  llvm-svn: 336818
* [X86] Remove some composite MOVSS/MOVSD isel patterns. (Craig Topper, 2018-07-11; 1 file, -11/+0)
  These patterns looked for a MOVSS/SD followed by a scalar_to_vector. Or a scalar_to_vector followed by a load.
  In both cases we emitted a MOVSS/SD for the MOVSS/SD part, a REG_CLASS for the scalar_to_vector, and a MOVSS/SD for the load. But we have patterns that do each of those 3 things individually, so there's no reason to build large patterns.
  Most of the test changes are just reorderings. The one test that had a meaningful change is pr30430.ll, and it appears to be a regression. But it's doing -O0, so I think it missed a lot of opportunities and was just getting lucky before.
  llvm-svn: 336762
* [X86] Remove AddedComplexity from all patterns that use X86vzmovl as their root. (Craig Topper, 2018-07-10; 1 file, -73/+60)
  Some added 20 and some added 15. It's unclear when to use which value and whether they are required at all.
  This patch removes them all. If we start finding real world issues we may need to add them back with proper tests.
  llvm-svn: 336735
* [X86] Remove X86ISD::MOVLPS and X86ISD::MOVLPD. NFCI (Craig Topper, 2018-07-10; 1 file, -21/+4)
  These ISD nodes try to select the MOVLPS and MOVLPD instructions, which are special load only instructions. They load data and merge it into the lower 64-bits of an XMM register. They are logically equivalent to our MOVSD node plus a load.
  There was only one place in X86ISelLowering that used MOVLPD and no places that selected MOVLPS. The one place that selected MOVLPD had to choose between it and MOVSD based on whether there was a load. But lowering is too early to tell if the load can really be folded. So in isel we have patterns that use MOVSD for MOVLPD if we can't find a load.
  We also had patterns that select the MOVLPD instruction for a MOVSD if we can find a load, but didn't choose the MOVLPD ISD opcode for some reason. So it seems better to just standardize on MOVSD ISD opcode and manage MOVSD vs MOVLPD instruction with isel patterns.
  llvm-svn: 336728
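  The load-and-merge-into-the-low-64-bits behavior described above, expressed as an IR sketch (typed-pointer syntax of this era):

    ; MOVLPD: load 64 bits and replace the low element, upper element
    ; preserved. Logically a MOVSD node plus a load, which is why the ISD
    ; opcodes could be unified.
    define <2 x double> @movlpd(<2 x double> %v, double* %p) {
      %d = load double, double* %p
      %r = insertelement <2 x double> %v, double %d, i64 0
      ret <2 x double> %r
    }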
* [X86] Correct vfixupimm load patterns to look for an integer load, not a floating point load bitcasted to integer. (Craig Topper, 2018-07-10; 1 file, -21/+20)
  DAG combine wouldn't let a floating point load bitcasted to integer exist. It would just be an integer load.
  llvm-svn: 336626
* [X86] Remove FloatVT from X86VectorVTInfo in X86InstrAVX512.td (Craig Topper, 2018-07-10; 1 file, -18/+8)
  The only places it was used were places where VT was the same as FloatVT. So switch those uses to VT and drop it.
  llvm-svn: 336624
* [X86] Remove some patterns that include a bitcast of a floating point load to an integer type. (Craig Topper, 2018-07-09; 1 file, -2/+0)
  DAG combine should have converted the type of the load.
  llvm-svn: 336557
* [X86] Remove some patterns that seem to be unreachable. (Craig Topper, 2018-07-09; 1 file, -3/+0)
  These patterns mapped (v2f64 (X86vzmovl (v2f64 (scalar_to_vector FR64:$src)))) to a MOVSD and a zeroing XOR. But the complexity of the pattern for (v2f64 (X86vzmovl (v2f64))) that selects MOVQ is artificially high and hides this MOVSD pattern.
  Weirder still, the SSE version of the pattern was explicitly blocked on SSE41, but yet we had copied it to AVX and AVX512.
  llvm-svn: 336556
* [X86] Add new scalar fma intrinsics with rounding mode that use f32/f64 types. (Craig Topper, 2018-07-08; 1 file, -43/+178)
  This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma.
  I had to add new rounding mode CodeGenOnly instructions to support isel when we can't find a movss to grab the upper bits from to use the b_Int instruction.
  Fast-isel tests have been updated to match new clang codegen.
  We are currently having trouble folding fneg into the new intrinsic. I'm going to correct that in a follow up patch to keep the size of this one down.
  A future patch will also remove the old intrinsics.
  llvm-svn: 336506
* [X86] Add more FMA3 memory folding patterns. Remove patterns that are no longer needed. (Craig Topper, 2018-07-06; 1 file, -0/+16)
  We've removed the legacy FMA3 intrinsics and are now using llvm.fma and extractelement/insertelement. So we don't need patterns for the nodes that could only be created by the old intrinsics.
  Those ISD opcodes still exist because we haven't dropped the AVX512 intrinsics yet, but those should go to EVEX instructions.
  llvm-svn: 336457
* [X86] Remove some isel patterns for X86ISD::SELECTS that specifically looked for the v1i1 mask to have come from a scalar_to_vector from GR8. (Craig Topper, 2018-07-05; 1 file, -18/+0)
  We have patterns for SELECTS that stop at v1i1, and we have a pattern for (v1i1 (scalar_to_vector GR8)). The patterns being removed here do the same thing as the two other patterns combined, so there is no need for them.
  llvm-svn: 336305
* [X86] Reduce the number of patterns needed for masked scalar ceil/floor isel. (Craig Topper, 2018-06-25; 1 file, -35/+10)
  The scalar to vector on the mask register should not be part of the patterns.
  llvm-svn: 335435
* [X86] Rename VFPCLASSSS and VFPCLASSSD internal instruction names to include a Z to match other EVEX instructions. (Craig Topper, 2018-06-24; 1 file, -6/+6)
  llvm-svn: 335428
* [X86] Use setcc ISD opcode for AVX512 integer comparisons all the way to isel (Craig Topper, 2018-06-20; 1 file, -112/+197)
  I don't believe there is any real reason to have separate X86 specific opcodes for vector compares. Setcc has the same behavior, it just uses a different encoding for the condition code.
  I had to change the CondCodeAction for SETLT and SETLE to prevent some transforms from changing SETGT lowering.
  Differential Revision: https://reviews.llvm.org/D43608
  llvm-svn: 335173
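  A sketch of a comparison that now travels as a generic setcc all the way to isel, where it selects to a VPCMPD-style instruction writing a k-register:

    ; icmp on vectors yields a <16 x i1> mask; with AVX512 the condition
    ; code ends up in the compare instruction's immediate.
    define <16 x i1> @cmp(<16 x i32> %a, <16 x i32> %b) {
      %m = icmp sgt <16 x i32> %a, %b
      ret <16 x i1> %m
    }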