path: root/llvm/lib/Target/X86/X86InstrAVX512.td
Commit message | Author | Age | Files | Lines
...
* [X86] Make getZeroVector return floating point vectors in their native type on SSE2 and later.
  Craig Topper | 2019-09-08 | 1 file | -0/+12
  isel used to require zero vectors to be canonicalized to a single type to minimize the number of patterns needed to match. This is no longer required.
  I plan to do this to integers too, but floating point was simpler to start with. Integer has a complication where v32i16/v64i8 aren't legal when the other 512-bit integer types are.
  llvm-svn: 371325
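  A minimal C++ sketch of the idea, not the actual LLVM code (the signature and the integer fallback are assumed from the description): build floating point zeros directly in their native type instead of canonicalizing to an integer type and bitcasting back.

      // Hedged sketch: return an all-zeros vector, using the native FP type on SSE2+.
      static SDValue getZeroVector(MVT VT, const X86Subtarget &Subtarget,
                                   SelectionDAG &DAG, const SDLoc &dl) {
        assert(VT.isVector() && "Expected a vector type");
        // FP zero (xorps/xorpd/vxorps) no longer needs a bitcast from a
        // canonical integer zero.
        if (VT.isFloatingPoint() && Subtarget.hasSSE2())
          return DAG.getConstantFP(+0.0, dl, VT);
        // Integer vectors keep the canonicalize-and-bitcast path for now.
        MVT CanonVT = MVT::getVectorVT(MVT::i32, VT.getSizeInBits() / 32);
        return DAG.getBitcast(VT, DAG.getConstant(0, dl, CanonVT));
      }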
* [X86] Remove isel patterns with X86VBroadcast+scalar_to_vector+load.
  Craig Topper | 2019-08-29 | 1 file | -29/+1
  The DAG should have these as X86VBroadcast+load.
  llvm-svn: 370299
* [X86] Remove some unneeded X86VBroadcast isel patterns that have larger than 128-bit input types.
  Craig Topper | 2019-08-29 | 1 file | -32/+0
  We should always be shrinking the input to 128 bits or smaller when the node is created.
  llvm-svn: 370296
* [X86] Add isel patterns to match vpdpwssd avx512vnni instruction from add+pmaddwd nodes.
  Craig Topper | 2019-08-24 | 1 file | -0/+29
  llvm-svn: 369859
* [X86] Mark VPDPWSSD and VPDPWSSDS as commutable. Add stack folding tests.
  Craig Topper | 2019-08-23 | 1 file | -10/+15
  llvm-svn: 369792
* [X86] Add isel patterns for (i64 (zext (i8 (bitcast (v16i1 X))))) to use a KMOVW and a SUBREG_TO_REG. Similar for i8 and anyextend.
  Craig Topper | 2019-08-20 | 1 file | -0/+8
  We already had patterns for extending to i32 to take advantage of the implicit zeroing of the upper bits of a 32-bit GPR that is done by KMOVW/KMOVB. But the extend might be all the way to i64, in which case the existing patterns would fail and we'd get a KMOVW/B followed by a MOVZX.
  By adding patterns for i64 we can use the fact that KMOVW/B zero the upper bits of the 32-bit GPR and the normal property that 32-bit GPR writes implicitly zero the upper 32 bits of the full 64-bit GPR.
  The anyextend patterns are slightly different since we don't care about the upper zeros. For i8->i64 I think this avoids selecting the anyextend as a MOVZX to prevent a partial register issue that doesn't exist. For i16->i64 I think we would have just emitted an insert_subreg on top of the extract_subreg that the vXi16->i16 bitcast pattern emits. The register coalescer or peephole pass should combine those, but this saves that work and makes i8/i16 consistent.
  llvm-svn: 369431
* [X86] Separate the memory size of vzext_load/vextract_store from the element size of the result type. Use them to improve the codegen of v2f32 loads/stores with sse1 only.
  Craig Topper | 2019-07-15 | 1 file | -48/+48
  Summary: SSE1 only supports v4f32, but it does have instructions like movlps/movhps that load/store 64 bits of memory.
  This patch breaks the connection between the node VT of the vzext_load/vextract_store patterns and the memory VT, enabling a v4f32 node with a 64-bit memory VT. I've used i64 as the memory VT here. I've written the PatFrag predicate to just check the store size, not the specific VT. I think the VT will only matter for CSE purposes. We could use v2f32, but if we want to start using these operations in more places a simple integer type might make the most sense.
  I'd like to maybe use this same thing for SSE2 and later as well, but that will need more work to be supported by EltsFromConsecutiveLoads to avoid regressing lit tests. I'd maybe also like to combine bitcasts with these load/store nodes now that the types are disconnected. And I'd also like to consider canonicalizing (scalar_to_vector + load) to vzext_load.
  If you want I can split the mechanical tablegen stuff where I added the 32/64 off from the sse1 change.
  Reviewers: spatel, RKSimon
  Reviewed By: RKSimon
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64528
  llvm-svn: 366034
* [X86] Make sure load isn't volatile before shrinking it in MOVDDUP isel patterns.
  Craig Topper | 2019-07-07 | 1 file | -3/+3
  llvm-svn: 365275
* [X86] Remove patterns from MOVLPSmr and MOVHPSmr instructions.
  Craig Topper | 2019-07-06 | 1 file | -8/+4
  These patterns are the same as the MOVLPDmr and MOVHPDmr patterns, but with a bitcast at the end. We can just select the PD instruction and let execution domain fixing switch to PS.
  llvm-svn: 365267
* [X86] Add a DAG combine for turning *_extend_vector_inreg+load into an appropriate extload if the load isn't volatile.
  Craig Topper | 2019-07-02 | 1 file | -20/+0
  Remove the corresponding isel patterns that did the same thing without checking for volatile.
  This fixes another variation of PR42079.
  llvm-svn: 364977
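  A hedged C++ sketch of this kind of combine (the helper name and exact structure are assumed, not the actual LLVM implementation): only fold when the load is a plain, non-volatile load with a single user.

      // Sketch: fold *_extend_vector_inreg (load x) -> sext/zext extload x.
      static SDValue combineExtInRegLoad(SDNode *N, SelectionDAG &DAG) {
        SDValue In = N->getOperand(0);
        EVT VT = N->getValueType(0);
        auto *Ld = dyn_cast<LoadSDNode>(In);
        if (!Ld || Ld->isVolatile() || !ISD::isNormalLoad(Ld) || !In.hasOneUse())
          return SDValue();
        ISD::LoadExtType ExtType =
            N->getOpcode() == ISD::SIGN_EXTEND_VECTOR_INREG ? ISD::SEXTLOAD
                                                            : ISD::ZEXTLOAD;
        // Only the low elements of the wide load are actually used, so the
        // memory type keeps the narrow element type with the result's count.
        EVT MemVT = EVT::getVectorVT(*DAG.getContext(),
                                     In.getValueType().getVectorElementType(),
                                     VT.getVectorNumElements());
        SDValue ExtLoad = DAG.getExtLoad(ExtType, SDLoc(N), VT, Ld->getChain(),
                                         Ld->getBasePtr(), Ld->getPointerInfo(),
                                         MemVT);
        // Users of the old load's chain move over to the new load's chain.
        DAG.ReplaceAllUsesOfValueWith(SDValue(Ld, 1), ExtLoad.getValue(1));
        return ExtLoad;
      }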
* [X86] Add patterns to select (scalar_to_vector (loadf32)) as (V)MOVSSrm instead of COPY_TO_REGCLASS + (V)MOVSSrm_alt. Similar for (V)MOVSD.
  Craig Topper | 2019-07-02 | 1 file | -0/+4
  Ultimately, I'd like to see about folding scalar_to_vector+load to vzload, which would select as (V)MOVSSrm, so this is closer to that.
  llvm-svn: 364948
* [X86] Add PreprocessISelDAG support for turning ISD::FP_TO_SINT/UINT into X86ISD::CVTTP2SI/CVTTP2UI and to reduce the number of isel patterns.
  Craig Topper | 2019-07-02 | 1 file | -113/+9
  llvm-svn: 364887
* [X86] Use v4i32 vzloads instead of v2i64 for vpmovzx/vpmovsx patterns where only 32 bits are loaded.
  Craig Topper | 2019-07-01 | 1 file | -5/+3
  v2i64 vzload defines a 64-bit memory access. It doesn't look like we have any coverage for this either way.
  Also remove some vzload usages where the instruction loads only 16 bits.
  llvm-svn: 364851
* [X86] Remove several bad load folding isel patterns for VPMOVZX/VPMOVSX.
  Craig Topper | 2019-07-01 | 1 file | -6/+0
  These patterns all matched a v2i64 vzload, which only loads 64 bits, to instructions that load a full 128 bits.
  llvm-svn: 364847
* [X86] Correct v4f32->v2i64 cvt(t)ps2(u)qq memory isel patterns
  Craig Topper | 2019-07-01 | 1 file | -2/+54
  These instructions only read 64 bits of memory, so we shouldn't allow a full vector width load to be pattern matched in case it is marked volatile. Instead allow vzload or scalar_to_vector+load.
  Also add a DAG combine to turn full vector loads into vzload when used by one of these instructions if the load isn't volatile.
  This fixes another case for PR42079.
  llvm-svn: 364838
* [X86] Add a DAG combine to replace vector loads feeding a v4i32->v2f64 CVTSI2FP/CVTUI2FP node with a vzload. But only when the load isn't volatile.
  Craig Topper | 2019-07-01 | 1 file | -0/+16
  This improves load folding during isel where we only have vzload and scalar_to_vector+load patterns. We can't have full vector load isel patterns for the same volatile load issue.
  Also add some missing masked cvtsi2fp/cvtui2fp with vzload patterns.
  llvm-svn: 364728
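  A hedged sketch of what such a combine might look like (helper name and surrounding context assumed): replace the wide load with an X86ISD::VZEXT_LOAD memory-intrinsic node that only claims 64 bits of memory, and only when the load is safe to shrink.

      // Sketch: Src is the v4i32 operand of a conversion that only reads the
      // low 64 bits.  If it is a plain, non-volatile, single-use vector load,
      // shrink it to a 64-bit vzload so isel can fold it into the conversion.
      static SDValue shrinkLoadToVZLoad(SDValue Src, const SDLoc &DL,
                                        SelectionDAG &DAG) {
        auto *LN = dyn_cast<LoadSDNode>(Src);
        if (!LN || LN->isVolatile() || !ISD::isNormalLoad(LN) || !Src.hasOneUse())
          return SDValue();
        SDVTList Tys = DAG.getVTList(MVT::v2i64, MVT::Other);
        SDValue Ops[] = {LN->getChain(), LN->getBasePtr()};
        SDValue VZLoad = DAG.getMemIntrinsicNode(X86ISD::VZEXT_LOAD, DL, Tys,
                                                 Ops, MVT::i64,
                                                 LN->getPointerInfo());
        // Everything chained on the old load now chains on the vzload.
        DAG.ReplaceAllUsesOfValueWith(SDValue(LN, 1), VZLoad.getValue(1));
        return DAG.getBitcast(MVT::v4i32, VZLoad);
      }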
* [X86] Add MOVHPDrm/MOVLPDrm patterns that use VZEXT_LOAD.
  Craig Topper | 2019-07-01 | 1 file | -0/+6
  We already had patterns that used scalar_to_vector+load. But we can also have a vzload.
  Found while investigating combining scalar_to_vector+load to vzload.
  llvm-svn: 364726
* [X86] Remove some duplicate patterns that already exist as part of their instruction definition. NFC
  Craig Topper | 2019-06-28 | 1 file | -5/+1
  llvm-svn: 364623
* [X86] Remove isel patterns that look for (vzext_movl (scalar_to_vector (load)))
  Craig Topper | 2019-06-25 | 1 file | -38/+0
  I believe these all get canonicalized to vzext_movl. The only case where that wasn't true was when the load was loadi32 and the load was an extload aligned to 32 bits. But that was fixed in r364207.
  Differential Revision: https://reviews.llvm.org/D63701
  llvm-svn: 364337
* [X86] Add a DAG combine to turn vzmovl+load into vzload if the load isn't volatile. Remove isel patterns for vzmovl+load
  Craig Topper | 2019-06-25 | 1 file | -8/+0
  We currently have some isel patterns for treating vzmovl+load the same as vzload, but that shrinks the load, which we shouldn't do if the load is volatile. Rather than adding isel checks for volatile, this patch removes the patterns and teaches DAG combine to merge them into vzload when it is legal to do so.
  Differential Revision: https://reviews.llvm.org/D63665
  llvm-svn: 364333
* [X86] Turn v16i16->v16i8 truncate+store into an any_extend+truncstore if we have avx512f, but not avx512bw.
  Craig Topper | 2019-06-23 | 1 file | -2/+0
  Ideally we'd be able to represent this truncate as an any_extend to v16i32 and a truncate, but SelectionDAG doesn't know how to not fold those together.
  We have isel patterns to use vpmovzxwd+vpmovdb for the truncate, but we aren't able to simultaneously fold the load and the store from the isel pattern.
  By pulling the truncate into the store we can successfully hide it from the DAG combiner. Then we can isel pattern match the truncstore and load+any_extend separately.
  llvm-svn: 364163
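  A hedged sketch of the store combine described above (the function shape is assumed): widen the truncate's input with any_extend and let a single truncating store carry the v16i32->v16i8 narrowing.

      // Sketch: St stores StoredVal == (v16i8 (trunc (v16i16 X))).  With
      // AVX512F but no AVX512BW, widen X to v16i32 and emit one truncating
      // store; the truncstore can later be matched to vpmovdb with a memory
      // operand, hidden from generic DAG combines.
      static SDValue widenTruncStore(StoreSDNode *St, SDValue X,
                                     SelectionDAG &DAG) {
        SDLoc dl(St);
        SDValue Ext = DAG.getNode(ISD::ANY_EXTEND, dl, MVT::v16i32, X);
        return DAG.getTruncStore(St->getChain(), dl, Ext, St->getBasePtr(),
                                 MVT::v16i8, St->getMemOperand());
      }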
* [X86] Fix isel pattern that was looking for a bitcasted load. Remove what appears to be a copy/paste mistake.
  Craig Topper | 2019-06-23 | 1 file | -13/+1
  DAG combine should ensure bitcasts of loads don't exist.
  Also remove 3 patterns that are identical to the block above them.
  llvm-svn: 364158
* [X86][SelectionDAG] Cleanup and simplify masked_load/masked_store in tablegen. Use more precise PatFrags for scalar masked load/store.
  Craig Topper | 2019-06-23 | 1 file | -12/+12
  Rename masked_load/masked_store to masked_ld/masked_st to discourage their direct use. We need to check truncating/extending and compressing/expanding before using them. This revealed that our scalar masked load/store patterns were misusing these.
  With those out of the way, rename masked_load_unaligned and masked_store_unaligned to remove the "_unaligned". We didn't check the alignment anyway, so the name was somewhat misleading.
  Make the aligned versions inherit from masked_load/store instead of from a separate identical version. Merge the 3 different alignment PatFrags into a single version that uses the VT from the SDNode to determine the size that the alignment needs to match.
  llvm-svn: 364150
* [X86] Add DAG combine to turn (vzmovl (insert_subvector undef, X, 0)) into (insert_subvector allzeros, (vzmovl X), 0)
  Craig Topper | 2019-06-21 | 1 file | -38/+0
  128/256-bit scalar_to_vectors are canonicalized to (insert_subvector undef, (scalar_to_vector), 0). We have isel patterns that try to match this pattern being used by a vzmovl to use a 128-bit instruction and a subreg_to_reg.
  This patch detects the insert_subvector undef portion of this and pulls it through the vzmovl, creating a narrower vzmovl and an insert_subvector allzeroes. We can then match the insert_subvector into a subreg_to_reg operation by itself. Then we can fall back on existing (vzmovl (scalar_to_vector)) patterns.
  Note, while the scalar_to_vector case is the motivating case I didn't restrict to just that case. I'm also wondering about shrinking any 256/512 vzmovl to an extract_subvector+vzmovl+insert_subvector(allzeros), but I fear that would have bad implications for shuffle combining. I also think there is more canonicalization we can do with vzmovl with loads or scalar_to_vector with loads to create vzload.
  Differential Revision: https://reviews.llvm.org/D63512
  llvm-svn: 364095
* [X86] Use vmovq for v4i64/v4f64/v8i64/v8f64 vzmovl.
  Craig Topper | 2019-06-21 | 1 file | -31/+22
  We already use vmovq for v2i64/v2f64 vzmovl. But we were using blendpd+xorpd for v4i64/v4f64/v8i64/v8f64 when optimizing for speed, or movsd+xorpd under optsize.
  I think the blend with 0 or movss/d is only needed for vXi32 where we don't have an instruction that can move 32 bits from one xmm to another while zeroing upper bits. movq is no worse than blendpd on any known CPUs.
  llvm-svn: 364079
* [X86] Replace any_extend* vector extensions with zero_extend* equivalents
  Simon Pilgrim | 2019-06-18 | 1 file | -36/+0
  First step toward addressing the vector-reduce-mul-widen.ll regression in D63281 - we should replace ANY_EXTEND/ANY_EXTEND_VECTOR_INREG in X86ISelDAGToDAG to avoid having to add duplicate patterns when treating any extensions as legal.
  In future patches this will also allow us to keep any extension nodes around a lot longer in the DAG, which should mean that we can keep better track of undef elements that otherwise become zeros that we think we have to keep...
  Differential Revision: https://reviews.llvm.org/D63326
  llvm-svn: 363655
* [X86] Remove MOVDI2SSrm/MOV64toSDrm/MOVSS2DImr/MOVSDto64mr CodeGenOnly instructions.
  Craig Topper | 2019-06-18 | 1 file | -19/+0
  The isel patterns for these use a bitcast and load/store, but DAG combine should have canonicalized those away.
  For the purposes of the memory folding table these opcodes can be replaced by the MOVSSrm_alt/MOVSDrm_alt and MOVSSmr/MOVSDmr opcodes.
  llvm-svn: 363644
* [X86] Introduce new MOVSSrm/MOVSDrm opcodes that use VR128 register class.
  Craig Topper | 2019-06-18 | 1 file | -11/+14
  Rename the old versions that use FR32/FR64 to MOVSSrm_alt/MOVSDrm_alt. Use the new versions in patterns that previously used a COPY_TO_REGCLASS to VR128. These patterns expect the upper bits to be zero. The current setup appears to work, but I'm not sure we should be enforcing upper bits being zero through a COPY_TO_REGCLASS.
  I wanted to flip the arrangement and use a COPY_TO_REGCLASS to FR32/FR64 for the patterns that need an f32/f64 result, but that complicated fastisel and globalisel.
  I've been doing some experiments with reducing some isel patterns and ended up in a situation where I had a (SUBREG_TO_REG (COPY_TO_REGCLASS (VMOVSSrm), VR128)) and our post-isel peephole was unable to avoid using an instruction for the SUBREG_TO_REG due to the COPY_TO_REGCLASS. Having a VR128 instruction removes the COPY_TO_REGCLASS that was breaking this.
  llvm-svn: 363643
* Use VR128X instead of FR32X/FR64X for the register class in VMOVSSZmrk/VMOVSDZmrk. Removes COPY_TO_REGCLASS from some patterns.
  Craig Topper | 2019-06-17 | 1 file | -5/+5
  llvm-svn: 363630
* [X86] Add load folding isel patterns to scalar_math_patterns and AVX512_scalar_math_fp_patterns. Also add a FIXME for the peephole pass not being able to handle this.
  Craig Topper | 2019-06-11 | 1 file | -0/+23
  llvm-svn: 363032
* [X86] Disable f32->f64 extload when sse2 is enabled
  Craig Topper | 2019-06-10 | 1 file | -8/+0
  Summary: We can only use the memory form of cvtss2sd under optsize due to a partial register update. So previously we were emitting 2 instructions for extload when optimizing for speed. Also due to a late optimization in preprocessiseldag we had to handle (fpextend (loadf32)) under optsize.
  This patch forces extload to expand so that it will always be in the (fpextend (loadf32)) form during isel. And when optimizing for speed we can just let each of those pieces select an instruction independently.
  Reviewers: spatel, RKSimon
  Reviewed By: RKSimon
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62710
  llvm-svn: 362919
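  The mechanism described comes down to a single legalization setting; a hedged sketch of roughly what that looks like in the X86 lowering constructor (exact placement assumed):

      // Sketch: with SSE2, stop treating f32 -> f64 extending loads as legal.
      // They get expanded to a separate load plus fp_extend, and each piece is
      // then selected on its own (cvtss2sd's memory form stays an optsize-only
      // choice because of its partial register update).
      if (Subtarget.hasSSE2())
        setLoadExtAction(ISD::EXTLOAD, MVT::f64, MVT::f32, Expand);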
* [X86] Convert f32/f64 FANDN/FAND/FOR/FXOR to vector logic ops and scalar_to_vector/extract_vector_elts to reduce isel patterns.
  Craig Topper | 2019-06-10 | 1 file | -45/+0
  Previously we did the equivalent operation in isel patterns with COPY_TO_REGCLASS operations to transition. By inserting scalar_to_vectors and extract_vector_elts before isel we can allow each piece to be selected individually and accomplish the same final result.
  Ideally we'd use vector operations earlier in lowering/combine, but that looks to be more difficult.
  The scalar-fp-to-i64.ll changes are because we have a pattern for using movlpd for store+extract_vector_elt, while an f64 store uses movsd. The encoding sizes are the same.
  llvm-svn: 362914
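  A hedged sketch of the transform (helper shape assumed): wrap the scalar operands in scalar_to_vector, perform the logic op in the vector domain, and extract the scalar result back out.

      // Sketch: rewrite (f32 (X86ISD::FAND a, b)) and friends as a v4f32
      // operation so the insert, the vector logic op, and the extract can each
      // be selected independently (no COPY_TO_REGCLASS needed in patterns).
      static SDValue scalarFPLogicToVector(SDValue Op, SelectionDAG &DAG) {
        SDLoc dl(Op);
        MVT VT = Op.getSimpleValueType();                  // f32 or f64
        MVT VecVT = (VT == MVT::f32) ? MVT::v4f32 : MVT::v2f64;
        SDValue V0 = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, VecVT, Op.getOperand(0));
        SDValue V1 = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, VecVT, Op.getOperand(1));
        SDValue Logic = DAG.getNode(Op.getOpcode(), dl, VecVT, V0, V1);
        return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, VT, Logic,
                           DAG.getIntPtrConstant(0, dl));
      }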
* [X86] Remove (store (f32 (extractelt (v4f32))) isel patterns which are redundant.
  Craig Topper | 2019-06-09 | 1 file | -5/+0
  We emit a MOVSSmr and a COPY_TO_REGCLASS, but that's what we would get from selecting the store and extractelt independently.
  llvm-svn: 362895
* [X86] Mutate scalar fceil/ffloor/ftrunc/fnearbyint/frint into X86ISD::RNDSCALE during PreProcessIselDAG to cut down on number of isel patterns.
  Craig Topper | 2019-06-08 | 1 file | -28/+4
  Similar was done for vectors in r362535.
  Removes about 1200 bytes from the isel table.
  llvm-svn: 362894
* [X86] Make a bunch of merge masked binops commutable for load folding.
  Craig Topper | 2019-06-06 | 1 file | -8/+7
  This primarily affects add/fadd/mul/fmul/and/or/xor/pmuludq/pmuldq/max/min/fmaxc/fminc/pmaddwd/pavg.
  We already commuted the unmasked and zero masked versions.
  I've added 512-bit stack folding tests for most of the instructions affected. I've tested needing commuting and not commuting across unmasked, merged masked, and zero masked. The 128/256-bit instructions should behave similarly.
  llvm-svn: 362746
* [X86] Make masked floating point equality/ordered compares commutable for load folding purposes.
  Craig Topper | 2019-06-06 | 1 file | -3/+3
  Same as what is supported for the unmasked form.
  llvm-svn: 362717
* [X86] Fix mistake that marked VADDSSrrb_Int/VADDSDrrb_Int/VMULSSrrb_Int/VMULSDrrb_Int as commutable.
  Craig Topper | 2019-06-05 | 1 file | -1/+1
  One of the sources controls the pass-through value for the upper bits of the result, so we can't really commute it.
  In practice this problem isn't a functional issue, because we would only try to commute this instruction in order to fold a load. But we can't do embedded rounding and fold a load at the same time, so the load fold would never succeed, and I don't think we would ever commute, or at least keep the version after commuting.
  llvm-svn: 362647
* [X86] Mutate fceil/ffloor/ftrunc/fnearbyint/frint into X86ISD::RNDSCALE during PreProcessIselDAG to cut down on pattern permutations
  Craig Topper | 2019-06-04 | 1 file | -203/+0
  We already need to have patterns for X86ISD::RNDSCALE to support software intrinsics, but we currently have 5 sets of patterns for the 5 rounding operations. For each of these patterns we have to support 3 vector widths, 2 element sizes, sse/vex/evex encodings, load folding, and broadcast load folding. This results in a fair amount of bytes in the isel table.
  This patch adds code to PreProcessIselDAG to morph the fceil/ffloor/ftrunc/fnearbyint/frint to X86ISD::RNDSCALE. This way we can remove everything but the intrinsic pattern while still allowing the operations to be considered Legal for DAGCombine and Legalization. This shrinks the DAGISel by somewhere between 9K and 10K.
  There is one complication to this: the STRICT versions of these nodes are currently mutated to their non-strict equivalents at isel time when the node is visited. This won't be true in the future since that loses the chain ordering information. For now I've also added support for the non-STRICT nodes to Select so we can change the STRICT versions there after they've been mutated to their non-STRICT versions. We'll probably need a STRICT version of RNDSCALE or something to handle this in the future, which will take us back to needing 2 sets of patterns for strict and non-strict, but that's still better than the 11 or 12 sets of patterns we'd need.
  We can probably do something similar for scalar, but I haven't looked at it yet.
  Differential Revision: https://reviews.llvm.org/D62757
  llvm-svn: 362535
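  A hedged sketch of the PreProcessIselDAG rewrite (node creation details and the immediate operand type are assumed): one mapping from generic rounding opcodes to RNDSCALE immediates replaces the per-operation pattern sets.

      // Sketch: map each rounding opcode to the RNDSCALE immediate.  Bit 3
      // (0x8) suppresses the precision exception; bit 2 (0x4) says "use the
      // current MXCSR rounding mode"; the low two bits pick a fixed mode.
      static unsigned rndScaleImmForOpcode(unsigned Opc) {
        switch (Opc) {
        case ISD::FFLOOR:     return 0x9; // round toward -inf, no exceptions
        case ISD::FCEIL:      return 0xA; // round toward +inf, no exceptions
        case ISD::FTRUNC:     return 0xB; // round toward zero, no exceptions
        case ISD::FNEARBYINT: return 0xC; // current mode, no exceptions
        case ISD::FRINT:      return 0x4; // current mode, may raise inexact
        default: llvm_unreachable("not a rounding node");
        }
      }

      // In PreProcessIselDAG each matching node N would then be replaced with
      // something along the lines of (immediate type assumed):
      //   SDValue Res = CurDAG->getNode(X86ISD::RNDSCALE, dl, N->getValueType(0),
      //                                 N->getOperand(0),
      //                                 CurDAG->getTargetConstant(Imm, dl, MVT::i32));
      //   CurDAG->ReplaceAllUsesWith(N, Res.getNode());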
* [X86] Fix the pattern for merge masked vcvtps2pd.
  Craig Topper | 2019-06-03 | 1 file | -4/+1
  r362199 fixed it for zero masking, but not merge masking. The load folding in the peephole pass hid the bug. This patch turns off the peephole pass on the relevant test to ensure coverage.
  llvm-svn: 362440
* [X86] Remove patterns for X86VSintToFP/X86VUintToFP+loadv4f32 to v2f64.
  Craig Topper | 2019-05-31 | 1 file | -51/+12
  These patterns can incorrectly narrow a volatile load from 128 bits to 64 bits. Similar to PR42079.
  Switch to using (v4i32 (bitcast (v2i64 (scalar_to_vector (loadi64))))) as the load pattern used in the instructions.
  This probably still has issues in 32-bit mode where loadi64 isn't legal. Maybe we should use VZMOVL for widened loads even when we don't need the upper bits as zeroes?
  llvm-svn: 362203
* [X86] Remove avx512 isel patterns for fpextend+load. Prefer to only match fp extloads instead.
  Craig Topper | 2019-05-31 | 1 file | -11/+57
  DAG combine will usually fold fpextend+load to an fp extload anyway, so the 256 and 512 patterns were probably unnecessary.
  The 128-bit pattern was special in that it looked for a v4f32 load, but then used it in an instruction that only loads 64 bits. This is bad if the load happens to be volatile. We could probably make the patterns volatile aware, but that's more work for something that's probably rare. The peephole pass might kick in and save us anyway. We might also be able to fix this with some additional DAG combines.
  This also adds patterns for vselect+extload to enable masked vcvtps2pd to be used. Previously we looked for the unlikely vselect+fpextend+load.
  llvm-svn: 362199
* [X86] Correct the ins operand order for MASKPAIR16STORE to match other store instructions.
  Craig Topper | 2019-05-31 | 1 file | -4/+4
  This makes the 5 address operands come first and the data operand come last.
  This matches the operand order the instruction is created with. It's also the expected order in X86MCInstLower. So everything appeared to work, but the operands didn't match their declared type.
  Fixes a -verify-machineinstrs failure.
  Also remove the isel patterns from these instructions since they should only be used for stack spills and reloads. I'm not even sure what types the patterns were looking for to match.
  llvm-svn: 362193
* [X86] Add VP2INTERSECT instructions
  Pengfei Wang | 2019-05-31 | 1 file | -0/+57
  Support Intel AVX512 VP2INTERSECT instructions in llvm
  Patch by Xiang Zhang (xiangzhangllvm)
  Differential Revision: https://reviews.llvm.org/D62366
  llvm-svn: 362188
* [X86] Explicitly disable VEXTRACT instruction matching for an immediate of 0. Remove a bunch of isel patterns that become unnecessary.
  Craig Topper | 2019-05-22 | 1 file | -69/+0
  We effectively had a second set of isel patterns that tried to use a regular store instruction and an extract_subreg instruction, or a masked move and an extract_subreg. These patterns were intended to override the matching of VEXTRACT instructions by taking advantage of the priority of the explicit immediate 0 for the index.
  This patch instead just disables the matching of immediate 0 in the VEXTRACT patterns. This way each of the component pieces of the larger patterns will match by themselves.
  This found a bug of sorts where we didn't use a 128-bit store for a 512->128 extract on KNL. It's unclear what the right thing here should be. Using the vextract avoids constraining the register allocator to use xmm0-15. But it always results in a longer encoding if the register allocator ends up choosing xmm0-15 anyway.
  llvm-svn: 361431
* [X86][InstCombine] Remove InstCombine code that turns X86 round intrinsics into llvm.ceil/floor. Remove some isel patterns that existed because that was happening.
  Craig Topper | 2019-05-22 | 1 file | -46/+0
  We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into llvm.floor/ceil. The llvm.ceil/floor intrinsics are supposed to correspond to the libm functions. For the libm functions we need to disable the precision exception so the llvm.floor/ceil functions should always map to encodings 0x9 and 0xA.
  We had a mix of isel patterns where some used 0x9 and 0xA and others used 0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA.
  Since we have no way in isel of knowing where the llvm.ceil/floor came from, we can't map X86 specific intrinsics with encodings 1 or 2 to it. We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like to see a use case and optimization advantage first.
  I've left the backend test cases to show the blend we now emit without the extra isel patterns. But I've removed the InstCombine tests completely.
  llvm-svn: 361425
* [X86] Use 0x9 instead of 0x1 as the immediate in some masked floor patterns. Similarly change 0x2 to 0xA for ceil.
  Craig Topper | 2019-05-16 | 1 file | -4/+4
  This suppresses exceptions, which is what we should be doing for ceil and floor. We already use the correct immediate in patterns without masking.
  llvm-svn: 360915
* [X86] Remove the suffix on vcvt[u]si2ss/sd register variants in assembly printing.
  Craig Topper | 2019-05-06 | 1 file | -15/+24
  We require d/q suffixes on the memory form of these instructions to disambiguate the memory size. We don't require it on the register forms, but need to support parsing both with and without it.
  Previously we always printed the d/q suffix on the register forms, but it's redundant and inconsistent with gcc and objdump. After this patch we should support the d/q for parsing, but not print it when it's unneeded.
  llvm-svn: 360085
* Revert r359392 and r358887
  Craig Topper | 2019-05-06 | 1 file | -7/+22
  Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead"
  Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"
  Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list.
  Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead of moving to the integer domain (in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer contains an entry for (f32/f64 bitcast (i32/i64)), so the target independent fneg code fails. The use of fsub changes the behavior of nan with respect to -O2 codegen, which will always use a pxor.
  NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work there. I'll file a separate PR for that and add test cases.
  Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised if that bug can still be hit independent of that.
  This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again.
  llvm-svn: 360066
* Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake
  Luo, Yuanke | 2019-05-06 | 1 file | -0/+140
  Summary:
  1. Enable the infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake;
  2. Enable VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision.
  VCVTNE2PS2BF16: Convert Two Packed Single Data to One Packed BF16 Data.
  VCVTNEPS2BF16: Convert Packed Single Data to Packed BF16 Data.
  VDPBF16PS: Dot Product of BF16 Pairs Accumulated into Packed Single Precision.
  For more details about the BF16 ISA, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference
  Author: LiuTianle
  Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, RKSimon, spatel
  Reviewed By: craig.topper
  Subscribers: kristina, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60550
  llvm-svn: 360017
* [X86] Allow assembly parser to accept x/y/z suffixes on non-memory vfpclassps/pd and on memory forms in intel syntax
  Craig Topper | 2019-05-03 | 1 file | -5/+26
  The x/y/z suffix is needed to disambiguate the memory form in at&t syntax since no xmm/ymm/zmm register is mentioned. But for consistency we should also allow it for the register and broadcast forms, where it's not needed. This matches gas.
  The printing code will still only use the suffix for the memory form, where it is needed.
  llvm-svn: 359903