path: root/llvm/lib/Target/X86/X86InstrAVX512.td
Commit log (commit message; author; date; files changed; lines changed):
...
* [X86] Add more one use checks to masked compare patterns that were missed in r358358.  (Craig Topper, 2019-05-03, 1 file changed, -46/+48)
  This covers the patterns we use for widening 128/256 comparisons to 512-bit when AVX512VL isn't supported. llvm-svn: 359863
* [X86] Remove the redundant suffix in vfpclassp[d,s]'s broadcasting variant  (Craig Topper, 2019-05-02, 1 file changed, -9/+9)
  The broadcasting variant for instruction vfpclassp[d,s] shouldn't use suffix q/l. So remove them from the template.
  Patch by Pengfei Wang. Differential Revision: https://reviews.llvm.org/D61295 llvm-svn: 359753
* [X86] Remove some intel syntax aliases on (v)cvtpd2(u)dq, (v)cvtpd2ps, (v)cvt(u)qq2ps. Add 'x' and 'y' suffix aliases to masked versions of the same in att syntax.  (Craig Topper, 2019-04-29, 1 file changed, -24/+154)
  The 128/256-bit versions of these instructions require an 'x' or 'y' suffix to disambiguate the memory form in att syntax. We were allowing the same suffix in intel syntax, but it appears gas does not do that. gas does allow the 'x' and 'y' suffix on register and broadcast forms even though it's not needed. We were allowing it on the unmasked register form, but not on masked versions or on masked or unmasked broadcast forms.
  While there, fix some test coverage holes so they can be extended with the 'x' and 'y' suffix tests. llvm-svn: 359418
* [X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead.  (Craig Topper, 2019-04-28, 1 file changed, -22/+7)
  Summary: The register forms of these instructions are CodeGenOnly instructions that cover GR32->FR32 and GR64->FR64 bitcasts. There is a similar set of instructions for the opposite bitcast. Due to the patterns using bitcasts, these instructions get marked as "bitcast" machine instructions as well. The peephole pass is able to look through these, as well as other copies, to try to avoid register bank copies.
  Because FR32/FR64/VR128 are all coalescable to each other, we can end up in a situation where a GR32->FR32->VR128->FR64->GR64 sequence can be reduced to GR32->GR64, which the copyPhysReg code can't handle. To prevent this, this patch removes one set of the 'bitcast' instructions. So now we can only go GR32->VR128->FR32 or GR64->VR128->FR64. The instruction that converts from GR32/GR64->VR128 has no special significance to the peephole pass and won't be looked through.
  I guess the other option would be to add support to copyPhysReg to just promote the GR32->GR64 to a GR64->GR64 copy. The upper bits were basically undefined anyway. But removing the CodeGenOnly instruction in favor of one that won't be optimized seemed safer.
  I deleted the peephole test because it couldn't be made to work with the bitcast instructions removed. The load versions of the instructions were unnecessary, as the pattern that selects them contains a bitcasted load, which should never happen.
  Fixes PR41619.
  Reviewers: RKSimon, spatel. Reviewed By: RKSimon. Subscribers: hiraditya, llvm-commits. Tags: #llvm. Differential Revision: https://reviews.llvm.org/D61223 llvm-svn: 359392
* [X86] Use MOVQ for i64 atomic_stores when SSE2 is enabled  (Craig Topper, 2019-04-27, 1 file changed, -0/+5)
  Summary: If we have SSE2 we can use a MOVQ to store 64 bits and avoid falling back to a cmpxchg8b loop. If it's a seq_cst store we need to insert an mfence after the store.
  Reviewers: spatel, RKSimon, reames, jfb, efriedma. Reviewed By: RKSimon. Subscribers: hiraditya, dexonsmith, llvm-commits. Tags: #llvm. Differential Revision: https://reviews.llvm.org/D60546 llvm-svn: 359368
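To illustrate the kind of source-level store this lowering targets, here is a small C++ sketch (illustrative only, not code from the patch): on a 32-bit x86 target with SSE2, a seq_cst 64-bit atomic store like this can now become a single 64-bit vector store followed by MFENCE rather than a cmpxchg8b loop.

```cpp
#include <atomic>
#include <cstdint>

std::atomic<int64_t> g_counter;

// On a 32-bit x86 target with SSE2, a seq_cst 64-bit store like this can be
// lowered to one 64-bit store (MOVQ-style) plus MFENCE instead of a
// cmpxchg8b loop.
void store_counter(int64_t v) {
    g_counter.store(v, std::memory_order_seq_cst);
}

// A release store needs no trailing fence on x86; only seq_cst does.
void store_counter_release(int64_t v) {
    g_counter.store(v, std::memory_order_release);
}

int main() {
    store_counter(42);
    store_counter_release(43);
    return 0;
}
```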
* [X86] Add the rounding control operand to the printing for some scalar FMA instructions.  (Craig Topper, 2019-04-21, 1 file changed, -1/+1)
  llvm-svn: 358844
* [X86] Don't form masked vfpclass instruction from and+vfpclass unless the fpclass only has a single use.  (Craig Topper, 2019-04-21, 1 file changed, -28/+36)
  llvm-svn: 358841
* [X86] Redefine KUNPCK instructions to take a narrower source register class than destination register class. Remove copies from the isel output pattern.  (Craig Topper, 2019-04-14, 1 file changed, -11/+9)
  There's no reason for the inputs to be the destination register class. This just forces an unnecessary copy in the output patterns. llvm-svn: 358362
* [X86] Move VPTESTM matching from the isel table to custom code in X86ISelDAGToDAG.  (Craig Topper, 2019-04-14, 1 file changed, -251/+35)
  We had many tablegen patterns for these instructions. And due to the commutability of the patterns, tablegen expands them to even more patterns. All together, the VPTESTMD patterns accounted for more than 50K of the 610K isel table. This had gotten bad when we stopped canonicalizing AND to vXi64. This required a pattern for every combination of bitcast input type.
  This change moves the matching to custom code where it is easier to look through the bitcasts without being concerned with the specific types.
  The test changes are because we are now stricter with one use checks, as it's required to make load folding legal. We now require the AND and any BITCAST to only have a single use. This prevents forming VPTESTM and a VPAND with the same inputs.
  We now support broadcast loads for 128/256 patterns without VLX. We'll widen to 512-bit and still fold the broadcast since the amount of memory read doesn't change.
  There are a few tests that got slightly longer because we are now preferring load + VPTESTM over XOR+VPCMPEQ for (seteq (load), allzeros). Previously we were able to share the XOR with multiple VPTESTM instructions. llvm-svn: 358359
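A minimal scalar model of the VPTESTM semantics being matched (illustrative only, not LLVM code; the function name and the 4 x i32 element width are assumptions): each mask bit is set when the AND of the corresponding elements is nonzero, optionally combined with an incoming write mask.

```cpp
#include <cstdint>
#include <cstdio>

// Scalar model of VPTESTMD on one 128-bit lane (4 x i32):
// k[i] = ((a[i] & b[i]) != 0), ANDed with an incoming write mask.
uint8_t vptestmd_128(const uint32_t a[4], const uint32_t b[4], uint8_t writemask) {
    uint8_t k = 0;
    for (int i = 0; i < 4; ++i)
        if ((a[i] & b[i]) != 0)
            k |= static_cast<uint8_t>(1u << i);
    return k & writemask;
}

int main() {
    uint32_t a[4] = {1, 2, 0, 8};
    uint32_t b[4] = {1, 1, 7, 8};
    // Elements 0 and 3 have a nonzero AND, so the result mask is 0b1001 (0x9).
    printf("%#x\n", vptestmd_128(a, b, 0xF));
    return 0;
}
```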
* [X86] Don't form masked vpcmp/vcmp/vptestm operations if the setcc node has more than one use.  (Craig Topper, 2019-04-14, 1 file changed, -148/+256)
  We're better off emitting a single compare + kand rather than a compare for the other use and a masked compare. I'm looking into using custom instruction selection for VPTESTM to reduce the ridiculous number of permutations of patterns in the isel table. Putting a one use check on all masked compare folding makes load fold matching in the custom code easier. llvm-svn: 358358
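A scalar sketch of why a single compare plus kand is sufficient (illustrative C++, not LLVM code; the names are made up): a masked compare's result is simply the unmasked compare ANDed with the write mask, so when the compare result already exists for another user, a kand can reuse it instead of emitting a second, masked compare.

```cpp
#include <cstdint>

// Unmasked element-wise equality compare producing a 4-bit mask.
uint8_t compare_eq(const int32_t a[4], const int32_t b[4]) {
    uint8_t k = 0;
    for (int i = 0; i < 4; ++i)
        if (a[i] == b[i]) k |= static_cast<uint8_t>(1u << i);
    return k;
}

// A masked compare is just the unmasked compare ANDed with the write mask,
// i.e. kand(compare, mask).
uint8_t masked_compare_eq(const int32_t a[4], const int32_t b[4], uint8_t mask) {
    return compare_eq(a, b) & mask;
}

int main() {
    int32_t a[4] = {1, 2, 3, 4}, b[4] = {1, 0, 3, 0};
    uint8_t full = compare_eq(a, b);   // shared with the compare's other user
    uint8_t via_kand = full & 0x7;     // reuse the same compare via kand
    return via_kand == masked_compare_eq(a, b, 0x7) ? 0 : 1;
}
```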
* [X86] Remove some unused tablegen multiclasses. NFC  (Craig Topper, 2019-04-14, 1 file changed, -27/+0)
  llvm-svn: 358345
* [X86] Make _Int instructions the preferred instruction for the assembly parser and disassembly parser to remove inconsistencies between VEX and EVEX.  (Craig Topper, 2019-04-10, 1 file changed, -21/+19)
  Many of our instructions have both a _Int form used by intrinsics and a form used by other IR constructs. In the EVEX space the _Int versions usually cover all the capabilities, including broadcasting and rounding, while the other version only covers simple register/register or register/load forms. For this reason, in EVEX the non-intrinsic form is usually marked isCodeGenOnly=1.
  In the VEX encoding space we were less consistent, but usually the _Int version was the isCodeGenOnly version.
  This commit makes the VEX instructions match the EVEX instructions. This was done by manually studying the AsmMatcher table, so it's possible I missed some cases, but we should be closer now.
  I'm thinking about using the isCodeGenOnly bit to simplify the EVEX2VEX tablegen code that disambiguates the _Int and non _Int versions. Currently it checks register class sizes and the Record the memory operands come from. I have some other changes I was looking into for D59266 that may break the memory check.
  I had to make a few scheduler hacks to keep the _Int versions from being treated differently than the non _Int version.
  Differential Revision: https://reviews.llvm.org/D60441 llvm-svn: 358138
* [X86] Support the EVEX versions of vcvt(t)ss2si and vcvt(t)sd2si with the {evex} pseudo prefix in the assembler.  (Craig Topper, 2019-04-10, 1 file changed, -32/+10)
  The EVEX versions are ambiguous with the VEX versions based on operands alone, so we had explicitly dropped them from the AsmMatcher table. Unfortunately, when we add them they incorrectly show up in the table before their VEX counterparts. This is different from how the prioritization normally works. To fix this we have to explicitly reject the instructions unless the {evex} prefix has been seen. llvm-svn: 358041
* [X86] Add VEX_LIG to scalar VEX/EVEX instructions that were missing it.  (Craig Topper, 2019-04-09, 1 file changed, -21/+21)
  Scalar VEX/EVEX instructions don't use the L bit and don't look at it for decoding either. So we should ignore it in our disassembler. The missing instructions here were found by grepping the raw tablegen class definitions in the tablegen debug output. llvm-svn: 358040
* [X86][AVX] Add missing vXi16 broadcast fold patterns  (Simon Pilgrim, 2019-03-28, 1 file changed, -0/+18)
  Now that D59484 has landed it's easier to add these. Added missing AVX512BW v32i16 equivalents while I was at it. llvm-svn: 357155
* [X86] Remove the _alt forms of (V)CMP instructions. Use a combination of custom printing and custom parsing to achieve the same result and more.  (Craig Topper, 2019-03-18, 1 file changed, -90/+29)
  Similar to the previous change done for VPCOM and VPCMP. Differential Revision: https://reviews.llvm.org/D59468 llvm-svn: 356384
* [X86] Remove the _alt forms of AVX512 VPCMP instructions. Use a combination of custom printing and custom parsing to achieve the same result and more.  (Craig Topper, 2019-03-17, 1 file changed, -73/+22)
  Similar to the previous patch for VPCOM. Differential Revision: https://reviews.llvm.org/D59398 llvm-svn: 356344
* [X86] Strip the SAE bit from the rounding mode passed to the _RND opcodes. Use TargetConstant to save a conversion in the isel table.  (Craig Topper, 2019-03-15, 1 file changed, -30/+30)
  The asm parser generates the immediate without the SAE bit. So for consistency we should generate the MCInst the same way from CodeGen. Since they are now both the same, remove the masking from the printer and replace with an llvm_unreachable.
  Use a target constant since we're rebuilding the node anyway. Then we don't have to have isel convert it. Saves about 500 bytes from the isel table. llvm-svn: 356294
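For context, the SAE bit here is the suppress-all-exceptions flag that the intrinsics encode on top of the 2-bit rounding control (0x08, matching _MM_FROUND_NO_EXC in the intrinsic headers), so stripping it is a single mask operation. A minimal sketch under that assumption (the helper name is made up; this is not the patch's code):

```cpp
#include <cstdint>

// The rounding immediate used by the intrinsics combines a 2-bit rounding
// control with the suppress-all-exceptions flag (_MM_FROUND_NO_EXC == 0x08).
// Stripping the SAE bit before building the _RND node is just masking it off.
constexpr uint8_t kNoExc = 0x08;  // _MM_FROUND_NO_EXC

constexpr uint8_t stripSAE(uint8_t imm) { return static_cast<uint8_t>(imm & ~kNoExc); }

static_assert(stripSAE(0x08 | 0x01) == 0x01, "rounding control survives, SAE bit dropped");
static_assert(stripSAE(0x03) == 0x03, "no SAE bit set, immediate unchanged");

int main() { return 0; }
```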
* [X86] Add SCALAR_SINT_TO_FP/SCALAR_UINT_TO_FP ISD opcodes without rounding mode.  (Craig Topper, 2019-03-11, 1 file changed, -13/+16)
  After this we no longer need to match FROUND_CURRENT or FROUND_NO_EXC during isel, so I remove those. llvm-svn: 355807
* [X86] Split SCALEF(S) ISD opcodes into a version without rounding mode.  (Craig Topper, 2019-03-11, 1 file changed, -23/+22)
  llvm-svn: 355806
* [X86] Split RCP28/RSQRT/GETEXP/EXP2 ISD opcodes into SAE and current direction nodes. Remove rounding mode operand.  (Craig Topper, 2019-03-11, 1 file changed, -42/+46)
  llvm-svn: 355805
* [X86] Rename _RND versions of RANGE/REDUCE/GETMANT/RNDSCALE ISD opcodes to _SAE. Remove SAE operand.  (Craig Topper, 2019-03-11, 1 file changed, -28/+25)
  No need to explicitly store it and match it during isel. llvm-svn: 355804
* [X86] Rename X86ISD::CVTPH2PS_RND to CVTPH2PS_SAE. Remove SAE operand.  (Craig Topper, 2019-03-11, 1 file changed, -2/+1)
  llvm-svn: 355803
* [X86] Rename the CVTT*_RND ISD nodes to _SAE and remove the SAE operand. Split VFPROUNDS_RND/VFPEXT(S)_RND into versions without rounding operand.  (Craig Topper, 2019-03-11, 1 file changed, -47/+44)
  For VFPEXT(S) we only need a current rounding mode and an SAE version. Neither needs an extra operand. llvm-svn: 355802
* [X86] Rename X86ISD::CMPM_RND and X86ISD::FSETCCM_RND to _SAE instead of _RND. Remove rounding operand.  (Craig Topper, 2019-03-11, 1 file changed, -7/+5)
  The operand could only be the SAE encoding so no need to include it. llvm-svn: 355801
* [X86] Split the VFIXUPIMM/VFIXUPIMMS nodes into a current rounding mode and SAE ISD opcode.  (Craig Topper, 2019-03-11, 1 file changed, -47/+39)
  Remove matching of FROUND_CURRENT and FROUND_NO_EXC for these nodes from the isel table. llvm-svn: 355800
* [X86] Begin removing matching of FROUND_CURRENT and FROUND_NO_EXC from isel tables.  (Craig Topper, 2019-03-11, 1 file changed, -26/+21)
  Instead I plan to have dedicated nodes for FROUND_CURRENT and FROUND_NO_EXC. This patch starts with FADDS/FSUBS/FMULS/FDIVS/FMAXS/FMINS/FSQRTS. llvm-svn: 355799
* [X86] Remove unneeded isel patterns from VCVTSI2SDZ and VCVTUSI2SDZ. NFC  (Craig Topper, 2019-03-11, 1 file changed, -3/+3)
  We had patterns using X86ISD::SCALAR_SINT_TO_FP_RND/SCALAR_UINT_TO_FP_RND for these instructions. There's nothing to round. Instead, we use a regular sint_to_fp/uint_to_fp and a movsd as the pattern for these. llvm-svn: 355796
* [X86] Remove VCVTSI2SDZrrb_Int as it shouldn't exist.  (Craig Topper, 2019-03-11, 1 file changed, -1/+1)
  This would convert a signed 32-bit integer to double precision with rounding. But there's nothing to round. llvm-svn: 355795
* Recommit r355224 "[TableGen][SelectionDAG][X86] Add specific isel matchers for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary."  (Craig Topper, 2019-03-10, 1 file changed, -47/+41)
  Includes a fix to emit a CheckOpcode for build_vector when immAllZerosV/immAllOnesV is used as a pattern root. This means it can't be used to look through bitcasts when used as a root, but that's probably ok. This extra CheckOpcode will ensure that the first match in the isel table will be a SwitchOpcode, which is needed by the caching optimization in the ISel Matcher.
  Original commit message:
  Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead we have to canonicalize the types of the all zeros/ones build_vectors and insert bitcasts. Then we have to pattern match those exact bitcasts.
  By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up.
  This removes something like 40,000 bytes from the X86 isel table. Differential Revision: https://reviews.llvm.org/D58595 llvm-svn: 355784
* Revert r355224 "[TableGen][SelectionDAG][X86] Add specific isel matchers for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary."  (Craig Topper, 2019-03-05, 1 file changed, -41/+47)
  This caused the first matcher in the isel table for many targets to be Opc_Scope instead of Opc_SwitchOpcode. This leads to a significant increase in isel match failures. llvm-svn: 355433
* [TableGen][SelectionDAG][X86] Add specific isel matchers for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary.  (Craig Topper, 2019-03-01, 1 file changed, -47/+41)
  Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead we have to canonicalize the types of the all zeros/ones build_vectors and insert bitcasts. Then we have to pattern match those exact bitcasts.
  By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up.
  This removes something like 40,000 bytes from the X86 isel table. Differential Revision: https://reviews.llvm.org/D58595 llvm-svn: 355224
* [X86] Use PreprocessISelDAG to convert vector sra/srl/shl to the X86 specific variable shift ISD opcodes.  (Craig Topper, 2019-02-28, 1 file changed, -106/+10)
  This allows us to use the same set of isel patterns for sra/srl/shl, which are undefined for out of range shifts, and the intrinsic shifts, which aren't undefined. Doing this late allows DAG combine to have every opportunity to optimize the sra/srl/shl nodes.
  This removes about 7000 bytes from the isel table and simplifies the td files. llvm-svn: 355071
* [X86][AVX] Update VBROADCAST folds to always use v2i64 X86vzload  (Simon Pilgrim, 2019-02-19, 1 file changed, -2/+2)
  The VBROADCAST combines and SimplifyDemandedVectorElts improvements mean that we now more consistently use shorter (128-bit) X86vzload input operands. Follow up to D58053. llvm-svn: 354346
* [X86] Move some vector InstAliases out from under unnecessary 'let Predicates'. NFCI  (Craig Topper, 2019-02-10, 1 file changed, -49/+45)
  We don't have any assembler predicates for vector ISAs, so this isn't necessary. It just adds extra lines and indentation. llvm-svn: 353631
* [X86][Btver2] Improved latency/throughput model for scalar int-to-float conversions.  (Andrea Di Biagio, 2019-01-29, 1 file changed, -3/+3)
  Account for bypass delays when computing the latency of scalar int-to-float conversions. On Jaguar we need to account for an extra 6cy latency (see AMD fam16h SOG). This patch also fixes the number of micro opcodes for the register-memory variants of scalar int-to-float conversions. Differential Revision: https://reviews.llvm.org/D57148 llvm-svn: 352518
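The scheduling-model arithmetic being adjusted is simple: the effective latency seen by a consumer in a different execution domain is the base instruction latency plus the bypass delay. A tiny illustrative sketch (the 6-cycle bypass delay is the figure cited above for Jaguar; the base latency below is a placeholder, not a real Btver2 number):

```cpp
#include <cstdio>

// Effective latency of a scalar int-to-float conversion when the result
// crosses from the integer to the floating-point domain:
// base latency plus a bypass delay.
int effectiveLatency(int baseLatency, int bypassDelay) {
    return baseLatency + bypassDelay;
}

int main() {
    const int kAssumedBaseLatency = 4;  // placeholder value, not from the SOG
    const int kJaguarBypassDelay  = 6;  // the extra 6cy cited in the commit
    printf("modeled latency: %d cycles\n",
           effectiveLatency(kAssumedBaseLatency, kJaguarBypassDelay));
    return 0;
}
```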
* [X86] Add DAG combine to merge vzext_movl with the various fp<->int conversion operations that only write the lower 64-bits of an xmm register and zero the rest.  (Craig Topper, 2019-01-26, 1 file changed, -32/+0)
  Summary: We have isel patterns for this, but we're missing some load patterns and all broadcast patterns. A DAG combine seems like a better fit for this.
  Reviewers: RKSimon, spatel. Reviewed By: RKSimon. Subscribers: llvm-commits. Differential Revision: https://reviews.llvm.org/D56971 llvm-svn: 352260
* [X86][SSE] Add selective commutation support for insertps (PR40340)  (Simon Pilgrim, 2019-01-22, 1 file changed, -0/+1)
  When we are inserting 1 "inline" element and zeroing 2 of the other elements, we can safely commute the insertps source inputs to improve memory folding. Differential Revision: https://reviews.llvm.org/D56843 llvm-svn: 351807
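As background, insertps selects one source element, writes it to one destination lane, and then zeroes any lanes named in the low four immediate bits. A scalar model of those standard SSE4.1 semantics (illustrative only, not the LLVM commutation code) shows that with one inserted lane and two zeroed lanes only a single lane of the first operand survives, which is the case the commit allows to commute:

```cpp
#include <cstdint>

// Scalar model of SSE4.1 INSERTPS with a register source:
//   imm[7:6] = source element index, imm[5:4] = destination lane,
//   imm[3:0] = zero mask applied to the result.
void insertps(float dst[4], const float src[4], uint8_t imm) {
    float result[4] = {dst[0], dst[1], dst[2], dst[3]};
    result[(imm >> 4) & 0x3] = src[(imm >> 6) & 0x3];
    for (int i = 0; i < 4; ++i)
        if (imm & (1u << i)) result[i] = 0.0f;
    for (int i = 0; i < 4; ++i) dst[i] = result[i];
}

int main() {
    float a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8};
    // Insert b[1] into lane 2 of a and zero lanes 0 and 3: only a[1]
    // survives from the first operand, the situation that makes
    // commuting the two inputs safe.
    insertps(a, b, /*src elt*/ (1 << 6) | /*dst lane*/ (2 << 4) | /*zmask*/ 0x9);
    return (a[0] == 0 && a[1] == 2 && a[2] == 6 && a[3] == 0) ? 0 : 1;
}
```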
* [X86] Use X86ISD::VFPROUND instead of ISD::FP_ROUND for 256 and 512 bit cvtpd2ps intrinsics.  (Craig Topper, 2019-01-21, 1 file changed, -11/+64)
  Summary: Use X86ISD::VFPROUND in the instruction isel patterns. Add new patterns for ISD::FP_ROUND to maintain support for fptrunc in IR. In the process I found a couple duplicate isel patterns which I also deleted in this patch.
  Reviewers: RKSimon, spatel. Reviewed By: RKSimon. Subscribers: llvm-commits. Differential Revision: https://reviews.llvm.org/D56991 llvm-svn: 351762
* [X86] Change avx512 COMPRESS and EXPAND lowering to use a single masked node instead of expand/compress+select.  (Craig Topper, 2019-01-21, 1 file changed, -4/+17)
  Summary: For compress, a select node doesn't semantically reflect the behavior of the instruction. The mask would have holes in it, but the resulting write is to contiguous elements at the bottom of the vector.
  Furthermore, as far as the compressing and expanding is concerned, the behavior is dependent on the mask. You can't just have an expand/compress node that only reads the input vector. That node would have no meaning by itself.
  This all only works because we pattern match the compress/expand+select back to the instruction. But conceivably an optimization of the select could break the pattern and leave something meaningless.
  This patch modifies the expand and compress nodes to take the mask and passthru as additional inputs and gets rid of the select altogether.
  Reviewers: RKSimon, spatel. Reviewed By: RKSimon. Subscribers: llvm-commits. Differential Revision: https://reviews.llvm.org/D57002 llvm-svn: 351761
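A scalar model of the compress behavior described above (illustrative C++, not the lowering code; the 4 x i32 width and names are assumptions): the active elements are packed contiguously from element 0 and the remaining lanes come from the passthru operand, which is why a plain select over an input-only node cannot express it.

```cpp
#include <cstdint>

// Scalar model of a masked COMPRESS as the new node represents it:
// active source elements are written contiguously from element 0, and the
// remaining lanes are taken from the passthru operand (zero for a zeroing
// mask). Which lane an element lands in depends on the mask itself, so a
// select over an "input-only" compress cannot describe the result.
void compress_epi32(int32_t out[4], const int32_t src[4],
                    const int32_t passthru[4], uint8_t mask) {
    int j = 0;
    for (int i = 0; i < 4; ++i)
        if (mask & (1u << i)) out[j++] = src[i];   // pack active elements down
    for (; j < 4; ++j) out[j] = passthru[j];       // fill the rest from passthru
}

int main() {
    int32_t src[4] = {10, 20, 30, 40}, pass[4] = {-1, -1, -1, -1}, out[4];
    compress_epi32(out, src, pass, 0xA);           // keep elements 1 and 3
    // out == {20, 40, -1, -1}: the "holes" in the mask do not stay in place.
    return (out[0] == 20 && out[1] == 40 && out[2] == -1 && out[3] == -1) ? 0 : 1;
}
```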
* [X86] Add masked MCVTSI2P/MCVTUI2P ISD opcodes to model the cvtqq2ps cvtuqq2ps nodes that produce less than 128-bits of results.  (Craig Topper, 2019-01-19, 1 file changed, -7/+64)
  These nodes zero the upper half of the result and can't be represented with vselect. llvm-svn: 351666
* Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.  (Chandler Carruth, 2019-01-19, 1 file changed, -4/+3)
  We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
  Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* [X86] Add X86ISD::VSHLV and X86ISD::VSRLV nodes for psllv and psrlv  (Craig Topper, 2019-01-16, 1 file changed, -23/+56)
  Previously we used ISD::SHL and ISD::SRL to represent these in SelectionDAG. ISD::SHL/SRL interpret an out of range shift amount as undefined behavior and will constant fold to undef, while the intrinsics are defined to return 0 for out of range shift amounts.
  A previous patch added a special node for VPSRAV to produce all sign bits. This was previously believed safe because undefs frequently get turned into 0, either from the constant pool or a desire to not have a false register dependency. But undef is treated specially in some optimizations. For example, it is ignored in detection of vector splats. So if the ISD::SHL/SRL can be constant folded and all of the elements with in-bounds shift amounts are the same, we might fold it to a single element broadcast from the constant pool. This would not put 0s in the elements with out-of-bounds shift amounts.
  We do have an existing InstCombine optimization to use shl/lshr when the shift amounts are all constant and in bounds. That should prevent some loss of constant folding from this change.
  Patch by zhutianyang and Craig Topper. Differential Revision: https://reviews.llvm.org/D56695 llvm-svn: 351381
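The semantic difference the new nodes capture can be stated per element (an illustrative sketch, not LLVM code; 32-bit elements assumed): the intrinsic returns 0 for an out-of-range shift amount, while ISD::SHL leaves that element undefined and therefore free to fold to undef.

```cpp
#include <cstdint>

// Per-element semantics the new X86ISD::VSHLV node preserves:
// an out-of-range shift amount yields 0; it is not undefined.
uint32_t vshlv_element(uint32_t x, uint32_t amt) {
    return amt < 32 ? (x << amt) : 0u;
}

// ISD::SHL, by contrast, treats an out-of-range amount as undefined, so a
// generic combine may fold such an element to undef - the mismatch the
// commit fixes by using a dedicated node.
int main() {
    // In-range amounts behave identically; only amt >= 32 differs.
    return (vshlv_element(1, 3) == 8 && vshlv_element(1, 40) == 0) ? 0 : 1;
}
```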
* [X86] Add more ISD nodes to handle masked versions of VCVT(T)PD2DQZ128/VCVT(T)PD2UDQZ128, which only produce 2 result elements and zero the upper elements.  (Craig Topper, 2019-01-13, 1 file changed, -3/+121)
  We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this. Fixes another case from PR34877. llvm-svn: 351018
* [X86] Add X86ISD::VMFPROUND to handle the masked case of VCVTPD2PSZ128, which only produces 2 result elements and zeroes the upper elements.  (Craig Topper, 2019-01-13, 1 file changed, -12/+69)
  We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this. Fixes another case from PR34877. llvm-svn: 351017
* [X86] Add ISD node for masked version of CVTPS2PH.  (Craig Topper, 2019-01-12, 1 file changed, -6/+23)
  The 128-bit input produces 64 bits of output and fills the upper 64 bits with 0. The mask only applies to the lower elements. But we can't represent this with a vselect like we normally do.
  This also avoids the need to have a special X86ISD::SELECT when avx512bw isn't enabled, since vselect v8i16 isn't legal there. Fixes another instruction for PR34877. llvm-svn: 350994
* [X86] When lowering v1i1/v2i1/v4i1/v8i1 load/store with avx512f, but not avx512dq, use v16i1 as the intermediate mask type instead of v8i1.  (Craig Topper, 2019-01-12, 1 file changed, -0/+2)
  We still use i8 for the load/store type, so we need to convert to/from i16 around the mask type. By doing this we get an i8->i16 extload which we can then pattern match to a KMOVW if the access is aligned. llvm-svn: 350989
* [X86] Add ISD nodes for masked truncate so we can properly represent when the output has more elements than the input due to needing to be 128 bits.  (Craig Topper, 2019-01-12, 1 file changed, -62/+186)
  We can't properly represent this with a vselect since the upper elements of the result are supposed to be zeroed regardless of the mask.
  This also reuses the new nodes even when the result type fits in 128 bits, if the input is q/d and the result is w/b, since vselect w/b using a k-register condition isn't legal without avx512bw. Currently we're doing this even when avx512bw is enabled, but I might change that.
  This fixes some of PR34877. llvm-svn: 350985
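A scalar model of the problem being represented (illustrative C++, not the lowering code; the q-to-b truncate width is just an example): the 128-bit result has more elements than the source, the lanes beyond the source width must be zero regardless of the mask, and only the low lanes select between the truncated value and the passthru.

```cpp
#include <cstdint>

// Scalar model of a masked truncate whose 128-bit result has more elements
// than its source (e.g. truncating 4 x i64 into the low 4 x i8 of a 16 x i8
// result). Elements past the source width are zero regardless of the mask;
// only the low elements choose between the truncated value and the passthru,
// which is why a plain vselect over the full result cannot model it.
void masked_trunc_qb(int8_t out[16], const int64_t src[4],
                     const int8_t passthru[16], uint8_t mask) {
    for (int i = 0; i < 4; ++i)
        out[i] = (mask & (1u << i)) ? static_cast<int8_t>(src[i]) : passthru[i];
    for (int i = 4; i < 16; ++i)
        out[i] = 0;   // upper elements are zeroed unconditionally
}

int main() {
    int64_t src[4] = {1, 2, 3, 4};
    int8_t pass[16];
    for (int i = 0; i < 16; ++i) pass[i] = -1;
    int8_t out[16];
    masked_trunc_qb(out, src, pass, 0x5);
    // out = {1, -1, 3, -1, 0, 0, ...}: passthru only survives in the low 4 lanes.
    return (out[1] == -1 && out[4] == 0 && out[15] == 0) ? 0 : 1;
}
```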
* [X86] Change vXi1 extract_vector_elt lowering to be legal if the index is 0. Add DAG combine to turn scalar_to_vector+extract_vector_elt into extract_subvector.  (Craig Topper, 2019-01-11, 1 file changed, -0/+11)
  We were lowering the last step extract_vector_elt to a bitcast+truncate. Change it to use an extract_vector_elt of index 0 instead. Add isel patterns to do the equivalent of what the bitcast would have done. Plus an isel pattern for an any_extend+extract to prevent some regressions. Finally add a DAG combine to turn v1i1 scalar_to_vector+extract_vector_elt of 0 into an extract_subvector.
  This fixes some of the regressions from r350800. llvm-svn: 350918
* [x86] add load fold patterns for movddup with vzext_load  (Sanjay Patel, 2018-12-22, 1 file changed, -0/+2)
  The missed load folding noticed in D55898 is visible independent of that change, either with an adjusted IR pattern to start or with AVX2/AVX512 (where the build vector becomes a broadcast first; movddup is not produced until we get into isel via tablegen patterns). Differential Revision: https://reviews.llvm.org/D55936 llvm-svn: 350005