path: root/llvm/lib/Target/X86
Commit message | Author | Age | Files | Lines
...
* [X86] Connect the output chain properly when combining vzext_movl+load into ↵Craig Topper2019-06-281-1/+1
| | | | | | vzext_load. llvm-svn: 364625
* [X86] Remove some duplicate patterns that already exist as part of their ↵Craig Topper2019-06-281-5/+1
| | | | | | instruction definition. NFC llvm-svn: 364623
* [x86] prevent crashing from select narrowing with AVX512Sanjay Patel2019-06-271-0/+9
| | | | llvm-svn: 364585
* [X86] combineX86ShufflesRecursively - merge shuffles with more than 2 inputsSimon Pilgrim2019-06-271-4/+0
| | | | | | We already had the infrastructure for this, but were waiting for the fix for a number of regressions which were handled by the recent shuffle(extract_subvector(),extract_subvector()) -> extract_subvector(shuffle()) shuffle combines llvm-svn: 364569
* Use getConstantOperandAPInt instead of getConstantOperandVal for comparisons.Simon Pilgrim2019-06-271-8/+8
| | | | | | getConstantOperandAPInt avoids any large integer issues - these are unlikely but the fuzzers do like to mess around..... llvm-svn: 364564
* [X86] getTargetVShiftByConstNode - reduce variable scope. NFCI.Simon Pilgrim2019-06-271-8/+7
| | | | | | Fixes cppcheck warning. llvm-svn: 364561
* [Backend] Keep call site info valid through the backendDjordje Todorovic2019-06-271-0/+1
| | | | | | | | | | | | | | | | | | | | Handle call instruction replacements and deletions in order to preserve valid state of the call site info of the MachineFunction. NOTE: If the call site info is enabled for a new target, the assertion from the MachineFunction::DeleteMachineInstr() should help to locate places where the updateCallSiteInfo() should be called in order to preserve valid state of the call site info. ([10/13] Introduce the debug entry values.) Co-authored-by: Ananth Sowda <asowda@cisco.com> Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com> Co-authored-by: Ivan Baev <ibaev@cisco.com> Differential Revision: https://reviews.llvm.org/D61062 llvm-svn: 364536
* [X86] getFauxShuffle - add DemandedElts as a filterSimon Pilgrim2019-06-271-8/+17
| | | | | | This is currently benign but will be used in the future based on the elements referenced by the parent shuffle(s). llvm-svn: 364530
* [X86][AVX] SimplifyDemandedVectorElts - combine PERMPD(x) -> EXTRACTF128(X) Simon Pilgrim2019-06-271-0/+16
| | | | | | If we only use the bottom lane, see if we can simplify this to extract_subvector - which is always at least as quick as PERMPD/PERMQ. llvm-svn: 364518
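    A small intrinsics-level sketch of the equivalence this combine exploits (a hypothetical standalone example, not the SimplifyDemandedVectorElts code itself): when only the low 128-bit lane of a VPERMPD result is demanded and the low half of the immediate only selects the source's low elements, a plain 128-bit extract of the source gives the same lane. Requires AVX2 (e.g. compile with -mavx2).

    #include <immintrin.h>
    #include <cstdio>

    int main() {
      __m256d v = _mm256_set_pd(4.0, 3.0, 2.0, 1.0);    // elements {1,2,3,4}, element 0 = 1.0
      __m256d p = _mm256_permute4x64_pd(v, 0x44);        // imm 0b01'00'01'00 -> picks elements {0,1,0,1}
      __m128d permLow = _mm256_castpd256_pd128(p);       // only the bottom lane is demanded
      __m128d srcLow  = _mm256_castpd256_pd128(v);       // ...which equals the source's low lane
      double a[2], b[2];
      _mm_storeu_pd(a, permLow);
      _mm_storeu_pd(b, srcLow);
      std::printf("%g %g vs %g %g\n", a[0], a[1], b[0], b[1]);   // prints "1 2 vs 1 2"
      return 0;
    }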
* [ISEL][X86] Tracking of registers that forward call argumentsDjordje Todorovic2019-06-271-1/+9
| | | | | | | | | | | | | | | | | While lowering calls, collect info about registers that forward arguments into following function frame. We store such info into the MachineFunction of the call. This is used very late when dumping DWARF info about call site parameters. ([9/13] Introduce the debug entry values.) Co-authored-by: Ananth Sowda <asowda@cisco.com> Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com> Co-authored-by: Ivan Baev <ibaev@cisco.com> Differential Revision: https://reviews.llvm.org/D60715 llvm-svn: 364516
* [GlobalISel] Accept multiple vregs for lowerCall's argsDiana Picus2019-06-271-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | Change the interface of CallLowering::lowerCall to accept several virtual registers for each argument, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660 and lowerFormalArguments in D63549. With this change, we no longer pack the virtual registers generated for aggregates into one big lump before delegating to the target. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. NFCI for AMDGPU, Mips and X86. Differential Revision: https://reviews.llvm.org/D63551 llvm-svn: 364512
* [GlobalISel] Accept multiple vregs for lowerCall's resultDiana Picus2019-06-271-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Change the interface of CallLowering::lowerCall to accept several virtual registers for the call result, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660 and lowerFormalArguments in D63549. With this change, we no longer pack the virtual registers generated for aggregates into one big lump before delegating to the target. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. NFCI for AMDGPU, Mips and X86. Differential Revision: https://reviews.llvm.org/D63550 llvm-svn: 364511
* [GlobalISel] Accept multiple vregs in lowerFormalArgsDiana Picus2019-06-272-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the interface of CallLowering::lowerFormalArguments to accept several virtual registers for each formal argument, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660. lowerCall will be refactored in the same way in follow-up patches. With this change, we forward the virtual registers generated for aggregates to CallLowering. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. We also copy the pack/unpackRegs helpers to CallLowering to facilitate this. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was put into a s64 instead of a p0. Added a test-case which illustrates the problem more clearly (it crashes without this patch) and fixed the existing test-case to expect p0. AMDGPU has been updated to unpack into the virtual registers for kernels. I think the other code paths fall back for aggregates, so this should be NFC. Mips doesn't support aggregates yet, so it's also NFC. x86 seems to have code for dealing with aggregates, but I couldn't find the tests for it, so I just added a fallback to DAGISel if we get more than one virtual register for an argument. Differential Revision: https://reviews.llvm.org/D63549 llvm-svn: 364510
* [GlobalISel] Allow multiple VRegs in ArgInfo. NFCDiana Picus2019-06-271-5/+9
| | | | | | | | | | | Allow CallLowering::ArgInfo to contain more than one virtual register. This is useful when passes split aggregates into several virtual registers, but need to also provide information about the original type to the call lowering. Used in follow-up patches. Differential Revision: https://reviews.llvm.org/D63548 llvm-svn: 364509
* Silence gcc warning after r364458Mikael Holmen2019-06-271-1/+1
| | | | | | | | | | | | Without the fix gcc 7.4.0 complains with ../lib/Target/X86/X86ISelLowering.cpp: In function 'bool getFauxShuffleMask(llvm::SDValue, llvm::SmallVectorImpl<int>&, llvm::SmallVectorImpl<llvm::SDValue>&, llvm::SelectionDAG&)': ../lib/Target/X86/X86ISelLowering.cpp:6690:36: error: enumeral and non-enumeral type in conditional expression [-Werror=extra] int Idx = (ZeroMask[j] ? SM_SentinelZero : (i + j + Ofs)); ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1plus: all warnings being treated as errors llvm-svn: 364507
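    For reference, a standalone reproduction of this warning class and the usual cast fix. The names below are hypothetical stand-ins, and the actual change in r364507 may differ.

    #include <cstdio>

    enum SentinelKind { SentinelZero = -2 };   // hypothetical enum mimicking a sentinel value

    int pickIndex(bool isZero, int base) {
      // With -Wextra, gcc warns on mixing an enumerator and an int in the ?: arms:
      //   int idx = isZero ? SentinelZero : base;   // "enumeral and non-enumeral type in conditional expression"
      // Casting the enumerator to int gives both arms the same type and silences the warning.
      int idx = isZero ? static_cast<int>(SentinelZero) : base;
      return idx;
    }

    int main() {
      std::printf("%d %d\n", pickIndex(true, 7), pickIndex(false, 7));
      return 0;
    }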
* [X86] Remove (vzext_movl (scalar_to_vector (load))) matching code from ↵Craig Topper2019-06-271-17/+0
| | | | | | | | selectScalarSSELoad. I think this will be turning into vzext_load during DAG combine. llvm-svn: 364499
* [X86] Teach selectScalarSSELoad to not narrow volatile loads.Craig Topper2019-06-271-5/+7
| | | | llvm-svn: 364498
* [X86] Rework the logic in LowerBuildVectorv16i8 to make better use of ↵Craig Topper2019-06-261-37/+37
| | | | | | | | | | | | any_extend and break false dependencies. Other improvements This patch rewrites the loop iteration to only visit every other element starting with element 0. And we work on the "even" element and "next" element at the same time. The "First" logic has been moved to the bottom of the loop and doesn't run on every element. I believe it could create dangling nodes previously since we didn't check if we were going to use SCALAR_TO_VECTOR for the first insertion. I got rid of the "First" variable and just do a null check on V which should be equivalent. We also no longer use undef as the starting V for vectors with no zeroes to avoid false dependencies. This matches v8i16. I've changed all the extends and OR operations to use MVT::i32 since that's what they'll be promoted to anyway. I've tried to use zero_extend only when necessary and use any_extend otherwise. This resulted in some improvements in tests where we are now able to promote aligned (i32 (extload i8)) to a 32-bit load. Differential Revision: https://reviews.llvm.org/D63702 llvm-svn: 364469
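    As a rough scalar analogy of the pair-wise iteration described above (not the actual SelectionDAG-building code), each even/odd byte pair is combined at 32-bit width, and zeroing the upper bits only matters when a later use relies on them being zero:

    #include <cstdint>
    #include <cstdio>

    // Combine one "even" byte and the "next" byte into the low 16 bits of a 32-bit lane.
    // Whether the upper bits are zeroed (zero-extend) or left unspecified (any-extend)
    // is only observable if a later use reads those bits.
    uint32_t buildPair(uint8_t even, uint8_t next) {
      uint32_t lo = even;                              // extend the even byte to 32 bits
      uint32_t hi = static_cast<uint32_t>(next) << 8;  // next byte lands in bits 15:8
      return hi | lo;                                  // OR performed at i32 width
    }

    int main() {
      uint8_t bytes[16];
      for (int i = 0; i < 16; ++i) bytes[i] = static_cast<uint8_t>(0x10 + i);
      for (int i = 0; i < 16; i += 2)                  // visit every other element, as in the patch
        std::printf("pair %2d: %#06x\n", i / 2, buildPair(bytes[i], bytes[i + 1]));
      return 0;
    }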
* [X86] Remove isTypePromotionOfi1ZeroUpBits and its helpers.Craig Topper2019-06-261-71/+0
| | | | | | | | | This was trying to optimize concat_vectors with zero of setcc or kand instructions. But I think it produced the same code we produce for a concat_vectors with 0 even if it doesn't come from one of those operations. llvm-svn: 364463
* [X86][SSE] getFauxShuffleMask - handle OR(x,y) where x and y have no ↵Simon Pilgrim2019-06-261-0/+34
| | | | | | | | overlapping bits Create a per-byte shuffle mask based on the computeKnownBits from each operand - if, for each byte, at least one of the operands is known to be zero (or both are), then the bytes can be safely blended. Fixes PR41545 llvm-svn: 364458
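    A plain-C++ sketch (C++17) of the per-byte decision described above, using boolean "known zero" flags in place of KnownBits and returning a blend-style select mask. The helper name is hypothetical; this is not the getFauxShuffleMask code.

    #include <array>
    #include <cstddef>
    #include <cstdio>
    #include <optional>

    // KnownZeroA[i] / KnownZeroB[i]: byte i of operand A / B is known to be zero.
    // Returns a per-byte select mask (0 = take A's byte, 1 = take B's byte), or
    // nullopt if some byte could be non-zero in both operands (not safely blendable).
    template <std::size_t N>
    std::optional<std::array<int, N>>
    orAsPerByteBlend(const std::array<bool, N> &KnownZeroA,
                     const std::array<bool, N> &KnownZeroB) {
      std::array<int, N> Mask{};
      for (std::size_t i = 0; i < N; ++i) {
        if (KnownZeroA[i])
          Mask[i] = 1;           // A contributes nothing here, take the byte from B
        else if (KnownZeroB[i])
          Mask[i] = 0;           // B contributes nothing here, take the byte from A
        else
          return std::nullopt;   // overlapping unknown bits: the OR is not a blend
      }
      return Mask;
    }

    int main() {
      std::array<bool, 4> A{false, true, false, true}, B{true, false, true, true};
      if (auto M = orAsPerByteBlend(A, B))
        for (int sel : *M)
          std::printf("%d ", sel);   // prints "0 1 0 1"
      std::printf("\n");
      return 0;
    }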
* [X86][SSE] X86TargetLowering::isCommutativeBinOp - add PMULDQSimon Pilgrim2019-06-261-3/+1
| | | | | | Allows narrowInsertExtractVectorBinOp to reduce vector size instead of the more restricted SimplifyDemandedVectorEltsForTargetNode llvm-svn: 364434
* [X86][SSE] X86TargetLowering::isCommutativeBinOp - add PCMPEQSimon Pilgrim2019-06-261-0/+1
| | | | | | Allows narrowInsertExtractVectorBinOp to reduce vector size llvm-svn: 364432
* [X86][SSE] X86TargetLowering::isBinOp - add PCMPGTSimon Pilgrim2019-06-261-0/+1
| | | | | | Allows narrowInsertExtractVectorBinOp to reduce vector size llvm-svn: 364431
* [X86] shouldScalarizeBinop - never scalarize target opcodes.Simon Pilgrim2019-06-261-2/+9
| | | | | | We have (almost) no target opcodes that have scalar/vector equivalents - for now assume we can't scalarize them (we can add exceptions if we need to). llvm-svn: 364429
* [X86][Codegen] X86DAGToDAGISel::matchBitExtract(): consistently capture ↵Roman Lebedev2019-06-261-7/+6
| | | | | | lambdas by value llvm-svn: 364420
* [X86] X86DAGToDAGISel::matchBitExtract(): pattern c: truncation awarenessRoman Lebedev2019-06-261-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | Summary: The one thing of note here is that the 'bitwidth' constant (32/64) was previously pessimistic. Given `x & (-1 >> (C - z))`, we were taking `C` to be `bitwidth(x)`, but in reality we want `(-1 >> (C - z))` pattern to mean "low z bits must be all-ones". And for that, `C` should be `bitwidth(-1 >> (C - z))`, i.e. of the shift operation itself. Last pattern D does not seem to exhibit any of these truncation issues. Although it has the opposite problem - if we extract low bits (no shift) from i64, and then truncate to i32, then we fail to shrink this 64-bit extraction into 32-bit extraction. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62806 llvm-svn: 364419
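    A plain-C++ restatement of the width subtlety noted above (illustrative only, not the matchBitExtract code): in `x & (-1 >> (C - z))` the mask keeps the low z bits only when C is the bit width of the shifted -1 itself, which need not match the bit width of x.

    #include <cstdint>
    #include <cstdio>

    // Keep the low z bits of x; z must stay in [1, 64] / [1, 32] so the shift
    // amount is in range (shifting by the full width would be undefined).
    uint64_t lowBits64(uint64_t x, unsigned z) { return x & (~0ull >> (64 - z)); }
    uint32_t lowBits32(uint32_t x, unsigned z) { return x & (~0u >> (32 - z)); }

    int main() {
      uint64_t x = 0x1122334455667788ull;
      unsigned z = 12;
      // For z <= 32, extracting z bits at 64-bit width and then truncating matches
      // extracting z bits directly at 32-bit width: the "C" is the mask's own width.
      std::printf("%#x %#x\n",
                  static_cast<unsigned>(lowBits64(x, z)),   // 0x788
                  lowBits32(static_cast<uint32_t>(x), z));  // 0x788
      return 0;
    }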
* [X86] X86DAGToDAGISel::matchBitExtract(): pattern b: truncation awarenessRoman Lebedev2019-06-261-5/+17
| | | | | | | | | | | | | | | | | Summary: (Not so) boringly identical to pattern a (D62786) Not yet sure how to deal with the last pattern c. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62793 llvm-svn: 364418
* [X86] X86DAGToDAGISel::matchBitExtract(): pattern a: truncation awarenessRoman Lebedev2019-06-261-18/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Finally tying up loose ends here. The problem is quite simple: If we have pattern `(x >> start) & (1 << nbits) - 1`, and then truncate the result, that truncation will be propagated upwards, into the `and`. And that isn't currently handled. I'm only fixing pattern `a` here, the same fix will be needed for patterns `b`/`c` too. I *think* this isn't missing any extra legality checks, since we only look past truncations. Similarly, I don't think we can get any other truncation there other than i64->i32. Reviewers: craig.topper, RKSimon, spatel Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62786 llvm-svn: 364417
* Fix the build after r364401Hans Wennborg2019-06-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | It was failing with: /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.cpp:18772:66: error: call of overloaded 'makeArrayRef(<brace-enclosed initializer list>)' is ambiguous scaleShuffleMask<int>(Scale, makeArrayRef<int>({ 0, 2, 1, 3 }), Mask); ^ /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.cpp:18772:66: note: candidates are: In file included from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/CodeGen/MachineFunction.h:20:0, from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/CodeGen/CallingConvLower.h:19, from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.h:17, from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.cpp:14: /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/ADT/ArrayRef.h:480:15: note: llvm::ArrayRef<T> llvm::makeArrayRef(const std::vector<_RealType>&) [with T = int] ArrayRef<T> makeArrayRef(const std::vector<T> &Vec) { ^ /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/ADT/ArrayRef.h:485:37: note: llvm::ArrayRef<T> llvm::makeArrayRef(const llvm::ArrayRef<T>&) [with T = int] template <typename T> ArrayRef<T> makeArrayRef(const ArrayRef<T> &Vec) { ^ llvm-svn: 364414
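    A simplified, self-contained illustration of the same ambiguity class. The wrapper below is a stand-in for ArrayRef, and the actual fix applied in r364414 may have resolved the call differently.

    #include <cstddef>
    #include <cstdio>
    #include <initializer_list>
    #include <vector>

    template <typename T> struct View {                 // ArrayRef-like type, for illustration only
      const T *Data = nullptr;
      std::size_t Size = 0;
      View(const std::vector<T> &V) : Data(V.data()), Size(V.size()) {}
      View(std::initializer_list<T> IL) : Data(IL.begin()), Size(IL.size()) {}  // only used for the overload demo
    };

    template <typename T> View<T> makeView(const std::vector<T> &V) { return View<T>(V); }
    template <typename T> View<T> makeView(const View<T> &V) { return V; }

    int main() {
      // makeView<int>({0, 2, 1, 3});          // ambiguous: the braced list converts equally
      //                                       // well to std::vector<int> and to View<int>
      std::vector<int> Order = {0, 2, 1, 3};   // naming a concrete type disambiguates the call
      View<int> V = makeView(Order);
      std::printf("%zu elements\n", V.Size);
      return 0;
    }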
* [X86][AVX] combineExtractSubvector - 'little to big' ↵Simon Pilgrim2019-06-261-0/+28
| | | | | extract_subvector(bitcast()) support Ideally this needs to be a generic combine in DAGCombiner::visitEXTRACT_SUBVECTOR but there are some nasty regressions in aarch64 due to neon shuffles not handling bitcasts at all. llvm-svn: 364407
* [X86][AVX] truncateVectorWithPACK - avoid bitcasted shufflesSimon Pilgrim2019-06-261-2/+5
| | | | | | | | truncateVectorWithPACK is often used in conjunction with ComputeNumSignBits which struggles when peeking through bitcasts. This fix tries to avoid bitcast(shuffle(bitcast())) patterns in the 256-bit 64-bit sublane shuffles so we can still see through at least until lowering when the shuffles will need to be bitcasted to widen the shuffle type. llvm-svn: 364401
* [ExpandMemCmp] Honor prefer-vector-width.Clement Courbet2019-06-261-2/+3
| | | | | | | | | | | | Reviewers: gchatelet, echristo, spatel, atdt Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63769 llvm-svn: 364384
* Don't look for the TargetFrameLowering in the implementationMatt Arsenault2019-06-251-2/+1
| | | | | | The same oddity was apparently copy-pasted between multiple targets. llvm-svn: 364349
* [X86] Remove isel patterns that look for (vzext_movl (scalar_to_vector (load)))Craig Topper2019-06-253-99/+0
| | | | | | | | I believe these all get canonicalized to vzext_movl. The only case where that wasn't true was when the load was loadi32 and the load was an extload aligned to 32 bits. But that was fixed in r364207. Differential Revision: https://reviews.llvm.org/D63701 llvm-svn: 364337
* [X86] Add a DAG combine to turn vzmovl+load into vzload if the load isn't ↵Craig Topper2019-06-253-24/+20
| | | | | | | | | volatile. Remove isel patterns for vzmovl+load We currently have some isel patterns for treating vzmovl+load the same as vzload, but that shrinks the load, which we shouldn't do if the load is volatile. Rather than adding isel checks for volatile loads, this patch removes the patterns and teaches DAG combine to merge them into vzload when it's legal to do so. Differential Revision: https://reviews.llvm.org/D63665 llvm-svn: 364333
* [X86] lowerShuffleAsSpecificZeroOrAnyExtend - add ANY_EXTEND TODO.Simon Pilgrim2019-06-251-0/+1
| | | | | | lowerShuffleAsSpecificZeroOrAnyExtend should be able to lower to ANY_EXTEND_VECTOR_INREG as well as ZERO_EXTEND_VECTOR_INREG. llvm-svn: 364313
* [ExpandMemCmp] Move all options to TargetTransformInfo.Clement Courbet2019-06-253-32/+18
| | | | | | Split off from D60318. llvm-svn: 364281
* [X86] Don't use a vzext_movl in LowerBuildVectorv16i8/LowerBuildVectorv8i16 if ↵Craig Topper2019-06-241-3/+3
| | | | | | | | | | | | | | there are no zeroes in the vector we're building. In LowerBuildVectorv16i8 we took care to use an any_extend if the first pair is in the lower 16-bits of the vector and no elements are 0. So bits [31:16] will be undefined. But we still emitted a vzext_movl to ensure that bits [127:32] are 0. If we don't need any zeroes we should be consistent and make all of 127:16 undefined. In LowerBuildVectorv8i16 we can just delete the vzext_movl code because we only use the scalar_to_vector when there are no zeroes. So the vzext_movl is always unnecessary. Found while investigating whether (vzext_movl (scalar_to_vector (loadi32)) patterns are necessary. At least one of the cases where they were necessary was where the loadi32 matched 32-bit aligned 16-bit extload. Seemed weird that we required vzext_movl for that case. Differential Revision: https://reviews.llvm.org/D63700 llvm-svn: 364207
* [X86] Cleanups and safety checks around the isFNEGCraig Topper2019-06-241-11/+20
| | | | | | | | | | | | | This patch does a few things to start cleaning up the isFNEG function. -Remove the Op0/Op1 peekThroughBitcast calls that seem unnecessary. getTargetConstantBitsFromNode has its own peekThroughBitcast inside. And we have a separate peekThroughBitcast on the return value. -Add a check of the scalar size after the first peekThroughBitcast to ensure we haven't changed the element size and just did something like f32->i32 or f64->i64. -Remove an unnecessary check that Op1's type is floating point after the peekThroughBitcast. We're just going to look for a bit pattern from a constant. We don't care about its type. -Add VT checks on several places that consume the return value of isFNEG. Due to the peekThroughBitcasts inside, the type of the return value isn't guaranteed. So its not safe to use it to build other nodes without ensuring the type matches the type being used to build the node. We might be able to replace these checks with bitcasts instead, but I don't have a test case so a bail out check seemed better for now. Differential Revision: https://reviews.llvm.org/D63683 llvm-svn: 364206
* GlobalISel: Remove unsigned variant of SrcOpMatt Arsenault2019-06-242-14/+14
| | | | | | | | | Force using Register. One downside is the generated register enums require explicit conversion. llvm-svn: 364194
* CodeGen: Introduce a class for registersMatt Arsenault2019-06-246-26/+26
| | | | | | | | | Avoids using a plain unsigned for registers throughout codegen. Doesn't attempt to change every register use, just something a little more than the set needed to build after changing the return type of MachineOperand::getReg(). llvm-svn: 364191
* [X86] Turn v16i16->v16i8 truncate+store into an any_extend+truncstore if we have ↵Craig Topper2019-06-232-3/+14
| | | | | | | | | | | | | | avx512f, but not avx512bw. Ideally we'd be able to represent this truncate as an any_extend to v16i32 and a truncate, but SelectionDAG doesn't know how to not fold those together. We have isel patterns to use a vpmovzxwd+vpmovdb for the truncate, but we aren't able to simultaneously fold the load and the store from the isel pattern. By pulling the truncate into the store we can successfully hide it from the DAG combiner. Then we can isel pattern match the truncstore and load+any_extend separately. llvm-svn: 364163
* [X86] Fix isel pattern that was looking for a bitcasted load. Remove what ↵Craig Topper2019-06-231-13/+1
| | | | | | | | | | appears to be a copy/paste mistake. DAG combine should ensure bitcasts of loads don't exist. Also remove 3 patterns that are identical to the block above them. llvm-svn: 364158
* [X86][SelectionDAG] Cleanup and simplify masked_load/masked_store in ↵Craig Topper2019-06-233-59/+35
| | | | | | | | | | | | | | | | | | tablegen. Use more precise PatFrags for scalar masked load/store. Rename masked_load/masked_store to masked_ld/masked_st to discourage their direct use. We need to check truncating/extending and compressing/expanding before using them. This revealed that our scalar masked load/store patterns were misusing these. With those out of the way, renamed masked_load_unaligned and masked_store_unaligned to remove the "_unaligned". We didn't check the alignment anyway so the name was somewhat misleading. Make the aligned versions inherit from masked_load/store instead of from a separate identical version. Merge the 3 different alignment PatFrags into a single version that uses the VT from the SDNode to determine the size that the alignment needs to match. llvm-svn: 364150
* [X86][SSE] Fold extract_subvector(vselect(x,y,z),0) -> ↵Simon Pilgrim2019-06-221-0/+10
| | | | | | vselect(extract_subvector(x,0),extract_subvector(y,0),extract_subvector(z,0)) llvm-svn: 364136
* [X86] Add DAG combine to turn (vzmovl (insert_subvector undef, X, 0)) into ↵Craig Topper2019-06-213-57/+16
| | | | | | | | | | | | | | | | (insert_subvector allzeros, (vzmovl X), 0) 128/256 bit scalar_to_vectors are canonicalized to (insert_subvector undef, (scalar_to_vector), 0). We have isel patterns that try to match this pattern being used by a vzmovl to use a 128-bit instruction and a subreg_to_reg. This patch detects the insert_subvector undef portion of this and pulls it through the vzmovl, creating a narrower vzmovl and an insert_subvector allzeroes. We can then match the insertsubvector into a subreg_to_reg operation by itself. Then we can fall back on existing (vzmovl (scalar_to_vector)) patterns. Note, while the scalar_to_vector case is the motivating case I didn't restrict to just that case. I'm also wondering about shrinking any 256/512 vzmovl to an extract_subvector+vzmovl+insert_subvector(allzeros) but I fear that would have bad implications to shuffle combining. I also think there is more canonicalization we can do with vzmovl with loads or scalar_to_vector with loads to create vzload. Differential Revision: https://reviews.llvm.org/D63512 llvm-svn: 364095
* [X86] Don't mark v64i8/v32i16 ISD::SELECT as custom unless they are legal types.Craig Topper2019-06-211-7/+4
| | | | | | | | | We don't have any Custom handling during type legalization. Only operation legalization. Fixes PR42355 llvm-svn: 364093
* [X86] Add a debug print of the node in the default case for unhandled ↵Craig Topper2019-06-211-0/+4
| | | | | | | | | | opcodes in ReplaceNodeResults. This should be unreachable, but bugs can make it reachable. This adds a debug print so we can see the bad node in the output when the llvm_unreachable triggers. llvm-svn: 364091
* [X86][AVX] Combine INSERT_SUBVECTOR(SRC0, EXTRACT_SUBVECTOR(SRC1)) as shuffleSimon Pilgrim2019-06-211-2/+15
| | | | | | Subvector shuffling often ends up as insert/extract subvector. llvm-svn: 364090
* [X86] Use vmovq for v4i64/v4f64/v8i64/v8f64 vzmovl.Craig Topper2019-06-212-53/+35
| | | | | | | | | | | | | | We already use vmovq for v2i64/v2f64 vzmovl. But we were using a blendpd+xorpd for v4i64/v4f64/v8i64/v8f64 under opt speed. Or movsd+xorpd under optsize. I think the blend with 0 or movss/d is only needed for vXi32 where we don't have an instruction that can move 32 bits from one xmm to another while zeroing upper bits. movq is no worse than blendpd on any known CPUs. llvm-svn: 364079