path: root/llvm/lib/Target/X86/X86ISelLowering.cpp
...
* [X86] Add custom type legalization for v16i64->v16i8 truncate and v8i64->v8i8 truncate when v8i64 isn't legal (Craig Topper, 2019-10-06, 1 file, +23/-3)

  Summary: The default legalization for v16i64->v16i8 tries to create a multiple-stage truncate, concatenating after each stage and truncating again. But avx512 implements truncates with multiple uops, so it should be better to truncate all the way to the desired element size and then concatenate the pieces using unpckl instructions. This minimizes the number of 2-uop truncates; the unpcks are all single-uop instructions.

  I tried to handle this by just custom splitting the v16i64->v16i8 shuffle, and hoped that the DAG combiner would leave the two halves in the state needed to make D68374 do the job for each half. This worked for the first half, but the second half got messed up. So I've implemented custom handling for v8i64->v8i8 when v8i64 needs to be split, to produce the VTRUNCs directly.

  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D68428
  llvm-svn: 373864
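  A hedged illustration of the strategy described above (an intrinsics sketch only, not the patch itself; the function name is invented): truncate each v8i64 half straight down to v8i8 with one VPMOVQB each, then concatenate the two 8-byte results with a single-uop unpack, instead of truncating in stages.

      // C++ with AVX512F intrinsics; assumes an avx512f-capable target.
      #include <immintrin.h>

      __m128i truncate_v16i64_to_v16i8(__m512i Lo, __m512i Hi) {
        __m128i Lo8 = _mm512_cvtepi64_epi8(Lo); // vpmovqb: v8i8 in the low 8 bytes
        __m128i Hi8 = _mm512_cvtepi64_epi8(Hi); // vpmovqb
        return _mm_unpacklo_epi64(Lo8, Hi8);    // punpcklqdq: concatenate the halves
      }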
* [X86][SSE] resolveTargetShuffleInputs - call getTargetShuffleInputs instead of using setTargetShuffleZeroElements directly. NFCI. (Simon Pilgrim, 2019-10-06, 1 file, +4/-5)

  llvm-svn: 373855
* [X86][AVX] combineExtractSubvector - merge duplicate variables. NFCI. (Simon Pilgrim, 2019-10-06, 1 file, +17/-18)

  llvm-svn: 373849
* [X86][SSE] matchVectorShuffleAsBlend - use Zeroable element mask directly. (Simon Pilgrim, 2019-10-06, 1 file, +13/-34)

  We can make use of the Zeroable mask to indicate which elements we can safely set to zero instead of creating a target shuffle mask on the fly. This allows us to remove createTargetShuffleMask.

  This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle mask isn't adjusted by its source inputs in setTargetShuffleZeroElements but instead we cache them in a parallel Zeroable mask.

  llvm-svn: 373846
* [X86] Enable AVX512BW for memcmp() (David Zarzycki, 2019-10-06, 1 file, +7/-2)

  llvm-svn: 373845
* [X86][AVX] Push sign extensions of comparison bool results through bitops (PR42025) (Simon Pilgrim, 2019-10-05, 1 file, +26/-6)

  As discussed on PR42025, with more complex boolean math we can end up with many truncations/extensions of the comparison results through each bitop. This patch handles the cases introduced in combineBitcastvxi1 by pushing the sign extension through the AND/OR/XOR ops so it's just the original SETCC ops that get extended.

  Differential Revision: https://reviews.llvm.org/D68226
  llvm-svn: 373834
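  A rough scalar analogy for the transform (illustrative C++ only, not the DAG code; the function names are invented): sign-extending a combined boolean is equivalent to combining the sign-extended comparison results with the same bitop, which is what lets the extension be pushed through.

      #include <cstdint>

      int32_t sextBool(bool B) { return B ? -1 : 0; } // sext i1 -> i32

      // sext(and(setcc, setcc)): extend after the bitop.
      int32_t extendAfter(int A, int B, int C) {
        return sextBool((A < B) & (B < C));
      }

      // and(sext(setcc), sext(setcc)): extend each SETCC, then combine.
      int32_t extendFirst(int A, int B, int C) {
        return sextBool(A < B) & sextBool(B < C);
      }
      // Both functions agree for all inputs, since the masks are all-ones/all-zeros.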
* [X86] lowerShuffleAsLanePermuteAndRepeatedMask - variable renames. NFCI. (Simon Pilgrim, 2019-10-05, 1 file, +27/-27)

  Rename some variables to match lowerShuffleAsRepeatedMaskAndLanePermute - prep work toward adding some equivalent sublane functionality.

  llvm-svn: 373832
* [X86] Add DAG combine to form saturating VTRUNCUS/VTRUNCS from VTRUNC (Craig Topper, 2019-10-04, 1 file, +14/-0)

  We already do this for ISD::TRUNCATE, but we can do the same for X86ISD::VTRUNC.

  Differential Revision: https://reviews.llvm.org/D68432
  llvm-svn: 373765
* [X86] Add v32i8 shuffle lowering strategy to recognize two v4i64 vectors truncated to v4i8 and concatenated into the lower 8 bytes with undef/zero upper bytes. (Craig Topper, 2019-10-03, 1 file, +44/-0)

  This patch recognizes the shuffle pattern we get from a v8i64->v8i8 truncate when v8i64 isn't a legal type. With VLX we can use two VTRUNCs, unpckldq, and an insert_subvector.

  Differential Revision: https://reviews.llvm.org/D68374
  llvm-svn: 373645
* [X86] matchShuffleWithSHUFPD - use Zeroable element mask directly. NFCI. (Simon Pilgrim, 2019-10-03, 1 file, +7/-7)

  We can make use of the Zeroable mask to indicate which elements we can safely set to zero instead of creating a target shuffle mask on the fly. This only leaves one user of createTargetShuffleMask, which we can hopefully get rid of in a similar manner.

  This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle mask isn't adjusted by its source inputs in setTargetShuffleZeroElements but instead we cache them in a parallel Zeroable mask.

  llvm-svn: 373641
* [X86] Add DAG combine to turn (bitcast (vbroadcast_load)) into just a vbroadcast_load if the scalar size is the same. (Craig Topper, 2019-10-03, 1 file, +15/-0)

  This improves broadcast load folding of i64 elements on 32-bit targets where i64 isn't legal. Previously we had to represent these as vXf64 vbroadcast_loads and a bitcast to vXi64, but we didn't have any isel patterns looking for that.

  This also allows us to remove or simplify some isel patterns that were looking for bitcasted vbroadcast_loads.

  llvm-svn: 373566
* [X86] Rewrite the vXi1 subvector insertion code to not rely on the value of bits that might be undef (Craig Topper, 2019-10-02, 1 file, +26/-14)

  The previous code tried to do a trick where we would extract the subvector from the location we were inserting, then xor that with the new value. Take the xored value and clear out the bits above the subvector size. Then shift that xored subvector to the insert location. And finally xor that with the original vector. Since the old subvector was used in both xors, this would leave just the new subvector at the inserted location. Since the surrounding bits had been zeroed, no other bits of the original vector would be modified.

  Unfortunately, if the old subvector came from undef we might aggressively propagate the undef. Then we end up with the XORs not cancelling because they aren't using the same value for the two uses of the old subvector. @bkramer gave me a case that demonstrated this, but we haven't reduced it enough to make it easily readable to see what's happening.

  This patch uses a safer, but more costly approach. It isolates the bits above the insertion and the bits below the insert point and ORs those together, leaving 0 at the insertion location. Then it widens the subvector with 0s in the upper bits, shifts it into position with 0s in the lower bits, and does another OR.

  Differential Revision: https://reviews.llvm.org/D68311
  llvm-svn: 373495
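  A minimal scalar sketch of the safer OR-based scheme (an assumed illustration on a 32-bit integer standing in for the mask register; the names are invented and the bounds handling is simplified):

      #include <cstdint>

      // Insert the low Len bits of Sub into Vec at bit position Pos (Pos + Len <= 32).
      uint32_t insertBits(uint32_t Vec, uint32_t Sub, unsigned Pos, unsigned Len) {
        // Keep the bits above and below the insertion, leaving 0s in [Pos, Pos+Len).
        uint32_t Above = (Pos + Len >= 32) ? 0 : (~0u << (Pos + Len));
        uint32_t Below = (Pos == 0) ? 0 : (~0u >> (32 - Pos));
        uint32_t Keep  = (Vec & Above) | (Vec & Below);
        // Widen the subvector with 0s in the upper bits, shift it into position
        // (0s fill the lower bits), then OR everything together.
        uint32_t Widened = (Len >= 32) ? Sub : (Sub & ((1u << Len) - 1));
        return Keep | (Widened << Pos);
      }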
* [X86] Add a DAG combine to shrink vXi64 gather/scatter indices that are constant with sufficient sign bits to fit in vXi32 (Craig Topper, 2019-10-01, 1 file, +34/-0)

  The gather/scatter instructions can implicitly sign extend the indices. If we're operating on 32-bit data, a v16i64 index can force a v16i32 gather to be split in two since the index needs 2 registers. If we can shrink the index to i32 we can avoid the split. It should always be safe to shrink the index regardless of the number of elements; we have gather/scatter instructions that can use a v2i32 index stored in a v4i32 register with v2i64 data size.

  I've limited this to before type legalization to avoid creating a v2i32 after type legalization. We could check for it, but we'd also need testing. I'm also only handling build_vectors with no bitcasts, to be sure the truncate will constant fold.

  Differential Revision: https://reviews.llvm.org/D68247
  llvm-svn: 373408
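  Why the shrink is safe, as a hedged scalar check (illustrative only; the function name is invented): an i64 index with more than 32 sign bits survives a round trip through i32, so the implicit sign extension done by the gather/scatter recovers the original index.

      #include <cstdint>

      bool fitsWhenSignExtendedFromI32(int64_t Idx) {
        return Idx == static_cast<int64_t>(static_cast<int32_t>(Idx));
      }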
* [X86] Add a VBROADCAST_LOAD ISD opcode representing a scalar load broadcasted to a vector. (Craig Topper, 2019-10-01, 1 file, +106/-9)

  Summary: This adds the ISD opcode and a DAG combine to create it. There are probably some places where we can directly create it, but I'll leave that for future work.

  This updates all of the isel patterns to look for this new node. I had to add a few additional isel patterns for aligned extloads which we should probably fix with a DAG combine or something. This does mean that the broadcast load folding for avx512 can no longer match a broadcasted aligned extload.

  There's still some work to do here for combining a broadcast of a broadcast_load. We also need to improve extractelement or demanded vector elements of a broadcast_load. I'll try to get those done before I submit this patch.

  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D68198
  llvm-svn: 373349
* TLI: Remove DAG argument from getRegisterByName (Matt Arsenault, 2019-10-01, 1 file, +4/-6)

  Replace with the MachineFunction. X86 is the only user, and only uses it for the function. This removes one obstacle from using this in GlobalISel. The other is the more tolerable EVT argument.

  The X86 use of the function seems questionable to me. It checks hasFP before frame lowering.

  llvm-svn: 373292
* [X86] Mask off upper bits of splat element in LowerBUILD_VECTORvXi1 when forming a SELECT. (Craig Topper, 2019-09-30, 1 file, +12/-2)

  The i1 scalar would have been type legalized to i8, but that doesn't guarantee anything about the upper bits. If we're going to use it as a condition we need to make sure the upper bits are 0.

  I've special cased ISD::SETCC conditions since that should guarantee zero upper bits. We could go further and use computeKnownBits, but we have no tests that would need that.

  Fixes PR43507.

  llvm-svn: 373246
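  A minimal sketch of the underlying issue (an assumed illustration, not the lowering code): after type legalization an i1 arrives as an i8 whose upper seven bits are unspecified, so only bit 0 can be trusted when forming the select condition.

      #include <cstdint>

      int selectOnLegalizedBool(uint8_t LegalizedI1, int A, int B) {
        bool Cond = (LegalizedI1 & 1) != 0; // mask off the possibly-undef upper bits
        return Cond ? A : B;
      }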
* [X86] Address post-commit review from code I accidentally committed in r373136. (Craig Topper, 2019-09-30, 1 file, +6/-3)

  See https://reviews.llvm.org/D68167

  llvm-svn: 373245
* [X86] Add ANY_EXTEND to switch in ReplaceNodeResults, but just fall back to default handling. (Craig Topper, 2019-09-30, 1 file, +6/-0)

  ANY_EXTEND of v8i8 is marked Custom on AVX512 for handling extends from v8i8. But the type legalization infrastructure will call ReplaceNodeResults for v8i8 results. We should just defer to the default handling instead of asserting in the default of the switch.

  Fixes PR43509.

  llvm-svn: 373234
* [X86] Split v16i32/v8i64 bitreverse on avx512f targets without avx512bw to enable the use of vpshufb on the 256-bit halves. (Craig Topper, 2019-09-30, 1 file, +12/-1)

  llvm-svn: 373177
* [X86] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after r373174 (Fangrui Song, 2019-09-30, 1 file, +1/-0)

  llvm-svn: 373175
* [X86] Remove -x86-experimental-vector-widening-legalization command line flag (Craig Topper, 2019-09-29, 1 file, +130/-1186)

  This was added back to allow some performance regressions to be investigated. The main perf issue was fixed shortly after adding this back, and no other major issues have been reported. So I think it's safe to remove this again.

  llvm-svn: 373174
* [X86] Enable canonicalizeBitSelect for AVX512 since we can use VPTERNLOG now. (Craig Topper, 2019-09-29, 1 file, +7/-5)

  llvm-svn: 373155
* [X86] Stop using UpdateNodeOperands in combineGatherScatter. Create new nodes like most other DAG combines. (Craig Topper, 2019-09-28, 1 file, +58/-35)

  Creating new nodes is what we usually do. Having to explicitly check that we don't update to an existing node, and having to manually manage the worklist, is unusual.

  We can probably add a helper function to reduce the duplication of having to check if we should create a gather or scatter, but I wanted to just get the simple thing done.

  llvm-svn: 373137
* [X86] Split combineGatherScatter into a version for generic ISD nodes and another version for X86 specific nodes. (Craig Topper, 2019-09-28, 1 file, +39/-5)

  The majority of the code doesn't run on the X86 nodes today since it's gated by isBeforeLegalizeOps and we don't form X86 nodes until after that, except for a couple of special cases in type legalization. But I think we would probably break those if some of the transforms fired on them.

  I want to remove the hardcoded operand numbers and the unusual use of UpdateNodeOperands. Being able to know which ISD opcodes are present should help with that.

  llvm-svn: 373136
* [X86] Call SimplifyDemandedBits in combineGatherScatter any time the mask element is wider than i1, not just when AVX512 is disabled. (Craig Topper, 2019-09-27, 1 file, +3/-3)

  The AVX2 intrinsics can still be used when AVX512 is enabled, and those go through this path. So we should simplify them.

  llvm-svn: 373108
* [Alignment][NFC] Remove unneeded llvm:: scoping on Align types (Guillaume Chatelet, 2019-09-27, 1 file, +2/-2)

  llvm-svn: 373081
* Revert r372333: [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863) (Ilya Biryukov, 2019-09-24, 1 file, +17/-118)

  Reason: this caused severe compile time regressions in JAX. See email thread of original revision on llvm-commits for details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190923/697042.html

  llvm-svn: 372756
* [X86] Use TargetConstant for condition code on X86ISD::SETCC/CMOV/BRCOND nodes. (Craig Topper, 2019-09-23, 1 file, +55/-60)

  This removes the need for ConvertToTarget opcodes in the isel table. It's also consistent with the recent changes to use TargetConstant for intrinsic nodes that always take immediates.

  Differential Revision: https://reviews.llvm.org/D67902
  llvm-svn: 372645
* [x86] fix assert with horizontal math + broadcast of vector (PR43402) (Sanjay Patel, 2019-09-23, 1 file, +4/-4)

  https://bugs.llvm.org/show_bug.cgi?id=43402

  llvm-svn: 372606
* [X86] Remove SETEQ/SETNE canonicalization code from LowerIntVSETCC_AVX512 to prevent an infinite loop. (Craig Topper, 2019-09-23, 1 file, +0/-7)

  The attached test case would previously infinite-loop after r365711. I'm going to move this to X86ISelDAGToDAG.cpp to get the setcc to match VPTEST in 32-bit mode in a follow-up commit.

  llvm-svn: 372543
* Prefer AVX512 memcpy when applicable (David Zarzycki, 2019-09-23, 1 file, +5/-0)

  When AVX512 is available and the preferred vector width is 512 bits or more, we should prefer AVX512 for memcpy().

  https://bugs.llvm.org/show_bug.cgi?id=43240
  https://reviews.llvm.org/D67874
  llvm-svn: 372540
* [X86] Convert Constant arguments to MMX shift by i32 intrinsics to TargetConstant during lowering. (Craig Topper, 2019-09-23, 1 file, +7/-4)

  This allows us to use timm in the isel table, which is more consistent with other intrinsics that take an immediate now.

  We can't declare the intrinsic as taking an ImmArg because we need to match non-constants to the shift-by-MMX-register instruction, which we do by mutating the intrinsic id during lowering.

  llvm-svn: 372537
* [X86][SelectionDAGBuilder] Move the hack for handling MMX shift by i32 intrinsics into the X86 backend. (Craig Topper, 2019-09-23, 1 file, +52/-0)

  These intrinsics should be shift by immediate, but gcc allows any i32 scalar and clang needs to match that. So we try to detect the non-constant case and move the data from an integer register to an MMX register.

  Previously this was done by creating a v2i32 build_vector and bitcast in SelectionDAGBuilder. This had to be done early since v2i32 isn't a legal type. The bitcast+build_vector would be DAG combined to X86ISD::MMX_MOVW2D, which isel will turn into a GPR->MMX MOVD.

  This commit just moves the whole thing to lowering and emits the X86ISD::MMX_MOVW2D directly to avoid the illegal type. The test changes just seem to be due to nodes being linearized in a different order.

  llvm-svn: 372535
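  A hedged source-level illustration of the gcc-compatible behavior being handled (not from the patch; the function name is invented): the "shift by immediate" MMX intrinsic is called with a variable count, so the backend must move the count from a GPR into an MMX register rather than encoding an immediate.

      #include <mmintrin.h>

      __m64 shiftWords(__m64 V, int N) {
        // N is not a compile-time constant; gcc accepts this, so clang must too.
        return _mm_slli_pi16(V, N);
      }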
* Fix missed case of switching getConstant to getTargetConstant. Try 2. (Sterling Augustine, 2019-09-20, 1 file, +1/-1)

  Summary: This fixes a crasher introduced by r372338.

  Reviewers: echristo, arsenm
  Subscribers: wdng, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67850
  llvm-svn: 372434
* Revert r372366 "Use getTargetConstant for BLENDI, and add a test to catch it." (Nico Weber, 2019-09-20, 1 file, +1/-1)

  This reverts commit 52621307bcab2013e8833f3317cebd63a6db3885.

  Tests have been failing all night with:

      [0/2] ACTION //llvm/test:check-llvm(//llvm/utils/gn/build/toolchain:unix)
      -- Testing: 33647 tests, 64 threads --
      Testing: 0 .. 10..
      UNRESOLVED: LLVM :: CodeGen/AMDGPU/GlobalISel/isel-blendi-gettargetconstant.ll (6943 of 33647)
      ********************
      TEST 'LLVM :: CodeGen/AMDGPU/GlobalISel/isel-blendi-gettargetconstant.ll' FAILED
      ********************
      Test has no run line!
      ********************

  Since there were other concerns on https://reviews.llvm.org/D67785, I'm just reverting for now.

  llvm-svn: 372383
* [X86] Convert tbm_bextri_u32/tbm_bextri_u64 intrinsics' TargetConstant argument to a regular Constant during lowering. (Craig Topper, 2019-09-20, 1 file, +10/-0)

  We reuse an ISD opcode here that can be reached from BMI that doesn't require it to be an immediate. Our isel patterns to match the TBM immediate form require a Constant and not a TargetConstant.

  We were accidentally getting the Constant due to a quirk of combineBEXTR calling SimplifyDemandedBits. The call to SimplifyDemandedBits ended up constant folding the TargetConstant to a regular Constant. But we should probably instead be asserting if SimplifyDemandedBits is called on a TargetConstant, so we shouldn't rely on this behavior.

  llvm-svn: 372373
* Use getTargetConstant for BLENDI, and add a test to catch it. (Sterling Augustine, 2019-09-20, 1 file, +1/-1)

  Summary: This fixes a crasher introduced by r372338.

  Reviewers: echristo, arsenm
  Subscribers: jvesely, wdng, nhaehnle, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67785

  Tighten up the test case.

  llvm-svn: 372366
* [X86] Remove the special isBuildVectorOfConstantSDNodes handling from LowerBUILD_VECTORvXi1. (Craig Topper, 2019-09-20, 1 file, +2/-26)

  The later code that generates a constant when there are some non-const elements works basically the same and doesn't require there to be any non-const elements.

  llvm-svn: 372365
* Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" (Matt Arsenault, 2019-09-19, 1 file, +134/-136)

  This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297).

  This was missing one switch to getTargetConstant in an untested case.

  llvm-svn: 372338
* [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863) (Simon Pilgrim, 2019-09-19, 1 file, +118/-17)

  This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks and includes a demonstration X86 implementation. The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing them to work with target-specific combines and opcodes (e.g. X86's FMA variants).

  Unlike SimplifyDemandedBits, we can't just handle target nodes through a target callback; we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations have to duplicate some checks (recursion depth etc.).

  I've only begun to replace X86's FNEG handling here, handling FMADDSUB/FMSUBADD negation and some low-impact codegen changes (some FMA negation propagation). We can build on this in future patches.

  Differential Revision: https://reviews.llvm.org/D67557
  llvm-svn: 372333
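  One example of the FMA negation folding mentioned above, as a hedged arithmetic sketch (illustrative C++, not the patch; the function name is invented): negating an FMA result equals negating the multiplicand and the addend, so an explicit FNEG of an FMA can be folded into an FNMSUB-style FMA.

      #include <cmath>

      // Computes -(a*b + c) with a single fused operation:
      // fneg(fma(a, b, c)) == fma(-a, b, -c).
      double negatedFMA(double a, double b, double c) {
        return std::fma(-a, b, -c);
      }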
* Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" (Hans Wennborg, 2019-09-19, 1 file, +135/-133)

  This broke the Chromium build, causing it to fail with e.g.

      fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>

  See llvm-commits thread of r372285 for details.

  This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit.

  > Encode them directly as an imm argument to G_INTRINSIC*.
  >
  > Since intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU.
  >
  > This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time.
  >
  > SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them.
  >
  > Most of the work here is to clean up target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory.
  >
  > The GlobalISelEmitter needs to treat timm as a special case of a leaf node, similar to MachineBasicBlock operands. This should also enable handling of patterns for some G_* instructions with immediates, like G_FENCE or G_EXTRACT.
  >
  > This does include a workaround for a crash in GlobalISelEmitter when ARM tries to use "imm" in an output with a "timm" pattern source.

  llvm-svn: 372314
* [X86] Prevent crash in LowerBUILD_VECTORvXi1 for v64i1 vectors on 32-bit targets when the vector is a mix of constants and non-constants. (Craig Topper, 2019-09-19, 1 file, +14/-6)

  We need to materialize the constants as two 32-bit values that are cast to v32i1 and then concatenated.

  llvm-svn: 372304
* [X86] Change a SmallVector& argument to SmallVectorImpl&. NFC (Craig Topper, 2019-09-19, 1 file, +1/-1)

  Avoids repeating the size.

  llvm-svn: 372302
* [X86] Remove unused argument from a helper function. NFC (Craig Topper, 2019-09-19, 1 file, +3/-4)

  llvm-svn: 372301
* GlobalISel: Don't materialize immarg arguments to intrinsics (Matt Arsenault, 2019-09-19, 1 file, +133/-135)

  Encode them directly as an imm argument to G_INTRINSIC*.

  Since intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU.

  This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time.

  SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them.

  Most of the work here is to clean up target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory.

  The GlobalISelEmitter needs to treat timm as a special case of a leaf node, similar to MachineBasicBlock operands. This should also enable handling of patterns for some G_* instructions with immediates, like G_FENCE or G_EXTRACT.

  This does include a workaround for a crash in GlobalISelEmitter when ARM tries to use "imm" in an output with a "timm" pattern source.

  llvm-svn: 372285
* [Alignment][NFC] Use Align::None instead of 1 (Guillaume Chatelet, 2019-09-18, 1 file, +3/-3)

  Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790

  Reviewers: courbet
  Subscribers: sdardis, nemanjai, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67704
  llvm-svn: 372230
* [X86] Break non-power-of-2 vXi1 vectors into scalars for argument passing with avx512. (Craig Topper, 2019-09-18, 1 file, +15/-10)

  This generates worse code, but matches what is done for avx2 and prevents crashes when more arguments are passed than we have registers for.

  llvm-svn: 372200
* [X86] Prevent assertion when calling a function that returns double with -mno-sse2 on x86-64. (Craig Topper, 2019-09-18, 1 file, +4/-0)

  As seen in the most recent updates to PR10498.

  llvm-svn: 372197
* [X86] Use APInt::operator<<= and APInt::lshrInPlace. NFC (Craig Topper, 2019-09-17, 1 file, +4/-4)

  llvm-svn: 372159
* [X86] Simplify b2b KSHIFTL+KSHIFTR using demanded elts. (Craig Topper, 2019-09-17, 1 file, +66/-13)

  llvm-svn: 372155