path: root/llvm/lib/Target/X86/X86ISelLowering.cpp
...
* [X86ISelLowering] LowerSELECT - remove duplicate value type. NFCI. (Simon Pilgrim, 2019-08-31, 1 file, -2/+0)
  VT of SELECT result and selection ops will be the same.
  llvm-svn: 370581
* [Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn (Reid Kleckner, 2019-08-30, 1 file, -0/+12)
  Users have complained that llvm.trap produces two ud2 instructions on Win64, one for the trap and one for unreachable. This change fixes that.

  TrapUnreachable was added and enabled for Win64 in r206684 (April 2014) to avoid poorly understood issues with the Windows unwinder. There seem to be two major things in play:
  - the unwinder
  - C++ EH, _CxxFrameHandler3 & co

  The unwinder disassembles forward from the return address to scan for epilogues. Inserting a ud2 had the effect of stopping the unwinder, and ensuring that it ran the EH personality function for the current frame. However, it's not clear what the unwinder does when the return address happens to be the last address of one function and the first address of the next function.

  The Visual C++ EH personality, _CxxFrameHandler3, needs to figure out what the current EH state number is. It does this by consulting the ip2state table, which maps from PC to state number. This seems to go wrong when the return address is the last PC of the function or catch funclet.

  I'm not sure precisely which system is involved here, but in order to address these real or hypothetical problems, I believe it is enough to insert int3 after a call site if it would otherwise be the last instruction in a function or funclet.

  I was able to reproduce some similar problems locally by arranging for a noreturn call to appear at the end of a catch block immediately before an unrelated function, and I confirmed that the problems go away when an extra trailing int3 instruction is added. MSVC inserts int3 after every noreturn function call, but I believe it's only necessary to do it if the call would be the last instruction. This change inserts a pseudo instruction that expands to int3 if it is in the last basic block of a function or funclet.

  I did what I could to run the Microsoft compiler EH tests, and the ones I was able to run showed no behavior difference before or after this change.

  Differential Revision: https://reviews.llvm.org/D66980
  llvm-svn: 370525
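  For illustration, a minimal sketch of the problem shape (the function names below are hypothetical, not from the patch):

  ```cpp
  #include <cstdlib>

  // A noreturn call as the very last instruction of a function: the return
  // address then points one past the end of f(), so the unwinder's ip2state
  // lookup can attribute it to whatever function happens to follow. Emitting
  // int3 after such a call keeps the return address inside f().
  [[noreturn]] void bail() { std::abort(); }

  void f() {
    bail(); // without the trailing int3, this call could end the function
  }
  ```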
* [X86] Pass v32i16/v64i8 in zmm registers on KNL target. (Craig Topper, 2019-08-30, 1 file, -0/+15)
  gcc and icc pass these types in zmm registers. This patch implements a quick hack to override the register type before calling convention handling to one that is legal. Longer term we might want to do something similar to 256-bit integer registers on AVX1 where we just split all the operations.
  Fixes PR42957
  Differential Revision: https://reviews.llvm.org/D66708
  llvm-svn: 370495
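  As an illustrative sketch (not from the commit), the kind of type affected, written with the GCC/Clang vector extension:

  ```cpp
  // A 512-bit integer vector. v32i16 is not a legal type on KNL (AVX512F
  // without AVX512BW), but to stay ABI-compatible with gcc/icc it must
  // still be passed and returned in zmm registers.
  typedef short v32i16 __attribute__((vector_size(64)));

  v32i16 passthrough(v32i16 x) {
    return x; // argument and return value both travel in zmm0
  }
  ```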
* [X86][SSE] combinePMULDQ - pmuldq(x, 0) -> zero vector (PR43159) (Simon Pilgrim, 2019-08-29, 1 file, -3/+5)
  ISD::isBuildVectorAllZeros permits undef elements to be present, which means we can't return it as a zero vector. PMULDQ/PMULUDQ is an extending multiply so a multiply by zero of the lower 32-bits should result in a zero 64-bit element.
  llvm-svn: 370404
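  A hedged sketch of the shape of the fold against the SelectionDAG API (a fragment, assuming N is the PMULDQ node and DAG the current SelectionDAG; not the actual patch):

  ```cpp
  // PMULDQ/PMULUDQ sign/zero-extend the low 32 bits of each 64-bit lane
  // before multiplying, so a known-zero multiplicand yields a zero vector.
  // Because ISD::isBuildVectorAllZeros() tolerates undef lanes, the combine
  // must build a fresh all-zeros constant instead of returning the operand.
  SDValue LHS = N->getOperand(0), RHS = N->getOperand(1);
  if (ISD::isBuildVectorAllZeros(LHS.getNode()) ||
      ISD::isBuildVectorAllZeros(RHS.getNode()))
    return DAG.getConstant(0, SDLoc(N), N->getValueType(0));
  ```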
* [X86][CodeGen][NFC] Delay `combineIncDecVector()` from DAGCombine to X86DAGToDAGISel (Roman Lebedev, 2019-08-29, 1 file, -42/+7)
  Summary: We were previously doing it in DAGCombine. But we also want to do `sub %x, C` -> `add %x, (sub 0, C)` for vectors in DAGCombine. So if we had `sub %x, -1`, we'll transform it to `add %x, 1`, which `combineIncDecVector()` will immediately transform back into `sub %x, -1`, and here we go again...

  I've marked this as NFC since not a single test changes, but since that 'changes' DAGCombine, probably this isn't fully NFC.

  Reviewers: RKSimon, craig.topper, spatel
  Reviewed By: craig.topper
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62327
  llvm-svn: 370327
* [X86] Add a DAG combine to combine INSERTPS and VBROADCAST of a scalar load. Remove corresponding isel patterns. (Craig Topper, 2019-08-29, 1 file, -34/+45)
  We had an isel pattern to perform this, but it's better to do it in DAG combine as a simplification. This also fixes the lack of patterns for AVX512 targets.
  llvm-svn: 370294
* [X86] Make inline assembly 'x' and 'v' constraints work for f128. (Craig Topper, 2019-08-29, 1 file, -2/+3)
  Including a type legalizer fix to make bitcast operand promotion work correctly when getSoftenedFloat returns f128 instead of i128.
  Fixes PR43157
  llvm-svn: 370293
* [SelectionDAG] Don't generate libcalls for wide shifts on Windows (PR42711) (Hans Wennborg, 2019-08-28, 1 file, -0/+8)
  Neither libgcc nor compiler-rt is usually used on Windows, so these functions can't be called.
  Differential revision: https://reviews.llvm.org/D66880
  llvm-svn: 370204
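  A minimal reproducer sketch of the affected pattern (assumes a compiler with __int128 support on the target; the libcall name comes from the runtimes, not this patch):

  ```cpp
  // Shifting a 128-bit integer by a variable amount is normally lowered to
  // a runtime call such as __ashlti3 from libgcc/compiler-rt. Neither
  // library is usually linked on Windows, so after this change the DAG
  // expands the shift inline there instead of emitting the libcall.
  unsigned __int128 shl128(unsigned __int128 x, unsigned n) {
    return x << n;
  }
  ```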
* [X86][AVX] Add SimplifyDemandedVectorElts support for KSHIFTL/KSHIFTR (Simon Pilgrim, 2019-08-27, 1 file, -0/+25)
  Differential Revision: https://reviews.llvm.org/D66527
  llvm-svn: 370055
* [X86] Delay combineIncDecVector until after op legalization. (Craig Topper, 2019-08-26, 1 file, -5/+15)
  Probably better to keep add over sub in early DAG combines. It might make sense to push this to lowering or delay it all the way to isel. But this was the simplest change.
  llvm-svn: 369981
* [X86] Add a hack to combinePMULDQ to manually turn SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG inputs into an ANY_EXTEND_VECTOR_INREG style shuffle (Craig Topper, 2019-08-26, 1 file, -0/+28)
  ANY_EXTEND_VECTOR_INREG isn't currently marked Legal, which prevents SimplifyDemandedBits from turning SIGN/ZERO_EXTEND_VECTOR_INREG into it after op legalization. And even if we did make it Legal, combineExtInVec doesn't do shuffle combining on the VECTOR_INREG nodes until AVX1.

  This patch adds a quick hack to combinePMULDQ to directly emit a vector shuffle corresponding to an ANY_EXTEND_VECTOR_INREG operation. This avoids both of those issues without creating any other regressions on our tests. The xop-ifma.ll change here also showed up when I tried to resurrect D56306 and seemed to be the only improvement that patch creates now. This is a more direct way to get the benefit.

  Differential Revision: https://reviews.llvm.org/D66436
  llvm-svn: 369942
* [X86][DAGCombiner] Teach narrowShuffle to use concat_vectors instead of inserting into undef (Craig Topper, 2019-08-25, 1 file, -2/+10)
  Summary: concat_vectors is more canonical during early DAG combine. For example, it's what's used by SelectionDAGBuilder when converting IR shuffles into SelectionDAG shuffles when element counts between inputs and mask don't match. We also have combines in DAGCombiner that can pull concat_vectors through a shuffle. See partitionShuffleOfConcats. So it seems like concat_vectors is a better operation to use here. I had to teach DAGCombiner's SimplifyVBinOp to also handle concat_vectors with undef. I haven't checked yet if we can remove the INSERT_SUBVECTOR version in there or not.

  I didn't want to mess with the other caller of getShuffleHalfVectors that's used during shuffle lowering, where insert_subvector probably is what we want to produce, so I've enabled this via a boolean passed to the function.

  Reviewers: spatel, RKSimon
  Reviewed By: RKSimon
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D66504
  llvm-svn: 369872
* [X86] Add an assert to mark more code that needs to be removed when the vector widening legalization switch is removed again. (Craig Topper, 2019-08-24, 1 file, -1/+4)
  llvm-svn: 369837
* [X86] Move a transform out of combineConcatVectorOps so we don't prematurely turn CONCAT_VECTORS into INSERT_SUBVECTORS. (Craig Topper, 2019-08-23, 1 file, -9/+13)
  CONCAT_VECTORS and INSERT_SUBVECTORS can both call combineConcatVectorOps, but we shouldn't produce INSERT_SUBVECTORS from there. We should keep CONCAT_VECTORS until vector legalization.
  Noticed while looking at the madd_quad_reduction test from madd.ll
  llvm-svn: 369802
* [SelectionDAG][X86] Enable iX SimplifyDemandedBits to vXi1 SimplifyDemandedVectorElts simplification. Add a hack to X86 to avoid a regression (Craig Topper, 2019-08-23, 1 file, -0/+20)
  Patch showing the effect of enabling bool vector oversimplification.
  Non-VLX builds can simplify a kshift shuffle, but VLX builds simplify
    insert_subvector v8i zeroinitializer, v2i --> insert_subvector v8i undef, v2i
  preventing the removal of the AND that clears the upper bits of the result.
  Differential Revision: https://reviews.llvm.org/D53022
  llvm-svn: 369780
* Use VT::getHalfNumVectorElementsVT helpers in a few places. NFCI. (Simon Pilgrim, 2019-08-23, 1 file, -5/+2)
  llvm-svn: 369751
* [X86] Make combineLoopSADPattern use CONCAT_VECTORS instead of INSERT_SUBVECTORS for widening with zeros. (Craig Topper, 2019-08-23, 1 file, -3/+5)
  CONCAT_VECTORS is more canonical for the early DAG combine runs until we start getting into the op legalization phases.
  llvm-svn: 369734
* [X86] Improve lowering of v2i32 SAD handling in combineLoopSADPattern. (Craig Topper, 2019-08-23, 1 file, -3/+10)
  For v2i32 we only feed 2 i8 elements into the psadbw instructions with 0s in the other 14 bytes. The resulting psadbw instruction will produce zeros in bits [127:16] of the output. We need to take the result and feed it to a v2i32 add where the first element includes bits [15:0] of the sad result. The other element should be zero.

  Prior to this patch we were using a truncate to take 0 from bits 95:64 of the psadbw. This results in a pshufd to move those bits to 63:32. But since we also have zeroes in bits 63:32 of the psadbw output, we should just take those bits.

  The previous code probably worked better with promoting legalization, but now we use widening legalization. I've preserved the old behavior if -x86-experimental-vector-widening-legalization=false until we get that option removed.
  llvm-svn: 369733
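  For context, a sketch of the source-level pattern combineLoopSADPattern targets (illustrative, not a test from the commit):

  ```cpp
  #include <cstdlib>

  // Sum-of-absolute-differences reduction over unsigned bytes, the shape
  // the vectorizer turns into psadbw-friendly IR. With only two i8 lanes
  // live, the psadbw output is zero everywhere above bits [15:0], which is
  // what lets the v2i32 add take its second element straight from the
  // known-zero bits instead of shuffling bits 95:64 into place.
  int sad(const unsigned char *a, const unsigned char *b, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
      sum += std::abs(a[i] - b[i]);
    return sum;
  }
  ```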
* [MVT] Add MVT equivalent to EVT::getHalfNumVectorElementsVT() helper. NFCI. (Simon Pilgrim, 2019-08-22, 1 file, -21/+11)
  Allows for some cleanup in a lot of SSE/AVX vector splitting code
  llvm-svn: 369640
* [DAGCombiner][X86] Teach visitCONCAT_VECTORS to combine (concat_vectors (concat_vectors X, Y), undef) -> (concat_vectors X, Y, undef, undef) (Craig Topper, 2019-08-20, 1 file, -0/+14)
  I also had to add a new combine to X86's combineExtractSubvector to prevent a regression.

  This helps our vXi1 code see the full concat operation and allows it to optimize undef to a zero if there is already a zero in the concat. This helped us use a movzx instead of an AND in some of the tests. In those tests, one concat comes from SelectionDAGBuilder and the second comes from type legalization of v4i1->i4 bitcasts which uses an additional concat. Though these changes weren't my original motivation.

  I'm looking at making X86ISelLowering's narrowShuffle emit a concat_vectors instead of an insert_subvector since concat_vectors is more canonical during early DAG combine. This patch helps prevent a regression from my experiments with that.

  Differential Revision: https://reviews.llvm.org/D66456
  llvm-svn: 369459
* [X86] Add a DAG combine to transform (i8 (bitcast (v8i1 (extract_subvector (v16i1 X), 0)))) -> (i8 (trunc (i16 (bitcast (v16i1 X))))) on KNL target (Craig Topper, 2019-08-20, 1 file, -0/+12)
  Without AVX512DQ we don't have KMOVB, so we can't really copy 8 bits of a k-register to a GPR. We have to copy 16 bits instead. We do this even if the DAG copy is from v8i1->v16i1. If we detect the (i8 (bitcast (v8i1 (extract_subvector (v16i1 X), 0)))) we should rewrite the types to match the copy we do support. By doing this, we can help known bits to propagate without losing the upper 8 bits of the input to the extract_subvector. This allows some zero extends to be removed since we have an isel pattern to use kmovw for (zero_extend (i16 (bitcast (v16i1 X)))).
  Differential Revision: https://reviews.llvm.org/D66489
  llvm-svn: 369434
* [X86] Use isNullConstant instead of getConstantOperandVal == 0. NFC (Craig Topper, 2019-08-20, 1 file, -2/+2)
  llvm-svn: 369410
* [X86] Add back the -x86-experimental-vector-widening-legalization command line flag and all associated code, but leave it enabled by default (Craig Topper, 2019-08-20, 1 file, -130/+1175)
  Google is reporting performance issues with the new default behavior and have asked for a way to switch back to the old behavior while we investigate and make fixes.

  I've restored all of the code that had since been removed and added additional checks of the command flag onto code paths that are not otherwise guarded by a check of getTypeAction. I've also modified the cost model tables to hopefully get us back to the previous costs.

  Hopefully we won't need to support this for very long since we have no test coverage of the old behavior so we can very easily break it.
  llvm-svn: 369332
* [X86] Teach lowerV4I32Shuffle to only use broadcasts if the mask has more than one undef element. Prioritize shifts over broadcast in lowerV8I16Shuffle. (Craig Topper, 2019-08-19, 1 file, -9/+11)
  The motivating case is the changes in vector-reduce-add.ll, where we were doing extra work in the scalar domain instead of shuffling. There may be some one-use check that needs to be looked into there, but this patch sidesteps the issue by avoiding broadcasts that aren't really broadcasting.
  Differential Revision: https://reviews.llvm.org/D66071
  llvm-svn: 369287
* [X86] Teach lower1BitShuffle to match right shifts with upper zero elements on types that don't natively support KSHIFT. (Craig Topper, 2019-08-19, 1 file, -19/+20)
  We can support these by widening to a supported type, then shifting all the way to the left and then back to the right to ensure that we shift in zeroes.
  llvm-svn: 369232
* [X86] Fix the lower1BitShuffle code added in r369215 to correctly pass the widened vector to the KSHIFT node. (Craig Topper, 2019-08-19, 1 file, -1/+1)
  Not sure how to test this as we have tests that exercise this code, but nothing failed for the types not matching. Since all the k-registers use equivalent register classes, everything just ends up working.
  llvm-svn: 369228
* [X86] Teach lower1BitShuffle to match KSHIFTR that doesn't use Zeroable and only relies on undef. (Craig Topper, 2019-08-19, 1 file, -0/+48)
  This allows us to widen the type when the KSHIFTR instruction doesn't exist for the type. If we needed to shift zeroes into the upper elements, we would need more work to guarantee zeroes when widening.
  llvm-svn: 369227
* [X86] Teach lower1BitShuffle to recognize padding a subvector with zeros with V2 as the source and V1 as the zero vector. (Craig Topper, 2019-08-19, 1 file, -7/+16)
  Shuffle canonicalization can swap the sources, so the zero vector might be V1 and the subvector that's being padded can be V2.
  llvm-svn: 369226
* [X86] Add a special case to LowerCONCAT_VECTORSvXi1 to handle concatenating zero vectors followed by one non-zero vector followed by undef vectors. (Craig Topper, 2019-08-18, 1 file, -14/+30)
  For such a case we should only need a KSHIFTL, but we were previously generating a KSHIFTL followed by a KSHIFTR because we mistakenly believed we need to zero the undef elements.
  llvm-svn: 369224
* [X86] Replace uses of getZeroVector for vXi1 vectors with DAG.getConstant. (Craig Topper, 2019-08-18, 1 file, -4/+4)
  vXi1 vectors don't need special handling.
  llvm-svn: 369222
* [X86] Improve lower1BitShuffle handling for KSHIFTL on narrow vectors. (Craig Topper, 2019-08-18, 1 file, -8/+24)
  We can insert the value into a larger legal type and shift that by the desired amount.
  llvm-svn: 369215
* Fix signed/unsigned comparison warning. NFCI. (Simon Pilgrim, 2019-08-18, 1 file, -2/+2)
  llvm-svn: 369213
* [X86] isTargetShuffleEquivalent - add BUILD_VECTOR matching (Simon Pilgrim, 2019-08-18, 1 file, -3/+21)
  Add similar functionality to isShuffleEquivalent - if the mask elements don't match, try matching the BUILD_VECTOR scalars instead.

  As target shuffles need to handle SM_Sentinel values, this can get a bit tricky, so this commit just adds actual mask element index handling - full SM_SentinelZero support will be added when the need arises.

  Also, enables support in matchVectorShuffleWithPACK
  llvm-svn: 369212
* [X86] isTargetShuffleEquivalent - early out on illegal shuffle masks. NFCI. (Simon Pilgrim, 2019-08-18, 1 file, -8/+10)
  Simplifies shuffle mask comparisons by just bailing out if the shuffle mask has any out of range values - will make an upcoming patch much simpler.
  llvm-svn: 369211
* [X86] Add a one use check to the combineStore code that handles v16i16->v16i8 truncate+store by extending to v16i32 and then emitting a v16i32->v16i8 truncstore. (Craig Topper, 2019-08-17, 1 file, -1/+1)
  This prevents us from emitting a separate truncate and a truncating store instruction.
  llvm-svn: 369200
* Revert [X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask (reapplied) (Jordan Rupprecht, 2019-08-16, 1 file, -17/+0)
  This reverts r368662 (git commit 1a8d790cf5f89c1df718844f13e934e39bef6ef5)
  The compile-time regression repro is in https://bugs.llvm.org/show_bug.cgi?id=43024
  llvm-svn: 369167
* [X86] resolveTargetShuffleInputs - add DemandedElts variant. NFCI. (Simon Pilgrim, 2019-08-16, 1 file, -3/+10)
  Nothing calls this yet, everything still goes through the non (all) DemandedElts wrapper.
  llvm-svn: 369136
* [X86] combineExtractWithShuffle - handle extract(truncate(x), 0) (Simon Pilgrim, 2019-08-16, 1 file, -1/+11)
  Eventually we need to generalize combineExtractWithShuffle to handle all faux shuffles and handle truncate (and X86ISD::VTRUNC etc.) there, but we're not ready yet (still creates nodes on the fly, incomplete DemandedElts support, bad use of recursive Depth limit).
  llvm-svn: 369134
* Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM (Daniel Sanders, 2019-08-15, 1 file, -58/+56)
  Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible).

  Partial reverts in:
  - X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
  - X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
  - X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
  - HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
  - MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
  - PPCFastISel.cpp - No Register::operator-=()
  - PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
  - MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor

  Manual fixups in:
  - ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
  - HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
  - HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
  - PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&

  Depends on D65919
  Reviewers: arsenm, bogner, craig.topper, RKSimon
  Reviewed By: arsenm
  Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65962
  llvm-svn: 369041
* [X86] Add custom type legalization for bitcasting mmx to v2i32/v4i16/v8i8 to use movq2dq instead of going through memory. (Craig Topper, 2019-08-15, 1 file, -0/+10)
  llvm-svn: 369031
* [SDAG][x86] check for relaxed math when matching an FP reduction (Sanjay Patel, 2019-08-15, 1 file, -2/+2)
  If the last step in an FP add reduction allows reassociation and doesn't care about -0.0, then we are free to recognize that computation as a reduction that may reorder the intermediate steps.

  This is requested directly by PR42705: https://bugs.llvm.org/show_bug.cgi?id=42705
  and solves PR42947 (if horizontal math instructions are actually faster than the alternative): https://bugs.llvm.org/show_bug.cgi?id=42947

  Differential Revision: https://reviews.llvm.org/D66236
  llvm-svn: 368995
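  A sketch of the reduction shape in question (hypothetical example, not taken from the PRs):

  ```cpp
  // A four-lane FP add reduction. Under strict IEEE semantics the adds must
  // stay in source order; compiled with -ffast-math (or with 'reassoc nsz'
  // on the final fadd), the intermediate sums may be reordered, which is
  // what lets the backend match the chain to horizontal-add instructions.
  float reduce4(const float *v) {
    return v[0] + v[1] + v[2] + v[3];
  }
  ```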
* [X86] Disable custom type legalization for v2i32/v4i16/v8i8->i64. (Craig Topper, 2019-08-15, 1 file, -2/+1)
  The default legalization can take care of this.
  llvm-svn: 368967
* [X86] Disable custom type legalization for v2i32/v4i16/v8i8->f64 bitcast. (Craig Topper, 2019-08-15, 1 file, -1/+2)
  The generic legalization handles this in the same way, so just use that.
  llvm-svn: 368966
* [X86] Remove some unreachable code from LowerBITCAST. (Craig Topper, 2019-08-15, 1 file, -42/+26)
  llvm-svn: 368965
* [X86] Remove some dead code and combine some repeated code that's left. (Craig Topper, 2019-08-15, 1 file, -17/+3)
  If the width is 256 bits, then we must have AVX, so the else here was unnecessary. Once that's removed, the >= 256-bit code is identical to the 128-bit code with a different VT, so combine them.
  llvm-svn: 368956
* [X86] Use PSADBW for v8i8 addition reductions. (Craig Topper, 2019-08-14, 1 file, -2/+12)
  Improves the 8 byte case from PR42674.
  Differential Revision: https://reviews.llvm.org/D66069
  llvm-svn: 368864
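  Roughly the reduction from PR42674, as an illustrative sketch (not the committed test):

  ```cpp
  // Horizontal sum of eight bytes. psadbw against an all-zero vector sums
  // the absolute differences |p[i] - 0|, i.e. the bytes themselves, into a
  // single result, replacing a tree of extends and adds.
  unsigned sum8(const unsigned char *p) {
    unsigned s = 0;
    for (int i = 0; i < 8; ++i)
      s += p[i];
    return s;
  }
  ```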
* [X86] XFormVExtractWithShuffleIntoLoad - handle shuffle mask scaling (Simon Pilgrim, 2019-08-13, 1 file, -13/+27)
  If the target shuffle mask is from a wider type, attempt to scale the mask so that the extraction can attempt to peek through.
  Fixes the regression mentioned in rL368662.
  Reapplying this as rL368308 had to be reverted as part of rL368660 to revert rL368276.
  llvm-svn: 368663
* [X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask (reapplied) (Simon Pilgrim, 2019-08-13, 1 file, -0/+17)
  If we don't demand all elements, then attempt to combine to a simpler shuffle. At the moment we can only do this if Depth == 0, as combineX86ShufflesRecursively uses Depth to track whether the shuffle has really changed or not - we'll need to change this before we can properly start merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts.

  The insertps-combine.ll regression is because XFormVExtractWithShuffleIntoLoad can't see through shuffles of different widths - this will be fixed in a follow-up commit.

  Reapplying this as rL368307 had to be reverted as part of rL368660 to revert rL368276.
  llvm-svn: 368662
* Revert r368276 "[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT" (Hans Wennborg, 2019-08-13, 1 file, -44/+13)
  This introduced a false positive MemorySanitizer warning about use of uninitialized memory in a vectorized crc function in Chromium. That suggests maybe something is not right with this transformation. See https://crbug.com/992853#c7 for a reproducer.

  This also reverts the follow-up commits r368307 and r368308 which depended on this.

  > This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.
  >
  > In particular this helps remove some unnecessary scalar->vector->scalar patterns.
  >
  > The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue.
  >
  > Differential Revision: https://reviews.llvm.org/D65887
  llvm-svn: 368660
* [X86] Allow combineTruncateWithSat to use pack instructions for i16->i8 without AVX512BW. (Craig Topper, 2019-08-12, 1 file, -1/+2)
  We need AVX512BW to be able to truncate an i16 vector. If we don't have that, we have to extend i16->i32, then trunc i32->i8. But we won't be able to remove the min/max if we do that. At least not without more special handling.
  llvm-svn: 368623
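  A scalar model of what the per-lane min/max plus truncate computes (illustrative names, not from the patch):

  ```cpp
  #include <algorithm>
  #include <cstdint>

  // Signed-saturating i16->i8 truncate: clamp to [-128, 127], then
  // truncate. The vector form of this is what packsswb implements
  // lane-wise, so no widening through i32 is required.
  int8_t sat_trunc_i16_to_i8(int16_t x) {
    return static_cast<int8_t>(
        std::min<int16_t>(std::max<int16_t>(x, -128), 127));
  }
  ```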