path: root/llvm/lib/Target/X86
Commit message | Author | Age | Files | Lines
* [X86] Teach lower1BitShuffle to match right shifts with upper zero elements on types that don't natively support KSHIFT. | Craig Topper | 2019-08-19 | 1 | -19/+20
  We can support these by widening to a supported type, then shifting all the way to the left and then back to the right to ensure that we shift in zeroes.
  llvm-svn: 369232
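The widen-then-shift trick described in r369232 can be modeled on plain integers. The sketch below is not LLVM code: it treats a v8i1 mask as the low 8 bits of a 16-bit word (standing in for a widened 16-element k-register), and the helper name `shiftRightWithZeroUpper` is made up for illustration. Shifting all the way left first discards whatever sits in the widened upper bits, so the following right shift can only bring in zeroes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not the LLVM implementation.
// `wide` holds a v8i1 mask in bits 0..7; bits 8..15 are leftover garbage
// from widening to a 16-element mask.
// Returns the 8-element mask shifted right by `amt` with zeroed upper elements.
inline uint8_t shiftRightWithZeroUpper(uint16_t wide, unsigned amt) {
  const unsigned NarrowBits = 8, WideBits = 16;
  assert(amt < NarrowBits && "shift amount must be smaller than the mask width");
  const unsigned Pad = WideBits - NarrowBits;
  // Step 1 (KSHIFTL analogue): move the narrow mask to the top of the wide
  // register, pushing the garbage bits out.
  uint16_t left = static_cast<uint16_t>(wide << Pad);
  // Step 2 (KSHIFTR analogue): shift back down past the original position;
  // a logical right shift fills the vacated high bits with zeroes.
  uint16_t right = static_cast<uint16_t>(left >> (Pad + amt));
  return static_cast<uint8_t>(right);
}
```

A single right shift on the widened value would instead pull the garbage bits into the upper elements of the result, which is why the two-step form is needed when zeroes are required.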
* [X86] Fix the lower1BitShuffle code added in r369215 to correctly pass the widened vector to the KSHIFT node. | Craig Topper | 2019-08-19 | 1 | -1/+1
  Not sure how to test this as we have tests that exercise this code, but nothing failed for the types not matching. Since all the k-registers use equivalent register classes everything just ends up working.
  llvm-svn: 369228
* [X86] Teach lower1BitShuffle to match KSHIFTR that doesn't use Zeroable and only relies on undef. | Craig Topper | 2019-08-19 | 1 | -0/+48
  This allows us to widen the type when the KSHIFTR instruction doesn't exist for the type. If we need to shift zeroes into the upper elements, we would need more work to guarantee zeroes when widening.
  llvm-svn: 369227
* [X86] Teach lower1BitShuffle to recognize padding a subvector with zeros with V2 as the source and V1 as the zero vector. | Craig Topper | 2019-08-19 | 1 | -7/+16
  Shuffle canonicalization can swap the sources so the zero vector might be V1 and the subvector that's being padded can be V2.
  llvm-svn: 369226
* [X86] Add a special case to LowerCONCAT_VECTORSvXi1 to handle concatenating zero vectors followed by one non-zero vector followed by undef vectors. | Craig Topper | 2019-08-18 | 1 | -14/+30
  For such a case we should only need a KSHIFTL, but we were previously generating a KSHIFTL followed by a KSHIFTR because we mistakenly believed we needed to zero the undef elements.
  llvm-svn: 369224
* [X86] Replace uses of getZeroVector for vXi1 vectors with DAG.getConstant. | Craig Topper | 2019-08-18 | 1 | -4/+4
  vXi1 vectors don't need special handling.
  llvm-svn: 369222
* [X86] Improve lower1BitShuffle handling for KSHIFTL on narrow vectors. | Craig Topper | 2019-08-18 | 1 | -8/+24
  We can insert the value into a larger legal type and shift that by the desired amount.
  llvm-svn: 369215
* Fix signed/unsigned comparison warning. NFCI. | Simon Pilgrim | 2019-08-18 | 1 | -2/+2
  llvm-svn: 369213
* [X86] isTargetShuffleEquivalent - add BUILD_VECTOR matching | Simon Pilgrim | 2019-08-18 | 1 | -3/+21
  Add similar functionality to isShuffleEquivalent - if the mask elements don't match, try matching the BUILD_VECTOR scalars instead.
  As target shuffles need to handle SM_Sentinel values, this can get a bit tricky, so this commit just adds actual mask element index handling - full SM_SentinelZero support will be added when the need arises.
  Also enables support in matchVectorShuffleWithPACK.
  llvm-svn: 369212
* [X86] isTargetShuffleEquivalent - early out on illegal shuffle masks. NFCI. | Simon Pilgrim | 2019-08-18 | 1 | -8/+10
  Simplifies shuffle mask comparisons by just bailing out if the shuffle mask has any out of range values - will make an upcoming patch much simpler.
  llvm-svn: 369211
* [X86] Add a one use check to the combineStore code that handles v16i16->v16i8 truncate+store by extending to v16i32 and then emitting a v16i32->v16i8 truncstore. | Craig Topper | 2019-08-17 | 1 | -1/+1
  This prevents us from emitting a separate truncate and a truncating store instruction.
  llvm-svn: 369200
* Revert [X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask (reapplied) | Jordan Rupprecht | 2019-08-16 | 1 | -17/+0
  This reverts r368662 (git commit 1a8d790cf5f89c1df718844f13e934e39bef6ef5).
  The compile-time regression repro is in https://bugs.llvm.org/show_bug.cgi?id=43024
  llvm-svn: 369167
* [X86] Use Register/MCRegister in more places in X86 | Craig Topper | 2019-08-16 | 9 | -43/+45
  This was a quick pass through some obvious places. I haven't tried the clang-tidy check.
  I also replaced the zeroes in getX86SubSuperRegister with X86::NoRegister, which is the real sentinel name.
  Differential Revision: https://reviews.llvm.org/D66363
  llvm-svn: 369151
* [X86] resolveTargetShuffleInputs - add DemandedElts variant. NFCI. | Simon Pilgrim | 2019-08-16 | 1 | -3/+10
  Nothing calls this yet, everything still goes through the non (all) DemandedElts wrapper.
  llvm-svn: 369136
* [X86] combineExtractWithShuffle - handle extract(truncate(x), 0) | Simon Pilgrim | 2019-08-16 | 1 | -1/+11
  Eventually we need to generalize combineExtractWithShuffle to handle all faux shuffles and handle truncate (and X86ISD::VTRUNC etc.) there, but we're not ready yet (still creates nodes on the fly, incomplete DemandedElts support, bad use of recursive Depth limit).
  llvm-svn: 369134
* [X86] Alphabetize pass initialization definitions. NFCI. | Simon Pilgrim | 2019-08-16 | 1 | -1/+1
  llvm-svn: 369126
* [X86] Remove unused include. NFCI. | Simon Pilgrim | 2019-08-16 | 1 | -1/+0
  We don't use anything from TargetOptions.h directly and it's included via TargetLowering.h anyhow.
  llvm-svn: 369110
* [X86] Manually reimplement getTargetInsertSubreg in X86DAGToDAGISel::matchBitExtract so we can call insertDAGNode on the target constant. | Craig Topper | 2019-08-16 | 1 | -2/+6
  This is needed to maintain the topological sort order.
  Fixes PR42992.
  llvm-svn: 369084
* Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM | Daniel Sanders | 2019-08-15 | 25 | -236/+233
  Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible).
  Partial reverts in:
    X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
    X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
    X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
    HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
    MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
    PPCFastISel.cpp - No Register::operator-=()
    PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
    MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor
  Manual fixups in:
    ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
    HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
    HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
    PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&
  Depends on D65919
  Reviewers: arsenm, bogner, craig.topper, RKSimon
  Reviewed By: arsenm
  Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65962
  llvm-svn: 369041
* [X86] Add custom type legalization for bitcasting mmx to v2i32/v4i16/v8i8 to use movq2dq instead of going through memory. | Craig Topper | 2019-08-15 | 3 | -0/+21
  llvm-svn: 369031
* [X86] Improve cost model for subvector extraction of less than 128-bit vectors | Craig Topper | 2019-08-15 | 1 | -0/+33
  Now that we're using widening legalization, we need to improve our extract_subvector cost model for these types. This patch begins by modeling these as a subvector extract followed by a permute. I've left FIXMEs in the code for future improvements.
  Differential Revision: https://reviews.llvm.org/D65892
  llvm-svn: 369022
* [llvm] Migrate llvm::make_unique to std::make_unique | Jonas Devlieghere | 2019-08-15 | 7 | -21/+21
  Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo.
  llvm-svn: 369013
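For readers unfamiliar with the migration in r369013, the replacement is a one-for-one spelling change at each call site. The sketch below is illustrative only: `PassInfo` and `createPassInfo` are hypothetical names invented here, not code from the LLVM tree, and it assumes a C++14 (or later) compiler.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical type used only for this illustration.
struct PassInfo {
  std::string Name;
  int Priority;
  PassInfo(std::string N, int P) : Name(std::move(N)), Priority(P) {}
};

// Before the migration the return statement would have been spelled
// llvm::make_unique<PassInfo>(Name, Priority); the arguments are unchanged.
inline std::unique_ptr<PassInfo> createPassInfo(const std::string &Name,
                                                int Priority) {
  assert(!Name.empty() && "illustrative precondition: a name is required");
  return std::make_unique<PassInfo>(Name, Priority);
}
```

Because `std::make_unique` forwards its arguments to the constructor in the same way, a textual search-and-replace of this kind should not change behavior.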
* [SDAG][x86] check for relaxed math when matching an FP reduction | Sanjay Patel | 2019-08-15 | 1 | -2/+2
  If the last step in an FP add reduction allows reassociation and doesn't care about -0.0, then we are free to recognize that computation as a reduction that may reorder the intermediate steps.
  This is requested directly by PR42705: https://bugs.llvm.org/show_bug.cgi?id=42705
  and solves PR42947 (if horizontal math instructions are actually faster than the alternative): https://bugs.llvm.org/show_bug.cgi?id=42947
  Differential Revision: https://reviews.llvm.org/D66236
  llvm-svn: 368995
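The reason r368995 needs the relaxed-math check is that floating-point addition is not associative, so reordering the steps of a reduction can change the result. The sketch below is a plain C++ illustration assuming IEEE 754 single precision; the function names are made up here and it is not the SelectionDAG matching code.

```cpp
#include <array>
#include <cassert>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559,
              "the example values assume IEEE 754 binary32 floats");

// Strict left-to-right order, as the source program would evaluate it.
inline float reduceSequential(const std::array<float, 4> &V) {
  return ((V[0] + V[1]) + V[2]) + V[3];
}

// Pairwise order, the shape a horizontal-add style reduction would use.
inline float reducePairwise(const std::array<float, 4> &V) {
  return (V[0] + V[1]) + (V[2] + V[3]);
}
```

With the input `{1e8f, 1.0f, -1e8f, 1.0f}` the two orders disagree, because adding `1.0f` to a value of magnitude 1e8 is lost to rounding in single precision: the sequential order recovers the final `1.0f`, while the pairwise order loses both. For small exactly representable values such as `{1, 2, 3, 4}` both orders agree.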
* [X86] Add isel pattern to match VZEXT_MOVL and a v2i64 scalar_to_vector bitcasted from x86mmx to MOVQ2DQ. | Craig Topper | 2019-08-15 | 1 | -0/+4
  We already had the pattern for just the scalar to vector and bitcast, but not the case where we wanted zeroes in the high half of the xmm.
  llvm-svn: 368972
* [X86] Make sure load is non-volatile in the MMX_X86movdq2q (loadv2i64) isel pattern. | Craig Topper | 2019-08-15 | 1 | -1/+1
  This pattern will narrow the load so we should make sure it's not volatile.
  llvm-svn: 368971
* [X86] Remove unneeded isel pattern for v4f32->v4i32 fp_to_sint and conversion to MMX. | Craig Topper | 2019-08-15 | 1 | -3/+0
  fp_to_sint is turned into X86cvttp2si during isel preprocessing. The other redundant isel patterns were removed previously, but I missed this one because it's in the MMX td file.
  llvm-svn: 368968
* [X86] Disable custom type legalization for v2i32/v4i16/v8i8->i64. | Craig Topper | 2019-08-15 | 1 | -2/+1
  The default legalization can take care of this.
  llvm-svn: 368967
* [X86] Disable custom type legalization for v2i32/v4i16/v8i8->f64 bitcast. | Craig Topper | 2019-08-15 | 1 | -1/+2
  The generic legalization handles this in the same way so just use that.
  llvm-svn: 368966
* [X86] Remove some unreachable code from LowerBITCAST. | Craig Topper | 2019-08-15 | 1 | -42/+26
  llvm-svn: 368965
* [X86] Remove some dead code and combine some repeated code that's left. | Craig Topper | 2019-08-15 | 1 | -17/+3
  If the width is 256 bits, then we must have AVX, so the else here was unnecessary. Once that's removed, the >= 256 bit code is identical to the 128 bit code with a different VT, so combine them.
  llvm-svn: 368956
* [X86] Use PSADBW for v8i8 addition reductions. | Craig Topper | 2019-08-14 | 1 | -2/+12
  Improves the 8 byte case from PR42674.
  Differential Revision: https://reviews.llvm.org/D66069
  llvm-svn: 368864
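Why PSADBW helps with a byte addition reduction: the instruction sums the absolute differences of eight unsigned byte pairs, so comparing a vector against all zeroes yields the sum of its bytes in one step. The scalar model below is a sketch of that idea, not the lowering code from r368864; `sadLane` and `reduceAddV8I8` are names invented for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Scalar model of one 64-bit lane of PSADBW: the sum of absolute
// differences of eight unsigned byte pairs, produced as a 16-bit value.
inline uint16_t sadLane(const std::array<uint8_t, 8> &A,
                        const std::array<uint8_t, 8> &B) {
  unsigned Sum = 0;
  for (std::size_t I = 0; I < 8; ++I)
    Sum += static_cast<unsigned>(std::abs(int(A[I]) - int(B[I])));
  assert(Sum <= 8 * 255 && "eight byte differences always fit in 16 bits");
  return static_cast<uint16_t>(Sum);
}

// Against an all-zero operand, |x - 0| == x, so the SAD is the byte sum.
inline uint16_t reduceAddV8I8(const std::array<uint8_t, 8> &V) {
  const std::array<uint8_t, 8> Zero{};
  return sadLane(V, Zero);
}
```

An i8 add reduction wraps modulo 256, so under this model the reduction result would be the low byte of the value returned by `reduceAddV8I8`.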
* [X86][CostModel] Adjust the costs of ZERO_EXTEND/SIGN_EXTEND with less than 128-bit inputs | Craig Topper | 2019-08-14 | 1 | -10/+12
  Now that we legalize by widening, the element types here won't change. Previously these were modeled as the elements being widened and then the instruction might become an AND or SHL/ASHR pair. But now they'll become something like a ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG.
  For AVX2, when the destination type is legal it's clear the cost should be 1 since we have extend instructions that can produce 256 bit vectors from less than 128 bit vectors. I'm a little less sure about the AVX1 costs: I think the ones I changed were definitely too high, and they might still be too high.
  Differential Revision: https://reviews.llvm.org/D66169
  llvm-svn: 368858
* [X86] Add llvm_unreachable to a switch that covers all expected values. | Craig Topper | 2019-08-14 | 1 | -0/+1
  llvm-svn: 368857
* [X86] XFormVExtractWithShuffleIntoLoad - handle shuffle mask scaling | Simon Pilgrim | 2019-08-13 | 1 | -13/+27
  If the target shuffle mask is from a wider type, attempt to scale the mask so that the extraction can attempt to peek through.
  Fixes the regression mentioned in rL368662.
  Reapplying this as rL368308 had to be reverted as part of rL368660 to revert rL368276.
  llvm-svn: 368663
* [X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask (reapplied) | Simon Pilgrim | 2019-08-13 | 1 | -0/+17
  If we don't demand all elements, then attempt to combine to a simpler shuffle.
  At the moment we can only do this if Depth == 0 as combineX86ShufflesRecursively uses Depth to track whether the shuffle has really changed or not - we'll need to change this before we can properly start merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts.
  The insertps-combine.ll regression is because XFormVExtractWithShuffleIntoLoad can't see through shuffles of different widths - this will be fixed in a follow-up commit.
  Reapplying this as rL368307 had to be reverted as part of rL368660 to revert rL368276.
  llvm-svn: 368662
* Revert r368276 "[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT" | Hans Wennborg | 2019-08-13 | 1 | -44/+13
  This introduced a false positive MemorySanitizer warning about use of uninitialized memory in a vectorized crc function in Chromium. That suggests maybe something is not right with this transformation. See https://crbug.com/992853#c7 for a reproducer.
  This also reverts the follow-up commits r368307 and r368308 which depended on this.
  > This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.
  >
  > In particular this helps remove some unnecessary scalar->vector->scalar patterns.
  >
  > The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen is to be refactored in the near-term, so this isn't considered a major issue.
  >
  > Differential Revision: https://reviews.llvm.org/D65887
  llvm-svn: 368660
* [GlobalISel] Make the InstructionSelector instance non-const, allowing state to be maintained. | Amara Emerson | 2019-08-13 | 3 | -19/+14
  Currently we can't keep any state in the selector object that we get from the subtarget. As a result we have to plumb all our variables through multiple functions. This change makes it non-const and adds a virtual init() method to allow further state to be captured for each target.
  AArch64 makes use of this in this patch to cache a call to hasFnAttribute(), which is expensive to call and is used on each selection of G_BRCOND.
  Differential Revision: https://reviews.llvm.org/D65984
  llvm-svn: 368652
* [WinEH] Fix catch block parent frame pointer offset | Reid Kleckner | 2019-08-12 | 1 | -3/+8
  r367088 made it so that funclets store XMM registers into their local frame instead of storing them to the parent frame. However, that change forgot to update the parent frame pointer offset for catch blocks. This change does that.
  Fixes crashes when an exception is rethrown in a catch block that saves XMMs, as described in https://crbug.com/992860.
  llvm-svn: 368631
* [X86] Allow combineTruncateWithSat to use pack instructions for i16->i8 without AVX512BW. | Craig Topper | 2019-08-12 | 1 | -1/+2
  We need AVX512BW to be able to truncate an i16 vector. If we don't have that, we have to extend i16->i32 and then truncate i32->i8. But we won't be able to remove the min/max if we do that, at least not without more special handling.
  llvm-svn: 368623
* [X86] Remove unreachable code from LowerTRUNCATE. NFC | Craig Topper | 2019-08-12 | 1 | -16/+4
  All three 256->128 bit cases were already handled above. Noticed while looking at the coverage report.
  llvm-svn: 368609
* [X86] Add a paranoia type check to the code that detects AVG patterns from truncating stores. | Craig Topper | 2019-08-12 | 1 | -5/+6
  If we're after type legalization, we should make sure we won't create a store with an illegal type when we separate the AVG pattern from the truncating store.
  I don't know of a way to fail for this today. Just noticed while I was in the vicinity.
  llvm-svn: 368608
* [X86] Simplify creation of saturating truncating stores. | Craig Topper | 2019-08-12 | 1 | -41/+11
  We just need to check if the truncating store is legal instead of going through isSATValidOnAVX512Subtarget.
  llvm-svn: 368607
* [X86] Replace call to isTruncStoreLegalOrCustom with isTruncStoreLegal. NFC | Craig Topper | 2019-08-12 | 1 | -1/+1
  We have no custom trunc stores on X86.
  llvm-svn: 368606
* [X86] Disable use of zmm registers for varargs musttail calls under prefer-vector-width=256 and min-legal-vector-width=256. | Craig Topper | 2019-08-12 | 1 | -1/+1
  Under this config, the v16f32 type we try to use isn't assigned to a register class, so the getRegClassFor call will fail.
  llvm-svn: 368594
* [X86][SSE] ComputeKnownBits - add basic PSADBW handling | Simon Pilgrim | 2019-08-12 | 1 | -2/+11
  llvm-svn: 368558
* [X86] Support -march=tigerlake | Pengfei Wang | 2019-08-12 | 1 | -0/+13
  Support -march=tigerlake for x86.
  Compared with Icelake Client, it includes four more features: avx512vp2intersect, movdiri, movdir64b, and shstk.
  Patch by Xiang Zhang (xiangzhangllvm)
  Differential Revision: https://reviews.llvm.org/D65840
  llvm-svn: 368543
* [X86] Simplify some of the type checks in combineSubToSubus. | Craig Topper | 2019-08-11 | 1 | -5/+10
  If we have SSE2 we can handle any i8/i16 type and let type legalization deal with it.
  llvm-svn: 368538
* [X86] Don't use SplitOpsAndApply for ISD::USUBSAT. | Craig Topper | 2019-08-11 | 1 | -10/+4
  Target independent type legalization and custom lowering should be able to handle it.
  llvm-svn: 368537
* [X86] Remove some more code from combineShuffle that is no longer needed with widening legalization. | Craig Topper | 2019-08-11 | 1 | -47/+0
  llvm-svn: 368523
* [X86] Remove some code from combineShuffle that seems largely unnecessary with widening legalization. | Craig Topper | 2019-08-11 | 1 | -60/+0
  The test case that changed is probably better served through allowing combineTruncatedArithmetic to create narrow vectors. It also appears InstCombine would have simplified this test case to remove the zext and trunc anyway.
  llvm-svn: 368522