path: root/llvm/lib/Target/X86
Commit message | Author | Age | Files | Lines
...
* [X86] Use PSADBW for v8i8 addition reductions. (Craig Topper, 2019-08-14; 1 file, -2/+12)
  Improves the 8 byte case from PR42674.
  Differential Revision: https://reviews.llvm.org/D66069
  llvm-svn: 368864
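  A scalar-model sketch of the trick, assuming SSE2 (the helper name is illustrative, not from the patch): PSADBW against a zero vector sums the absolute differences of each 8-byte group, and |b - 0| summed over eight lanes is exactly the byte sum, so an 8-byte add reduction becomes one instruction.

```c
#include <emmintrin.h> /* SSE2 */
#include <stdint.h>

/* Sum 8 bytes with PSADBW (_mm_sad_epu8) against zero: the instruction adds
 * the absolute differences of each 8-byte group into a 16-bit result held in
 * the low 64-bit lane, which here is simply the sum of the 8 input bytes. */
static uint16_t sum8_psadbw(const uint8_t b[8]) {
    __m128i v = _mm_loadl_epi64((const __m128i *)b);     /* load low 64 bits */
    __m128i sad = _mm_sad_epu8(v, _mm_setzero_si128());  /* PSADBW vs. zero */
    return (uint16_t)_mm_cvtsi128_si32(sad);             /* low lane = sum */
}
```

  This replaces the shuffle-and-add tree the reduction would otherwise lower to.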
* [X86][CostModel] Adjust the costs of ZERO_EXTEND/SIGN_EXTEND with less than 128-bit inputs (Craig Topper, 2019-08-14; 1 file, -10/+12)
  Now that we legalize by widening, the element types here won't change.
  Previously these were modeled as the elements being widened and then the
  instruction might become an AND or SHL/ASHR pair. But now they'll become
  something like a ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG.
  For AVX2, when the destination type is legal it's clear the cost should be 1,
  since we have extend instructions that can produce 256-bit vectors from
  less-than-128-bit vectors. I'm a little less sure about the AVX1 costs, but I
  think the ones I changed were definitely too high, though they might still be
  too high.
  Differential Revision: https://reviews.llvm.org/D66169
  llvm-svn: 368858
* [X86] Add llvm_unreachable to a switch that covers all expected values. (Craig Topper, 2019-08-14; 1 file, -0/+1)
  llvm-svn: 368857
* [X86] XFormVExtractWithShuffleIntoLoad - handle shuffle mask scaling (Simon Pilgrim, 2019-08-13; 1 file, -13/+27)
  If the target shuffle mask is from a wider type, attempt to scale the mask so
  that the extraction can attempt to peek through.
  Fixes the regression mentioned in rL368662.
  Reapplying this as rL368308 had to be reverted as part of rL368660 to revert rL368276.
  llvm-svn: 368663
* [X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask (reapplied) (Simon Pilgrim, 2019-08-13; 1 file, -0/+17)
  If we don't demand all elements, then attempt to combine to a simpler shuffle.
  At the moment we can only do this if Depth == 0, as
  combineX86ShufflesRecursively uses Depth to track whether the shuffle has
  really changed or not - we'll need to change this before we can properly start
  merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts.
  The insertps-combine.ll regression is because XFormVExtractWithShuffleIntoLoad
  can't see through shuffles of different widths - this will be fixed in a
  follow-up commit.
  Reapplying this as rL368307 had to be reverted as part of rL368660 to revert rL368276.
  llvm-svn: 368662
* Revert r368276 "[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT" (Hans Wennborg, 2019-08-13; 1 file, -44/+13)
  This introduced a false positive MemorySanitizer warning about use of
  uninitialized memory in a vectorized crc function in Chromium. That suggests
  maybe something is not right with this transformation. See
  https://crbug.com/992853#c7 for a reproducer.
  This also reverts the follow-up commits r368307 and r368308 which depended on this.
  > This patch attempts to peek through vectors based on the demanded bits/elt
  > of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid
  > dependencies on ops that have no impact on the extract.
  >
  > In particular this helps remove some unnecessary scalar->vector->scalar patterns.
  >
  > The wasm shift patterns are annoying - @tlively has indicated that the wasm
  > vector shift codegen are to be refactored in the near-term and isn't
  > considered a major issue.
  >
  > Differential Revision: https://reviews.llvm.org/D65887
  llvm-svn: 368660
* [GlobalISel] Make the InstructionSelector instance non-const, allowing state to be maintained. (Amara Emerson, 2019-08-13; 3 files, -19/+14)
  Currently we can't keep any state in the selector object that we get from the
  subtarget. As a result we have to plumb all our variables through multiple
  functions. This change makes it non-const and adds a virtual init() method to
  allow further state to be captured for each target.
  AArch64 makes use of this in this patch to cache a call to hasFnAttribute(),
  which is expensive to call and is used on each selection of G_BRCOND.
  Differential Revision: https://reviews.llvm.org/D65984
  llvm-svn: 368652
* [WinEH] Fix catch block parent frame pointer offset (Reid Kleckner, 2019-08-12; 1 file, -3/+8)
  r367088 made it so that funclets store XMM registers into their local frame
  instead of storing them to the parent frame. However, that change forgot to
  update the parent frame pointer offset for catch blocks. This change does that.
  Fixes crashes when an exception is rethrown in a catch block that saves XMMs,
  as described in https://crbug.com/992860.
  llvm-svn: 368631
* [X86] Allow combineTruncateWithSat to use pack instructions for i16->i8 without AVX512BW. (Craig Topper, 2019-08-12; 1 file, -1/+2)
  We need AVX512BW to be able to truncate an i16 vector. If we don't have that,
  we have to extend i16->i32, then truncate i32->i8. But then we won't be able
  to remove the min/max, at least not without more special handling.
  llvm-svn: 368623
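  A hedged sketch of why pack instructions fit here, assuming SSE2 (the helper is illustrative, not code from the patch): PACKUSWB performs exactly the clamp-to-[0,255]-then-narrow that a saturating i16->u8 truncate expresses as min/max plus trunc in IR, so the clamp folds into the pack.

```c
#include <emmintrin.h> /* SSE2 */
#include <stdint.h>

/* PACKUSWB (_mm_packus_epi16) narrows two vectors of signed 16-bit values to
 * unsigned 8-bit with saturation: each value is clamped to [0, 255]. A
 * min/max clamp followed by a truncate can therefore become a single pack. */
static void trunc_sat_u8(const int16_t in[16], uint8_t out[16]) {
    __m128i lo = _mm_loadu_si128((const __m128i *)in);
    __m128i hi = _mm_loadu_si128((const __m128i *)(in + 8));
    _mm_storeu_si128((__m128i *)out, _mm_packus_epi16(lo, hi));
}
```

  Without a pack (or AVX512BW's VPMOVWB family), the same operation needs an extend/truncate round trip that hides the saturation.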
* [X86] Remove unreachable code from LowerTRUNCATE. NFC (Craig Topper, 2019-08-12; 1 file, -16/+4)
  All three 256->128 bit cases were already handled above.
  Noticed while looking at the coverage report.
  llvm-svn: 368609
* [X86] Add a paranoia type check to the code that detects AVG patterns from truncating stores. (Craig Topper, 2019-08-12; 1 file, -5/+6)
  If we're after type legalize, we should make sure we won't create a store with
  an illegal type when we separate the AVG pattern from the truncating store.
  I don't know of a way to fail for this today. Just noticed while I was in the
  vicinity.
  llvm-svn: 368608
* [X86] Simplify creation of saturating truncating stores. (Craig Topper, 2019-08-12; 1 file, -41/+11)
  We just need to check if the truncating store is legal instead of going
  through isSATValidOnAVX512Subtarget.
  llvm-svn: 368607
* [X86] Replace call to isTruncStoreLegalOrCustom with isTruncStoreLegal. NFC (Craig Topper, 2019-08-12; 1 file, -1/+1)
  We have no custom trunc stores on X86.
  llvm-svn: 368606
* [X86] Disable use of zmm registers for varargs musttail calls under prefer-vector-width=256 and min-legal-vector-width=256. (Craig Topper, 2019-08-12; 1 file, -1/+1)
  Under this config, the v16f32 type we try to use isn't assigned to a register
  class, so the getRegClassFor call will fail.
  llvm-svn: 368594
* [X86][SSE] ComputeKnownBits - add basic PSADBW handling (Simon Pilgrim, 2019-08-12; 1 file, -2/+11)
  llvm-svn: 368558
* [X86] Support -march=tigerlake (Pengfei Wang, 2019-08-12; 1 file, -0/+13)
  Support -march=tigerlake for x86. Compared with Icelake Client, it includes
  four new features: avx512vp2intersect, movdiri, movdir64b, and shstk.
  Patch by Xiang Zhang (xiangzhangllvm)
  Differential Revision: https://reviews.llvm.org/D65840
  llvm-svn: 368543
* [X86] Simplify some of the type checks in combineSubToSubus. (Craig Topper, 2019-08-11; 1 file, -5/+10)
  If we have SSE2 we can handle any i8/i16 type and let type legalization deal
  with it.
  llvm-svn: 368538
* [X86] Don't use SplitOpsAndApply for ISD::USUBSAT. (Craig Topper, 2019-08-11; 1 file, -10/+4)
  Target independent type legalization and custom lowering should be able to
  handle it.
  llvm-svn: 368537
* [X86] Remove some more code from combineShuffle that is no longer needed with widening legalization. (Craig Topper, 2019-08-11; 1 file, -47/+0)
  llvm-svn: 368523
* [X86] Remove some code from combineShuffle that seems largely unnecessary with widening legalization. (Craig Topper, 2019-08-11; 1 file, -60/+0)
  The test case that changed is probably better served through allowing
  combineTruncatedArithmetic to create narrow vectors. It also appears
  InstCombine would have simplified this test case to remove the zext and trunc
  anyway.
  llvm-svn: 368522
* [X86][SSE] Lower shuffle as ANY_EXTEND_VECTOR_INREG (Simon Pilgrim, 2019-08-10; 1 file, -3/+3)
  On SSE41+ targets we always lower vector shuffles to
  ZERO_EXTEND_VECTOR_INREG, even if we don't need the extended bits.
  This patch relaxes this so that we lower to ANY_EXTEND_VECTOR_INREG if we
  can, meaning that shuffle combines have a better idea of what elements need
  to be kept zero. This helps the multiple reduction code as we can now combine
  away a lot more of the pack+extend codes.
  Differential Revision: https://reviews.llvm.org/D65741
  llvm-svn: 368515
* [X86] Match the IR pattern for movmsk on SSE1 only targets where v4i32 isn't legal (Craig Topper, 2019-08-10; 1 file, -3/+22)
  Summary:
  This patch adds a special DAG combine for SSE1 to recognize the IR pattern
  InstCombine gives us for movmsk. This only does the recognition for a few
  cases where it's obvious the input won't be scalarized, resulting in building
  a vector just due to the movmsk. I've made it separate from our existing
  matching for movmsk since that's called in multiple places and I didn't spend
  time to see if the other callers would make sense here. Plus the restrictions
  and additional checks would complicate that.
  This fixes the case from PR42870. But it's probably still broken in the
  presence of logic ops feeding the movmsk pattern, which would further hide
  the v4f32 type.
  Reviewers: spatel, RKSimon, xbolva00
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65689
  llvm-svn: 368506
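  A hedged sketch of the instruction being matched, assuming SSE1 (the wrapper is illustrative): MOVMSKPS collects the sign bit of each of the four float lanes into the low 4 bits of an integer, which is the compact form InstCombine's sign-bit-test pattern reduces to.

```c
#include <xmmintrin.h> /* SSE1 */

/* MOVMSKPS (_mm_movemask_ps): gather the sign bit of each of the 4 float
 * lanes into bits 0..3 of the result. Matching this beats extracting and
 * testing each lane individually. */
static int signbits4(__m128 v) {
    return _mm_movemask_ps(v);
}
```

  Note `_mm_set_ps` takes lanes high-to-low, so `_mm_set_ps(e3, e2, e1, e0)` puts `e0` in bit 0 of the mask.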
* [X86] Improve the diagnostic for larger than 4-bit immediate for vpermil2pd/ps. Only allow MCConstantExprs. (Craig Topper, 2019-08-10; 3 files, -3/+22)
  llvm-svn: 368505
* [X86] Fix stack probe issue on windows32. (Luo, Yuanke, 2019-08-10; 4 files, -8/+26)
  Summary:
  On Windows, if the frame size exceeds 4096 bytes, the compiler needs to
  generate a call to _alloca_probe. The X86CallFrameOptimization pass changes
  the reserved stack size, causing the stack probe function not to be inserted.
  This patch fixes the issue by detecting the call frame size: if the size
  exceeds 4096 bytes, X86CallFrameOptimization is dropped.
  Reviewers: craig.topper, wxiao3, annita.zhang, rnk, RKSimon
  Reviewed By: rnk
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65923
  llvm-svn: 368503
* [globalisel] Add G_SEXT_INREG (Daniel Sanders, 2019-08-09; 1 file, -0/+1)
  Summary:
  Targets often have instructions that can sign-extend certain cases faster
  than the equivalent shift-left/arithmetic-shift-right. Such cases can be
  identified by matching a shift-left/shift-right pair, but there are some
  issues with this in the context of combines. For example, suppose you can
  sign-extend 8-bit up to 32-bit with a target extend instruction.
    %1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity)
    %2:_(s32) = G_ASHR %1:_(s32), i32 24
    %3:_(s32) = G_ASHR %2:_(s32), i32 1
  would reasonably combine to:
    %1:_(s32) = G_SHL %0:_(s32), i32 24
    %2:_(s32) = G_ASHR %1:_(s32), i32 25
  which no longer matches the special case. If your shifts and extend are equal
  cost, this would break even as a pair of shifts, but if your shift is more
  expensive than the extend then it's cheaper as:
    %2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8
    %3:_(s32) = G_ASHR %2:_(s32), i32 1
  It's possible to match the shift-pair in ISel and emit an extend and ashr.
  However, this is far from the only way to break this shift pair and make it
  hard to match the extends. Another example is that with the right
  known-zeros, this:
    %1:_(s32) = G_SHL %0:_(s32), i32 24
    %2:_(s32) = G_ASHR %1:_(s32), i32 24
    %3:_(s32) = G_MUL %2:_(s32), i32 2
  can become:
    %1:_(s32) = G_SHL %0:_(s32), i32 24
    %2:_(s32) = G_ASHR %1:_(s32), i32 23
  All upstream targets have been configured to lower it to the current
  G_SHL/G_ASHR pair but will likely want to make it legal in some cases to
  handle their faster cases.
  To follow up: provide a way to legalize based on the constant. At the moment,
  I'm thinking that the best way to achieve this is to provide the MI in
  LegalityQuery, but that opens the door to breaking core principles of the
  legalizer (legality is not context sensitive).
  That said, it's worth noting that looking at other instructions and acting on
  that information doesn't violate this principle in itself. It's only a
  violation if, at the end of legalization, a pass that checks legality without
  being able to see the context would say an instruction might not be legal.
  That's a fairly subtle distinction, so to give a concrete example, saying %2
  in:
    %1 = G_CONSTANT 16
    %2 = G_SEXT_INREG %0, %1
  is legal is in violation of that principle if the legality of %2 depends on
  %1 being constant and/or being 16. However, legalizing to either:
    %2 = G_SEXT_INREG %0, 16
  or:
    %1 = G_CONSTANT 16
    %2:_(s32) = G_SHL %0, %1
    %3:_(s32) = G_ASHR %2, %1
  depending on whether %1 is constant and 16 does not violate that principle,
  since both outputs are genuinely legal.
  Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm
  Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls,
  javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61289
  llvm-svn: 368487
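  The equivalence the commit relies on can be checked in plain C, as a hedged sketch (helper names are illustrative): sign-extending the low 8 bits of a 32-bit value via the shl-by-24/ashr-by-24 pair gives the same result as a direct narrow-type cast, which is what a target extend instruction (e.g. movsx) implements.

```c
#include <stdint.h>

/* G_SEXT_INREG %x, 8 as the canonical shift pair: shift left then
 * arithmetic-shift right by (32 - 8). Relies on arithmetic right shift of
 * signed values, which gcc/clang guarantee on two's-complement targets. */
static int32_t sext_inreg8_shifts(int32_t x) {
    return (int32_t)((uint32_t)x << 24) >> 24;
}

/* The same operation as a direct extend - the single-instruction form. */
static int32_t sext_inreg8_direct(int32_t x) {
    return (int8_t)x;
}
```

  Once a combine folds the 24/24 pair into e.g. a 24/25 pair, the shift-pair matcher no longer fires, which is the motivation for carrying G_SEXT_INREG as its own opcode.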
* Remove variable only used in an assert. (Eric Christopher, 2019-08-09; 1 file, -2/+1)
  llvm-svn: 368486
* [X86] Remove custom handling for extloads from LowerLoad. (Craig Topper, 2019-08-09; 1 file, -183/+1)
  We don't appear to need this with widening legalization.
  llvm-svn: 368479
* [X86][SSE] Swap X86ISD::BLENDV inputs with an inverted selection mask (PR42825) (Simon Pilgrim, 2019-08-09; 1 file, -0/+6)
  As discussed on PR42825, if we are inverting the selection mask we can just
  swap the inputs and avoid the inversion.
  Differential Revision: https://reviews.llvm.org/D65522
  llvm-svn: 368438
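  The identity behind the combine, as a hedged scalar sketch (the bitwise model is illustrative; BLENDV actually selects per element by the mask's sign bits): select(~m, a, b) equals select(m, b, a), so rather than materializing the inverted mask the inputs are swapped.

```c
#include <stdint.h>

/* Scalar bitwise model of a blend: for each bit, a mask bit of 1 selects
 * from `a` and 0 selects from `b`. */
static uint32_t blend(uint32_t mask, uint32_t a, uint32_t b) {
    return (a & mask) | (b & ~mask);
}
```

  Applying the combine means `blend(~m, a, b)` is rewritten to `blend(m, b, a)`, saving the NOT of the mask.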
* GlobalISel: pack various parameters for lowerCall into a struct. (Tim Northover, 2019-08-09; 2 files, -23/+19)
  I've now needed to add an extra parameter to this call twice recently. Not
  only is the signature getting extremely unwieldy, but just updating all of
  the callsites and implementations is a pain. Putting the parameters in a
  struct sidesteps both issues.
  llvm-svn: 368408
* [X86] Remove code that expands truncating stores from combineStore. (Craig Topper, 2019-08-09; 1 file, -76/+1)
  We shouldn't form trunc stores that need to be expanded now that we are using
  widening legalization.
  llvm-svn: 368400
* [X86] Remove stale FIXME from combineMaskedStore. NFC (Craig Topper, 2019-08-09; 1 file, -4/+0)
  I believe PR34584 was tracking that FIXME, but it's since been closed and a
  test case was added.
  llvm-svn: 368397
* [X86] Remove DAG combine expansion of extending masked load and truncating masked store. (Craig Topper, 2019-08-09; 1 file, -181/+24)
  The only way to generate these was through promoting legalization of narrow
  vectors, but we widen those types now. So we shouldn't produce these nodes.
  llvm-svn: 368396
* [X86] Remove handler for (U/S)(ADD/SUB)SAT from ReplaceNodeResults. Remove TypeWidenVector check from code that handles X86ISD::VPMADDWD and X86ISD::AVG. (Craig Topper, 2019-08-09; 1 file, -9/+4)
  More unneeded code since we now legalize narrow vectors by widening.
  llvm-svn: 368395
* [X86] Remove ISD::SETCC handling from ReplaceNodeResults. (Craig Topper, 2019-08-09; 1 file, -27/+0)
  This is no longer needed since we widen v2i32 instead of promoting.
  llvm-svn: 368394
* [X86] Simplify ISD::LOAD handling in ReplaceNodeResults and ISD::STORE handling in LowerStore now that v2i32 is widened to v4i32. (Craig Topper, 2019-08-09; 1 file, -12/+10)
  llvm-svn: 368390
* [X86] Merge v2f32 and v2i32 gather/scatter handling in ReplaceNodeResults/LowerMSCATTER now that v2i32 is also widened like v2f32. (Craig Topper, 2019-08-09; 1 file, -86/+12)
  llvm-svn: 368389
* [X86] Remove unreachable handling for f64->v2i32/v4i16/v8i8 bitcasts from ReplaceNodeResults. (Craig Topper, 2019-08-09; 1 file, -14/+0)
  We rely on the generic type legalizer for this now.
  llvm-svn: 368388
* [X86] Simplify ReplaceNodeResults handling for FP_TO_SINT/UINT for vectors to only handle widening. (Craig Topper, 2019-08-09; 1 file, -44/+10)
  llvm-svn: 368387
* [X86] Simplify ReplaceNodeResults handling for SIGN_EXTEND/ZERO_EXTEND/TRUNCATE for vectors to only handle widening. (Craig Topper, 2019-08-09; 1 file, -4/+5)
  llvm-svn: 368386
* [X86] Simplify ReplaceNodeResults handling for UDIV/UREM/SDIV/SREM for vectors to only handle widening. (Craig Topper, 2019-08-09; 1 file, -12/+3)
  llvm-svn: 368385
* [X86] Remove vector promotion handling from the ReplaceNodeResults ISD::MUL handling code. (Craig Topper, 2019-08-09; 1 file, -28/+14)
  We now widen illegal vector types so we don't need this anymore.
  llvm-svn: 368384
* [X86] Improve codegen of v8i64->v8i16 and v16i32->v16i8 truncate with avx512vl, avx512bw, min-legal-vector-width<=256 and prefer-vector-width=256 (Craig Topper, 2019-08-08; 1 file, -1/+22)
  Under this configuration we'll want to split the v8i64 or v16i32 into two
  vectors. The default legalization will try to truncate each of those 256-bit
  pieces one step to 128-bit, concatenate those, then truncate one more time
  from the new 256 to 128 bits. With this patch we now truncate the two splits
  to 64-bits then concatenate those. We have to do this two different ways
  depending on whether we have widening legalization enabled.
  Without widening legalization we have to manually construct X86ISD::VTRUNC to
  prevent the ISD::TRUNCATE with a narrow result being promoted to 128 bits
  with a larger element type than what we want, followed by something like a
  pshufb to grab the lower half of each element to finish the job.
  With widening legalization we just get the right thing. When we switch to
  widening by default we can just delete the other code path.
  Differential Revision: https://reviews.llvm.org/D65626
  llvm-svn: 368349
* [X86] Make CMPXCHG16B feature imply CMPXCHG8B feature. (Craig Topper, 2019-08-08; 1 file, -1/+2)
  This fixes znver1 so that it properly enables CMPXCHG8B. We can probably
  remove explicit CMPXCHG8B from CPUs that also have CMPXCHG16B, but keeping
  this simple to allow cherry pick to 9.0.
  Fixes PR42935.
  llvm-svn: 368324
* [X86] XFormVExtractWithShuffleIntoLoad - handle shuffle mask scaling (Simon Pilgrim, 2019-08-08; 1 file, -13/+27)
  If the target shuffle mask is from a wider type, attempt to scale the mask so
  that the extraction can attempt to peek through.
  Fixes the regression mentioned in rL368307.
  llvm-svn: 368308
* [X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask (Simon Pilgrim, 2019-08-08; 1 file, -0/+17)
  If we don't demand all elements, then attempt to combine to a simpler shuffle.
  At the moment we can only do this if Depth == 0, as
  combineX86ShufflesRecursively uses Depth to track whether the shuffle has
  really changed or not - we'll need to change this before we can properly start
  merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts.
  The insertps-combine.ll regression is because XFormVExtractWithShuffleIntoLoad
  can't see through shuffles of different widths - this will be fixed in a
  follow-up commit.
  llvm-svn: 368307
* [X86][SSE] matchBinaryPermuteShuffle - split INSERTPS combines (Simon Pilgrim, 2019-08-08; 1 file, -8/+17)
  We need to prefer INSERTPS with zeros over SHUFPS, but fallback to INSERTPS
  if that fails.
  llvm-svn: 368292
* [X86] Remove -x86-experimental-vector-widening-legalization command line option and all its uses. (Craig Topper, 2019-08-08; 1 file, -354/+45)
  This option is now defaulted to true and we don't want to support turning it
  off, so remove the option.
  llvm-svn: 368258
* [X86] Add CMOV_FR32X and CMOV_FR64X to the isCMOVPseudo function. (Craig Topper, 2019-08-08; 1 file, -0/+2)
  llvm-svn: 368250
* Recommit "[MS] Emit S_HEAPALLOCSITE debug info in Selection DAG" (Amy Huang, 2019-08-07; 1 file, -0/+5)
  With a fix to clear the SDNode map when SelectionDAG is cleared.
  llvm-svn: 368230
* [X86] Allow pack instructions to be used for 512->256 truncates when -mprefer-vector-width=256 is causing 512-bit vectors to be split (Craig Topper, 2019-08-07; 1 file, -2/+9)
  If we're splitting the 512-bit vector anyway and we have zero/sign bits, then
  we might as well use pack instructions to concat and truncate at once.
  Differential Revision: https://reviews.llvm.org/D65904
  llvm-svn: 368210