path: root/llvm/lib/Target/X86
* [X86] Add custom type legalization for v2i8/v4i8/v8i8 mul under -x86-experimental-vector-widening
  Craig Topper, 2018-11-16 (1 file, -2/+20)
  By promoting the multiply to use an i16 element type early, we can avoid op legalization emitting a second multiply for the 8 upper elements of the v16i8 type we would otherwise get.
  llvm-svn: 347032
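  A minimal sketch of the early-promotion idea, assuming a standalone helper (the function name and exact structure are assumptions, not the actual patch):

      #include "llvm/CodeGen/SelectionDAG.h"
      using namespace llvm;

      // Promote a narrow vXi8 multiply to vXi16 before op legalization so only
      // one vXi16 multiply is emitted; only the low 8 bits of each lane matter.
      static SDValue promoteNarrowMul(SDNode *N, SelectionDAG &DAG) {
        SDLoc DL(N);
        EVT VT = N->getValueType(0); // v2i8, v4i8 or v8i8
        EVT WideVT = EVT::getVectorVT(*DAG.getContext(), MVT::i16,
                                      VT.getVectorNumElements());
        SDValue A = DAG.getNode(ISD::ANY_EXTEND, DL, WideVT, N->getOperand(0));
        SDValue B = DAG.getNode(ISD::ANY_EXTEND, DL, WideVT, N->getOperand(1));
        SDValue Mul = DAG.getNode(ISD::MUL, DL, WideVT, A, B);
        return DAG.getNode(ISD::TRUNCATE, DL, VT, Mul);
      }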
* [X86] Use ANY_EXTEND instead of SIGN_EXTEND in the AVX2 and later path for legalizing vXi8 multiply
  Craig Topper, 2018-11-16 (1 file, -2/+2)
  We aren't going to use the upper bits of the multiply result that the extend would affect, so we don't need a specific type of extend. This makes some reduction test cases shorter because we were previously trying to sign_extend a truncate, which we can't eliminate.
  llvm-svn: 347011
* [X86] Update a couple comments to remove a mention of a sign extension that no longer happens. NFC
  Craig Topper, 2018-11-16 (1 file, -3/+3)
  llvm-svn: 347010
* [X86] Remove ANY_EXTEND special case from canReduceVMulWidth
  Craig Topper, 2018-11-15 (1 file, -18/+2)
  Removing this code doesn't affect any lit tests, so it doesn't appear to be tested anymore. I assume it was when it was added, but I guess something else changed? The code coverage report also says it's unused.
  I mostly didn't like that it seemed to count the sign bits as if it were a sign_extend, but then set isPositive as if it were a zero_extend. It feels like we should have picked one interpretation.
  Differential Revision: https://reviews.llvm.org/D54596
  llvm-svn: 346995
* [X86] Minor cleanup to getExtendInVec. NFCI
  Craig Topper, 2018-11-15 (1 file, -4/+7)
  Use unsigned to calculate the subvector index to avoid a cast. Remove an unnecessary condition and replace it with a stronger assert. Use the InVT variable we updated when we extracted instead of grabbing it from the In SDValue.
  llvm-svn: 346983
* [X86] Add -x86-experimental-vector-widening support to reduceVMULWidth and combineMulToPMADDWD
  Craig Topper, 2018-11-15 (1 file, -14/+22)
  In reduceVMULWidth, we no longer need to worry about extending the vector to 128 bits first. Regular widening of extends, muls and shuffles will take care of that for us.
  In combineMulToPMADDWD, we can handle v2i32 multiplies and allow the VPMADDWD to be widened to v4i32 during type legalization by adding custom widening like we already have for AVG/ADDUS/SUBUS. I had to modify that code a little to allow different input and output VTs.
  Differential Revision: https://reviews.llvm.org/D54512
  llvm-svn: 346980
* [X86] Fix MCNullStreamer support for modules with a CodeView flag
  Simon Pilgrim, 2018-11-15 (1 file, -8/+8)
  This fixes -filetype=null support when compiling for a Win32 target and the module has a CodeView flag. The only places changed are the uses of the getTargetStreamer function - this patch guards both of them with null checks.
  Committed on behalf of @eush (Eugene Sharygin)
  Differential Revision: https://reviews.llvm.org/D54008
  llvm-svn: 346962
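  The shape of the fix, roughly (the streamer method shown is one possible use site, assumed for illustration; the exact call sites are in the commit):

      // getTargetStreamer() returns null for MCNullStreamer (-filetype=null),
      // so guard CodeView emission instead of dereferencing unconditionally.
      if (auto *XTS = static_cast<X86TargetStreamer *>(
              OutStreamer->getTargetStreamer()))
        XTS->emitFPOData(CurrentFnSym);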
* [X86] Add some custom type legalization rules for truncate with -x86-experimental-vector-widening-legalization
  Craig Topper, 2018-11-15 (1 file, -0/+64)
  This avoids some nasty shuffles when we have avx512. It will also prevent using zmm truncate instructions when a ymm instruction that zeroes part of an xmm register will do. Also avoid using avx512 truncate instructions when the input is 128 bits or less. These instructions are 2 uops on SKX, so we can probably find a better single-uop shuffle like pshufb.
  llvm-svn: 346936
* [X86] Don't mark SEXTLOADs with narrow types as Custom with -x86-experimental-vector-widening-legalization
  Craig Topper, 2018-11-15 (1 file, -8/+27)
  The narrow types end up requesting widening, but generic legalization will end up scalarizing and using a build_vector to do the widening.
  llvm-svn: 346916
* [X86] Remove unused variable
  Benjamin Kramer, 2018-11-14 (1 file, -1/+0)
  llvm-svn: 346909
* [X86] Support v2i32/v4i16/v8i8 load/store using f64 on 32-bit targets under -x86-experimental-vector-widening-legalization
  Craig Topper, 2018-11-14 (1 file, -15/+38)
  On 64-bit targets the type legalizer will use i64 to legalize these. But when i64 isn't legal, the type legalizer won't try an FP type, so do it manually instead.
  There are a few regressions in here due to some v2i32 operations like mul and div now being reassembled into a full vector just to store instead of storing the pieces. But this was already occurring in 64-bit mode, so it's not a new issue.
  llvm-svn: 346908
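  A simplified sketch of the manual legalization for the load side (chain and result-merging details elided; the helper name is an assumption):

      #include "llvm/CodeGen/SelectionDAG.h"
      using namespace llvm;

      // Load the 64 bits as an f64 (legal on 32-bit x86 with SSE2), move it
      // into a vector register, and bitcast to the widened integer type.
      static SDValue lowerNarrowVecLoad(LoadSDNode *Ld, SelectionDAG &DAG) {
        SDLoc DL(Ld);
        SDValue F64 = DAG.getLoad(MVT::f64, DL, Ld->getChain(),
                                  Ld->getBasePtr(), Ld->getMemOperand());
        SDValue Vec = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, MVT::v2f64, F64);
        // v2i32 widens to v4i32; v4i16 would use v8i16, v8i8 would use v16i8.
        return DAG.getBitcast(MVT::v4i32, Vec);
      }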
* Bias physical register immediate assignments
  Nirav Dave, 2018-11-14 (2 files, -4/+4)
  The machine scheduler currently biases register copies to/from physical registers to be closer to their point of use/def to minimize their live ranges. This change extends this to also cover physical register assignments from immediate values.
  This causes a reduction in overall register pressure and a minor reduction in spills, and indirectly fixes an out-of-registers assertion (PR39391).
  Most test changes are from minor instruction reorderings and register name selection changes and direct consequences of that.
  Reviewers: MatzeB, qcolombet, myatsina, pcc
  Subscribers: nemanjai, jvesely, nhaehnle, eraman, hiraditya, javed.absar, arphaman, jfb, jsji, llvm-commits
  Differential Revision: https://reviews.llvm.org/D54218
  llvm-svn: 346894
* [X86] Allow pmulh to be formed from narrow vXi16 vectors under -x86-experimental-vector-widening-legalization
  Craig Topper, 2018-11-14 (1 file, -2/+4)
  Narrower vectors will be widened to 128 bits without changing the element size. And generic type legalization can already handle widening mulhu/mulhs.
  Differential Revision: https://reviews.llvm.org/D54513
  llvm-svn: 346879
* [CostModel] Add generic expansion funnel shift cost support
  Simon Pilgrim, 2018-11-14 (1 file, -13/+11)
  Add support for the expansion of funnel shifts/rotates to getIntrinsicInstrCost. This also required us to move the X86 fshl/fshr costs to the same place as the rotates, to avoid expansion and get correct scalarization vs vectorization costs.
  llvm-svn: 346854
* [X86][AVX512] Remove constant pool shuffle decoding from SelectionDAG
  Simon Pilgrim, 2018-11-14 (1 file, -3/+3)
  This patch removes the last use of the constant pool shuffle decode helper and consistently uses the 'getTargetShuffleMaskIndices' versions instead. The constant pool versions are now purely used for assembly comments.
  The avx512vbmi intrinsic upgrades had to be altered as they were being decoded as broadcasts, similar to what I fixed in rL346032. I don't think the change is critical - although it's annoying that we lose the {k}{z} instruction test coverage, as they are tricky to generate.
  Differential Revision: https://reviews.llvm.org/D54083
  llvm-svn: 346850
* [SelectionDAG][X86] Relax restriction on the width of an input to *_EXTEND_VECTOR_INREG. Use them and regular *_EXTEND to replace the X86 specific VSEXT/VZEXT opcodes
  Craig Topper, 2018-11-13 (5 files, -210/+245)
  Previously, the extend_vector_inreg opcodes required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86, the input just needs to be a legal type with at least enough elements to cover the output. This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior, so it will still create them with matching input and output sizes.
  X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector, and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw, where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type, so type legalization should ignore the node and doesn't need to know about the relaxed restriction.
  We are no longer allowed to use the default expansion for these nodes during vector op legalization, since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code. I believe DAG combine works correctly with the relaxed restriction because it doesn't check the number of input elements.
  The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel, since SimplifyDemandedBits can create them at any time, so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in SimplifyDemandedBits.
  Differential Revision: https://reviews.llvm.org/D54346
  llvm-svn: 346784
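  The relaxed constraint, paraphrased as a predicate (an illustration of the rule, not the literal asserts in getNode):

      #include "llvm/CodeGen/ValueTypes.h"
      using namespace llvm;

      // What *_EXTEND_VECTOR_INREG now requires: wider result elements, and an
      // input with more elements than the result; equal total width is no
      // longer required.
      static bool isValidExtendInRegShape(EVT ResVT, EVT InVT) {
        return ResVT.isVector() && InVT.isVector() &&
               ResVT.getScalarSizeInBits() > InVT.getScalarSizeInBits() &&
               ResVT.getVectorNumElements() < InVT.getVectorNumElements();
      }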
* [CostModel][X86] Fix constant vector XOP right shifts
  Simon Pilgrim, 2018-11-13 (1 file, -2/+11)
  We'll constant fold these cases, so they are as cheap as vector left shift cases. Noticed while improving funnel shift costs.
  llvm-svn: 346760
* Fix comment for XOP rotates. NFCI.
  Simon Pilgrim, 2018-11-13 (1 file, -1/+1)
  llvm-svn: 346753
* [X86][SSE] Add lowerVectorShuffleAsByteRotateAndPermute (PR39387)
  Simon Pilgrim, 2018-11-12 (1 file, -8/+115)
  This patch adds the ability to use a PALIGNR to rotate a pair of inputs to select a range containing all the referenced elements, followed by a single-input permute to put them in the right location.
  Differential Revision: https://reviews.llvm.org/D54267
  llvm-svn: 346706
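  A scalar illustration of the two-step idea (plain C++, not LLVM code; Rotation is assumed to be in [0,15]):

      #include <array>
      #include <cstdint>

      // PALIGNR step: rotate the 32-byte concatenation of the two inputs so
      // every referenced element lands in one 16-byte window, then a
      // single-input permute (PSHUFB) puts the bytes in their final spots.
      static std::array<uint8_t, 16>
      rotateAndPermute(const std::array<uint8_t, 16> &Lo,
                       const std::array<uint8_t, 16> &Hi, unsigned Rotation,
                       const std::array<uint8_t, 16> &Mask) {
        std::array<uint8_t, 16> Window, Result;
        for (unsigned I = 0; I != 16; ++I) {
          unsigned Src = I + Rotation; // byte index into concat(Hi:Lo)
          Window[I] = Src < 16 ? Lo[Src] : Hi[Src - 16];
        }
        for (unsigned I = 0; I != 16; ++I) // the single-input permute
          Result[I] = Window[Mask[I] & 15];
        return Result;
      }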
* [X86] In LowerMULH, use generic truncate and vector shuffle nodes instead of directly emitting PACKUS
  Craig Topper, 2018-11-12 (1 file, -13/+18)
  Truncate and shuffle lowering are already capable of matching to PACKUS using known bits analysis.
  This causes one test change, where we now prefer to extend v16i16->v16i32 and then trunc v16i32->v16i8 over extract_subvector+packus when avx512f is available but avx512bw is not.
  llvm-svn: 346697
* [CostModel][X86] Add funnel shift rotation special case costs
  Simon Pilgrim, 2018-11-12 (1 file, -1/+82)
  When the two shifted operands are the same value, the funnel shift is a bit rotation. Annoyingly, this has to be handled in a different getIntrinsicInstrCost overload than most intrinsics use, as we need to check that the operands are the same.
  llvm-svn: 346688
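  The check at the heart of the special case, sketched as a standalone helper (name assumed): a funnel shift with repeated value operands is a rotate.

      #include "llvm/IR/IntrinsicInst.h"
      using namespace llvm;

      // fshl(a, a, c) is rotate-left, fshr(a, a, c) is rotate-right, and x86
      // can do those much more cheaply than a general funnel shift expansion.
      static bool isRotate(const IntrinsicInst *II) {
        Intrinsic::ID ID = II->getIntrinsicID();
        return (ID == Intrinsic::fshl || ID == Intrinsic::fshr) &&
               II->getArgOperand(0) == II->getArgOperand(1);
      }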
* [CostModel][X86] Add SHLD/SHRD scalar funnel shift costs
  Simon Pilgrim, 2018-11-12 (1 file, -2/+11)
  The costs match the typical reg-reg cases - the RMW case can be a lot slower, but we don't model that at this level.
  llvm-svn: 346683
* [CostModel][X86] SK_ExtractSubvector is cheap if the (legal) subvector is aligned within the source vector
  Simon Pilgrim, 2018-11-12 (1 file, -5/+13)
  llvm-svn: 346664
* [X86] Use DAG.getConstant instead of getZeroVector.
  Craig Topper, 2018-11-11 (1 file, -1/+1)
  llvm-svn: 346605
* [X86] Replace calls to getOnesVector/getZeroVector with getConstant.
  Craig Topper, 2018-11-11 (1 file, -2/+2)
  getConstant will create a BUILD_VECTOR for us and use a legal type if necessary. So just create the simple node and let BUILD_VECTOR legalization do the canonicalization.
  llvm-svn: 346603
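  Sketch of the replacement pattern, with DAG, DL and VT as in the surrounding lowering code:

      SDValue Zero = DAG.getConstant(0, DL, VT);     // was getZeroVector(...)
      SDValue Ones = DAG.getAllOnesConstant(DL, VT); // was getOnesVector(...)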
* [X86] Remove unused variable
  Benjamin Kramer, 2018-11-10 (1 file, -1/+0)
  llvm-svn: 346592
* [X86] Remove apparently unneeded code from combineVSZext.
  Craig Topper, 2018-11-10 (1 file, -50/+0)
  No lit tests fail with this code removed. This is a pre-commit for D54346.
  llvm-svn: 346590
* [CostModel][X86] SK_ExtractSubvector costs must only be tested for vector types (PR39615)
  Simon Pilgrim, 2018-11-10 (1 file, -1/+1)
  llvm-svn: 346589
* [X86][BdVer2] Fix loads/stores throughput for Piledriver (PR39465)
  Roman Lebedev, 2018-11-10 (1 file, -4/+4)
  There are two AGU units, and per 1cy there can be either two loads, or a load and a store; but not two stores, or two loads and a store.
  Additionally, loads shouldn't affect the store scheduler and vice versa (but *should* affect the PdEX scheduler).
  Required rL346545.
  Fixes https://bugs.llvm.org/show_bug.cgi?id=39465
  llvm-svn: 346587
* [X86] Use a MOVSX instruction instead of a MOVZX instruction in isel for an any_extend of the remainder from an 8-bit sdivrem
  Craig Topper, 2018-11-10 (1 file, -0/+9)
  The sdivrem will emit its own MOVSX to move %ah to the low byte of a register. By using a MOVSX for an any_extend, this allows a post-isel peephole to merge them.
  llvm-svn: 346581
* [X86] In LowerHorizontalByteSum, emit vector_shuffle nodes instead of directly using X86ISD::UNPCKL/X86ISD::UNPCKH
  Craig Topper, 2018-11-10 (1 file, -5/+5)
  This gives shuffle lowering the freedom to use zero_extend_vector_inreg for the unpckl shuffle. Shuffle combining usually makes this swap later, but not when AVX512 is enabled, it seems.
  While there, also use DAG.getConstant to create a zero vector instead of using the helper that forces a specific BUILD_VECTOR. I don't think that helper is usually needed; we're basically free to create a constant build_vector anytime and it will be legalized on its own.
  llvm-svn: 346574
* [X86] Move the promotion of v16i16->v16i8 for avx512f but not avx512bw from lowering to isel. Change to use vpmovzx instead of vpmovsx.
  Craig Topper, 2018-11-09 (2 files, -8/+18)
  With avx512f but not avx512bw we need to extend to v16i32, then truncate that to v16i8. Previously we emitted both nodes during lowering, but I'm trying to switch to using target independent nodes, and with that switch the extend+truncate would change. This patch changes the implementation to what will be necessary with that patch, which helps minimize test diffs.
  llvm-svn: 346552
* [X86] Turn X86ISD::VSEXT into X86ISD::VZEXT if the upper bits aren't demanded.
  Craig Topper, 2018-11-09 (1 file, -0/+12)
  This makes X86ISD::VSEXT more similar to ISD::SIGN_EXTEND and ISD::ZERO_EXTEND. I'm hoping to replace X86ISD::VSEXT/VZEXT with target independent nodes. Making the target specific nodes similar to the target independent nodes helps minimize test diffs in that patch.
  llvm-svn: 346539
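  A sketch of the fold inside the demanded-bits handling (condition paraphrased and variable names assumed, not the literal patch):

      // If no demanded bit lies above the narrow source element width, sign
      // and zero extension agree on every demanded bit, so prefer VZEXT.
      unsigned InBits = Op.getOperand(0).getScalarValueSizeInBits();
      if (OriginalDemandedBits.getActiveBits() <= InBits)
        return TLO.CombineTo(
            Op, TLO.DAG.getNode(X86ISD::VZEXT, SDLoc(Op), Op.getValueType(),
                                Op.getOperand(0)));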
* [CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector
  Simon Pilgrim, 2018-11-09 (1 file, -181/+187)
  llvm-svn: 346538
* [x86] try to form broadcast before widening shuffle elements
  Sanjay Patel, 2018-11-09 (1 file, -0/+8)
  I noticed that we weren't generating broadcasts as much as I thought we would with D54271, and this is part of the problem.
  Widening the shuffle elements means adding bitcasts and hiding the relationship between a splatted scalar and the vector. If we can form a broadcast, do that before going through the rest of the shuffle lowering, because broadcasts should be cheap and can often be load-folded.
  Differential Revision: https://reviews.llvm.org/D54280
  llvm-svn: 346498
* [X86] Add Subtarget to more lowerVectorShuffle functions. NFCI.
  Simon Pilgrim, 2018-11-09 (1 file, -22/+26)
  This will be necessary for an update to D54267.
  llvm-svn: 346490
* [llvm-exegesis][NFC] Add a way to declare the default counter binding for unbound CPUs for a target
  Clement Courbet, 2018-11-09 (1 file, -0/+4)
  Summary: This simplifies the code and moves everything to tablegen for consistency. This also prepares the ground for adding issue counters.
  Reviewers: gchatelet, john.brawn, jsji
  Subscribers: nemanjai, mgorny, javed.absar, kbarton, tschuett, llvm-commits
  Differential Revision: https://reviews.llvm.org/D54297
  llvm-svn: 346489
* [X86] Fix VZEROUPPER scheduling info on SNB, HSW, BDW, SKL, SKX.
  Clement Courbet, 2018-11-09 (5 files, -12/+19)
  Summary: Starting from SNB, VZEROUPPER is handled by the renamer and uses no proc resources. After HSW, it also has zero latency.
  This fixes PR35606.
  To reproduce:
  Uops: llvm-exegesis -mode=uops -opcode-name=VZEROUPPER
  Latency:
  echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper' | /tmp/llvm-exegesis -mode=latency -snippets-file=-
  echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper\naddps %xmm0, %xmm1' | /tmp/llvm-exegesis -mode=latency -snippets-file=-
  Reviewers: RKSimon, craig.topper, andreadb
  Subscribers: gbedwell, llvm-commits
  Differential Revision: https://reviews.llvm.org/D54107
  llvm-svn: 346482
* [x86] use shuffles for scalar insertion into high elements of a constant vector
  Sanjay Patel, 2018-11-08 (1 file, -4/+18)
  As discussed in D54073, we have a potential regression from more aggressive vector narrowing here, so let's try to avoid that by changing build-vector lowering slightly. Insert-vector-element lowering always does this since there's no "pinsr" for ymm/zmm:
    // If the vector is wider than 128 bits, extract the 128-bit subvector, insert
    // into that, and then insert the subvector back into the result.
  ...but we can sometimes do better for insert-into-constant-vector by using shuffle lowering.
  Differential Revision: https://reviews.llvm.org/D54271
  llvm-svn: 346433
* [X86] improve split-stack machine BB placement
  Than McIntosh, 2018-11-07 (1 file, -2/+2)
  Summary: The conditional branch created to support -fsplit-stack for X86 is left unbiased/unhinted, resulting in less than ideal block placement: the __morestack call block is kept on the main hot path. Bias the branch to ensure that the stack allocation block is treated as a "cold" block during machine basic block placement.
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D54123
  llvm-svn: 346336
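  A sketch of the bias (the block names are assumptions about the split-stack prologue code):

      // Make the fall-through into the function body overwhelmingly likely
      // and the __morestack allocation block cold, so machine block placement
      // moves the allocation block off the hot path.
      checkMBB->addSuccessor(allocMBB, BranchProbability::getZero());
      checkMBB->addSuccessor(&PrologueMBB, BranchProbability::getOne());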
* fix typos aggressively; NFC
  Sanjay Patel, 2018-11-07 (1 file, -1/+1)
  llvm-svn: 346316
* [X86][FixupLEA] Avoid checking target features for every single processed instruction. NFCI
  Andrea Di Biagio, 2018-11-07 (1 file, -18/+24)
  llvm-svn: 346309
* [X86] Add custom promotion of v2i8/v2i16 fp_to_sint to avoid over promotion to v2i64 which would force scalarization
  Craig Topper, 2018-11-06 (1 file, -1/+24)
  llvm-svn: 346259
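  A sketch of the custom promotion for a v2f32 source (other source types work analogously; the surrounding variables are assumptions):

      // Convert via v4i32 (a single cvttps2dq) instead of letting type
      // legalization promote the result to v2i64, which would scalarize.
      SDValue Src = N->getOperand(0); // v2f32
      SDValue Wide = DAG.getNode(ISD::CONCAT_VECTORS, DL, MVT::v4f32, Src,
                                 DAG.getUNDEF(MVT::v2f32));
      SDValue Cvt = DAG.getNode(ISD::FP_TO_SINT, DL, MVT::v4i32, Wide);
      // The low two i32 lanes hold the results; the normal truncate lowering
      // narrows them to v2i8/v2i16 from here.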
* LivePhysRegs/IfConversion: Change some types from unsigned to MCPhysReg; NFC
  Matthias Braun, 2018-11-06 (1 file, -1/+1)
  Change the type in a couple of lists and sets that only store physical registers from unsigned to MCPhysReg. The latter is only 16 bits and saves us a bit of memory.
  llvm-svn: 346254
* [X86][NFC] Fix comment.
  Clement Courbet, 2018-11-06 (1 file, -4/+4)
  llvm-svn: 346226
* [TargetLowering] Change TargetLoweringBase::getPreferredVectorAction to take an MVT instead of an EVT. NFC
  Craig Topper, 2018-11-05 (2 files, -3/+3)
  The main caller of this already has an MVT, and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit.
  llvm-svn: 346180
* [X86] Don't turn any_extend from a mask register into a sign_extend during lowering. Add patterns to match any_extend during isel instead.
  Craig Topper, 2018-11-05 (2 files, -2/+14)
  SimplifyDemandedBits can turn a sign_extend back into an any_extend and trigger an infinite loop. So instead legalize it the same way as a sign_extend, but preserve the opcode. Then just pattern match it the same as sign_extend during isel.
  I don't have a reduced test case for such an infinite loop yet.
  llvm-svn: 346170
* [X86] Custom type legalize v2i8/v2i16/v2i32 mul to use pmuludq.
  Craig Topper, 2018-11-05 (1 file, -0/+42)
  v2i8/v2i16/v2i32 are promoted to v2i64. pmuludq takes a v2i64 input and produces a v2i64 output. Since we don't care about the upper bits of the type-legalized multiply, we can use the pmuludq to produce the multiply result for the bits we do care about.
  llvm-svn: 346115
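  Conceptually (the operands are shown already promoted to v2i64 by type legalization; PromotedLHS/PromotedRHS are assumed names):

      // PMULUDQ multiplies the low 32 bits of each i64 lane. Those low bits
      // cover every bit the original v2i8/v2i16/v2i32 multiply defines, so
      // the v2i64 result can stand in for the promoted multiply directly.
      SDValue Res = DAG.getNode(X86ISD::PMULUDQ, SDLoc(N), MVT::v2i64,
                                PromotedLHS, PromotedRHS);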
* [X86] Add vector shift by immediate to SimplifyDemandedBitsForTargetNode.
  Craig Topper, 2018-11-04 (1 file, -0/+42)
  Summary: This also enables some constant folding from KnownBits propagation. This helps some vXi64 cases in 32-bit mode where constant vectors appear as vXi32 and a bitcast. This can prevent getNode from constant folding sra/shl/srl.
  Reviewers: RKSimon, spatel
  Reviewed By: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D54069
  llvm-svn: 346102
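  A sketch of one case (VSHLI, shift-left by immediate) inside SimplifyDemandedBitsForTargetNode, paraphrased rather than quoted from the patch:

      case X86ISD::VSHLI: {
        // Bits demanded of a shift-left-by-immediate map to correspondingly
        // lower demanded bits of the shifted operand.
        unsigned ShAmt = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();
        APInt DemandedSrc = OriginalDemandedBits.lshr(ShAmt);
        if (SimplifyDemandedBits(Op.getOperand(0), DemandedSrc, Known, TLO,
                                 Depth + 1))
          return true;
        Known.Zero <<= ShAmt;
        Known.One <<= ShAmt;
        Known.Zero.setLowBits(ShAmt); // low bits of a left shift are zero
        break;
      }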
* [SelectionDAG] Remove special methods for creating *_EXTEND_VECTOR_INREG nodes. Move asserts into getNode.
  Craig Topper, 2018-11-04 (1 file, -14/+19)
  These methods were just wrappers around getNode with additional asserts (identical and repeated 3 times). But getNode already has a switch that can be used to hold these asserts, which allows them to be shared for all 3 opcodes. This also enables checking on the places that create these nodes without using the wrappers.
  The rest of the patch is just changing all callers to use getNode directly.
  llvm-svn: 346087