path: root/llvm/lib/Target/X86
...
* Reduce variable scope to just the if() block it's actually used in. NFCI. (Simon Pilgrim, 2019-05-03; 1 file, -2/+1)
  llvm-svn: 359869
* [X86] Add more one use checks to masked compare patterns that were missed in r358358 (Craig Topper, 2019-05-03; 1 file, -46/+48)
  This covers the patterns we use for widening 128/256 comparisons to 512-bit when AVX512VL isn't supported.
  llvm-svn: 359863
* [X86] Remove LEA16r references from X86FixupLEAs. NFCI (Craig Topper, 2019-05-02; 1 file, -9/+2)
  As far as I know, we never emit LEA16r.
  llvm-svn: 359840
* [X86] Correct the register class for specific mask register constraints in getRegForInlineAsmConstraint when the VT is a scalar type (Craig Topper, 2019-05-02; 1 file, -0/+28)
  The default implementation of TargetLowering::getRegForInlineAsmConstraint in the base class doesn't work for mask registers when the VT is a scalar integer type, since the only legal mask types are vXi1. So we end up with whatever register class tablegen happens to output first for the register. Currently this appears to be VK1, but it's really dependent on the order tablegen outputs the register classes.
  Some code in the caller then looks up the type for this register class, finds v1i1, and generates a copyfromreg from the physical k-register with the v1i1 type. It then generates an any_extend from v1i1 to the scalar VT, which isn't legal. This bad any_extend sticks around until isel, where it selects a MOVZX32rr8 with a v1i1 input or maybe an i8 input. Eventually we pick up a copy from VK1 to GR8 in MachineIR, which isn't supported. This leads to a failure in physical register copying.
  This patch uses the scalar type to find a VK class of the right size. In the attached test case this will be VK16. This causes a bitcast from vk16 to i16 to be generated instead of an any_extend, which is properly iseled to a VK16-to-GR32 copy and a GR32->GR16 extract_subreg.
  Fixes PR41678
  Differential Revision: https://reviews.llvm.org/D61453
  llvm-svn: 359837
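  A minimal reproducer in the spirit of this fix (not from the commit; the function name is hypothetical, and it assumes compiling for an AVX512 target, where Clang's "k" inline-asm constraint selects a mask register):

  ```cpp
  // Hypothetical reproducer: a scalar i16 tied to AVX512 mask registers.
  // Before this fix, the constraint resolved to VK1 (type v1i1) and the
  // widening any_extend to i16 could not be selected.
  unsigned short mask_not(unsigned short x) {
    unsigned short r;
    asm("knotw %1, %0" : "=k"(r) : "k"(x)); // "k": any AVX512 mask register
    return r;
  }
  ```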
* [X86] Remove string literal from an if. NFC (Craig Topper, 2019-05-02; 1 file, -2/+1)
  This if used to be an assert that got refactored into an if, but left the string literal behind.
  Fixes PR41718
  llvm-svn: 359833
* [X86][SSE] lowerAddSubToHorizontalOp - enable ymm extraction+fold (Simon Pilgrim, 2019-05-02; 1 file, -6/+5)
  Limiting scalar hadd/hsub generation to the lowest xmm looks to be unnecessary - we will be extracting one upper xmm whichever way we go, and we can remove a shuffle by using the hop, which is in line with what shouldUseHorizontalOp expects to happen anyway.
  Testing on btver2 (the main target for fast-hops) shows this is beneficial even for float ops where we have a 'shuffle' to extract the float result: https://godbolt.org/z/0R-U-K
  Differential Revision: https://reviews.llvm.org/D61426
  llvm-svn: 359786
* [X86][SSE] Move shouldUseHorizontalOp inside isHorizontalBinOp. NFCI. (Simon Pilgrim, 2019-05-02; 1 file, -13/+15)
  Matches what we do for lowerAddSubToHorizontalOp and will make it easier to peek through subvectors to help fix PR39921.
  llvm-svn: 359782
* [X86] Remove the redundant suffix in vfpclassp[d,s]'s broadcasting variant (Craig Topper, 2019-05-02; 1 file, -9/+9)
  The broadcasting variant of the vfpclassp[d,s] instructions shouldn't use the q/l suffix, so remove it from the template.
  Patch by Pengfei Wang
  Differential Revision: https://reviews.llvm.org/D61295
  llvm-svn: 359753
* [X86][SSE] Fold scalar horizontal add/sub for non-0/1 element extractions (Simon Pilgrim, 2019-05-01; 1 file, -6/+11)
  We already perform horizontal add/sub if we extract from elements 0 and 1; this patch extends it to non-0/1 element extraction indices (as long as they are from the lowest 128-bit vector).
  Differential Revision: https://reviews.llvm.org/D61263
  llvm-svn: 359707
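  As a rough illustration of the new fold (my own sketch, not from the commit; the lane choices are arbitrary):

  ```cpp
  #include <immintrin.h>

  // Extracting lanes 2 and 3 and adding them can now also lower to a
  // single HADDPS, with the sum read from the hop's upper output lane.
  float sum_lanes_2_3(__m128 v) {
    float e2 = _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 2)));
    float e3 = _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 3)));
    return e2 + e3;
  }
  ```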
* Fix 80 column violation. NFCI. (Simon Pilgrim, 2019-05-01; 1 file, -5/+6)
  llvm-svn: 359694
* [X86][SSE] Add demanded elts support for X86ISD::PMULDQ/PMULUDQ (Simon Pilgrim, 2019-05-01; 1 file, -3/+24)
  Add to SimplifyDemandedVectorEltsForTargetNode and SimplifyDemandedBitsForTargetNode.
  llvm-svn: 359686
* [X86][SSE] Add SSE vector shift support to SimplifyDemandedVectorEltsForTargetNode vector splitting (Simon Pilgrim, 2019-05-01; 1 file, -0/+21)
  llvm-svn: 359680
* [X86][SSE] Split 512-bit -> 128-bit vector directly in SimplifyDemandedVectorEltsForTargetNode (Simon Pilgrim, 2019-05-01; 1 file, -1/+4)
  llvm-svn: 359678
* [X86][SSE] Add 512-bit vector support to SimplifyDemandedVectorEltsForTargetNode vector splitting (Simon Pilgrim, 2019-05-01; 1 file, -8/+15)
  llvm-svn: 359677
* [X86][SSE] Add X86ISD::PACKSS/PACKUS to SimplifyDemandedVectorEltsForTargetNode vector splitting (Simon Pilgrim, 2019-05-01; 1 file, -1/+7)
  llvm-svn: 359673
* [X86][SSE] Add X86ISD::UNPCKL/UNPCK to SimplifyDemandedVectorEltsForTargetNode vector splitting (Simon Pilgrim, 2019-05-01; 1 file, -2/+4)
  llvm-svn: 359670
* [X86][SSE] Move extract_subvector(pshufb) fold to SimplifyDemandedVectorEltsForTargetNode (Simon Pilgrim, 2019-05-01; 1 file, -12/+3)
  This lets us hit more cases than combineExtractSubvector and allows us to reuse more code.
  llvm-svn: 359669
* [X86] SimplifyDemandedVectorEltsForTargetNode - pull out vector halving code. NFCI. (Simon Pilgrim, 2019-05-01; 1 file, -10/+13)
  Pull out the HADD/HSUB code to halve vector widths if the upper half isn't used - prep work for adding support for other opcodes.
  llvm-svn: 359667
* [X86][SSE] Extract i1 elements from vXi1 bool vectors (Simon Pilgrim, 2019-05-01; 1 file, -0/+33)
  This is an alternative to D59669 which more aggressively extracts i1 elements from vXi1 bool vectors using a MOVMSK.
  Differential Revision: https://reviews.llvm.org/D61189
  llvm-svn: 359666
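  Conceptually, the lowering does something like this hand-written equivalent (a sketch under my own naming, not the commit's code):

  ```cpp
  #include <immintrin.h>

  // A v16i1 compare result is materialized with MOVMSK, and the requested
  // i1 element becomes a bit test on the scalar mask.
  bool lane_equal(__m128i a, __m128i b, int lane) {
    int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)); // v16i1 -> 16 bits
    return (mask >> lane) & 1;                          // extract one i1
  }
  ```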
* [X86FixupLEAs] Hoist the calls to isLEA out of the 3 separate functions and put it in the basic block instruction loop. NFC (Craig Topper, 2019-05-01; 1 file, -14/+9)
  No need to check it 3 different times; just do it once at the top of the loop.
  llvm-svn: 359658
* [X86][SSE] Fold extract_subvector(extend(x)) -> extend_vector_inreg(x) (Simon Pilgrim, 2019-04-30; 1 file, -5/+7)
  This adds any_extend support - folding to zero_extend_vector_inreg (PMOVZX) for legality.
  Minor improvement for PR39709
  llvm-svn: 359608
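  A shape that should now fold, sketched with intrinsics (assuming AVX2 for the 256-bit extend; the names are mine):

  ```cpp
  #include <immintrin.h>

  // extract_subvector(zero_extend(x), 0) becomes a 128-bit PMOVZX of x
  // directly, instead of a 256-bit extend followed by an extract.
  __m128i low_lanes_extended(__m128i x) {
    __m256i wide = _mm256_cvtepu8_epi16(x); // zero_extend v16i8 -> v16i16
    return _mm256_castsi256_si128(wide);    // keep only the low 8 lanes
  }
  ```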
* [X86] Remove if that's always true (Craig Topper, 2019-04-30; 1 file, -2/+1)
  It's been like this since it was added in a refactor of this code.
  Fixes PR41659
  llvm-svn: 359597
* [X86] If PreprocessISelDAG reorders a load before a call, make sure we remove dead nodes from the graph (Craig Topper, 2019-04-30; 1 file, -0/+5)
  The reordering can leave at least a dead TokenFactor in the graph. This causes the linearize scheduler to fail with something like the assert seen in PR22614. This is only one of many ways we can break the linearize scheduler today, so I can't say for sure that any of the other failures in that bug were caused by this issue.
  This takes the heavy hammer approach of just running RemoveDeadNodes unconditionally at the end of PreprocessISelDAG. If this turns out to be a compile time hit, we can try to refine it.
  Differential Revision: https://reviews.llvm.org/D61164
  llvm-svn: 359582
* [X86] Initial cleanups on the FixupLEAs pass. Separate Atom LEA creation from other LEA optimizations. (Craig Topper, 2019-04-30; 1 file, -91/+75)
  This removes some of the class variables. Merge basic block processing into runOnMachineFunction to keep the flags local. Pass MachineBasicBlock around instead of an iterator; we can get the iterator in the few places that need it. This allows a range-based outer for loop.
  Separate the Atom optimization from the rest of the optimizations. This allows fixupIncDec to create INC/DEC and still allow Atom to turn it back into LEA when profitable by its heuristics.
  I'd like to improve fixupIncDec to turn LEAs into ADD any time the base or index register is equal to the destination register. This is profitable regardless of the various slow flags, but again we would want Atom to be able to undo that.
  Differential Revision: https://reviews.llvm.org/D60993
  llvm-svn: 359581
* Fix for bug 41512: lower INSERT_VECTOR_ELT(ZeroVec, 0, Elt) to SCALAR_TO_VECTOR(Elt) for all SSE flavors (Simon Pilgrim, 2019-04-30; 1 file, -0/+8)
  Current LLVM uses pxor+pinsrb on SSE4+ for INSERT_VECTOR_ELT(ZeroVec, 0, Elt) instead of the much simpler movd. INSERT_VECTOR_ELT(ZeroVec, 0, Elt) is an idiomatic construct which is used e.g. for _mm_cvtsi32_si128(Elt) and for lowest element initialization in _mm_set_epi32. Such inefficient lowering leads to significant performance degradations in certain cases when switching from SSSE3 to SSE4. https://bugs.llvm.org/show_bug.cgi?id=41512
  Here INSERT_VECTOR_ELT(ZeroVec, 0, Elt) is simply converted to SCALAR_TO_VECTOR(Elt) when applicable, since the latter is a closer match to the desired behavior and is always efficiently lowered to movd and the like.
  Committed on behalf of @Serge_Preis (Serge Preis)
  Differential Revision: https://reviews.llvm.org/D60852
  llvm-svn: 359545
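  The idiom in question, as a minimal example (the function name is mine):

  ```cpp
  #include <immintrin.h>

  // _mm_cvtsi32_si128 builds a zero vector with the scalar in lane 0.
  // With this fix it lowers to a single MOVD on SSSE3 and SSE4 alike,
  // rather than a pxor+pinsr sequence on SSE4 targets.
  __m128i widen_scalar(int x) {
    return _mm_cvtsi32_si128(x);
  }
  ```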
* [TargetLowering] Change getOptimalMemOpType to take a function attribute list (Sjoerd Meijer, 2019-04-30; 2 files, -8/+6)
  The MachineFunction wasn't used in getOptimalMemOpType, but more importantly, this allows reuse of findOptimalMemOpLowering, which calls getOptimalMemOpType.
  This is the groundwork for the changes in D59766 and D59787, which allow implementation of TTI::getMemcpyCost.
  Differential Revision: https://reviews.llvm.org/D59785
  llvm-svn: 359537
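  For reference, the revised hook has roughly this shape (paraphrased from the description above; the exact upstream parameter list may differ):

  ```cpp
  // The MachineFunction parameter is gone; the caller's attribute list is
  // passed instead, so the shared findOptimalMemOpLowering can supply it.
  EVT getOptimalMemOpType(uint64_t Size, unsigned DstAlign, unsigned SrcAlign,
                          bool IsMemset, bool ZeroMemset, bool MemcpyStrSrc,
                          const AttributeList &FuncAttributes) const override;
  ```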
* [X86] Run CFIInstrInserter on Windows if Dwarf is used (Martin Storsjo, 2019-04-29; 1 file, -1/+5)
  This is necessary since SVN r330706, as tail merging can include CFI instructions since then.
  This fixes PR40322 and PR40012.
  Differential Revision: https://reviews.llvm.org/D61252
  llvm-svn: 359496
* [X86][SSE] isHorizontalBinOp - add support for target shuffles (Simon Pilgrim, 2019-04-29; 1 file, -30/+49)
  Add target shuffle decoding to isHorizontalBinOp as well as ISD::VECTOR_SHUFFLE support. This does mean we can go through bitcasts, so we need to bitcast the extracted args to ensure they are the correct type.
  Fixes PR39936 and should help with PR39920/PR39921.
  Differential Revision: https://reviews.llvm.org/D61245
  llvm-svn: 359491
* [X86] scaleShuffleMask - avoid potential signed overflow warning. (Simon Pilgrim, 2019-04-29; 1 file, -3/+3)
  Use a size_t assignment to prevent a bad explicit type conversion warning. Given the typical size of shuffle masks this was never going to happen, but this at least stops the warning.
  Reported in https://www.viva64.com/en/b/0629/
  llvm-svn: 359479
* [X86] Remove some intel syntax aliases on (v)cvtpd2(u)dq, (v)cvtpd2ps, (v)cvt(u)qq2ps. Add 'x' and 'y' suffix aliases to masked versions of the same in att syntax. (Craig Topper, 2019-04-29; 2 files, -42/+160)
  The 128/256-bit versions of these instructions require an 'x' or 'y' suffix to disambiguate the memory form in att syntax. We were allowing the same suffix in intel syntax, but it appears gas does not do that.
  gas does allow the 'x' and 'y' suffix on register and broadcast forms even though it's not needed. We were allowing it on the unmasked register form, but not on masked versions or on the masked or unmasked broadcast form.
  While there, fix some test coverage holes so they can be extended with the 'x' and 'y' suffix tests.
  llvm-svn: 359418
* [X86][SSE] combineExtractVectorElt - add early-out to return zero/undef for out-of-range extraction indices. (Simon Pilgrim, 2019-04-28; 1 file, -2/+4)
  llvm-svn: 359406
* [X86][AVX] Combine non-lane crossing binary shuffles using X86ISD::VPERMV3 (Simon Pilgrim, 2019-04-28; 1 file, -0/+22)
  Some of the combines might be further improved if we lower more shuffles with X86ISD::VPERMV3 directly, instead of waiting to combine the results.
  llvm-svn: 359400
* [X86][SSE] Optimize llvm.experimental.vector.reduce.xor.vXi1 parity reduction (PR38840) (Simon Pilgrim, 2019-04-28; 1 file, -2/+12)
  An xor reduction of a bool vector can be optimized to a parity check of the MOVMSK/BITCAST'd integer - if the population count is odd return 1, else return 0.
  Differential Revision: https://reviews.llvm.org/D61230
  llvm-svn: 359396
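  A scalar equivalent of the new lowering (my sketch; __builtin_parity is the GCC/Clang builtin):

  ```cpp
  #include <immintrin.h>

  // MOVMSK the bool vector, then test the parity of the resulting bits.
  bool xor_reduce_v16i1(__m128i a, __m128i b) {
    int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(a, b));
    return __builtin_parity(mask); // 1 iff an odd number of lanes matched
  }
  ```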
* [X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead (Craig Topper, 2019-04-28; 3 files, -70/+22)
  Summary:
  The register forms of these instructions are CodeGenOnly instructions that cover GR32->FR32 and GR64->FR64 bitcasts. There is a similar set of instructions for the opposite bitcast. Because the patterns use bitcasts, these instructions get marked as "bitcast" machine instructions as well. The peephole pass is able to look through these, as well as other copies, to try to avoid register bank copies.
  Because FR32/FR64/VR128 are all coalescable to each other, we can end up in a situation where a GR32->FR32->VR128->FR64->GR64 sequence can be reduced to GR32->GR64, which the copyPhysReg code can't handle.
  To prevent this, this patch removes one set of the 'bitcast' instructions. So now we can only go GR32->VR128->FR32 or GR64->VR128->FR64. The instruction that converts from GR32/GR64->VR128 has no special significance to the peephole pass and won't be looked through.
  I guess the other option would be to add support to copyPhysReg to just promote the GR32->GR64 to a GR64->GR64 copy. The upper bits were basically undefined anyway. But removing the CodeGenOnly instruction in favor of one that won't be optimized seemed safer.
  I deleted the peephole test because it couldn't be made to work with the bitcast instructions removed. The load versions of the instructions were unnecessary, as the pattern that selects them contains a bitcasted load which should never happen.
  Fixes PR41619.
  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61223
  llvm-svn: 359392
* Revert rL359389: [X86][SSE] Add support for <64 x i1> bool reduction (Simon Pilgrim, 2019-04-27; 1 file, -12/+10)
  Minor generalization of the existing <32 x i1> pre-AVX2 split code.
  ........
  Causing irregular buildbot failures.
  llvm-svn: 359391
* [X86][SSE] Add support for <64 x i1> bool reduction (Simon Pilgrim, 2019-04-27; 1 file, -10/+12)
  Minor generalization of the existing <32 x i1> pre-AVX2 split code.
  llvm-svn: 359389
* [X86][AVX512] Improve vector bool reductions (Simon Pilgrim, 2019-04-27; 1 file, -11/+22)
  As predicate masks are legal on AVX512 targets, we avoid MOVMSK in these cases, but we can just bitcast the bool vector to the integer equivalent directly - avoiding expansion of the reduction to a shuffle pattern.
  llvm-svn: 359386
* [X86][AVX] Merge mask select with shuffles across extract_subvector (PR40332) (Simon Pilgrim, 2019-04-27; 1 file, -4/+44)
  Fixes PR40332 in the limited case where we're selecting between a target shuffle and a zero vector. We can extend this in the future to handle more opcodes and non-zero selections.
  llvm-svn: 359378
* [X86] Use MOVQ for i64 atomic_stores when SSE2 is enabled (Craig Topper, 2019-04-27; 5 files, -19/+72)
  Summary:
  If we have SSE2 we can use a MOVQ to store 64 bits and avoid falling back to a cmpxchg8b loop. If it's a seq_cst store we need to insert an mfence after the store.
  Reviewers: spatel, RKSimon, reames, jfb, efriedma
  Reviewed By: RKSimon
  Subscribers: hiraditya, dexonsmith, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60546
  llvm-svn: 359368
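  What this changes for a 32-bit x86 target with SSE2, sketched in C++ (the names are mine):

  ```cpp
  #include <atomic>

  // This store can now be a single MOVQ (plus an MFENCE for the seq_cst
  // ordering) instead of a cmpxchg8b loop.
  void store_counter(std::atomic<long long> &counter, long long v) {
    counter.store(v, std::memory_order_seq_cst);
  }
  ```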
* [AsmPrinter] refactor to support %c w/ GlobalAddress' (Nick Desaulniers, 2019-04-26; 2 files, -1/+2)
  Summary:
  Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when printing the address of a MachineOperand::MO_GlobalAddress. Move that handling into a new overridden method in each base class. A virtual method was added to the base class for handling the generic case.
  Refactors a few subclasses to support the target independent %a, %c, and %n.
  The patch also contains small cleanups for AVRAsmPrinter and SystemZAsmPrinter.
  It seems that NVPTXTargetLowering is possibly missing some logic to transform GlobalAddressSDNodes for TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended inline assembly asm constraints.
  Fixes:
  - https://bugs.llvm.org/show_bug.cgi?id=41402
  - https://github.com/ClangBuiltLinux/linux/issues/449
  Reviewers: echristo, void
  Reviewed By: void
  Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60887
  llvm-svn: 359337
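  A small example of the construct this enables (illustrative only; the global and the asm body are mine):

  ```cpp
  // %c prints the operand as a bare constant/address, without the usual
  // immediate-prefix syntax; this now also works for global addresses.
  static int flag;
  void record() {
    asm volatile("# global at %c0" ::"i"(&flag));
  }
  ```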
* [X86][AVX] Fold extract_subvector(broadcast(x)) -> broadcast(x) iff x has one use (Simon Pilgrim, 2019-04-26; 1 file, -0/+7)
  llvm-svn: 359332
* [X86] Sink NoRegister creation for unused Base/Index registers into getAddressOperands. NFCI (Craig Topper, 2019-04-26; 1 file, -36/+26)
  llvm-svn: 359318
* [X86] Segment registers should have i16 type not i32. (Craig Topper, 2019-04-26; 1 file, -1/+1)
  Probably doesn't really matter, but was inconsistent with the rest of the code.
  llvm-svn: 359317
* Fix Wparentheses warning. NFCI. (Simon Pilgrim, 2019-04-26; 1 file, -5/+5)
  llvm-svn: 359299
* [X86][SSE] Pull out OR(EXTRACTELT(X,0),OR(EXTRACTELT(X,1),...)) matching code from LowerVectorAllZeroTest (Simon Pilgrim, 2019-04-26; 1 file, -50/+59)
  Create a matchBitOpReduction helper that checks for the pattern with any opcode.
  First step towards reusing this code to recognize other scalar reduction patterns.
  llvm-svn: 359296
* [X86][SSE] Disable shouldFoldConstantShiftPairToMask for btver1/btver2 targets (PR40758) (Simon Pilgrim, 2019-04-26; 3 files, -2/+23)
  As detailed in PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr with a shift+and pair, as that requires an additional mask (with the extra constant pool, loading and register pressure costs).
  Differential Revision: https://reviews.llvm.org/D61068
  llvm-svn: 359293
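  The tradeoff, sketched with intrinsics (my example; on btver1/btver2 the two shifts now stay as shifts):

  ```cpp
  #include <immintrin.h>

  // Clearing the top 4 bits of each 32-bit lane: (x << 4) >> 4 can either
  // stay as two shifts or fold to one AND with a constant-pool mask; on
  // Bobcat/Jaguar the shifts are as fast and avoid the mask load.
  __m128i clear_top4(__m128i x) {
    return _mm_srli_epi32(_mm_slli_epi32(x, 4), 4);
  }
  ```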
* [X86][AVX] Combine shuffles extracted from a common vector (Simon Pilgrim, 2019-04-26; 1 file, -0/+45)
  A small step towards combining shuffles across vector sizes - this recognizes when a shuffle's operands are all extracted from the same larger source and tries to combine to an unary shuffle of that source instead.
  Fixes one of the test cases from PR34380.
  Differential Revision: https://reviews.llvm.org/D60512
  llvm-svn: 359292
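  A sketch of the recognized shape (assuming AVX2; the particular shuffle is illustrative):

  ```cpp
  #include <immintrin.h>

  // Both shuffle operands are extracted from the same 256-bit source, so
  // the whole sequence can combine into one unary shuffle of that source.
  __m128i even_qwords(__m256i src) {
    __m128i lo = _mm256_castsi256_si128(src);      // elements 0,1
    __m128i hi = _mm256_extracti128_si256(src, 1); // elements 2,3
    return _mm_unpacklo_epi64(lo, hi);             // {src[0], src[2]}
  }
  ```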
* Assigning to a local object in a return statement prevents copy elision. NFC. (David Blaikie, 2019-04-25; 1 file, -1/+2)
  I added a diagnostic along the lines of `-Wpessimizing-move` to detect `return x = y` suppressing copy elision, but I don't know if the diagnostic is really worth it. Anyway, here are the places where my diagnostic reported that copy elision would have been possible if not for the assignment.
  P1155R1 in the post-San-Diego WG21 (C++ committee) mailing discusses whether WG21 should fix this pitfall by just changing the core language to permit copy elision in cases like these. (Kona update: The bulk of P1155 is proceeding to CWG review, but specifically *not* the parts that explored the notion of permitting copy-elision in these specific cases.)
  Reviewed By: dblaikie
  Author: Arthur O'Dwyer
  Differential Revision: https://reviews.llvm.org/D54885
  llvm-svn: 359236
* [X86][SSE] combineBitcastvxi1 - add support for bitcasting to non-scalar integers (Simon Pilgrim, 2019-04-25; 1 file, -3/+6)
  Truncate the movmsk scalar integer result to the equivalent scalar integer width as before, but then bitcast to the requested type.
  We still have the issue identified in PR41594, but D61114 should handle this.
  llvm-svn: 359176
* [X86] Remove part of an if condition that should always be true. (Craig Topper, 2019-04-25; 1 file, -1/+1)
  The IndexReg will always be non-null at this point. Earlier in the function, if IndexReg was null, we set it to CurDAG->getRegister(0, VT), which made it non-null.
  llvm-svn: 359170