path: root/llvm/lib/Target/X86
Commit log, newest first. Each entry shows the commit title, author, date, and diffstat (files changed, -removed/+added lines).

...
* [DebugInfo] Stop describing imms in TargetInstrInfo's describeLoadedValue() impl
  David Stenberg | 2019-10-23 | 1 file, -0/+5

  Summary: The default implementation of the describeLoadedValue() hook uses
  the MoveImm property to determine if an instruction moves an immediate. If
  an instruction has that property, the function returns the second operand,
  assuming that that is the immediate value the instruction moves. As far as I
  can tell, the MoveImm property does not imply that the second operand is the
  immediate value, nor that any other operand necessarily holds the immediate
  value; it just means that the instruction moves some immediate value.

  One example where the second operand is not the immediate is SystemZ's LZER
  instruction, which moves a zero immediate implicitly: $f0S = LZER. That case
  triggered an out-of-bounds assertion when getting the operand. I have added
  a test case for that instruction.

  Another example is ARM's MVN instruction, which holds the logical bitwise
  NOT'd value of the immediate that is moved. For the following reproducer:

    extern void foo(int);
    int main() { foo(-11); }

  an incorrect call site value would be emitted:

    $ clang --target=arm foo.c -O1 -g -Xclang -femit-debug-entry-values \
        -c -o - | ./build/bin/llvm-dwarfdump - | \
        grep -A2 call_site_parameter

    0x00000058: DW_TAG_GNU_call_site_parameter
                  DW_AT_location (DW_OP_reg0 R0)
                  DW_AT_GNU_call_site_value (DW_OP_lit10)

  Another example is the A2_combineii instruction on Hexagon, which moves two
  immediates to a super-register: $d0 = A2_combineii 20, 10.

  Perhaps these are rare exceptions, and most MoveImm instructions hold the
  immediate in the second operand, but in my opinion the default
  implementation of the hook should only describe values that it can, by some
  contract, guarantee are safe to describe, rather than leaving it up to the
  targets to override the exceptions, as that can silently result in incorrect
  call site values.

  This patch adds X86's relevant move immediate instructions to the target's
  hook implementation, so this commit should be an NFC for that target. We
  need to do the same for ARM and AArch64.

  Reviewers: djtodoro, NikolaPrica, aprantl, vsk
  Reviewed By: vsk
  Subscribers: kristof.beyls, hiraditya, llvm-commits
  Tags: #debug-info, #llvm
  Differential Revision: https://reviews.llvm.org/D69109

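  A note on the MVN case above: the instruction's immediate operand is the
  bitwise NOT of the value actually moved, which is exactly why naively
  describing operand two produces DW_OP_lit10 for a call passing -11. A
  minimal C++ check of the identity involved (added here for illustration,
  not part of the commit):

    // In two's complement, ~10 == -11, so clang materializes foo(-11) on
    // ARM as MVN with a literal operand of 10; reading that operand as the
    // moved value yields the wrong call-site value DW_OP_lit10 shown above.
    #include <cassert>

    int main() {
      assert(~10 == -11); // the NOT'd immediate vs. the value actually moved
      return 0;
    }
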
* [X86][BMI] Pull out schedule classes from bmi_andn<> and bmi_bls<>
  Simon Pilgrim | 2019-10-21 | 2 files, -14/+15

  Stop hardwiring classes.

  llvm-svn: 375470

* [X86][SSE] Add OR(EXTRACTELT(X,0),OR(EXTRACTELT(X,1))) -> MOVMSK+CMP reduction combine
  Simon Pilgrim | 2019-10-21 | 1 file, -0/+18

  llvm-svn: 375463

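  For context, the scalar shape this combine recognizes is an OR reduction of
  extracted lanes feeding a comparison; the sketch below gives reference
  semantics only (the point of the combine is to answer this with MOVMSK plus
  a single CMP instead of per-lane extracts):

    // OR together vector lanes, then test the accumulated value: the
    // extract+OR chain this combine rewrites into MOVMSK + CMP.
    #include <cstdint>

    bool anyLaneNonZero(const int32_t Lanes[4]) {
      int32_t Acc = Lanes[0] | Lanes[1] | Lanes[2] | Lanes[3];
      return Acc != 0; // one scalar compare after the reduction
    }
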
* [X86] Rename matchBitOpReduction to matchScalarReduction. NFCI.
  Simon Pilgrim | 2019-10-21 | 1 file, -4/+4

  This doesn't need to be just for bitops, but the ops do need to be fully
  associative.

  llvm-svn: 375445

* [X86] Check Subtarget.hasSSE3() before calling shouldUseHorizontalOp and emitting X86ISD::FHADD in LowerUINT_TO_FP_i64.
  Craig Topper | 2019-10-20 | 1 file, -1/+1

  This was a regression from r375341.

  Fixes PR43729.

  llvm-svn: 375381

* [X86] Pulled out helper to decode target shuffle element sentinel values to 'Zeroable' known undef/zero bits. NFCI.
  Simon Pilgrim | 2019-10-19 | 1 file, -13/+22

  Renamed 'resolveTargetShuffleAndZeroables' to
  'resolveTargetShuffleFromZeroables' to match.

  llvm-svn: 375348

* [X86][SSE] lowerV16I8Shuffle - tryToWidenViaDuplication - undef unpack args
  Simon Pilgrim | 2019-10-19 | 1 file, -1/+9

  tryToWidenViaDuplication lowers using the
  shuffle_v8i16(unpack_v16i8(shuffle_v8i16(x),shuffle_v8i16(x))) pattern, but
  the unpack only needs the even/odd 16i8 args if the original v16i8 shuffle
  mask references the even/odd elements - which isn't true for many
  extension-style shuffles.

  llvm-svn: 375342

* [X86][SSE] LowerUINT_TO_FP_i64 - only use HADDPD for size/fast-hops
  Simon Pilgrim | 2019-10-19 | 1 file, -12/+11

  We were always generating a single-source HADDPD, but really we should only
  do this if shouldUseHorizontalOp says it's a good idea.

  Differential Revision: https://reviews.llvm.org/D69175

  llvm-svn: 375341

* Prune a LegacyDivergenceAnalysis and MachineLoopInfo include each
  Reid Kleckner | 2019-10-19 | 1 file, -1/+1

  Now X86ISelLowering doesn't depend on many IR analyses.

  llvm-svn: 375320

* Prune Analysis includes from SelectionDAG.h
  Reid Kleckner | 2019-10-19 | 2 files, -1/+2

  Only forward declarations are needed here. Follow-on to r375311.

  llvm-svn: 375319

* Move endian constant from Host.h to SwapByteOrder.h, prune include
  Reid Kleckner | 2019-10-19 | 1 file, -2/+3

  Works on this dependency chain:
    ArrayRef.h -> Hashing.h -> --CUT-- Host.h -> StringMap.h / StringRef.h

  ArrayRef is very popular, but Host.h is rarely needed. Move the
  IsBigEndianHost constant to SwapByteOrder.h. Clients of that header are
  more likely to need it.

  llvm-svn: 375316

* Prune two MachineInstr.h includes, fix up deps
  Reid Kleckner | 2019-10-19 | 1 file, -1/+1

  MachineInstr.h included AliasAnalysis.h, which includes a world of IR
  constructs mostly unneeded in CodeGen. Prune it. Same for
  DebugInfoMetadata.h.

  Noticed with -ftime-trace.

  llvm-svn: 375311

* [X86] Fix register parsing in .seh_* in Intel syntax
  Reid Kleckner | 2019-10-18 | 1 file, -4/+3

  Previously, the parser checked for a '%' prefix to indicate a register. In
  Intel syntax mode, LLVM does not print a '%' prefix on registers, so LLVM
  could not parse its own assembly output. Instead, require that register
  numbers be integer literals, or at least start with an integer literal,
  which is consistent with .cfi_* directive register parsing.

  llvm-svn: 375287

* [GISel][CallLowering] Make isIncomingArgumentHandler a pure virtual method
  Quentin Colombet | 2019-10-18 | 1 file, -0/+2

  The default implementation of isIncomingArgumentHandler could lead to
  generating incorrect code. Make it a pure virtual method, so that targets
  know they have to override it to produce correct code.

  NFC

  Differential Revision: https://reviews.llvm.org/D69187

  llvm-svn: 375277

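  The general C++ pattern at work, sketched with simplified stand-ins for
  CallLowering's handler hierarchy (only the method name is taken from the
  commit; the class shape is illustrative):

    // Turning a defaulted virtual into a pure virtual removes the silent,
    // possibly wrong default and forces every handler to state its role.
    struct ValueHandler {
      // Before: virtual bool isIncomingArgumentHandler() const { return false; }
      virtual bool isIncomingArgumentHandler() const = 0; // after: must override
      virtual ~ValueHandler() = default;
    };

    struct IncomingArgHandler : ValueHandler {
      bool isIncomingArgumentHandler() const override { return true; }
    };
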
* [X86] combineX86ShufflesRecursively - pull out isTargetShuffleVariableMask. NFCI.
  Simon Pilgrim | 2019-10-18 | 1 file, -1/+2

  llvm-svn: 375253

* [X86] Emit KTEST when possible
  David Zarzycki | 2019-10-18 | 1 file, -8/+23

  https://reviews.llvm.org/D69111

  llvm-svn: 375197

* [DAGCombine][ARM] Enable extending masked loads
  Sam Parker | 2019-10-17 | 1 file, -0/+3

  Add generic DAG combine for extending masked loads. Allow us to generate
  sext/zext masked loads which can access v4i8, v8i8 and v4i16 memory to
  produce v4i32, v8i16 and v4i32 respectively.

  Differential Revision: https://reviews.llvm.org/D68337

  llvm-svn: 375085

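  As a reference for what "extending masked load" means here, a scalar C++
  model of the v4i8-to-v4i32 zero-extending case (semantics sketch only,
  with the pass-through value assumed to be zero):

    // Masked lanes load a byte and zero-extend it to 32 bits; unmasked
    // lanes take the pass-through value. The combine lets one masked-load
    // node perform the load and the extension together.
    #include <cstdint>

    void zextMaskedLoadV4i8(const uint8_t *Ptr, const bool Mask[4],
                            uint32_t Out[4]) {
      for (int I = 0; I < 4; ++I)
        Out[I] = Mask[I] ? uint32_t(Ptr[I]) : 0; // zero pass-through assumed
    }
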
* [Alignment][NFC] Use Align for TargetFrameLowering/Subtarget
  Guillaume Chatelet | 2019-10-17 | 5 files, -17/+16

  Summary: This patch is part of a series to introduce an Alignment type.
  See this thread for context:
  http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type:
  https://reviews.llvm.org/D64790

  Reviewers: courbet
  Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis,
  nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin,
  kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal,
  niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01,
  MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary,
  s.egerton, pzheng, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D68993

  llvm-svn: 375084

* [X86] combineX86ShufflesRecursively - split the getTargetShuffleInputs call from the resolveTargetShuffleAndZeroables call.
  Simon Pilgrim | 2019-10-15 | 1 file, -3/+9

  Exposes an issue in getFauxShuffleMask where the OR(SHUFFLE,SHUFFLE) decode
  should always resolve zero/undef elements.

  Part of the fix for PR43024, where ideally we shouldn't call
  resolveTargetShuffleAndZeroables for Depth == 0.

  llvm-svn: 374928

* [X86] Make memcmp() use PTEST if possible and also enable AVX1
  David Zarzycki | 2019-10-15 | 1 file, -17/+46

  llvm-svn: 374922

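  The case this targets is memcmp used purely as an equality test, which the
  expansion can lower to wide vector loads checked with PTEST rather than a
  libc call; a minimal C++ shape of such a call site (illustrative, assuming
  a compile-time-constant length, not taken from the commit's tests):

    // Equality-only memcmp with a constant length: no byte-ordering result
    // is needed, so a vector compare plus PTEST of the difference suffices.
    #include <cstring>

    bool blocksEqual(const void *A, const void *B) {
      return std::memcmp(A, B, 32) == 0;
    }
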
* [X86] Resolve KnownUndef/KnownZero bits into target shuffle masks in helper. NFCI.
  Simon Pilgrim | 2019-10-15 | 1 file, -9/+18

  llvm-svn: 374878

* [X86] Don't check for VBROADCAST_LOAD being a user of the source of a VBROADCAST when trying to share broadcasts.
  Craig Topper | 2019-10-15 | 1 file, -3/+1

  The only things VBROADCAST_LOAD uses are an address and a chain node. It
  has no vector inputs. So if it's a user of the source of another broadcast,
  that could only mean one of two things: the other broadcast is broadcasting
  the address of the broadcast_load, or the source is a load and the use
  we're seeing is the chain result from that load. Neither of these cases
  makes sense to combine here.

  This issue was reported post-commit r373871. Test case has not been reduced
  yet.

  llvm-svn: 374862

* [X86] Teach X86MCodeEmitter to properly encode zmm16-zmm31 as index register to vgatherpf/vscatterpf.
  Craig Topper | 2019-10-14 | 1 file, -0/+3

  We need to encode bit 4 into the EVEX.V' bit. We do this right for regular
  gather/scatter which use either MRMSrcMem or MRMDestMem formats. The
  prefetches use MRM*m formats.

  Fixes an issue recently added to PR36202.

  llvm-svn: 374849

* [CostModel][X86] Add CTLZ scalar costs
  Simon Pilgrim | 2019-10-14 | 1 file, -1/+22

  Add specific scalar costs for CTLZ instructions. We can't discriminate
  between CTLZ and CTLZ_ZERO_UNDEF, so we have to assume the worst. Given how
  BSR is often a microcoded nightmare on some older targets, we might still
  be underestimating it.

  For targets supporting LZCNT (Intel Haswell+ or AMD Fam10+), we provide
  overrides that assume 1cy costs.

  llvm-svn: 374786

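  To illustrate the CTLZ vs. CTLZ_ZERO_UNDEF distinction mentioned above: the
  clang/GCC builtin is undefined at zero (the ZERO_UNDEF form), so a
  zero-safe count needs an extra select, while LZCNT hardware defines the
  zero case directly. A small sketch:

    // __builtin_clz(0) is undefined (CTLZ_ZERO_UNDEF); a well-defined CTLZ
    // needs the guard below. LZCNT returns 32 for a zero input in a single
    // instruction, which is why LZCNT targets get the 1cy override.
    #include <cstdint>

    unsigned ctlz32(uint32_t X) {
      return X ? unsigned(__builtin_clz(X)) : 32u;
    }
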
* [CostModel][X86] Add CTPOP scalar costs (PR43656)
  Simon Pilgrim | 2019-10-14 | 1 file, -0/+23

  Add specific scalar costs for ctpop instructions; these are based on
  llvm-mca's SLM throughput numbers (the oldest model we have).

  For targets supporting POPCNT, we provide overrides that assume 1cy costs.

  llvm-svn: 374775

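  For a sense of what those scalar costs price in, here is the classic
  bit-parallel expansion a 32-bit ctpop lowers to when no POPCNT instruction
  is available (a standard formulation, shown for illustration):

    // Multi-instruction popcount: pairwise sums, then nibble sums, then a
    // multiply-based horizontal add. One POPCNT instruction replaces all
    // of this on supporting targets, hence the 1cy override.
    #include <cstdint>

    int ctpop32(uint32_t X) {
      X = X - ((X >> 1) & 0x55555555u);                 // 2-bit partial sums
      X = (X & 0x33333333u) + ((X >> 2) & 0x33333333u); // 4-bit partial sums
      X = (X + (X >> 4)) & 0x0F0F0F0Fu;                 // 8-bit partial sums
      return int((X * 0x01010101u) >> 24);              // sum the four bytes
    }
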
* [X86][BtVer2] Improved latency and throughput of float/vector loads and stores.
  Andrea Di Biagio | 2019-10-14 | 1 file, -6/+6

  This patch introduces the following changes to the btver2 scheduling model:
  - The number of micro opcodes for YMM loads and stores is now 2 (it was
    incorrectly set to 1 for both aligned and misaligned loads/stores).
  - Increased the number of AGU resource cycles for YMM loads and stores to
    2cy (instead of 1cy).
  - Removed JFPU01 and JFPX from the list of resources consumed by pure
    float/vector loads (no MMX).

  I verified with llvm-exegesis that pure XMM/YMM loads are no-pipe. Those
  are dispatched to the FPU but not really issued on JFPU01.

  Differential Revision: https://reviews.llvm.org/D68871

  llvm-svn: 374765

* [NFC][TTI] Add Alignment for isLegalMasked[Load/Store]
  Sam Parker | 2019-10-14 | 2 files, -7/+8

  Add an extra parameter so the backend can take the alignment into
  consideration.

  Differential Revision: https://reviews.llvm.org/D68400

  llvm-svn: 374763

* [X86] Teach EmitTest to handle ISD::SSUBO/USUBO in order to use the Z flag from the subtract directly during isel.
  Craig Topper | 2019-10-14 | 1 file, -0/+7

  This prevents isel from emitting a TEST instruction that
  optimizeCompareInstr will need to remove later.

  In some of the modified tests, the SUB gets duplicated due to the flags
  being needed in two places and being clobbered in between.
  optimizeCompareInstr was able to optimize away the TEST that was using the
  result of one of them, but optimizeCompareInstr doesn't know to turn SUB
  into CMP after removing the TEST. It only knows how to turn SUB into CMP if
  the result was already dead.

  With this change the TEST never exists, so optimizeCompareInstr doesn't
  have to remove it. Then it can just turn the SUB into CMP immediately.

  Fixes PR43649.

  llvm-svn: 374755

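  The kind of source pattern involved: an overflow-checked subtract whose
  result also feeds a zero test, so the SUB's own Z flag answers the
  comparison and no separate TEST is required. A hypothetical C++ shape (the
  clang/GCC builtin maps to the overflow-checked subtract node; this is not
  the PR43649 reproducer):

    // The subtract sets flags once; with this change isel reuses its Z
    // flag for the Res == 0 test instead of emitting SUB followed by TEST.
    bool subAndCheckZero(int A, int B, int &Res) {
      bool Overflowed = __builtin_sub_overflow(A, B, &Res);
      return !Overflowed && Res == 0;
    }
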
* [X86] getTargetShuffleInputs - Control KnownUndef mask element resolution as well as KnownZero.
  Simon Pilgrim | 2019-10-13 | 1 file, -11/+11

  We were already controlling whether the KnownZero elements were being
  written to the target mask; this extends it to the KnownUndef elements as
  well so we can prevent the target shuffle mask being manipulated at all.

  llvm-svn: 374732

* [X86] Enable use of avx512 saturating truncate instructions in more cases.
  Craig Topper | 2019-10-13 | 1 file, -35/+42

  This enables use of the saturating truncate instructions when the result
  type is less than 128 bits. It also enables the use of saturating truncate
  instructions on KNL when the input is less than 512 bits. We can do this by
  widening the input and then extracting the result.

  llvm-svn: 374731

* [X86] SimplifyMultipleUseDemandedBitsForTargetNode - use getTargetShuffleInputs with KnownUndef/Zero results.
  Simon Pilgrim | 2019-10-13 | 1 file, -12/+11

  llvm-svn: 374725

* [X86] getTargetShuffleInputs - add KnownUndef/Zero output support
  Simon Pilgrim | 2019-10-13 | 1 file, -25/+25

  Adjust SimplifyDemandedVectorEltsForTargetNode to use the known elts masks
  instead of recomputing it locally.

  llvm-svn: 374724

* [X86] Add a one use check on the setcc to the min/max canonicalization code in combineSelect.
  Craig Topper | 2019-10-13 | 1 file, -0/+1

  This seems to improve std::midpoint code where we have a min and a max with
  the same condition. If we split the setcc we can end up with two compares
  when one of the operands is a constant, since we aggressively canonicalize
  compares with constants. For non-constants it can interfere with our
  ability to share control flow if we need to expand cmovs into control flow.

  I'm also not sure I understand this min/max canonicalization code. The
  motivating case talks about comparing with 0, but we don't check for 0
  explicitly.

  Removes one instruction from the codegen for PR43658.

  llvm-svn: 374706

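  An illustrative std::midpoint-style pattern where one comparison feeds both
  a min and a max, which is exactly where splitting the setcc would duplicate
  the compare (a sketch, not the PR43658 test case):

    // Both selects share the single (A < B) condition; with the one-use
    // check the setcc stays shared instead of being split per select.
    int midpointDown(int A, int B) {
      int Lo = A < B ? A : B; // min
      int Hi = A < B ? B : A; // max, same condition
      return Lo + (Hi - Lo) / 2;
    }
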
* [X86] Enable v4i32->v4i16 and v8i16->v8i8 saturating truncates to use pack instructions with avx512.
  Craig Topper | 2019-10-13 | 1 file, -0/+1

  llvm-svn: 374705

* [X86] scaleShuffleMask - use size_t Scale to avoid overflow warnings
  Simon Pilgrim | 2019-10-12 | 2 files, -7/+7

  llvm-svn: 374674

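  For reference, a simplified std::vector sketch of what scaleShuffleMask
  computes, with each mask element expanding to Scale finer-grained elements
  and Scale as size_t so the index arithmetic stays unsigned (the real helper
  works on LLVM's own container types):

    // Expand each shuffle-mask element into Scale consecutive sub-elements;
    // negative sentinel values (undef/zero) are replicated unchanged.
    #include <cstddef>
    #include <vector>

    std::vector<int> scaleShuffleMask(size_t Scale,
                                      const std::vector<int> &Mask) {
      std::vector<int> Scaled;
      Scaled.reserve(Scale * Mask.size());
      for (int M : Mask)
        for (size_t I = 0; I != Scale; ++I)
          Scaled.push_back(M < 0 ? M : int(Scale * M + I));
      return Scaled;
    }
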
* Replace for-loop of SmallVector::push_back with SmallVector::append. NFCI.
  Simon Pilgrim | 2019-10-12 | 1 file, -4/+2

  llvm-svn: 374669

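  The before/after shape of this refactor, shown with the std::vector
  equivalent (SmallVector::append(first, last) plays the role of the range
  insert here):

    // Bulk-append instead of N individual push_back calls: one growth
    // decision up front rather than repeated capacity checks.
    #include <vector>

    void appendMask(std::vector<int> &Dst, const std::vector<int> &Src) {
      // Before: for (int M : Src) Dst.push_back(M);
      Dst.insert(Dst.end(), Src.begin(), Src.end());
    }
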
* Fix cppcheck shadow variable name warnings. NFCI.
  Simon Pilgrim | 2019-10-12 | 1 file, -6/+6

  llvm-svn: 374668

* [X86] Use any_of/all_of patterns in shuffle mask pattern recognisers. NFCI.
  Simon Pilgrim | 2019-10-12 | 1 file, -24/+13

  llvm-svn: 374667

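  The style change in a nutshell: a hand-rolled loop over the mask becomes a
  single predicate query. Plain STL is shown below; the commit itself uses
  LLVM's range-based llvm::any_of/llvm::all_of wrappers, and the predicate is
  an invented example:

    // "Does any mask element select from the second shuffle operand?"
    #include <algorithm>
    #include <vector>

    bool usesSecondSource(const std::vector<int> &Mask) {
      const int NumElts = int(Mask.size());
      return std::any_of(Mask.begin(), Mask.end(),
                         [NumElts](int M) { return M >= NumElts; });
    }
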
* [X86][SSE] Avoid unnecessary PMOVZX in v4i8 sum reduction
  Simon Pilgrim | 2019-10-12 | 1 file, -7/+18

  This should go away once D66004 has landed and we can simplify shuffle
  chains using demanded elts.

  llvm-svn: 374658

* [CostModel][X86] Improve sum reduction costs.
  Simon Pilgrim | 2019-10-12 | 1 file, -22/+23

  I can't see any notable differences in costs between SSE2 and SSE42 arches
  for FADD/ADD reduction, so I've lowered the target to just SSE2.

  I've also added vXi8 sum reduction costs in line with the PSADBW codegen
  and discussions on PR42674.

  llvm-svn: 374655

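  For context on the vXi8 case: PSADBW sums absolute differences against a
  zero vector, which turns a byte-sum reduction like the scalar loop below
  into a couple of instructions; this is the codegen the new costs are
  aligned with (scalar reference semantics only):

    // Sum 16 unsigned bytes: psadbw against zero produces the per-8-byte
    // partial sums directly, replacing this whole loop.
    #include <cstdint>

    unsigned sumV16i8(const uint8_t V[16]) {
      unsigned Sum = 0;
      for (int I = 0; I < 16; ++I)
        Sum += V[I];
      return Sum;
    }
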
* [X86] Use pack instructions for packus/ssat truncate patterns when 256-bit is the largest legal vector and the result type is at least 256 bits.
  Craig Topper | 2019-10-12 | 1 file, -1/+4

  Since the input type is larger than 256 bits, we'll need to do some
  concatenating to reassemble the results. The pack instructions' ability to
  concatenate while packing makes this a shorter/faster sequence.

  llvm-svn: 374643

* recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
  Zi Xuan Wu | 2019-10-12 | 2 files, -2/+3

  In loop-vectorize, the interleave count and vector factor depend on the
  target register count. Currently, it does not estimate register pressure
  for different register classes separately (in particular, for scalar types,
  float types should not be counted together with int types), so it's not
  accurate. Specifically, it causes too much interleaving/unrolling,
  resulting in too many register spills in the loop body and hurting
  performance.

  So we need to classify the register classes at the IR level. Importantly,
  these are abstract register classes, not the target register classes the
  backend provides in td files. They are used to establish the mapping
  between the types of IR values and the number of simultaneous live ranges
  to which we'd like to limit for some set of those types.

  For example, on the POWER target, the register count is special when VSX is
  enabled. With VSX, the number of int scalar registers is 32 (GPR) and the
  number of float registers is 64 (VSR), but int and float vector registers
  are both 64 (VSR). So there should be 2 kinds of register classes when VSX
  is enabled, and 3 kinds when VSX is NOT enabled.

  On POWER, this makes a big (+~30%) performance improvement in one specific
  benchmark (503.bwaves_r) of SPEC2017, with no other obvious regressions.

  Differential revision: https://reviews.llvm.org/D67148

  llvm-svn: 374634

* [X86] Fold a VTRUNCS/VTRUNCUS+store into a saturating truncating store.
  Craig Topper | 2019-10-12 | 1 file, -13/+11

  We already did this for VTRUNCUS with a specific combination of types. This
  extends it to VTRUNCS and handles any types where a truncating store is
  legal.

  llvm-svn: 374615

* [X86][SSE] Add support for v4i8 add reduction
  Simon Pilgrim | 2019-10-11 | 1 file, -2/+7

  llvm-svn: 374579

* [X86] isFNEG - add recursion depth limit
  Simon Pilgrim | 2019-10-11 | 1 file, -5/+9

  Now that it's used by isNegatibleForFree, we should try to avoid costly
  deep recursion.

  llvm-svn: 374534

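  The guard being added follows the usual depth-limit idiom for recursive DAG
  matchers; here is a generic sketch of that idiom (hypothetical node type
  and matcher, not the isFNEG logic itself; LLVM's SelectionDAG helpers
  conventionally cap recursion around a depth of 6):

    // Thread a Depth parameter through the recursion and bail out once the
    // budget is exhausted, trading a little precision for bounded work.
    struct Node {
      bool Matches = false;
      Node *Op0 = nullptr;
      Node *Op1 = nullptr;
    };

    bool matchRecursively(const Node *N, unsigned Depth = 0) {
      if (!N || Depth > 6) // out of budget: conservatively report no match
        return false;
      if (N->Matches)
        return true;
      return matchRecursively(N->Op0, Depth + 1) ||
             matchRecursively(N->Op1, Depth + 1);
    }
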
* [X86] Add a DAG combine to turn v16i16->v16i8 VTRUNCUS+store into a saturating truncating store.
  Craig Topper | 2019-10-11 | 1 file, -0/+13

  llvm-svn: 374509

* [X86] Improve the AVX512 bailout in combineTruncateWithSat to allow pack instructions in more situations.
  Craig Topper | 2019-10-11 | 1 file, -2/+9

  If we don't have VLX we won't end up selecting a saturating truncate for
  256-bit or smaller vectors, so we should just use the pack lowering.

  llvm-svn: 374487

* [X86] Guard against leaving a dangling node in combineTruncateWithSat.
  Craig Topper | 2019-10-10 | 1 file, -4/+13

  When handling the packus pattern for i32->i8 we do a two-step process using
  a packss to i16 followed by a packus to i8. If the final i8 type has fewer
  than 64 bits, the packus step will return SDValue(), but the i32->i16 step
  might have succeeded. This leaves the nodes from the middle step dangling.

  Guard against this by pre-checking that the number of elements is at least
  8 before doing the middle step.

  With that check in place, the only other case the middle step itself can
  fail is when SSE2 is disabled. So add an early SSE2 check, then just assert
  that neither the middle nor the final step ever fails.

  llvm-svn: 374460

* [X86] Use packusdw+vpmovuswb to implement v16i32->V16i8 that clamps signed inputs to be between 0 and 255 when zmm registers are disabled on SKX.
  Craig Topper | 2019-10-10 | 1 file, -0/+15

  If we've disabled zmm registers, the v16i32 will need to be split. This
  split will propagate through the min/max and the truncate. This creates two
  sequences that need to be concatenated back to v16i8. We can instead use
  packusdw to do part of the clamping, truncating, and concatenating all at
  once. Then we can use a vpmovuswb to finish off the clamp.

  Differential Revision: https://reviews.llvm.org/D68763

  llvm-svn: 374431

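  The scalar semantics being lowered, for reference: clamp each signed 32-bit
  lane to [0, 255] and truncate to a byte. packusdw supplies the
  unsigned-saturating i32->i16 step (while concatenating) and vpmovuswb the
  i16->i8 step. A sketch of one lane:

    // Per-lane behavior of the clamp-and-truncate pattern: values below 0
    // saturate to 0 and values above 255 saturate to 255.
    #include <cstdint>

    uint8_t clampLaneToByte(int32_t X) {
      if (X < 0)
        X = 0;
      if (X > 255)
        X = 255;
      return uint8_t(X);
    }
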
* [X86] combineFMA - Convert to use isNegatibleForFree/GetNegatedExpression.
  Simon Pilgrim | 2019-10-10 | 1 file, -7/+84

  Split off from D67557.

  llvm-svn: 374356
