path: root/llvm/lib/Target/X86
Commit message | Author | Date | Files | Lines
...
* [X86] combineOrShiftToFunnelShift - use getShiftAmountTy instead of hardwiring to MVT::i8 | Simon Pilgrim | 2019-10-30 | 1 | -5/+8
* [Alignment] Use Align for TFI.getStackAlignment() in X86ISelLowering | Guillaume Chatelet | 2019-10-30 | 1 | -26/+18
|     Summary:
|     This patch is part of a series to introduce an Alignment type.
|     See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
|     See this patch for the introduction of the type: https://reviews.llvm.org/D64790
|     Reviewers: courbet, craig.topper, rnk
|     Reviewed By: rnk
|     Subscribers: rnk, hiraditya, llvm-commits
|     Tags: #llvm
|     Differential Revision: https://reviews.llvm.org/D69034
* [X86] Make memcmp vector lowering handle arbitrary expansions | David Zarzycki | 2019-10-30 | 1 | -23/+43
|     Teach combineVectorSizedSetCCEquality() to handle arbitrary memcmp
|     expansions but do not change any default policy for now. This also fixes
|     a bug in the memcmp expansion itself when large displacements are needed.
|     https://reviews.llvm.org/D69507
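  Illustrative sketch (not from the commit): the expansion applies to
  equality-only, fixed-size memcmp calls like the one below, which can be
  lowered to a few vector compares plus a combined test instead of a libc
  call.

      #include <string.h>

      /* With SSE2, a 32-byte equality check like this can become two
       * 16-byte vector compares whose results are combined and tested. */
      int blocks_equal(const void *a, const void *b) {
        return memcmp(a, b, 32) == 0;
      }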
* [SelectionDAG] Enable lowering unordered atomic loads w/LoadSDNode (and stores w/StoreSDNode) by default | Philip Reames | 2019-10-29 | 1 | -1/+1
|     Enable the new SelectionDAG representation for unordered loads and
|     stores introduced in r371441 by default.
|     As a reminder, the new lowering changes the representation of an
|     unordered atomic load from an AtomicSDNode - which is essentially a
|     black box that gets passed through without combines messing with it -
|     to a LoadSDNode with an atomic marker on the MMO. The latter parallels
|     the way we handle volatiles, and I've audited the code to ensure that
|     every location which checks one checks the other.
|     This has been fairly heavily fuzzed, and I examined diffs in a
|     reasonably large corpus of assembly by hand, so I'm reasonably sure this
|     is correct for the common case. Late in the review for this, it was
|     discovered that I hadn't correctly handled cases which could be
|     legalized into CAS operations. This points out that there's a strong
|     bias in the IR of the frontend I'm working with towards only legal
|     atomics. If there are problems with this patch, the most likely area
|     will be legalization.
|     Differential Revision: https://reviews.llvm.org/D69219
* [X86] Narrow i64 compares with constant to i32 when the upper 32-bits are known zero | Craig Topper | 2019-10-29 | 1 | -5/+17
|     This catches some cases. There are probably ways to improve this. I
|     tried doing it as a combine on the setcc, but that broke some cases
|     involving flag reuse in place of test.
|     I renamed isX86CCUnsigned to isX86CCSigned and flipped its polarity to
|     make it consistent with the similar functions for ISD::SETCC. This
|     avoids treating EQ/NE as being signed or unsigned.
|     Fixes PR43823.
|     Differential Revision: https://reviews.llvm.org/D69499
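  A hypothetical C-level example of the pattern (not from the commit): x is
  built from a 32-bit value, so its upper 32 bits are known zero and the
  64-bit equality compare can be narrowed to a 32-bit one.

      #include <stdint.h>

      int matches(uint32_t v) {
        uint64_t x = v;                   /* upper 32 bits known zero */
        return x == UINT64_C(0x12345678); /* can use a 32-bit compare */
      }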
* [X86] Pull out combineOrShiftToFunnelShift helper. NFCI. | Simon Pilgrim | 2019-10-29 | 1 | -51/+64
|
* Fix a spelling mistake in a comment. NFC | Greg Bedwell | 2019-10-29 | 1 | -1/+1
|     (I'm currently trying to debug a strange error message I get when
|     pushing to github, despite the pushes being successful.)
* [X86] Add a DAG combine to turn (and (bitcast (vXi1 (concat_vectors (vYi1 setcc), undef,))), C) into (bitcast (vXi1 (concat_vectors (vYi1 setcc), zero,))) | Craig Topper | 2019-10-28 | 1 | -0/+68
|     The legalization of v2i1->i2 or v4i1->i4 bitcasts followed by a setcc
|     can create an and after the bitcast. If we're lucky enough that the
|     input to the bitcast is a concat_vectors where the first operand is a
|     setcc that can natively zero all the upper bits of a k-register, then we
|     should replace the other operands of the concat_vectors with zero in
|     order to remove the AND.
|     With the AND removed we might be able to use a kortest on the result.
|     Differential Revision: https://reviews.llvm.org/D69205
* Add Windows Control Flow Guard checks (/guard:cf). | Andrew Paverd | 2019-10-28 | 6 | -3/+39
|     Summary:
|     A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard
|     checks on indirect function calls, using either the check mechanism
|     (X86, ARM, AArch64) or the dispatch mechanism (X86-64). The check
|     mechanism requires a new calling convention for the supported targets.
|     The dispatch mechanism adds the target as an operand bundle, which is
|     processed by SelectionDAG. Another pass (CodeGen/CFGuardLongjmp.cpp)
|     identifies and emits valid longjmp targets, as required by /guard:cf.
|     This feature is enabled using the `cfguard` CC1 option.
|     Reviewers: thakis, rnk, theraven, pcc
|     Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex,
|     mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu,
|     dexonsmith, cfe-commits, llvm-commits
|     Tags: #clang, #llvm
|     Differential Revision: https://reviews.llvm.org/D65761
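  An illustrative sketch (not from the commit) of the kind of call site
  CFGuard instruments:

      typedef void (*callback_t)(int);

      void invoke(callback_t cb, int arg) {
        /* Under /guard:cf, the target address in cb is validated before
         * this indirect call, via the check function on X86/ARM/AArch64
         * or the dispatch thunk on X86-64. */
        cb(arg);
      }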
* [X86] Fix 48/96 byte memcmp code gen | David Zarzycki | 2019-10-28 | 1 | -2/+21
|     Detect scalar ISD::ZERO_EXTEND generated by memcmp lowering and convert
|     it to ISD::INSERT_SUBVECTOR.
|     https://reviews.llvm.org/D69464
* [X86] Use 64-bit version of source register in LowerPATCHABLE_EVENT_CALL and LowerPATCHABLE_TYPED_EVENT_CALL | Craig Topper | 2019-10-27 | 1 | -6/+12
|     Summary:
|     The PATCHABLE_EVENT_CALL uses i32 in the intrinsic. This results in the
|     register allocator picking a 32-bit register. We need to use the 64-bit
|     register when forming the MOV64rr instructions. Otherwise we print
|     illegal assembly in the text output.
|     I think prior to this it was impossible for SrcReg to be equal to
|     DstReg, so the NOP code was not reachable.
|     While there, use Register instead of unsigned.
|     Also add a FIXME for what looks like a bug.
|     Reviewers: dberris
|     Reviewed By: dberris
|     Subscribers: hiraditya, llvm-commits
|     Tags: #llvm
|     Differential Revision: https://reviews.llvm.org/D69365
* [X86] Only look up boolean reduction cost tables if the reduction is not pairwise | Craig Topper | 2019-10-26 | 1 | -1/+1
|     Summary:
|     We don't pattern match pairwise shuffles in SelectionDAG, so we should
|     only return the optimized costs if it's not a pairwise shuffle. I think
|     the SLP vectorizer gives priority to non-pairwise shuffles if the cost
|     is the same, and the lookup for reduction intrinsics passes false for
|     the pairwise flag, so this probably has no real effect today.
|     Reviewers: RKSimon
|     Reviewed By: RKSimon
|     Subscribers: hiraditya, llvm-commits
|     Tags: #llvm
|     Differential Revision: https://reviews.llvm.org/D69083
* [X86] Prefer KORTEST on Knights Landing or later for memcmp() | David Zarzycki | 2019-10-26 | 4 | -17/+60
|     PTEST and especially the MOVMSK instructions are slow on Knights
|     Landing or later. As a bonus, this patch increases instruction
|     parallelism by emitting:
|         KORTEST(PCMPNEQ(a, b), PCMPNEQ(c, d)) == 0
|     instead of:
|         KORTEST(AND(PCMPEQ(a, b), PCMPEQ(c, d))) == ~0
|     https://reviews.llvm.org/D69157
* [X86][GISel] Fix typo in comment. NFC | Craig Topper | 2019-10-26 | 1 | -1/+1
|
* [Alignment][NFC] getMemoryOpCost uses MaybeAlign | Guillaume Chatelet | 2019-10-25 | 2 | -11/+12
|     Summary:
|     This patch is part of a series to introduce an Alignment type.
|     See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
|     See this patch for the introduction of the type: https://reviews.llvm.org/D64790
|     Reviewers: courbet
|     Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits
|     Tags: #llvm
|     Differential Revision: https://reviews.llvm.org/D69307
* [X86] Add a check for SSE2 to the top of combineReductionToHorizontal | Craig Topper | 2019-10-25 | 1 | -0/+4
|     Without this, we can create a PSADBW node that isn't legal.
* [X86][GISel] Remove unneeded custom selection code for handling shifts | Craig Topper | 2019-10-24 | 1 | -78/+0
|
* [X86] combineX86ShufflesRecursively - assert the root mask is legal. NFCI. | Simon Pilgrim | 2019-10-23 | 1 | -0/+3
|
* [Mips] Use appropriate private label prefix based on Mips ABI | Mirko Brkusanin | 2019-10-23 | 1 | -1/+2
|     MipsMCAsmInfo was using the '$' prefix for Mips32 and '.L' for Mips64
|     regardless of the -target-abi option. By passing MCTargetOptions to
|     MCAsmInfo we can find out the Mips ABI and pick the appropriate prefix.
|     Tags: #llvm, #clang, #lldb
|     Differential Revision: https://reviews.llvm.org/D66795
* [DebugInfo] Stop describing imms in TargetInstrInfo's describeLoadedValue() impl | David Stenberg | 2019-10-23 | 1 | -0/+5
|     Summary:
|     The default implementation of the describeLoadedValue() hook uses the
|     MoveImm property to determine if an instruction moves an immediate. If
|     an instruction has that property, the function returns the second
|     operand, assuming that that is the immediate value the instruction
|     moves. As far as I can tell, the MoveImm property does not imply that
|     the second operand is the immediate value, nor that any other operand
|     necessarily holds the immediate value; it just means that the
|     instruction moves some immediate value.
|     One example where the second operand is not the immediate is SystemZ's
|     LZER instruction, which moves a zero immediate implicitly:
|     $f0S = LZER. That case triggered an out-of-bounds assertion when
|     getting the operand. I have added a test case for that instruction.
|     Another example is ARM's MVN instruction, which holds the logical
|     bitwise NOT'd value of the immediate that is moved. For the following
|     reproducer:
|         extern void foo(int);
|         int main() { foo(-11); }
|     an incorrect call site value would be emitted:
|         $ clang --target=arm foo.c -O1 -g -Xclang -femit-debug-entry-values \
|             -c -o - | ./build/bin/llvm-dwarfdump - | \
|             grep -A2 call_site_parameter
|         0x00000058: DW_TAG_GNU_call_site_parameter
|                       DW_AT_location (DW_OP_reg0 R0)
|                       DW_AT_GNU_call_site_value (DW_OP_lit10)
|     Another example is the A2_combineii instruction on Hexagon, which moves
|     two immediates to a super-register: $d0 = A2_combineii 20, 10.
|     Perhaps these are rare exceptions, and most MoveImm instructions hold
|     the immediate in the second operand, but in my opinion the default
|     implementation of the hook should only describe values that it can, by
|     some contract, guarantee are safe to describe, rather than leaving it
|     up to the targets to override the exceptions, as that can silently
|     result in incorrect call site values.
|     This patch adds X86's relevant move immediate instructions to the
|     target's hook implementation, so this commit should be an NFC for that
|     target. We need to do the same for ARM and AArch64.
|     Reviewers: djtodoro, NikolaPrica, aprantl, vsk
|     Reviewed By: vsk
|     Subscribers: kristof.beyls, hiraditya, llvm-commits
|     Tags: #debug-info, #llvm
|     Differential Revision: https://reviews.llvm.org/D69109
* [X86][BMI] Pull out schedule classes from bmi_andn<> and bmi_bls<> | Simon Pilgrim | 2019-10-21 | 2 | -14/+15
|     Stop hardwiring classes.
|     llvm-svn: 375470
* [X86][SSE] Add OR(EXTRACTELT(X,0),OR(EXTRACTELT(X,1))) -> MOVMSK+CMP reduction combine | Simon Pilgrim | 2019-10-21 | 1 | -0/+18
|     llvm-svn: 375463
* [X86] Rename matchBitOpReduction to matchScalarReduction. NFCI. | Simon Pilgrim | 2019-10-21 | 1 | -4/+4
|     This doesn't need to be just for bitops, but the ops do need to be
|     fully associative.
|     llvm-svn: 375445
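  Shape of a scalar reduction the renamed helper matches, as an illustrative
  sketch (not from the commit); OR is fully associative, so the lanes may be
  combined in any order:

      unsigned or_reduce4(const unsigned v[4]) {
        return v[0] | v[1] | v[2] | v[3];
      }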
* [X86] Check Subtarget.hasSSE3() before calling shouldUseHorizontalOp and emitting X86ISD::FHADD in LowerUINT_TO_FP_i64 | Craig Topper | 2019-10-20 | 1 | -1/+1
|     This was a regression from r375341.
|     Fixes PR43729.
|     llvm-svn: 375381
* [X86] Pulled out helper to decode target shuffle element sentinel values to 'Zeroable' known undef/zero bits. NFCI. | Simon Pilgrim | 2019-10-19 | 1 | -13/+22
|     Renamed 'resolveTargetShuffleAndZeroables' to
|     'resolveTargetShuffleFromZeroables' to match.
|     llvm-svn: 375348
* [X86][SSE] lowerV16I8Shuffle - tryToWidenViaDuplication - undef unpack args | Simon Pilgrim | 2019-10-19 | 1 | -1/+9
|     tryToWidenViaDuplication lowers using the
|     shuffle_v8i16(unpack_v16i8(shuffle_v8i16(x),shuffle_v8i16(x))) pattern,
|     but the unpack only needs the even/odd 16i8 args if the original v16i8
|     shuffle mask references the even/odd elements - which isn't true for
|     many extension style shuffles.
|     llvm-svn: 375342
* [X86][SSE] LowerUINT_TO_FP_i64 - only use HADDPD for size/fast-hops | Simon Pilgrim | 2019-10-19 | 1 | -12/+11
|     We were always generating a single source HADDPD, but really we should
|     only do this if shouldUseHorizontalOp says it's a good idea.
|     Differential Revision: https://reviews.llvm.org/D69175
|     llvm-svn: 375341
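  The C-level operation whose lowering this adjusts (illustrative example,
  not from the commit): converting u64 to double, which pre-AVX-512 x86
  expands into a two-lane double sequence whose final add was previously
  always emitted as HADDPD.

      double u64_to_f64(unsigned long long x) {
        return (double)x;
      }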
* Prune a LegacyDivergenceAnalysis and MachineLoopInfo include each | Reid Kleckner | 2019-10-19 | 1 | -1/+1
|     Now X86ISelLowering doesn't depend on many IR analyses.
|     llvm-svn: 375320
* Prune Analysis includes from SelectionDAG.h | Reid Kleckner | 2019-10-19 | 2 | -1/+2
|     Only forward declarations are needed here. Follow-on to r375311.
|     llvm-svn: 375319
* Move endian constant from Host.h to SwapByteOrder.h, prune include | Reid Kleckner | 2019-10-19 | 1 | -2/+3
|     Works on this dependency chain:
|         ArrayRef.h ->
|         Hashing.h ->      --CUT--
|         Host.h ->
|         StringMap.h / StringRef.h
|     ArrayRef is very popular, but Host.h is rarely needed. Move the
|     IsBigEndianHost constant to SwapByteOrder.h. Clients of that header are
|     more likely to need it.
|     llvm-svn: 375316
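  A runtime sketch of what an IsBigEndianHost-style constant answers
  (illustrative only; LLVM computes this at compile time, not like this):

      #include <stdint.h>

      static int is_big_endian_host(void) {
        const uint16_t one = 1;
        return *(const uint8_t *)&one == 0;
      }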
* Prune two MachineInstr.h includes, fix up deps | Reid Kleckner | 2019-10-19 | 1 | -1/+1
|     MachineInstr.h included AliasAnalysis.h, which includes a world of IR
|     constructs mostly unneeded in CodeGen. Prune it. Same for
|     DebugInfoMetadata.h.
|     Noticed with -ftime-trace.
|     llvm-svn: 375311
* [X86] Fix register parsing in .seh_* in Intel syntax | Reid Kleckner | 2019-10-18 | 1 | -4/+3
|     Previously, the parser checked for a '%' prefix to indicate a register.
|     In Intel syntax mode, LLVM does not print a '%' prefix on registers, so
|     LLVM could not parse its own assembly output. Instead, require that
|     register numbers be integer literals, or at least start with an integer
|     literal, which is consistent with .cfi_* directive register parsing.
|     llvm-svn: 375287
* [GISel][CallLowering] Make isIncomingArgumentHandler a pure virtual method | Quentin Colombet | 2019-10-18 | 1 | -0/+2
|     The default implementation of isIncomingArgumentHandler could lead to
|     generating incorrect code. Make it a pure virtual method, so that
|     targets know they have to override it to produce correct code.
|     NFC
|     Differential Revision: https://reviews.llvm.org/D69187
|     llvm-svn: 375277
* [X86] combineX86ShufflesRecursively - pull out isTargetShuffleVariableMask. NFCI. | Simon Pilgrim | 2019-10-18 | 1 | -1/+2
|     llvm-svn: 375253
* [X86] Emit KTEST when possible | David Zarzycki | 2019-10-18 | 1 | -8/+23
|     https://reviews.llvm.org/D69111
|     llvm-svn: 375197
* [DAGCombine][ARM] Enable extending masked loads | Sam Parker | 2019-10-17 | 1 | -0/+3
|     Add generic DAG combine for extending masked loads.
|     Allow us to generate sext/zext masked loads which can access v4i8, v8i8
|     and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively.
|     Differential Revision: https://reviews.llvm.org/D68337
|     llvm-svn: 375085
* [Alignment][NFC] Use Align for TargetFrameLowering/Subtarget | Guillaume Chatelet | 2019-10-17 | 5 | -17/+16
|     Summary:
|     This patch is part of a series to introduce an Alignment type.
|     See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
|     See this patch for the introduction of the type: https://reviews.llvm.org/D64790
|     Reviewers: courbet
|     Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay,
|     sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google,
|     hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso,
|     simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng,
|     edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o,
|     PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, llvm-commits
|     Tags: #llvm
|     Differential Revision: https://reviews.llvm.org/D68993
|     llvm-svn: 375084
* [X86] combineX86ShufflesRecursively - split the getTargetShuffleInputs call from the resolveTargetShuffleAndZeroables call | Simon Pilgrim | 2019-10-15 | 1 | -3/+9
|     Exposes an issue in getFauxShuffleMask where the OR(SHUFFLE,SHUFFLE)
|     decode should always resolve zero/undef elements.
|     Part of the fix for PR43024, where ideally we shouldn't call
|     resolveTargetShuffleAndZeroables for Depth == 0.
|     llvm-svn: 374928
* [X86] Make memcmp() use PTEST if possible and also enable AVX1 | David Zarzycki | 2019-10-15 | 1 | -17/+46
|     llvm-svn: 374922
* [X86] Resolve KnownUndef/KnownZero bits into target shuffle masks in helper. NFCI. | Simon Pilgrim | 2019-10-15 | 1 | -9/+18
|     llvm-svn: 374878
* [X86] Don't check for VBROADCAST_LOAD being a user of the source of a VBROADCAST when trying to share broadcasts | Craig Topper | 2019-10-15 | 1 | -3/+1
|     The only things VBROADCAST_LOAD uses are an address and a chain node.
|     It has no vector inputs. So if it's a user of the source of another
|     broadcast, that could only mean one of two things: the other broadcast
|     is broadcasting the address of the broadcast_load, or the source is a
|     load and the use we're seeing is the chain result from that load.
|     Neither of these cases makes sense to combine here.
|     This issue was reported post-commit r373871. Test case has not been
|     reduced yet.
|     llvm-svn: 374862
* [X86] Teach X86MCCodeEmitter to properly encode zmm16-zmm31 as index register to vgatherpf/vscatterpf | Craig Topper | 2019-10-14 | 1 | -0/+3
|     We need to encode bit 4 into the EVEX.V' bit. We do this right for
|     regular gather/scatter, which use either MRMSrcMem or MRMDestMem
|     formats. The prefetches use MRM*m formats.
|     Fixes an issue recently added to PR36202.
|     llvm-svn: 374849
* [CostModel][X86] Add CTLZ scalar costs | Simon Pilgrim | 2019-10-14 | 1 | -1/+22
|     Add specific scalar costs for CTLZ instructions. We can't discriminate
|     between CTLZ and CTLZ_ZERO_UNDEF, so we have to assume the worst. Given
|     how BSR is often a microcoded nightmare on some older targets, we might
|     still be underestimating it.
|     For targets supporting LZCNT (Intel Haswell+ or AMD Fam10+), we provide
|     overrides that assume 1cy costs.
|     llvm-svn: 374786
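  For reference, a minimal example of what gets costed (not from the
  commit): __builtin_clz is undefined for a zero input, so it maps to the
  zero-is-undef flavor of ctlz; the cost tables can't tell the two flavors
  apart, hence the worst-case assumption above.

      unsigned leading_zeros(unsigned x) {
        /* precondition: x != 0; a single LZCNT where available, else a
         * BSR-based sequence */
        return (unsigned)__builtin_clz(x);
      }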
* [CostModel][X86] Add CTPOP scalar costs (PR43656) | Simon Pilgrim | 2019-10-14 | 1 | -0/+23
|     Add specific scalar costs for ctpop instructions; these are based on
|     llvm-mca's SLM throughput numbers (the oldest model we have).
|     For targets supporting POPCNT, we provide overrides that assume 1cy
|     costs.
|     llvm-svn: 374775
* [X86][BtVer2] Improved latency and throughput of float/vector loads and stores | Andrea Di Biagio | 2019-10-14 | 1 | -6/+6
|     This patch introduces the following changes to the btver2 scheduling
|     model:
|     - The number of micro opcodes for YMM loads and stores is now 2 (it was
|       incorrectly set to 1 for both aligned and misaligned loads/stores).
|     - Increased the number of AGU resource cycles for YMM loads and stores
|       to 2cy (instead of 1cy).
|     - Removed JFPU01 and JFPX from the list of resources consumed by pure
|       float/vector loads (no MMX).
|     I verified with llvm-exegesis that pure XMM/YMM loads are no-pipe.
|     Those are dispatched to the FPU but not really issued on JFPU01.
|     Differential Revision: https://reviews.llvm.org/D68871
|     llvm-svn: 374765
* [NFC][TTI] Add Alignment for isLegalMasked[Load/Store] | Sam Parker | 2019-10-14 | 2 | -7/+8
|     Add an extra parameter so the backend can take the alignment into
|     consideration.
|     Differential Revision: https://reviews.llvm.org/D68400
|     llvm-svn: 374763
* [X86] Teach EmitTest to handle ISD::SSUBO/USUBO in order to use the Z flag from the subtract directly during isel | Craig Topper | 2019-10-14 | 1 | -0/+7
|     This prevents isel from emitting a TEST instruction that
|     optimizeCompareInstr will need to remove later.
|     In some of the modified tests, the SUB gets duplicated due to the flags
|     being needed in two places and being clobbered in between.
|     optimizeCompareInstr was able to optimize away the TEST that was using
|     the result of one of them, but optimizeCompareInstr doesn't know to
|     turn SUB into CMP after removing the TEST. It only knows how to turn
|     SUB into CMP if the result was already dead.
|     With this change the TEST never exists, so optimizeCompareInstr doesn't
|     have to remove it. Then it can just turn the SUB into CMP immediately.
|     Fixes PR43649.
|     llvm-svn: 374755
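  A sketch of code that reaches ISD::SSUBO (hypothetical example using the
  GCC/Clang overflow builtin, not from the commit): with this change the
  "r == 0" test can reuse the Z flag set by the checked subtract instead of
  a separate TEST.

      #include <stdbool.h>
      #include <stdint.h>

      bool sub_result_is_zero(int32_t a, int32_t b, bool *ov) {
        int32_t r;
        *ov = __builtin_sub_overflow(a, b, &r);
        return r == 0; /* can reuse the subtract's Z flag */
      }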
* [X86] getTargetShuffleInputs - Control KnownUndef mask element resolution as well as KnownZero | Simon Pilgrim | 2019-10-13 | 1 | -11/+11
|     We were already controlling whether the KnownZero elements were being
|     written to the target mask; this extends it to the KnownUndef elements
|     as well, so we can prevent the target shuffle mask being manipulated at
|     all.
|     llvm-svn: 374732
* [X86] Enable use of avx512 saturating truncate instructions in more cases | Craig Topper | 2019-10-13 | 1 | -35/+42
|     This enables use of the saturating truncate instructions when the
|     result type is less than 128 bits. It also enables the use of
|     saturating truncate instructions on KNL when the input is less than 512
|     bits. We can do this by widening the input and then extracting the
|     result.
|     llvm-svn: 374731
* [X86] SimplifyMultipleUseDemandedBitsForTargetNode - use getTargetShuffleInputs with KnownUndef/Zero results | Simon Pilgrim | 2019-10-13 | 1 | -12/+11
|     llvm-svn: 374725