path: root/llvm/lib/Target/X86
Commit message | Author | Age | Files | Lines
...
* [X86] isBinOp - move commutative ops to isCommutativeBinOp. NFCI. | Simon Pilgrim | 2019-06-21 | 1 | -6/+6
| | | | | | TargetLoweringBase::isBinOp checks isCommutativeBinOp as a fallback, so don't duplicate. llvm-svn: 364072
* [X86] X86ISD::ANDNP is a (non-commutative) binop | Simon Pilgrim | 2019-06-21 | 1 | -0/+2
| | | | | | The sat add/sub tests still have unnecessary extract_subvector((vandnps ymm, ymm), 0) uses that should be split to (vandnps (extract_subvector(ymm, 0), extract_subvector(ymm, 0))), but it's getting better. llvm-svn: 364038
* [X86] createMMXBuildVector - call with BuildVectorSDNode directly. NFCI. | Simon Pilgrim | 2019-06-21 | 1 | -7/+5
| | | | llvm-svn: 364030
* [X86] combineAndnp - use isNOT instead of manually checking for (XOR x, -1) | Simon Pilgrim | 2019-06-21 | 1 | -5/+3
| | | | llvm-svn: 364026
* [X86] foldVectorXorShiftIntoCmp - use isConstOrConstSplat. NFCI. | Simon Pilgrim | 2019-06-21 | 1 | -7/+4
| | | | | | Use the isConstOrConstSplat helper instead of inspecting the build vector manually. llvm-svn: 364024
* [X86][AVX] isNOT - handle concat_vectors(xor X, -1, xor Y, -1) pattern | Simon Pilgrim | 2019-06-21 | 1 | -0/+10
| | | | llvm-svn: 364022
* Simplify std::lower_bound with llvm::{bsearch,lower_bound}. NFC | Fangrui Song | 2019-06-21 | 5 | -15/+9
| | | | llvm-svn: 364006
* [X86] Add BLSI to isUseDefConvertible. | Craig Topper | 2019-06-20 | 1 | -0/+4
| | | | | | | | | | | | | | | | | | | | | Summary: BLSI sets the C flag if the input is not zero. So if it's followed by a TEST of the input where only the Z flag is consumed, we can replace it with the opposite check of the C flag. We should be able to do the same for BLSMSK and BLSR, but the naive test case for those is being optimized to a subo by CodeGenPrepare. Reviewers: spatel, RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63589 llvm-svn: 363957
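A standalone C++ sketch of the flag identity this fold relies on (illustrative only, not LLVM code; the Flags struct, blsi() model and test values are invented for demonstration): BLSI computes dst = src & -src and sets CF exactly when its input is non-zero, so a TEST of the input that only feeds a Z-flag check can instead consume the inverted C flag the BLSI already produced.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Only the two flags relevant to the fold are modelled.
struct Flags {
  bool CF; // BLSI: set iff the source was non-zero
  bool ZF; // BLSI: set iff the result (isolated lowest bit) is zero
};

// Software model of BLSI: Dst = Src & -Src.
static Flags blsi(uint64_t Src, uint64_t &Dst) {
  Dst = Src & (0 - Src);
  return {Src != 0, Dst == 0};
}

int main() {
  for (uint64_t V : {0ull, 1ull, 6ull, 0x8000000000000000ull}) {
    uint64_t LowBit;
    Flags F = blsi(V, LowBit);
    // "TEST V, V" + ZF check computes (V == 0); the commit replaces that with
    // the opposite check of the C flag that the BLSI already set.
    assert((V == 0) == !F.CF);
  }
  return 0;
}
```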
* [X86] LowerAVXExtend - handle ANY_EXTEND_VECTOR_INREG lowering as well. | Simon Pilgrim | 2019-06-20 | 1 | -6/+10
| | | | llvm-svn: 363922
* [X86] Remove memory instructions from isUseDefConvertible. | Craig Topper | 2019-06-20 | 1 | -15/+15
| | | | | | | | The caller of this is looking for comparisons of the input to these instructions with 0. But the memory instructions' input is an address, not a value in a register. llvm-svn: 363907
* [X86] Add v64i8/v32i16 to several places in X86CallingConv.td where they ↵ | Craig Topper | 2019-06-20 | 1 | -3/+4
| | | | | | seemed obviously missing. llvm-svn: 363906
* [x86] avoid vector load narrowing with extracted store uses (PR42305) | Sanjay Patel | 2019-06-19 | 1 | -0/+20
| | | | | | | | | | | | This is an exception to the rule that we should prefer xmm ops to ymm ops. As shown in PR42305: https://bugs.llvm.org/show_bug.cgi?id=42305 ...the store folding opportunity with vextractf128 may result in better perf by reducing the instruction count. Differential Revision: https://reviews.llvm.org/D63517 llvm-svn: 363853
* [X86][SSE] combineToExtendVectorInReg - add ANY_EXTEND support TODO. NFCI. | Simon Pilgrim | 2019-06-19 | 1 | -0/+1
| | | | | | So I don't forget - there's a load of yak shaving to do first. llvm-svn: 363847
* [X86][SSE] Combine shuffles to ANY_EXTEND/ANY_EXTEND_VECTOR_INREG. | Simon Pilgrim | 2019-06-19 | 1 | -10/+15
| | | | | | We already do this for ZERO_EXTEND/ZERO_EXTEND_VECTOR_INREG - this just extends the pattern matcher to recognize cases where we don't need the zeros in the extension. llvm-svn: 363841
* [X86] getExtendInVec - take an ISD::*_EXTEND opcode instead of an IsSigned ↵ | Simon Pilgrim | 2019-06-19 | 1 | -15/+13
| | | | | | | | bool flag. NFCI. Prep work to support ANY_EXTEND/ANY_EXTEND_VECTOR_INREG without needing another flag. llvm-svn: 363818
* [X86] Add *_EXTEND -> *_EXTEND_VECTOR_INREG opcode conversion helper. NFCI. | Simon Pilgrim | 2019-06-19 | 1 | -11/+19
| | | | | | Given a *_EXTEND or *_EXTEND_VECTOR_INREG opcode, convert it to *_EXTEND_VECTOR_INREG. llvm-svn: 363812
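A hedged sketch of what such a conversion helper might look like (the helper name and the exact opcode set handled in X86ISelLowering.cpp are assumptions here):

```cpp
#include "llvm/CodeGen/ISDOpcodes.h"
#include "llvm/Support/ErrorHandling.h"
using namespace llvm;

// Hypothetical helper: map an *_EXTEND or *_EXTEND_VECTOR_INREG opcode onto
// the corresponding *_EXTEND_VECTOR_INREG opcode.
static unsigned getExtendInVecOpcode(unsigned Opcode) {
  switch (Opcode) {
  case ISD::ANY_EXTEND:
  case ISD::ANY_EXTEND_VECTOR_INREG:
    return ISD::ANY_EXTEND_VECTOR_INREG;
  case ISD::ZERO_EXTEND:
  case ISD::ZERO_EXTEND_VECTOR_INREG:
    return ISD::ZERO_EXTEND_VECTOR_INREG;
  case ISD::SIGN_EXTEND:
  case ISD::SIGN_EXTEND_VECTOR_INREG:
    return ISD::SIGN_EXTEND_VECTOR_INREG;
  default:
    llvm_unreachable("Unexpected extension opcode");
  }
}
```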
* [X86] Merge extract_subvector(*_EXTEND) and ↵ | Simon Pilgrim | 2019-06-19 | 1 | -12/+8
| | | | | | extract_subvector(*_EXTEND_VECTOR_INREG) handling. NFCI. llvm-svn: 363808
* [X86] Add missing properties on llvm.x86.sse.{st,ld}mxcsr | Clement Courbet | 2019-06-19 | 1 | -0/+4
| | | | | | | | | | | | | | | | Summary: llvm.x86.sse.stmxcsr only writes to memory. llvm.x86.sse.ldmxcsr only reads from memory, and might generate an FPE. Reviewers: craig.topper, RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62896 llvm-svn: 363773
* Rename ExpandISelPseudo->FinalizeISel, delay register reservation | Matt Arsenault | 2019-06-19 | 1 | -1/+1
| | | | | | | | | | | This allows targets to make more decisions about reserved registers after isel. For example, after isel it should be certain whether or not there are calls or stack objects in the frame, which could have been introduced by legalization. Patch by Matthias Braun llvm-svn: 363757
* [X86] Remove unnecessary line that makes v4f32 FP_ROUND Legal. NFC | Craig Topper | 2019-06-18 | 1 | -1/+0
| | | | | | | | FP_ROUND defaults to Legal for all MVT types and nothing changes the v4f32 entry away from this default. If we needed this line we'd also need one for v8f32 with AVX512, which we don't have. llvm-svn: 363719
* [X86][AVX] extract_subvector(any_extend(x)) -> any_extend_vector_inreg(x) | Simon Pilgrim | 2019-06-18 | 1 | -4/+10
| | | | | | Part of fixing the X86 regression noted in D63281 - I've split this into X86 and generic parts - the generic commit will be coming shortly and will fix the vector-reduce-mul-widen.ll regression introduced here. llvm-svn: 363693
* [X86] Replace any_extend* vector extensions with zero_extend* equivalents | Simon Pilgrim | 2019-06-18 | 3 | -84/+53
| | | | | | | | | | First step toward addressing the vector-reduce-mul-widen.ll regression in D63281 - we should replace ANY_EXTEND/ANY_EXTEND_VECTOR_INREG in X86ISelDAGToDAG to avoid having to add duplicate patterns when treating any extensions as legal. In future patches this will also allow us to keep any extension nodes around a lot longer in the DAG, which should mean that we can keep better track of undef elements that otherwise become zeros that we think we have to keep...... Differential Revision: https://reviews.llvm.org/D63326 llvm-svn: 363655
* [X86] Move code that shrinks immediates for ((x << C1) op C2) into a helper ↵ | Craig Topper | 2019-06-18 | 1 | -108/+118
| | | | | | | | function. NFCI Preliminary step for D59909 llvm-svn: 363645
* [X86] Remove MOVDI2SSrm/MOV64toSDrm/MOVSS2DImr/MOVSDto64mr CodeGenOnly ↵ | Craig Topper | 2019-06-18 | 3 | -64/+12
| | | | | | | | | | | | instructions. The isel patterns for these use a bitcast and load/store, but DAG combine should have canonicalized those away. For the purposes of the memory folding table these opcodes can be replaced by the MOVSSrm_alt/MOVSDrm_alt and MOVSSmr/MOVSDmr opcodes. llvm-svn: 363644
* [X86] Introduce new MOVSSrm/MOVSDrm opcodes that use VR128 register class. | Craig Topper | 2019-06-18 | 7 | -58/+129
| | | | | | | | | | | | | | | | | | | | | | Rename the old versions that use FR32/FR64 to MOVSSrm_alt/MOVSDrm_alt. Use the new versions in patterns that previously used a COPY_TO_REGCLASS to VR128. These patterns expect the upper bits to be zero. The current set up appears to work, but I'm not sure we should be enforcing upper bits being zero through a COPY_TO_REGCLASS. I wanted to flip the arrangement and use a COPY_TO_REGCLASS to FR32/FR64 for the patterns that need an f32/f64 result, but that complicated fastisel and globalisel. I've been doing some experiments with reducing some isel patterns and ended up in a situation where I had a (SUBREG_TO_REG (COPY_TO_REGCLASS (VMOVSSrm), VR128)) and our post-isel peephole was unable to avoid using an instruction for the SUBREG_TO_REG due to the COPY_TO_REGCLASS. Having a VR128 instruction removes the COPY_TO_REGCLASS that was breaking this. llvm-svn: 363643
* Use VR128X instead of FR32X/FR64X for the register class in ↵ | Craig Topper | 2019-06-17 | 1 | -5/+5
| | | | | | | | VMOVSSZmrk/VMOVSDZmrk. Removes COPY_TO_REGCLASS from some patterns. llvm-svn: 363630
* [X86] Make an assert in LowerSCALAR_TO_VECTOR stricter to make it clear what ↵ | Craig Topper | 2019-06-17 | 1 | -1/+2
| | | | | | | | Make it clear that only integer types with i32 or smaller elements should get to this part of the code. llvm-svn: 363629
* [X86] Add TB_NO_REVERSE to some memory folding table entries where the ↵ | Craig Topper | 2019-06-17 | 1 | -3/+3
| | | | | | | | | | | | | We don't know if it's safe to unfold if we're in 32-bit mode. This is similar to what was done to some load opcodes in r363523. I think it's pretty unlikely we will try to unfold these anyway, so I don't think this is testable. llvm-svn: 363595
* [X86][SSE] Scalarize under-aligned XMM vector nt-stores (PR42026) | Simon Pilgrim | 2019-06-17 | 1 | -0/+45
| | | | | | If an XMM non-temporal store has less than natural alignment, scalarize the vector - with SSE4A we can stay on the vector and use MOVNTSD (f64), otherwise we must move to GPRs and use MOVNTI (i32/i64). llvm-svn: 363592
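A standalone sketch (not the actual DAG-combine code; the enum and function names are illustrative) of the dispatch described above for a 16-byte non-temporal store:

```cpp
#include <cstdio>

enum class NTStoreLowering {
  KeepVectorNTStore, // naturally aligned: keep the vector NT store
  SplitToMOVNTSD,    // SSE4A: stay in vector registers, two f64 NT stores
  SplitToMOVNTI      // otherwise: move to GPRs, i32/i64 NT stores
};

// Alignment is in bytes; HasSSE4A models the subtarget feature query.
static NTStoreLowering lowerXmmNTStore(unsigned Alignment, bool HasSSE4A) {
  if (Alignment >= 16)
    return NTStoreLowering::KeepVectorNTStore;
  return HasSSE4A ? NTStoreLowering::SplitToMOVNTSD
                  : NTStoreLowering::SplitToMOVNTI;
}

int main() {
  // An 8-byte-aligned XMM NT store on a non-SSE4A target gets scalarized to
  // MOVNTI stores.
  std::printf("%d\n", static_cast<int>(lowerXmmNTStore(8, /*HasSSE4A=*/false)));
  return 0;
}
```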
* [X86][AVX] Split under-aligned vector nt-stores. | Simon Pilgrim | 2019-06-17 | 1 | -2/+13
| | | | | | If a YMM/ZMM non-temporal store has less than natural alignment, split the vector - either the halves will be satisfactorily aligned or they will continue to be split until they are XMMs, at which point the legalizer will scalarize them. llvm-svn: 363582
* [LV] Suppress vectorization in some nontemporal cases | Warren Ristow | 2019-06-17 | 2 | -0/+37
| | | | | | | | | | | | | | | | | | | | | When considering a loop containing nontemporal stores or loads for vectorization, suppress the vectorization if the corresponding vectorized store or load with the alignment of the original scalar memory op is not supported with the nontemporal hint on the target. This adds two new functions: bool isLegalNTStore(Type *DataType, unsigned Alignment) const; bool isLegalNTLoad(Type *DataType, unsigned Alignment) const; to TTI, leaving the target independent default implementation as returning true, but with overriding implementations for X86 that check the legality based on available Subtarget features. This fixes https://llvm.org/PR40759 Differential Revision: https://reviews.llvm.org/D61764 llvm-svn: 363581
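A hedged sketch of how the X86 override of these hooks might check legality (the real X86TTIImpl implementation may differ in its feature and size checks; ST and DL are assumed to be the subtarget pointer and DataLayout members that class already has):

```cpp
bool X86TTIImpl::isLegalNTStore(Type *DataType, unsigned Alignment) {
  uint64_t DataSize = DL.getTypeStoreSize(DataType);
  // Sketch: vector non-temporal stores must be naturally aligned and need the
  // matching MOVNT* instruction - SSE2 for 16-byte, AVX for 32-byte, AVX-512
  // for 64-byte vectors.
  if (DataType->isVectorTy()) {
    if (Alignment < DataSize)
      return false;
    if (DataSize == 64)
      return ST->hasAVX512();
    if (DataSize == 32)
      return ST->hasAVX();
    return DataSize == 16 && ST->hasSSE2();
  }
  // Scalar i32/i64 stores can use MOVNTI with SSE2.
  return ST->hasSSE2() && (DataSize == 4 || DataSize == 8);
}
```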
* [X86] combineLoad - begin making the load split code more generic. NFCI. | Simon Pilgrim | 2019-06-17 | 1 | -13/+12
| | | | | | | | This is currently only used for ymm->xmm splitting but we shouldn't hardcode the offsets/alignment. This is necessary for an upcoming patch to split under-aligned non-temporal vector loads. llvm-svn: 363570
* [X86][SSE] Prevent misaligned non-temporal vector load/store combines | Simon Pilgrim | 2019-06-17 | 1 | -4/+13
| | | | | | | | | | For loads, pre-SSE41 we can't perform NT loads at all, and after that we can only perform vector aligned loads, so if the alignment is less than for an xmm we'll just end up using the regular unaligned vector loads anyway. First step towards fixing PR42026 - the next step for stores will be to use SSE4A movntsd where possible and to avoid the stack spill on SSE2 targets. Differential Revision: https://reviews.llvm.org/D63246 llvm-svn: 363564
* [X86] Add TB_NO_REVERSE to some folding table entries where the register ↵ | Craig Topper | 2019-06-16 | 1 | -9/+9
| | | | | | | | | | | | form uses the REX prefix, but the memory form does not. It would not be safe to unfold the memory form to the register form without checking that we are compiling for 64-bit mode. This probably isn't a real functional issue since we are unlikely to unfold any of these instructions, because they don't have any tied registers, aren't commutable, and don't have any inputs other than the address. llvm-svn: 363523
* [x86] split 256-bit vector selects if operands are vector concats | Sanjay Patel | 2019-06-16 | 1 | -0/+36
| | | | | | | | | | | | | | | | | | | This is similar logic/motivation to the select splitting in D62969. In D63233, the pattern changes so that we no longer have an extract_subvector of vselect, but the operands of the select are still being concatenated. The closest case is represented in either the first or last test diffs here - we have an extra instruction, but we converted 3-4 ymm instructions into 4-5 xmm instructions. I think that's the right trade-off for most AVX1 targets. In the example based on PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 ...this makes the loop about 30% faster (tested on Haswell by compiling with -mavx). Differential Revision: https://reviews.llvm.org/D63364 llvm-svn: 363508
* [X86] CombineShuffleWithExtract - handle cases with different vector extract ↵ | Simon Pilgrim | 2019-06-16 | 1 | -6/+28
| | | | | | | | sources Insert the shorter vector source into an undef vector of the longer vector source's type. llvm-svn: 363507
* [X86] CombineShuffleWithExtract - assert all src ops types are multiples of ↵ | Simon Pilgrim | 2019-06-15 | 1 | -1/+2
| | | | | | rootsize. NFCI. llvm-svn: 363501
* [X86][AVX] Handle lane-crossing ↵ | Simon Pilgrim | 2019-06-15 | 1 | -49/+79
| | | | | | | | | | shuffle(extract_subvector(x,c1),extract_subvector(y,c2),m1) shuffles Pull out the existing (non)lane-crossing fold into a helper lambda and use for lane-crossing unary shuffles as well. Fixes PR34380 llvm-svn: 363500
* [X86][AVX] Decode constant bits from insert_subvector(c1, c2, c3) | Simon Pilgrim | 2019-06-15 | 1 | -0/+23
| | | | | | This mostly happens due to SimplifyDemandedVectorElts reducing a vector to insert_subvector(undef, c1, 0) llvm-svn: 363499
* [FPEnv] Lower STRICT_FP_EXTEND and STRICT_FP_ROUND nodes in preprocess phase ↵ | Kevin P. Neal | 2019-06-14 | 1 | -45/+110
| | | | | | | | | | | of ISelLowering to mirror non-strict nodes on x86. I recently discovered a bug on the x86 platform: The fp80 type was not handled well by x86 for constrained floating point nodes, as their regular counterparts are replaced by extending loads and truncating stores during the preprocess phase. Normally, platforms don't have this issue, as they don't typically attempt to perform such legalizations during instruction selection preprocessing. Before this change, strict_fp nodes survived until they were mutated to normal nodes, which happened shortly after preprocessing on other platforms. This modification lowers these nodes at the same phase while properly utilizing the chain. Submitted by: Drew Wock <drew.wock@sas.com> Reviewed by: Craig Topper, Kevin P. Neal Approved by: Craig Topper Differential Revision: https://reviews.llvm.org/D63271 llvm-svn: 363417
* Move commentary on opcode translation for code16 mov instructions | Eric Christopher | 2019-06-14 | 1 | -2/+2
| | | | | | | to segment registers closer to the segment register check for when we add further optimizations. llvm-svn: 363355
* [X86Disassembler] Unify the EVEX and VEX code in emitContextTable. Merge the ↵ | Craig Topper | 2019-06-13 | 1 | -1/+1
| | | | | | | | | | ATTR_VEXL/ATTR_EVEXL bits. NFCI Merging the two bits shrinks the context table from 16384 bytes to 8192 bytes. Remove the ATTRIBUTE_BITS macro and just create an enum directly. Then fix the ATTR_max define to be 8192 to reflect the table size so we stop hardcoding it separately. llvm-svn: 363330
* [X86] Use fresh MemOps when emitting VAARG64 | Simon Pilgrim | 2019-06-13 | 1 | -8/+15
| | | | | | | | | | Previously it copied over MachineMemOperands verbatim which caused MOV32rm to have store flags set, and MOV32mr to have load flags set. This fixes some assertions being thrown with EXPENSIVE_CHECKS on. Committed on behalf of @luke (Luke Lau) Differential Revision: https://reviews.llvm.org/D62726 llvm-svn: 363268
* [CodeGen] Add getMachineMemOperand + MachineMemOperand::Flags allocator ↵ | Simon Pilgrim | 2019-06-13 | 1 | -8/+2
| | | | | | | | helper wrapper. NFCI. Pre-commit for D62726 on behalf of @luke (Luke Lau) llvm-svn: 363257
* [X86][SSE] Avoid assert for broadcast(horiz-op()) cases for non-f64 cases. | Simon Pilgrim | 2019-06-13 | 1 | -6/+9
| | | | | | Based on fuzz test from @craig.topper llvm-svn: 363251
* X86: Clean up pass initialization | Tom Stellard | 2019-06-13 | 14 | -43/+21
| | | | | | | | | | | | | | | | | | | | Summary: - Remove redundant initializations from pass constructors that were already being initialized by LLVMInitializeX86Target(). - Add initialization function for the FPS pass. Reviewers: craig.topper Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63218 llvm-svn: 363221
* [TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests ↵ | Simon Pilgrim | 2019-06-12 | 2 | -6/+6
| | | | | | | | | | | | | | (PR42123) As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space. This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them. If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores. Differential Revision: https://reviews.llvm.org/D63075 llvm-svn: 363179
* [X86][AVX] Fold concat(vpermilps(x,c),vpermilps(y,c)) -> ↵ | Simon Pilgrim | 2019-06-12 | 1 | -0/+30
| | | | | | | | | | vpermilps(concat(x,y),c) Handles PSHUFD/PSHUFLW/PSHUFHW (AVX2) + VPERMILPS (AVX1). An extra AVX1 PSHUFD->VPERMILPS combine will be added in a future commit. llvm-svn: 363178
* [X86] Add VCMPSSZrr_Intk and VCMPSDZrr_Intk to isNonFoldablePartialRegisterLoad. | Craig Topper | 2019-06-12 | 1 | -0/+2
| | | | | | | | | The non-masked versions are already in there. I'm having some trouble coming up with a way to test this right now. Most load folding should happen during isel so I'm not sure how to get the peephole pass to do it. llvm-svn: 363125
* [TargetLowering] Add allowsMemoryAccess(MachineMemOperand) helper wrapper. NFCI. | Simon Pilgrim | 2019-06-11 | 1 | -9/+4
| | | | | | As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075. llvm-svn: 363048
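A hedged sketch of the kind of wrapper described here (the exact overload signature in TargetLowering may differ slightly from this):

```cpp
// Forward the address space, alignment and flags from a load/store's
// MachineMemOperand to the existing allowsMemoryAccess overload.
bool TargetLoweringBase::allowsMemoryAccess(LLVMContext &Context,
                                            const DataLayout &DL, EVT VT,
                                            const MachineMemOperand &MMO,
                                            bool *Fast) const {
  return allowsMemoryAccess(Context, DL, VT, MMO.getAddrSpace(),
                            MMO.getAlignment(), MMO.getFlags(), Fast);
}
```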