path: root/llvm/lib/Target/X86

Commit log, most recent first. Each entry shows the subject, then the author, date, and diffstat (files changed, -deleted/+added lines), then the commit message and llvm-svn revision.
* [x86] fix formatting; NFC
  Sanjay Patel, 2016-11-28 (1 file, -16/+14)
  llvm-svn: 288045
* [X86][SSE] Added support for combining bit-shifts with shuffles.
  Simon Pilgrim, 2016-11-28 (1 file, -5/+57)
  Bit-shifts by a whole number of bytes can be represented as a shuffle mask suitable for combining. Added a 'getFauxShuffleMask' function to allow us to create shuffle masks from other suitable operations.
  llvm-svn: 288040
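As a hedged illustration of the idea behind treating byte shifts as shuffles (the helpers below are a model, not the LLVM getFauxShuffleMask implementation): a left shift of a 16-byte vector by a whole number of bytes is equivalent to a byte-shuffle whose mask reads each lane from position i - N and zeroes the low N lanes.

```python
def shift_as_shuffle_mask(n_bytes):
    # -1 denotes a zeroed lane, matching the usual shuffle-mask convention
    return [-1 if i < n_bytes else i - n_bytes for i in range(16)]

def apply_shuffle(vec, mask):
    # materialize a byte shuffle; -1 lanes become zero
    return [0 if m == -1 else vec[m] for m in mask]

def byte_shift_left(vec, n_bytes):
    # reference result: shift the 128-bit value left by 8*n_bytes (little-endian lanes)
    val = int.from_bytes(bytes(vec), "little") << (8 * n_bytes)
    return list((val & ((1 << 128) - 1)).to_bytes(16, "little"))
```

For every byte count 0..16 the shuffle form and the arithmetic shift agree lane for lane, which is what makes the mask safe to feed into shuffle combining.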
* [X86][FMA4] Remove isCommutable from FMA4 scalar intrinsics.
  Craig Topper, 2016-11-27 (1 file, -1/+0)
  They aren't commutable, as operand 0 should pass its upper bits through to the output.
  llvm-svn: 288011
* [X86][FMA] Add missing Predicates qualifier around scalar FMA intrinsic patterns.
  Craig Topper, 2016-11-27 (1 file, -6/+8)
  llvm-svn: 288010
* [X86][FMA4] Add load folding support for FMA4 scalar intrinsic instructions.
  Craig Topper, 2016-11-27 (1 file, -0/+20)
  llvm-svn: 288009
* [X86] Add SHL by 1 to the load folding tables.
  Craig Topper, 2016-11-27 (1 file, -0/+4)
  I don't think isel selects these today, favoring adding the register to itself instead. But the load folding tables shouldn't be so concerned with what isel will use and should just represent the relationships.
  llvm-svn: 288007
* [X86][SSE] Add support for combining target shuffles to 128/256-bit PSLL/PSRL bit shifts.
  Simon Pilgrim, 2016-11-27 (1 file, -49/+22)
  llvm-svn: 288006
* [AVX-512] Add integer and fp unpck instructions to load folding tables.
  Craig Topper, 2016-11-27 (1 file, -0/+108)
  llvm-svn: 288004
* [X86][SSE] Split lowerVectorShuffleAsShift ready for combines. NFCI.
  Simon Pilgrim, 2016-11-27 (1 file, -31/+60)
  Moved most of the matching code into matchVectorShuffleAsShift to share with target shuffle combines (in a future commit).
  llvm-svn: 288003
* [X86] Add TB_NO_REVERSE to entries in the load folding table where the instruction's load size is smaller than the register size.
  Craig Topper, 2016-11-27 (1 file, -188/+206)
  If we were to unfold these, the load size would be increased to the register size. This is not safe to do, since the enlarged load can do things like cross a page boundary into a page that doesn't exist. I probably missed some instructions, but this should be a large portion of them.
  llvm-svn: 288001
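A hedged sketch of why the enlarged load is unsafe (the page size and addresses below are illustrative assumptions, not taken from LLVM): an 8-byte load ending exactly at a page boundary stays within its page, while the same access widened to 16 bytes crosses into the next, possibly unmapped, page.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only

def pages_touched(addr, size):
    # set of page indices covered by the access [addr, addr + size)
    return set(range(addr // PAGE_SIZE, (addr + size - 1) // PAGE_SIZE + 1))

# an 8-byte slot whose last byte is the last byte of page 1
addr = 2 * PAGE_SIZE - 8
```

The narrow load touches only page 1; widening it to the full register size drags in page 2, which may not exist, hence TB_NO_REVERSE forbidding the unfold.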
* [AVX-512] Add masked EVEX vpmovzx/sx instructions to load folding tables.
  Craig Topper, 2016-11-27 (1 file, -0/+84)
  llvm-svn: 287995
* [X86] Remove alignment restrictions from load folding table for some instructions that don't have a restriction.
  Craig Topper, 2016-11-27 (1 file, -13/+13)
  Most of these are the SSE4.1 PMOVZX/PMOVSX instructions, which all read less than 128 bits. The only other one was PMOVUPD, which by definition is an unaligned load.
  llvm-svn: 287991
* [X86] Remove hasOneUse check that is redundant with the one in IsProfitableToFold.
  Craig Topper, 2016-11-26 (1 file, -2/+0)
  llvm-svn: 287987
* [X86] Fix the zero extending load detection in X86DAGToDAGISel::selectScalarSSELoad to pass the load node to IsProfitableToFold and IsLegalToFold.
  Craig Topper, 2016-11-26 (1 file, -11/+12)
  Previously we were passing the SCALAR_TO_VECTOR node.
  llvm-svn: 287986
* [X86] Simplify control flow. NFCI
  Craig Topper, 2016-11-26 (1 file, -3/+2)
  llvm-svn: 287985
* [X86] Add a hasOneUse check to selectScalarSSELoad to keep the same load from being folded multiple times.
  Craig Topper, 2016-11-26 (1 file, -3/+6)
  Summary: When selectScalarSSELoad is looking for a scalar_to_vector of a scalar load, it makes sure the load is only used by the scalar_to_vector. But it doesn't make sure the scalar_to_vector is only used once. This can cause the same load to be folded multiple times, which can be bad for performance. It also causes the chain output to be duplicated but not connected to anything, so chain dependencies will not be satisfied.
  Reviewers: RKSimon, zvi, delena, spatel
  Subscribers: andreadb, llvm-commits
  Differential Revision: https://reviews.llvm.org/D26790
  llvm-svn: 287983
* [AVX-512] Add unmasked EVEX vpmovzx/sx instructions to load folding tables.
  Craig Topper, 2016-11-26 (1 file, -0/+36)
  llvm-svn: 287975
* [AVX-512] Add masked 128/256-bit integer add/sub instructions to load folding tables.
  Craig Topper, 2016-11-26 (1 file, -0/+64)
  llvm-svn: 287974
* [AVX-512] Add masked 512-bit integer add/sub instructions to load folding tables.
  Craig Topper, 2016-11-26 (1 file, -0/+31)
  llvm-svn: 287972
* [AVX-512] Teach LowerFormalArguments to use the extended register class when available.
  Craig Topper, 2016-11-26 (1 file, -4/+4)
  Fix the avx512vl stack folding tests to clobber more registers; otherwise they use xmm16 after this change.
  llvm-svn: 287971
* [AVX-512] Add VLX versions of VDIVPD/PS and VMULPD/PS to load folding tables.
  Craig Topper, 2016-11-26 (1 file, -0/+8)
  llvm-svn: 287970
* [X86][XOP] Add a reversed reg/reg form for VPROT instructions.
  Craig Topper, 2016-11-26 (1 file, -0/+7)
  The W bit distinguishes which operand is the memory operand. But if the mod bits are 3 then the "memory" operand is a register, and there are two possible encodings. We already did this correctly for several other XOP instructions.
  llvm-svn: 287961
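For readers unfamiliar with the encoding detail above, here is a hedged model of the relevant ModRM arithmetic (the helper names are mine, not LLVM's): in x86 encoding, a mod field of 0b11 means the r/m operand is a register rather than memory, which is what makes two distinct reg/reg encodings possible when a separate bit chooses which operand position is the "memory" one.

```python
def decode_modrm(byte):
    # split a ModRM byte into its mod / reg / rm bit fields
    return {"mod": (byte >> 6) & 0b11,
            "reg": (byte >> 3) & 0b111,
            "rm":  byte & 0b111}

def rm_is_register(byte):
    # mod == 0b11 selects register-direct addressing for the r/m operand
    return decode_modrm(byte)["mod"] == 0b11
```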
* [X86] Add SSE, AVX, and AVX2 versions of MOVDQU to the load/store folding tables for consistency.
  Craig Topper, 2016-11-26 (1 file, -0/+6)
  Not sure this is truly needed, but we had the floating point equivalents, the aligned equivalents, and the EVEX equivalents, so this just makes it complete.
  llvm-svn: 287960
* [AVX-512] Put the AVX-512 sections of the load folding tables into mostly alphabetical order.
  Craig Topper, 2016-11-25 (1 file, -365/+373)
  This is consistent with the older sections of the table. NFC
  llvm-svn: 287956
* Use SDValue helper instead of explicitly going via SDValue::getNode(). NFCI
  Simon Pilgrim, 2016-11-25 (1 file, -5/+5)
  llvm-svn: 287940
* [AVX-512] Add support for changing VSHUFF64x2 to VSHUFF32x4 when it's feeding a vselect with 32-bit element size.
  Craig Topper, 2016-11-25 (1 file, -9/+25)
  Summary: Shuffle lowering may have widened the element size of an i32 shuffle to i64 before selecting X86ISD::SHUF128. If this shuffle was used by a vselect, this can prevent us from selecting masked operations. This patch detects this and changes the element size to match the vselect. I don't handle changing integer to floating point or vice versa, as it's not clear whether it's better to push such a bitcast to the inputs of the shuffle or to the user of the vselect, so I'm ignoring that case for now.
  Reviewers: delena, zvi, RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D27087
  llvm-svn: 287939
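A hedged model of the element-size change (a simplification of the real SHUF128 handling, with helper names of my own): a shuffle mask over 64-bit elements can be rewritten over 32-bit elements by expanding each index m into the pair 2m, 2m+1, and the two forms move exactly the same bytes.

```python
def narrow_mask_64_to_32(mask64):
    # each 64-bit lane index m covers 32-bit lanes 2m and 2m+1
    out = []
    for m in mask64:
        out += [2 * m, 2 * m + 1]
    return out

def shuffle(vec, mask):
    return [vec[m] for m in mask]

def split32(vec64):
    # view a list of 64-bit lanes as 32-bit lanes (little-endian low, high pairs)
    out = []
    for v in vec64:
        out += [v & 0xFFFFFFFF, v >> 32]
    return out
```

Shuffling the 64-bit lanes and then reinterpreting as 32-bit lanes gives the same result as shuffling the 32-bit view with the expanded mask, which is why the element size can be changed to match the vselect.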
* [AVX-512] Add VPERMT2* and VPERMI2* instructions to load folding tables.
  Craig Topper, 2016-11-25 (1 file, -0/+32)
  llvm-svn: 287937
* [X86] Invert an 'if' and early out to fix a weird indentation. NFCI
  Craig Topper, 2016-11-25 (1 file, -1/+2)
  llvm-svn: 287909
* [X86] Size a SmallVector to the worst case mask size for a 512-bit shuffle. NFCI
  Craig Topper, 2016-11-25 (1 file, -1/+1)
  llvm-svn: 287908
* Fix unused variable warning
  Simon Pilgrim, 2016-11-24 (1 file, -1/+0)
  llvm-svn: 287889
* [X86] Don't round trip a unique_ptr through a raw pointer for assignment.
  Benjamin Kramer, 2016-11-24 (1 file, -1/+1)
  No functional change.
  llvm-svn: 287888
* [X86][SSE] Improve UINT_TO_FP v2i32 -> v2f64
  Simon Pilgrim, 2016-11-24 (1 file, -8/+38)
  Vectorize UINT_TO_FP v2i32 -> v2f64 instead of scalarization (albeit still on the SIMD unit). The codegen matches that generated by legalization (and is in fact used by AVX for UINT_TO_FP v4i32 -> v4f64), but has to be done in the x86 backend to account for legalization via 4i32.
  Differential Revision: https://reviews.llvm.org/D26938
  llvm-svn: 287886
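The lowering legalization produces here is, to my understanding, the classic exponent-bias trick; below is a hedged scalar model of it (the constant name is mine): OR the 32-bit integer into the mantissa of the double 2^52, reinterpret the bits, and subtract 2^52, which yields the integer's value exactly.

```python
import struct

EXP52 = 0x4330000000000000  # bit pattern of the double 2.0 ** 52

def uint32_to_double(x):
    # x < 2**32 fits in the low mantissa bits of 2**52, so 2**52 + x is exact
    bits = EXP52 | x
    (d,) = struct.unpack("<d", struct.pack("<Q", bits))
    return d - 2.0 ** 52
```

Per lane this is just an OR, a bitcast, and a subtract, which is why it vectorizes so cheaply compared with scalarizing the conversion.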
* [X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets.
  Simon Pilgrim, 2016-11-24 (3 files, -5/+29)
  Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances.
  llvm-svn: 287882
* [X86][AVX512DQVL] Add awareness of vcvtqq2ps and vcvtuqq2ps implicit zeroing of upper 64-bits of xmm result.
  Simon Pilgrim, 2016-11-24 (1 file, -0/+11)
  llvm-svn: 287878
* [X86][AVX512DQVL] Add support for v2i64 -> v2f32 SINT_TO_FP/UINT_TO_FP lowering.
  Simon Pilgrim, 2016-11-24 (1 file, -4/+22)
  llvm-svn: 287877
* [x86] Fixing PR28755 by precomputing the address used in CMPXCHG8B
  Nikolai Bozhenov, 2016-11-24 (3 files, -1/+63)
  The bug arises during register allocation on i686 for the CMPXCHG8B instruction when a base pointer is needed. CMPXCHG8B needs four implicit registers (EAX, EBX, ECX, EDX) and a memory address, while ESI is reserved as the base pointer. With such constraints, the only way the register allocator can do its job successfully is when the addressing mode of the instruction requires only one register. If that is not the case, we emit an additional LEA instruction to compute the address.
  Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>
  Differential Revision: https://reviews.llvm.org/D25088
  llvm-svn: 287875
* [x86] Minor refactoring of X86TargetLowering::EmitInstrWithCustomInserter
  Nikolai Bozhenov, 2016-11-24 (1 file, -10/+6)
  Move the definitions of three variables out of the switch.
  Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>
  Differential Revision: https://reviews.llvm.org/D25192
  llvm-svn: 287874
* [x86] Rewrite getAddressFromInstr helper function
  Nikolai Bozhenov, 2016-11-24 (1 file, -17/+18)
  - It does not modify the input instruction.
  - The second operand of any address is always an index register; make sure we actually check for that, instead of checking for an immediate value.
  Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>
  Differential Revision: https://reviews.llvm.org/D24938
  llvm-svn: 287873
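For context on what an "address" means in these memory-operand helpers, a hedged model of the x86 addressing-mode arithmetic (field names are mine, and the segment operand is ignored): an effective address is base + index * scale + displacement, with the scale restricted to 1, 2, 4, or 8.

```python
def effective_address(base=0, index=0, scale=1, disp=0):
    # x86 memory operand: base + index * scale + disp
    assert scale in (1, 2, 4, 8), "x86 only encodes these scale factors"
    return base + index * scale + disp
```

An address that uses only a displacement plus one register is the single-register addressing mode the CMPXCHG8B fix above falls back to via LEA.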
* [X86] Generalize CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes. NFCI
  Simon Pilgrim, 2016-11-24 (6 files, -58/+54)
  Replace the CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes with general versions. This is an initial step towards similar FP_TO_SINT/FP_TO_UINT and SINT_TO_FP/UINT_TO_FP lowering to AVX512 CVTTPS2QQ/CVTTPS2UQQ and CVTQQ2PS/CVTUQQ2PS with illegal types.
  Differential Revision: https://reviews.llvm.org/D27072
  llvm-svn: 287870
* [X86][SSE] Add awareness of (v)cvtpd2dq and vcvtpd2udq implicit zeroing of upper 64-bits of xmm result.
  Simon Pilgrim, 2016-11-23 (2 files, -15/+30)
  We've already added the equivalent for (v)cvttpd2dq (rL284459) and vcvttpd2udq.
  llvm-svn: 287835
* [X86] Allow folding of stack reloads when loading a subreg of the spilled reg
  Michael Kuperstein, 2016-11-23 (2 files, -0/+20)
  We did not support subregs in InlineSpiller::foldMemoryOperand() because targets may not deal with them correctly. This adds a target hook to let the spiller know that a target can handle subregs, and actually enables it for x86 for the case of stack slot reloads. This fixes PR30832.
  Differential Revision: https://reviews.llvm.org/D26521
  llvm-svn: 287792
* [X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets.
  Simon Pilgrim, 2016-11-23 (3 files, -4/+50)
  Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances.
  llvm-svn: 287762
* [CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costs
  Simon Pilgrim, 2016-11-23 (1 file, -6/+12)
  llvm-svn: 287760
* [AVX-512] Remove intrinsics for valignd/q and autoupgrade them to native shuffles.
  Craig Topper, 2016-11-23 (1 file, -12/+0)
  llvm-svn: 287744
* [X86] Simplify lowerVectorShuffleAsBitMask to handle only integer VTs
  Zvi Rackover, 2016-11-23 (1 file, -13/+5)
  Summary: This function is only called with integer VT arguments, so remove code that handles FP vectors.
  Reviewers: RKSimon, craig.topper, delena, andreadb
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D26985
  llvm-svn: 287743
* [xray] Add XRay support for Mach-O in CodeGen
  Kuba Mracek, 2016-11-23 (1 file, -26/+34)
  Currently, XRay only supports emitting the XRay table (xray_instr_map) on ELF binaries. Let's add Mach-O support.
  Differential Revision: https://reviews.llvm.org/D26983
  llvm-svn: 287734
* [X86][SSE] Combine UNPCKL(FHADD,FHADD) -> FHADD for v2f64 shuffles.
  Simon Pilgrim, 2016-11-22 (1 file, -3/+12)
  This occurs during UINT_TO_FP v2f64 lowering. We can easily generalize this to other horizontal ops (FHSUB, PACKSS, PACKUS) as required - we are doing something similar with PACKUS in lowerV2I64VectorShuffle.
  llvm-svn: 287676
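A hedged numeric model of the combine (pure-Python stand-ins for the v2f64 operations, not LLVM code): with fhadd([a0, a1], [b0, b1]) = [a0 + a1, b0 + b1] and unpcklpd(x, y) = [x0, y0], interleaving the low lanes of the same FHADD result equals an FHADD of its first input with itself, so the shuffle can be folded away.

```python
def fhadd(a, b):
    # v2f64 horizontal add: [a0 + a1, b0 + b1]
    return [a[0] + a[1], b[0] + b[1]]

def unpcklpd(x, y):
    # interleave the low f64 lanes: [x0, y0]
    return [x[0], y[0]]
```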
* CodeGen: simplify TargetMachine::getSymbol interface. NFC.
  Tim Northover, 2016-11-22 (1 file, -3/+3)
  No-one actually had a mangler handy when calling this function, and getSymbol itself went most of the way towards getting its own mangler (with a local TLOF variable), so forcing all callers to supply one was just extra complication.
  llvm-svn: 287645
* [X86] Change lowerBuildVectorToBitOp() to take a BuildVectorSDNode. NFC.
  Zvi Rackover, 2016-11-22 (1 file, -5/+6)
  llvm-svn: 287644
* [X86] Remove dead code from LowerVectorBroadcast
  Zvi Rackover, 2016-11-22 (1 file, -73/+18)
  Summary: Splat vectors are canonicalized to BUILD_VECTORs, so the code can be simplified. NFC-ish.
  Reviewers: craig.topper, delena, RKSimon, andreadb
  Subscribers: RKSimon, llvm-commits
  Differential Revision: https://reviews.llvm.org/D26678
  llvm-svn: 287643