path: root/llvm/lib/Target/X86
Commit message | Author | Age | Files | Lines
* [X86][SSE] Fold (VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1 | Simon Pilgrim | 2017-09-30 | 1 | -0/+9
    Remove the sign-extend-in-register style pattern if the sign is already extended enough.
    llvm-svn: 314599
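The fold above can be checked on a single lane with a small model. This is only a sketch, not the LLVM implementation: the helper names (`to_signed`, `num_sign_bits`, `shl_then_sra`, `fold_is_safe`) are hypothetical, and `num_sign_bits` computes the exact count for a known value, whereas a compiler analysis would normally only know a conservative lower bound.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def num_sign_bits(value, bits):
    """Count the leading bits that equal the sign bit (always at least 1)."""
    value &= (1 << bits) - 1
    sign = value >> (bits - 1)
    count = 0
    for i in range(bits - 1, -1, -1):
        if (value >> i) & 1 != sign:
            break
        count += 1
    return count


def shl_then_sra(value, shift, bits):
    """Model (VSRAI (VSHLI x, c), c) on one lane: shift left, then arithmetic shift right."""
    shifted = (value << shift) & ((1 << bits) - 1)
    # Python's >> on a negative int is an arithmetic shift.
    return to_signed(shifted, bits) >> shift


def fold_is_safe(value, shift, bits):
    """The condition from the commit subject: NumSignBits(X) > C1."""
    return num_sign_bits(value, bits) > shift
```

For example, the 16-bit value 0xFFFD (-3) has 14 sign bits, so shifting left and then arithmetically right by 8 gives back -3 and the pair of shifts is redundant.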
* [AVX-512] Add patterns to make fp compare instructions commutable during isel. | Craig Topper | 2017-09-30 | 2 | -1/+90
    llvm-svn: 314598
* Code refactoring for the interleaved code <NFC> | Michael Zuckerman | 2017-09-30 | 1 | -28/+18
    Change-Id: I7831c9febad8e14278a5bc87584a0053dc837be1
    llvm-svn: 314596
* [X86] Support v64i8 mulhu/mulhs | Craig Topper | 2017-09-30 | 1 | -1/+9
    Implemented by splitting into two v32i8 mulhu/mulhs and concatenating the results.
    Differential Revision: https://reviews.llvm.org/D38307
    llvm-svn: 314584
* [X86] Improve codegen for inverted overflow checking intrinsics. | Amara Emerson | 2017-09-29 | 1 | -0/+20
    Adds a new combine for:
        xor(setcc cc, val), 1 --> setcc (invert(cc), val)
    Differential Revision: https://reviews.llvm.org/D38161
    llvm-svn: 314514
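The combine above rests on the fact that flipping the 0/1 result of a setcc is the same as testing the inverted condition on the same flags. A minimal sketch, assuming a toy flags model with only the overflow and carry flags; the dictionaries and function names are illustrative and are not LLVM APIs:

```python
# Each condition code maps a flags dictionary to a boolean.
CONDITION = {
    "o": lambda flags: flags["OF"],        # overflow
    "no": lambda flags: not flags["OF"],   # no overflow
    "b": lambda flags: flags["CF"],        # below (carry set)
    "ae": lambda flags: not flags["CF"],   # above or equal (carry clear)
}

# Every condition code has an inverse that tests the opposite outcome.
INVERT = {"o": "no", "no": "o", "b": "ae", "ae": "b"}


def setcc(cc, flags):
    """Materialize the condition as 0 or 1, like an x86 SETcc."""
    return int(CONDITION[cc](flags))


def xor_setcc(cc, flags):
    """The pattern before the combine: xor (setcc cc, flags), 1."""
    return setcc(cc, flags) ^ 1


def combined(cc, flags):
    """The pattern after the combine: setcc (invert(cc), flags)."""
    return setcc(INVERT[cc], flags)
```

Both forms agree for every combination of flags, which is why the xor can be dropped.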
* Small modification <NFC> | Michael Zuckerman | 2017-09-29 | 1 | -1/+1
    Change-Id: I360abccee12cae29bd2ac4f8399c9ecc92eb7f13
    llvm-svn: 314510
* [X86][MS-InlineAsm] Extended support for variables / identifiers on memory / immediate expressions | Coby Tayree | 2017-09-29 | 1 | -60/+90
    Allow proper recognition of enum values and global variables inside MS inline-asm memory / immediate expressions, as they require some additional overhead and are treated incorrectly if not recognized early.
    Supersedes D33278, D35774.
    Differential Revision: https://reviews.llvm.org/D37412
    llvm-svn: 314493
* [X86] Don't select (cmp (and, imm), 0) to testw | Craig Topper | 2017-09-28 | 1 | -1/+4
    Summary: X86ISelDAGToDAG tries to analyze ANDs compared with 0 to optimize to narrower immediates using subregisters.
    I don't think we should be optimizing to 16-bit test instructions. It goes against our normal behavior of promoting i16 operations to i32. It only saves one byte due to the need to add a 0x66 prefix. I think it would also be subject to a length-changing prefix penalty in the decoders on Intel CPUs.
    Reviewers: RKSimon, zvi, spatel
    Reviewed By: spatel
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D38273
    llvm-svn: 314474
* [X86] Make use of vpmovwb when possible in LowerMULH | Craig Topper | 2017-09-28 | 1 | -15/+8
    If we have BWI, we can truncate in a much simpler way by using vpmovwb. This even works without VLX by using the wider zmm->ymm truncate with a subvector extract.
    Differential Revision: https://reviews.llvm.org/D38375
    llvm-svn: 314457
* [X86] Use target independent ZERO_EXTEND/SIGN_EXTEND nodes where possible in LowerMULH | Craig Topper | 2017-09-28 | 1 | -9/+10
    We aren't doing any in-register extends here, so we should be able to just use the target independent nodes directly and allow them to be lowered as necessary.
    llvm-svn: 314447
* [X86] Move a setOperation action for ISD::TRUNCATE near another one in the same if. Remove one that is redundant with another subtarget feature. | Craig Topper | 2017-09-28 | 1 | -2/+1
    llvm-svn: 314446
* [X86] Use BWI instructions to improve lowering of v32i8 MULHU/S | Craig Topper | 2017-09-28 | 1 | -0/+18
    Summary: If we have BWI instructions we can widen to v32i16 to do the multiply instead of splitting.
    Reviewers: RKSimon, spatel, zvi
    Reviewed By: zvi
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D38305
    llvm-svn: 314432
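The widening approach can be illustrated per lane: extend each 8-bit element to 16 bits, multiply, and keep the high byte of the product. The sketch below is a scalar model under that assumption; the function names are hypothetical, and the shift-and-truncate step is how the high half is usually obtained rather than something this commit message spells out.

```python
def to_signed8(value):
    """Interpret the low 8 bits of value as a signed byte."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def mulhu_i8(a, b):
    """Unsigned high half: zero-extend to 16 bits, multiply, keep bits 15..8."""
    wide = (a & 0xFF) * (b & 0xFF)
    return (wide >> 8) & 0xFF


def mulhs_i8(a, b):
    """Signed high half: sign-extend to 16 bits, multiply, keep bits 15..8."""
    wide = to_signed8(a) * to_signed8(b)
    return (wide >> 8) & 0xFF  # >> is arithmetic on negative Python ints


def mulh_vector(lhs, rhs, signed):
    """Apply the per-lane operation across two equally sized vectors of bytes."""
    op = mulhs_i8 if signed else mulhu_i8
    return [op(a, b) for a, b in zip(lhs, rhs)]
```

For instance, `mulhu_i8(255, 255)` keeps the high byte of 0xFE01, and `mulhs_i8(0x80, 0x7F)` keeps the high byte of -16256 (0xC080 in 16 bits).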
* [X86] Remove dead code from X86ISelDAGToDAG.cpp multiply handling | Craig Topper | 2017-09-28 | 1 | -1/+1
    Summary: Lowering never creates X86ISD::UMUL for 8-bit types. X86ISD::UMUL8 is used instead. If X86ISD::UMUL 8-bit were ever used it would crash.
    DAGCombiner replaces UMUL_LOHI/SMUL_LOHI with a wider MUL and a shift if the type twice as wide is legal. So we should never see i8 UMUL_LOHI/SMUL_LOHI. In fact I think there was a bug in part of the i8 code. The same is true for i16, though without the bug.
    Reviewers: RKSimon, spatel, zvi
    Reviewed By: zvi
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D38276
    llvm-svn: 314430
* [X86] Use correct subvector index when combining two insert subvectors featuring zero vectors. | Craig Topper | 2017-09-28 | 1 | -1/+1
    Previously we were using one of the subvector indices twice. The included test case causes an assert without this change.
    Thanks to Simon Pilgrim for catching this.
    llvm-svn: 314429
* Use SDValue::getConstantOperandVal helper. NFCI. | Simon Pilgrim | 2017-09-28 | 1 | -3/+3
    llvm-svn: 314425
* [x86][AsmParser] Allow some more MS size directives | Coby Tayree | 2017-09-28 | 1 | -0/+3
    MS allows the following size directives: float/double and long, as synonyms for dword/qword and dword, respectively.
    Differential Revision: https://reviews.llvm.org/D37190
    llvm-svn: 314410
* [MachineOutliner] AArch64: Avoid saving + restoring LR if possible | Jessica Paquette | 2017-09-27 | 2 | -39/+59
    This commit allows the outliner to avoid saving and restoring the link register on AArch64 when it is dead within an entire class of candidates.
    This introduces changes to the way the outliner interfaces with the target. For example, the target now interfaces with the outliner using a MachineOutlinerInfo struct rather than by using getOutliningCallOverhead and getOutliningFrameOverhead.
    This also improves several comments on the outliner's cost model.
    https://reviews.llvm.org/D36721
    llvm-svn: 314341
* Revert r314249 "Recommit r314151 "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST.""" | Craig Topper | 2017-09-27 | 4 | -37/+27
    This caused PR34751
    llvm-svn: 314339
* Revert r314248 "[X86] Don't emit X86::MOV8rr_NOREX from X86InstrInfo::copyPhysReg." | Craig Topper | 2017-09-27 | 1 | -5/+7
    This contributed to PR34751
    llvm-svn: 314338
* [X86][SSE] Pull out variable shuffle mask combine logic. NFCI. | Simon Pilgrim | 2017-09-27 | 1 | -10/+13
    Hopefully this will make it easier to vary the combine depth threshold per-target.
    llvm-svn: 314337
* [X86] Rewrite the zero vector checks in lowerV2X128VectorShuffle to use the Zeroable APInt | Craig Topper | 2017-09-27 | 1 | -23/+10
    We already have zeroable bits in an APInt. We might as well use that instead of checking for an all zero BUILD_VECTOR.
    Differential Revision: https://reviews.llvm.org/D37950
    llvm-svn: 314332
* [X86] In combineLoopSADPattern, pad result with zeros and use full size add instead of using a smaller add and inserting. | Craig Topper | 2017-09-27 | 1 | -10/+7
    In some cases the result psadbw is smaller than the type of the add that started the match. Currently in these cases we are using a smaller add and inserting the result.
    If we instead combine the psadbw with zeros and use the full size add we can take advantage of implicit zeroing we get if we emit a narrower move before the add.
    In a future patch, I want to make isel aware that the psadbw itself already zeroed the upper bits and remove the move entirely.
    Differential Revision: https://reviews.llvm.org/D37453
    llvm-svn: 314331
* [X86][AsmParser] fix PR32035 | Coby Tayree | 2017-09-27 | 1 | -0/+2
    Differential Revision: https://reviews.llvm.org/D37473
    llvm-svn: 314295
* [X86][AVX] Improve (i4 bitcast (v4i1 x)) handling for 256-bit vector compare results. | Simon Pilgrim | 2017-09-27 | 1 | -1/+1
    As commented on D37849 and rL313547, AVX1 targets were missing a chance to use vmovmskpd for v4f64/v4i64 results for bool vector bitcasts.
    llvm-svn: 314293
* [X86] Fix SJLJ struct offsets for x86_64 | Martin Storsjo | 2017-09-27 | 1 | -2/+2
    This is necessary, but not sufficient, for having working SJLJ exception handling on x86_64.
    Differential Revision: https://reviews.llvm.org/D38254
    llvm-svn: 314277
* [X86] Remove erroneous callsite offsetting in SJLJ landing pads | Martin Storsjo | 2017-09-27 | 1 | -6/+2
    The callsite value is already stored indexed from 0 in the _Unwind_Context struct. When accessed via the functions _Unwind_GetIP and _Unwind_SetIP, the value is indexed from 1, but those functions handle the offsetting. When reading directly from the struct here, we shouldn't subtract 1.
    This matches the code generated by the ARM target, where SJLJ exception handling is used by default on iOS.
    This makes clang-built object files for 32-bit x86 mingw work when linked with libgcc/libstdc++.
    Differential Revision: https://reviews.llvm.org/D38251
    llvm-svn: 314276
* [X86] Use extract128BitVector in LowerMULH so we can extract from constant build vectors. | Craig Topper | 2017-09-27 | 1 | -5/+6
    llvm-svn: 314274
* [X86] Fix register class name in a comment. NFC | Craig Topper | 2017-09-26 | 1 | -1/+1
    llvm-svn: 314250
* Recommit r314151 "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST."" | Craig Topper | 2017-09-26 | 4 | -27/+37
    The late MOV8rr_NOREX that caused the crash has been removed.
    llvm-svn: 314249
* [X86] Don't emit X86::MOV8rr_NOREX from X86InstrInfo::copyPhysReg. | Craig Topper | 2017-09-26 | 1 | -7/+5
    This hook is called after register allocation with two physical registers. We don't need a separate instruction at that time to force register class constraints. I left in the assert though. We also have a fatal error in X86MCCodeEmitter if we ever encode an H-reg and a REX prefix.
    llvm-svn: 314248
* [X86] Fix typo in comment. NFC | Craig Topper | 2017-09-26 | 1 | -1/+1
    llvm-svn: 314247
* [X86][LLVM] Expand support for lowerInterleavedStore() in X86InterleavedAccess (VF{8|16|32} stride 3) | Michael Zuckerman | 2017-09-26 | 1 | -3/+139
    This patch expands the support of lowerInterleavedStore to {8|16|32}x8i stride 3.
    LLVM creates suboptimal shuffle code-gen for AVX2. Overall, this patch is a specific fix for the pattern (Stride=3 VF={8|16|32}).
    This patch is part two of two patches and it covers the store (interleaved) side.
    The patch goal is to optimize the following sequence:
        a0 a1 a2 a3 a4 a5 a6 a7
        b0 b1 b2 b3 b4 b5 b6 b7
        c0 c1 c2 c3 c4 c5 c6 c7
    into
        a0 b0 c0 a1 b1 c1 a2 b2 c2 a3 b3 c3 a4 b4 c4 a5 b5 c5 a6 b6 c6 a7 b7 c7
    Reviewers: zvi guyblank dorit Ayal
    Differential Revision: https://reviews.llvm.org/D37117
    Change-Id: I56ced8bcbea809a37654060771911ade20246ccc
    llvm-svn: 314234
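The data movement described in the stride-3 commit above is an interleave of three equally long vectors. The following sketch shows only the intended element order, not the shuffle sequence the backend emits; `interleave` is a hypothetical helper name.

```python
def interleave(*vectors):
    """Take one element from each vector in turn: a0 b0 c0 a1 b1 c1 ..."""
    return [element for group in zip(*vectors) for element in group]


# Build the three input rows a0..a7, b0..b7, c0..c7 from the commit message.
a = [f"a{i}" for i in range(8)]
b = [f"b{i}" for i in range(8)]
c = [f"c{i}" for i in range(8)]

stored = interleave(a, b, c)
```

With these inputs `stored` starts with `a0 b0 c0 a1 b1 c1` and ends with `a7 b7 c7`, matching the target layout in the commit message. Passing four vectors gives the stride-4 layout in the same way.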
* [X86] Add support for v16i32 UMUL_LOHI/SMUL_LOHI | Craig Topper | 2017-09-26 | 1 | -17/+20
    Summary: This patch extends the v8i32/v4i32 custom lowering to support v16i32
    Reviewers: zvi, RKSimon
    Reviewed By: RKSimon
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D38274
    llvm-svn: 314221
* [X86][XOP] Merge rotation opcodes with AVX512 equivalents. NFCI. | Simon Pilgrim | 2017-09-26 | 5 | -26/+19
    The XOP rotations act as ROTL with +ve values and ROTR with -ve values, which means that we can treat them all as ROTL with unsigned modulo.
    We already check that we're only trying to lower as ROTL for XOP rotations.
    Differential Revision: https://reviews.llvm.org/D37949
    llvm-svn: 314207
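The rotation argument above can be checked with a small scalar model: a right rotation by n is the same as a left rotation by the amount reduced modulo the lane width. This is a sketch on 8-bit lanes with hypothetical helper names; the exact encoding of XOP rotate counts is not modeled here.

```python
def rotl(value, amount, bits=8):
    """Rotate left by amount taken modulo the lane width."""
    mask = (1 << bits) - 1
    value &= mask
    amount %= bits
    return ((value << amount) | (value >> (bits - amount))) & mask


def rotr(value, amount, bits=8):
    """Rotate right by amount taken modulo the lane width."""
    mask = (1 << bits) - 1
    value &= mask
    amount %= bits
    return ((value >> amount) | (value << (bits - amount))) & mask


def signed_rotate(value, amount, bits=8):
    """Model of the described behavior: ROTL for positive counts, ROTR for negative ones."""
    if amount >= 0:
        return rotl(value, amount, bits)
    return rotr(value, -amount, bits)
```

Because Python's `%` returns a non-negative result for a positive modulus, `rotl(value, amount % 8)` covers both directions, for example a right rotate by 3 is a left rotate by 5.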
* [x86] fix pr29061 | Coby Tayree | 2017-09-26 | 1 | -6/+8
    https://bugs.llvm.org//show_bug.cgi?id=29061
    Don't try referencing REX-needed regs when not in 64-bit mode.
    Aligns with GCC.
    Differential Revision: https://reviews.llvm.org/D37801
    llvm-svn: 314203
* Revert "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST." | Benjamin Kramer | 2017-09-26 | 4 | -37/+27
    Makes llc crash. This reverts commit r314151.
    llvm-svn: 314199
* [X86] Finishing broadcastf32x2 and broadcasti32x2 intrinsics lowering to IR. llvm side. | Uriel Korach | 2017-09-26 | 1 | -10/+0
    Removing X86 broadcast(f/i)32x2 intrinsics from llvm.
    Adding autoUpgrade support.
    Moving matching tests from avx512dq-intrinsics.ll to avx512dq-intrinsics-upgrade.ll and from avx512dqvl-intrinsics.ll to avx512dqvl-intrinsics-upgrade.ll.
    Differential Revision: https://reviews.llvm.org/D38220
    llvm-svn: 314195
* X86: remove R12 from CSR on Windows x64 SwiftCC | Saleem Abdulrasool | 2017-09-25 | 2 | -20/+21
    R12 is used for the SwiftError parameter. It is no longer a CSR as it is used to transfer the SwiftError, and the caller must preserve it if they need to.
    llvm-svn: 314165
* [X86] Don't select anyext GR32->GR64 to SUBREG_TO_REG. Use INSERT_SUBREG instead. | Craig Topper | 2017-09-25 | 1 | -1/+1
    As far as I know SUBREG_TO_REG is stating that the upper bits are 0. But if we are just converting the GR32 with no checks, then we have no reason to say the upper bits are 0.
    I don't really know how to test this today since I can't find anything that looks that closely at SUBREG_TO_REG. The test changes here seem to be some perturbation of register allocation.
    Differential Revision: https://reviews.llvm.org/D38001
    llvm-svn: 314152
* [X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST. | Craig Topper | 2017-09-25 | 4 | -27/+37
    llvm-svn: 314151
* [X86] [ASM INTEL SYNTAX] fix for incorrect assembler code generation when x86-asm-syntax=intel (PR34617). | Konstantin Belochapka | 2017-09-25 | 1 | -0/+1
    Fix for incorrect code generation when x86-asm-syntax=intel.
    Differential Revision: https://reviews.llvm.org/D37945
    llvm-svn: 314140
* [AVX-512] Replace large number of explicit patterns that check for insert_subvector with zero after masked compares with fewer patterns with predicate | Craig Topper | 2017-09-25 | 3 | -632/+116
    This replaces the large number of patterns that handle every possible case of zeroing after a masked compare with a few simpler patterns that use a predicate to check for a masked compare producer.
    This is similar to what we do for detecting free GR32->GR64 zero extends and free xmm->ymm/zmm zero extends.
    This shrinks the isel table from ~590k to ~531k. This is a roughly 10% reduction in size.
    Differential Revision: https://reviews.llvm.org/D38217
    llvm-svn: 314133
* [X86][LLVM] Expand support for lowerInterleavedStore() in X86InterleavedAccess (VF8 stride 4) | Michael Zuckerman | 2017-09-25 | 1 | -1/+46
    This patch expands the support of lowerInterleavedStore to 8x8i stride 4.
    LLVM creates suboptimal shuffle code-gen for AVX2. Overall, this patch is a specific fix for the pattern (Stride=4 VF=8) and we plan to include more patterns in the future.
    The patch goal is to optimize the following sequence:
    At the end of the computation, we have xmm2, xmm0, xmm12 and xmm3, each holding 8 chars:
        c0, c1, ..., c7
        m0, m1, ..., m7
        y0, y1, ..., y7
        k0, k1, ..., k7
    And these need to be transposed/interleaved and stored like so:
        c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....
    Reviewers: DavidKreitzer Farhana zvi igorb guyblank RKSimon Ayal
    Differential Revision: https://reviews.llvm.org/D36058
    Change-Id: I3cc5c2ca5d6318901c192a4428493b99ef424c32
    llvm-svn: 314109
* Adding missing feature to goldmont. | Michael Zuckerman | 2017-09-25 | 1 | -1/+2
    Change-Id: I1ddc619169fae6a56308deef8dae5db3da702cf4
    llvm-svn: 314103
* [CodeGenPrepare][NFC] Rename TargetTransformInfo::expandMemCmp -> TargetTransformInfo::enableMemCmpExpansion. | Clement Courbet | 2017-09-25 | 2 | -2/+2
    Summary: Right now there are two functions with the same name, one does the work and the other one returns true if expansion is needed. Rename TargetTransformInfo::expandMemCmp to make it more consistent with other members of TargetTransformInfo.
    Remove the unused Instruction* parameter.
    Differential Revision: https://reviews.llvm.org/D38165
    llvm-svn: 314096
* [X86] Make IFMA instructions during isel so we can fold broadcast loads. | Craig Topper | 2017-09-24 | 5 | -21/+46
    This required changing the ISD opcode for these instructions to have the commutable operands first and the addend last. This way tablegen can autogenerate the additional patterns for us.
    llvm-svn: 314083
* [X86] Add IFMA instructions to the load folding tables and make them commutable for the multiply operands. | Craig Topper | 2017-09-24 | 2 | -1/+54
    llvm-svn: 314080
* Fix signed/unsigned warning | Simon Pilgrim | 2017-09-24 | 1 | -1/+1
    llvm-svn: 314078
* [X86][SSE] Add support for extending bool vectors bitcasted from scalars | Simon Pilgrim | 2017-09-24 | 1 | -0/+113
    This patch acts as a reverse to combineBitcastvxi1 - bitcasting a scalar integer to a boolean vector and extending it 'in place' to the requested legal type.
    Currently this doesn't handle AVX512 at all - but the current mask register approach is lacking for some cases.
    Differential Revision: https://reviews.llvm.org/D35320
    llvm-svn: 314076
* [AVX-512] Add pattern for selecting masked version of v8i32/v8f32 compare instructions when VLX isn't available. | Craig Topper | 2017-09-24 | 1 | -0/+17
    We use a v16i32/v16f32 compare instead and truncate the result. We already did this for the unmasked version, but were missing the version with 'and'.
    llvm-svn: 314072