path: root/llvm/lib/Target/X86/X86ISelLowering.cpp
Commit message | Author | Age | Files | Lines
* AVX512: VMOVDQU8/16/32/64 (load) intrinsic implementation. | Igor Breger | 2016-01-24 | 1 | -0/+22
| | | | | | Differential Revision: http://reviews.llvm.org/D16137 llvm-svn: 258657
* [X86][SSE] Generalised TRUNC -> PACKSS/PACKUS code. NFC. | Simon Pilgrim | 2016-01-23 | 1 | -16/+11
| | | | | | Generalised mask generation / subvector extraction to use the input/output types directly instead of an if/else through all the currently accepted types. llvm-svn: 258645
* Added missing comment. NFC. | Simon Pilgrim | 2016-01-23 | 1 | -2/+3
| | | | llvm-svn: 258624
* [X86][SSE] Remove INSERTPS dependencies from unreferenced operands. | Simon Pilgrim | 2016-01-23 | 1 | -3/+13
| | | | | | If the INSERTPS zeroes out all the referenced elements from either of the 2 input vectors (and the input is not already UNDEF), then set that input to UNDEF to reduce dependencies. llvm-svn: 258622
* fix typos; NFC | Sanjay Patel | 2016-01-22 | 1 | -2/+2
| | | | llvm-svn: 258567
* [X86][SSE] Improve i16 splatting shuffles | Simon Pilgrim | 2016-01-21 | 1 | -0/+20
| | | | | | | | | | | | Better handling of the annoying pshuflw/pshufhw ops which only shuffle lower/upper halves of a vector. Added vXi16 unary shuffle support for cases where i16 elements (from the same half of the source) are being splatted to the whole of one of the halves. This avoids the general lowering case which must shuffle the 32-bit elements first - meaning that we used to end up with unnecessary duplicate pshuflw/pshufhw shuffles. Note this has the side effect of a lot of SSSE3 test cases no longer needing to use PSHUFB, as it falls below the 3 op combine threshold for when PSHUFB is typically worth it. I've raised PR26183 to discuss if the threshold should be changed and whether we need to make it more specific to the target CPU. Differential Revision: http://reviews.llvm.org/D14901 llvm-svn: 258440
* AVX512: Store (MOVNTPD, MOVNTPS, MOVNTDQ) using non-temporal hint intrinsic implementation. | Igor Breger | 2016-01-20 | 1 | -1/+16
| | | | | | Differential Revision: http://reviews.llvm.org/D16350 llvm-svn: 258309
* [X86][SSE] Add VZEXT_MOVL target shuffle decoding. | Simon Pilgrim | 2016-01-19 | 1 | -0/+5
| | | | | | Add support for decoding VZEXT_MOVL target shuffle masks, allowing it to be used as a source in target shuffle combines. llvm-svn: 258215
* [X86][SSE] Add INSERTPS target shuffle combines. | Simon Pilgrim | 2016-01-19 | 1 | -0/+137
| | | | | | | | As vector shuffles can only reference two inputs, many (V)INSERTPS patterns end up being split over two target shuffles. This patch adds combines to attempt to combine (V)INSERTPS nodes with input/output nodes that are just zeroing out these additional vector elements. Differential Revision: http://reviews.llvm.org/D16072 llvm-svn: 258205
* [X86][AVX512] fix dag & add intrinsics for fixupimm | Asaf Badouh | 2016-01-19 | 1 | -0/+30
| | | | | | Cover all widths and types (pd/ps/sd/ss) of the fixupimm instruction and intrinsics. Differential Revision: http://reviews.llvm.org/D16313 llvm-svn: 258124
* [X86][AVX2] Broadcast subvectors | Simon Pilgrim | 2016-01-18 | 1 | -3/+21
| | | | | | AVX2 can only broadcast from the zero'th element of a vector, but if the broadcastable element is the zero'th element of a 128-bit subvector it's advantageous to extract the subvector, broadcast from that and avoid the loading of shuffle mask data that would be needed for VPERMPS/VPERMD. The only exception is when the source type is 4f64 or 4i64, which can use the immediate shuffle VPERMPD/VPERMQ directly. Differential Revision: http://reviews.llvm.org/D16050 llvm-svn: 258081
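The subvector condition in this commit message can be illustrated with a small standalone sketch. This is not the code from X86ISelLowering.cpp: the helper names `isZerothElementOf128BitSubvector` and `subvectorIndex` and their parameters are hypothetical, introduced only to show the index arithmetic.

```cpp
#include <cassert>

// Hypothetical helper (not an LLVM API): true when element `index` of a
// vector with `eltBits`-wide elements is the first element of a 128-bit
// subvector, which is the case the commit message describes as profitable.
bool isZerothElementOf128BitSubvector(unsigned index, unsigned eltBits) {
  assert(eltBits != 0 && 128 % eltBits == 0 && "element width must divide 128");
  unsigned eltsPerSubvector = 128 / eltBits;
  return index % eltsPerSubvector == 0;
}

// Hypothetical helper: which 128-bit subvector holds element `index`.
unsigned subvectorIndex(unsigned index, unsigned eltBits) {
  assert(eltBits != 0 && 128 % eltBits == 0 && "element width must divide 128");
  return index / (128 / eltBits);
}
```

For a vector of eight 32-bit elements, element 4 starts the upper 128-bit subvector, so it would qualify, while element 5 would not.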
* AVX512: Masked store intrinsic implementation. | Igor Breger | 2016-01-18 | 1 | -0/+28
| | | | | | Implemented intrinsics for the following instructions (store): VMOVDQU8/16/32/64, VMOVDQA32/64, VMOVAPS/PD, VMOVUPS/PD. Differential Revision: http://reviews.llvm.org/D16271 llvm-svn: 258047
* AVX512: Use MemIntrinsicSDNode to implement load/store intrinsic. | Igor Breger | 2016-01-17 | 1 | -60/+76
| | | | | | Differential Revision: http://reviews.llvm.org/D16184 llvm-svn: 258009
* [X86][AVX] Enable extraction of upper 128-bit subvectors for 'half undef' shuffle lowering | Simon Pilgrim | 2016-01-16 | 1 | -11/+28
| | | | | | | | Added support for the extraction of the upper 128-bit subvectors for lower/upper half undef shuffles if it would reduce the number of extractions/insertions or avoid loads of AVX2 permps/permd shuffle masks. Minor follow up to D15477. llvm-svn: 258000
* [Cygwin] Use -femulated-tls by default since r257718 introduced the new pass. | NAKAMURA Takumi | 2016-01-16 | 1 | -5/+1
| | | | | | FIXME: Add more targets to use emutls into clang/test/Driver/emulated-tls.cpp. FIXME: Add cygwin tests into llvm/test/CodeGen/X86. Work in progress. llvm-svn: 257984
* CXX_FAST_TLS calling convention: fix issue on X86-64. | Manman Ren | 2016-01-15 | 1 | -4/+5
| | | | | | | | | | | | When we have a single basic block, the explicit copy-back instructions should be inserted right before the terminator. Before this fix, they were wrongly placed at the beginning of the basic block. I will commit fixes to other platforms as well. PR26136 llvm-svn: 257925
* [X86] Don't alter HasOpaqueSPAdjustment after we've relied on it | David Majnemer | 2016-01-14 | 1 | -1/+1
| | | | | | | | | | | | | | We rely on HasOpaqueSPAdjustment not changing after we've calculated things based on it. Things like whether or not we can use 'rep;movs' to copy bytes around, that sort of thing. If it changes, invariants in the backend will quietly break. This situation arose when we had a call to memcpy *and* a COPY of the FLAGS register where we would attempt to reference local variables using %esi, a register that was clobbered by the 'rep;movs'. This fixes PR26124. llvm-svn: 257730
* Fixing warning by adding the X86ISD::VROTRI case. | Michael Zuckerman | 2016-01-13 | 1 | -0/+1
| | | | | | Differential Revision: http://reviews.llvm.org/D16052 llvm-svn: 257607
* [AVX512] adding PROLQ and PROLD Intrinsics | Michael Zuckerman | 2016-01-12 | 1 | -0/+1
| | | | | | Differential Revision: http://reviews.llvm.org/D16048 llvm-svn: 257523
* AVX512: VPMOVAPS/PD and VPMOVUPS/PD (load) intrinsic implementation. | Igor Breger | 2016-01-12 | 1 | -2/+48
| | | | | | Differential Revision: http://reviews.llvm.org/D16042 llvm-svn: 257463
* CXX_FAST_TLS calling convention: performance improvement for x86-64. | Manman Ren | 2016-01-12 | 1 | -0/+60
| | | | | | | This is the same change on x86-64 as r255821 on AArch64. rdar://9001553 llvm-svn: 257428
* Optimized instruction sequence for sitofp operation on X86-32 | Elena Demikhovsky | 2016-01-10 | 1 | -15/+43
| | | | | | | | | | | | | | | | | | | Optimized sitofp i64 %x to double. The current sequence: movl %ecx, 8(%esp); movl %edx, 12(%esp); fildll 8(%esp) is replaced with: movd %ecx, %xmm0; movd %edx, %xmm1; punpckldq %xmm1, %xmm0; movq %xmm0, 8(%esp) Differential Revision: http://reviews.llvm.org/D15946 llvm-svn: 257285
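As a rough illustration of what the register sequence in this commit message computes, the sketch below joins the two 32-bit halves into one signed 64-bit value and converts it to double. It assumes, from the stack offsets in the message, that %ecx holds the low half and %edx the high half; the function names `joinHalves` and `signedToDouble` are hypothetical and the code is not taken from the patch.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical illustration: combine the low half (assumed to be in %ecx)
// and the high half (assumed to be in %edx) into one 64-bit pattern, as the
// movd/movd/punpckldq steps place them side by side in %xmm0.
int64_t joinHalves(uint32_t lo, uint32_t hi) {
  uint64_t bits = (static_cast<uint64_t>(hi) << 32) | lo;
  int64_t value;
  std::memcpy(&value, &bits, sizeof(value)); // reinterpret the bits as signed
  return value;
}

// The sitofp step itself: signed 64-bit integer to double.
double signedToDouble(uint32_t lo, uint32_t hi) {
  return static_cast<double>(joinHalves(lo, hi));
}
```

The sketch models only the value being converted, not the instruction selection or the store to the stack slot.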
* [X86][AVX] Match broadcast loads through a bitcast | Simon Pilgrim | 2016-01-09 | 1 | -2/+7
| | | | | | | | AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through any bitcast to check for a load node to allow broadcasts to occur. This is a re-commit of r257055 after r257264 fixed 32-bit broadcast loads of i64 scalars. llvm-svn: 257266
* [X86][AVX] Add support for i64 broadcast loads on 32-bit targets | Simon Pilgrim | 2016-01-09 | 1 | -2/+9
| | | | | | Added 32-bit AVX1/AVX2 broadcast tests. llvm-svn: 257264
* Revert r257055, it caused PR26064. | Nico Weber | 2016-01-07 | 1 | -7/+2
| | | | llvm-svn: 257066
* [X86][AVX] Match broadcast loads through a bitcast | Simon Pilgrim | 2016-01-07 | 1 | -2/+7
| | | | | | | | AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through bitcasts to check for a load node to allow broadcasts to occur. Follow up to D15310 llvm-svn: 257055
* [X86][SSE] Add INSERTPS as a target shuffle | Simon Pilgrim | 2016-01-07 | 1 | -3/+17
| | | | | | Follow up to D15378, added INSERTPS to the list of decodable target shuffles and enabled XFormVExtractWithShuffleIntoLoad to handle target shuffles with SentinelZero and tested this with INSERTPS. llvm-svn: 257046
* [X86] Determine if target shuffle can contain zero elements | Simon Pilgrim | 2016-01-06 | 1 | -15/+17
| | | | | | | | | | | | | | getTargetShuffleMask may return shuffle masks with SM_SentinelZero (-2) values (currently just for PSHUFB but VPERM2X128 as well with this patch). Although some calling functions can make use of this (mainly for shuffle combining), others can not and their inclusion makes shuffle mask comparisons more difficult. This patch adds a flag to getTargetShuffleMask to indicate if the calling function can't handle SM_SentinelZero; getTargetShuffleMask will then return false if it occurs to make handling much easier. I've tidied up some uses of getTargetShuffleMask to better indicate what is going on - more could be done but at present I don't have test cases to demonstrate it. Some upcoming patches will make use of this to both support more uses where SM_SentinelZero is not permitted (e.g. combineShuffleToAddSub), and also will allow us to add INSERTPS support to getTargetShuffleMask as part of better zero handling discussed in D14261. Differential Revision: http://reviews.llvm.org/D15378 llvm-svn: 256992
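The flag described in this commit message can be pictured with the following minimal sketch. It is not the real `getTargetShuffleMask` signature: the function `acceptDecodedMask`, its parameter `allowSentinelZero`, and the constant name `kSentinelZero` are hypothetical, and only the value -2 comes from the message.

```cpp
#include <algorithm>
#include <vector>

// Value quoted in the commit message for SM_SentinelZero; the constant name
// here is local to this sketch.
constexpr int kSentinelZero = -2;

// Hypothetical stand-in for the new flag: a decoded mask is rejected when it
// contains a zero sentinel and the caller has said it cannot handle one.
bool acceptDecodedMask(const std::vector<int> &mask, bool allowSentinelZero) {
  if (allowSentinelZero)
    return true;
  return std::find(mask.begin(), mask.end(), kSentinelZero) == mask.end();
}
```

A caller that only compares plain element indices would pass `false` and treat a rejected mask as a failed decode, which is the simplification the message describes.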
* [X86] Correctly model TLS calls w.r.t. frame requirements. | Quentin Colombet | 2016-01-06 | 1 | -0/+4
| | | | | | | | | TLS calls need the stack frame to be properly set up and this implies that such calls need ADJUSTSTACK_xxx markers. Fixes PR25820. llvm-svn: 256959
* refactor divrem8 lowering; NFCI | Sanjay Patel | 2016-01-06 | 1 | -26/+30
| | | | | | | The code duplication contributed to PR25754: https://llvm.org/bugs/show_bug.cgi?id=25754 llvm-svn: 256957
* PR25754: avoid generating UDIVREM8_ZEXT_HREG nodes with i64 result | Artyom Skrobov | 2016-01-06 | 1 | -1/+2
| | | | | | | | | | Reviewers: spatel, srking Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15331 llvm-svn: 256924
* [X86][SSE] There is no zmm addsubpd/addsubps instruction. | Simon Pilgrim | 2016-01-06 | 1 | -9/+7
| | | | | | Replace the assert in combineShuffleToAddSub with an early out. llvm-svn: 256922
* [X86][SSE] An empty target shuffle mask is always a failure. | Simon Pilgrim | 2016-01-06 | 1 | -7/+4
| | | | | | As discussed on D15378, move the mask.empty() tests to after the switch statement and consider any shuffle decode where the extracted target shuffle mask is empty as a failure. llvm-svn: 256921
* [X86] Determine if we have an OpaqueSPAdjustment earlier | David Majnemer | 2016-01-05 | 1 | -4/+12
| | | | | | | | | | | | We queried hasFP before we hit ExpandISelPseudos. ExpandISelPseudos manipulated state that hasFP relied on, potentially changing the result after it has been queried elsewhere. While I am not aware of any particular bug due to this state of affairs, it seems best to avoid it entirely by changing the state during DAG construction. llvm-svn: 256849
* [X86][SSE] Merge PerformBLENDICombine into PerformShuffleCombine | Simon Pilgrim | 2016-01-05 | 1 | -29/+26
| | | | | | PBLEND/BLENDPD/BLENDPS are no different to the other target shuffles and this will make future improvements to the target shuffle combines more straightforward. llvm-svn: 256819
* [X86][SSE] Ensure BLENDPD/BLENDPS/PBLEND inputs are both of the correct input type | Simon Pilgrim | 2016-01-04 | 1 | -0/+3
| | | | | | llvm-svn: 256782
* [X86] Make hasFP constant time | David Majnemer | 2016-01-04 | 1 | -0/+12
| | | | | | | | | | We need a frame pointer if there is a push/pop sequence after the prologue in order to unwind the stack. Scanning the instructions to figure out if this happened made hasFP not constant-time which is a violation of expectations. Let's compute this up-front and reuse that computation when we need it. llvm-svn: 256730
* [X86] Add intrinsics for reading and writing to the flags register | David Majnemer | 2016-01-01 | 1 | -0/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | LLVM's targets need to know if stack pointer adjustments occur after the prologue. This is needed to correctly determine if the red-zone is appropriate to use or if a frame pointer is required. Normally, LLVM can figure this out very precisely by reasoning about the contents of the MachineFunction. There is an interesting corner case: inline assembly. The vast majority of inline assembly which will perform a push or pop is done so to pair up with pushf or popf as appropriate. Unfortunately, this inline assembly doesn't mark the stack pointer as clobbered because, well, it isn't. The stack pointer is decremented and then immediately incremented. Because of this, LLVM was changed in r256456 to conservatively assume that inline assembly contains a sequence of stack operations. This is unfortunate because the vast majority of inline assembly will not end up manipulating the stack pointer in any way at all. Instead, let's provide a more principled solution: an intrinsic. FWIW, other compilers (MSVC and GCC among them) also provide this functionality as an intrinsic. llvm-svn: 256685
* [X86] Move shuffle decoding for constant pool into the X86CodeGen library to remove a layering violation in the Util library. | Craig Topper | 2015-12-31 | 1 | -0/+1
| | | | | | llvm-svn: 256680
* [X86][PKU] Add {RD,WR}PKRU intrinsics | Asaf Badouh | 2015-12-31 | 1 | -1/+46
| | | | | | Differential Revision: http://reviews.llvm.org/D15808 llvm-svn: 256670
* [x86] lower calls to fmin and llvm.minnum.* using minss/minsd/minps/minpd (PR24475) | Sanjay Patel | 2015-12-28 | 1 | -6/+10
| | | | | | | | | This is a follow-on to: http://reviews.llvm.org/rL255700 http://reviews.llvm.org/rL256454 http://reviews.llvm.org/rL256510 llvm-svn: 256522
* [x86] lower calls to fmax and llvm.maxnum.* using maxps/maxpd (PR24475) | Sanjay Patel | 2015-12-28 | 1 | -2/+4
| | | | | | | | This is a follow-on to: http://reviews.llvm.org/rL255700 http://reviews.llvm.org/rL256454 llvm-svn: 256510
* [X86] Better support for the MCU psABI (LLVM part) | Michael Kuperstein | 2015-12-28 | 1 | -30/+6
| | | | | | | | | | | | | | | | This adds support for the MCU psABI in a way different from r251223 and r251224, basically reverting most of these two patches. The problem with the approach taken in r251223/4 is that it only handled libcalls that originated from the backend. However, the mid-end also inserts quite a few libcalls and assumes these use the platform's default calling convention. The previous patch tried to insert inregs when necessary both in the FE and, somewhat hackily, in the CG. Instead, we now define a new default calling convention for the MCU, which doesn't use inreg marking at all, similarly to what x86-64 does. Differential Revision: http://reviews.llvm.org/D15054 llvm-svn: 256494
* [X86][AVX512] Lower broadcast sub vector to vector intrinsics | Asaf Badouh | 2015-12-28 | 1 | -0/+17
| | | | | | | | | | | Lower broadcast<type>x<vector> to shuffles. There are two cases: 1. src is 128 bits and dest is 512 bits: in this case we lower it to a shuffle with imm = 0. 2. src is 256 bits and dest is 512 bits: in this case we lower it to a shuffle with imm = 01000100b (0x44), so that the 256-bit source is broadcast: ymm[0,1,2,3] => zmm[0,1,2,3,0,1,2,3]. The result is then masked with the passthru value (if it is a masked op). Differential Revision: http://reviews.llvm.org/D15790 llvm-svn: 256490
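To see why an immediate of 0x44 repeats the low 256 bits, the sketch below models a 512-bit register as four 128-bit lanes and applies a two-bits-per-lane selector. It assumes the shuffle behaves like the VSHUFF64X2 family (the two low result lanes pick from the first source, the two high result lanes from the second); the commit message does not name the instruction, so this is an assumption, and `Lanes512` and `shuffleLanes128` are hypothetical names.

```cpp
#include <array>
#include <cstdint>

// A 512-bit register modelled as four 128-bit lanes, each holding a label.
using Lanes512 = std::array<int, 4>;

// Assumed VSHUFF64X2-style semantics: each 2-bit field of `imm` selects a
// 128-bit lane; result lanes 0-1 read from `a`, result lanes 2-3 from `b`.
Lanes512 shuffleLanes128(const Lanes512 &a, const Lanes512 &b, uint8_t imm) {
  Lanes512 result{};
  result[0] = a[(imm >> 0) & 3];
  result[1] = a[(imm >> 2) & 3];
  result[2] = b[(imm >> 4) & 3];
  result[3] = b[(imm >> 6) & 3];
  return result;
}
```

With the same widened source passed as both operands, imm = 0x44 yields lanes {0, 1, 0, 1}, which matches the ymm[0,1,2,3] => zmm[0,1,2,3,0,1,2,3] pattern in 64-bit elements, and imm = 0 yields lane 0 in every position, which matches the 128-bit case.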
* [AVX512] Remove separate instruction and patterns for lowering ctlz_zero_undef. | Craig Topper | 2015-12-27 | 1 | -19/+16
| | | | | | Change the operation for CTLZ_ZERO_UNDEF to Expand so SelectionDAG will convert them to CTLZ before lowering. llvm-svn: 256477
* AVX512: Change VPMOVB2M DAG lowering, use CVT2MASK node instead of TRUNCATE. | Igor Breger | 2015-12-27 | 1 | -34/+73
| | | | | | | | | Fix TRUNCATE lowering vector to vector i1, use LSB and not MSB. Implement VPMOVB/W/D/Q2M intrinsic. Differential Revision: http://reviews.llvm.org/D15675 llvm-svn: 256470
* [x86] lower calls to llvm.maxnum.v4f32 using maxps | Sanjay Patel | 2015-12-26 | 1 | -7/+10
| | | | | | | This is a follow-on to: http://reviews.llvm.org/rL255700 llvm-svn: 256454
* [X86] Fold some variable declarations and initializations into if statements. NFC | Craig Topper | 2015-12-26 | 1 | -6/+3
| | | | | | llvm-svn: 256451
* [X86] Replace MVT::SimpleValueType in the AsmParser library and getX86SubSuperRegister with just an unsigned representing size. | Craig Topper | 2015-12-25 | 1 | -9/+5
| | | | | | | | This is a step towards fixing a layering violation so the X86 AsmParser won't depend on CodeGen types. llvm-svn: 256425
* AVX512: VPMOVM2B/W/D/Q intrinsic implementation. | Igor Breger | 2015-12-24 | 1 | -0/+6
| | | | | | Differential Revision: http://reviews.llvm.org//D15747 llvm-svn: 256364