| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D16137
llvm-svn: 258657
|
| |
|
|
|
|
| |
Generalised mask generation / subvector extraction to use the input/output types directly instead of an if/else through all the currently accepted types.
llvm-svn: 258645
|
| |
|
|
| |
llvm-svn: 258624
|
| |
|
|
|
|
| |
If the INSERTPS zeroes out all the referenced elements from either of the 2 input vectors (and the input is not already UNDEF), then set that input to UNDEF to reduce dependencies.
llvm-svn: 258622
|
| |
|
|
| |
llvm-svn: 258567
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Better handling of the annoying pshuflw/pshufhw ops which only shuffle lower/upper halves of a vector.
Added vXi16 unary shuffle support for cases where i16 elements (from the same half of the source) are being splatted to the whole of one of the halves. This avoids the general lowering case which must shuffle the 32-bit elements first - meaning that we used to end up with unnecessary duplicate pshuflw/pshufhw shuffles.
Note this has the side effect of a lot of SSSE3 test cases no longer needing to use PSHUFB, as it falls below the 3 op combine threshold for when PSHUFB is typically worth it. I've raised PR26183 to discuss if the threshold should be changed and whether we need to make it more specific to the target CPU.
Differential Revision: http://reviews.llvm.org/D14901
llvm-svn: 258440
|
| |
|
|
|
|
|
|
| |
implementation.
Differential Revision: http://reviews.llvm.org/D16350
llvm-svn: 258309
|
| |
|
|
|
|
| |
Add support for decoding VZEXT_MOVL target shuffle masks, allowing it to be used as a source in target shuffle combines.
llvm-svn: 258215
|
| |
|
|
|
|
|
|
|
|
| |
As vector shuffles can only reference two inputs many (V)INSERTPS patterns end up being split over two targets shuffles.
This patch adds combines to attempt to combine (V)INSERTPS nodes with input/output nodes that are just zeroing out these additional vector elements.
Differential Revision: http://reviews.llvm.org/D16072
llvm-svn: 258205
|
| |
|
|
|
|
|
|
| |
cover all width and types (pd/ps/sd/ss) of fixupimm instruction and inrtinsics
Differential Revision: http://reviews.llvm.org/D16313
llvm-svn: 258124
|
| |
|
|
|
|
|
|
| |
AVX2 can only broadcast from the zero'th element of a vector, but if the broadcastable element is the zero'th element of a 128-bit subvector its advantageous to extract the subvector, broadcast from that and avoid the loading of shuffle mask data that would be needed for VPERMPS/VPERMD. The only exception being when the source type is 4f64 or 4i64 which can directly use the immediate shuffle VPERMPD/VPERMQ directly.
Differential Revision: http://reviews.llvm.org/D16050
llvm-svn: 258081
|
| |
|
|
|
|
|
|
| |
Implemented intrinsic for the follow instructions (store) : VMOVDQU8/16/32/64, VMOVDQA32/64, VMOVAPS/PD, VMOVUPS/PD.
Differential Revision: http://reviews.llvm.org/D16271
llvm-svn: 258047
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D16184
llvm-svn: 258009
|
| |
|
|
|
|
|
|
|
|
| |
shuffle lowering
Added support for the extraction of the upper 128-bit subvectors for lower/upper half undef shuffles if it would reduce the number of extractions/insertions or avoid loads of AVX2 permps/permd shuffle masks.
Minor follow up to D15477.
llvm-svn: 258000
|
| |
|
|
|
|
| |
FIXME: Add more targets to use emutls into clang/test/Driver/emulated-tls.cpp.
FIXME: Add cygwin tests into llvm/test/CodeGen/X86. Working in progress.
llvm-svn: 257984
|
| |
|
|
|
|
|
|
|
|
|
|
| |
When we have a single basic block, the explicit copy-back instructions should
be inserted right before the terminator. Before this fix, they were wrongly
placed at the beginning of the basic block.
I will commit fixes to other platforms as well.
PR26136
llvm-svn: 257925
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
We rely on HasOpaqueSPAdjustment not changing after we've calculated
things based on it. Things like whether or not we can use 'rep;movs' to
copy bytes around, that sort of thing. If it changes, invariants in the
backend will quietly break. This situation arose when we had a call to
memcpy *and* a COPY of the FLAGS register where we would attempt to
reference local variables using %esi, a register that was clobbered by
the 'rep;movs'.
This fixes PR26124.
llvm-svn: 257730
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D16052
llvm-svn: 257607
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D16048
llvm-svn: 257523
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D16042
llvm-svn: 257463
|
| |
|
|
|
|
|
| |
This is the same change on x86-64 as r255821 on AArch64.
rdar://9001553
llvm-svn: 257428
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Optimized sitofp i64 %x to double. The current sequence
movl %ecx, 8(%esp)
movl %edx, 12(%esp)
fildll 8(%esp)
is replaced with:
movd %ecx, %xmm0
movd %edx, %xmm1
punpckldq %xmm1, %xmm0
movq %xmm0, 8(%esp)
Differential Revision: http://reviews.llvm.org/D15946
llvm-svn: 257285
|
| |
|
|
|
|
|
|
| |
AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through any bitcast to check for a load node to allow broadcasts to occur.
This is a re-commit of r257055 after r257264 fixed 32-bit broadcast loads of i64 scalars.
llvm-svn: 257266
|
| |
|
|
|
|
| |
Added 32-bit AVX1/AVX2 broadcast tests.
llvm-svn: 257264
|
| |
|
|
| |
llvm-svn: 257066
|
| |
|
|
|
|
|
|
| |
AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through bitcasts to check for a load node to allow broadcasts to occur.
Follow up to D15310
llvm-svn: 257055
|
| |
|
|
|
|
| |
Follow up to D15378, added INSERTPS to the list of decodable target shuffles and enabled XFormVExtractWithShuffleIntoLoad to handle target shuffles with SentinelZero and tested this with INSERTPS.
llvm-svn: 257046
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
getTargetShuffleMask may return shuffle masks with SM_SentinelZero (-2) values (currently just for PSHUFB but VPERM2X128 as well with this patch). Although some calling functions can make use of this (mainly for shuffle combining), others can not and their inclusion makes shuffle mask comparisons more difficult.
This patch adds a flag to getTargetShuffleMask to indicate if the calling function can't handle SM_SentinelZero; getTargetShuffleMask will then return false if it occurs to make handling much easier.
I've tidied up some uses of getTargetShuffleMask to better indicate what is going on - more could be done but at present I don't have test cases to demonstrate it.
Some upcoming patches will make use of this to both support more uses where SM_SentinelZero is not permitted (e.g. combineShuffleToAddSub), and also will allow us to add INSERTPS support to getTargetShuffleMask as part of better zero handling discussed in D14261.
Differential Revision: http://reviews.llvm.org/D15378
llvm-svn: 256992
|
| |
|
|
|
|
|
|
|
| |
TLS calls need the stack frame to be properly set up and this
implies that such calls need ADJUSTSTACK_xxx markers.
Fixes PR25820.
llvm-svn: 256959
|
| |
|
|
|
|
|
| |
The code duplication contributed to PR25754:
https://llvm.org/bugs/show_bug.cgi?id=25754
llvm-svn: 256957
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: spatel, srking
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15331
llvm-svn: 256924
|
| |
|
|
|
|
| |
Replace the assert in combineShuffleToAddSub with an early out.
llvm-svn: 256922
|
| |
|
|
|
|
| |
As discussed on D15378, move the mask.empty() tests to after the switch statement and consider any shuffle decode where the extracted target shuffle mask is empty as a failure.
llvm-svn: 256921
|
| |
|
|
|
|
|
|
|
|
|
|
| |
We queried hasFP before we hit ExpandISelPseudos. ExpandISelPseudos
manipulated state that hasFP relied on, potentially changing the result
after it has been queried elsewhere.
While I am not aware of any particular bug due to this state of affairs,
it seems best to avoid it entirely by changing the state during DAG
construction.
llvm-svn: 256849
|
| |
|
|
|
|
| |
PBLEND/BLENDPD/BLENDPS are no different to the other target shuffles and this will make future improvements to the target shuffle combines more straightforward.
llvm-svn: 256819
|
| |
|
|
|
|
| |
input type
llvm-svn: 256782
|
| |
|
|
|
|
|
|
|
|
| |
We need a frame pointer if there is a push/pop sequence after the
prologue in order to unwind the stack. Scanning the instructions to
figure out if this happened made hasFP not constant-time which is a
violation of expectations. Let's compute this up-front and reuse that
computation when we need it.
llvm-svn: 256730
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LLVM's targets need to know if stack pointer adjustments occur after the
prologue. This is needed to correctly determine if the red-zone is
appropriate to use or if a frame pointer is required.
Normally, LLVM can figure this out very precisely by reasoning about the
contents of the MachineFunction. There is an interesting corner case:
inline assembly.
The vast majority of inline assembly which will perform a push or pop is
done so to pair up with pushf or popf as appropriate. Unfortunately,
this inline assembly doesn't mark the stack pointer as clobbered
because, well, it isn't. The stack pointer is decremented and then
immediately incremented. Because of this, LLVM was changed in r256456
to conservatively assume that inline assembly contain a sequence of
stack operations. This is unfortunate because the vast majority of
inline assembly will not end up manipulating the stack pointer in any
way at all.
Instead, let's provide a more principled solution: an intrinsic.
FWIW, other compilers (MSVC and GCC among them) also provide this
functionality as an intrinsic.
llvm-svn: 256685
|
| |
|
|
|
|
| |
remove a layering violation in the Util library.
llvm-svn: 256680
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D15808
llvm-svn: 256670
|
| |
|
|
|
|
|
|
|
|
|
| |
(PR24475)
This is a follow-on to:
http://reviews.llvm.org/rL255700
http://reviews.llvm.org/rL256454
http://reviews.llvm.org/rL256510
llvm-svn: 256522
|
| |
|
|
|
|
|
|
| |
This is a follow-on to:
http://reviews.llvm.org/rL255700
http://reviews.llvm.org/rL256454
llvm-svn: 256510
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds support for the MCU psABI in a way different from r251223 and r251224,
basically reverting most of these two patches. The problem with the approach
taken in r251223/4 is that it only handled libcalls that originated from the backend.
However, the mid-end also inserts quite a few libcalls and assumes these use the
platform's default calling convention.
The previous patch tried to insert inregs when necessary both in the FE and,
somewhat hackily, in the CG. Instead, we now define a new default calling convention
for the MCU, which doesn't use inreg marking at all, similarly to what x86-64 does.
Differential Revision: http://reviews.llvm.org/D15054
llvm-svn: 256494
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
lower broadcast<type>x<vector> to shuffles.
there are two cases:
1.src is 128 bits and dest is 512 bits: in this case we will lower it to shuffle with imm = 0.
2.src is 256 bit and dest is 512 bits: in this case we will lower it to shuffle with imm = 01000100b (0x44) that way we will broadcast the 256bit source: ymm[0,1,2,3] => zmm[0,1,2,3,0,1,2,3] then it will mask it with the passthru value (in case it's mask op).
Differential Revision: http://reviews.llvm.org/D15790
llvm-svn: 256490
|
| |
|
|
|
|
| |
ctlz_zero_undef. Change the operation for CTLZ_ZERO_UNDEF to Expand so SelectionDAG will convert them to CTLZ before lowering.
llvm-svn: 256477
|
| |
|
|
|
|
|
|
|
| |
Fix TRUNCATE lowering vector to vector i1, use LSB and not MSB.
Implement VPMOVB/W/D/Q2M intrinsic.
Differential Revision: http://reviews.llvm.org/D15675
llvm-svn: 256470
|
| |
|
|
|
|
|
| |
This is a follow-on to:
http://reviews.llvm.org/rL255700
llvm-svn: 256454
|
| |
|
|
|
|
| |
statements. NFC
llvm-svn: 256451
|
| |
|
|
|
|
|
|
| |
getX86SubSuperRegister with just an unsigned representing size.
This a is step towards fixing a layering violation so the X86 AsmParser won't depending on CodeGen types.
llvm-svn: 256425
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org//D15747
llvm-svn: 256364
|