| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
| |
Looking at the IR definition of a masked load made me realize
there was no reason to use a shuffle here, so we don't need
to convert the format of the mask at all.
llvm-svn: 263167
|
| |
|
|
|
|
|
|
|
|
|
|
| |
ZERO_EXTEND_VECTOR_INREG
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion).
Differential Revision: http://reviews.llvm.org/D17691
llvm-svn: 263159
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
When trying to replace an add to esp with pops, we need to choose dead
registers to pop into. Registers clobbered by the call and not imp-def'd
by it should be safe. Except that it's not enough to check the register
itself isn't defined, we also need to make sure no overlapping registers
are defined either.
This fixes PR26711.
Differential Revision: http://reviews.llvm.org/D18029
llvm-svn: 263139
|
| |
|
|
|
|
|
|
| |
runs successfully on routines containing IRETs. This fixes PR26410.
Differential Revision: http://reviews.llvm.org/D17643
llvm-svn: 263120
|
| |
|
|
|
|
|
|
| |
(failed on instruction selection phase)
Differential Revision: http://reviews.llvm.org/D17924
llvm-svn: 263111
|
| |
|
|
|
|
|
|
| |
The BLEND+zero combine was failing to combine equivalent BLEND masks.
Follow up to D17483 and D17858
llvm-svn: 263105
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch reorders the combining of target shuffle masks so that when a unary shuffle takes a binary shuffle as its input but only references one of its inputs it can correctly combine into a unary shuffle mask.
This is starting to encroach on the purpose of resolveTargetShuffleInputs, but I don't want to remove it until we definitely know we won't need it for full binary shuffle combining.
There is a lot more work before we can properly support binary target shuffle masks but this was an easy case to add support for.
Differential Revision: http://reviews.llvm.org/D17858
llvm-svn: 263102
|
| |
|
|
|
|
|
|
| |
Operation SCALAR_TO_VECTOR for v64i8 and v32i16 should be lowered if BW feature is "on".
Differential Revision: http://reviews.llvm.org/D17994
llvm-svn: 263097
|
| |
|
|
|
|
|
|
|
|
|
| |
The irony of this patch is that one CPU that is affected is AMD Jaguar, and Jaguar
has a completely double-pumped AVX implementation. But getting the cost model to
reflect that is a much bigger problem. The small goal here is simply to improve on
the lie that !AVX2 == SandyBridge.
Differential Revision: http://reviews.llvm.org/D18000
llvm-svn: 263069
|
| |
|
|
|
|
|
|
| |
Instead of a variable-blend instruction, form a blend with immediate because those are always cheaper.
Differential Revision: http://reviews.llvm.org/D17899
llvm-svn: 263067
|
| |
|
|
|
|
| |
http://reviews.llvm.org/D17967
llvm-svn: 263021
|
| |
|
|
|
|
|
|
| |
The fix consisting in using the library call for atomic compare and swap when
the instruction is not safe to use may be incorrect. Indeed the library call may
not exist on all platform. In other words, we need a better fix!
llvm-svn: 262943
|
| |
|
|
|
|
|
|
| |
ZERO_EXTEND_VECTOR_INREG"
This caused PR26870.
llvm-svn: 262935
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D17953
llvm-svn: 262929
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fix bag with curly braces
Until now curly braces could only be used in MS inline assembly to mark block start/end.
All curly braces were removed completely at a very early stage.
This approach caused bugs like:
"m{o}v eax, ebx" turned into "mov eax, ebx" without any error.
In addition, AVX-512 added special operands (e.g., k registers), which are also surrounded by curly braces that mark them as such.
Now, we need to keep the curly braces and identify at a later stage if they are marking block start/end (if so, ignore them), or surrounding special AVX-512 operands (if so, parse them as such).
This patch fixes the bug described above and enables the use of AVX-512 special operands.
This commit is the the llvm part of the patch.
The clang part of the review is: http://reviews.llvm.org/D17766
The llvm part of the review is: http://reviews.llvm.org/D17767
Differential Revision: http://reviews.llvm.org/D17767
llvm-svn: 262843
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
shuffle combining.
Patch to add support for target shuffle combining of X86ISD::VPERMV3 nodes, including support for detecting unary shuffles.
This uncovered several issues with the X86ISD::VPERMV3 shuffle mask decoding of non-64 bit shuffle mask elements - the bit masking wasn't being correctly computed.
Removed non-constant pool mask decode path as we have no way of testing it right now.
Differential Revision: http://reviews.llvm.org/D17916
llvm-svn: 262809
|
| |
|
|
|
|
|
|
| |
types on SKX
Differential Revision: http://reviews.llvm.org/D17913
llvm-svn: 262803
|
| |
|
|
|
|
| |
functions to populate the REX and VEX prefix bits that extend register encodings. NFC
llvm-svn: 262800
|
| |
|
|
| |
llvm-svn: 262799
|
| |
|
|
|
|
| |
explicitly reflects the desired size.
llvm-svn: 262798
|
| |
|
|
|
|
|
|
| |
kshiftw to implement VSHRI v4i1 , bits 15-4 is undef so the upper bits of v4i1 may not be zeroed. v4i1 should be zero_extend to v16i1 ( or any natively supported vector).
Differential Revision: http://reviews.llvm.org/D17763
llvm-svn: 262797
|
| |
|
|
|
|
|
|
|
|
| |
Added support for decoding VPERMILPS variable shuffle masks that aren't in the constant pool.
Added target shuffle mask decoding for SCALAR_TO_VECTOR+VZEXT_MOVL cases - these can happen for v2i64 constant re-materialization
Followup to D17681
llvm-svn: 262784
|
| |
|
|
|
|
|
|
| |
btver1 is a SSSE3/SSE4a only CPU - it doesn't have AVX and doesn't support XSAVE.
Differential Revision: http://reviews.llvm.org/D17683
llvm-svn: 262782
|
| |
|
|
|
|
|
|
|
|
| |
When the lowering of the setjmp intrinsic requires
a global base pointer to be set, make sure such pointer
gets defined by the CGBR pass.
This fixes PR26742.
llvm-svn: 262762
|
| |
|
|
|
|
|
|
|
|
|
|
| |
cmpxchgXXb uses RBX as one of its implicit argument. I.e., when
we use that instruction we need to clobber RBX. This is generally
fine, expect when RBX is a reserved register because in that case,
the register allocator will not track its value and will not
save and restore it when interferences occur.
rdar://problem/24851412
llvm-svn: 262759
|
| |
|
|
| |
llvm-svn: 262756
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The x86 ret instruction has a 16 bit immediate indicating how many bytes
to pop off of the stack beyond the return address.
There is a problem when extremely large structs are passed by value: we
might not be able to fit the number of bytes to pop into the return
instruction.
To fix this, expand RET_FLAG a little later and use a special sequence
to clean the stack:
pop %ecx ; return address is now in %ecx
add $n, %esp ; clean the stack
push %ecx ; bring the return address back on the stack
ret ; pop the return address and jmp to it's value
llvm-svn: 262755
|
| |
|
|
| |
llvm-svn: 262702
|
| |
|
|
|
|
| |
PSHUFB decoder was assuming that input was 128 or 256-bit vector only.
llvm-svn: 262661
|
| |
|
|
|
|
|
|
|
|
| |
The variable mask form of VPERMILPD/VPERMILPS were only partially implemented, with much of it still performed as an intrinsic.
This patch properly defines the instructions in terms of X86ISD::VPERMILPV, permitting the opcode to be easily combined as a target shuffle.
Differential Revision: http://reviews.llvm.org/D17681
llvm-svn: 262635
|
| |
|
|
|
|
| |
lowerShift was manually splitting BUILD_VECTOR cases when it could just call Extract128BitVector which does this anyway.
llvm-svn: 262633
|
| |
|
|
| |
llvm-svn: 262631
|
| |
|
|
|
|
|
|
|
|
| |
overrides this ABI.
Fixed the ordering to check first for X86 interrupt handler then for MCU target.
Differential Revision: http://reviews.llvm.org/D17801
llvm-svn: 262628
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
That's not the case for VPERMV/VPERMV3, which cover all possible
combinations (the C intrinsics use a different order; the AVX vs
AVX512 intrinsics are different still).
Since:
r246981 AVX-512: Lowering for 512-bit vector shuffles.
VPERMV is recognized in getTargetShuffleMask.
This breaks assumptions in most callers, as they expect
the non-mask operands to start at index 0.
VPERMV has the mask as operand #0; VPERMV3 has it in the middle.
Instead of the faulty assumption, have getTargetShuffleMask return
its operands as well.
One alternative we considered was to change the operand order of
VPERMV, but we agreed to stick to the instruction order, as there
are more AVX512 weirdness to cover (vpermt2/vpermi2 in particular).
Differential Revision: http://reviews.llvm.org/D17041
llvm-svn: 262627
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D17844
llvm-svn: 262621
|
| |
|
|
|
|
|
|
| |
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
Differential Revision: http://reviews.llvm.org/D17691
llvm-svn: 262599
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D17753
llvm-svn: 262592
|
| |
|
|
|
|
|
|
|
| |
The code was previously not able to track a boolean argument
at a call site back to the formal argument of the caller.
Differential Revision: http://reviews.llvm.org/D17786
llvm-svn: 262575
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Catch objects with a displacement of zero do not initialize a catch
object. The displacement is relative to %rsp at the end of the
function's prologue for x86_64 targets.
If we place an object at the top-of-stack, we will end up wit a
displacement of zero resulting in our catch object remaining
uninitialized.
Address this by creating our catch objects as fixed objects. We will
ensure that the UnwindHelp object is created after the catch objects so
that no catch object will have a displacement of zero.
Differential Revision: http://reviews.llvm.org/D17823
llvm-svn: 262546
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit r262370.
It turns out there is code out there that does sequences of allocas
greater than 4K: http://crbug.com/591404
The goal of this change was to improve the code size of inalloca call
sequences, but we got tangled up in the mess of dynamic allocas.
Instead, we should come back later with a separate MI pass that uses
dominance to optimize the full sequence. This should also be able to
remove the often unneeded stacksave/stackrestore pairs around the call.
llvm-svn: 262505
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D17705
llvm-svn: 262480
|
| |
|
|
|
|
|
|
|
|
|
|
| |
We have a number of useful lowering strategies for VBROADCAST instructions (both from memory and register element 0) which the 128-bit form of the MOVDDUP instruction can make use of.
This patch tweaks lowerVectorShuffleAsBroadcast to enable it to broadcast 2f64 args using MOVDDUP as well.
It does require a slight tweak to the lowerVectorShuffleAsBroadcast mechanism as the existing MOVDDUP lowering uses isShuffleEquivalent which can match binary shuffles that can lower to (unary) broadcasts.
Differential Revision: http://reviews.llvm.org/D17680
llvm-svn: 262478
|
| |
|
|
|
|
| |
VEX prefix. The operand is always a register. NFC
llvm-svn: 262468
|
| |
|
|
|
|
| |
how VEX prefix handling does.
llvm-svn: 262467
|
| |
|
|
|
|
|
|
|
|
|
| |
We modeled the RDFLAGS{32,64} operations as "using" {E,R}FLAGS.
While technically correct, this is not be desirable for folks who want
to examine aspects of the FLAGS register which are not related to
computation like whether or not CPUID is a valid instruction.
Differential Revision: http://reviews.llvm.org/D17782
llvm-svn: 262465
|
| |
|
|
| |
llvm-svn: 262464
|
| |
|
|
|
|
|
|
|
|
| |
encoded in bits 7:4 of the immediate.
For some instructions the register is not the last operand and the immediate handling had to detect this and hardcode the index to find it. It also required CurOp to be pointing at the last operand handled in the Form switch whereas for any instruction it would be pointing at the next operand.
Now we just capture the value in the Form switch when we know exactly where it is and the CurOp pointer can behave normally.
llvm-svn: 262462
|
| |
|
|
|
|
| |
respectively should reduce size tiny bit. NFC
llvm-svn: 262458
|
| |
|
|
|
|
|
|
| |
This isn't quite NFC because some of the SDLocs may change which could
cause scheduling differences. But no regression tests are affected and
there is no functional change intended.
llvm-svn: 262391
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TableGen checks at compiletime that for scheduling models with
"CompleteModel = 1" one of the following holds:
- Is marked with the hasNoSchedulingInfo flag
- The instruction is a subclass of Sched
- There are InstRW definitions in the scheduling model
Typical steps necessary to complete a model:
- Ensure all pseudo instructions that are expanded before machine
scheduling (usually everything handled with EmitYYY() functions in
XXXTargetLowering).
- If a CPU does not support some instructions mark the corresponding
resource unsupported: "WriteRes<WriteXXX, []> { let Unsupported = 1; }".
- Add missing scheduling information.
Differential Revision: http://reviews.llvm.org/D17747
llvm-svn: 262384
|