| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
undef in the shuffle mask. This shows up when we're printing comments
during lowering and we still have an IR-level constant hanging around
that models undef.
A nice consequence of this is *much* prettier test cases where the undef
lanes actually show up as undef rather than as a particular set of
values. This also allows us to print shuffle comments in cases that use
undef such as the recently added variable VPERMILPS lowering. Now those
test cases have nice shuffle comments attached with their details.
The shuffle lowering for PSHUFB has been augmented to use undef, and the
shuffle combining has been augmented to comprehend it.
llvm-svn: 218301
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
trick that I missed.
VPERMILPS has a non-immediate memory operand mode that allows it to do
asymetric shuffles in the two 128-bit lanes. Use this rather than two
shuffles and a blend.
However, it turns out the variable shuffle path to VPERMILPS (and
VPERMILPD, although that one offers no functional differenc from the
immediate operand other than variability) wasn't even plumbed through
codegen. Do such plumbing so that we can reasonably emit
a variable-masked VPERMILP instruction. Also plumb basic comment parsing
and printing through so that the tests are reasonable.
There are still a few tests which don't show the shuffle pattern. These
are tests with undef lanes. I'll teach the shuffle decoding and printing
to handle undef mask entries in a follow-up. I've looked at the masks
and they seem reasonable.
llvm-svn: 218300
|
|
|
|
|
|
|
|
|
|
| |
The PSHUFB mask decode routine used to assert if the mask index was out of
range (<0 or greater than the size of the vector). The problem is, we can
legitimately have a PSHUFB with a large index using intrinsics. The
instruction only uses the least significant 4 bits. This change removes the
assert and masks the index to match the instruction behaviour.
llvm-svn: 218242
|
|
|
|
|
|
|
|
|
|
|
|
| |
instructions when it finds an appropriate pattern.
These are lovely instructions, and its a shame to not use them. =] They
are fast, and can hand loads folded into their operands, etc.
I've also plumbed the comment shuffle decoding through the various
layers so that the test cases are printed nicely.
llvm-svn: 217758
|
|
|
|
|
|
|
|
|
| |
BLENDPS, BLENDPD, and PBLENDW instructions into pretty shuffle comments.
These will be used in my next commit as part of test cases for AVX
shuffles which can directly use blend in more places.
llvm-svn: 215701
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
lowering with a small addition to it and adding PSHUFB combining.
There is one obvious place in the new vector shuffle lowering where we
should form PSHUFBs directly: when without them we will unpack a vector
of i8s across two different registers and do a potentially 4-way blend
as i16s only to re-pack them into i8s afterward. This is the crazy
expensive fallback path for i8 shuffles and we can just directly use
pshufb here as it will always be cheaper (the unpack and pack are
two instructions so even a single shuffle between them hits our
three instruction limit for forming PSHUFB).
However, this doesn't generate very good code in many cases, and it
leaves a bunch of common patterns not using PSHUFB. So this patch also
adds support for extracting a shuffle mask from PSHUFB in the X86
lowering code, and uses it to handle PSHUFBs in the recursive shuffle
combining. This allows us to combine through them, combine multiple ones
together, and generally produce sufficiently high quality code.
Extracting the PSHUFB mask is annoyingly complex because it could be
either pre-legalization or post-legalization. At least this doesn't have
to deal with re-materialized constants. =] I've added decode routines to
handle the different patterns that show up at this level and we dispatch
through them as appropriate.
The two primary test cases are updated. For the v16 test case there is
still a lot of room for improvement. Since I was going through it
systematically I left behind a bunch of FIXME lines that I'm hoping to
turn into ALL lines by the end of this.
llvm-svn: 214628
|
|
|
|
| |
llvm-svn: 214019
|
|
|
|
| |
llvm-svn: 214016
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instructions which happen to have a constant mask.
Currently, this only handles a very narrow set of cases, but those
happen to be the cases that I care about for testing shuffles sanely.
This is a bit trickier than other shuffle instructions because we're
decoding constants out of the constant pool. The current MC layer makes
it completely impossible to inspect a constant pool entry, so we have to
do it at the MI level and attach the comment to the streamer on its way
out. So no joy for disassembling, but it does make test cases and asm
dumps *much* nicer.
Sorry for no test cases, but it didn't really seem that valuable to go
trolling through existing old test cases and updating them. I'll have
lots of testing of this in the upcoming patch for SSSE3 emission in the
new vector shuffle lowering code paths.
llvm-svn: 213986
|
|
|
|
|
|
|
|
|
| |
Utilize the previous move of MVT to a separate header for all trivial
cases (that don't need any further restructuring).
Reviewed By: Tim Northover
llvm-svn: 204003
|
|
|
|
|
|
|
| |
In some cases the include is pushed "downstream" (or removed if
unused).
llvm-svn: 203644
|
|
|
|
|
|
| |
independent 256-bit lanes.
llvm-svn: 173674
|
|
|
|
|
|
| |
instruction.
llvm-svn: 173667
|
|
|
|
| |
llvm-svn: 173572
|
|
|
|
|
|
| |
Simplify some of the decode functions.
llvm-svn: 156268
|
|
|
|
| |
llvm-svn: 156265
|
|
|
|
|
|
| |
lower half correctly. Missed in r155982.
llvm-svn: 156059
|
|
|
|
|
|
| |
for AsmPrinter.
llvm-svn: 155982
|
|
|
|
|
|
| |
immediate is set.
llvm-svn: 154907
|
|
|
|
|
|
| |
SmallVector of int instead of unsigned for shuffle mask in decode functions. Preparation for another change.
llvm-svn: 153079
|
|
|
|
|
|
| |
decoding.
llvm-svn: 149859
|
|
|
|
|
|
| |
instruction commenting for AVX/AVX2 forms for integer UNPCKs.
llvm-svn: 145924
|
|
|
|
|
|
| |
type for VPERMILPD/PS. Add instruction selection support for VINSERTI128/VEXTRACTI128.
llvm-svn: 145483
|
|
|
|
|
|
| |
decoding for VSHUFPS/D for 256-bit types. Add pattern matching for memory forms of VPERMILPS/VPERMILPD.
llvm-svn: 145390
|
|
|
|
|
|
| |
add AVX flavors of many instructions and fix the destination operand for some of the existing AVX entries.
llvm-svn: 145063
|
|
|
|
|
|
| |
correctly. Add support for decoding UNPCKHPS/UNPCKHPD for AVX 128-bit and 256-bit forms.
llvm-svn: 145055
|
|
|
|
|
|
|
|
| |
vectors. It operates on 128-bit elements instead of regular scalar
types. Recognize shuffles that are suitable for VPERM2F128 and teach
the x86 legalizer how to handle them.
llvm-svn: 137519
|
|
|
|
| |
llvm-svn: 136452
|
|
|
|
|
|
|
| |
different from the previous 128-bit because they work in lanes.
Update a few comments and add testcases
llvm-svn: 136157
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instruction introduced in AVX, which can operate on 128 and 256-bit vectors.
It considers a 256-bit vector as two independent 128-bit lanes. It can permute
any 32 or 64 elements inside a lane, and restricts the second lane to
have the same permutation of the first one. With the improved splat support
introduced early today, adding codegen for this instruction enable more
efficient 256-bit code:
Instead of:
vextractf128 $0, %ymm0, %xmm0
punpcklbw %xmm0, %xmm0
punpckhbw %xmm0, %xmm0
vinsertf128 $0, %xmm0, %ymm0, %ymm1
vinsertf128 $1, %xmm0, %ymm1, %ymm0
vextractf128 $1, %ymm0, %xmm1
shufps $1, %xmm1, %xmm1
movss %xmm1, 28(%rsp)
movss %xmm1, 24(%rsp)
movss %xmm1, 20(%rsp)
movss %xmm1, 16(%rsp)
vextractf128 $0, %ymm0, %xmm0
shufps $1, %xmm0, %xmm0
movss %xmm0, 12(%rsp)
movss %xmm0, 8(%rsp)
movss %xmm0, 4(%rsp)
movss %xmm0, (%rsp)
vmovaps (%rsp), %ymm0
We get:
vextractf128 $0, %ymm0, %xmm0
punpcklbw %xmm0, %xmm0
punpckhbw %xmm0, %xmm0
vinsertf128 $0, %xmm0, %ymm0, %ymm1
vinsertf128 $1, %xmm0, %ymm1, %ymm0
vpermilps $85, %ymm0, %ymm0
llvm-svn: 135662
|
|
|
|
|
|
|
|
|
|
| |
missing patterns for them.
Add a SIMD test subdirectory to hold tests for SIMD instruction
selection correctness and quality.
'
llvm-svn: 126845
|
|
|
|
| |
llvm-svn: 126682
|
|
|
|
|
|
|
|
| |
and 256-bit forms. Because the number of elements in a vector
does not determine the vector type (4 elements could be v4f32 or
v4f64), pass the full type of the vector to decode routines.
llvm-svn: 126664
|
|
(LLVMX86Utils.a) to break cyclic library dependencies between
LLVMX86CodeGen.a and LLVMX86AsmParser.a. Previously this code was in
a header file and marked static but AVX requires some additional
functionality here that won't be used by all clients. Since including
unused static functions causes a gcc compiler warning, keeping it as a
header would break builds that use -Werror. Putting this in its own
library solves both problems at once.
llvm-svn: 125765
|