|  | Commit message | Author | Age | Files | Lines | 
|---|
| | instructions when it finds an appropriate pattern.
These are lovely instructions, and it's a shame not to use them. =] They
are fast, and can have loads folded into their operands, etc.
I've also plumbed the comment shuffle decoding through the various
layers so that the test cases are printed nicely.
llvm-svn: 217758 | 
| | BLENDPS, BLENDPD, and PBLENDW instructions into pretty shuffle comments.
These will be used in my next commit as part of test cases for AVX
shuffles which can directly use blend in more places.
llvm-svn: 215701 | 
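As a rough illustration of what decoding a blend immediate into a shuffle comment involves, here is a minimal sketch. The function name `decodeBlendMask` is hypothetical and is not the routine added by this commit. It assumes the usual convention that bit i of the immediate selects element i from the second source when set and from the first source when clear, and it only covers forms with at most eight elements.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the LLVM routine): expand a blend immediate into a
// shuffle mask. Indices 0..N-1 name elements of the first source and
// N..2N-1 name elements of the second source.
std::vector<int> decodeBlendMask(unsigned NumElts, uint8_t Imm) {
  assert(NumElts > 0 && NumElts <= 8 && "immediate has only 8 selector bits");
  std::vector<int> Mask;
  for (unsigned i = 0; i != NumElts; ++i) {
    // Bit i set: take element i from the second source.
    bool FromSecond = (Imm >> i) & 1;
    Mask.push_back(FromSecond ? int(NumElts + i) : int(i));
  }
  return Mask;
}
```

Under these assumptions, a four-element blend with immediate 5 takes elements 0 and 2 from the second source and elements 1 and 3 from the first.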
| | lowering with a small addition to it and adding PSHUFB combining.
There is one obvious place in the new vector shuffle lowering where we
should form PSHUFBs directly: when, without them, we would unpack a vector
of i8s across two different registers and do a potentially 4-way blend
as i16s only to re-pack them into i8s afterward. This is the crazy
expensive fallback path for i8 shuffles and we can just directly use
pshufb here as it will always be cheaper (the unpack and pack are
two instructions so even a single shuffle between them hits our
three instruction limit for forming PSHUFB).
However, this doesn't generate very good code in many cases, and it
leaves a bunch of common patterns not using PSHUFB. So this patch also
adds support for extracting a shuffle mask from PSHUFB in the X86
lowering code, and uses it to handle PSHUFBs in the recursive shuffle
combining. This allows us to combine through them, combine multiple ones
together, and generally produce sufficiently high quality code.
Extracting the PSHUFB mask is annoyingly complex because it could be
either pre-legalization or post-legalization. At least this doesn't have
to deal with re-materialized constants. =] I've added decode routines to
handle the different patterns that show up at this level and we dispatch
through them as appropriate.
The two primary test cases are updated. For the v16 test case there is
still a lot of room for improvement. Since I was going through it
systematically I left behind a bunch of FIXME lines that I'm hoping to
turn into ALL lines by the end of this.
llvm-svn: 214628 | 
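For readers unfamiliar with how a PSHUFB control vector maps to a shuffle mask, the following is a minimal sketch for the 128-bit form. It is not the decode routine from this patch: the name `decodePSHUFBMask` and the `ZeroedLane` sentinel are assumptions made for illustration. It relies only on the documented PSHUFB behavior that a control byte with its high bit set zeroes the destination byte, and otherwise the low four bits index into the source.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sentinel marking a destination byte that PSHUFB zeroes.
const int ZeroedLane = -1;

// Hypothetical helper (not the LLVM routine): turn the 16 constant control
// bytes of a 128-bit PSHUFB into a shuffle mask over the source bytes.
std::vector<int> decodePSHUFBMask(const std::vector<uint8_t> &RawMask) {
  assert(RawMask.size() == 16 && "sketch handles the 128-bit form only");
  std::vector<int> Mask;
  for (uint8_t M : RawMask) {
    if (M & 0x80)
      Mask.push_back(ZeroedLane);  // high bit set: result byte is zero
    else
      Mask.push_back(M & 0x0F);    // low four bits pick the source byte
  }
  return Mask;
}
```

Once the control vector is in this form it can be treated like any other shuffle mask, which is what makes combining through PSHUFB possible.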
| | llvm-svn: 214019 | 
| | llvm-svn: 214016 | 
| | instructions which happen to have a constant mask.
Currently, this only handles a very narrow set of cases, but those
happen to be the cases that I care about for testing shuffles sanely.
This is a bit trickier than other shuffle instructions because we're
decoding constants out of the constant pool. The current MC layer makes
it completely impossible to inspect a constant pool entry, so we have to
do it at the MI level and attach the comment to the streamer on its way
out. So no joy for disassembling, but it does make test cases and asm
dumps *much* nicer.
Sorry for no test cases, but it didn't really seem that valuable to go
trolling through existing old test cases and updating them. I'll have
lots of testing of this in the upcoming patch for SSSE3 emission in the
new vector shuffle lowering code paths.
llvm-svn: 213986 | 
| | Utilize the previous move of MVT to a separate header for all trivial
cases (that don't need any further restructuring).
Reviewed By: Tim Northover
llvm-svn: 204003 | 
| | In some cases the include is pushed "downstream" (or removed if
unused).
llvm-svn: 203644 | 
| | independent 256-bit lanes.
llvm-svn: 173674 | 
| | instruction.
llvm-svn: 173667 | 
| | llvm-svn: 173572 | 
| | Simplify some of the decode functions.
llvm-svn: 156268 | 
| | llvm-svn: 156265 | 
| | lower half correctly. Missed in r155982.
llvm-svn: 156059 | 
| | for AsmPrinter.
llvm-svn: 155982 | 
| | immediate is set.
llvm-svn: 154907 | 
| | SmallVector of int instead of unsigned for shuffle mask in decode functions. Preparation for another change.
llvm-svn: 153079 | 
| | decoding.
llvm-svn: 149859 | 
| | instruction commenting for AVX/AVX2 forms for integer UNPCKs.
llvm-svn: 145924 | 
| | type for VPERMILPD/PS. Add instruction selection support for VINSERTI128/VEXTRACTI128.
llvm-svn: 145483 | 
| | decoding for VSHUFPS/D for 256-bit types. Add pattern matching for memory forms of VPERMILPS/VPERMILPD.
llvm-svn: 145390 | 
| | add AVX flavors of many instructions and fix the destination operand for some of the existing AVX entries.
llvm-svn: 145063 | 
| | correctly. Add support for decoding UNPCKHPS/UNPCKHPD for AVX 128-bit and 256-bit forms.
llvm-svn: 145055 | 
| | vectors. It operates on 128-bit elements instead of regular scalar
types. Recognize shuffles that are suitable for VPERM2F128 and teach
the x86 legalizer how to handle them.
llvm-svn: 137519 | 
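To make the "128-bit elements" point concrete, here is a minimal sketch of how a VPERM2F128 immediate could be expanded into a per-element shuffle mask. The function name `decodeVPERM2F128Mask` is hypothetical and is not taken from this commit. It assumes the documented immediate layout: bits 1:0 select the source half for the low 128 bits of the result, bits 5:4 select it for the high 128 bits, and bits 3 and 7 zero the corresponding half.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the LLVM routine): expand a VPERM2F128 immediate
// into an element-level mask. Indices 0..N-1 name the first source and
// N..2N-1 the second; -1 marks an element that the instruction zeroes.
std::vector<int> decodeVPERM2F128Mask(unsigned NumElts, uint8_t Imm) {
  assert((NumElts == 4 || NumElts == 8) && "v4f64 or v8f32 shapes only");
  unsigned HalfSize = NumElts / 2;
  std::vector<int> Mask;
  for (unsigned Half = 0; Half != 2; ++Half) {
    unsigned Ctrl = (Imm >> (4 * Half)) & 0xF;
    bool Zero = Ctrl & 0x8;   // bit 3 of the nibble zeroes this half
    unsigned Sel = Ctrl & 0x3; // which of the four 128-bit halves to copy
    for (unsigned i = 0; i != HalfSize; ++i)
      Mask.push_back(Zero ? -1 : int(Sel * HalfSize + i));
  }
  return Mask;
}
```

With eight 32-bit elements, an immediate of 0x31 copies the high half of the first source into the low half of the result and the high half of the second source into the high half.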
| | llvm-svn: 136452 | 
| | different from the previous 128-bit because they work in lanes.
Update a few comments and add test cases.
llvm-svn: 136157 | 
| | instruction introduced in AVX, which can operate on 128 and 256-bit vectors.
It considers a 256-bit vector as two independent 128-bit lanes. It can permute
any 32-bit or 64-bit elements inside a lane, and restricts the second lane to
have the same permutation as the first one. With the improved splat support
introduced earlier today, adding codegen for this instruction enables more
efficient 256-bit code:
Instead of:
  vextractf128  $0, %ymm0, %xmm0
  punpcklbw %xmm0, %xmm0
  punpckhbw %xmm0, %xmm0
  vinsertf128 $0, %xmm0, %ymm0, %ymm1
  vinsertf128 $1, %xmm0, %ymm1, %ymm0
  vextractf128  $1, %ymm0, %xmm1
  shufps  $1, %xmm1, %xmm1
  movss %xmm1, 28(%rsp)
  movss %xmm1, 24(%rsp)
  movss %xmm1, 20(%rsp)
  movss %xmm1, 16(%rsp)
  vextractf128  $0, %ymm0, %xmm0
  shufps  $1, %xmm0, %xmm0
  movss %xmm0, 12(%rsp)
  movss %xmm0, 8(%rsp)
  movss %xmm0, 4(%rsp)
  movss %xmm0, (%rsp)
  vmovaps (%rsp), %ymm0
We get:
  vextractf128  $0, %ymm0, %xmm0
  punpcklbw %xmm0, %xmm0
  punpckhbw %xmm0, %xmm0
  vinsertf128 $0, %xmm0, %ymm0, %ymm1
  vinsertf128 $1, %xmm0, %ymm1, %ymm0
  vpermilps $85, %ymm0, %ymm0
llvm-svn: 135662 | 
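The lane behavior described above can be sketched as a small decoder for the 32-bit form. The function name `decodeVPERMILPSMask` is hypothetical and is not the routine from this commit. It assumes the immediate form of VPERMILPS, where each 2-bit field of the immediate picks one of the four elements within a 128-bit lane and the same immediate is applied to every lane.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the LLVM routine): expand a VPERMILPS immediate
// into a shuffle mask over 32-bit elements, one 128-bit lane at a time.
std::vector<int> decodeVPERMILPSMask(unsigned NumElts, uint8_t Imm) {
  assert((NumElts == 4 || NumElts == 8) && "128-bit or 256-bit vectors only");
  std::vector<int> Mask;
  for (unsigned LaneStart = 0; LaneStart != NumElts; LaneStart += 4)
    for (unsigned i = 0; i != 4; ++i)
      // Field i selects an element within the current lane only.
      Mask.push_back(int(LaneStart + ((Imm >> (2 * i)) & 3)));
  return Mask;
}
```

For the `vpermilps $85` in the listing above, every 2-bit field is 1, so this sketch yields element 1 splatted across the low lane and element 5 across the high lane.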
| | missing patterns for them.
Add a SIMD test subdirectory to hold tests for SIMD instruction
selection correctness and quality.
llvm-svn: 126845 | 
| | llvm-svn: 126682 | 
| | and 256-bit forms.  Because the number of elements in a vector
does not determine the vector type (4 elements could be v4f32 or
v4f64), pass the full type of the vector to decode routines.
llvm-svn: 126664 | 
|  | (LLVMX86Utils.a) to break cyclic library dependencies between
LLVMX86CodeGen.a and LLVMX86AsmParser.a.  Previously this code was in
a header file and marked static but AVX requires some additional
functionality here that won't be used by all clients.  Since including
unused static functions causes a gcc compiler warning, keeping it as a
header would break builds that use -Werror.  Putting this in its own
library solves both problems at once.
llvm-svn: 125765 |