path: root/llvm/lib/Target/X86/Utils/X86ShuffleDecode.cpp
Commit message | Author | Age | Files | Lines
...
* [x86] Teach the vector comment parsing and printing to correctly handle undef in the shuffle mask. | Chandler Carruth | 2014-09-23 | 1 | -28/+74
  This shows up when we're printing comments during lowering and we still have an IR-level constant hanging around that models undef.

  A nice consequence of this is *much* prettier test cases where the undef lanes actually show up as undef rather than as a particular set of values. This also allows us to print shuffle comments in cases that use undef such as the recently added variable VPERMILPS lowering. Now those test cases have nice shuffle comments attached with their details.

  The shuffle lowering for PSHUFB has been augmented to use undef, and the shuffle combining has been augmented to comprehend it.

  llvm-svn: 218301
* [x86] Teach the AVX1 path of the new vector shuffle lowering one more trick that I missed. | Chandler Carruth | 2014-09-23 | 1 | -0/+23
  VPERMILPS has a non-immediate memory operand mode that allows it to do asymmetric shuffles in the two 128-bit lanes. Use this rather than two shuffles and a blend.

  However, it turns out the variable shuffle path to VPERMILPS (and VPERMILPD, although that one offers no functional difference from the immediate operand other than variability) wasn't even plumbed through codegen. Do such plumbing so that we can reasonably emit a variable-masked VPERMILP instruction. Also plumb basic comment parsing and printing through so that the tests are reasonable.

  There are still a few tests which don't show the shuffle pattern. These are tests with undef lanes. I'll teach the shuffle decoding and printing to handle undef mask entries in a follow-up. I've looked at the masks and they seem reasonable.

  llvm-svn: 218300
* Fix assert when decoding PSHUFB mask | Robert Lougher | 2014-09-22 | 1 | -6/+4
  The PSHUFB mask decode routine used to assert if the mask index was out of range (<0 or greater than the size of the vector). The problem is, we can legitimately have a PSHUFB with a large index using intrinsics. The instruction only uses the least significant 4 bits. This change removes the assert and masks the index to match the instruction behaviour.

  llvm-svn: 218242
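The index masking described in the PSHUFB fix can be sketched as follows. This is a minimal illustration, not LLVM's decode routine: `decodePshufbMask128` is a hypothetical name, the sketch assumes a single 128-bit vector, and it deliberately leaves out the instruction's zeroing behaviour for control bytes with the top bit set.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an LLVM API): turn the 16 control bytes of a
// 128-bit PSHUFB into source byte indices.
std::vector<int> decodePshufbMask128(const std::vector<uint8_t> &control) {
  assert(control.size() == 16 && "sketch handles one 128-bit vector only");
  std::vector<int> mask;
  mask.reserve(control.size());
  for (uint8_t byte : control) {
    // Rather than asserting that the index is in range, keep only the
    // least significant 4 bits, as the commit message describes.
    mask.push_back(byte & 0xF);
  }
  return mask;
}
```

With this shape, a control byte such as 0x13, which an intrinsic can legitimately produce, decodes to index 3 instead of tripping an assertion.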
* [x86] Teach the x86 DAG combiner to form MOVSLDUP and MOVSHDUP instructions when it finds an appropriate pattern. | Chandler Carruth | 2014-09-15 | 1 | -0/+16
  These are lovely instructions, and it's a shame to not use them. =] They are fast, and can have loads folded into their operands, etc.

  I've also plumbed the comment shuffle decoding through the various layers so that the test cases are printed nicely.

  llvm-svn: 217758
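For readers unfamiliar with the two instructions, the shuffle patterns they correspond to can be sketched like this. MOVSLDUP duplicates the even-indexed 32-bit elements and MOVSHDUP the odd-indexed ones. The function name `decodeMovDupMask` and its parameters are assumptions made for illustration, not the names used in LLVM.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: build the shuffle mask for MOVSLDUP (high == false)
// or MOVSHDUP (high == true) over a vector of numElts 32-bit elements.
std::vector<int> decodeMovDupMask(unsigned numElts, bool high) {
  assert(numElts % 2 == 0 && "elements are duplicated in pairs");
  std::vector<int> mask;
  for (unsigned i = 0; i < numElts; i += 2) {
    // Each pair of result elements reads the same source element:
    // the even one for MOVSLDUP, the odd one for MOVSHDUP.
    int src = static_cast<int>(i + (high ? 1 : 0));
    mask.push_back(src);
    mask.push_back(src);
  }
  return mask;
}
```

For a four-element vector this yields the masks 0,0,2,2 and 1,1,3,3, which are the patterns a combiner would need to recognise.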
* [x86] Teach the instruction printer to decode immediate operands to BLENDPS, BLENDPD, and PBLENDW instructions into pretty shuffle comments. | Chandler Carruth | 2014-08-15 | 1 | -0/+7
  These will be used in my next commit as part of test cases for AVX shuffles which can directly use blend in more places.

  llvm-svn: 215701
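A blend immediate maps onto a shuffle mask in a simple way: bit i of the immediate chooses whether result element i comes from the first or the second source. The sketch below assumes the usual two-operand shuffle numbering (indices below `numElts` refer to the first source, the rest to the second) and a single 128-bit vector; `decodeBlendMask` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: decode a BLENDPS/BLENDPD/PBLENDW-style immediate
// into a two-source shuffle mask for one 128-bit vector.
std::vector<int> decodeBlendMask(unsigned numElts, unsigned imm) {
  assert(numElts <= 8 && "an 8-bit immediate has one bit per element");
  std::vector<int> mask;
  for (unsigned i = 0; i < numElts; ++i) {
    // A set bit selects element i of the second source, a clear bit
    // keeps element i of the first source.
    bool fromSecond = ((imm >> i) & 1) != 0;
    mask.push_back(static_cast<int>(fromSecond ? numElts + i : i));
  }
  return mask;
}
```

For example, a four-element blend with immediate 5 takes elements 0 and 2 from the second source, giving the mask 4,1,6,3.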
* [x86] Largely complete the use of PSHUFB in the new vector shuffle lowering with a small addition to it and adding PSHUFB combining. | Chandler Carruth | 2014-08-02 | 1 | -1/+19
  There is one obvious place in the new vector shuffle lowering where we should form PSHUFBs directly: when without them we will unpack a vector of i8s across two different registers and do a potentially 4-way blend as i16s only to re-pack them into i8s afterward. This is the crazy expensive fallback path for i8 shuffles and we can just directly use pshufb here as it will always be cheaper (the unpack and pack are two instructions so even a single shuffle between them hits our three instruction limit for forming PSHUFB).

  However, this doesn't generate very good code in many cases, and it leaves a bunch of common patterns not using PSHUFB. So this patch also adds support for extracting a shuffle mask from PSHUFB in the X86 lowering code, and uses it to handle PSHUFBs in the recursive shuffle combining. This allows us to combine through them, combine multiple ones together, and generally produce sufficiently high quality code.

  Extracting the PSHUFB mask is annoyingly complex because it could be either pre-legalization or post-legalization. At least this doesn't have to deal with re-materialized constants. =] I've added decode routines to handle the different patterns that show up at this level and we dispatch through them as appropriate.

  The two primary test cases are updated. For the v16 test case there is still a lot of room for improvement. Since I was going through it systematically I left behind a bunch of FIXME lines that I'm hoping to turn into ALL lines by the end of this.

  llvm-svn: 214628
* Fix broken assert. | Nick Lewycky | 2014-07-26 | 1 | -1/+1
  llvm-svn: 214019
* X86ShuffleDecode.cpp: Silence a warning. [-Wunused-variable] | NAKAMURA Takumi | 2014-07-26 | 1 | -2/+2
  llvm-svn: 214016
* [x86] Teach the X86 backend to print shuffle comments for PSHUFB instructions which happen to have a constant mask. | Chandler Carruth | 2014-07-25 | 1 | -0/+33
  Currently, this only handles a very narrow set of cases, but those happen to be the cases that I care about for testing shuffles sanely.

  This is a bit trickier than other shuffle instructions because we're decoding constants out of the constant pool. The current MC layer makes it completely impossible to inspect a constant pool entry, so we have to do it at the MI level and attach the comment to the streamer on its way out. So no joy for disassembling, but it does make test cases and asm dumps *much* nicer.

  Sorry for no test cases, but it didn't really seem that valuable to go trolling through existing old test cases and updating them. I'll have lots of testing of this in the upcoming patch for SSSE3 emission in the new vector shuffle lowering code paths.

  llvm-svn: 213986
* Replace ValueTypes.h with MachineValueType.h if possible. | Patrik Hagglund | 2014-03-15 | 1 | -1/+1
  Utilize the previous move of MVT to a separate header for all trivial cases (that don't need any further restructuring).

  Reviewed By: Tim Northover

  llvm-svn: 204003
* Replace '#include ValueTypes.h' with forward declarations. | Patrik Hagglund | 2014-03-12 | 1 | -0/+1
  In some cases the include is pushed "downstream" (or removed if unused).

  llvm-svn: 203644
* Fix 256-bit PALIGNR comment decoding to understand that it works on independent 128-bit lanes. | Craig Topper | 2013-01-28 | 1 | -2/+11
  llvm-svn: 173674
* Fix inconsistent usage of PALIGN and PALIGNR when referring to the same instruction. | Craig Topper | 2013-01-28 | 1 | -1/+2
  llvm-svn: 173667
* X86: Decode PALIGN operands so I don't have to do it in my head. | Benjamin Kramer | 2013-01-26 | 1 | -0/+8
  llvm-svn: 173572
* Use MVT instead of EVT as the argument to all the shuffle decode functions. Simplify some of the decode functions. | Craig Topper | 2012-05-06 | 1 | -22/+18
  llvm-svn: 156268
* Add shuffle decode support for VPERMQ/VPERMPD. | Craig Topper | 2012-05-06 | 1 | -0/+8
  llvm-svn: 156265
* Fix 256-bit vpshuflw and vpshufhw immediate encoding to handle undefs in the lower half correctly. Missed in r155982. | Craig Topper | 2012-05-03 | 1 | -4/+2
  llvm-svn: 156059
* Add support for selecting AVX2 vpshuflw and vpshufhw. Add decoding support for AsmPrinter. | Craig Topper | 2012-05-02 | 1 | -20/+32
  llvm-svn: 155982
* Don't decode vperm2i128 or vperm2f128 into a shuffle if bit 3 or 7 of the immediate is set. | Craig Topper | 2012-04-17 | 1 | -0/+3
  llvm-svn: 154907
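The reason for bailing out on bits 3 and 7 is that those bits ask the instruction to zero a 128-bit half of the result, which a plain two-source shuffle mask cannot express. A minimal sketch of that check follows. The name `decodeVPerm2X128Mask`, the bool return convention and the use of `std::vector` are assumptions for illustration, not LLVM's actual interface.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: decode a VPERM2F128/VPERM2I128 immediate for a
// 256-bit vector with numElts elements. Returns false, leaving mask
// untouched, when the immediate requests zeroing of either half.
bool decodeVPerm2X128Mask(unsigned numElts, unsigned imm,
                          std::vector<int> &mask) {
  assert(numElts % 2 == 0 && "vector must split into two 128-bit halves");
  if (imm & 0x88)
    return false; // bit 3 or bit 7 set: not a pure shuffle
  unsigned halfSize = numElts / 2;
  for (unsigned half = 0; half != 2; ++half) {
    // Bits [1:0] pick the source half for the low result half and bits
    // [5:4] for the high one; values 0-1 name halves of the first source,
    // 2-3 halves of the second.
    unsigned begin = ((imm >> (half * 4)) & 0x3) * halfSize;
    for (unsigned i = begin; i != begin + halfSize; ++i)
      mask.push_back(static_cast<int>(i));
  }
  return true;
}
```

A caller that prints shuffle comments would simply skip the comment when the function returns false.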
* Factor out target shuffle mask decoding from getShuffleScalarElt and use a SmallVector of int instead of unsigned for shuffle mask in decode functions. Preparation for another change. | Craig Topper | 2012-03-20 | 1 | -16/+10
  llvm-svn: 153079
* Add shuffle decoding support for 256-bit pshufd. Merge vpermilp* and pshufd decoding. | Craig Topper | 2012-02-06 | 1 | -48/+32
  llvm-svn: 149859
* Clean up some of the shuffle decoding code for UNPCK instructions. Add instruction commenting for AVX/AVX2 forms for integer UNPCKs. | Craig Topper | 2011-12-06 | 1 | -36/+3
  llvm-svn: 145924
* Merge decoding of VPERMILPD and VPERMILPS shuffle masks. Merge X86ISD node type for VPERMILPD/PS. Add instruction selection support for VINSERTI128/VEXTRACTI128. | Craig Topper | 2011-11-30 | 1 | -28/+15
  llvm-svn: 145483
* Fix issues in shuffle decoding around VPERM* instructions. Fix shuffle decoding for VSHUFPS/D for 256-bit types. Add pattern matching for memory forms of VPERMILPS/VPERMILPD. | Craig Topper | 2011-11-29 | 1 | -11/+21
  llvm-svn: 145390
* More fixes to the X86InstComments for shuffle instructions. In particular add AVX flavors of many instructions and fix the destination operand for some of the existing AVX entries. | Craig Topper | 2011-11-22 | 1 | -20/+0
  llvm-svn: 145063
* Fix shuffle decoding logic to handle UNPCKLPS/UNPCKLPD on 256-bit vectors correctly. Add support for decoding UNPCKHPS/UNPCKHPD for AVX 128-bit and 256-bit forms. | Craig Topper | 2011-11-22 | 1 | -14/+31
  llvm-svn: 145055
* VPERM2F128 is an AVX instruction which permutes between two 256-bit vectors. | Bruno Cardoso Lopes | 2011-08-12 | 1 | -0/+20
  It operates on 128-bit elements instead of regular scalar types. Recognize shuffles that are suitable for VPERM2F128 and teach the x86 legalizer how to handle them.

  llvm-svn: 137519
* Add DecodeShuffle support for VPERMILPD variants | Bruno Cardoso Lopes | 2011-07-29 | 1 | -19/+26
  llvm-svn: 136452
* Recognize unpckh* masks and match 256-bit versions. | Bruno Cardoso Lopes | 2011-07-26 | 1 | -11/+10
  The new versions are different from the previous 128-bit because they work in lanes. Update a few comments and add testcases.

  llvm-svn: 136157
* Add support for 256-bit versions of VPERMIL instruction. | Bruno Cardoso Lopes | 2011-07-21 | 1 | -0/+27
  This is a new instruction introduced in AVX, which can operate on 128- and 256-bit vectors. It considers a 256-bit vector as two independent 128-bit lanes. It can permute any 32- or 64-bit elements inside a lane, and restricts the second lane to have the same permutation as the first one. With the improved splat support introduced earlier today, adding codegen for this instruction enables more efficient 256-bit code.

  Instead of:
    vextractf128 $0, %ymm0, %xmm0
    punpcklbw %xmm0, %xmm0
    punpckhbw %xmm0, %xmm0
    vinsertf128 $0, %xmm0, %ymm0, %ymm1
    vinsertf128 $1, %xmm0, %ymm1, %ymm0
    vextractf128 $1, %ymm0, %xmm1
    shufps $1, %xmm1, %xmm1
    movss %xmm1, 28(%rsp)
    movss %xmm1, 24(%rsp)
    movss %xmm1, 20(%rsp)
    movss %xmm1, 16(%rsp)
    vextractf128 $0, %ymm0, %xmm0
    shufps $1, %xmm0, %xmm0
    movss %xmm0, 12(%rsp)
    movss %xmm0, 8(%rsp)
    movss %xmm0, 4(%rsp)
    movss %xmm0, (%rsp)
    vmovaps (%rsp), %ymm0

  We get:
    vextractf128 $0, %ymm0, %xmm0
    punpcklbw %xmm0, %xmm0
    punpckhbw %xmm0, %xmm0
    vinsertf128 $0, %xmm0, %ymm0, %ymm1
    vinsertf128 $1, %xmm0, %ymm1, %ymm0
    vpermilps $85, %ymm0, %ymm0

  llvm-svn: 135662
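To see why `vpermilps $85` acts as a splat here, it helps to decode the immediate. For the 32-bit form, each 2-bit field of the immediate selects one element within a 128-bit lane, and the same selection is applied to every lane. The following is a sketch under those assumptions; `decodeVPermilPSMask` is a hypothetical name and the function covers only the immediate 32-bit-element form.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: decode the 8-bit immediate of VPERMILPS for a
// vector of numElts 32-bit elements (4 for 128-bit, 8 for 256-bit).
std::vector<int> decodeVPermilPSMask(unsigned numElts, unsigned imm) {
  assert(numElts % 4 == 0 && "a 128-bit lane holds four 32-bit elements");
  std::vector<int> mask;
  for (unsigned lane = 0; lane < numElts; lane += 4) {
    for (unsigned i = 0; i < 4; ++i) {
      // Field i (bits 2i+1:2i) picks a source element inside this lane;
      // the same fields are reused for every lane.
      unsigned sel = (imm >> (2 * i)) & 0x3;
      mask.push_back(static_cast<int>(lane + sel));
    }
  }
  return mask;
}
```

The immediate 85 is 0b01010101, so every field selects element 1 of its lane and the 256-bit mask comes out as 1,1,1,1,5,5,5,5.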
* [AVX] Fix mask predicates for 256-bit UNPCKLPS/D and implement missing patterns for them. | David Greene | 2011-03-02 | 1 | -6/+19
  Add a SIMD test subdirectory to hold tests for SIMD instruction selection correctness and quality.

  llvm-svn: 126845
* fix a signed comparison warning. | Chris Lattner | 2011-02-28 | 1 | -1/+1
  llvm-svn: 126682
* [AVX] Add decode support for VUNPCKLPS/D instructions, both 128-bit and 256-bit forms. | David Greene | 2011-02-28 | 1 | -9/+38
  Because the number of elements in a vector does not determine the vector type (4 elements could be v4f32 or v4f64), pass the full type of the vector to decode routines.

  llvm-svn: 126664
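The point about element count not determining the type can be made concrete with a small sketch of an unpack-low decode that works lane by lane. Here the "full type" is modelled, as an assumption, by passing the element width alongside the element count; `decodeUnpckLMask` is a hypothetical name and not LLVM's signature.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: decode UNPCKLPS/UNPCKLPD as a two-source shuffle
// mask. The element width is needed to know how many 128-bit lanes the
// vector spans.
std::vector<int> decodeUnpckLMask(unsigned numElts, unsigned eltBits) {
  unsigned vecBits = numElts * eltBits;
  assert(vecBits == 128 || vecBits == 256);
  unsigned numLanes = vecBits / 128;
  unsigned laneElts = numElts / numLanes;
  std::vector<int> mask;
  for (unsigned lane = 0; lane < numElts; lane += laneElts) {
    // Interleave the low half of each lane of the first source with the
    // low half of the same lane of the second source.
    for (unsigned i = lane; i < lane + laneElts / 2; ++i) {
      mask.push_back(static_cast<int>(i));
      mask.push_back(static_cast<int>(i + numElts));
    }
  }
  return mask;
}
```

With four elements, a v4f32 input decodes to 0,4,1,5 while a v4f64 input decodes to 0,4,2,6, so a decoder that only knew the element count could not tell the two apart.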
* [AVX] Reorganize X86ShuffleDecode into its own library (LLVMX86Utils.a) to break cyclic library dependencies between LLVMX86CodeGen.a and LLVMX86AsmParser.a. | David Greene | 2011-02-17 | 1 | -0/+148
  Previously this code was in a header file and marked static, but AVX requires some additional functionality here that won't be used by all clients. Since including unused static functions causes a gcc compiler warning, keeping it as a header would break builds that use -Werror. Putting this in its own library solves both problems at once.

  llvm-svn: 125765