path: root/llvm/test/CodeGen/X86/avx512-skx-insert-subvec.ll
* [X86] Teach lower1BitShuffle to recognize padding a subvector with zeros with V2 as the source and V1 as the zero vector.
  Craig Topper, 2019-08-19, 1 file changed, -4/+0
  Shuffle canonicalization can swap the sources, so the zero vector might be V1 and the subvector that's being padded can be V2.
  llvm-svn: 369226
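  As a rough illustration (hypothetical IR with a made-up function name, not taken from the test file), a vXi1 shuffle of this shape pads a two-element subvector with zeros, with the all-zero vector as the first shuffle operand:

      define <8 x i1> @pad_subvec_with_zeros(<8 x i1> %a) {
        ; Indices 8 and 9 select the low lanes of %a (the second operand, V2);
        ; the remaining lanes come from the all-zero first operand (V1).
        %r = shufflevector <8 x i1> zeroinitializer, <8 x i1> %a,
             <8 x i32> <i32 8, i32 9, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
        ret <8 x i1> %r
      }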
* [X86] Add test case for missed opportunity to recognize a vXi1 shuffle as an insert into a zero vector.
  Craig Topper, 2019-08-19, 1 file changed, -0/+18
  We are currently missing this because shuffle canonicalization puts the zero vector as V1 and the subvector as V2. Our current code doesn't recognize this case.
  llvm-svn: 369225
* [X86] Add a special case to LowerCONCAT_VECTORSvXi1 to handle concatenating zero vectors followed by one non-zero vector followed by undef vectors.
  Craig Topper, 2019-08-18, 1 file changed, -6/+3
  For such a case we should only need a KSHIFTL, but we were previously generating a KSHIFTL followed by a KSHIFTR because we mistakenly believed we needed to zero the undef elements.
  llvm-svn: 369224
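  A minimal sketch of the pattern (hypothetical IR, assuming a v2i1 payload): a concat of a zero vector, one live vector, and undef upper lanes, for which the single left shift should suffice:

      define <8 x i1> @concat_zero_vec_undef(<2 x i1> %v) {
        ; Lanes 0-1 are zero, lanes 2-3 are %v, lanes 4-7 are undef.
        ; Shifting %v left into place is enough, since nothing requires
        ; the undef lanes to be cleared.
        %r = shufflevector <2 x i1> zeroinitializer, <2 x i1> %v,
             <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
        ret <8 x i1> %r
      }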
* [X86] Add test cases for suboptimal insertion of a vXi1 vector into a larger vector with zeros in the lower elements and undef upper elements.
  Craig Topper, 2019-08-18, 1 file changed, -0/+39
  Currently we generate kshifts to clear both the upper and lower elements, but we only need one kshift.
  llvm-svn: 369223
* [DAGCombiner] narrow vector binops when extraction is cheap
  Sanjay Patel, 2018-10-30, 1 file changed, -2/+2
  Narrowing vector binops came up in the demanded bits discussion in D52912. I don't think we're going to be able to do this transform in IR as a canonicalization because of the risk of creating unsupported widths for vector ops, but we already have a DAG TLI hook to allow what I was hoping for: isExtractSubvectorCheap(). This is currently enabled for x86, ARM, and AArch64 (although only x86 has existing regression test diffs).
  This is artificially limited to not look through bitcasts because there are so many test diffs already, but that's marked with a TODO and is a small follow-up.
  Differential Revision: https://reviews.llvm.org/D53784
  llvm-svn: 345602
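  The shape being narrowed looks roughly like this (hypothetical IR; the extract of the low half becomes an extract_subvector in the DAG):

      define <2 x i64> @narrow_binop(<4 x i64> %x, <4 x i64> %y) {
        ; Only the low half of the add is consumed, so the combine can
        ; shrink the add to <2 x i64> operands where the target's
        ; isExtractSubvectorCheap() hook reports the extracts as cheap.
        %add = add <4 x i64> %x, %y
        %r = shufflevector <4 x i64> %add, <4 x i64> undef, <2 x i32> <i32 0, i32 1>
        ret <2 x i64> %r
      }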
* [X86] Add support for turning vXi1 shuffles into KSHIFTL/KSHIFTR.
  Craig Topper, 2018-08-31, 1 file changed, -4/+1
  This patch recognizes shuffles that shift elements and fill with zeros. I've copied and modified the shift matching code we use for normal vector registers to do this. I'm not sure if there's a good way to share more of this code without making the existing function more complex than it already is.
  This will be used to enable kshift intrinsics in clang.
  Differential Revision: https://reviews.llvm.org/D51401
  llvm-svn: 341227
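  For example (a hand-written sketch, not from the test file), a mask shuffle that shifts elements up by two and fills the vacated lanes with zeros matches the KSHIFTL pattern:

      define <8 x i1> @kshiftl_by_2(<8 x i1> %a) {
        ; Lanes 0-1 take zeros (indices 8-9 address the zero operand);
        ; lanes 2-7 take %a's lanes 0-5, i.e. a shift up by two.
        %r = shufflevector <8 x i1> %a, <8 x i1> zeroinitializer,
             <8 x i32> <i32 8, i32 9, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5>
        ret <8 x i1> %r
      }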
* [X86] Use vpmovq2m/vpmovd2m for truncate to vXi1 when possible.
  Craig Topper, 2018-02-19, 1 file changed, -14/+14
  Previously we used vptestmd, but the scheduling data for SKX says vpmovq2m/vpmovd2m is lower latency. We already used vpmovb2m/vpmovw2m for byte/word truncates, so this is more consistent anyway.
  llvm-svn: 325534
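  The kind of IR affected looks like this (a sketch; the exact instruction selection depends on the subtarget features):

      define <8 x i1> @trunc_qwords_to_mask(<8 x i64> %a) {
        ; Truncation keeps only the low bit of each element; the backend
        ; typically shifts that bit into the sign position and reads it
        ; out with a sign-bit-to-mask instruction such as VPMOVQ2M.
        %r = trunc <8 x i64> %a to <8 x i1>
        ret <8 x i1> %r
      }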
* [X86] Custom legalize vXi1 extract_subvector with KSHIFTR.
  Craig Topper, 2017-12-30, 1 file changed, -2/+2
  This allows us to remove some isel patterns. This is mostly NFC, but we now use KSHIFTB instead of KSHIFTW with DQI.
  llvm-svn: 321576
* [X86] Promote v8i1 shuffles to v8i32 instead of v8i64 if we have VLX.
  Craig Topper, 2017-12-21, 1 file changed, -4/+3
  We should have equally good shuffle options for v8i32 with VLX. This was spotted during my attempts to remove 512-bit vectors from SKX.
  We still use 512 bits for v16i1, v32i1, and v64i1. I'm less sure we can handle those well with narrower vectors; i32 and i64 element sizes get the best shuffle support.
  llvm-svn: 321291
* [X86] Improve lowering of vXi1 insert_subvectors to better utilize (insert_subvector zero, vec, 0) for zeroing upper bits.
  Craig Topper, 2017-12-09, 1 file changed, -22/+14
  This can be better recognized during isel when the producer already zeroed the upper bits.
  llvm-svn: 320267
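  A rough example of the shape being targeted (hypothetical function name, assuming an 8-to-16 lane widening): zero-extending a mask vector, which the lowering models as an insert into an all-zero vector at index 0:

      define <16 x i1> @widen_mask_with_zeros(<8 x i1> %a) {
        ; The low 8 lanes come from %a and the high 8 from the zero
        ; operand; in DAG terms this is
        ; (insert_subvector zeroinitializer, %a, 0).
        %r = shufflevector <8 x i1> %a, <8 x i1> zeroinitializer,
             <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
                         i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
        ret <16 x i1> %r
      }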
* [X86] When inserting into the upper bits of a vXi1 vector, make sure we shift enough bits if we widened the vector.
  Craig Topper, 2017-12-09, 1 file changed, -4/+4
  We may need to widen the vector to make the shifts legal, but if we do that we need to make sure we shift left/right after accounting for the new size. If not, we can't guarantee we are shifting in zeros.
  The test cases affected actually show cases where we should move the shifts all together, but that's another problem.
  llvm-svn: 320248
* [X86] Don't use kunpck for vXi1 concat_vectors if the upper bits are undef.
  Craig Topper, 2017-12-05, 1 file changed, -3/+0
  This can be efficiently selected by a COPY_TO_REGCLASS without the need for an extra instruction.
  llvm-svn: 319726
* [CodeGen] Unify MBB reference format in both MIR and debug output
  Francis Visoiu Mistrih, 2017-12-04, 1 file changed, -12/+12
  As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions.
    * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
    * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
    * find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
    * grep -nr 'BB#' and fix
  Differential Revision: https://reviews.llvm.org/D40422
  llvm-svn: 319665
* [X86][AVX512] Improve lowering of AVX512 test intrinsics
  Uriel Korach, 2017-11-06, 1 file changed, -2/+0
  Added TESTM and TESTNM to the list of instructions that already zero the unused upper bits and do not need the redundant shift left and shift right instructions afterwards. Added a pattern for TESTM and TESTNM in isel lowering, so now icmp(ne, and(X,Y), 0) folds into TESTM and icmp(eq, and(X,Y), 0) folds into TESTNM.
  This commit is a preparation for lowering the test and testn X86 intrinsics to IR.
  Differential Revision: https://reviews.llvm.org/D38732
  llvm-svn: 317465
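  The folded pattern corresponds to IR like the following (a sketch with a made-up function name):

      define <8 x i1> @and_icmp_ne(<8 x i64> %x, <8 x i64> %y) {
        ; icmp ne (and %x, %y), 0 can be selected as VPTESTMQ; the eq
        ; form would select as VPTESTNMQ. Both already zero the unused
        ; upper bits of the result mask register.
        %and = and <8 x i64> %x, %y
        %r = icmp ne <8 x i64> %and, zeroinitializer
        ret <8 x i1> %r
      }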
* [X86] SET0 to use XMM registers where possible PR26018 PR32862
  Dinar Temirbulatov, 2017-08-03, 1 file changed, -1/+1
  Differential Revision: https://reviews.llvm.org/D35965
  llvm-svn: 309926
* [X86] Generate VZEROUPPER for Skylake-avx512.
  Amjad Aboud, 2017-03-03, 1 file changed, -0/+1
  VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be.
  Differential Revision: https://reviews.llvm.org/D29874
  llvm-svn: 296859
* [DAGCombiner] Teach DAG combine that inserting an extract_subvector result into the same location of an undef vector can just use the original input to the extract.
  Craig Topper, 2017-02-13, 1 file changed, -3/+3
  llvm-svn: 294932
* [AVX-512] Fix lowering for mask register concatenation with undef in the lower half.
  Craig Topper, 2017-01-29, 1 file changed, -0/+12
  Previously this test case fired an assertion in getNode because we tried to create an insert_subvector with both input types the same size and the index pointing to half the vector width.
  llvm-svn: 293446
* [AVX-512] Teach shuffle lowering to use vinsert instructions for shuffles corresponding to 256-bit subvector inserts.
  Craig Topper, 2017-01-03, 1 file changed, -3/+3
  llvm-svn: 290870
* [AVX512] Fix insertelement i1 lowering.
  Igor Breger, 2016-08-14, 1 file changed, -2/+1
  1. Use a shuffle to insert an i1 element into a vector. The previous implementation was incorrect (dest_bit OR src_bit; it doesn't clear the bit if src_bit=0).
  2. Improve shuffling of i1 vectors: use CVT2MASK if supported instead of TRUNCATE.
  Differential Revision: http://reviews.llvm.org/D23347
  llvm-svn: 278623
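  The failure mode shows up in IR this simple (a hypothetical example): with the old OR-based lowering, inserting %b = 0 would leave a stale 1 bit in lane 3:

      define <8 x i1> @insert_one_bit(<8 x i1> %v, i1 %b) {
        ; A correct lowering must clear lane 3 before merging %b; a plain
        ; dest_bit OR src_bit keeps the old value whenever %b is 0.
        %r = insertelement <8 x i1> %v, i1 %b, i32 3
        ret <8 x i1> %r
      }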
* AVX512: Add extract_subvector patterns v8i1->v4i1, v4i1->v2i1.
  Igor Breger, 2016-03-08, 1 file changed, -0/+23
  Differential Revision: http://reviews.llvm.org/D17953
  llvm-svn: 262929
* AVX512: Remove VSHRI kmask patterns from the TD file.
  Igor Breger, 2016-03-06, 1 file changed, -10/+22
  It is incorrect to use kshiftw to implement VSHRI v4i1: bits 15-4 are undef, so the upper bits of the v4i1 may not be zeroed. v4i1 should be zero_extended to v16i1 (or any natively supported vector) first.
  Differential Revision: http://reviews.llvm.org/D17763
  llvm-svn: 262797
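  A sketch of the hazard (hand-written, not from the test file): extracting the upper half of a v4i1 involves a right shift of the k-register, and a bare KSHIFTW would pull the undef bits 15-4 down into the result:

      define <2 x i1> @extract_high_half(<4 x i1> %v) {
        ; Only bits 3-0 of the k-register backing %v are defined, so the
        ; value must be zero-extended to a natively supported width
        ; (e.g. v16i1) before shifting right by two.
        %r = shufflevector <4 x i1> %v, <4 x i1> undef, <2 x i32> <i32 2, i32 3>
        ret <2 x i1> %r
      }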
* AVX512: Fix truncate v32i8 to v32i1 lowering implementation.
  Igor Breger, 2016-01-28, 1 file changed, -9/+9
  Enable truncation of 128/256-bit packed byte/word vectors with AVX512BW but without AVX512VL by using 512-bit instructions.
  Differential Revision: http://reviews.llvm.org/D16531
  llvm-svn: 259044
* AVX512: Change VPMOVB2M DAG lowering, use the CVT2MASK node instead of TRUNCATE.
  Igor Breger, 2015-12-27, 1 file changed, -0/+12
  Fix TRUNCATE lowering of vector to vector of i1: use the LSB and not the MSB. Implement the VPMOVB/W/D/Q2M intrinsics.
  Differential Revision: http://reviews.llvm.org/D15675
  llvm-svn: 256470
* AVX-512: Optimized INSERT_SUBVECTOR for i1 vector types
  Elena Demikhovsky, 2015-11-22, 1 file changed, -0/+123
  INSERT_SUBVECTOR for i1 vectors may be done with shifts when we insert into the lower part, into the upper part, or into an all-zero vector. CONCAT_VECTORS uses INSERT_SUBVECTOR.
  Differential Revision: http://reviews.llvm.org/D14815
  llvm-svn: 253819
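  An illustration of the upper-part case (hypothetical IR, expressed with shuffles since textual IR has no insert_subvector instruction): inserting a v2i1 into the high half of a v4i1, which can be done with k-register shifts rather than per-element inserts:

      define <4 x i1> @insert_into_high_half(<4 x i1> %a, <2 x i1> %b) {
        ; Widen %b so both shuffle operands have the same length, then
        ; take %a's low lanes and %b's lanes for the high half.
        %wide = shufflevector <2 x i1> %b, <2 x i1> undef,
                <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
        %r = shufflevector <4 x i1> %a, <4 x i1> %wide,
             <4 x i32> <i32 0, i32 1, i32 4, i32 5>
        ret <4 x i1> %r
      }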