path: root/llvm/test/CodeGen/X86/avx512-intrinsics-fast-isel.ll
Commit message | Author | Age | Files | Lines
...
* [X86] Add fast isel test cases for the clang output for 512-bit cvtps2pd related intrinsics. | Craig Topper | 2018-05-14 | 1 | -0/+92
  llvm-svn: 332214
* [X86] Add patterns for combining movss+uint_to_fp into the intrinsic instructions under AVX512. | Craig Topper | 2018-05-13 | 1 | -12/+6
  This matches what we do for sint_to_fp.
  llvm-svn: 332205
* [X86] Add fast-isel test cases for _mm_cvtu32_sd, _mm_cvtu64_sd, _mm_cvtu32_ss, and _mm_cvtu64_ss. | Craig Topper | 2018-05-13 | 1 | -0/+98
  llvm-svn: 332204
* Correct dwarf unwind information in function epilogue | Petar Jovanovic | 2018-04-24 | 1 | -0/+3
  This patch aims to provide correct dwarf unwind information in function epilogue for X86. It consists of two parts.
  The first part inserts CFI instructions that set appropriate cfa offset and cfa register in emitEpilogue() in X86FrameLowering. This part is X86 specific.
  The second part is platform independent and ensures that:
    * CFI instructions do not affect code generation (they are not counted as instructions when tail duplicating or tail merging)
    * Unwind information remains correct when a function is modified by different passes. This is done in a late pass by analyzing information about cfa offset and cfa register in BBs and inserting additional CFI directives where necessary.
  Added CFIInstrInserter pass:
    * analyzes each basic block to determine which cfa offset and register are valid at its entry and exit
    * verifies that outgoing cfa offset and register of predecessor blocks match incoming values of their successors
    * inserts additional CFI directives at basic block beginning to correct the rule for calculating CFA
  Having CFI instructions in function epilogue can cause an incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them.
  CFIInstrInserter is currently run only on X86, but can be used by any target that implements support for adding CFI instructions in epilogue.
  Patch by Violeta Vukobrat.
  Differential Revision: https://reviews.llvm.org/D42848
  llvm-svn: 330706
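The two checks described for CFIInstrInserter can be pictured with a small model. The following is only a sketch under stated assumptions, not LLVM's implementation: the `Block` record, the function names, and the example offsets are all hypothetical, and the CFA state is reduced to an (offset, register) pair. It shows why a block laid out directly below an epilogue block may need an extra directive even when every CFG edge is consistent.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    # Hypothetical model of a basic block; not an LLVM data structure.
    name: str
    incoming: tuple  # (cfa_offset, cfa_register) expected at block entry
    outgoing: tuple  # (cfa_offset, cfa_register) in effect at block exit
    preds: list = field(default_factory=list)  # predecessor block names


def find_edge_mismatches(blocks):
    """Return (pred, succ) pairs whose CFA state disagrees across a CFG edge."""
    by_name = {b.name: b for b in blocks}
    return [(p, b.name) for b in blocks for p in b.preds
            if by_name[p].outgoing != b.incoming]


def find_layout_fixups(blocks):
    """Walk blocks in layout order and report each block whose entry state
    differs from the state left in effect by the block laid out above it."""
    return [(block.name, block.incoming)
            for above, block in zip(blocks, blocks[1:])
            if above.outgoing != block.incoming]


# Layout order: entry, an early-return epilogue, then a second body block.
layout = [
    Block("entry", (8, "rsp"), (16, "rsp")),
    Block("early_ret", (16, "rsp"), (8, "rsp"), ["entry"]),
    Block("body", (16, "rsp"), (8, "rsp"), ["entry"]),
]
print(find_edge_mismatches(layout))  # every CFG edge is consistent
print(find_layout_fixups(layout))    # "body" follows an epilogue in layout
```

In this model the CFG edges agree, yet `body` inherits the epilogue's rule from the block above it in layout order, which is the situation the commit message describes for multiple epilogue blocks.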
* [X86] Don't use Lower512IntUnary to split bitcasts with v32i16/v64i8 types on targets without AVX512BW. | Craig Topper | 2018-04-09 | 1 | -0/+22
  LowerIntUnary, as its name says, has an assert for integer types. But for the bitcast case one side might be an FP type. Rather than making sure the function really works for fp types and renaming it, just do really basic splitting directly.
  The LowerIntUnary has the advantage that it can peek through BUILD_VECTOR because every other call is during Lowering. But these calls are during legalization and will be followed by a DAG combine round.
  Revert some changes to LowerVectorIntUnary that were originally made just to make these two calls work even in pure integer cases.
  This was found purely by compiling the avx512f-builtins.c test from clang so I've copied over the offending function from that.
  llvm-svn: 329616
* [X86] Remove GCCBuiltin name from pmuldq/pmuludq intrinsics so clang can custom lower to native IR. | Craig Topper | 2018-04-09 | 1 | -0/+144
  Update fast-isel intrinsic tests for clang's new codegen.
  In some cases fast-isel fails to remove the and/shifts and uses blends or conditional moves. But once masking gets involved, fast-isel aborts on the mask portion and we DAG combine more thoroughly.
  llvm-svn: 329604
* [X86] Rewrite LowerAVXCONCAT_VECTORS similar to how we handle vXi1 concats. | Craig Topper | 2018-03-13 | 1 | -4/+0
  This is better able to detect undef and zero pieces in the concat, or cases when only one subvector is non-zero. This allows us to avoid silly things like double inserts into progressively larger undefs.
  This still builds 512 bit concats of 128 bits by building up through 256 bits first. But I don't know if that's best.
  We probably want to merge this with the vXi1 concat code since they are very similar.
  llvm-svn: 327454
* [X86] Rewrite printMasking code in X86InstComments to use TSFlags to determine whether the instruction is masked. | Craig Topper | 2018-03-10 | 1 | -8/+8
  This should have been NFC, but it looks like we were missing PUNPCKLHQDQ/PUNPCKLQDQ instructions in there.
  llvm-svn: 327200
* [X86] Remove kortest intrinsics and replace with native IR. | Craig Topper | 2018-02-08 | 1 | -0/+97
  llvm-svn: 324646
* [X86] Remove X86ISD::SHUF128 from combineBitcastForMaskedOp. Use isel patterns instead. | Craig Topper | 2018-02-05 | 1 | -2/+2
  We always created X86ISD::SHUF128 with a 64-bit element type so we can use isel patterns to detect a bitconvert to 32-bit to handle masking.
  The test changes are because we also match the bitconvert even if there is no masking. This leads to unnecessary isel patterns, but it requires more multiclass hackery in tablegen to get rid of them.
  llvm-svn: 324205
* [X86] Teach LowerBUILD_VECTOR to recognize pair-wise splats of 32-bit elements and use a 64-bit broadcast | Craig Topper | 2018-01-17 | 1 | -16/+6
  If we are splatting pairs of 32-bit elements, we can use a 64-bit broadcast to get the job done.
  We could probably do this with other sizes too, for example four 16-bit elements. Or we could broadcast pairs of 16-bit elements using a 32-bit element broadcast. But I've left that as a future improvement.
  I've also restricted this to AVX2 only because we can only broadcast loads under AVX.
  Differential Revision: https://reviews.llvm.org/D42086
  llvm-svn: 322730
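The pair-wise splat idea can be illustrated outside the compiler. This is a minimal sketch, assuming little-endian lane order (element 0 in the low 32 bits); the function name is hypothetical and the code does not mirror LowerBUILD_VECTOR itself. It checks whether a list of 32-bit elements repeats as (a, b, a, b, ...) and, if so, returns the single 64-bit value whose broadcast reproduces the vector.

```python
MASK32 = 0xFFFFFFFF


def pairwise_splat_as_u64(elems):
    """Return the 64-bit broadcast value for an (a, b, a, b, ...) vector of
    32-bit elements, or None if the vector is not a pair-wise splat."""
    if len(elems) < 2 or len(elems) % 2 != 0:
        return None
    pair = (elems[0], elems[1])
    if any(e != pair[i % 2] for i, e in enumerate(elems)):
        return None
    # Assumed little-endian packing: element 0 is the low half.
    return ((pair[1] & MASK32) << 32) | (pair[0] & MASK32)


print(hex(pairwise_splat_as_u64([1, 2, 1, 2, 1, 2, 1, 2])))
print(pairwise_splat_as_u64([1, 2, 1, 3]))
```

The same check generalizes to the other sizes the message mentions (for example four 16-bit elements) by changing the period and the packing width.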
* [X86] Autoupgrade kunpck intrinsics using vector operations instead of scalar operations | Craig Topper | 2018-01-14 | 1 | -18/+15
  Summary: This patch changes the kunpck intrinsic autoupgrade to use vXi1 shufflevector operations to perform vector extracts and concats. This more closely matches the definition of the kunpck instructions. Currently we rely on a DAG combine to turn the scalar shift/and/or code into a concat vectors operation. By doing it in the IR we get this for free.
  Reviewers: spatel, RKSimon, zvi, jina.nahias
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D42018
  llvm-svn: 322462
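To see why the scalar shift/and/or form and the vector extract/concat form describe the same operation, here is a small sketch for the 8-bit-to-16-bit case. It assumes the operand order from Intel's description of `_mm512_kunpackb` (low byte of the second operand in the low half of the result); the helper names are hypothetical and a Python list of bits stands in for a vXi1 vector.

```python
def kunpckbw_scalar(a, b):
    """Scalar form: shift/and/or on 16-bit masks."""
    return ((a & 0xFF) << 8) | (b & 0xFF)


def to_bits(mask, width):
    """Model a vXi1 vector as a list of bits, element 0 = least significant."""
    return [(mask >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def kunpckbw_vector(a, b):
    """Vector form: extract the low 8 elements of each mask, then concatenate."""
    low_a = to_bits(a, 16)[:8]
    low_b = to_bits(b, 16)[:8]
    return from_bits(low_b + low_a)


print(hex(kunpckbw_scalar(0x12AB, 0x34CD)))
print(hex(kunpckbw_vector(0x12AB, 0x34CD)))
```

Both forms agree on every input; the vector form is the one that maps directly onto a concat of two subvectors, which is the point of doing the upgrade in IR.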
* [x86][AVX512] Lowering kunpack intrinsics to LLVM IR | Jina Nahias | 2017-12-05 | 1 | -0/+53
  This patch, together with a matching clang patch (https://reviews.llvm.org/D39719), implements the lowering of X86 kunpack intrinsics to IR.
  Differential Revision: https://reviews.llvm.org/D39720
  Change-Id: I4088d9428478f9457f6afddc90bd3d66b3daf0a1
  llvm-svn: 319778
* [CodeGen] Unify MBB reference format in both MIR and debug output | Francis Visoiu Mistrih | 2017-12-04 | 1 | -182/+182
  As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'.
  The MIR printer prints the IR name of a MBB only for block definitions.
    * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
    * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
    * find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
    * grep -nr 'BB#' and fix
  Differential Revision: https://reviews.llvm.org/D40422
  llvm-svn: 319665
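The effect of the third `sed` expression, `s/BB#([0-9]+)/%bb.\1/g`, on a test file's CHECK lines can be reproduced with the same regular expression in Python. This is a sketch for illustration only; the function name and the sample lines are made up, not taken from the test file.

```python
import re


def rename_mbb_refs(text):
    """Rewrite old-style 'BB#<n>' references to the '%bb.<n>' format."""
    return re.sub(r"BB#([0-9]+)", r"%bb.\1", text)


sample = "; X64:       # BB#0:\n; X64-NEXT:  jmp .LBB0_1\n"
print(rename_mbb_refs(sample))
```

Labels such as `.LBB0_1` are left alone because the pattern requires the literal `#` after `BB`.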
* [X86] test/testn intrinsics lowering to IR. llvm part. | Uriel Korach | 2017-11-13 | 1 | -0/+158
  Remove builtins from llvm and add AutoUpgrade support. Also add fast-isel tests for the TEST and TESTN instructions.
  Differential Revision: https://reviews.llvm.org/D38736
  llvm-svn: 318036
* [x86][AVX512] Lowering shuffle i/f intrinsics to LLVM IR | Jina Nahias | 2017-11-13 | 1 | -0/+225
  This patch, together with a matching clang patch (https://reviews.llvm.org/D38672), implements the lowering of X86 shuffle i/f intrinsics to IR.
  Differential Revision: https://reviews.llvm.org/D38671
  Change-Id: I1e7d359a74743e995ec356237a85214ce55d3661
  llvm-svn: 318026
* [X86] Add a pass to convert instruction chains between domains. | Guy Blank | 2017-10-22 | 1 | -44/+22
  The pass scans the function to find instruction chains that define registers in the same domain (closures). It then calculates the cost of converting the closure to another domain. If found profitable, the instructions are converted to instructions in the other domain and the register classes are changed accordingly.
  This commit adds the pass infrastructure and a simple conversion from the GPR domain to the Mask domain.
  Differential Revision: https://reviews.llvm.org/D37251
  Change-Id: Ic2cf1d76598110401168326d411128ae2580a604
  llvm-svn: 316288
* [x86] Lowering Mask Set1 intrinsics to LLVM IR | Jina Nahias | 2017-09-19 | 1 | -0/+104
  This patch, together with a matching clang patch (https://reviews.llvm.org/D37668), implements the lowering of X86 mask set1 intrinsics to IR.
  Differential Revision: https://reviews.llvm.org/D37669
  llvm-svn: 313625
* [X86] Add VPERMPD/VPERMQ and VPERMPS/VPERMD to the execution domain fixing table. | Craig Topper | 2017-09-19 | 1 | -2/+2
  llvm-svn: 313610
* [X86FixupBWInsts] More precise register liveness if no <imp-use> on MOVs. | Nikolai Bozhenov | 2017-09-18 | 1 | -20/+20
  Summary: Subregister liveness tracking is not implemented for X86 backend, so sometimes the whole super register is said to be live, when only a subregister is really live. That might happen if the def and the use are located in different MBBs, see added fixup-bw-isnt.mir test. However, using knowledge of the specific instructions handled by the bw-fixup-pass we can get more precise liveness information which this change does.
  Reviewers: MatzeB, DavidKreitzer, ab, andrew.w.kaylor, craig.topper
  Reviewed By: craig.topper
  Subscribers: n.bozhenov, myatsina, llvm-commits, hiraditya
  Patch by Andrei Elovikov <andrei.elovikov@intel.com>
  Differential Revision: https://reviews.llvm.org/D37559
  llvm-svn: 313524
* [X86] Teach execution domain fixing to convert between FP and int unpack instructions. | Craig Topper | 2017-09-18 | 1 | -8/+8
  llvm-svn: 313508
* [X86] Teach execution domain fixing to convert between VPERMILPS and VPSHUFD. | Craig Topper | 2017-09-18 | 1 | -2/+2
  llvm-svn: 313507
* [X86] Add a combine to turn (insert_subvector zero, (insert_subvector zero, X, Idx), Idx) into an insert of X into the larger zero vector. | Craig Topper | 2017-09-03 | 1 | -4/+0
  llvm-svn: 312460
* [X86] Combine inserting a vector of zeros into a vector of zeros into just the larger vector. | Craig Topper | 2017-09-03 | 1 | -14/+4
  llvm-svn: 312458
* [X86] Add patterns to turn an insert into lower subvector of a zero vector into a move instruction which will implicitly zero the upper elements. | Craig Topper | 2017-09-03 | 1 | -38/+20
  Ideally we'd be able to emit the SUBREG_TO_REG without the explicit register->register move, but we'd need to be sure the producing operation would select something that guaranteed the upper bits were already zeroed.
  llvm-svn: 312450
* [X86] Add VBLENDPS/VPBLENDD to the execution domain fixing tables. | Craig Topper | 2017-09-03 | 1 | -16/+16
  llvm-svn: 312449
* [X86] Canonicalize (concat_vectors X, zero) -> (insert_subvector zero, X, 0). | Craig Topper | 2017-09-03 | 1 | -36/+36
  In a future patch, I plan to teach isel to use a small vector move with implicit zeroing of the upper elements when it sees the (insert_subvector zero, X, 0) pattern.
  llvm-svn: 312448
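The canonicalization relies on the two forms producing the same vector. A minimal sketch of that equivalence, using Python lists as stand-ins for vectors (the helper names are hypothetical models of the node semantics, not LLVM APIs):

```python
def concat_vectors(lo, hi):
    """Model of concat_vectors: lo occupies the low elements of the result."""
    return lo + hi


def insert_subvector(vec, sub, idx):
    """Model of insert_subvector: overwrite len(sub) elements of vec at idx."""
    return vec[:idx] + sub + vec[idx + len(sub):]


x = [1, 2, 3, 4]
zero4 = [0] * 4
zero8 = [0] * 8

print(concat_vectors(x, zero4))
print(insert_subvector(zero8, x, 0))
```

Both calls yield X in the low half with zeros above it, which is the shape a narrow move with implicit upper zeroing would produce.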
* [AVX-512] Add unmasked subvector inserts and extract to the execution domain tables. | Craig Topper | 2017-07-31 | 1 | -20/+20
  llvm-svn: 309632
* [X86] SET0 to use XMM registers where possible PR26018 PR32862 | Dinar Temirbulatov | 2017-07-27 | 1 | -6/+6
  Differential Revision: https://reviews.llvm.org/D35839
  llvm-svn: 309298
* [X86][AVX] Added codegen tests for _mm256_zext* helper intrinsics (PR32839) | Simon Pilgrim | 2017-04-29 | 1 | -0/+120
  Not great codegen, especially as VEX moves support implicit zeroing of upper bits....
  llvm-svn: 301748
* [X86] Add EVEX encoded VBROADCASTSS/SD and VPBROADCASTD/Q to execution domain fixing table. | Craig Topper | 2016-09-29 | 1 | -4/+4
  llvm-svn: 282687
* [X86][AVX512] Added BROADCAST intrinsics fast-isel generic IR tests | Simon Pilgrim | 2016-07-05 | 1 | -0/+215
  llvm-svn: 274537
* [X86][AVX512] Added VSHUFPD intrinsics fast-isel generic IR tests | Simon Pilgrim | 2016-07-05 | 1 | -0/+52
  llvm-svn: 274534
* [X86][AVX512] Added VPERMPD/VPERMQ intrinsics fast-isel generic IR tests | Simon Pilgrim | 2016-07-04 | 1 | -0/+104
  llvm-svn: 274503
* [X86][AVX512] Added VPERMILPD/VPERMILPS intrinsics fast-isel generic IR tests | Simon Pilgrim | 2016-07-04 | 1 | -0/+163
  Added PSHUFD tests as well
  llvm-svn: 274493
* [X86][AVX512] Add support for UNPCK masked shuffle comments | Simon Pilgrim | 2016-07-03 | 1 | -24/+24
  llvm-svn: 274464
* [X86][AVX512] Add support for masked shuffle comments | Simon Pilgrim | 2016-07-03 | 1 | -12/+12
  This patch adds support for including the avx512 mask register information in the mask/maskz versions of shuffle instruction comments.
  This initial version just adds support for MOVDDUP/MOVSHDUP/MOVSLDUP to reduce the mass of test regenerations, other shuffle instructions can be added in due course.
  Differential Revision: http://reviews.llvm.org/D21953
  llvm-svn: 274459
* [X86][AVX512] Converted the MOVDDUP/MOVSLDUP/MOVSHDUP masked intrinsics to generic IR | Simon Pilgrim | 2016-07-02 | 1 | -0/+156
  llvm-svn: 274443
* [X86][AVX512] Add fast-isel shuffle tests | Simon Pilgrim | 2016-07-02 | 1 | -0/+444
  It's not worth trying to write out tests for all the avx512f builtins yet, just adding tests for lowering of generic IR as we transition to it (shuffles mainly right now).
  llvm-svn: 274434