summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/X86/avx512-cvt.ll
Commit message (Collapse)AuthorAgeFilesLines
...
* X86 Tests: Tidy up AVX512 conversion tests. NFC.Zvi Rackover2017-09-061-227/+227
| | | | | | Rename functions to a consistent format to make it easier to track coverage. llvm-svn: 312619
* X86 Tests: Adding missing AVX512 fptoui coverage tests. NFC.Zvi Rackover2017-09-051-0/+231
| | | | | | Some of the cases show missing pattern i intend to fix shortly. llvm-svn: 312560
* [AVX512] Use 256-bit extract instructions for extracting bits [255:128] from ↵Craig Topper2017-08-301-2/+2
| | | | | | | | | | a 512-bit register This enables the use of a smaller encoding by using a VEX instruction when possible. Differential Revision: https://reviews.llvm.org/D37092 llvm-svn: 312100
* [X86][Haswell] Updating HSW instruction scheduling informationGadi Haber2017-08-281-13/+41
| | | | | | | | | | | | | | | This patch completely replaces the instruction scheduling information for the Haswell architecture target by modifying the file X86SchedHaswell.td located under the X86 Target. We used the scheduling information retrieved from the Haswell architects in order to replace and modify the existing scheduling. The patch continues the scheduling replacement effort started with the SNB target in r307529 and r310792. Information includes latency, number of micro-Ops and used ports by each HSW instruction. Please expect some performance fluctuations due to code alignment effects. Reviewers: RKSimon, zvi, aymanmus, craig.topper, m_zuckerman, igorb, dim, chandlerc, aaboud Differential Revision: https://reviews.llvm.org/D36663 llvm-svn: 311879
* [AVX512] Don't switch unmasked subvector insert/extract instructions when ↵Craig Topper2017-08-171-28/+13
| | | | | | | | | | | | | | AVX512DQI is enabled. There's no reason to switch instructions with and without DQI. It just creates extra isel patterns and test divergences. There is however value in enabling the masked version of the instructions with DQI. This required introducing some new multiclasses to enabling this splitting. Differential Revision: https://reviews.llvm.org/D36661 llvm-svn: 311091
* [X86] SET0 to use XMM registers where possible PR26018 PR32862Dinar Temirbulatov2017-08-031-19/+19
| | | | | | Differential Revision: https://reviews.llvm.org/D35965 llvm-svn: 309926
* [AVX-512] Add unmasked subvector inserts and extract to the execution domain ↵Craig Topper2017-07-311-2/+2
| | | | | | tables. llvm-svn: 309632
* [X86] SET0 to use XMM registers where possible PR26018 PR32862Dinar Temirbulatov2017-07-271-5/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D35839 llvm-svn: 309298
* [X86][AVX] Regenerate tests with constant broadcast commentsSimon Pilgrim2017-07-151-1/+1
| | | | llvm-svn: 308110
* Reverting commit 306414 on behalf of @gadi.haberMichael Zuckerman2017-06-281-41/+13
| | | | llvm-svn: 306532
* Updated and extended the information about each instruction in HSW and SNB ↵Gadi Haber2017-06-271-13/+41
| | | | | | | | | | | | | | | | | | | to include the following data: •static latency •number of uOps from which the instructions consists •all ports used by the instruction Reviewers:  RKSimon zvi aymanmus m_zuckerman Differential Revision: https://reviews.llvm.org/D33897 llvm-svn: 306414
* [x86] avoid flipping sign bits for vector icmp by using known bitsSanjay Patel2017-06-071-4/+0
| | | | | | | | | | | | | | | | If we know that both operands of an unsigned integer vector comparison are non-negative, then it's safe to directly use a signed-compare-greater-than instruction (the only non-equality integer vector compare predicate provided by SSE/AVX). We're intentionally not changing the condition code to signed in order to preserve the existing transforms that use min/max/psubus below here. This should solve PR33276: https://bugs.llvm.org/show_bug.cgi?id=33276 Differential Revision: https://reviews.llvm.org/D33862 llvm-svn: 304909
* [x86] fix over-specific triple; NFCSanjay Patel2017-06-021-204/+204
| | | | | | | | There's nothing darwin-specific in these tests, and using that setting causes extra phantom diffs when the auto-generated check lines are regenerated today. llvm-svn: 304614
* [X86][AVX512] Make i1 illegal in the CodeGenGuy Blank2017-05-191-2/+2
| | | | | | | | | | This patch defines the i1 type as illegal in the X86 backend for AVX512. For DAG operations on <N x i1> types (build vector, extract vector element, ...) i8 is used, and should be truncated/extended. This should produce better scalar code for i1 types since GPRs will be used instead of mask registers. Differential Revision: https://reviews.llvm.org/D32273 llvm-svn: 303421
* [AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registersCraig Topper2017-03-281-40/+112
| | | | | | | | | | | | | | | | We've had several bugs(PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register. This ultimately occurs because we create COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests are demonstrating, its possible for the GR8 register to be a high register and we end up doing an accidental extra or insert from bits 15:8. I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/8 or vice versa. Unfortunately, this complicates fastisel a bit more now to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repitition. This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This caused a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI. Differential Revision: https://reviews.llvm.org/D30968 llvm-svn: 298928
* [X86][TD][vpmovm2 ] New TD pattern for the vpmovm2 instructionMichael Zuckerman2017-03-231-8/+4
| | | | | | | | | | | Up until now, vpmovm2 instruction described its destination operand size by the source operand size. This patch adds new pattern for the vpmovm2 instruction. The node describes new expansion of the destination (from {128|256} to 512). Differential Revision: https://reviews.llvm.org/D30654 llvm-svn: 298586
* [X86] Generate VZEROUPPER for Skylake-avx512.Amjad Aboud2017-03-031-85/+275
| | | | | | | | VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be. Differential Revision: https://reviews.llvm.org/D29874 llvm-svn: 296859
* [DAGCombine] Recognise any_extend_vector_inreg and truncation style shuffle ↵Simon Pilgrim2017-02-171-1/+0
| | | | | | | | | | | | | | masks During legalization we are often creating shuffles (via a build_vector scalarization stage) that are "any_extend_vector_inreg" style masks, and also other masks that are the equivalent of "truncate_vector_inreg" (if we had such a thing). This patch is an attempt to match these cases to help undo the effects of just leaving shuffle lowering to handle it - which typically means we lose track of the undefined elements of the shuffles resulting in an unnecessary extension+truncation stage for widened illegal types. The 2011-10-21-widen-cmp.ll regression will be fixed by making SIGN_EXTEND_VECTOR_IN_REG legal in SSE instead of lowering them to X86ISD::VSEXT (PR31712). Differential Revision: https://reviews.llvm.org/D29454 llvm-svn: 295451
* [AVX-512] Improve lowering of zero_extend of v4i1 to v4i32 and v2i1 to v2i64 ↵Craig Topper2017-01-121-153/+42
| | | | | | with VLX, but no DQ or BW support. llvm-svn: 291747
* [AVX-512] Improve lowering of sign_extend of v4i1 to v4i32 and v2i1 to v2i64 ↵Craig Topper2017-01-121-143/+60
| | | | | | when avx512vl is available, but not avx512dq. llvm-svn: 291746
* [AVX-512] Add more varied avx512 feature command lines to the avx512-cvt.ll ↵Craig Topper2017-01-121-590/+954
| | | | | | | | test to show some poor codegen examples. We're definitely doing bad things when avx512vl is enabled without avx512dq. It looks like avx512vl/dq without avx512bw may also have some issues. llvm-svn: 291744
* [X86] Fix PR30926 - Add patterns for (v)cvtsi2s{s,d} and (v)cvtsd2s{s,d}Elad Cohen2017-01-111-0/+22
| | | | | | | | | | | The code emiited by Clang's intrinsics for (v)cvtsi2ss, (v)cvtsi2sd, (v)cvtsd2ss and (v)cvtss2sd is lowered to a code sequence that includes redundant (v)movss/(v)movsd instructions. This patch adds patterns for optimizing these sequences. Differential revision: https://reviews.llvm.org/D28455 llvm-svn: 291660
* [AVX-512] Add patterns to use a zero masked VPTERNLOG instruction for ↵Craig Topper2017-01-091-9/+5
| | | | | | | | vselects of all ones and all zeros. Previously we emitted a VPTERNLOG and a separate masked move. llvm-svn: 291415
* This is a large patch for X86 AVX-512 of an optimization for reducing code ↵Gadi Haber2016-12-281-6/+6
| | | | | | | | | | | | size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible. There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers. The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled. Reviewers: Craig Topper, Zvi Rackoover, Elena Demikhovsky Differential Revision: https://reviews.llvm.org/D27901 llvm-svn: 290663
* [X86][SSE] Add support for combining target shuffles to SHUFPS.Simon Pilgrim2016-12-181-2/+1
| | | | | | As discussed on D27692, the next step will be to allow cross-domain shuffles once the combined shuffle depth passes a certain point. llvm-svn: 290064
* [X86][SSE] Add initial support for combining (V)PMOVZX with shuffles.Simon Pilgrim2016-11-281-3/+1
| | | | llvm-svn: 288049
* [X86][AVX512DQVL] Add support for v2i64 -> v2f32 SINT_TO_FP/UINT_TO_FP loweringSimon Pilgrim2016-11-241-2/+1
| | | | llvm-svn: 287877
* [X86][AVX512] Added some mask/maskz tests for sitofp/uitofp i32 to f64Simon Pilgrim2016-11-161-0/+68
| | | | llvm-svn: 287106
* [X86][SSE] Improve SINT_TO_FP of boolean vector results (signum)Simon Pilgrim2016-11-151-17/+2
| | | | | | | | | | | | This patch helps avoids poor legalization of boolean vector results (e.g. 8f32 -> 8i1 -> 8i16) that feed into SINT_TO_FP by inserting an early SIGN_EXTEND and so help improve the truncation logic. This is not necessary for AVX512 targets where boolean vectors are legal - AVX512 manages to lower ( sint_to_fp vXi1 ) into some form of ( select mask, 1.0f , 0.0f ) in most cases. Fix for PR13248 Differential Revision: https://reviews.llvm.org/D26583 llvm-svn: 286979
* [X86] Cleanup 'x' and 'y' mnemonic suffixes for ↵Craig Topper2016-11-141-19/+9
| | | | | | | | | | | | | vcvtpd2dq/vcvttpd2dq/vcvtpd2ps and similar instructions. -Don't print the 'x' suffix for the 128-bit reg/mem VEX encoded instructions in Intel syntax. This is consistent with the EVEX versions. -Don't print the 'y' suffix for the 256-bit reg/reg VEX encoded instructions in Intel or AT&T syntax. This is consistent with the EVEX versions. -Allow the 'x' and 'y' suffixes to be used for the reg/mem forms when we're assembling using Intel syntax. -Allow the 'x' and 'y' suffixes on the reg/reg EVEX encoded instructions in Intel or AT&T syntax. This is consistent with what VEX was already allowing. This should fix at least some of PR28850. llvm-svn: 286787
* [AVX-512] Add the scalar unsigned integer to fp conversion instructions to ↵Craig Topper2016-09-251-12/+12
| | | | | | hasUndefRegUpdate. llvm-svn: 282356
* [X86][SSE] Improve recognition of uitofp conversions that can be performed ↵Simon Pilgrim2016-09-181-13/+6
| | | | | | | | | | | | | | as sitofp With D24253 we can now use SelectionDAG::SignBitIsZero with vector operations. This patch uses SelectionDAG::SignBitIsZero to recognise that a zero sign bit means that we can use a sitofp instead of a uitofp (which is not directly support on pre-AVX512 hardware). While AVX512 does provide support for uitofp, the conversion to sitofp should not cause any regressions. Differential Revision: https://reviews.llvm.org/D24343 llvm-svn: 281852
* Avoid false dependencies of undef machine operandsMarina Yatsina2016-08-111-34/+27
| | | | | | | | | | | | | | | | | | | | This patch helps avoid false dependencies on undef registers by updating the machine instructions' undef operand to use a register that the instruction is truly dependent on, or use a register with clearance higher than Pref. Pseudo example: loop: xmm0 = ... xmm1 = vcvtsi2sdl eax, xmm0<undef> ... = inst xmm0 jmp loop In this example, selecting xmm0 as the undef register creates false dependency between loop iterations. This false dependency cannot be solved by inserting an xor before vcvtsi2sdl because xmm0 is alive at the point of the vcvtsi2sdl instruction. Selecting a different register instead of xmm0, especially a register that is not used in the loop, will eliminate this problem. Differential Revision: https://reviews.llvm.org/D22466 llvm-svn: 278321
* [x86, AVX] allow FP vector select folding to bitwise logic ops (PR28895)Sanjay Patel2016-08-101-4/+3
| | | | | | | | | | | | | | | This handles the case in: https://llvm.org/bugs/show_bug.cgi?id=28895 ...but we are not getting all of the possibilities yet. Eg, we use 'X86::FANDN' for scalar FP select combines. That enhancement is filed as: https://llvm.org/bugs/show_bug.cgi?id=28925 Differential Revision: https://reviews.llvm.org/D23337 llvm-svn: 278270
* [AVX-512] Add AVX-512 scalar CVT instructions to hasUndefRegUpdate.Craig Topper2016-08-061-0/+7
| | | | llvm-svn: 277933
* [AVX-512] Fix duplicate column in AVX512 execution dependency table that was ↵Craig Topper2016-08-011-7/+7
| | | | | | preventing VMOVDQU32/VMOVDQA32 from being recognized. Fix a bug in the code that stops execution dependency fix from turning operations on 32-bit integer element types into operations on 64-bit integer element types. llvm-svn: 277327
* AVX-512: Fixed [US]INT_TO_FP selection for i1 vectors.Elena Demikhovsky2016-07-251-0/+347
| | | | | | | | It failed with assertion before this patch. Differential Revision: https://reviews.llvm.org/D22735 llvm-svn: 276648
* [AVX512] Use vpternlog with an immediate of 0xff to create 512-bit all one ↵Craig Topper2016-07-111-2/+4
| | | | | | vectors. llvm-svn: 275045
* VirtRegMap: Replace some identity copies with KILL instructions.Matthias Braun2016-07-091-1/+14
| | | | | | | | | | | | | | An identity COPY like this: %AL = COPY %AL, %EAX<imp-def> has no semantic effect, but encodes liveness information: Further users of %EAX only depend on this instruction even though it does not define the full register. Replace the COPY with a KILL instruction in those cases to maintain this liveness information. (This reverts a small part of r238588 but this time adds a comment explaining why a KILL instruction is useful). llvm-svn: 274952
* AVX-512: fixed a bug in fp_to_uint pattern on KNLElena Demikhovsky2016-03-291-151/+499
| | | | | | | | | Fixed fp_to_uint instruction selection on KNL. One pattern was missing for <4 x double> to <4 x i32> Differential Revision: http://reviews.llvm.org/D18512 llvm-svn: 264701
* AVX512: Implemented encoding and intrinsics forIgor Breger2015-09-101-8/+10
| | | | | | | | | vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11802 llvm-svn: 247276
* Revert "AVX512: Implemented encoding and intrinsics for vextracti64x4 ↵Renato Golin2015-09-091-10/+8
| | | | | | | | ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding." This reverts commit r247149, as it was breaking numerous buildbots of varied architectures. llvm-svn: 247177
* AVX512: Implemented encoding and intrinsics forIgor Breger2015-09-091-8/+10
| | | | | | | | | vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11802 llvm-svn: 247149
* AVX-512: Floating point conversions for SKX - DAG Lowering.Elena Demikhovsky2015-07-191-4/+101
| | | | | | | | | | SKX supports conversion for all FP types. Integer types include doublewords and quardwords. I added "Legal" status for these nodes and a bunch of tests. I added "NoVLX" for AVX DAG selection to force VLX instructions selection when VLX is supported. Differential Revision: http://reviews.llvm.org/D11255 llvm-svn: 242637
* AVX-512: fixed UINT_TO_FP operation for 512-bit types.Elena Demikhovsky2015-05-101-0/+17
| | | | llvm-svn: 236955
* [opaque pointer type] Add textual IR support for explicit type parameter to ↵David Blaikie2015-02-271-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794
* AVX-512: SINT_TO_FP cost model and some bugfixesElena Demikhovsky2014-11-131-0/+53
| | | | | | | Checked some corner cases, for example translation of <8 x i1> to <8 x double> llvm-svn: 221883
* Add pattern for unsigned v4i32->v4f64 convert on AVX512.Cameron McInally2014-06-181-0/+8
| | | | llvm-svn: 211164
* AVX-512: Added fp_to_uint and uint_to_fp patterns.Elena Demikhovsky2014-04-081-0/+32
| | | | llvm-svn: 205754
* Fix broken CHECK lines.Benjamin Kramer2014-01-111-4/+4
| | | | llvm-svn: 199016
OpenPOWER on IntegriCloud