path: root/llvm/lib/Target/X86/X86IntrinsicsInfo.h
* [IR] Split out target specific intrinsic enums into separate headers (Reid Kleckner, 2019-12-11, 1 file, -0/+1)
  This has two main effects:
  - Optimizes debug info size by saving 221.86 MB of obj file size in a Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7 MB of object file size.
  - Incremental step towards decoupling target intrinsics.
  The enums are still compact, so adding or removing a single target-specific intrinsic will trigger a rebuild of all of LLVM. Assigning distinct target id spaces is potential future work.
  Part of PR34259
  Reviewers: efriedma, echristo, MaskRay
  Reviewed By: echristo, MaskRay
  Differential Revision: https://reviews.llvm.org/D71320
* [X86] Convert tbm_bextri_u32/tbm_bextri_u64 intrinsics' TargetConstant argument to a regular Constant during lowering. (Craig Topper, 2019-09-20, 1 file, -3/+3)
  We reuse an ISD opcode here that can also be reached from BMI, which doesn't require the argument to be an immediate. Our isel patterns to match the TBM immediate form require a Constant and not a TargetConstant.
  We were accidentally getting the Constant due to a quirk of combineBEXTR calling SimplifyDemandedBits: that call ended up constant folding the TargetConstant to a regular Constant. But we should probably instead be asserting if SimplifyDemandedBits is ever called on a TargetConstant, so we shouldn't rely on this behavior.
  llvm-svn: 372373
* Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake (Luo, Yuanke, 2019-05-06, 1 file, -0/+11)
  Summary:
  1. Enable the infrastructure for AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake.
  2. Enable the VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversions from IEEE single precision:
     - VCVTNE2PS2BF16: Convert Two Packed Single Data to One Packed BF16 Data.
     - VCVTNEPS2BF16: Convert Packed Single Data to Packed BF16 Data.
     - VDPBF16PS: Dot Product of BF16 Pairs Accumulated into Packed Single Precision.
  For more details about the BF16 ISA, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference
  Author: LiuTianle
  Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, RKSimon, spatel
  Reviewed By: craig.topper
  Subscribers: kristina, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60550
  llvm-svn: 360017
* [X86] Restore the pavg intrinsics. (Craig Topper, 2019-04-15, 1 file, -0/+6)
  The pattern we replaced these with may be too hard to match, as demonstrated by PR41496 and PR41316. This patch restores the intrinsics so we can start focusing on optimizing them. I've mostly reverted the original patch that removed them, though I modified the avx512 intrinsics to not have masking built in.
  Differential Revision: https://reviews.llvm.org/D60674
  llvm-svn: 358427
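  For context, the replacement pattern computes the rounded average through widened arithmetic; a minimal sketch of the 128-bit word case (function name illustrative):

    ; Widened-arithmetic form of pavgw that the intrinsic had been replaced with:
    ; avg = trunc((zext(a) + zext(b) + 1) >> 1), computed in 32-bit lanes.
    define <8 x i16> @avg_via_widening(<8 x i16> %a, <8 x i16> %b) {
      %za  = zext <8 x i16> %a to <8 x i32>
      %zb  = zext <8 x i16> %b to <8 x i32>
      %sum = add <8 x i32> %za, %zb
      %rnd = add <8 x i32> %sum, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
      %shr = lshr <8 x i32> %rnd, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
      %avg = trunc <8 x i32> %shr to <8 x i16>
      ret <8 x i16> %avg
    }

    ; The restored intrinsic expresses the same rounded average directly:
    declare <8 x i16> @llvm.x86.sse2.pavg.w(<8 x i16>, <8 x i16>)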
* [X86] Remove X86 specific dag nodes for RDTSC/RDTSCP/RDPMC. NFCI (Andrea Di Biagio, 2019-03-20, 1 file, -4/+4)
  This patch removes the following dag node opcodes from namespace X86ISD: RDTSC_DAG, RDTSCP_DAG, RDPMC_DAG.
  The logic that expands RDTSC/RDPMC/XGETBV intrinsics is basically the same. The only differences are that RDTSC/RDTSCP don't implicitly read ECX, and RDTSCP also implicitly writes ECX.
  I moved the common expansion logic into a helper function, with the goal of getting rid of code repetition. That helper is now used for the expansion of RDTSC/RDTSCP/RDPMC/XGETBV intrinsics.
  No functional change intended.
  Differential Revision: https://reviews.llvm.org/D59547
  llvm-svn: 356546
* [X86] Enable sse2_cvtsd2ss intrinsic to use an EVEX encoded instruction. (Craig Topper, 2019-03-11, 1 file, -0/+1)
  llvm-svn: 355810
* [X86] Add SCALAR_SINT_TO_FP/SCALAR_UINT_TO_FP ISD opcodes without rounding mode. (Craig Topper, 2019-03-11, 1 file, -6/+6)
  After this we no longer need to match FROUND_CURRENT or FROUND_NO_EXC during isel, so I removed those.
  llvm-svn: 355807
* [X86] Split SCALEF(S) ISD opcodes into a version without rounding mode. (Craig Topper, 2019-03-11, 1 file, -15/+14)
  llvm-svn: 355806
* [X86] Split RCP28/RSQRT/GETEXP/EXP2 ISD opcodes into SAE and current direction nodes. Remove rounding mode operand. (Craig Topper, 2019-03-11, 1 file, -27/+27)
  llvm-svn: 355805
* [X86] Rename _RND versions of RANGE/REDUCE/GETMANT/RNDSCALE ISD opcodes to _SAE. Remove SAE operand. (Craig Topper, 2019-03-11, 1 file, -40/+39)
  No need to explicitly store it and match it during isel.
  llvm-svn: 355804
* [X86] Rename X86ISD::CVTPH2PS_RND to CVTPH2PS_SAE. Remove SAE operand. (Craig Topper, 2019-03-11, 1 file, -2/+2)
  llvm-svn: 355803
* [X86] Rename the CVTT*_RND ISD nodes to _SAE and remove the SAE operand. Split VFPROUNDS_RND/VFPEXT(S)_RND into versions without a rounding operand. (Craig Topper, 2019-03-11, 1 file, -31/+33)
  For VFPEXT(S) we only need a current-rounding-mode version and an SAE version; neither needs an extra operand.
  llvm-svn: 355802
* [X86] Rename X86ISD::CMPM_RND and X86ISD::FSETCCM_RND to _SAE instead of _RND. Remove rounding operand. (Craig Topper, 2019-03-11, 1 file, -4/+4)
  The operand could only be the SAE encoding, so there is no need to include it.
  llvm-svn: 355801
* [X86] Split the VFIXUPIMM/VFIXUPIMMS nodes into a current rounding mode and SAE ISD opcode. (Craig Topper, 2019-03-11, 1 file, -12/+11)
  Remove matching of FROUND_CURRENT and FROUND_NO_EXC for these nodes from the isel table.
  llvm-svn: 355800
* [X86] Begin removing matching of FROUND_CURRENT and FROUND_NO_EXC from isel tables. (Craig Topper, 2019-03-11, 1 file, -33/+34)
  Instead I plan to have dedicated nodes for FROUND_CURRENT and FROUND_NO_EXC. This patch starts with FADDS/FSUBS/FMULS/FDIVS/FMAXS/FMINS/FSQRTS.
  llvm-svn: 355799
* [X86] Add new variadic avx512 compress/expand intrinsics that use vXi1 types for the mask argument. Remove and autoupgrade the old intrinsics. (Craig Topper, 2019-01-28, 1 file, -70/+2)
  llvm-svn: 352343
* [X86] Remove and autoupgrade vpconflict intrinsics that take a mask and passthru argument. (Craig Topper, 2019-01-26, 1 file, -12/+0)
  We have unmasked versions as of r352172.
  llvm-svn: 352270
* [X86] Remove GCCBuiltins from 512-bit cvt(u)qqtops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics. Add new variadic uitofp/sitofp with rounding mode intrinsics. (Craig Topper, 2019-01-26, 1 file, -16/+2)
  Summary: See clang patch D56998 for a full description.
  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D56999
  llvm-svn: 352266
* [X86] Add non-masked versions of vpconflict intrinsics so we can use a select in the header file in clang. (Craig Topper, 2019-01-25, 1 file, -0/+6)
  I'll remove and autoupgrade the old intrinsics in a future commit.
  llvm-svn: 352172
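  The intended pattern looks like the following sketch; the unmasked intrinsic name is taken from this change, and the exact overload is an assumption:

    ; Unmasked conflict detection with masking expressed as a generic select
    ; (intrinsic overload assumed for illustration).
    declare <8 x i64> @llvm.x86.avx512.conflict.q.512(<8 x i64>)

    define <8 x i64> @masked_conflict(<8 x i64> %a, <8 x i1> %m, <8 x i64> %passthru) {
      %res = call <8 x i64> @llvm.x86.avx512.conflict.q.512(<8 x i64> %a)
      ; The select merges in the passthru lanes; the backend can fold this
      ; into a masked VPCONFLICTQ.
      %sel = select <8 x i1> %m, <8 x i64> %res, <8 x i64> %passthru
      ret <8 x i64> %sel
    }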
* [X86] Use X86ISD::VFPROUND instead of ISD::FP_ROUND for 256 and 512 bit cvtpd2ps intrinsics. (Craig Topper, 2019-01-21, 1 file, -4/+4)
  Summary: Use X86ISD::VFPROUND in the instruction isel patterns. Add new patterns for ISD::FP_ROUND to maintain support for fptrunc in IR.
  In the process I found a couple of duplicate isel patterns, which I also deleted in this patch.
  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D56991
  llvm-svn: 351762
* [X86] Remove and autoupgrade vpmovqd/vpmovwb intrinsics using trunc+select. (Craig Topper, 2019-01-21, 1 file, -8/+0)
  llvm-svn: 351729
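  A minimal sketch of the upgraded IR for a masked vpmovqd (function name illustrative): each i64 lane is truncated to i32, then merged with the passthru under the mask:

    define <8 x i32> @upgraded_vpmovqd(<8 x i64> %a, <8 x i32> %passthru, <8 x i1> %mask) {
      %t = trunc <8 x i64> %a to <8 x i32>
      %r = select <8 x i1> %mask, <8 x i32> %t, <8 x i32> %passthru
      ret <8 x i32> %r
    }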
* [X86] Auto upgrade VPCOM/VPCOMU intrinsics to generic integer comparisons (Simon Pilgrim, 2019-01-20, 1 file, -8/+0)
  This causes a couple of changes in the upgrade tests, as signed/unsigned eq/ne are equivalent and we constant fold true/false codes; these changes are the same as what we already do for avx512 cmp/ucmp.
  Noticed while cleaning up vector integer comparison costs for PR40376.
  llvm-svn: 351697
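  XOP's VPCOM writes an all-ones or all-zeros lane per comparison, so the upgrade is an icmp followed by a sign extension; a minimal sketch for a signed less-than dword compare (function name illustrative):

    define <4 x i32> @upgraded_vpcomltd(<4 x i32> %a, <4 x i32> %b) {
      %cmp = icmp slt <4 x i32> %a, %b
      ; sext restores the all-ones/all-zeros lanes the intrinsic returned.
      %res = sext <4 x i1> %cmp to <4 x i32>
      ret <4 x i32> %res
    }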
* [X86] Add masked MCVTSI2P/MCVTUI2P ISD opcodes to model the cvtqq2ps/cvtuqq2ps nodes that produce less than 128 bits of results. (Craig Topper, 2019-01-19, 1 file, -9/+9)
  These nodes zero the upper half of the result and can't be represented with vselect.
  llvm-svn: 351666
* Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. (Chandler Carruth, 2019-01-19, 1 file, -4/+3)
  We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
  Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
  llvm-svn: 351636
* [X86] Lower avx512f scatter intrinsics to X86MaskedScatterSDNode instead of going directly to MachineSDNode. (Craig Topper, 2019-01-18, 1 file, -48/+48)
  This sends these intrinsics through isel in a much more normal way. This should allow addressing mode matching in isel to make better use of the displacement field.
  llvm-svn: 351583
* [X86] Lower avx2/avx512f gather intrinsics to X86MaskedGatherSDNode instead of going directly to MachineSDNode. (Craig Topper, 2019-01-18, 1 file, -64/+64)
  This sends these intrinsics through isel in a much more normal way. This should allow addressing mode matching in isel to make better use of the displacement field.
  Differential Revision: https://reviews.llvm.org/D56827
  llvm-svn: 351570
* [X86] Add X86ISD::VSHLV and X86ISD::VSRLV nodes for psllv and psrlv (Craig Topper, 2019-01-16, 1 file, -18/+18)
  Previously we used ISD::SHL and ISD::SRL to represent these in SelectionDAG. ISD::SHL/SRL interpret an out-of-range shift amount as undefined behavior and will constant fold to undef, while the intrinsics are defined to return 0 for out-of-range shift amounts. A previous patch added a special node for VPSRAV to produce all sign bits.
  This was previously believed safe because undefs frequently get turned into 0, either from the constant pool or a desire to not have a false register dependency. But undef is treated specially in some optimizations. For example, it's ignored in detection of vector splats. So if the ISD::SHL/SRL can be constant folded and all of the elements with in-bounds shift amounts are the same, we might fold it to a single-element broadcast from the constant pool. This would not put 0s in the elements with out-of-bounds shift amounts.
  We do have an existing InstCombine optimization to use shl/lshr when the shift amounts are all constant and in bounds. That should prevent some loss of constant folding from this change.
  Patch by zhutianyang and Craig Topper
  Differential Revision: https://reviews.llvm.org/D56695
  llvm-svn: 351381
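  The semantic gap is easiest to see in IR; a minimal sketch using the AVX2 variable-shift intrinsic (function name illustrative):

    declare <4 x i32> @llvm.x86.avx2.psllv.d(<4 x i32>, <4 x i32>)

    define <4 x i32> @shift_semantics(<4 x i32> %x) {
      ; Lane 3's shift amount (32) is out of range for an i32 element.
      ; The intrinsic is defined to produce 0 in that lane...
      %a = call <4 x i32> @llvm.x86.avx2.psllv.d(<4 x i32> %x, <4 x i32> <i32 1, i32 2, i32 3, i32 32>)
      ; ...whereas a generic shl makes that lane undef, which constant folding
      ; and splat detection treat differently.
      %b = shl <4 x i32> %x, <i32 1, i32 2, i32 3, i32 32>
      ret <4 x i32> %a
    }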
* [X86] Use X86ISD::BLENDV for blendv intrinsics. Replace vselect with blendv just before isel table lookup. Remove vselect isel patterns. (Craig Topper, 2019-01-16, 1 file, -1/+7)
  This cleans up the duplication we have with both intrinsic isel patterns and vselect isel patterns. This should also allow the intrinsics to get SimplifyDemandedBits support for the condition.
  I've switched the canonical pattern in isel to use the X86ISD::BLENDV node instead of VSELECT, since it always seemed weird to move from BLENDV, with its relaxed rules on condition bits, to VSELECT, which has strict rules about all bits of the condition element being the same. It's more correct to go from VSELECT to BLENDV.
  Differential Revision: https://reviews.llvm.org/D56771
  llvm-svn: 351380
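  For reference, the blendv intrinsics select per lane on the sign bit of the condition operand only, which is why BLENDV's condition rules are more relaxed than VSELECT's; a minimal sketch:

    declare <4 x float> @llvm.x86.sse41.blendvps(<4 x float>, <4 x float>, <4 x float>)

    define <4 x float> @blend(<4 x float> %a, <4 x float> %b, <4 x float> %cond) {
      ; Per lane: result = sign bit of %cond set ? %b : %a.
      ; The other 31 condition bits are ignored, unlike a vselect on <4 x i1>.
      %r = call <4 x float> @llvm.x86.sse41.blendvps(<4 x float> %a, <4 x float> %b, <4 x float> %cond)
      ret <4 x float> %r
    }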
* [X86] Add avx512 scatter intrinsics that use a vXi1 mask instead of a scalar integer. (Craig Topper, 2019-01-15, 1 file, -0/+25)
  We're trying to have the vXi1 types in IR as much as possible. This prevents the need for bitcasts when the producer of the mask was already a vXi1 value, like an icmp. The bitcasts can be subject to code motion and interfere with basic-block-at-a-time isel in bad ways.
  llvm-svn: 351275
* [X86] Add versions of the avx512 gather intrinsics that take the mask as a vXi1 vector instead of a scalar (Craig Topper, 2019-01-15, 1 file, -0/+25)
  In keeping with our general direction of having the vXi1 type present in IR, this patch converts the mask argument for avx512 gather to vXi1. This can avoid k-register to GPR to k-register transitions late in codegen.
  I left the existing intrinsics behind because they have many out-of-tree users such as ISPC. They generate their own code and don't go through the autoupgrade path, which only works for bitcode and ll parsing. Ideally we will get them to migrate to target independent intrinsics, but it might be easier for them to migrate to these new intrinsics.
  I'll work on scatter and gatherpf/scatterpf next.
  Differential Revision: https://reviews.llvm.org/D56527
  llvm-svn: 351234
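  To illustrate the signature change, a sketch of the two styles side by side; the exact intrinsic names and overloads here are assumptions based on this commit's description:

    ; Old style: the mask arrives as a scalar integer and must be converted
    ; to a k-register internally.
    declare <16 x float> @llvm.x86.avx512.gather.dps.512(<16 x float>, i8*, <16 x i32>, i16, i32)

    ; New style: the mask is <16 x i1>, so an icmp result can feed it directly
    ; with no k-register to GPR to k-register round trip.
    declare <16 x float> @llvm.x86.avx512.mask.gather.dps.512(<16 x float>, i8*, <16 x i32>, <16 x i1>, i32)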
* [X86] Remove mask parameter from avx512 pmultishiftqb intrinsics. Use select in IR instead. (Craig Topper, 2019-01-14, 1 file, -6/+3)
  Fixes PR40259
  llvm-svn: 351035
* [X86] Remove unused intrinsic handlers. NFC (Craig Topper, 2019-01-14, 1 file, -2/+1)
  llvm-svn: 351032
* [X86] Remove FPCLASS intrinsic handler. Use INTR_TYPE_2OP instead. NFC (Craig Topper, 2019-01-14, 1 file, -7/+7)
  llvm-svn: 351031
* [X86] Remove mask parameter from vpshufbitqmb intrinsics. Change result to a vXi1 vector. (Craig Topper, 2019-01-14, 1 file, -8/+4)
  The input mask can be represented with an AND in IR.
  Fixes PR40258
  llvm-svn: 351028
* [X86] Add more ISD nodes to handle masked versions of VCVT(T)PD2DQZ128/VCVT(T)PD2UDQZ128, which only produce 2 result elements and zero the upper elements. (Craig Topper, 2019-01-13, 1 file, -9/+9)
  We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this.
  Fixes another case from PR34877
  llvm-svn: 351018
* [X86] Add X86ISD::VMFPROUND to handle the masked case of VCVTPD2PSZ128, which only produces 2 result elements and zeroes the upper elements. (Craig Topper, 2019-01-13, 1 file, -5/+5)
  We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this.
  Fixes another case from PR34877.
  llvm-svn: 351017
* [X86] Add ISD node for masked version of CVTPS2PH. (Craig Topper, 2019-01-12, 1 file, -7/+7)
  The 128-bit input produces 64 bits of output and fills the upper 64 bits with 0. The mask only applies to the lower elements, but we can't represent this with a vselect like we normally do.
  This also avoids the need to have a special X86ISD::SELECT when avx512bw isn't enabled, since vselect v8i16 isn't legal there.
  Fixes another instruction for PR34877.
  llvm-svn: 350994
* [X86] Add ISD nodes for masked truncate so we can properly represent when the output has more elements than the input due to needing to be 128 bits. (Craig Topper, 2019-01-12, 1 file, -82/+83)
  We can't properly represent this with a vselect since the upper elements of the result are supposed to be zeroed regardless of the mask.
  This also reuses the new nodes even when the result type fits in 128 bits, if the input is q/d and the result is w/b, since vselect w/b using a k-register condition isn't legal without avx512bw. Currently we're doing this even when avx512bw is enabled, but I might change that.
  This fixes some of PR34877
  llvm-svn: 350985
* Recommit r350554 "[X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics." (Craig Topper, 2019-01-07, 1 file, -56/+0)
  The MSVC limit we hit on AutoUpgrade.cpp has been worked around for now.
  llvm-svn: 350567
* Revert r350554 "[X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics." (Craig Topper, 2019-01-07, 1 file, -0/+56)
  The AutoUpgrade.cpp if/else cascade hit an MSVC limit again.
  llvm-svn: 350562
* [X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics. (Craig Topper, 2019-01-07, 1 file, -56/+0)
  Differential Revision: https://reviews.llvm.org/D56377
  llvm-svn: 350554
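  The VBMI2 concat-and-shift operation is a funnel shift, so the replacement is the generic @llvm.fshl/@llvm.fshr family; a minimal sketch of what VPSHLDD computes per lane (function name illustrative):

    ; fshl concatenates %hi:%lo, shifts left, and keeps the high half.
    declare <4 x i32> @llvm.fshl.v4i32(<4 x i32>, <4 x i32>, <4 x i32>)

    define <4 x i32> @funnel_shift_left(<4 x i32> %hi, <4 x i32> %lo) {
      %r = call <4 x i32> @llvm.fshl.v4i32(<4 x i32> %hi, <4 x i32> %lo, <4 x i32> <i32 5, i32 5, i32 5, i32 5>)
      ret <4 x i32> %r
    }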
* [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (llvm) (Simon Pilgrim, 2018-12-21, 1 file, -12/+0)
  This auto upgrades the signed SSE saturated math intrinsics to SADD_SAT/SSUB_SAT generic intrinsics.
  Clang counterpart: https://reviews.llvm.org/D55890
  Differential Revision: https://reviews.llvm.org/D55894
  llvm-svn: 349892
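  A minimal sketch of the upgraded form (function name illustrative): the x86-specific saturated add becomes the generic intrinsic, clamping each i8 lane to [-128, 127] instead of wrapping:

    declare <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8>, <16 x i8>)

    define <16 x i8> @upgraded_paddsb(<16 x i8> %a, <16 x i8> %b) {
      %r = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> %a, <16 x i8> %b)
      ret <16 x i8> %r
    }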
* [X86] Auto upgrade XOP/AVX512 rotation intrinsics to generic funnel shift intrinsics (llvm) (Simon Pilgrim, 2018-12-20, 1 file, -32/+0)
  This emits FSHL/FSHR generic intrinsics for the XOP VPROT and AVX512 VPROL/VPROR rotation intrinsics.
  Clang counterpart: https://reviews.llvm.org/D55937
  Differential Revision: https://reviews.llvm.org/D55938
  llvm-svn: 349795
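  A rotate is a funnel shift with both inputs equal, which is how the upgrade expresses these intrinsics; a minimal sketch of a rotate-left-by-7 (function name illustrative):

    declare <4 x i32> @llvm.fshl.v4i32(<4 x i32>, <4 x i32>, <4 x i32>)

    define <4 x i32> @upgraded_rotate_left(<4 x i32> %x) {
      ; fshl(%x, %x, 7) is a rotate left by 7 in each lane.
      %r = call <4 x i32> @llvm.fshl.v4i32(<4 x i32> %x, <4 x i32> %x, <4 x i32> <i32 7, i32 7, i32 7, i32 7>)
      ret <4 x i32> %r
    }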
* [X86] Use SADDSAT/SSUBSAT instead of ADDS/SUBS (Nikita Popov, 2018-12-18, 1 file, -12/+12)
  Migrate the X86 backend from X86ISD opcodes ADDS and SUBS to generic ISD opcodes SADDSAT and SSUBSAT. This also improves codegen for @llvm.sadd.sat() and @llvm.ssub.sat() intrinsics.
  This is a followup to D55787 and part of PR40056.
  Differential Revision: https://reviews.llvm.org/D55833
  llvm-svn: 349520
* [X86] Merge addcarryx/addcarry intrinsic into a single addcarry intrinsic. (Craig Topper, 2018-12-10, 1 file, -6/+4)
  Both intrinsics do the exact same thing, so we really only need one.
  Earlier in the 8.0 cycle we changed the signature of this intrinsic without renaming it. But it looks difficult to get the autoupgrade code to allow me to merge the intrinsics and change the signature at the same time. So I've renamed the intrinsic slightly for the new merged intrinsic; I'm skipping autoupgrading from the previous new-to-8.0 signature. I've also renamed the subborrow intrinsic for consistency.
  llvm-svn: 348737
* [X86] If the carry input to an addcarry/subborrow intrinsic is known to be 0, emit a flag-setting ADD/SUB instead of ADC/SBB. (Craig Topper, 2018-12-09, 1 file, -6/+6)
  Previously we had to take the carry-in and add -1 to it to set the carry flag so we could use it with ADC/SBB. But if we know it's 0, then we don't need to bother.
  This should go a long way towards fixing PR24545.
  llvm-svn: 348727
* [X86] X86DAGToDAGISel: handle BZHI selection too, not just BEXTR. (Roman Lebedev, 2018-10-22, 1 file, -0/+2)
  Summary: As discussed in D52304 / IRC, we now have pattern matching for 'bit extract' in two places: tablegen and `X86DAGToDAGISel`. There are 4 patterns, and we will have a problem with the `x & (-1 >> (32 - y))` pattern:
  - If the mask is one-use, then it is always unfolded into `x << (32 - y) >> (32 - y)` first. Thus, the existing test coverage is already broken.
  - If it is not one-use, then it is not unfolded, and is matched as BZHI.
  - If it is not one-use, we will not match it as BEXTR. And if it is one-use, it will have been unfolded already.
  So we will either not handle that pattern for BEXTR, or not have test coverage for it. This is bad.
  As discussed with @craig.topper, let's unify this matching, and do everything in `X86DAGToDAGISel`. Then we will not have code duplication, and will have proper test coverage.
  This indeed does not affect any tests, and this is great. It means that for these two patterns, the `X86DAGToDAGISel` is identical to the tablegen version.
  Please review carefully, I'm not fully sure about that intrinsic change, and the introduction of the new `X86ISD` opcode.
  Reviewers: craig.topper, RKSimon, spatel
  Reviewed By: craig.topper
  Subscribers: llvm-commits, craig.topper
  Differential Revision: https://reviews.llvm.org/D53164
  llvm-svn: 344904
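  For reference, the pattern in question keeps the low %y bits of %x and clears the rest, which is what BZHI computes; a minimal IR sketch of the two equivalent spellings (function names illustrative, valid for 1 <= %y <= 32):

    define i32 @bzhi_mask_form(i32 %x, i32 %y) {
      ; x & (-1 >> (32 - y))
      %amt  = sub i32 32, %y
      %mask = lshr i32 -1, %amt
      %r    = and i32 %x, %mask
      ret i32 %r
    }

    define i32 @bzhi_shift_form(i32 %x, i32 %y) {
      ; (x << (32 - y)) >> (32 - y): the unfolded one-use form.
      %amt = sub i32 32, %y
      %hi  = shl i32 %x, %amt
      %r   = lshr i32 %hi, %amt
      ret i32 %r
    }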
* [X86] Change the addcarry and subborrow intrinsics to return 2 results and remove the pointer argument. (Craig Topper, 2018-09-07, 1 file, -8/+6)
  We should represent the store directly in IR instead. This gives the middle end a chance to remove it if it can see a load from the same address.
  Differential Revision: https://reviews.llvm.org/D51769
  llvm-svn: 341677
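  A sketch of the signature change; the pre-change overload shown in the comment is an assumption:

    ; Before (assumed shape): the sum was stored through a pointer argument,
    ; hiding the store from the middle end:
    ;   declare i8 @llvm.x86.addcarry.u32(i8, i32, i32, i8*)
    ;
    ; After: two results, with the store expressed in plain IR. (The intrinsic
    ; was later renamed to @llvm.x86.addcarry.32; see r348737 above.)
    declare { i8, i32 } @llvm.x86.addcarry.u32(i8, i32, i32)

    define i8 @add_with_carry(i8 %c, i32 %a, i32 %b, i32* %out) {
      %res   = call { i8, i32 } @llvm.x86.addcarry.u32(i8 %c, i32 %a, i32 %b)
      %carry = extractvalue { i8, i32 } %res, 0
      %sum   = extractvalue { i8, i32 } %res, 1
      store i32 %sum, i32* %out   ; a plain store the optimizer can now elide
      ret i8 %carry
    }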
* [X86] Add intrinsics for KADD instructions (Craig Topper, 2018-08-28, 1 file, -0/+4)
  These are intrinsics for supporting kadd builtins in clang. These builtins are already in gcc to implement intrinsics from icc, though they are missing from the Intel Intrinsics Guide.
  This instruction adds two mask registers together as if they were scalar rather than a vXi1. We might be able to get away with a bitcast to scalar and a normal add instruction, but that would require DAG combine smarts in the backend to recognize add+bitcast. For now I'd prefer to go with the easiest implementation so we can get these builtins into clang with good codegen.
  Differential Revision: https://reviews.llvm.org/D51370
  llvm-svn: 340869
* [X86] Remove masking from the 512-bit padds and psubs intrinsics. Use select in IR instead. (Craig Topper, 2018-08-16, 1 file, -4/+4)
  llvm-svn: 339842