path: root/clang/lib/Headers
Commit message | Author | Age | Files | Lines
...
* [CUDA] Added missing functions.Artem Belevich2018-02-221-0/+22
| | | | | | | | | Initial commit missed sincos(float), llabs() and a few atomics that we used to pull in from device_functions.hpp, which we no longer include. Differential Revision: https://reviews.llvm.org/D43602 llvm-svn: 325814
* [CUDA] Added missing __threadfence_system() function for CUDA9.Artem Belevich2018-02-201-0/+1
| | | | llvm-svn: 325626
* [X86] Remove mask from 512 bit pmulhrsw/pmulhw/pmulhuw builtins.Craig Topper2018-02-201-46/+29
| | | | | | We now use a vselect node in IR around an unmasked builtin. This makes it consistent with the 128 and 256 bit versions. llvm-svn: 325560
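The select-around-an-unmasked-operation idea described above can be modeled in scalar C. This is only a sketch: `mulhi_epi16_model` and `mask_mulhi_epi16_model` are hypothetical names, not the real builtins, and the lane loop stands in for what the vselect does in IR.

```c
#include <stdint.h>

#define LANES 32 /* a 512-bit vector holds 32 lanes of 16 bits */

/* Unmasked operation: high 16 bits of each signed 16x16 product.
   Assumes arithmetic right shift of negative values, as on mainstream compilers. */
static void mulhi_epi16_model(const int16_t *a, const int16_t *b, int16_t *out) {
    for (int i = 0; i < LANES; ++i)
        out[i] = (int16_t)(((int32_t)a[i] * (int32_t)b[i]) >> 16);
}

/* Masked form expressed as select(mask, op(a, b), src). */
static void mask_mulhi_epi16_model(const int16_t *src, uint32_t mask,
                                   const int16_t *a, const int16_t *b,
                                   int16_t *out) {
    int16_t tmp[LANES];
    mulhi_epi16_model(a, b, tmp);
    for (int i = 0; i < LANES; ++i)
        out[i] = ((mask >> i) & 1u) ? tmp[i] : src[i];
}
```

Keeping the operation itself unmasked means one builtin serves both the plain and the masked intrinsic; only the final select differs.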
* [DOXYGEN] There was a request in the review D41507 to change the notation ↵Ekaterina Romanova2018-02-162-65/+67
| | | | | | | | for hex numbers in doxygen documentation from <...>h to 0x<...>. Both of these notations were used in x86 intrinsics documentation. I promised to change them to 0x<...> for consistency. Differential Revision: https://reviews.llvm.org/D41888 llvm-svn: 325312
* [CUDA] Added partial support for CUDA-9.1Artem Belevich2018-01-304-26/+1843
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clang can use CUDA-9.1 now, though new APIs are not implemented yet. The major change is that headers in CUDA-9.1 went through substantial changes that started in CUDA-9.0, which required substantial changes in the CUDA compatibility headers provided by clang. There are two major issues: * CUDA SDK no longer provides declarations for libdevice functions. * A lot of device-side functions have become nvcc's builtins and CUDA headers no longer contain their implementations. This patch changes the way CUDA headers are handled if we compile with CUDA 9.x. Both 9.0 and 9.1 are affected. * Clang provides its own declarations of libdevice functions. * For CUDA-9.x clang now provides implementation of device-side 'standard library' functions using libdevice. This patch should not affect compilation with CUDA-8. There may be some observable differences for CUDA-9.0, though they are not expected to affect functionality. Tested: CUDA test-suite tests for all supported combinations of: CUDA: 7.0,7.5,8.0,9.0,9.1 GPU: sm_20, sm_35, sm_60, sm_70 Differential Revision: https://reviews.llvm.org/D42513 llvm-svn: 323713
* [NFC] fix trivial typos in commentsHiroshi Inoue2018-01-291-2/+2
| | | | | | "to to" -> "to" llvm-svn: 323627
* [X86] Add rdpid command line option and intrinsics.Craig Topper2018-01-201-0/+12
| | | | | | | | | | | | | | Summary: This patch adds -mrdpid/-mno-rdpid and the rdpid intrinsic. The corresponding LLVM commit has already been made. Reviewers: RKSimon, spatel, zvi, AndreiGrischenko Reviewed By: RKSimon Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D42272 llvm-svn: 323047
* [AArch64] Add ARMv8.2-A FP16 scalar intrinsicsAbderrazek Zaafrani2018-01-192-0/+17
| | | | | | https://reviews.llvm.org/D41792 llvm-svn: 323006
* [DOXYGEN] Fix doxygen and content issues in xmmintrin.hDouglas Yung2018-01-171-28/+67
| | | | | | | | | | | | | | | | | - Fix inaccurate instruction listings. - Fix small issues in _mm_getcsr and _mm_setcsr. - Fix description of NaN handling in comparison intrinsics. - Fix inaccurate description of _mm_movemask_pi8. - Fix inaccurate instruction mappings. - Fix typos. - Clarify wording on some descriptions. - Fix bit ranges in return value. - Fix typo in _mm_move_ms intrinsic instruction since it operates on single-precision values, not double. - This patch was made by Craig Flores Differential Revision: https://reviews.llvm.org/D41523 llvm-svn: 322778
* [X86] Implement old kunpck intrinsics using vector ops on vXi1 instead of ↵Craig Topper2018-01-142-3/+5
| | | | | | | | | | | | | | | | | | | integer shift/and/or Summary: kunpck intrinsics were removed in favor of native IR a few months ago. The implementation lowers them by operating on the integer types passed to the intrinsic and then just shifting, masking, and oring them together. A special X86 DAG combine was added to recognize this pattern and turn it into a concat_vector operation. I think it makes more sense to keep the IR implementation closer to vector operations on vXi1, given that we expect these builtins to be used around other builtins that operate on k-registers, which we try to represent in IR with vXi1. InstCombine should be able to get rid of the bitcasts between integers and vXi1, leaving only the vector operations. Reviewers: RKSimon, spatel, zvi, jina.nahias Reviewed By: RKSimon Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D42016 llvm-svn: 322461
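The two lowerings contrasted above can be sketched in plain C for the byte variant. The function names are hypothetical, and the operand order (first operand supplies the high half) is an assumption based on the usual description of kunpackb rather than something stated in this log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Integer form: mask, shift, and OR two 8-bit masks into one 16-bit mask. */
static uint16_t kunpackb_int(uint16_t a, uint16_t b) {
    return (uint16_t)(((a & 0xFFu) << 8) | (b & 0xFFu));
}

/* Vector form: treat each mask as 8 one-bit lanes and concatenate them,
   which is closer to a concat of two <8 x i1> values. */
static uint16_t kunpackb_vec(uint16_t a, uint16_t b) {
    bool lanes[16];
    for (int i = 0; i < 8; ++i) {
        lanes[i] = (b >> i) & 1u;     /* low half comes from b */
        lanes[i + 8] = (a >> i) & 1u; /* high half comes from a */
    }
    uint16_t out = 0;
    for (int i = 0; i < 16; ++i)
        out |= (uint16_t)((unsigned)lanes[i] << i);
    return out;
}
```

Both functions compute the same mask; the second simply avoids describing the result in terms of integer arithmetic.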
* [OpenCL] Reorder the CLK_sRGBx/sRGBA defines, NFCSven van Haastregt2018-01-111-1/+1
| | | | | | | Swap them so that all channel order defines are ordered according to their values. llvm-svn: 322278
* [DOXYGEN] Fix doxygen and content issues in avxintrin.hDouglas Yung2018-01-081-201/+200
| | | | | | | | | | - Fix incorrect wording in various intrinsic descriptions. Previously the descriptions used "low-order" and "high-order" when the intended meaning was "even-indexed" and "odd-indexed". - Fix a few typos and errors found during review. - Restore new line endings. This patch was made by Craig Flores llvm-svn: 322027
* [DOXYGEN] Fix doxygen and content issues in smmintrin.hDouglas Yung2018-01-021-3/+3
| | | | | | | | | | | - Fix formatting issue due to hyphenated terms at line breaks. - Fix typo This patch was made by Craig Flores Differential Revision: https://reviews.llvm.org/D41520 llvm-svn: 321671
* [DOXYGEN] Fix doxygen and content issues in pmmintrin.hDouglas Yung2018-01-021-3/+3
| | | | | | | | | | - Fix incorrect wording in various intrinsic descriptions. Previously the descriptions used "low-order" and "high-order" when the intended meaning was "even-indexed" and "odd-indexed". This patch was made by Craig Flores Differential Revision: https://reviews.llvm.org/D41518 llvm-svn: 321670
* [DOXYGEN] Fix doxygen and content issues in emmintrin.hDouglas Yung2018-01-021-48/+63
| | | | | | | | | | | | | | | - Fixed inaccurate instruction mappings for various intrinsics. - Fixed description of NaN handling in comparison intrinsics. - Unify description of _mm_store_pd1 to match _mm_store1_pd. - Fix incorrect wording in various intrinsic descriptions. Previously the descriptions used "low-order" and "high-order" when the intended meaning was "even-indexed" and "odd-indexed". - Fix typos. - Add missing italics command (\a) for params and fixed some parameter spellings. This patch was made by Craig Flores Differential Revision: https://reviews.llvm.org/D41516 llvm-svn: 321669
* [x86][icelake][vbmi2]Coby Tayree2017-12-274-0/+1150
| | | | | | | | | | | | | | | added vbmi2 feature recognition added intrinsics support for vbmi2 instructions _mm[128,256,512]_mask[z]_compress_epi[16,32] _mm[128,256,512]_mask_compressstoreu_epi[16,32] _mm[128,256,512]_mask[z]_expand_epi[16,32] _mm[128,256,512]_mask[z]_expandloadu_epi[16,32] _mm[128,256,512]_mask[z]_sh[l,r]di_epi[16,32,64] _mm[128,256,512]_mask_sh[l,r]dv_epi[16,32,64] matching a similar work on the backend (D40206) Differential Revision: https://reviews.llvm.org/D41557 llvm-svn: 321487
* [x86][icelake][vnni]Coby Tayree2017-12-274-0/+411
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | added vnni feature recognition added intrinsics support for VNNI instructions _mm256_mask_dpbusd_epi32 _mm256_maskz_dpbusd_epi32 _mm256_dpbusd_epi32 _mm256_mask_dpbusds_epi32 _mm256_maskz_dpbusds_epi32 _mm256_dpbusds_epi32 _mm256_mask_dpwssd_epi32 _mm256_maskz_dpwssd_epi32 _mm256_dpwssd_epi32 _mm256_mask_dpwssds_epi32 _mm256_maskz_dpwssds_epi32 _mm256_dpwssds_epi32 _mm128_mask_dpbusd_epi32 _mm128_maskz_dpbusd_epi32 _mm128_dpbusd_epi32 _mm128_mask_dpbusds_epi32 _mm128_maskz_dpbusds_epi32 _mm128_dpbusds_epi32 _mm128_mask_dpwssd_epi32 _mm128_maskz_dpwssd_epi32 _mm128_dpwssd_epi32 _mm128_mask_dpwssds_epi32 _mm128_maskz_dpwssds_epi32 _mm128_dpwssds_epi32 _mm512_mask_dpbusd_epi32 _mm512_maskz_dpbusd_epi32 _mm512_dpbusd_epi32 _mm512_mask_dpbusds_epi32 _mm512_maskz_dpbusds_epi32 _mm512_dpbusds_epi32 _mm512_mask_dpwssd_epi32 _mm512_maskz_dpwssd_epi32 _mm512_dpwssd_epi32 _mm512_mask_dpwssds_epi32 _mm512_maskz_dpwssds_epi32 _mm512_dpwssds_epi32 matching a similar work on the backend (D40208) Differential Revision: https://reviews.llvm.org/D41558 llvm-svn: 321484
* [x86][icelake][bitalg]Coby Tayree2017-12-274-0/+265
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | added bitalg feature recognition added intrinsics support for bitalg instructions _mm512_popcnt_epi16 _mm512_mask_popcnt_epi16 _mm512_maskz_popcnt_epi16 _mm512_popcnt_epi8 _mm512_mask_popcnt_epi8 _mm512_maskz_popcnt_epi8 _mm512_mask_bitshuffle_epi64_mask _mm512_bitshuffle_epi64_mask _mm256_popcnt_epi16 _mm256_mask_popcnt_epi16 _mm256_maskz_popcnt_epi16 _mm128_popcnt_epi16 _mm128_mask_popcnt_epi16 _mm128_maskz_popcnt_epi16 _mm256_popcnt_epi8 _mm256_mask_popcnt_epi8 _mm256_maskz_popcnt_epi8 _mm128_popcnt_epi8 _mm128_mask_popcnt_epi8 _mm128_maskz_popcnt_epi8 _mm256_mask_bitshuffle_epi32_mask _mm256_bitshuffle_epi32_mask _mm128_mask_bitshuffle_epi16_mask _mm128_bitshuffle_epi16_mask matching a similar work on the backend (D40222) Differential Revision: https://reviews.llvm.org/D41564 llvm-svn: 321483
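As a rough scalar model of the per-lane population count these intrinsics expose, the sketch below counts set bits in each byte and applies a zeroing mask. `popcount8` and `maskz_popcnt_epi8_model` are illustrative names only; the real intrinsics operate on vector registers, and `lanes` is assumed to be at most 64.

```c
#include <stddef.h>
#include <stdint.h>

/* Count set bits in one byte by clearing the lowest set bit each iteration. */
static uint8_t popcount8(uint8_t x) {
    uint8_t n = 0;
    while (x) {
        x &= (uint8_t)(x - 1);
        ++n;
    }
    return n;
}

/* Zero-masked form: lanes whose mask bit is clear produce 0. */
static void maskz_popcnt_epi8_model(uint64_t k, const uint8_t *src,
                                    uint8_t *dst, size_t lanes) {
    for (size_t i = 0; i < lanes; ++i)
        dst[i] = ((k >> i) & 1u) ? popcount8(src[i]) : 0;
}
```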
* [x86][icelake][vpclmulqdq]Coby Tayree2017-12-273-0/+47
| | | | | | | | | | | added vpclmulqdq feature recognition added intrinsics support for vpclmulqdq instructions _mm256_clmulepi64_epi128 _mm512_clmulepi64_epi128 matching a similar work on the backend (D40101) Differential Revision: https://reviews.llvm.org/D41573 llvm-svn: 321480
* [x86][icelake][gfni]Coby Tayree2017-12-273-0/+207
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | added gfni feature recognition added intrinsics support for gfni instructions _mm_gf2p8affineinv_epi64_epi8 _mm_mask_gf2p8affineinv_epi64_epi8 _mm_maskz_gf2p8affineinv_epi64_epi8 _mm256_gf2p8affineinv_epi64_epi8 _mm256_mask_gf2p8affineinv_epi64_epi8 _mm256_maskz_gf2p8affineinv_epi64_epi8 _mm512_gf2p8affineinv_epi64_epi8 _mm512_mask_gf2p8affineinv_epi64_epi8 _mm512_maskz_gf2p8affineinv_epi64_epi8 _mm_gf2p8affine_epi64_epi8 _mm_mask_gf2p8affine_epi64_epi8 _mm_maskz_gf2p8affine_epi64_epi8 _mm256_gf2p8affine_epi64_epi8 _mm256_mask_gf2p8affine_epi64_epi8 _mm256_maskz_gf2p8affine_epi64_epi8 _mm512_gf2p8affine_epi64_epi8 _mm512_mask_gf2p8affine_epi64_epi8 _mm512_maskz_gf2p8affine_epi64_epi8 _mm_gf2p8mul_epi8 _mm_mask_gf2p8mul_epi8 _mm_maskz_gf2p8mul_epi8 _mm256_gf2p8mul_epi8 _mm256_mask_gf2p8mul_epi8 _mm256_maskz_gf2p8mul_epi8 _mm512_gf2p8mul_epi8 _mm512_mask_gf2p8mul_epi8 _mm512_maskz_gf2p8mul_epi8 matching a similar work on the backend (D40373) Differential Revision: https://reviews.llvm.org/D41582 llvm-svn: 321477
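For readers unfamiliar with the arithmetic behind the gf2p8mul family, here is a scalar sketch of a single byte multiplication in GF(2^8). It assumes the reduction polynomial x^8 + x^4 + x^3 + x + 1 (the AES polynomial), which is my understanding of what the instruction uses; `gf2p8mul_byte` is a hypothetical helper, not part of any header.

```c
#include <stdint.h>

/* Multiply two elements of GF(2^8) using shift-and-add with reduction
   by x^8 + x^4 + x^3 + x + 1 (low byte 0x1B). */
static uint8_t gf2p8mul_byte(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    for (int i = 0; i < 8; ++i) {
        if (b & 1u)
            r ^= a;                 /* add (XOR) the current multiple of a */
        uint8_t carry = a & 0x80u;  /* will the shift overflow degree 7? */
        a = (uint8_t)(a << 1);
        if (carry)
            a ^= 0x1Bu;             /* reduce modulo the field polynomial */
        b >>= 1;
    }
    return r;
}
```

The vector intrinsics would apply this independently to every byte lane, with the mask and maskz variants selecting which lanes are written.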
* [x86][icelake][vaes]Coby Tayree2017-12-273-0/+103
| | | | | | | | | | | | | | | added vaes feature recognition added intrinsics support for vaes instructions, matching a similar work on the backend (D40078) _mm256_aesenc_epi128 _mm512_aesenc_epi128 _mm256_aesenclast_epi128 _mm512_aesenclast_epi128 _mm256_aesdec_epi128 _mm512_aesdec_epi128 _mm256_aesdeclast_epi128 _mm512_aesdeclast_epi128 llvm-svn: 321474
* [CUDA] More fixes for __shfl_* intrinsics.Artem Belevich2017-12-211-28/+49
| | | | | | | | | * __shfl_{up,down}* uses unsigned int for the third parameter. * added [unsigned] long overloads for non-sync shuffles. Differential Revision: https://reviews.llvm.org/D41521 llvm-svn: 321326
* [X86] Allow _mm_prefetch (both the header implementation and the builtin) to ↵Craig Topper2017-12-211-4/+7
| | | | | | accept bit 2, which is supposed to indicate that the prefetched addresses will be written to. Add the appropriate _MM_HINT_ET0/ET1 defines to match gcc. llvm-svn: 321325
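One way to picture the hint encoding is as a small bit field: bit 2 marks a write intent and the low two bits give the locality level. The sketch below uses hypothetical `HINT_*` names, and the numeric values are an assumption taken from how gcc's header is commonly described, not from this log.

```c
/* Hypothetical mirror of the prefetch hint values (assumed, not verified here). */
enum {
    HINT_NTA = 0,
    HINT_T2 = 1,
    HINT_T1 = 2,
    HINT_T0 = 3,
    HINT_ET1 = 6, /* T1 locality with bit 2 (write intent) set */
    HINT_ET0 = 7  /* T0 locality with bit 2 (write intent) set */
};

/* Bit 2: nonzero when the prefetched address is expected to be written. */
static int hint_is_write(int sel) { return (sel >> 2) & 1; }

/* Bits 1:0: temporal locality level, 0 (none) to 3 (highest). */
static int hint_locality(int sel) { return sel & 3; }
```

A header could pass these two decoded values straight to a generic prefetch builtin that takes separate read/write and locality arguments.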
* [X86] Add more CPUID bits to cpuid.h to match gcc and support icelake features.Craig Topper2017-12-201-5/+14
| | | | llvm-svn: 321129
* [X86] Add the two files I forgot to commit in r320915.Craig Topper2017-12-161-0/+99
| | | | llvm-svn: 320916
* [X86] Add builtins and tests for 128 and 256 bit vpopcntdq.Craig Topper2017-12-162-0/+6
| | | | llvm-svn: 320915
* In stdbool.h, define bool, false, true only in gnu++98Stephan Bergmann2017-12-081-1/+4
| | | | | | | | | | GCC has meanwhile corrected that with the similar <https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=216679> "C++11 explicitly forbids macros for bool, true and false." Differential Revision: https://reviews.llvm.org/D40167 llvm-svn: 320135
* [NVPTX,CUDA] Added llvm.nvvm.fns intrinsic and matching __nvvm_fns builtin ↵Artem Belevich2017-12-061-0/+4
| | | | | | | | in clang. Differential Revision: https://reviews.llvm.org/D40872 llvm-svn: 319909
* [CUDA] Added overloads for '[unsigned] long' variants of shfl builtins.Artem Belevich2017-12-061-0/+18
| | | | | | Differential Revision: https://reviews.llvm.org/D40871 llvm-svn: 319908
* [x86][AVX512] Lowering kunpack intrinsics to LLVM IRJina Nahias2017-12-052-5/+3
| | | | | | | | | This patch, together with a matching llvm patch (https://reviews.llvm.org/D39720), implements the lowering of X86 kunpack intrinsics to IR. Differential Revision: https://reviews.llvm.org/D39719 Change-Id: Id5d3cb394ad33b98be79a6783d1d15569e2b798d llvm-svn: 319777
* [clang] Use add_llvm_install_targetsShoaib Meenai2017-11-301-5/+3
| | | | | | | | | | Use this function to create the install targets rather than doing so manually, which gains us the `-stripped` install targets to perform stripped installations. Differential Revision: https://reviews.llvm.org/D40675 llvm-svn: 319489
* [CUDA] Tweak CUDA wrappers to make cuda-9 work with libc++Artem Belevich2017-11-301-0/+6
| | | | | | | | | CUDA-9 headers check for specific libc++ version and ifdef out some of the definitions we need if LIBCPP_VERSION >= 3800. Differential Revision: https://reviews.llvm.org/D40198 llvm-svn: 319485
* [OpenCL] Add extensions cl_intel_subgroups and cl_intel_subgroups_shortAlexey Sotkin2017-11-271-0/+307
| | | | | | | | | | | | Reviewers: yaxunl, Anastasia, bader Reviewed By: Anastasia, bader Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D39936 llvm-svn: 319011
* Control-Flow Enforcement Technology - Shadow Stack and Indirect Branch ↵Oren Ben Simhon2017-11-263-0/+98
| | | | | | | | | | | | | | | | | | Tracking support (Clang side) Shadow stack solution introduces a new stack for return addresses only. The stack has a Shadow Stack Pointer (SSP) that points to the last address to which we expect to return. If we return to a different address an exception is triggered. This patch includes shadow stack intrinsics as well as the corresponding CET header. It includes CET clang flags for shadow stack and Indirect Branch Tracking. For more information, please see the following: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf Differential Revision: https://reviews.llvm.org/D40224 Change-Id: I79ad0925a028bbc94c8ecad75f6daa2f214171f1 llvm-svn: 318995
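The shadow stack behavior described above can be illustrated with a toy software model. This is purely conceptual: the real mechanism is enforced by hardware, and `shadow_stack`, `shadow_call`, and `shadow_ret` are invented names with an arbitrary fixed depth.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_DEPTH 64 /* arbitrary capacity for the sketch */

typedef struct {
    uintptr_t slots[SHADOW_DEPTH];
    size_t top; /* index of the next free slot; plays the role of the SSP */
} shadow_stack;

/* On a call, record the return address. Returns false if the stack is full. */
static bool shadow_call(shadow_stack *s, uintptr_t ret_addr) {
    if (s->top == SHADOW_DEPTH)
        return false;
    s->slots[s->top++] = ret_addr;
    return true;
}

/* On a return, pop and compare. A false result models the exception raised
   when the actual return address differs from the recorded one. */
static bool shadow_ret(shadow_stack *s, uintptr_t ret_addr) {
    if (s->top == 0)
        return false;
    return s->slots[--s->top] == ret_addr;
}
```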
* [X86] Use separate builtins for fma4 scalar intrinsics. Use negations to ↵Craig Topper2017-11-252-14/+14
| | | | | | | | | | remove some of the scalar fma3 builtins. fma4 instructions zero the upper bits of the xmm register. fma3 instructions leave the bits unmodified. This requires separate builtins for the different semantics. While we're cleaning up the scalar builtins this also removes the fma3 fmsub/fnmadd/fnmsub builtins by using negates in the header file. llvm-svn: 318985
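A small scalar model may help show both points made above: the difference in upper-lane behavior, and how fmsub, fnmadd, and fnmsub can be derived from a single fmadd by negating operands. The names are hypothetical, the arithmetic is an unfused `a * b + c` rather than a true fused operation, and negating `b` (not `a`) for the fnm forms is an assumption chosen so that the pass-through lanes of `a` stay untouched.

```c
typedef struct { float v[4]; } vec4;

static vec4 negate(vec4 x) {
    for (int i = 0; i < 4; ++i)
        x.v[i] = -x.v[i];
    return x;
}

/* FMA3-style scalar op: low lane computed, upper lanes passed through from a. */
static vec4 fmadd_ss_fma3(vec4 a, vec4 b, vec4 c) {
    a.v[0] = a.v[0] * b.v[0] + c.v[0];
    return a;
}

/* FMA4-style scalar op: low lane computed, upper lanes zeroed. */
static vec4 fmadd_ss_fma4(vec4 a, vec4 b, vec4 c) {
    vec4 r = {{ a.v[0] * b.v[0] + c.v[0], 0.0f, 0.0f, 0.0f }};
    return r;
}

/* Derived forms built from fmadd plus negations. */
static vec4 fmsub_ss_fma3(vec4 a, vec4 b, vec4 c) {
    return fmadd_ss_fma3(a, b, negate(c));          /*  a*b - c */
}
static vec4 fnmadd_ss_fma3(vec4 a, vec4 b, vec4 c) {
    return fmadd_ss_fma3(a, negate(b), c);          /* -a*b + c */
}
static vec4 fnmsub_ss_fma3(vec4 a, vec4 b, vec4 c) {
    return fmadd_ss_fma3(a, negate(b), negate(c));  /* -a*b - c */
}
```

Because the two families disagree on the upper lanes, a single builtin cannot serve both, which is the motivation the commit gives for separate FMA4 builtins.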
* [CUDA] Remove implementations of nexttoward.Justin Lebar2017-11-172-23/+8
| | | | | | | | | | | | | | | | | | Summary: __builtin_nexttoward lowers to a libcall, e.g. nexttowardf(), that CUDA does not have. Rather than try to implement it, we simply remove these functions -- nvcc doesn't support them either, and nextafter, which does work, does essentially the same thing on GPUs, because GPUs don't have long double. Reviewers: tra Subscribers: cfe-commits, sanjoy Differential Revision: https://reviews.llvm.org/D40152 llvm-svn: 318494
* [X86] test/testn intrinsics lowering to IR. clang sideUriel Korach2017-11-134-162/+128
| | | | | | | | | Change Header files of the intrinsics for lowering test and testn intrinsics to IR code. Removed test and testn builtins from clang Differential Revision: https://reviews.llvm.org/D38737 llvm-svn: 318035
* [x86][AVX512] Lowering shuffle i/f intrinsics to LLVM IRJina Nahias2017-11-132-105/+125
| | | | | | | | | This patch, together with a matching llvm patch (https://reviews.llvm.org/D38671), implements the lowering of X86 shuffle i/f intrinsics to IR. Differential Revision: https://reviews.llvm.org/D38672 Change-Id: I9b3c2f2b34323bd9ccb21d0c1832f848b88ec047 llvm-svn: 318025
* [CUDA] Fix std::min on device side to return the min, not the max.Justin Lebar2017-11-111-1/+1
| | | | | | | | | | | | | | | | Summary: How embarrassing. This is tested in the test-suite -- fix to come there in a separate patch. Reviewers: tra Subscribers: sanjoy, cfe-commits Differential Revision: https://reviews.llvm.org/D39817 llvm-svn: 317961
* [X86] Reduce the number of FMA builtins needed by the frontend by adding ↵Craig Topper2017-11-102-32/+32
| | | | | | | | | | | | negates to operands of the fmadd and fmaddsub builtins. The backend should be able to combine the negates to create fmsub, fnmadd, and fnmsub. faddsub converting to fsubadd still needs work I think, but should be very doable. This matches what we already do for the masked builtins. This only covers the packed builtins. Scalar builtins will be done after FMA4 is fixed. llvm-svn: 317873
* [X86] Rename the VEX scalar fma builtins to end with a '3' to match gccCraig Topper2017-11-092-16/+16
| | | | | | | | I think we need to use different builtins for the FMA4 instructions since those instructions zero the upper bits and FMA3 instructions pass the bits through. So this moves the existing builtins to be the FMA3 versions. New versions will be added for FMA4. llvm-svn: 317766
* [X86] Replace the mask cmpeq/cmple/cmplt/cmpgt/cmpge/cmpneq intrinsics with ↵Craig Topper2017-11-064-1821/+692
| | | | | | | | macros that just pass the right comparison predicate value to the regular cmp intrinsic. Remove mask cmpeq/cmpgt builtins that are now unused. This shortens the intrinsic headers a little and allows us to get rid of the cmpeq and cmpgt handling from CGBuiltin.cpp. llvm-svn: 317506
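The macro approach described above might look roughly like the following scalar model, where one general comparison takes a predicate and the named variants are thin macros over it. All identifiers here are hypothetical, and the predicate numbering is an assumption modeled on the usual integer-compare encoding (EQ=0, LT=1, LE=2, NE=4, GE=5, GT=6).

```c
#include <stdint.h>

enum { PRED_EQ = 0, PRED_LT = 1, PRED_LE = 2, PRED_NE = 4, PRED_GE = 5, PRED_GT = 6 };

/* General compare over 16 signed 32-bit lanes; returns one mask bit per lane. */
static uint16_t cmp_epi32_mask_model(const int32_t *a, const int32_t *b, int pred) {
    uint16_t k = 0;
    for (int i = 0; i < 16; ++i) {
        int hit;
        switch (pred) {
        case PRED_EQ: hit = a[i] == b[i]; break;
        case PRED_LT: hit = a[i] <  b[i]; break;
        case PRED_LE: hit = a[i] <= b[i]; break;
        case PRED_NE: hit = a[i] != b[i]; break;
        case PRED_GE: hit = a[i] >= b[i]; break;
        case PRED_GT: hit = a[i] >  b[i]; break;
        default:      hit = 0;            break;
        }
        k |= (uint16_t)(hit << i);
    }
    return k;
}

/* Named variants become macros that only supply the predicate. */
#define cmpeq_epi32_mask_model(a, b) cmp_epi32_mask_model((a), (b), PRED_EQ)
#define cmplt_epi32_mask_model(a, b) cmp_epi32_mask_model((a), (b), PRED_LT)
#define cmpgt_epi32_mask_model(a, b) cmp_epi32_mask_model((a), (b), PRED_GT)
```

With this shape, adding or removing a named comparison touches only a one-line macro, which is consistent with the large reduction in header lines reported for this commit.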
* lowering broadcastmJina Nahias2017-11-062-7/+8
| | | | | Change-Id: I0661abea3e3742860e0a03ff9e4fcdc367eff7db llvm-svn: 317456
* [Headers] Fix typoed __ARM_DWARF_EH__ ifdefsMartin Storsjo2017-10-191-3/+3
| | | | | | These typos appeared in SVN r309226 and r309327. llvm-svn: 316149
* [X86] Add CLWB intrinsic. clang partCraig Topper2017-10-123-0/+57
| | | | | | | | | | | | Reviewers: RKSimon, zvi, igorb Reviewed By: RKSimon Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D38781 llvm-svn: 315607
* [X86] Correct type for argument to clflushopt intrinsic.Craig Topper2017-10-111-1/+1
| | | | | | | | | | | | | | Summary: According to Intel docs this should take void const *. We had char*. The lack of const is the main issue. Reviewers: RKSimon, zvi, igorb Reviewed By: igorb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38782 llvm-svn: 315470
* [CUDA] Fix name of __activemask()Jonas Hahnfeld2017-10-021-1/+1
| | | | | | | | | The name has two underscores in the official CUDA documentation: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-vote-functions Differential Revision: https://reviews.llvm.org/D38468 llvm-svn: 314691
* [CUDA] Work around conflicting function definitions in CUDA-9 headers.Artem Belevich2017-09-271-0/+11
| | | | | | Differential Revision: https://reviews.llvm.org/D38326 llvm-svn: 314334
* [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.Artem Belevich2017-09-261-3/+29
| | | | | | Differential Revision: https://reviews.llvm.org/D38191 llvm-svn: 314223
* Revert "[NVPTX] added match.{any,all}.sync instructions, intrinsics & ↵Justin Lebar2017-09-251-29/+3
| | | | | | | | | | | | | | | builtins.", rL314135. Causing assertion failures on macos: > Assertion failed: (Num < NumOperands && "Invalid child # of SDNode!"), > function getOperand, file > /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm/include/llvm/CodeGen/SelectionDAGNodes.h, > line 835. http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/42739/testReport/LLVM/CodeGen_NVPTX/surf_read_cuda_ll/ llvm-svn: 314142