path: root/clang/lib/Headers
Commit message (author, date; files changed, lines -removed/+added)
...
* [X86] Add ktest intrinsics to match gcc and icc. (Craig Topper, 2018-08-31; 2 files, -0/+72)
  These aren't documented in the Intel Intrinsics Guide, but are supported by gcc and icc. Includes these intrinsics:
  _ktestc_mask8_u8, _ktestz_mask8_u8, _ktest_mask8_u8
  _ktestc_mask16_u8, _ktestz_mask16_u8, _ktest_mask16_u8
  _ktestc_mask32_u8, _ktestz_mask32_u8, _ktest_mask32_u8
  _ktestc_mask64_u8, _ktestz_mask64_u8, _ktest_mask64_u8
  llvm-svn: 341265
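  A minimal usage sketch (my example, not from the commit; the 8/16-bit forms sit behind AVX512DQ, so compile with e.g. -mavx512dq):

      #include <immintrin.h>

      /* _ktestz_mask16_u8 returns 1 when the AND of the two masks is all zeros. */
      int masks_disjoint(__mmask16 a, __mmask16 b) {
        return _ktestz_mask16_u8(a, b); /* 1 if (a & b) == 0 */
      }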
* [X86] Add k-mask conversion and load/store intrinsics to match gcc and icc. (Craig Topper, 2018-08-31; 3 files, -0/+80)
  This adds:
  _cvtmask8_u32, _cvtmask16_u32, _cvtmask32_u32, _cvtmask64_u64
  _cvtu32_mask8, _cvtu32_mask16, _cvtu32_mask32, _cvtu64_mask64
  _load_mask8, _load_mask16, _load_mask32, _load_mask64
  _store_mask8, _store_mask16, _store_mask32, _store_mask64
  These are currently missing from the Intel Intrinsics Guide webpage.
  llvm-svn: 341251
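  A hedged round-trip sketch (assumed usage based on the names above; the 16-bit forms need only AVX512F, so compile with e.g. -mavx512f):

      #include <immintrin.h>

      unsigned int mask_roundtrip(__mmask16 m, __mmask16 *slot) {
        _store_mask16(slot, m);           /* spill a mask register to memory */
        __mmask16 r = _load_mask16(slot); /* reload it */
        return _cvtmask16_u32(r);         /* move the mask bits into a GPR */
      }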
* [X86] Add kshift intrinsics to match gcc and icc. (Craig Topper, 2018-08-31; 3 files, -0/+24)
  This adds the following intrinsics:
  _kshiftli_mask8, _kshiftli_mask16, _kshiftli_mask32, _kshiftli_mask64
  _kshiftri_mask8, _kshiftri_mask16, _kshiftri_mask32, _kshiftri_mask64
  llvm-svn: 341234
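  Usage sketch (my assumption, not from the commit; the shift count is an immediate, mirroring KSHIFTLW/KSHIFTRW, and the 16-bit form needs AVX512F):

      #include <immintrin.h>

      __mmask16 kshift_demo(__mmask16 m) {
        __mmask16 up = _kshiftli_mask16(m, 4); /* m << 4 */
        return _kshiftri_mask16(up, 2);        /* (m << 4) >> 2 */
      }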
* [X86] Add kadd intrinsics to match gcc and icc. (Craig Topper, 2018-08-28; 2 files, -0/+24)
  This adds the following intrinsics:
  _kadd_mask64, _kadd_mask32, _kadd_mask16, _kadd_mask8
  These are missing from the Intel Intrinsics Guide, but are implemented by both gcc and icc.
  llvm-svn: 340879
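  Sketch of the intended use (my assumption; KADDB/KADDW are AVX512DQ instructions, so compile with e.g. -mavx512dq). The addition happens entirely in the k-registers, with no round trip through the GPRs:

      #include <immintrin.h>

      __mmask16 kadd_demo(__mmask16 a, __mmask16 b) {
        return _kadd_mask16(a, b);
      }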
* [X86] Add kortest intrinsics for 8, 32, and 64 bit masks. Add new intrinsic names for 16 bit masks. (Craig Topper, 2018-08-28; 3 files, -0/+72)
  This matches gcc and icc despite not being documented in the Intel Intrinsics Guide.
  llvm-svn: 340798
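  Hedged usage sketch (assumed; the 16-bit form needs only AVX512F):

      #include <immintrin.h>

      /* _kortestz_mask16_u8 returns 1 when the OR of the two masks is all zeros. */
      int both_empty(__mmask16 a, __mmask16 b) {
        return _kortestz_mask16_u8(a, b); /* 1 if (a | b) == 0 */
      }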
* [X86] Add intrinsics for kand/kandn/knot/kor/kxnor/kxor with 8, 32, and 64-bit mask registers. (Craig Topper, 2018-08-27; 3 files, -0/+117)
  This also adds a second intrinsic name for the 16-bit mask versions. These intrinsics match gcc and icc. They just aren't published in the Intel Intrinsics Guide, so I only recently found they existed.
  llvm-svn: 340719
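  Usage sketch (my example, not from the commit; the 16-bit forms need AVX512F) showing mask logic staying in the k-registers:

      #include <immintrin.h>

      __mmask16 mask_logic_demo(__mmask16 a, __mmask16 b) {
        /* (a AND b) OR NOT(a XOR b) */
        return _kor_mask16(_kand_mask16(a, b), _knot_mask16(_kxor_mask16(a, b)));
      }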
* [X86] Remove min_vector_width 512 from some intrinsics that operate only on k-registers. (Craig Topper, 2018-08-27; 1 file, -0/+2)
  llvm-svn: 340718
* [X86] Rename __DEFAULT_FN_ATTRS to __DEFAULT_FN_ATTRS512 in avx512dqintrin.h and avx512bwintrin.h. (Craig Topper, 2018-08-27; 2 files, -297/+295)
  This is preparation for removing min_vector_width 512 from some intrinsics.
  llvm-svn: 340717
* [X86] Undef __DEFAULT_FN_ATTRS in avx512fintrin.h. (Craig Topper, 2018-08-27; 1 file, -0/+1)
  Fixes test failure after r340713.
  llvm-svn: 340714
* [X86] Don't set min_vector_width to 512 on intrinsics that only operate on k registers. (Craig Topper, 2018-08-27; 1 file, -12/+13)
  llvm-svn: 340713
* Make __shiftleft128 / __shiftright128 real compiler built-ins. (Nico Weber, 2018-08-17; 1 file, -14/+0)
  r337619 added __shiftleft128 / __shiftright128 as functions in intrin.h. Microsoft's STL plans on using these functions, and it uses intrin0.h, which has just declarations of built-ins so that the standard library headers don't pull in the huge intrin.h header. That requires that these functions be real built-ins.
  https://reviews.llvm.org/D50907
  llvm-svn: 340048
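  With real built-ins, a bare prototype is enough; a hedged sketch of the declaration-only style intrin0.h relies on (signatures per the MSVC documentation):

      /* Shift the 128-bit value HighPart:LowPart; the left shift returns the
         high 64 bits of the result, the right shift returns the low 64 bits. */
      unsigned __int64 __shiftleft128(unsigned __int64 LowPart,
                                      unsigned __int64 HighPart,
                                      unsigned char Shift);
      unsigned __int64 __shiftright128(unsigned __int64 LowPart,
                                       unsigned __int64 HighPart,
                                       unsigned char Shift);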
* [X86] Remove masking from the 512-bit paddus/psubus builtins. Use a select builtin instead. (Craig Topper, 2018-08-16; 1 file, -56/+32)
  llvm-svn: 339845
* [X86] Remove masking from the 512-bit padds and psubs builtins. Use select builtin instead. (Craig Topper, 2018-08-16; 1 file, -56/+32)
  llvm-svn: 339843
* [Headers] Define *_HAS_SUBNORM for FLT, DBL, LDBL (Pirama Arumuga Nainar, 2018-08-08; 1 file, -0/+6)
  Summary: These macros are defined in the C11 standard and can be defined based on the __*_HAS_DENORM__ default macros.
  Reviewers: bruno, rsmith, doug.gregor
  Subscribers: llvm-commits, enh, srhines
  Differential Revision: https://reviews.llvm.org/D37302
  llvm-svn: 339284
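  A quick check of the new macros (my example, assuming a C11 compiler; per C11, *_HAS_SUBNORM is 1 when subnormals are supported, 0 when absent, and -1 when indeterminable):

      #include <float.h>
      #include <stdio.h>

      int main(void) {
      #if defined(FLT_HAS_SUBNORM) && FLT_HAS_SUBNORM == 1
        /* FLT_TRUE_MIN (also C11) is the smallest positive subnormal float. */
        printf("float subnormals supported; FLT_TRUE_MIN = %g\n", (double)FLT_TRUE_MIN);
      #else
        printf("float subnormals absent or indeterminable\n");
      #endif
        return 0;
      }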
* [Headers] Expand _Unwind_Exception for SEH on MinGW/x86_64 (Martin Storsjo, 2018-08-07; 1 file, -0/+4)
  This matches how GCC defines this struct.
  Differential Revision: https://reviews.llvm.org/D50380
  llvm-svn: 339170
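  For reference, a hedged sketch of the GCC shape being matched (typedefs simplified to self-contained stand-ins; on SEH targets the private area is six words instead of two):

      typedef unsigned long long _Unwind_Exception_Class; /* 64-bit class tag */
      typedef unsigned long long _Unwind_Word;            /* pointer-sized on x86_64 */
      struct _Unwind_Exception;
      typedef void (*_Unwind_Exception_Cleanup_Fn)(int, struct _Unwind_Exception *);

      struct _Unwind_Exception {
        _Unwind_Exception_Class exception_class;
        _Unwind_Exception_Cleanup_Fn exception_cleanup;
      #if defined(__SEH__) && !defined(__USING_SJLJ_EXCEPTIONS__)
        _Unwind_Word private_[6];
      #else
        _Unwind_Word private_1;
        _Unwind_Word private_2;
      #endif
      };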
* [clang] Fix broken include_next in float.h (Louis Dionne, 2018-08-06; 1 file, -3/+3)
  Summary: The code defines __FLOAT_H and then includes the next <float.h>, which is guarded on __FLOAT_H, so it gets skipped entirely. This commit uses the header guard __CLANG_FLOAT_H, like other headers (such as limits.h) do.
  Reviewers: jfb
  Subscribers: dexonsmith, cfe-commits
  Differential Revision: https://reviews.llvm.org/D50276
  llvm-svn: 339016
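  The corrected guard pattern, reconstructed from the description above (a sketch, not the header verbatim): a clang-private macro guards clang's copy, so the include_next'd system <float.h>, which is guarded on its own macro, is still processed.

      #ifndef __CLANG_FLOAT_H
      #define __CLANG_FLOAT_H

      #include_next <float.h>

      /* ... clang's additions on top of the system header ... */

      #endif /* __CLANG_FLOAT_H */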
* Remove trailing space (Fangrui Song, 2018-07-30; 5 files, -10/+10)
  sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}
  llvm-svn: 338291
* [ms] Add __shiftleft128 / __shiftright128 intrinsics (Nico Weber, 2018-07-20; 1 file, -0/+14)
  Carefully match the pattern matched by ISel so that this produces shld / shrd (unless Subtarget->isSHLDSlow() is true). Thanks to Craig Topper for providing the LLVM IR pattern that gets successfully matched.
  Fixes PR37755.
  llvm-svn: 337619
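  A usage sketch (my example; per the note above, on x86-64 this should compile to a single shld):

      #include <intrin.h>

      /* Returns the high 64 bits of (high:low) shifted left by 3. */
      unsigned __int64 high_after_shift(unsigned __int64 low, unsigned __int64 high) {
        return __shiftleft128(low, high, 3);
      }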
* [CUDA] Provide integer SIMD functions for CUDA-9.2 (Artem Belevich, 2018-07-20; 2 files, -1/+429)
  CUDA-9.2 made all integer SIMD functions into compiler builtins, so clang no longer has access to the implementation of these functions in either the headers or libdevice, and has to provide its own implementation. This is mostly a 1:1 mapping to corresponding PTX instructions, with the exception of vhadd2/vhadd4, which don't have an equivalent instruction and had to be implemented with a bit hack. Performance of this implementation will be suboptimal for SM_50 and newer GPUs, where PTXAS generates noticeably worse code for the SIMD instructions compared to the code it generates for the inline assembly produced by nvcc (or that used to come with the CUDA headers).
  Differential Revision: https://reviews.llvm.org/D49274
  llvm-svn: 337587
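  The bit hack is presumably the classic carry-free average; a hedged sketch for the unsigned per-halfword case (illustrative only, not necessarily the header's exact code):

      /* (a + b) >> 1 per 16-bit lane without needing a 17th carry bit:
         (a & b) + ((a ^ b) >> 1). Masking after the shift keeps a bit of the
         high lane from leaking into bit 15 of the low lane. */
      static unsigned int vhaddu2_sketch(unsigned int a, unsigned int b) {
        return (a & b) + (((a ^ b) >> 1) & 0x7fff7fffu);
      }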
* [COFF] Add more missing MSVC ARM64 intrinsics (Mandeep Singh Grang, 2018-07-17; 1 file, -2/+2)
  Summary: Added the following intrinsics:
  _BitScanForward, _BitScanReverse, _BitScanForward64, _BitScanReverse64
  _InterlockedAnd64, _InterlockedDecrement64, _InterlockedExchange64, _InterlockedExchangeAdd64, _InterlockedExchangeSub64, _InterlockedIncrement64, _InterlockedOr64, _InterlockedXor64
  Reviewers: compnerd, mstorsjo, rnk, javed.absar
  Reviewed By: mstorsjo
  Subscribers: kristof.beyls, chrib, llvm-commits
  Differential Revision: https://reviews.llvm.org/D49445
  llvm-svn: 337327
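  Usage sketch for the bit-scan pair (my example; per MSVC semantics the return value is nonzero only when a set bit was found, and *Index is undefined otherwise):

      #include <intrin.h>

      int lowest_set_bit(unsigned __int64 mask) {
        unsigned long index;
        return _BitScanForward64(&index, mask) ? (int)index : -1;
      }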
* Remove unnecessary trailing ; in macro intrinsic definition. (Eric Christopher, 2018-07-17; 1 file, -1/+1)
  llvm-svn: 337321
* [X86] Lowering integer truncation intrinsics to native IR (Mikhail Dvoretckii, 2018-07-10; 2 files, -28/+32)
  This patch lowers the _mm[256|512]_cvtepi{64|32|16}_epi{32|16|8} intrinsics to native IR in cases where the result's length is less than 128 bits. The resulting IR for 256-bit inputs is folded into VPMOV instructions, while for 128-bit inputs the vpshufb (or, in the 64-to-32-bit case, vinsertps) instructions are generated instead.
  Differential Revision: https://reviews.llvm.org/D48712
  llvm-svn: 336643
* [X86] Use the masked scalar fma builtins to implement the default rounding version of the fma intrinsics. (Craig Topper, 2018-07-10; 1 file, -120/+120)
  The rounding mode is checked in CGBuiltin.cpp to generate the correct intrinsic call. Making this switch changes the masking to use the "bitcast i8 to <8 x i1> and extract an i1" form of the IR for the mask. Previously we ended up with a scalar 'and' plus an icmp.
  llvm-svn: 336637
* [X86] Add __builtin_ia32_selectss_128 and __builtin_ia32_selectsd_128 that are suitable for use in scalar mask intrinsics. (Craig Topper, 2018-07-10; 3 files, -58/+30)
  This will convert the i8 mask argument to <8 x i1>, extract an i1, and then emit a select instruction. This replaces the '(__U & 1)' and ternary operator used in some of the intrinsics. The old sequence was lowered to a scalar 'and' plus compare. The new sequence uses an i1 vector that will interoperate better with other mask intrinsics. This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow-up patch will add the GCCBuiltin name back in llvm and remove the custom handling.
  I made some adjustments to the legacy move_ss/sd intrinsics, which we reused here, to do a simpler extract and insert instead of two extracts and two inserts or a shuffle.
  llvm-svn: 336622
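  A hedged sketch of the header-side pattern this enables (the wrapper name is hypothetical and the body is simplified; the builtin takes the i8 mask directly instead of open-coding (__U & 1) with a ternary; compile with -mavx512f):

      #include <immintrin.h>

      static __inline__ __m128 mask_move_ss_sketch(__m128 __W, __mmask8 __U,
                                                   __m128 __A, __m128 __B) {
        __m128 __res = __A;
        __res[0] = __B[0]; /* the scalar move */
        /* select lane 0 from __res or __W based on bit 0 of __U */
        return (__m128)__builtin_ia32_selectss_128((__mmask8)__U, (__v4sf)__res,
                                                   (__v4sf)__W);
      }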
* [Builtins][Attributes][X86] Tag all X86 builtins with their required vector width. Add a min_vector_width function attribute and tag all x86 intrinsics with it. (Craig Topper, 2018-07-09; 38 files, -2624/+2665)
  This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targeting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU: the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still want to be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.
  Of course, if the user uses explicit vector code in their source code we need to not split those operations, especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible.
  To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be responsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future.
  To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h: either as an always_inline function containing calls to builtins (which can be target specific or target independent) or vector extension code, or as a macro wrapper around a target specific builtin. I believe I've removed all cases where the macro was around a target independent builtin.
  To support the always_inline function case, this patch adds attribute((min_vector_width(128))), which can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.
  To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string, so that we can opt into it on a per-builtin basis and avoid any impact to target independent builtins.
  There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch.
  Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue.
  Differential Revision: https://reviews.llvm.org/D48617
  llvm-svn: 336583
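  A hedged sketch of the always_inline tagging described above (the wrapper name is hypothetical; the attribute spelling matches the one clang's headers use; compile with -mavx512f):

      #include <immintrin.h>

      /* The attribute records that this function needs 512-bit vectors to be
         legal in the backend; macro-wrapped builtins raise the same minimum
         implicitly. */
      static __inline__ __m512i
          __attribute__((__always_inline__, __nodebug__,
                         __target__("avx512f"), __min_vector_width__(512)))
      add_epi32_sketch(__m512i __a, __m512i __b) {
        return (__m512i)((__v16si)__a + (__v16si)__b);
      }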
* [X86] Remove some unnecessarily escaped new lines from avx512fintrin.h (Craig Topper, 2018-07-07; 1 file, -10/+10)
  llvm-svn: 336499
* [X86] Fix a few intrinsics that were ignoring their rounding mode argument and hardcoding _MM_FROUND_CUR_DIRECTION internally. (Craig Topper, 2018-07-07; 1 file, -5/+5)
  I believe these have been broken since their introduction into clang. I've enhanced the tests for these intrinsics to use a real rounding mode and to check all the intrinsic arguments instead of just the name.
  llvm-svn: 336498
* [X86] Change _mm512_shuffle_pd and _mm512_shuffle_ps to use target specific shuffle builtins instead of generic __builtin_shufflevector. (Craig Topper, 2018-07-07; 1 file, -28/+4)
  I added the builtins for 128, 256, and 512 bits recently, but it looks like I failed to convert these to using the 512-bit one.
  llvm-svn: 336488
* [X86] Fix various type mismatches in intrinsic headers and intrinsic tests that cause extra bitcasts to be emitted in the IR. (Craig Topper, 2018-07-07; 3 files, -48/+48)
  Found via imprecise grepping of the -O0 IR. There could still be more bugs out there.
  llvm-svn: 336487
* [X86] Add missing scalar fma intrinsics with rounding, but no mask. (Craig Topper, 2018-07-06; 1 file, -0/+48)
  We had the mask versions of the rounding intrinsics, but not ones without masking. Also change the rounding tests to not use the CUR_DIRECTION rounding mode.
  llvm-svn: 336470
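  The commit doesn't list the names, but _mm_fmadd_round_ss is the likely shape; a hedged usage sketch (rounding-mode intrinsics take an immediate mode OR'd with _MM_FROUND_NO_EXC; compile with -mavx512f):

      #include <immintrin.h>

      /* Fused a*b + c on lane 0, rounding toward +infinity. */
      __m128 fma_round_up(__m128 a, __m128 b, __m128 c) {
        return _mm_fmadd_round_ss(a, b, c,
                                  _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC);
      }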
* [X86] Correct the width of mask arguments in intrinsic headers and tests. (Craig Topper, 2018-06-30; 3 files, -11/+11)
  All of these were found by grepping through IR from the builtin tests for extra trunc and zext/sext instructions that shouldn't have been there. Some of these were real bugs where we lost bits from the user input:
  _mm512_mask_broadcast_f32x8
  _mm512_maskz_broadcast_f32x8
  _mm512_mask_broadcast_i32x8
  _mm512_maskz_broadcast_i32x8
  _mm256_mask_cvtusepi16_storeu_epi8
  llvm-svn: 336042
* [X86] Remove masking from the avx512 rotate builtins. Use a select builtin instead. (Craig Topper, 2018-06-30; 2 files, -294/+189)
  llvm-svn: 336036
* [CUDA] Make __host__/__device__ min/max overloads constexpr in C++14. (Justin Lebar, 2018-06-29; 1 file, -4/+13)
  Summary: Tests in a separate change to the test-suite.
  Reviewers: rsmith, tra
  Subscribers: lahwaacz, sanjoy, cfe-commits
  Differential Revision: https://reviews.llvm.org/D48151
  llvm-svn: 336026
* [CUDA] Make min/max shims host+device. (Justin Lebar, 2018-06-29; 1 file, -4/+4)
  Summary: Fixes PR37753: min/max can't be called from __host__ __device__ functions in C++14 mode. Testcase in a separate test-suite commit.
  Reviewers: rsmith
  Subscribers: sanjoy, lahwaacz, cfe-commits
  Differential Revision: https://reviews.llvm.org/D48036
  llvm-svn: 336025
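  Taken together with the constexpr change above, the shim shape is roughly this (a simplified CUDA C++ sketch, not the header verbatim):

      // __host__ __device__ lets both sides of the CUDA compilation call the
      // shim; constexpr keeps it usable in constant expressions from C++14 on.
      #if __cplusplus >= 201402L
      __host__ __device__ constexpr long min(long a, long b) { return a < b ? a : b; }
      __host__ __device__ constexpr long max(long a, long b) { return a < b ? b : a; }
      #endif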
* [X86] Remove masking from the avx512 packed sqrt builtins. Use select builtins instead. (Craig Topper, 2018-06-29; 1 file, -49/+36)
  llvm-svn: 335945
* [X86] Correct the inline assembly implementations of __movsb/w/d/q and __stosw/d/q to mark registers/memory as modified. (Craig Topper, 2018-06-21; 1 file, -7/+14)
  The inline assembly for these didn't mark that edi, esi, and ecx are modified by the movs/stos instructions. It also didn't mark that memory is modified. This issue was reported to cfe-dev last year (http://lists.llvm.org/pipermail/cfe-dev/2017-November/055863.html) but no bug was ever filed.
  Differential Revision: https://reviews.llvm.org/D48448
  llvm-svn: 335270
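  A hedged sketch of the corrected constraint style (x86-64 only, not the header verbatim): rep movsb consumes and updates rdi/rsi/rcx, so those are tied as in/out operands, and the "memory" clobber tells the compiler the destination bytes changed.

      static void movsb_sketch(unsigned char *dst, const unsigned char *src,
                               unsigned long n) {
        __asm__ __volatile__("rep movsb"
                             : "+D"(dst), "+S"(src), "+c"(n)
                             :
                             : "memory");
      }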
* [Intrinsics] Add/move some builtin declarations in intrin.h to get ms-intrinsics.c to not issue warnings. (Craig Topper, 2018-06-21; 1 file, -4/+7)
  ud2 and int2c were missing declarations entirely. And the bitscans were only declared under x86_64, but they seem to be in BuiltinsARM.def as well and are tested by ms-intrinsics.c.
  Differential Revision: https://reviews.llvm.org/D48187
  llvm-svn: 335259
* [X86] Rewrite the add/mul/or/and reduction intrinsics to make better use of other intrinsics and remove undef shuffle indices. (Craig Topper, 2018-06-21; 1 file, -166/+100)
  Similar to what was done to max/min recently. Unlike the original max/min code, these already reduced the vector width to 256 and 128 bits as we go.
  Differential Revision: https://reviews.llvm.org/D48346
  llvm-svn: 335253
* [X86] Remove masking from the 512-bit floating point max/min builtins. Use select in IR instead. (Craig Topper, 2018-06-21; 1 file, -128/+76)
  llvm-svn: 335200
* [X86] Undefine the _mm512_mask_reduce_operator macro in avx512fintrin.h before redefining it. (Craig Topper, 2018-06-20; 1 file, -0/+1)
  llvm-svn: 335086
* Recommit r335070 "[X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits where possible." (Craig Topper, 2018-06-19; 1 file, -225/+118)
  The test has been updated to reflect the IRGen.
  llvm-svn: 335075
* Revert r335070 "[X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits where possible." (Craig Topper, 2018-06-19; 1 file, -118/+225)
  The test changes are failing the buildbot and it's going to take me some time to fix them.
  llvm-svn: 335072
* [X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits where possible. (Craig Topper, 2018-06-19; 1 file, -225/+118)
  We only need to use 512-bit vectors all the way through for v8i64 reductions, since those max instructions are new to avx512f and are only available at 512 bits until SKX. For v16i32 and floating point we have legacy 128/256-bit instructions we can use. I've tried to use other intrinsics to reduce the verbosity of the code and avoid having to mention all the shuffles. I've also removed all the -1 shuffle indices so the output sequence is fully specified and not left to backend optimization.
  Differential Revision: https://reviews.llvm.org/D47401
  llvm-svn: 335070
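  A hedged sketch of the width-reducing strategy for one case (my reconstruction, not the header's code; compile with -mavx512f): fold 512 to 256 to 128 bits with a max at each step, then finish within the 128-bit register.

      #include <immintrin.h>

      static int reduce_max_epi32_sketch(__m512i v) {
        __m256i lo = _mm512_castsi512_si256(v);
        __m256i hi = _mm512_extracti64x4_epi64(v, 1);
        __m256i m = _mm256_max_epi32(lo, hi); /* 512 -> 256 */
        __m128i x = _mm_max_epi32(_mm256_castsi256_si128(m),
                                  _mm256_extracti128_si256(m, 1)); /* 256 -> 128 */
        /* horizontal max within the remaining 4 lanes */
        x = _mm_max_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
        x = _mm_max_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(2, 3, 0, 1)));
        return _mm_cvtsi128_si32(x);
      }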
* [X86] Rename __builtin_ia32_pslldqi128 to __builtin_ia32_pslldqi128_byteshift and similar for other sizes. Remove the multiply by 8 from the header files. (Craig Topper, 2018-06-14; 3 files, -10/+10)
  The previous names took the shift amount in bits to match gcc and required a multiply by 8 in the header. This creates a misleading error message when we check the range of the immediate to the builtin, since the allowed range also got multiplied by 8. This commit changes the builtins to use a byte shift amount to match the underlying instruction and the Intel intrinsic.
  Fixes the remaining issue from PR37795.
  llvm-svn: 334773
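  A hedged before/after sketch of the header change (simplified from the description above; the macro body is not quoted verbatim, and the _sketch name is mine):

      /* before: amount in bits, scaled to bytes inside the header
         #define _mm_slli_si128(a, imm) \
           (__m128i)__builtin_ia32_pslldqi128((__v2di)(__m128i)(a), (int)(imm)*8)
         after: amount in bytes, matching PSLLDQ and the Intel intrinsic */
      #define _mm_slli_si128_sketch(a, imm) \
        (__m128i)__builtin_ia32_pslldqi128_byteshift((__v2di)(__m128i)(a), (int)(imm))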
* [X86] Add inline assembly versions of _InterlockedExchange_HLEAcquire/Release and _InterlockedCompareExchange_HLEAcquire/Release for MSVC compatibility. (Craig Topper, 2018-06-14; 2 files, -10/+84)
  Clang/LLVM doesn't have a way to pass an HLE hint through to the X86 backend to emit HLE-prefixed instructions, so this is a good short-term fix.
  Differential Revision: https://reviews.llvm.org/D47672
  llvm-svn: 334751
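  A hedged sketch of the approach (my reconstruction, x86 only, not the header verbatim): the xacquire hint is the 0xF2 prefix byte, emitted directly in front of a locked exchange.

      static int hle_acquire_xchg_sketch(int volatile *target, int value) {
        __asm__ __volatile__(".byte 0xf2 ; lock ; xchgl %0, %1"
                             : "+r"(value), "+m"(*target)
                             :
                             : "memory");
        return value;
      }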
* [X86] Lowering Mask Scalar intrinsics to native IR (Clang part) (Tomasz Krupa, 2018-06-14; 1 file, -60/+36)
  Summary: Lowering add, sub, mul, and div mask scalar intrinsic calls to native IR.
  Reviewers: craig.topper, RKSimon, spatel, sroland
  Reviewed By: craig.topper
  Subscribers: cfe-commits
  Differential Revision: https://reviews.llvm.org/D47979
  llvm-svn: 334741
* [X86] Remove masking from avx512vbmi2 concat and shift by immediate builtins. Use select builtins instead. (Craig Topper, 2018-06-13; 2 files, -290/+164)
  llvm-svn: 334577
* [X86] Remove masking from dbpsadbw builtins, use select builtin instead. (Craig Topper, 2018-06-11; 2 files, -36/+24)
  llvm-svn: 334385
* [X86] Remove masking from the 512-bit packed floating point add/sub/mul/div builtins. Use select in IR instead. (Craig Topper, 2018-06-10; 1 file, -102/+70)
  llvm-svn: 334359
* [X86] Add back some masked vector truncate builtins. Custom IRgen a few others. (Craig Topper, 2018-06-08; 2 files, -17/+26)
  I'd like to make the select builtins require an avx512f, avx512bw, or avx512vl feature to match what is normally required to get masking. Truncate is special in that there are instructions with a 128/256-bit masked result even without avx512vl. By using special builtins we can emit a select without using the 128/256-bit select builtins.
  llvm-svn: 334331