path: root/clang/lib/Headers/avx512vlintrin.h
* [X86] Make the pointer arguments to avx512 gather/scatter intrinsics 'void*' to match gcc and Intel's documentation. (Craig Topper, 2019-01-09; 1 file, -48/+48)
  The avx2 gather intrinsics are documented to use 'int', 'long long', 'float', or 'double' pointers. So I'm leaving those. This matches gcc.
  llvm-svn: 350696
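  For illustration, a minimal usage sketch under the new signatures, assuming AVX512F+AVX512VL are enabled (the function name and data here are hypothetical):

    #include <immintrin.h>

    /* The base pointer is now 'void const *', so any element pointer
     * converts without a cast; the scale is still given in bytes. */
    __m256i gather_masked(const int *table, __m256i vindex, __mmask8 k,
                          __m256i src) {
        /* Masked 32-bit gather: lanes with a zero mask bit keep 'src'. */
        return _mm256_mmask_i32gather_epi32(src, k, vindex, table, 4);
    }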
* [X86] Add missing intrinsics to match icc. (Craig Topper, 2018-10-20; 1 file, -19/+234)
  This adds:
    _mm_and_epi32, _mm_and_epi64
    _mm_andnot_epi32, _mm_andnot_epi64
    _mm_or_epi32, _mm_or_epi64
    _mm_xor_epi32, _mm_xor_epi64
    _mm256_and_epi32, _mm256_and_epi64
    _mm256_andnot_epi32, _mm256_andnot_epi64
    _mm256_or_epi32, _mm256_or_epi64
    _mm256_xor_epi32, _mm256_xor_epi64
    _mm_loadu_epi32, _mm_loadu_epi64
    _mm_load_epi32, _mm_load_epi64
    _mm256_loadu_epi32, _mm256_loadu_epi64
    _mm256_load_epi32, _mm256_load_epi64
    _mm512_loadu_epi32, _mm512_loadu_epi64
    _mm512_load_epi32, _mm512_load_epi64
    _mm_storeu_epi32, _mm_storeu_epi64
    _mm_store_epi32, _mm_store_epi64
    _mm256_storeu_epi32, _mm256_storeu_epi64
    _mm256_store_epi32, _mm256_store_epi64
    _mm512_storeu_epi32, _mm512_storeu_epi64
    _mm512_store_epi32, _mm512_store_epi64
  llvm-svn: 344861
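  A hedged usage sketch of a few of the added names; they mirror the existing si128-style operations, only the element-typed spelling is new (requires AVX512F+AVX512VL; the function name is hypothetical):

    #include <immintrin.h>

    /* The new load/store names take plain 'void' pointers; the logic ops
     * mirror _mm_and_si128 and friends under element-typed names. */
    void masked_combine(const void *p, const void *q, void *out) {
        __m128i a = _mm_loadu_epi32(p);   /* unaligned 128-bit load  */
        __m128i b = _mm_loadu_epi32(q);
        __m128i r = _mm_or_epi32(_mm_and_epi32(a, b),
                                 _mm_andnot_epi32(a, b));
        _mm_storeu_epi32(out, r);         /* unaligned 128-bit store */
    }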
* [X86] Lowering integer truncation intrinsics to native IR (Mikhail Dvoretckii, 2018-07-10; 1 file, -24/+28)
  This patch lowers the _mm[256|512]_cvtepi{64|32|16}_epi{32|16|8} intrinsics to native IR in cases where the result's length is less than 128 bits. The resulting IR for 256-bit inputs is folded into VPMOV instructions, while for 128-bit inputs the vpshufb (or, in the 64-to-32-bit case, vinsertps) instructions are generated instead.
  Differential Revision: https://reviews.llvm.org/D48712
  llvm-svn: 336643
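  For illustration, one affected intrinsic; a sub-128-bit result like this now goes through generic IR and is expected to fold back to VPMOVQW on AVX512VL targets (the function name is hypothetical):

    #include <immintrin.h>

    /* 4 x i64 -> 4 x i16: the 64-bit result occupies the low half of the
     * returned __m128i, with the upper 64 bits zeroed. */
    __m128i narrow_q_to_w(__m256i v) {
        return _mm256_cvtepi64_epi16(v);
    }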
* [Builtins][Attributes][X86] Tag all X86 builtins with their required vector width. Add a min_vector_width function attribute and tag all x86 intrinsics with it. (Craig Topper, 2018-07-09; 1 file, -892/+894)
  This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targeting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU: the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.
  Of course, if the user writes explicit vector code in their source we need to not split those operations, especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible.
  To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be responsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that later.
  To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h: either as an always_inline function containing calls to builtins (which can be target specific or target independent) or vector extension code, or as a macro wrapper around a target-specific builtin. I believe I've removed all cases where the macro was around a target-independent builtin.
  To support the always_inline function case, this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.
  To support the macro case, all x86-specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string, so that we can opt into it on a per-builtin basis and avoid any impact to target-independent builtins.
  There will be future work to support vectors passed as function arguments and to support inline assembly, and whatever else we can find that isn't covered by this patch.
  Special thanks to Chandler, who suggested this direction and reviewed a preview version of this patch, and thanks to Eric Christopher, who has had many conversations with me about this issue.
  Differential Revision: https://reviews.llvm.org/D48617
  llvm-svn: 336583
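  A sketch of the two tagging mechanisms described above, modeled on how the x86 headers define their attribute macros (the macro and wrapper names here are hypothetical):

    #include <immintrin.h>

    /* always_inline wrapper case: the function itself carries the tag. */
    #define MY_FN_ATTRS128                                            \
      __attribute__((__always_inline__, __nodebug__,                  \
                     __target__("avx512vl"), __min_vector_width__(128)))

    static __inline__ __m128i MY_FN_ATTRS128
    my_add_epi32(__m128i __A, __m128i __B) {
      return (__m128i)((__v4si)__A + (__v4si)__B);
    }

    /* Macro-wrapped builtins instead carry the width in the builtin's
     * attribute string, so calling one implicitly raises the caller's
     * "min-legal-vector-width" IR attribute. */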
* [X86] Fix various type mismatches in intrinsic headers and intrinsic tests that cause extra bitcasts to be emitted in the IR. (Craig Topper, 2018-07-07; 1 file, -5/+5)
  Found via imprecise grepping of the -O0 IR. There could still be more bugs out there.
  llvm-svn: 336487
* [X86] Correct the width of mask arguments in intrinsic headers and tests. (Craig Topper, 2018-06-30; 1 file, -4/+4)
  All of these were found by grepping through IR from the builtin tests for extra trunc and zext/sext instructions that shouldn't have been there. Some of these were real bugs where we lost bits from the user input:
    _mm512_mask_broadcast_f32x8
    _mm512_maskz_broadcast_f32x8
    _mm512_mask_broadcast_i32x8
    _mm512_maskz_broadcast_i32x8
    _mm256_mask_cvtusepi16_storeu_epi8
  llvm-svn: 336042
* [X86] Remove masking from the avx512 rotate builtins. Use a select builtin instead. (Craig Topper, 2018-06-30; 1 file, -204/+132)
  llvm-svn: 336036
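  A sketch of the resulting header pattern, assuming the select builtins used elsewhere in these headers (the macro name is hypothetical):

    #include <immintrin.h>

    /* Masked rotate = unmasked rotate, then a per-lane select against the
     * passthrough vector W under mask U. */
    #define my_mask_rol_epi32(W, U, A, B)                                  \
      ((__m128i)__builtin_ia32_selectd_128((__mmask8)(U),                  \
                                           (__v4si)_mm_rol_epi32((A), (B)),\
                                           (__v4si)(W)))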
* [X86] Fold masking into subvector extract builtins. (Craig Topper, 2018-06-08; 1 file, -14/+24)
  I'm looking into making the select builtins require avx512f, avx512bw, or avx512vl, since masking operations generally require those features. The extract builtins are funny because the 512-bit versions return a 128- or 256-bit vector with masking even when avx512vl is not supported.
  llvm-svn: 334330
* [X86] Add builtins for vpermq/vpermpd instructions to enable target feature checking. (Craig Topper, 2018-06-08; 1 file, -8/+2)
  llvm-svn: 334311
* [X86] Add subvector insert and extract builtins to enable target feature checking and immediate range checking. (Craig Topper, 2018-06-08; 1 file, -32/+6)
  Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed, which this patch resurrects. This also matches gcc.
  llvm-svn: 334261
* [X86] Add builtins for shuff32x4/shuff64x2/shufi32x4/shufi64x2 to enable target feature checking and immediate range checking. (Craig Topper, 2018-06-07; 1 file, -28/+8)
  llvm-svn: 334244
* [X86] Add builtins for VALIGNQ/VALIGND to enable proper target feature checking. (Craig Topper, 2018-06-07; 1 file, -26/+8)
  We still emit shufflevector instructions; we just do it from CGBuiltin.cpp now. This ensures the intrinsics that use this are only available on CPUs that support the feature. I also added range checking to the immediate, but only checked that it is 8 bits or smaller. We should maybe be stricter since we never use all 8 bits, but gcc doesn't seem to do that.
  llvm-svn: 334237
* [X86] Fix some places where macro arguments to intrinsics weren't cast to _m512(i|d)/_m256(i|d)/_m128(i|d) first. (Craig Topper, 2018-05-31; 1 file, -6/+6)
  The majority of the cases were correct. This fixes the few that weren't. I also removed some superfluous parentheses in non-macros that confused my attempts at grepping for missing casts.
  llvm-svn: 333615
* [X86] Remove __extension__ from macro intrinsics when it's not needed. (Craig Topper, 2018-05-31; 1 file, -1165/+1165)
  I think this is a holdover from when we used to declare variables inside the macros, and then it's been copied and pasted forward for years every time a new macro intrinsic gets added. Interestingly, this caused some IRGen tests to be slightly more optimized: we now return a zeroinitializer directly instead of going through a store+load. It also removed a bogus error message on another test.
  llvm-svn: 333613
* [X86] Reduce the number of setzero intrinsics to just the set defined by the Intel Intrinsics Guide. (Craig Topper, 2018-05-30; 1 file, -36/+30)
  We had quite a few for different element sizes of integers, sometimes with strange target features attached to them. We only need a single version for each of __m128i, __m256i, and __m512i, with the target feature that first introduced those types.
  llvm-svn: 333568
* [X86] Remove 'return' from a bunch of intrinsics that return void and use a builtin that returns void. (Craig Topper, 2018-05-30; 1 file, -1/+1)
  Found by running the intrinsic headers through -pedantic -ansi.
  llvm-svn: 333563
* [X86] Lowering FMA intrinsics to native IR (Clang part) (Gabor Buella, 2018-05-30; 1 file, -296/+360)
  This patch replaces all packed (and scalar without rounding mode) fused intrinsics with fmadd/fmaddsub variations. Then fmadd/fmaddsub are lowered to native IR.
  Patch by tkrupa
  Reviewers: craig.topper, sroland, spatel, RKSimon
  Reviewed By: craig.topper
  Differential Revision: https://reviews.llvm.org/D47444
  llvm-svn: 333555
* [X86] Merge the 3 different flavors of masked vpermi2var/vpermt2var builtins to a single version without masking. Use select builtins with appropriate operand instead. (Craig Topper, 2018-05-29; 1 file, -249/+156)
  llvm-svn: 333387
* [X86] Remove mask argument from more builtins that are handled completely in CGBuiltin.cpp. Just wrap a select builtin around them in the header file instead. (Craig Topper, 2018-05-23; 1 file, -150/+86)
  llvm-svn: 333061
* [X86] Use __builtin_convertvector to implement some of the packed integer to packed float conversion intrinsics. (Craig Topper, 2018-05-21; 1 file, -22/+14)
  I believe this is safe assuming the default FP environment. The conversion might be inexact, but it can never overflow the FP type, so this shouldn't be undefined behavior for the uitofp/sitofp instructions. We already do something similar for scalar conversions.
  Differential Revision: https://reviews.llvm.org/D46863
  llvm-svn: 332882
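  A sketch of the pattern for the unsigned 32-bit case (the function name is hypothetical); __builtin_convertvector emits a uitofp here, which may round but cannot overflow:

    #include <immintrin.h>

    /* u32 -> f32 as a generic vector conversion instead of a builtin. */
    static __inline__ __m128 cvtepu32_ps_sketch(__m128i __A) {
      return (__m128)__builtin_convertvector((__v4su)__A, __v4sf);
    }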
* [X86] Remove mask arguments from permvar builtins/intrinsics. Use a select in IR instead. (Craig Topper, 2018-05-20; 1 file, -26/+16)
  Someday maybe we'll use selects for all the builtins.
  llvm-svn: 332825
* [X86] Revert part of r332266: Use __builtin_convertvector to replace some of the avx512 truncate builtins. (Craig Topper, 2018-05-15; 1 file, -6/+5)
  The masking doesn't work right in the backend for the ones that produce byte or word elements without avx512bw.
  llvm-svn: 332322
* [X86] Use __builtin_convertvector to replace some of the avx512 truncate builtins. (Craig Topper, 2018-05-14; 1 file, -16/+14)
  As long as the destination type is a 256- or 128-bit vector with the same number of elements, we can use __builtin_convertvector to directly generate a trunc IR instruction, which will be handled natively by the backend.
  Differential Revision: https://reviews.llvm.org/D46742
  llvm-svn: 332266
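  A sketch of the same-element-count case the message describes (the function name is hypothetical):

    #include <immintrin.h>

    /* 4 x i64 -> 4 x i32: same element count, so this is a plain vector
     * 'trunc' that the backend selects a truncating move for natively. */
    static __inline__ __m128i cvtepi64_epi32_sketch(__m256i __A) {
      return (__m128i)__builtin_convertvector((__v4di)__A, __v4si);
    }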
* [X86] Remove '#ifdef __x86_64__' around mask_set1_epi64 intrinsics. (Craig Topper, 2018-04-24; 1 file, -3/+0)
  The unmasked versions already didn't have this restriction. I don't think gcc or icc limit these to 64-bit mode, so we shouldn't either.
  llvm-svn: 330681
* [X86] Remove some masked cvt builtins that can be replaced with legacy sse/avx builtins and a select. (Craig Topper, 2018-02-24; 1 file, -77/+66)
  llvm-svn: 326039
* [X86] Remove __builtin_ia32_permvarsf256_mask and __builtin_ia32_permvarsi256_mask and use the avx2 unmasked versions and a select instead. (Craig Topper, 2018-02-24; 1 file, -38/+19)
  llvm-svn: 326022
* [X86] test/testn intrinsics lowering to IR, clang side (Uriel Korach, 2017-11-13; 1 file, -42/+28)
  Change the intrinsic header files to lower the test and testn intrinsics to IR code, and remove the test and testn builtins from clang.
  Differential Revision: https://reviews.llvm.org/D38737
  llvm-svn: 318035
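  A plausible header-level lowering consistent with this description, expressing 'test' as a compare-not-equal-to-zero of the AND (a sketch, not necessarily the exact code that landed; the function name is hypothetical):

    #include <immintrin.h>

    /* 'test': k[i] = ((a[i] & b[i]) != 0); 'testn' would use cmpeq. */
    static __inline__ __mmask8
    test_epi32_mask_sketch(__m128i __A, __m128i __B) {
      return _mm_cmpneq_epi32_mask(_mm_and_si128(__A, __B),
                                   _mm_setzero_si128());
    }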
* [x86][AVX512] Lowering shuffle i/f intrinsics to LLVM IR (Jina Nahias, 2017-11-13; 1 file, -57/+53)
  This patch, together with a matching llvm patch (https://reviews.llvm.org/D38671), implements the lowering of X86 shuffle i/f intrinsics to IR.
  Differential Revision: https://reviews.llvm.org/D38672
  Change-Id: I9b3c2f2b34323bd9ccb21d0c1832f848b88ec047
  llvm-svn: 318025
* [X86] Replace the mask cmpeq/cmple/cmplt/cmpgt/cmpge/cmpneq intrinsics with macros that just pass the right comparison predicate value to the regular cmp intrinsic. Remove mask cmpeq/cmpgt builtins that are now unused. (Craig Topper, 2017-11-06; 1 file, -576/+199)
  This shortens the intrinsic headers a little and allows us to get rid of the cmpeq and cmpgt handling from CGBuiltin.cpp.
  llvm-svn: 317506
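  The shape of the macros, per the description; the general compare intrinsic takes a _MM_CMPINT_* predicate (the macro names here are hypothetical):

    #include <immintrin.h>

    /* The eq/le/lt/gt/ge/neq names become one-liners over the general
     * compare intrinsic with the matching predicate. */
    #define my_cmpeq_epi32_mask(A, B) \
      _mm_cmp_epi32_mask((A), (B), _MM_CMPINT_EQ)
    #define my_cmple_epi32_mask(A, B) \
      _mm_cmp_epi32_mask((A), (B), _MM_CMPINT_LE)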
* Fixing a bug in mask[z]_set1 intrinsic (Jina Nahias, 2017-09-25; 1 file, -2/+2)
  Differential Revision: https://reviews.llvm.org/D38231
  Change-Id: I80bbff9cbe93e4be54d8a761ef9723edf3f57c57
  llvm-svn: 314102
* Lowering Mask Set1 intrinsics to LLVM IR (Jina Nahias, 2017-09-19; 1 file, -28/+41)
  This patch, together with a matching llvm patch (https://reviews.llvm.org/D37669), implements the lowering of X86 mask set1 intrinsics to IR.
  Differential Revision: https://reviews.llvm.org/D37668
  llvm-svn: 313624
* [AVX-512] Replace subvector broadcast builtins with shufflevectors and selects. (Craig Topper, 2017-01-18; 1 file, -24/+21)
  Verified that the backend codegens this equally well.
  llvm-svn: 292329
* [AVX-512] Remove 128/256-bit masked vpermilvar builtins and replace with select and the avx unmasked builtins. (Craig Topper, 2016-12-10; 1 file, -50/+32)
  llvm-svn: 289338
* [X86] Replace valignd/q builtins with appropriate __builtin_shufflevector. (Craig Topper, 2016-11-23; 1 file, -48/+50)
  llvm-svn: 287733
* [X86][AVX512] Replace lossless i32/u32 to f64 conversion intrinsics with generic IR (Simon Pilgrim, 2016-11-16; 1 file, -36/+27)
  Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics without affecting final codegen. This patch removes the clang builtins and their use in the headers; a future patch will deal with removing the llvm intrinsics. This is an extension patch to D20528, which dealt with the equivalent sse/avx cases.
  Differential Revision: https://reviews.llvm.org/D26686
  llvm-svn: 287088
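  A sketch of the lossless widening case (the function name is hypothetical); every i32 value is exactly representable as an f64, so the sitofp this produces is value-preserving and folds to VCVTDQ2PD:

    #include <immintrin.h>

    /* 4 x i32 -> 4 x f64 as generic IR instead of an x86 builtin. */
    static __inline__ __m256d cvtepi32_pd_sketch(__m128i __A) {
      return (__m256d)__builtin_convertvector((__v4si)__A, __v4df);
    }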
* [AVX-512] Replace masked dword and qword variable shift builtins with unmasked builtins and a select. (Craig Topper, 2016-11-13; 1 file, -35/+19)
  This is part of a set of changes to allow InstCombine in the backend to optimize variable shifts without having to know about masking.
  llvm-svn: 286757
* [AVX-512] Convert the rest of the masked shift by immediate and by single element builtins over to the newly added unmasked builtins and a select. (Craig Topper, 2016-11-12; 1 file, -60/+58)
  This should also fix PR30691 since the new builtins are handled like the legacy builtins in the backend.
  llvm-svn: 286714
* [AVX-512] Remove masked vector insert builtins and replace with native shufflevectors and selects. (Craig Topper, 2016-11-01; 1 file, -27/+32)
  Unfortunately, the backend currently doesn't fold masks into the instructions correctly when they come from these shufflevectors. I'll work on that in a future commit.
  llvm-svn: 285667
* [AVX-512] Use selectd instead of selectps for _mm256_mask_extracti32x4_epi32. (Craig Topper, 2016-10-31; 1 file, -2/+2)
  llvm-svn: 285545
* [AVX-512] Remove masked vector extract builtins and replace with native shufflevectors and selects. (Craig Topper, 2016-10-31; 1 file, -24/+24)
  Unfortunately, the backend currently doesn't fold masks into the instructions correctly when they come from these shufflevectors. I'll work on that in a future commit.
  llvm-svn: 285540
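  A sketch of the native form (the function name is hypothetical): the extract becomes a shufflevector, and the masking is then layered on as a separate select:

    #include <immintrin.h>

    /* Extract the upper 128 bits of a 256-bit vector by shuffling out the
     * high pair of 64-bit elements. */
    static __inline__ __m128i extract_hi128_sketch(__m256i __A) {
      return (__m128i)__builtin_shufflevector((__v4di)__A, (__v4di)__A, 2, 3);
    }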
* [AVX-512] Remove many of the masked 128/256-bit shift builtins and replace them with unmasked builtins and selects. (Craig Topper, 2016-10-31; 1 file, -325/+300)
  llvm-svn: 285539
* [AVX-512] Remove masked 128/256-bit sqrt builtins and replace them with unmasked builtins and a select. (Craig Topper, 2016-10-29; 1 file, -36/+32)
  llvm-svn: 285504
* [AVX-512] Remove masked 128/256-bit pmuludq/pmuldq builtins and replace them with unmasked builtins and a select. (Craig Topper, 2016-10-29; 1 file, -44/+32)
  llvm-svn: 285503
* [AVX-512] Remove masked 128/256-bit floating point max/min builtins. Use unmasked builtins with select instead. (Craig Topper, 2016-10-29; 1 file, -90/+64)
  llvm-svn: 285502
* [AVX-512] Replace masked 128/256-bit byte, word, and dword min/max builtins with selects and the older unmasked builtins. (Craig Topper, 2016-10-23; 1 file, -88/+64)
  llvm-svn: 284954
* [AVX-512] Replace masked 128/256-bit vpmovzx/vpmovsx builtins with native IR. (Craig Topper, 2016-10-22; 1 file, -176/+156)
  llvm-svn: 284927
* [AVX-512] Remove builtins for 128/256-bit pabsb/pabsw. We can use a select and the older non-masked versions instead. (Craig Topper, 2016-10-22; 1 file, -18/+16)
  llvm-svn: 284924
* [AVX-512] Remove 128-bit and 256-bit masked floating point add/sub/mul/div builtins and replace with native operations. (Craig Topper, 2016-09-04; 1 file, -179/+128)
  We can't do the 512-bit ones because they take a rounding mode argument that we can't represent.
  llvm-svn: 280635
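  A sketch of the unmasked native form plus a select-layered masked form, assuming the selectpd builtin used elsewhere in these headers (the function names are hypothetical):

    #include <immintrin.h>

    /* Unmasked: a plain vector add, no builtin needed. */
    static __inline__ __m256d add_pd_sketch(__m256d __A, __m256d __B) {
      return (__m256d)((__v4df)__A + (__v4df)__B);
    }

    /* Masked: native op first, then a per-lane select under the mask. */
    static __inline__ __m256d
    mask_add_pd_sketch(__m256d __W, __mmask8 __U, __m256d __A, __m256d __B) {
      return (__m256d)__builtin_ia32_selectpd_256(
          (__mmask8)__U, (__v4df)add_pd_sketch(__A, __B), (__v4df)__W);
    }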
* [AVX-512] Remove masked integer mullo builtins and replace with native IR. (Craig Topper, 2016-09-03; 1 file, -22/+16)
  llvm-svn: 280597
* [AVX-512] Remove masked integer add/sub builtins and replace with native IR. (Craig Topper, 2016-09-03; 1 file, -96/+64)
  llvm-svn: 280596
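  The integer analogue as a sketch (the function name is hypothetical): lane-typed vector arithmetic replaces the builtin, and masked forms wrap it in a select as above:

    #include <immintrin.h>

    /* Unmasked native 32-bit lane add; masked variants select against a
     * passthrough vector exactly as in the floating point case. */
    static __inline__ __m128i add_epi32_sketch(__m128i __A, __m128i __B) {
      return (__m128i)((__v4si)__A + (__v4si)__B);
    }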