path: root/clang/test/CodeGen/avx512vl-builtins.c
Commit log (newest first); each entry: commit message (author, date, files changed, -lines removed/+lines added)
* [IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperator (Cameron McInally, 2019-10-14, 1 file, -72/+72)
  Reapply r374240 with a fix for the OCaml test, namely Bindings/OCaml/core.ml. Differential Revision: https://reviews.llvm.org/D61675 llvm-svn: 374782
* Revert "[IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperator"Dmitri Gribenko2019-10-101-72/+72
| | | | | | | This reverts commit r374240. It broke OCaml tests: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19014 llvm-svn: 374354
* [IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperator (Cameron McInally, 2019-10-09, 1 file, -72/+72)
  Also update Clang to call Builder.CreateFNeg(...) for UnaryMinus. Differential Revision: https://reviews.llvm.org/D61675 llvm-svn: 374240
* [NewPM] Run avx*-builtins.c tests under the new pass manager only (Leonard Chan, 2019-07-26, 1 file, -2/+41)
  This patch changes the following tests to run under the new pass manager only:
  ```
  Clang :: CodeGen/avx512-reduceMinMaxIntrin.c (1 of 4)
  Clang :: CodeGen/avx512vl-builtins.c (2 of 4)
  Clang :: CodeGen/avx512vlbw-builtins.c (3 of 4)
  Clang :: CodeGen/avx512f-builtins.c (4 of 4)
  ```
  The new PM added extra bitcasts that weren't checked before. For reduceMinMaxIntrin.c, the issue was mostly the allocas being in a different order. Other changes involved extra bitcasts, and differently ordered loads and stores, but the logic should still be the same. Differential revision: https://reviews.llvm.org/D65110 llvm-svn: 367157
* [X86] Don't use _MM_FROUND_CUR_DIRECTION in the intrinsics tests. (Craig Topper, 2019-06-22, 1 file, -4/+4)
  _MM_FROUND_CUR_DIRECTION is the behavior of the intrinsics that don't take a rounding mode argument. So a better test is using _MM_FROUND_NO_EXC with the SAE-only intrinsics and an explicit rounding mode with the intrinsics that support embedded rounding. llvm-svn: 364127
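  As a hedged illustration of the convention described above (not taken from the test file; the wrapper names are made up, the 512-bit intrinsic names come from the AVX-512 headers):

  ```c
  #include <immintrin.h>  // build with -mavx512f

  // Intrinsic with embedded rounding: pass an explicit rounding mode,
  // OR'd with _MM_FROUND_NO_EXC as the encoding requires.
  __m512 add_rne(__m512 a, __m512 b) {
    return _mm512_add_round_ps(a, b, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
  }

  // SAE-only intrinsic: only suppress-all-exceptions can be requested, so
  // _MM_FROUND_NO_EXC (not _MM_FROUND_CUR_DIRECTION) is the natural test value.
  __mmask16 cmp_lt_sae(__m512 a, __m512 b) {
    return _mm512_cmp_round_ps_mask(a, b, _CMP_LT_OQ, _MM_FROUND_NO_EXC);
  }
  ```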
* [X86] Make _mm_mask_cvtps_ph, _mm_maskz_cvtps_ph, _mm256_mask_cvtps_ph, and _mm256_maskz_cvtps_ph aliases for their corresponding cvt_roundps_ph intrinsic. (Craig Topper, 2019-06-20, 1 file, -4/+4)
  These intrinsics should always take an immediate for the rounding mode. The base instruction comes from before EVEX embedded rounding. The user should always provide the immediate rather than us assuming CUR_DIRECTION. Make the 512-bit versions also explicit aliases instead of copy-pasting the code. llvm-svn: 363961
* [X86] Add new variadic avx512 compress/expand intrinsics that use vXi1 types for the mask argument. Custom lower the builtins to these intrinsics. (Craig Topper, 2019-01-28, 1 file, -32/+32)
  This enables the middle end to optimize out bitcasts for the masks. llvm-svn: 352344
* [X86] Add custom emission for the avx512 scatter builtins to convert from scalar integer to vXi1 for the mask arguments to the intrinsics. (Craig Topper, 2019-01-17, 1 file, -32/+32)
  llvm-svn: 351408
* [X86] Add versions of the avx512 gather intrinsics that take the mask as a vXi1 vector instead of a scalar (Craig Topper, 2019-01-16, 1 file, -16/+16)
  We need to custom handle these so we can turn the scalar mask into a vXi1 vector. Differential Revision: https://reviews.llvm.org/D56530 llvm-svn: 351390
* [X86] Add shift-by-immediate tests for non-immediate/out-of-range values (Simon Pilgrim, 2019-01-08, 1 file, -0/+179)
  As noted on PR40203, for gcc compatibility we need to support non-immediate values in the 'slli/srli/srai' shift-by-immediate vector intrinsics. llvm-svn: 350619
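  A small sketch of the usage pattern these tests cover (not taken from the test file; the wrapper function is made up):

  ```c
  #include <immintrin.h>

  // clang, like gcc, accepts a non-constant shift count for the
  // "shift by immediate" intrinsics; codegen then falls back to a
  // non-immediate form of the instruction.
  __m128i shift_left_epi32(__m128i v, int count) {
    return _mm_slli_epi32(v, count);  // count is not a compile-time constant
  }
  ```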
* [X86] Auto upgrade XOP/AVX512 rotation intrinsics to generic funnel shift intrinsics (clang) (Simon Pilgrim, 2018-12-20, 1 file, -48/+48)
  This emits FSHL/FSHR generic intrinsics for the XOP VPROT and AVX512 VPROL/VPROR rotation intrinsics. LLVM counterpart: https://reviews.llvm.org/D55938 Differential Revision: https://reviews.llvm.org/D55937 llvm-svn: 349796
* [X86] Add more intrinsics to match icc. (Craig Topper, 2018-10-20, 1 file, -4/+4)
  This adds:
  _mm_loadu_epi8, _mm256_loadu_epi8, _mm512_loadu_epi8
  _mm_loadu_epi16, _mm256_loadu_epi16, _mm512_loadu_epi16
  _mm_storeu_epi8, _mm256_storeu_epi8, _mm512_storeu_epi8
  _mm_storeu_epi16, _mm256_storeu_epi16, _mm512_storeu_epi16
  llvm-svn: 344862
* [X86] Add missing intrinsics to match icc. (Craig Topper, 2018-10-20, 1 file, -45/+241)
  This adds:
  _mm_and_epi32, _mm_and_epi64
  _mm_andnot_epi32, _mm_andnot_epi64
  _mm_or_epi32, _mm_or_epi64
  _mm_xor_epi32, _mm_xor_epi64
  _mm256_and_epi32, _mm256_and_epi64
  _mm256_andnot_epi32, _mm256_andnot_epi64
  _mm256_or_epi32, _mm256_or_epi64
  _mm256_xor_epi32, _mm256_xor_epi64
  _mm_loadu_epi32, _mm_loadu_epi64
  _mm_load_epi32, _mm_load_epi64
  _mm256_loadu_epi32, _mm256_loadu_epi64
  _mm256_load_epi32, _mm256_load_epi64
  _mm512_loadu_epi32, _mm512_loadu_epi64
  _mm512_load_epi32, _mm512_load_epi64
  _mm_storeu_epi32, _mm_storeu_epi64
  _mm_store_epi32, _mm_store_epi64
  _mm256_storeu_epi32, _mm256_storeu_epi64
  _mm256_store_epi32, _mm256_store_epi64
  _mm512_storeu_epi32, _mm512_storeu_epi64
  _mm512_store_epi32, _mm512_store_epi64
  llvm-svn: 344861
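  Usage is straightforward; a minimal sketch (the function is made up, and the load/store intrinsics are assumed to take plain void pointers as in the icc-compatible headers; requires AVX512F + AVX512VL):

  ```c
  #include <immintrin.h>  // build with -mavx512vl

  // Unaligned load and store of eight 32-bit integers; unlike
  // _mm256_loadu_si256, these take an untyped pointer.
  void copy_8xi32(const int *src, int *dst) {
    __m256i v = _mm256_loadu_epi32(src);
    _mm256_storeu_epi32(dst, v);
  }
  ```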
* [X86] Lowering integer truncation intrinsics to native IR (Mikhail Dvoretckii, 2018-07-10, 1 file, -8/+16)
  This patch lowers the _mm[256|512]_cvtepi{64|32|16}_epi{32|16|8} intrinsics to native IR in cases where the result's length is less than 128 bits. The resulting IR for 256-bit inputs is folded into VPMOV instructions, while for 128-bit inputs the vpshufb (or, in the 64-to-32-bit case, vinsertps) instructions are generated instead. Differential Revision: https://reviews.llvm.org/D48712 llvm-svn: 336643
* [X86] Use shufflevector instead of a select with a constant mask for fmaddsub/fmsubadd IR emission. (Craig Topper, 2018-07-05, 1 file, -96/+96)
  Shufflevector is easier to generate and matches what the backend pattern matches without relying on constant selects being turned into shuffles. While I was there I also made the IR regular expressions a little stricter to ensure operand order on the shuffle. llvm-svn: 336388
* [X86] Fix some vector cmp builtins - TRUE/FALSE predicates (Gabor Buella, 2018-07-05, 1 file, -56/+48)
  This patch removes an optimization used with the TRUE/FALSE predicates, as was suggested in https://reviews.llvm.org/D45616 for r335339. The optimization was buggy, since r335339 used it also for *_mask builtins without actually applying the mask -- the mask argument was just ignored. Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D48715 llvm-svn: 336355
* [X86] NFC - add more test cases for vector cmp intrinsics (Gabor Buella, 2018-07-05, 1 file, -84/+1589)
  Add test cases with each predicate using the following intrinsics:
  _mm_cmp_pd, _mm_cmp_ps, _mm256_cmp_pd, _mm256_cmp_ps
  _mm_cmp_pd_mask, _mm_cmp_ps_mask, _mm256_cmp_pd_mask, _mm256_cmp_ps_mask
  _mm512_cmp_pd_mask, _mm512_cmp_ps_mask
  _mm_mask_cmp_pd_mask, _mm_mask_cmp_ps_mask, _mm256_mask_cmp_pd_mask, _mm256_mask_cmp_ps_mask
  _mm512_mask_cmp_pd_mask, _mm512_mask_cmp_ps_mask
  Some of these are marked with FIXME, as there is a bug in lowering e.g. _mm512_mask_cmp_ps_mask. llvm-svn: 336346
* [X86] Correct the width of mask arguments in intrinsic headers and tests. (Craig Topper, 2018-06-30, 1 file, -4/+4)
  All of these were found by grepping through IR from the builtin tests for extra trunc and zext/sext instructions that shouldn't have been there. Some of these were real bugs where we lost bits from the user input:
  _mm512_mask_broadcast_f32x8
  _mm512_maskz_broadcast_f32x8
  _mm512_mask_broadcast_i32x8
  _mm512_maskz_broadcast_i32x8
  _mm256_mask_cvtusepi16_storeu_epi8
  llvm-svn: 336042
* [X86] Remove masking from the avx512 rotate builtins. Use a select builtin instead. (Craig Topper, 2018-06-30, 1 file, -48/+80)
  llvm-svn: 336036
* [X86] Lower _mm[256|512]_cmp[.]_mask intrinsics to native llvm IR (Gabor Buella, 2018-06-22, 1 file, -12/+127)
  Summary: Lowering some vector comparison builtins to fcmp IR instructions. This ignores the signaling behaviour specified in the predicate argument of said builtins.
  Affected AVX512 builtins:
  __builtin_ia32_cmpps128_mask
  __builtin_ia32_cmpps256_mask
  __builtin_ia32_cmpps512_mask
  __builtin_ia32_cmppd128_mask
  __builtin_ia32_cmppd256_mask
  __builtin_ia32_cmppd512_mask
  Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma Reviewed By: craig.topper, spatel, efriedma Differential Revision: https://reviews.llvm.org/D45616 llvm-svn: 335339
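  For reference, a hedged sketch of the user-facing intrinsics that sit on top of these builtins (the wrapper is made up; the intrinsic and predicate names come from the AVX-512VL headers):

  ```c
  #include <immintrin.h>  // build with -mavx512vl

  // _mm256_cmp_ps_mask is a thin wrapper over __builtin_ia32_cmpps256_mask;
  // after this change a comparison like this is emitted as an fcmp on the
  // vector rather than a target-specific intrinsic call.
  __mmask8 lanes_less_than(__m256 a, __m256 b) {
    return _mm256_cmp_ps_mask(a, b, _CMP_LT_OQ);
  }
  ```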
* [X86] Lowering sqrt intrinsics to native IR (Tomasz Krupa, 2018-06-15, 1 file, -8/+8)
  Reviewers: craig.topper, spatel, RKSimon, igorb, uriel.k Reviewed By: craig.topper Subscribers: tkrupa, cfe-commits Differential Revision: https://reviews.llvm.org/D41168 llvm-svn: 334850
* [X86] Use target independent masked expandload and compressstore intrinsics to implement expandload/compressstore builtins. (Craig Topper, 2018-06-10, 1 file, -24/+24)
  Summary: We've had these target independent intrinsics for at least a year and a half. Looks like they do exactly what we need here and the backend already supports them. Reviewers: RKSimon, delena, spatel, GBuella Reviewed By: RKSimon Subscribers: cfe-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D47693 llvm-svn: 334366
* [X86] Add builtins for vpermq/vpermpd instructions to enable target feature checking. (Craig Topper, 2018-06-08, 1 file, -6/+6)
  llvm-svn: 334311
* [X86] Add builtins for pshufd, pshuflw, and pshufhw to enable target feature and immediate range checking. (Craig Topper, 2018-06-08, 1 file, -4/+4)
  llvm-svn: 334265
* [X86] Add subvector insert and extract builtins to enable target feature checking and immediate range checking. (Craig Topper, 2018-06-08, 1 file, -6/+6)
  Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed, which this patch resurrects. This also matches gcc. llvm-svn: 334261
* [X86] Add builtins for vpermilps/pd instructions to enable target feature checking. (Craig Topper, 2018-06-08, 1 file, -8/+8)
  llvm-svn: 334256
* [X86] Add builtins for shuff32x4/shuff64x2/shufi32x4/shuff64x2 to enable target feature checking and immediate range checking. (Craig Topper, 2018-06-07, 1 file, -3/+3)
  llvm-svn: 334244
* [X86] Lowering FMA intrinsics to native IR (Clang part) (Gabor Buella, 2018-05-30, 1 file, -72/+390)
  This patch replaces all packed (and scalar without rounding mode) fused intrinsics with fmadd/fmaddsub variations. Then fmadd/fmaddsub are lowered to native IR. Patch by tkrupa. Reviewers: craig.topper, sroland, spatel, RKSimon Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D47444 llvm-svn: 333555
* [X86] Merge the 3 different flavors of masked vpermi2var/vpermt2var builtins to a single version without masking. Use select builtins with appropriate operand instead. (Craig Topper, 2018-05-29, 1 file, -32/+56)
  llvm-svn: 333387
* [X86] Remove masking from pternlog llvm intrinsics and use a select instruction instead. (Craig Topper, 2018-05-21, 1 file, -12/+20)
  Because the intrinsics in the headers are implemented as macros, we can't just use a select builtin and a pternlog builtin: that would require one of the macro arguments to be used twice. Depending on what was passed to the macro, we could expand an expression twice, leading to weird behavior. We could maybe declare a local variable in the macro, but then we would have to worry about name collisions. To avoid that, just generate the IR directly in CGBuiltin.cpp. Differential Revision: https://reviews.llvm.org/D47125 llvm-svn: 332891
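  A small, generic illustration of the double-expansion hazard being avoided here (not taken from the headers; the macro and function names are made up):

  ```c
  #include <stdio.h>

  // A function-like macro that names its argument twice evaluates it twice.
  #define USE_TWICE(x) ((x) + (x))

  static int counter = 0;
  static int next_input(void) { return ++counter; }  // has a side effect

  int main(void) {
    // Expands to ((next_input()) + (next_input())): the side effect runs
    // twice, so r is 1 + 2 = 3, not 2. Implementing the masked pternlog
    // intrinsics as "pternlog builtin fed into a select builtin" inside a
    // macro would repeat one argument in exactly this way, which is why
    // the IR is generated directly in CGBuiltin.cpp instead.
    int r = USE_TWICE(next_input());
    printf("%d\n", r);  // prints 3
    return 0;
  }
  ```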
* [X86] Use __builtin_convertvector to implement some of the packed integer to packed float conversion intrinsics. (Craig Topper, 2018-05-21, 1 file, -10/+14)
  I believe this is safe assuming the default FP environment. The conversion might be inexact, but it can never overflow the FP type, so this shouldn't be undefined behavior for the uitofp/sitofp instructions. We already do something similar for scalar conversions. Differential Revision: https://reviews.llvm.org/D46863 llvm-svn: 332882
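  A minimal sketch of what __builtin_convertvector does for this kind of conversion, using locally defined vector types rather than the real header internals:

  ```c
  // Clang/GCC vector extensions: 4 x i32 and 4 x float, both 16 bytes.
  typedef int   v4si __attribute__((vector_size(16)));
  typedef float v4sf __attribute__((vector_size(16)));

  // Element-wise int -> float conversion; clang emits a sitofp on the whole
  // vector, which the x86 backend can select as cvtdq2ps.
  v4sf int32_to_float(v4si v) {
    return __builtin_convertvector(v, v4sf);
  }
  ```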
* [X86] Remove mask arguments from permvar builtins/intrinsics. Use a select in IR instead. (Craig Topper, 2018-05-20, 1 file, -5/+9)
  Someday maybe we'll use selects for all the builtins. llvm-svn: 332825
* [X86] Revert part of r332266: Use __builtin_convertvector to replace some of the avx512 truncate builtins. (Craig Topper, 2018-05-15, 1 file, -4/+2)
  The masking doesn't work right in the backend for the ones that produce byte or word elements without avx512bw. llvm-svn: 332322
* [X86] Use __builtin_convertvector to replace some of the avx512 truncate builtins. (Craig Topper, 2018-05-14, 1 file, -6/+10)
  As long as the destination type is a 256 or 128 bit vector with the same number of elements, we can use __builtin_convertvector to directly generate a trunc IR instruction, which will be handled natively by the backend. Differential Revision: https://reviews.llvm.org/D46742 llvm-svn: 332266
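  A hedged sketch of the same mechanism applied to truncation, again with local vector typedefs rather than the header's internal types:

  ```c
  // 4 x i64 (32 bytes) truncated element-wise to 4 x i32 (16 bytes).
  typedef long long v4di __attribute__((vector_size(32)));
  typedef int       v4si __attribute__((vector_size(16)));

  // Same element count, narrower elements: clang emits a single vector
  // trunc instruction, which the AVX-512 backend can select as vpmovqd.
  v4si truncate_epi64_to_epi32(v4di v) {
    return __builtin_convertvector(v, v4si);
  }
  ```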
* [X86] Remove '#ifdef __x86_64__' around mask_set1_epi64 intrinsics. (Craig Topper, 2018-04-24, 1 file, -2/+0)
  The unmasked versions already didn't have this restriction. I don't think gcc or icc limit these to 64-bit mode, so we shouldn't either. llvm-svn: 330681
* [X86] Emit native IR for pmuldq/pmuludq builtins. (Craig Topper, 2018-04-09, 1 file, -8/+32)
  I believe all the pieces are now in place in the backend to make this work correctly. We can either mask the input to 32 bits for pmuludq, or shl/ashr for pmuldq, and use a regular mul instruction. The backend should combine this to PMULUDQ/PMULDQ, and then SimplifyDemandedBits will remove the and/shifts. Differential Revision: https://reviews.llvm.org/D45421 llvm-svn: 329605
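  A rough C sketch of the pattern described above, written with clang vector extensions rather than the actual CGBuiltin.cpp code (type and function names are made up):

  ```c
  // 2 x i64 lanes, matching __m128i's layout.
  typedef unsigned long long v2du __attribute__((vector_size(16)));
  typedef long long          v2di __attribute__((vector_size(16)));

  // pmuludq semantics: multiply the low 32 bits of each 64-bit lane as
  // unsigned values, producing a full 64-bit product per lane.
  v2du pmuludq_like(v2du a, v2du b) {
    return (a & 0xffffffffULL) * (b & 0xffffffffULL);
  }

  // pmuldq semantics: the low 32 bits are sign-extended first; shifting
  // left then arithmetic-shifting right by 32 performs the extension.
  v2di pmuldq_like(v2di a, v2di b) {
    v2di lo_a = (v2di)((v2du)a << 32) >> 32;
    v2di lo_b = (v2di)((v2du)b << 32) >> 32;
    return lo_a * lo_b;
  }
  ```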
* [X86] Remove some masked cvt builtins that can be replaced with legacy sse/avx builtins and a select. (Craig Topper, 2018-02-24, 1 file, -22/+44)
  llvm-svn: 326039
* [X86] Remove __builtin_ia32_permvarsf256_mask and __builtin_ia32_permvarsi256_mask and use the avx2 unmasked versions and a select instead. (Craig Topper, 2018-02-24, 1 file, -6/+9)
  llvm-svn: 326022
* [X86] Change the signature of the AVX512 packed fp compare intrinsics to return vXi1 mask. Make bitcasts to scalar explicit in IR. (Craig Topper, 2018-02-10, 1 file, -8/+12)
  Summary: This is the clang equivalent of r324827. Reviewers: zvi, delena, RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43143 llvm-svn: 324828
* [X86] test/testn intrinsics lowering to IR. clang side (Uriel Korach, 2017-11-13, 1 file, -16/+41)
  Change the intrinsics' header files to lower the test and testn intrinsics to IR code, and remove the test and testn builtins from clang. Differential Revision: https://reviews.llvm.org/D38737 llvm-svn: 318035
* [x86][AVX512] Lowering shuffle i/f intrinsics to LLVM IR (Jina Nahias, 2017-11-13, 1 file, -12/+24)
  This patch, together with a matching llvm patch (https://reviews.llvm.org/D38671), implements the lowering of X86 shuffle i/f intrinsics to IR. Differential Revision: https://reviews.llvm.org/D38672 Change-Id: I9b3c2f2b34323bd9ccb21d0c1832f848b88ec047 llvm-svn: 318025
* fixing a bug in mask[z]_set1 intrinsic (Jina Nahias, 2017-09-25, 1 file, -32/+4)
  Differential Revision: https://reviews.llvm.org/D38231 Change-Id: I80bbff9cbe93e4be54d8a761ef9723edf3f57c57 llvm-svn: 314102
* Lowering Mask Set1 intrinsics to LLVM IR (Jina Nahias, 2017-09-19, 1 file, -8/+78)
  This patch, together with a matching llvm patch (https://reviews.llvm.org/D37669), implements the lowering of X86 mask set1 intrinsics to IR. Differential Revision: https://reviews.llvm.org/D37668 llvm-svn: 313624
* [X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (clang) (Uriel Korach, 2017-09-13, 1 file, -14/+40)
  This patch, together with a matching llvm patch (https://reviews.llvm.org/D37693), implements the lowering of X86 ABS intrinsics to IR. Differential Revision: https://reviews.llvm.org/D37694 llvm-svn: 313133
* [x86] these aren't the undefs you're looking for (PR32176) (Sanjay Patel, 2017-03-12, 1 file, -17/+17)
  x86 has undef SSE/AVX intrinsics that should represent a bogus register operand. This is not the same as LLVM's undef value, which can take on multiple bit patterns. There are better solutions / follow-ups to this discussed here: https://bugs.llvm.org/show_bug.cgi?id=32176 ...but this should prevent miscompiles with a one-line code change. Differential Revision: https://reviews.llvm.org/D30834 llvm-svn: 297588
* [AVX-512] Replace subvector broadcast builtins with shufflevectors and selects. (Craig Topper, 2017-01-18, 1 file, -12/+16)
  Verified that the backend codegens this equally well. llvm-svn: 292329
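  A hedged sketch of the shufflevector form such a broadcast can take, with made-up local vector types instead of the header internals:

  ```c
  // 4 x float and 8 x float vectors (128-bit and 256-bit).
  typedef float v4sf __attribute__((vector_size(16)));
  typedef float v8sf __attribute__((vector_size(32)));

  // Broadcast a 128-bit subvector into both halves of a 256-bit vector:
  // repeating indices 0..3 twice selects the whole source vector twice.
  v8sf broadcast_f32x4(v4sf v) {
    return __builtin_shufflevector(v, v, 0, 1, 2, 3, 0, 1, 2, 3);
  }
  ```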
* [AVX-512] Remove 128/256-bit masked vpermilvar builtins and replace with select and the avx unmasked builtins. (Craig Topper, 2016-12-10, 1 file, -8/+16)
  llvm-svn: 289338
* [X86][AVX512VL] Add missing _mm256_maskz_alignr_epi64 shufflevector check (Simon Pilgrim, 2016-11-23, 1 file, -0/+1)
  Missed in rL287733. llvm-svn: 287755
* [X86] Replace valignd/q builtins with appropriate __builtin_shufflevector. (Craig Topper, 2016-11-23, 1 file, -12/+19)
  llvm-svn: 287733
* [X86][AVX512] Replace lossless i32/u32 to f64 conversion intrinsics with generic IR (Simon Pilgrim, 2016-11-16, 1 file, -20/+33)
  Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics without affecting final codegen. This patch removes the clang builtins and their use in the headers - a future patch will deal with removing the llvm intrinsics. This is an extension patch to D20528, which dealt with the equivalent sse/avx cases. Differential Revision: https://reviews.llvm.org/D26686 llvm-svn: 287088