path: root/clang/test/CodeGen/avx512f-builtins.c
* [IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperator (Cameron McInally, 2019-10-14, 1 file, +144/-144)
  Reapply r374240 with a fix for the OCaml test, namely Bindings/OCaml/core.ml.
  Differential Revision: https://reviews.llvm.org/D61675
  llvm-svn: 374782
* Revert "[IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperator"Dmitri Gribenko2019-10-101-144/+144
| | | | | | | This reverts commit r374240. It broke OCaml tests: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19014 llvm-svn: 374354
* [IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperator (Cameron McInally, 2019-10-09, 1 file, +144/-144)
  Also update Clang to call Builder.CreateFNeg(...) for UnaryMinus.
  Differential Revision: https://reviews.llvm.org/D61675
  llvm-svn: 374240
* Fix reliance on -flax-vector-conversions in AVX intrinsics headers and corresponding tests (Richard Smith, 2019-09-17, 1 file, +6/-6)
  llvm-svn: 372063
* [x86] Fix bugs of some intrinsic functions in CLANG: _mm512_stream_ps, _mm512_stream_pd, _mm512_stream_si512 (Pengfei Wang, 2019-09-03, 1 file, +17/-0)
  Reviewers: craig.topper, pengfei, LuoYuanke, RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Patch by Bing Yu (yubing)
  Differential Revision: https://reviews.llvm.org/D66786
  llvm-svn: 370691
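  For reference, a minimal usage sketch of the three streaming intrinsics named in this commit. The wrapper function and buffer names are illustrative and not part of the commit; the destination buffers are assumed to be 64-byte aligned.
  ```
  #include <immintrin.h>

  /* Non-temporal 512-bit stores; pointers are assumed 64-byte aligned. */
  void stream_out(float *dst_f, double *dst_d, void *dst_i,
                  __m512 vf, __m512d vd, __m512i vi) {
    _mm512_stream_ps(dst_f, vf);    /* 16 floats */
    _mm512_stream_pd(dst_d, vd);    /* 8 doubles */
    _mm512_stream_si512(dst_i, vi); /* one 512-bit integer vector */
  }
  ```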
* [x86] Adding support for some missing intrinsics: _mm512_cvtsi512_si32 (Pengfei Wang, 2019-08-29, 1 file, +6/-0)
  Summary: Adding support for some missing intrinsics: _mm512_cvtsi512_si32
  Reviewers: craig.topper, pengfei, LuoYuanke, spatel, RKSimon
  Reviewed By: craig.topper
  Subscribers: llvm-commits
  Patch by Bing Yu (yubing)
  Differential Revision: https://reviews.llvm.org/D66785
  llvm-svn: 370297
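  A short illustrative sketch of the new intrinsic; the wrapper function is hypothetical, not from the commit.
  ```
  #include <immintrin.h>

  /* Returns the lowest 32-bit element of a 512-bit integer vector. */
  int lowest_element(__m512i v) {
    return _mm512_cvtsi512_si32(v);
  }
  ```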
* [NewPM] Run avx*-builtins.c tests under the new pass manager only (Leonard Chan, 2019-07-26, 1 file, +8/-4)
  This patch changes the following tests to run under the new pass manager only:
  ```
  Clang :: CodeGen/avx512-reduceMinMaxIntrin.c (1 of 4)
  Clang :: CodeGen/avx512vl-builtins.c (2 of 4)
  Clang :: CodeGen/avx512vlbw-builtins.c (3 of 4)
  Clang :: CodeGen/avx512f-builtins.c (4 of 4)
  ```
  The new PM added extra bitcasts that weren't checked before. For reduceMinMaxIntrin.c, the issue was mostly the allocas being in a different order. Other changes involved extra bitcasts, and differently ordered loads and stores, but the logic should still be the same.
  Differential revision: https://reviews.llvm.org/D65110
  llvm-svn: 367157
* [X86] Don't use _MM_FROUND_CUR_DIRECTION in the intrinsics tests (Craig Topper, 2019-06-22, 1 file, +344/-344)
  _MM_FROUND_CUR_DIRECTION is the behavior of the intrinsics that don't take a rounding mode argument. So a better test is using _MM_FROUND_NO_EXC with the SAE-only intrinsics and an explicit rounding mode with the intrinsics that support embedded rounding mode.
  llvm-svn: 364127
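  An illustrative sketch of the two patterns described above, using intrinsics that exist in AVX512F; the wrapper itself is not part of the test.
  ```
  #include <immintrin.h>

  __m512 rounding_examples(__m512 a, __m512 b) {
    /* Embedded rounding: an explicit mode combined with suppress-all-exceptions. */
    __m512 sum = _mm512_add_round_ps(a, b, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
    /* SAE-only intrinsic: pass _MM_FROUND_NO_EXC rather than a rounding mode. */
    __m512 mx = _mm512_max_round_ps(sum, b, _MM_FROUND_NO_EXC);
    return mx;
  }
  ```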
* [X86] Check the alignment argument for the masked.load/store for the _mm_mask_store_ss/sd and _mm_mask(z)_load_ss/sd intrinsics (Craig Topper, 2019-05-20, 1 file, +6/-6)
  llvm-svn: 361187
* [X86] Follow up to r353878, add MSVC compatibility command lines to other intrinsic tests that use packed structs to control alignment (Craig Topper, 2019-02-12, 1 file, +113/-112)
  r353878 fixed a bug in _mm_loadu_ps and added a command line to catch it. Adding additional command lines to prevent breaking other intrinsics in the future.
  llvm-svn: 353887
* [X86] Add new variadic avx512 compress/expand intrinsics that use vXi1 types for the mask argument. Custom lower the builtins to these intrinsics (Craig Topper, 2019-01-28, 1 file, +16/-16)
  This enables the middle end to optimize out bitcasts for the masks.
  llvm-svn: 352344
* [X86] Custom codegen 512-bit cvt(u)qq2tops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics (Craig Topper, 2019-01-26, 1 file, +10/-6)
  Summary:
  The 512-bit cvt(u)qq2tops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics all have the possibility of taking an explicit rounding mode argument. If the rounding mode is CUR_DIRECTION we'd like to emit a sitofp/uitofp instruction and a select like we do for 256-bit intrinsics.
  For cvt(u)qqtopd and cvt(u)dqtops we do this when the form of the software intrinsics that doesn't take a rounding mode argument is used. This is done by using convertvector in the header with the select builtin. But if the explicit rounding mode form of the intrinsic is used and CUR_DIRECTION is passed, we don't do this. We shouldn't have this inconsistency.
  For cvt(u)qqtops nothing is done because we can't use the select builtin in the header without avx512vl. So we need to use custom codegen for this.
  Even when the rounding mode isn't CUR_DIRECTION we should also use select in IR for consistency. And it will remove another scalar integer mask from our intrinsics.
  To accomplish all of these goals I've taken a slightly unusual approach. I've added two new X86 specific intrinsics for sitofp/uitofp with rounding. These intrinsics are variadic on the input and output type so we only need 2 instead of 6. This avoids the need for a switch to map them in CGBuiltin.cpp. We just need to check signed vs unsigned. I believe other targets also use variadic intrinsics like this.
  So if the rounding mode is CUR_DIRECTION we'll use a sitofp/uitofp instruction. Otherwise we'll use one of the new intrinsics. After that we'll emit a select instruction if needed.
  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: cfe-commits
  Differential Revision: https://reviews.llvm.org/D56998
  llvm-svn: 352267
* [X86] Add missing test cases for some int/fp->fp conversion intrinsics with rounding mode. Use non-default rounding mode on some tests (Craig Topper, 2019-01-20, 1 file, +75/-19)
  For some reason we were missing tests for several unmasked conversion intrinsics, but had their mask form. Also use a non-default rounding mode on some tests to provide better coverage for a future patch.
  llvm-svn: 351708
* [X86] Add custom emission for the avx512 scatter builtins to convert from scalar integer to vXi1 for the mask arguments to the intrinsics (Craig Topper, 2019-01-17, 1 file, +16/-16)
  llvm-svn: 351408
* [X86] Add versions of the avx512 gather intrinsics that take the mask as a vXi1 vector instead of a scalar (Craig Topper, 2019-01-16, 1 file, +16/-16)
  We need to custom handle these so we can turn the scalar mask into a vXi1 vector.
  Differential Revision: https://reviews.llvm.org/D56530
  llvm-svn: 351390
* [X86] Add shift-by-immediate tests for non-immediate/out-of-range values (Simon Pilgrim, 2019-01-08, 1 file, +121/-0)
  As noted on PR40203, for gcc compatibility we need to support non-immediate values in the 'slli/srli/srai' shift-by-immediate vector intrinsics.
  llvm-svn: 350619
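  A sketch of the gcc-compatible usage these tests cover: passing a run-time (non-immediate) count to the "shift by immediate" intrinsics. The wrapper and variable names are illustrative.
  ```
  #include <immintrin.h>

  __m512i shift_by_variable(__m512i v, unsigned int count) {
    __m512i l = _mm512_slli_epi32(v, count); /* logical left shift */
    __m512i r = _mm512_srli_epi32(v, count); /* logical right shift */
    __m512i a = _mm512_srai_epi32(v, count); /* arithmetic right shift */
    return _mm512_or_si512(_mm512_or_si512(l, r), a);
  }
  ```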
* [X86] Auto upgrade XOP/AVX512 rotation intrinsics to generic funnel shift intrinsics (clang) (Simon Pilgrim, 2018-12-20, 1 file, +24/-24)
  This emits FSHL/FSHR generic intrinsics for the XOP VPROT and AVX512 VPROL/VPROR rotation intrinsics.
  LLVM counterpart: https://reviews.llvm.org/D55938
  Differential Revision: https://reviews.llvm.org/D55937
  llvm-svn: 349796
* [X86] Add missing intrinsics to match icc (Craig Topper, 2018-10-20, 1 file, +40/-0)
  This adds
  _mm_and_epi32, _mm_and_epi64
  _mm_andnot_epi32, _mm_andnot_epi64
  _mm_or_epi32, _mm_or_epi64
  _mm_xor_epi32, _mm_xor_epi64
  _mm256_and_epi32, _mm256_and_epi64
  _mm256_andnot_epi32, _mm256_andnot_epi64
  _mm256_or_epi32, _mm256_or_epi64
  _mm256_xor_epi32, _mm256_xor_epi64
  _mm_loadu_epi32, _mm_loadu_epi64
  _mm_load_epi32, _mm_load_epi64
  _mm256_loadu_epi32, _mm256_loadu_epi64
  _mm256_load_epi32, _mm256_load_epi64
  _mm512_loadu_epi32, _mm512_loadu_epi64
  _mm512_load_epi32, _mm512_load_epi64
  _mm_storeu_epi32, _mm_storeu_epi64
  _mm_store_epi32, _mm_store_epi64
  _mm256_storeu_epi32, _mm256_storeu_epi64
  _mm256_store_epi32, _mm256_store_epi64
  _mm512_storeu_epi32, _mm512_storeu_epi64
  _mm512_store_epi32, _mm512_store_epi64
  llvm-svn: 344861
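  A small, hedged sketch of a few of the newly added names; the wrapper is illustrative, and the 128/256-bit forms additionally require AVX512VL.
  ```
  #include <immintrin.h>

  __m512i roundtrip(const int *src, long long *dst, __m512i bits) {
    __m512i v = _mm512_loadu_epi32(src); /* unaligned 512-bit load */
    v = _mm512_and_epi32(v, bits);       /* 512-bit form (already existed) */
    _mm512_storeu_epi64(dst, v);         /* unaligned 512-bit store */
    return v;
  }
  ```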
* [X86] Add k-mask conversion and load/store intrinsics to match gcc and icc (Craig Topper, 2018-08-31, 1 file, +29/-0)
  This adds:
  _cvtmask8_u32, _cvtmask16_u32, _cvtmask32_u32, _cvtmask64_u64
  _cvtu32_mask8, _cvtu32_mask16, _cvtu32_mask32, _cvtu64_mask64
  _load_mask8, _load_mask16, _load_mask32, _load_mask64
  _store_mask8, _store_mask16, _store_mask32, _store_mask64
  These are currently missing from the Intel Intrinsics Guide webpage.
  llvm-svn: 341251
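  A minimal sketch of the 16-bit variants; the wrapper and the 0x00FF constant are illustrative.
  ```
  #include <immintrin.h>

  unsigned int mask_roundtrip(__mmask16 *slot) {
    __mmask16 m = _cvtu32_mask16(0x00FFu); /* build a mask from a scalar */
    _store_mask16(slot, m);                /* store the mask to memory */
    __mmask16 r = _load_mask16(slot);      /* reload it */
    return _cvtmask16_u32(r);              /* convert back to a scalar */
  }
  ```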
* [X86] Add kshift intrinsics to match gcc and icc (Craig Topper, 2018-08-31, 1 file, +16/-0)
  This adds the following intrinsics:
  _kshiftli_mask8, _kshiftli_mask16, _kshiftli_mask32, _kshiftli_mask64
  _kshiftri_mask8, _kshiftri_mask16, _kshiftri_mask32, _kshiftri_mask64
  llvm-svn: 341234
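  A minimal sketch using the 16-bit variants; the shift count of 4 is arbitrary, and the count must be a compile-time constant.
  ```
  #include <immintrin.h>

  __mmask16 shift_masks(__mmask16 m) {
    __mmask16 left  = _kshiftli_mask16(m, 4); /* shift mask bits left by 4 */
    __mmask16 right = _kshiftri_mask16(m, 4); /* shift mask bits right by 4 */
    return left | right;
  }
  ```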
* [X86] Add kortest intrinsics for 8, 32, and 64 bit masks. Add new intrinsic names for 16 bit masks (Craig Topper, 2018-08-28, 1 file, +46/-0)
  This matches gcc and icc despite not being documented in the Intel Intrinsics Guide.
  llvm-svn: 340798
* [X86] Add intrinsics for kand/kandn/knot/kor/kxnor/kxor with 8, 32, and 64-bit mask registers (Craig Topper, 2018-08-27, 1 file, +65/-0)
  This also adds a second intrinsic name for the 16-bit mask versions. These intrinsics match gcc and icc. They just aren't published in the Intel Intrinsics Guide, so I only recently found they existed.
  llvm-svn: 340719
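  A minimal sketch of the 32-bit mask-logic names; the wider forms shown here assume AVX512BW, and the wrapper is illustrative.
  ```
  #include <immintrin.h>

  __mmask32 mask_logic(__mmask32 a, __mmask32 b) {
    __mmask32 both    = _kand_mask32(a, b);
    __mmask32 only_b  = _kandn_mask32(a, b); /* ~a & b */
    __mmask32 either  = _kor_mask32(a, b);
    __mmask32 neither = _knot_mask32(either);
    return _kxor_mask32(_kxnor_mask32(both, only_b), neither);
  }
  ```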
* [X86] Change the rounding mode used when testing the sqrt_round intrinsics (Craig Topper, 2018-07-13, 1 file, +12/-42)
  Using CUR_DIRECTION is not a realistic scenario. That is equivalent to the intrinsic without rounding.
  llvm-svn: 337040
* [X86] Fix the test for _mm512_mullox_epi64 to test the intrinsic instead of a copy of the intrinsic implementation (Craig Topper, 2018-07-10, 1 file, +1/-1)
  llvm-svn: 336739
* [X86] Use the masked scalar fma builtins to implement the default rounding version of the fma intrinsics (Craig Topper, 2018-07-10, 1 file, +220/-448)
  The rounding mode is checked in CGBuiltin.cpp to generate the correct intrinsic call. Making this switch changes the masking to use the i8 bitcast to <8 x i1> and extract i1 version of the IR for the mask. Previously we ended up with a scalar 'and' plus an icmp.
  llvm-svn: 336637
* [X86] Add __builtin_ia32_selectss_128 and __builtin_ia32_selectsd_128 that are suitable for use in scalar mask intrinsics (Craig Topper, 2018-07-10, 1 file, +120/-76)
  This will convert the i8 mask argument to <8 x i1>, extract an i1, and then emit a select instruction. This replaces the '(__U & 1)' and ternary operator used in some of the intrinsics. The old sequence was lowered to a scalar 'and' plus a compare. The new sequence uses an i1 vector that will interoperate better with other mask intrinsics.
  This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow up patch will add the GCCBuiltin name back in llvm and remove the custom handling.
  I made some adjustments to the legacy move_ss/sd intrinsics which we reused here to do a simpler extract and insert instead of 2 extracts and two inserts or a shuffle.
  llvm-svn: 336622
* [X86] Add new scalar fma intrinsics with rounding mode that use f32/f64 types (Craig Topper, 2018-07-08, 1 file, +268/-32)
  This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma.
  llvm-svn: 336507
* [X86] Fix a few intrinsics that were ignoring their rounding mode argument and hardcoded _MM_FROUND_CUR_DIRECTION internally (Craig Topper, 2018-07-07, 1 file, +44/-44)
  I believe these have been broken since their introduction into clang. I've enhanced the tests for these intrinsics to use a real rounding mode and check all the intrinsic arguments instead of just the name.
  llvm-svn: 336498
* [X86] Fix various type mismatches in intrinsic headers and intrinsic tests that cause extra bitcasts to be emitted in the IR (Craig Topper, 2018-07-07, 1 file, +15/-15)
  Found via imprecise grepping of the -O0 IR. There could still be more bugs out there.
  llvm-svn: 336487
* [X86] When creating a select for scalar masked sqrt and div builtins make sure we optimize the all-ones mask case (Craig Topper, 2018-07-06, 1 file, +90/-94)
  This case occurs in the intrinsic headers so we should avoid emitting the mask in those cases. Factor the code into a helper function to make this easy.
  llvm-svn: 336472
* [X86] Add missing scalar fma intrinsics with rounding, but no mask (Craig Topper, 2018-07-06, 1 file, +72/-24)
  We had the mask versions of the rounding intrinsics, but not one without masking. Also change the rounding tests to not use the CUR_DIRECTION rounding mode.
  llvm-svn: 336470
* [X86] Use shufflevector instead of a select with a constant mask for fmaddsub/fmsubadd IR emission (Craig Topper, 2018-07-05, 1 file, +64/-64)
  Shufflevector is easier to generate and matches what the backend pattern matches without relying on constant selects being turned into shuffles.
  While I was there I also made the IR regular expressions a little stricter to ensure operand order on the shuffle.
  llvm-svn: 336388
* [X86] Fix some vector cmp builtins - TRUE/FALSE predicates (Gabor Buella, 2018-07-05, 1 file, +24/-28)
  This patch removes an optimization used with the TRUE/FALSE predicates, as was suggested in https://reviews.llvm.org/D45616 for r335339. The optimization was buggy, since r335339 used it also for *_mask builtins, without actually applying the mask -- the mask argument was just ignored.
  Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
  Reviewed By: spatel
  Differential Revision: https://reviews.llvm.org/D48715
  llvm-svn: 336355
* [X86] NFC - add more test cases for vector cmp intrinsics (Gabor Buella, 2018-07-05, 1 file, +787/-26)
  Add test cases with each predicate using the following intrinsics:
  _mm_cmp_pd, _mm_cmp_ps, _mm256_cmp_pd, _mm256_cmp_ps
  _mm_cmp_pd_mask, _mm_cmp_ps_mask, _mm256_cmp_pd_mask, _mm256_cmp_ps_mask
  _mm512_cmp_pd_mask, _mm512_cmp_ps_mask
  _mm_mask_cmp_pd_mask, _mm_mask_cmp_ps_mask, _mm256_mask_cmp_pd_mask, _mm256_mask_cmp_ps_mask
  _mm512_mask_cmp_pd_mask, _mm512_mask_cmp_ps_mask
  Some of these are marked with FIXME, as there is a bug in lowering e.g. _mm512_mask_cmp_ps_mask.
  llvm-svn: 336346
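  A minimal sketch of the 512-bit mask-returning compares exercised here; the predicate constants are the AVX _CMP_* macros, and the wrapper is illustrative.
  ```
  #include <immintrin.h>

  __mmask16 compare(__m512 a, __m512 b, __mmask16 k) {
    __mmask16 lt = _mm512_cmp_ps_mask(a, b, _CMP_LT_OS);          /* a < b, ordered, signaling */
    __mmask16 ne = _mm512_mask_cmp_ps_mask(k, a, b, _CMP_NEQ_UQ); /* masked compare */
    return lt & ne;
  }
  ```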
* NFC - typo fix in test/CodeGen/avx512f-builtins.c (Gabor Buella, 2018-07-04, 1 file, +2/-2)
  llvm-svn: 336243
* [X86] Correct the width of mask arguments in intrinsic headers and tests (Craig Topper, 2018-06-30, 1 file, +21/-23)
  All of these were found by grepping through IR from the builtin tests for extra trunc and zext/sext instructions that shouldn't have been there.
  Some of these were real bugs where we lost bits from the user input:
  _mm512_mask_broadcast_f32x8
  _mm512_maskz_broadcast_f32x8
  _mm512_mask_broadcast_i32x8
  _mm512_maskz_broadcast_i32x8
  _mm256_mask_cvtusepi16_storeu_epi8
  llvm-svn: 336042
* [X86] Remove masking from the avx512 rotate builtins. Use a select builtin instead (Craig Topper, 2018-06-30, 1 file, +40/-24)
  llvm-svn: 336036
* [X86] Remove masking from the avx512 packed sqrt builtins. Use select builtins instead (Craig Topper, 2018-06-29, 1 file, +12/-12)
  llvm-svn: 335945
* [X86] Lower _mm[256|512]_cmp[.]_mask intrinsics to native llvm IR (Gabor Buella, 2018-06-22, 1 file, +112/-40)
  Summary: Lowering some vector comparison builtins to fcmp IR instructions. This ignores the signaling behaviour specified in the predicate argument of said builtins.
  Affected AVX512 builtins:
  __builtin_ia32_cmpps128_mask
  __builtin_ia32_cmpps256_mask
  __builtin_ia32_cmpps512_mask
  __builtin_ia32_cmppd128_mask
  __builtin_ia32_cmppd256_mask
  __builtin_ia32_cmppd512_mask
  Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
  Reviewed By: craig.topper, spatel, efriedma
  Differential Revision: https://reviews.llvm.org/D45616
  llvm-svn: 335339
* [X86] Remove masking from the 512-bit floating point max/min builtins. Use select in IR instead (Craig Topper, 2018-06-21, 1 file, +35/-20)
  llvm-svn: 335200
* [X86] Lowering sqrt intrinsics to native IR (Tomasz Krupa, 2018-06-15, 1 file, +102/-22)
  Reviewers: craig.topper, spatel, RKSimon, igorb, uriel.k
  Reviewed By: craig.topper
  Subscribers: tkrupa, cfe-commits
  Differential Revision: https://reviews.llvm.org/D41168
  llvm-svn: 334850
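  A minimal sketch of the packed sqrt intrinsics whose lowering this change affects; the wrapper is illustrative.
  ```
  #include <immintrin.h>

  __m512d sqrt_examples(__m512d src, __mmask8 k, __m512d a) {
    __m512d full   = _mm512_sqrt_pd(a);              /* unmasked */
    __m512d merged = _mm512_mask_sqrt_pd(src, k, a); /* lanes with a clear mask bit come from src */
    return _mm512_add_pd(full, merged);
  }
  ```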
* [X86] Lowering Mask Scalar intrinsics to native IR (Clang part) (Tomasz Krupa, 2018-06-14, 1 file, +148/-16)
  Summary: Lowering add, sub, mul, and div mask scalar intrinsic calls to native IR.
  Reviewers: craig.topper, RKSimon, spatel, sroland
  Reviewed By: craig.topper
  Subscribers: cfe-commits
  Differential Revision: https://reviews.llvm.org/D47979
  llvm-svn: 334741
* [X86] Use target independent masked expandload and compressstore intrinsics to implement expandload/compressstore builtins (Craig Topper, 2018-06-10, 1 file, +22/-10)
  Summary: We've had these target independent intrinsics for at least a year and a half. Looks like they do exactly what we need here and the backend already supports them.
  Reviewers: RKSimon, delena, spatel, GBuella
  Reviewed By: RKSimon
  Subscribers: cfe-commits, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47693
  llvm-svn: 334366
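  A minimal sketch of the expand-load and compress-store intrinsics these builtins back; the wrapper is illustrative.
  ```
  #include <immintrin.h>

  void compact(float *dst, const float *src, __mmask16 k, __m512 fallback) {
    /* Read consecutive floats from src into the lanes selected by k;
       unselected lanes come from fallback. */
    __m512 v = _mm512_mask_expandloadu_ps(fallback, k, src);
    /* Write the lanes selected by k contiguously to dst. */
    _mm512_mask_compressstoreu_ps(dst, k, v);
  }
  ```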
* [X86] Remove masking from the 512-bit packed floating point add/sub/mul/div builtins. Use select in IR instead (Craig Topper, 2018-06-10, 1 file, +40/-24)
  llvm-svn: 334359
* [X86] Add builtins for vpermq/vpermpd instructions to enable target feature checking (Craig Topper, 2018-06-08, 1 file, +6/-6)
  llvm-svn: 334311
* [X86] Add builtins for pshufd, pshuflw, and pshufhw to enable target feature and immediate range checking (Craig Topper, 2018-06-08, 1 file, +3/-3)
  llvm-svn: 334265
* [X86] Add subvector insert and extract builtins to enable target feature checking and immediate range checking (Craig Topper, 2018-06-08, 1 file, +12/-12)
  Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc.
  llvm-svn: 334261
* [X86] Add builtins for vpermilps/pd instructions to enable target feature checking (Craig Topper, 2018-06-08, 1 file, +6/-6)
  llvm-svn: 334256
* [X86] Add builtins for shuff32x4/shuff64x2/shufi32x4/shufi64x2 to enable target feature checking and immediate range checking (Craig Topper, 2018-06-07, 1 file, +3/-3)
  llvm-svn: 334244
* [X86] Add back _mask, _maskz, and _mask3 builtins for some 512-bit fmadd/fmsub/fmaddsub/fmsubadd builtins (Craig Topper, 2018-06-07, 1 file, +24/-24)
  Summary:
  We recently switched to using selects in the intrinsics header files for FMA instructions. But the 512-bit versions support flavors with rounding mode, which must be an Integer Constant Expression. This has forced those intrinsics to be implemented as macros. As it stands now the mask and mask3 intrinsics evaluate one of their macro arguments twice. If that argument itself is another intrinsic macro, we can end up over-expanding macros. Or if it's something we can CSE later, it would show up multiple times when it shouldn't.
  I tried adding __extension__ around the macro and making it an expression statement and declaring a local variable. But whatever name you choose for the local variable can never be used as the name of an input to the macro in user code. If that happens you would end up with the same name on the LHS and RHS of an assignment after expansion. We might be safe if we use __ in front of the variable names because those names are reserved and user code shouldn't use them, but I wasn't sure I wanted to make that claim.
  The other option, which I've chosen here, is to add back _mask, _maskz, and _mask3 flavors of the builtin which we will expand in CGBuiltin.cpp to replicate the argument as needed and insert any fneg needed on the third operand to make a subtract. The _maskz isn't truly necessary if we have an unmasked version or if we use the masked version with a -1 mask and wrap a select around it. But I've chosen to make things more uniform.
  I separated out the scalar builtin handling to avoid too many things going on in EmitX86FMAExpr. It was different enough due to the extract and insert that the minor duplication of the CreateCall was probably worth it.
  Reviewers: tkrupa, RKSimon, spatel, GBuella
  Reviewed By: tkrupa
  Subscribers: cfe-commits
  Differential Revision: https://reviews.llvm.org/D47724
  llvm-svn: 334159
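  A minimal sketch of the rounding-mode FMA intrinsics that sit on top of the restored _mask/_maskz/_mask3 builtins; the wrapper and the RN macro are illustrative.
  ```
  #include <immintrin.h>

  #define RN (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC)

  __m512 fma_round(__m512 a, __m512 b, __m512 c, __mmask16 k) {
    __m512 m  = _mm512_mask_fmadd_round_ps(a, k, b, c, RN);  /* merge with a */
    __m512 z  = _mm512_maskz_fmadd_round_ps(k, a, b, c, RN); /* zero unselected lanes */
    __m512 m3 = _mm512_mask3_fmadd_round_ps(a, b, c, k, RN); /* merge with c */
    return _mm512_add_ps(_mm512_add_ps(m, z), m3);
  }
  ```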