path: root/clang/test/CodeGen/avx512f-builtins.c
Commit history (each entry: commit message, author, date, files changed, -deleted/+added lines)
...
* [X86] Avoid passing _mm_undefined* to builtin_shufflevector if we are able to pass the first input a second time. (Craig Topper, 2018-06-04, 1 file, -12/+12)
  This is more consistent with other usages of builtin_shufflevector. Later optimization passes or codegen will detect the duplicate vector and replace it with undef. Using _mm_undefined just puts a zeroinitializer that still needs to be optimized out later.
  llvm-svn: 333944
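  For illustration, a minimal sketch of the pattern (hypothetical helper name, not the actual header code):

      #include <immintrin.h>

      // Pass the same vector twice instead of _mm_undefined_ps();
      // indices 0-3 all select lanes from the first input anyway.
      static inline __m128 dup_low_pair(__m128 a) {
        return __builtin_shufflevector(a, a, 0, 1, 0, 1);
      }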
* [X86] Make 512-bit unmasked load/store builtins more like their 128/256-bit equivalents. (Craig Topper, 2018-05-31, 1 file, -2/+2)
  Previously we were just passing a -1 mask to the masked builtin. This changes them to the more generic form that the 128/256-bit versions use.
  llvm-svn: 333626
* [X86] Lowering FMA intrinsics to native IR (Clang part) (Gabor Buella, 2018-05-30, 1 file, -122/+808)
  This patch replaces all packed (and scalar without rounding mode) fused intrinsics with fmadd/fmaddsub variations. Then fmadd/fmaddsub are lowered to native IR.
  Patch by tkrupa
  Reviewers: craig.topper, sroland, spatel, RKSimon
  Reviewed By: craig.topper
  Differential Revision: https://reviews.llvm.org/D47444
  llvm-svn: 333555
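  In the style of this test file, the kind of check this change enables (approximate, not the verbatim test):

      #include <immintrin.h>

      __m512d test_mm512_fmadd_pd(__m512d __A, __m512d __B, __m512d __C) {
        // CHECK: call <8 x double> @llvm.fma.v8f64(<8 x double> %{{.*}}, <8 x double> %{{.*}}, <8 x double> %{{.*}})
        return _mm512_fmadd_pd(__A, __B, __C);
      }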
* [X86] Merge the 3 different flavors of masked vpermi2var/vpermt2var builtins to a single version without masking. (Craig Topper, 2018-05-29, 1 file, -14/+38)
  Use select builtins with the appropriate operands instead.
  llvm-svn: 333387
* [X86] Remove masking from pternlog llvm intrinsics and use a select instruction instead. (Craig Topper, 2018-05-21, 1 file, -6/+10)
  Because the intrinsics in the headers are implemented as macros, we can't just combine a select builtin with a pternlog builtin: that would require one of the macro arguments to be used twice, and depending on what was passed to the macro we could expand an expression twice, leading to weird behavior. We could declare a local variable in the macro, but then we'd have to worry about name collisions. To avoid all that, generate the IR directly in CGBuiltin.cpp.
  Differential Revision: https://reviews.llvm.org/D47125
  llvm-svn: 332891
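  A hypothetical macro illustrating the double-expansion hazard (the selectq builtin is the one used in avx512fintrin.h; the real fix emits the select in CGBuiltin.cpp instead):

      #include <immintrin.h>

      // (A) expands twice, so BAD_MASK_TERNLOG(u, x++, y, z, 0xCA)
      // would increment x twice.
      #define BAD_MASK_TERNLOG(U, A, B, C, I)                       \
        ((__m512i)__builtin_ia32_selectq_512(                       \
            (__mmask8)(U),                                          \
            (__v8di)_mm512_ternarylogic_epi64((A), (B), (C), (I)),  \
            (__v8di)(A)))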
* [X86] Use __builtin_convertvector to implement some of the packed integer to packed float conversion intrinsics. (Craig Topper, 2018-05-21, 1 file, -6/+10)
  I believe this is safe assuming the default FP environment. The conversion might be inexact, but it can never overflow the FP type, so this shouldn't be undefined behavior for the uitofp/sitofp instructions. We already do something similar for scalar conversions.
  Differential Revision: https://reviews.llvm.org/D46863
  llvm-svn: 332882
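  A sketch of the pattern under that default-FP-environment assumption (hypothetical name; __v16si/__v16sf are the vector typedefs from the headers):

      #include <immintrin.h>

      static inline __m512 cvtepi32_ps_sketch(__m512i a) {
        // emits sitofp <16 x i32> to <16 x float>
        return (__m512)__builtin_convertvector((__v16si)a, __v16sf);
      }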
* [X86] Remove mask arguments from permvar builtins/intrinsics. Use a select in IR instead. (Craig Topper, 2018-05-20, 1 file, -12/+20)
  Someday maybe we'll use selects for all the builtins.
  llvm-svn: 332825
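  The generic masking idiom looks roughly like this (sketch; the selectd builtin is the one used throughout avx512fintrin.h):

      #include <immintrin.h>

      static inline __m512i mask_select_sketch(__m512i passthru, __mmask16 u,
                                               __m512i result) {
        // per element: u[i] ? result[i] : passthru[i]
        return (__m512i)__builtin_ia32_selectd_512((__mmask16)u, (__v16si)result,
                                                   (__v16si)passthru);
      }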
* [X86] Revert part of r332266: Use __builtin_convertvector to replace some of the avx512 truncate builtins. (Craig Topper, 2018-05-15, 1 file, -12/+6)
  The masking doesn't work right in the backend for the ones that produce byte or word elements without avx512bw.
  llvm-svn: 332322
* [X86] Use __builtin_convertvector to replace some of the avx512 truncate builtins. (Craig Topper, 2018-05-14, 1 file, -12/+20)
  As long as the destination type is a 256- or 128-bit vector with the same number of elements, we can use __builtin_convertvector to directly generate a trunc IR instruction, which the backend handles natively.
  Differential Revision: https://reviews.llvm.org/D46742
  llvm-svn: 332266
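  A sketch of a qualifying case (hypothetical name; same element count, narrower destination):

      #include <immintrin.h>

      static inline __m256i cvtepi64_epi32_sketch(__m512i a) {
        // emits trunc <8 x i64> to <8 x i32>
        return (__m256i)__builtin_convertvector((__v8di)a, __v8si);
      }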
* [X86] Use select instruction and fpextend in the implementation of _mm512_mask_cvtps_pd and _mm512_maskz_cvtps_pd. (Craig Topper, 2018-05-14, 1 file, -3/+6)
  llvm-svn: 332213
* [X86] Use __builtin_convertvector to implement _mm512_cvtps_pd. (Craig Topper, 2018-05-14, 1 file, -2/+2)
  If we're using the default rounding mode we can let __builtin_convertvector generate an fpextend. This matches the 128- and 256-bit versions. For the variant that takes an explicit rounding mode argument we would need to look at the immediate to see if it's CUR_DIRECTION.
  llvm-svn: 332210
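  Roughly (sketch, assuming default rounding):

      #include <immintrin.h>

      static inline __m512d cvtps_pd_sketch(__m256 a) {
        // emits fpext <8 x float> to <8 x double>
        return (__m512d)__builtin_convertvector((__v8sf)a, __v8df);
      }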
* [X86] Emit better code for _mm_cvtu32_sd, _mm_cvtu64_sd, _mm_cvtu32_ss, and _mm_cvtu64_ss. (Craig Topper, 2018-05-13, 1 file, -4/+8)
  We can use direct C code for these that will use uitofp and insertelement instructions. For the versions that take an explicit rounding mode we can't do this.
  llvm-svn: 332203
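  The direct-C form is roughly (sketch, hypothetical name):

      #include <immintrin.h>

      static inline __m128d cvtu32_sd_sketch(__m128d a, unsigned int b) {
        a[0] = b;  // clang emits uitofp i32 to double plus insertelement
        return a;
      }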
* [X86] Change the implementation of scalar masked load/store intrinsics to not use a 512-bit intermediate vector. (Craig Topper, 2018-05-10, 1 file, -6/+6)
  This is unnecessary for AVX512VL-supporting CPUs like SKX. We can just emit a 128-bit masked load/store here no matter what; the backend will widen it to 512 bits on KNL CPUs.
  Fixes the frontend portion of PR37386. We still need to fix the backend to optimize the new sequences well.
  llvm-svn: 331958
* [X86] Add support for _mm512_mullox_epi64 and _mm512_mask_mullox_epi64 intrinsics to match icc. (Craig Topper, 2018-04-26, 1 file, -0/+13)
  On AVX512F targets we'll produce an emulated sequence using 3 pmuludqs with shifts and adds. On AVX512DQ we'll use vpmullq.
  Fixes PR37140.
  llvm-svn: 330923
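  The intrinsic itself can be plain C (sketch; __v8du is the unsigned 64-bit vector typedef from the headers):

      #include <immintrin.h>

      static inline __m512i mullox_epi64_sketch(__m512i a, __m512i b) {
        // a full 64x64->64 multiply; the backend emulates it on AVX512F
        // and selects vpmullq on AVX512DQ
        return (__m512i)((__v8du)a * (__v8du)b);
      }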
* [X86] Remove '#ifdef __x86_64__' around mask_set1_epi64 intrinsics. (Craig Topper, 2018-04-24, 1 file, -2/+0)
  The unmasked versions already didn't have this restriction. I don't think gcc or icc limit these to 64-bit mode, so we shouldn't either.
  llvm-svn: 330681
* CodeGen tests - typo fixes NFC (Gabor Buella, 2018-04-10, 1 file, -2/+2)
  llvm-svn: 329689
* [X86] Emit native IR for pmuldq/pmuludq builtins. (Craig Topper, 2018-04-09, 1 file, -6/+24)
  I believe all the pieces are now in place in the backend to make this work correctly. We can either mask the input to 32 bits for pmuludq or shl/ashr for pmuldq, and then use a regular mul instruction. The backend should combine this to PMULUDQ/PMULDQ, and SimplifyDemandedBits will remove the and/shifts.
  Differential Revision: https://reviews.llvm.org/D45421
  llvm-svn: 329605
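  Sketched at the C level with vector extensions (the real change emits the equivalent IR in CGBuiltin.cpp):

      typedef long long v8di __attribute__((__vector_size__(64)));
      typedef unsigned long long v8du __attribute__((__vector_size__(64)));

      // pmuludq: keep only the low 32 bits of each 64-bit lane (the "and"),
      // then an ordinary 64-bit multiply; unsigned lanes avoid overflow UB.
      static inline v8di pmuludq_sketch(v8du a, v8du b) {
        return (v8di)((a & 0xffffffff) * (b & 0xffffffff));
      }

      // pmuldq: sign-extend the low 32 bits of each lane via shl/ashr, then
      // multiply; an i32 x i32 product always fits in the i64 lane.
      static inline v8di pmuldq_sketch(v8di a, v8di b) {
        return ((a << 32) >> 32) * ((b << 32) >> 32);
      }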
* [X86] Reverse the operand order of the implementation of the kunpack builtins. (Craig Topper, 2018-02-12, 1 file, -1/+1)
  The second operand needs to be in the lower bits of the concatenation. This matches llvm 5.0, gcc, and icc behavior.
  Fixes PR36360.
  llvm-svn: 324954
* [X86] Change the signature of the AVX512 packed fp compare intrinsics to return a vXi1 mask. Make bitcasts to scalar explicit in IR. (Craig Topper, 2018-02-10, 1 file, -8/+12)
  Summary: This is the clang equivalent of r324827.
  Reviewers: zvi, delena, RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D43143
  llvm-svn: 324828
* [X86] Replace kortest intrinsics with native IR. (Craig Topper, 2018-02-08, 1 file, -6/+18)
  llvm-svn: 324647
* [X86] Implement old kunpck intrinsics using vector ops on vXi1 instead of integer shift/and/or. (Craig Topper, 2018-01-14, 1 file, -6/+6)
  Summary: The kunpck intrinsics were replaced with native IR a few months ago. That implementation lowered them by operating on the integer types passed to the intrinsic and then shifting, masking, and oring the pieces together. A special X86 DAG combine was added to recognize this pattern and turn it into a concat_vector operation.
  I think it makes more sense to keep the IR implementation closer to vector operations on vXi1, given that we expect these builtins to be used around other builtins that operate on k-registers, which we try to represent in IR with vXi1. InstCombine should be able to get rid of the bitcasts between integers and vXi1, leaving only the vector operations.
  Reviewers: RKSimon, spatel, zvi, jina.nahias
  Reviewed By: RKSimon
  Subscribers: cfe-commits
  Differential Revision: https://reviews.llvm.org/D42016
  llvm-svn: 322461
* [X86] Use {{.*}} instead of hardcoded %1 in knot test. (Martin Bohme, 2017-12-18, 1 file, -1/+1)
  This makes the test more resilient and consistent with the other tests introduced in r320919.
  llvm-svn: 320971
* [X86] Implement kand/kandn/kor/kxor/kxnor/knot intrinsics using native IR. (Craig Topper, 2017-12-16, 1 file, -16/+45)
  llvm-svn: 320919
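  In the style of this test file, the kind of IR the new checks look for (approximate, not the verbatim test):

      #include <immintrin.h>

      __mmask16 test_mm512_knot(__mmask16 a) {
        // CHECK: bitcast i16 %{{.*}} to <16 x i1>
        // CHECK: xor <16 x i1> %{{.*}}, <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>
        // CHECK: bitcast <16 x i1> %{{.*}} to i16
        return _mm512_knot(a);
      }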
* [x86][AVX512] Lowering kunpack intrinsics to LLVM IR (Jina Nahias, 2017-12-05, 1 file, -3/+10)
  This patch, together with a matching llvm patch (https://reviews.llvm.org/D39720), implements the lowering of X86 kunpack intrinsics to IR.
  Differential Revision: https://reviews.llvm.org/D39719
  Change-Id: Id5d3cb394ad33b98be79a6783d1d15569e2b798d
  llvm-svn: 319777
* [X86] test/testn intrinsics lowering to IR, clang side (Uriel Korach, 2017-11-13, 1 file, -6/+15)
  Change the intrinsics' header files to lower the test and testn intrinsics to IR code, and remove the test and testn builtins from clang.
  Differential Revision: https://reviews.llvm.org/D38737
  llvm-svn: 318035
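  Approximately what the lowered form looks like (sketch; the exact IR types in the real checks may differ):

      #include <immintrin.h>

      __mmask16 test_mm512_test_epi32_mask(__m512i a, __m512i b) {
        // lowered to: an and of the two vectors, an icmp ne against
        // zeroinitializer on <16 x i32>, and a bitcast of the
        // resulting <16 x i1> mask to i16
        return _mm512_test_epi32_mask(a, b);
      }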
* Change (Jina Nahias, 2017-11-13, 1 file, -3/+3)
  Changed
  // CHECK: shufflevector <8 x double> %0, <8 x double> %{{.*}}, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 8, i32 9>
  to
  // CHECK: shufflevector <8 x double> %{{.*}}, <8 x double> %{{.*}}, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 8, i32 9>
  to fix a warning from the r318025 commit.
  Change-Id: Id48a1fe1f247fe6a0b84e7189f18d2e637678e79
  llvm-svn: 318031
* [x86][AVX512] Lowering shuffle i/f intrinsics to LLVM IR (Jina Nahias, 2017-11-13, 1 file, -12/+20)
  This patch, together with a matching llvm patch (https://reviews.llvm.org/D38671), implements the lowering of X86 shuffle i/f intrinsics to IR.
  Differential Revision: https://reviews.llvm.org/D38672
  Change-Id: I9b3c2f2b34323bd9ccb21d0c1832f848b88ec047
  llvm-svn: 318025
* [x86] make assertions less strict in avx512f test file (Sanjay Patel, 2017-09-25, 1 file, -1/+1)
  Missed a line in r314158.
  llvm-svn: 314159
* [x86] make assertions less strict in avx512f test file (Sanjay Patel, 2017-09-25, 1 file, -16/+16)
  I'm not sure why yet, but there may be differences depending on the host?
  llvm-svn: 314158
* [x86] remove RUNs that were checking fully optimized IR (Sanjay Patel, 2017-09-25, 1 file, -81/+34)
  Clang regression tests that depend on the optimizer can break when there are changes to LLVM, as in: https://reviews.llvm.org/rL314117
  llvm-svn: 314144
* Lowering Mask Set1 intrinsics to LLVM IR (Jina Nahias, 2017-09-19, 1 file, -54/+111)
  This patch, together with a matching llvm patch (https://reviews.llvm.org/D37669), implements the lowering of X86 mask set1 intrinsics to IR.
  Differential Revision: https://reviews.llvm.org/D37668
  llvm-svn: 313624
* [X86] Disable _mm512_maskz_set1_epi64 intrinsic on 32-bit targets to prevent a backend isel failure. (Craig Topper, 2017-09-15, 1 file, -0/+7)
  The __builtin_ia32_pbroadcastq512_mem_mask we were previously trying to use in 32-bit mode is not implemented in the x86 backend and causes isel to fail in release builds. In debug builds it fails even earlier, during legalization, with an llvm_unreachable. While there, add the missing 64-bit mode test case for this intrinsic.
  This fixes PR34631. D37668 should be able to recover this for 32-bit mode soon, but I wanted to fix the crash ahead of that.
  llvm-svn: 313392
* [X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (clang) (Uriel Korach, 2017-09-13, 1 file, -4/+16)
  This patch, together with a matching llvm patch (https://reviews.llvm.org/D37693), implements the lowering of X86 ABS intrinsics to IR.
  Differential Revision: https://reviews.llvm.org/D37694
  llvm-svn: 313133
* [X86][AVX512] _mm512_stream_load_si512 should take a void const* argument (PR33977) (Simon Pilgrim, 2017-09-05, 1 file, -0/+6)
  Based on the Intel Intrinsics guide, we should expect a void const* argument. This prevents "passing 'const void *' to parameter of type 'void *' discards qualifiers" warnings.
  Differential Revision: https://reviews.llvm.org/D37449
  llvm-svn: 312523
* [x86] weaken test checks that shouldn't be here in the first place (Sanjay Patel, 2017-06-27, 1 file, -12/+15)
  This test would fail after the proposed change in: https://reviews.llvm.org/D34242
  llvm-svn: 306433
* [X86][AVX] Added support for _mm256_zext* helper intrinsics (PR32839) (Simon Pilgrim, 2017-04-29, 1 file, -0/+41)
  llvm-svn: 301749
* [X86][SSE] Update MOVNTDQA non-temporal loads to generic implementation (clang) (Simon Pilgrim, 2017-04-14, 1 file, -1/+1)
  MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics.
  LLVM companion patch: D31767.
  Differential Revision: https://reviews.llvm.org/D31766
  llvm-svn: 300326
* [X86][AVX512] Add _mm512_cvtsd_f64 and _mm512_cvtss_f32 intrinsics (PR32305) (Simon Pilgrim, 2017-03-21, 1 file, -0/+12)
  Differential Revision: https://reviews.llvm.org/D31155
  llvm-svn: 298364
* [X86][AVX512][Clang][Intrinsics] Adding missing intrinsics to Clang. (Igor Breger, 2017-03-19, 1 file, -0/+127)
  Summary: Adding missing intrinsics: _mm512_set_epi16, _mm512_set_epi8, _mm512_permutevar_epi32, _mm512_mask_permutevar_epi32
  Reviewers: zvi, guyblank, eladcohen, craig.topper
  Reviewed By: craig.topper
  Subscribers: craig.topper, cfe-commits
  Differential Revision: https://reviews.llvm.org/D31034
  llvm-svn: 298208
* [x86] these aren't the undefs you're looking for (PR32176) (Sanjay Patel, 2017-03-12, 1 file, -37/+37)
  x86 has undef SSE/AVX intrinsics that should represent a bogus register operand. This is not the same as LLVM's undef value, which can take on multiple bit patterns. There are better solutions / follow-ups to this discussed here: https://bugs.llvm.org/show_bug.cgi?id=32176 ...but this should prevent miscompiles with a one-line code change.
  Differential Revision: https://reviews.llvm.org/D30834
  llvm-svn: 297588
* [AVX-512] Replace subvector broadcast builtins with shufflevectors and selects. (Craig Topper, 2017-01-18, 1 file, -36/+44)
  Verified that the backend codegens this equally well.
  llvm-svn: 292329
* [AVX-512] Replace masked 512-bit pmuldq and pmuludq builtins with the newly added unmasked versions and selects. (Craig Topper, 2016-12-27, 1 file, -4/+20)
  llvm-svn: 290580
* Revert r290575 "[AVX-512] Replace masked 512-bit pmuldq and pmuludq builtins with the newly added unmasked versions and selects." (Craig Topper, 2016-12-27, 1 file, -20/+4)
  I failed to merge this with r290574.
  llvm-svn: 290578
* [AVX-512] Replace masked 512-bit pmuldq and pmuludq builtins with the newly added unmasked versions and selects. (Craig Topper, 2016-12-27, 1 file, -4/+20)
  llvm-svn: 290575
* [AVX-512] Remove masking from 512-bit vpermil builtins. The backend now has versions without masking, so wrap them with a select. (Craig Topper, 2016-12-11, 1 file, -6/+10)
  This will allow the backend to constant fold these to generic shuffle vectors, like the 128-bit and 256-bit versions, without having to worry about handling masking.
  llvm-svn: 289351
* [X86] Replace valignd/q builtins with appropriate __builtin_shufflevector. (Craig Topper, 2016-11-23, 1 file, -6/+10)
  llvm-svn: 287733
* [X86][AVX512] Replace lossless i32/u32 to f64 conversion intrinsics with generic IR (Simon Pilgrim, 2016-11-16, 1 file, -12/+32)
  Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics without affecting final codegen. This patch removes the clang builtins and their use in the headers; a future patch will deal with removing the llvm intrinsics. This is an extension patch to D20528, which dealt with the equivalent sse/avx cases.
  Differential Revision: https://reviews.llvm.org/D26686
  llvm-svn: 287088
* [AVX-512] Replace masked dword and qword variable shift builtins with unmasked builtins and a select. (Craig Topper, 2016-11-13, 1 file, -18/+30)
  This is part of a set of changes to allow InstCombine in the backend to optimize variable shifts without having to know about masking.
  llvm-svn: 286757
* [AVX-512] Use scalar vfmsub/vfnmsub mask3 intrinsics instead of inverting the mask argument of a vfmadd intrinsic. (Craig Topper, 2016-11-12, 1 file, -8/+8)
  Summary: Inverting the mask argument does not reflect the intended semantics of the intrinsic.
  Reviewers: igorb, delena
  Subscribers: cfe-commits
  Differential Revision: https://reviews.llvm.org/D26019
  llvm-svn: 286733
* [AVX-512] Convert the rest of the masked shift by immediate and by single element builtins over to the newly added unmasked builtins and a select. (Craig Topper, 2016-11-12, 1 file, -36/+59)
  This should also fix PR30691, since the new builtins are handled like the legacy builtins in the backend.
  llvm-svn: 286714