summaryrefslogtreecommitdiffstats
path: root/clang/test/CodeGen/avx2-builtins.c
Commit message (Collapse)AuthorAgeFilesLines
* [X86] Restore the pavg intrinsics.Craig Topper2019-04-151-14/+2
| | | | | | | | | | | | | | | The pattern we replaced these with may be too hard to match as demonstrated by PR41496 and PR41316. This patch restores the intrinsics and then we can start focusing on the optimizing the intrinsics. I've mostly reverted the original patch that removed them. Though I modified the avx512 intrinsics to not have masking built in. Differential Revision: https://reviews.llvm.org/D60674 llvm-svn: 358427
* [X86] Add shift-by-immediate tests for non-immediate/out-of-range valuesSimon Pilgrim2019-01-081-0/+48
| | | | | | As noted on PR40203, for gcc compatibility we need to support non-immediate values in the 'slli/srli/srai' shift by immediate vector intrinsics. llvm-svn: 350619
* [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic ↵Simon Pilgrim2018-12-201-4/+4
| | | | | | | | | | | | intrinsics (clang) This emits SADD_SAT/SSUB_SAT generic intrinsics for the SSE signed saturated math intrinsics. LLVM counterpart: https://reviews.llvm.org/D55894 Differential Revision: https://reviews.llvm.org/D55890 llvm-svn: 349743
* [X86][SSE] Auto upgrade PADDUS/PSUBUS intrinsics to UADD_SAT/USUB_SAT ↵Simon Pilgrim2018-12-191-12/+4
| | | | | | | | | | generic intrinsics (clang) Sibling patch to D55855, this emits UADD_SAT/USUB_SAT generic intrinsics for the SSE saturated math intrinsics instead of expanding to a IR code sequence that could be difficult to reassemble. Differential Revision: https://reviews.llvm.org/D55879 llvm-svn: 349631
* [X86] Lowering addus/subus intrinsics to native IRTomasz Krupa2018-08-141-4/+16
| | | | | | | | | | | | | | Summary: This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations. Reviewers: craig.topper, spatel, RKSimon Reviewed By: craig.topper Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D46892 llvm-svn: 339651
* [X86] Lower _mm[256|512]_cmp[.]_mask intrinsics to native llvm IRGabor Buella2018-06-221-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Lowering some vector comparision builtins to fcmp IR instructions. This ignores the signaling behaviour specified in the predicate argument of said builtins. Affected AVX512 builtins: __builtin_ia32_cmpps128_mask __builtin_ia32_cmpps256_mask __builtin_ia32_cmpps512_mask __builtin_ia32_cmppd128_mask __builtin_ia32_cmppd256_mask __builtin_ia32_cmppd512_mask Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma Reviewed By: craig.topper, spatel, efriedma Differential Revision: https://reviews.llvm.org/D45616 llvm-svn: 335339
* [X86] Add builtins for vpermq/vpermpd instructions to enable target feature ↵Craig Topper2018-06-081-2/+2
| | | | | | checking. llvm-svn: 334311
* [X86] Add builtins for pshufd, pshuflw, and pshufhw to enable target feature ↵Craig Topper2018-06-081-3/+3
| | | | | | and immediate range checking. llvm-svn: 334265
* [X86] Add subvector insert and extract builtins to enable target feature ↵Craig Topper2018-06-081-8/+8
| | | | | | | | checking and immediate range checking. Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc. llvm-svn: 334261
* [X86] Add builtins for blend with immediate control to enforce target ↵Craig Topper2018-06-081-1/+1
| | | | | | feature requirements and check immediate range. llvm-svn: 334249
* [X86] Add back builtins for _mm_slli_si128/_mm_srli_si128 and similar ↵Craig Topper2018-06-071-4/+4
| | | | | | | | | | intrinsics. We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits. This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously. llvm-svn: 334208
* [X86] NFC Include immintrin.h in CodeGen testsGabor Buella2018-05-241-1/+1
| | | | | | | Following r333110: "Move all Intel defined intrinsic includes into immintrin.h" llvm-svn: 333160
* [x86] Revert r330322 (& r330323): Lowering x86 adds/addus/subs/subus intrinsicsChandler Carruth2018-04-261-58/+8
| | | | | | | | The LLVM commit introduces a crash in LLVM's instruction selection. I filed http://llvm.org/PR37260 with the test case. llvm-svn: 330997
* Lowering x86 adds/addus/subs/subus intrinsics (clang)Alexander Ivchenko2018-04-191-8/+58
| | | | | | | | | | | This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations. Patch by tkrupa Differential Revision: https://reviews.llvm.org/D44786 llvm-svn: 330323
* [X86] Emit native IR for pmuldq/pmuludq builtins.Craig Topper2018-04-091-2/+8
| | | | | | | | I believe all the pieces are now in place in the backend to make this work correctly. We can either mask the input to 32 bits for pmuludg or shl/ashr for pmuldq and use a regular mul instruction. The backend should combine this to PMULUDQ/PMULDQ and then SimplifyDemandedBits will remove the and/shifts. Differential Revision: https://reviews.llvm.org/D45421 llvm-svn: 329605
* [X86] Use native shuffle vector for the perm2f128 intrinsicsCraig Topper2017-09-151-2/+2
| | | | | | | | | | This patch replaces the perm2f128 intrinsics with native shuffle vectors. This uses a pretty simple approach to allocate source 0 to the lower half input and source 1 to the upper half input. Then its just a matter of filling in the indices to use either the lower or upper half of that specific source. This can result in the same source being used by both operands. InstCombine or SelectionDAGBuilder should be able to clean that up. Differential Revision: https://reviews.llvm.org/D37892 llvm-svn: 313418
* [X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (clang)Uriel Korach2017-09-131-3/+9
| | | | | | | | This patch, together with a matching llvm patch (https://reviews.llvm.org/D37693), implements the lowering of X86 ABS intrinsics to IR. Differential Revision: https://reviews.llvm.org/D37694 llvm-svn: 313133
* [X86] Lower _mm[256|512]_[mask[z]]_avg_epu[8|16] intrinsics to native llvm IRYael Tsafrir2017-09-121-2/+14
| | | | | | Differential Revision: https://reviews.llvm.org/D37562 llvm-svn: 313011
* [X86][SSE] Update MOVNTDQA non-temporal loads to generic implementation (clang)Simon Pilgrim2017-04-141-1/+1
| | | | | | | | | | MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics. LLVM companion patch: D31767. Differential Revision: https://reviews.llvm.org/D31766 llvm-svn: 300326
* [x86] these aren't the undefs you're looking for (PR32176)Sanjay Patel2017-03-121-17/+17
| | | | | | | | | | | | | x86 has undef SSE/AVX intrinsics that should represent a bogus register operand. This is not the same as LLVM's undef value which can take on multiple bit patterns. There are better solutions / follow-ups to this discussed here: https://bugs.llvm.org/show_bug.cgi?id=32176 ...but this should prevent miscompiles with a one-line code change. Differential Revision: https://reviews.llvm.org/D30834 llvm-svn: 297588
* [X86] Remove the mm_malloc.h include guard hack from the X86 builtins testsElad Cohen2016-09-281-4/+2
| | | | | | | | | | | | The X86 clang/test/CodeGen/*builtins.c tests define the mm_malloc.h include guard as a hack for avoiding its inclusion (mm_malloc.h requires a hosted environment since it expects stdlib.h to be available - which is not the case in these internal clang codegen tests). This patch removes this hack and instead passes -ffreestanding to clang cc1. Differential Revision: https://reviews.llvm.org/D24825 llvm-svn: 282581
* After PR28761 use -Wall with -Werror in builtins tests to identifyEric Christopher2016-08-041-2/+2
| | | | | | possible problems in headers. llvm-svn: 277696
* [X86] Use native IR for immediate values 0-7 of packed fp cmp builtins. This ↵Craig Topper2016-07-061-3/+9
| | | | | | makes them the same as what is done when using the SSE builtins for these same encodings. llvm-svn: 274608
* [X86] Use undefined instead of setzero in shufflevector based intrinsics ↵Craig Topper2016-07-041-5/+5
| | | | | | when the second source is unused. Rewrite immediate extractions in shuffle intrinsics to be in ((c >> x) & y) form instead of ((c & z) >> x). This way only x varies between each use instead of having to vary x and z. llvm-svn: 274525
* [x86] generate IR for AVX2 integer min/max builtinsSanjay Patel2016-06-161-12/+24
| | | | | | | Sibling patch to r272932: http://reviews.llvm.org/rL272932 llvm-svn: 272933
* [x86] translate SSE packed FP comparison builtins to IRSanjay Patel2016-06-151-5/+15
| | | | | | | | | | | | | As noted in the code comment, a potential follow-on would be to remove the builtins themselves. Other than ord/unord, this already works as expected. Eg: typedef float v4sf __attribute__((__vector_size__(16))); v4sf fcmpgt(v4sf a, v4sf b) { return a > b; } Differential Revision: http://reviews.llvm.org/D21268 llvm-svn: 272840
* [X86] Handle AVX2 pslldqi and psrldqi intrinsics shufflevector creation ↵Craig Topper2016-06-091-4/+4
| | | | | | directly in the header file instead of in CGBuiltin.cpp. Simplify the sse2 equivalents as well. llvm-svn: 272246
* [X86][SSE] Replace VPMOVSX and (V)PMOVZX integer extension intrinsics with ↵Simon Pilgrim2016-05-281-12/+18
| | | | | | | | | | | | | | generic IR (clang) The VPMOVSX and (V)PMOVZX sign/zero extension intrinsics can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics. This patch removes the clang builtins and their use in the sse2/avx headers - a companion patch will remove/auto-upgrade the llvm intrinsics. Note: We already did this for SSE41 PMOVSX sometime ago. Differential Revision: http://reviews.llvm.org/D20684 llvm-svn: 271106
* [X86][AVX2] Improved checks for float/double mask generation for non-masked ↵Simon Pilgrim2016-05-261-0/+8
| | | | | | gathers llvm-svn: 270833
* [X86][AVX2] Full set of AVX2 intrinsics testsSimon Pilgrim2016-05-251-600/+745
| | | | | | llvm/test/CodeGen/X86/avx2-intrinsics-fast-isel.ll will be synced to this llvm-svn: 270708
* [X86][AVX2] Stripped backend codegen testsSimon Pilgrim2015-12-081-207/+1
| | | | | | | | As discussed on the ml, backend tests need to be put in llvm/test/CodeGen/X86 as fast-isel tests using IR that is as close to what is generated here as possible. The llvm tests will (re)added in a future commit. llvm-svn: 255050
* [X86] _mm256_permutevar8x32_ps should take an integer vector for its shuffle ↵Craig Topper2015-11-291-1/+1
| | | | | | index input. llvm-svn: 254270
* Canonicalize some of the x86 builtin tests and either remove or commentEric Christopher2015-10-141-5/+5
| | | | | | about optimization options. llvm-svn: 250271
* [Headers][X86] Fix stream_load (movntdqa) to accept const*.Ahmed Bougacha2015-10-021-1/+1
| | | | | | | | | | Per Intel intrinsics guide: - _mm256_stream_load_si256 takes `__m256i const *' - _mm_stream_load_si128 takes `__m128i *', for no good reason. Let's accept const* for both. llvm-svn: 249213
* Make test more resilient to FastIsel changes. NFC.Andrea Di Biagio2015-10-021-6/+6
| | | | | | | | | | | | | | | | | | | | | | Currently FastISel doesn't know how to select vector bitcasts. During instruction selection, fast-isel always falls back to SelectionDAG every time it encounters a vector bitcast. As a consequence of this, all the 'packed vector shift by immedate count' test cases in avx2-builtins.c are optimized by the DAGCombiner. In particular, the DAGCombiner would always fold trivial stack loads of constant shift counts into the operands of packed shift builtins. This behavior would start changing as soon as I reapply revision 249121. That revision would teach x86 fast-isel how to select bitcasts between vector types of the same size. As a consequence of that change, fast-isel would less often fall back to SelectionDAG. More importantly, DAGCombiner would no longer be able to simplify the code by folding the stack reload of a constant. No functional change. llvm-svn: 249142
* Fix the SSE4 byte sign extension in a cleaner way, and more thoroughlyChandler Carruth2015-10-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | test that our intrinsics behave the same under -fsigned-char and -funsigned-char. This further testing uncovered that AVX-2 has a broken cmpgt for 8-bit elements, and has for a long time. This is fixed in the same way as SSE4 handles the case. The other ISA extensions currently work correctly because they use specific instruction intrinsics. As soon as they are rewritten in terms of generic IR, they will need to add these special casts. I've added the necessary testing to catch this however, so we shouldn't have to chase it down again. I considered changing the core typedef to be signed, but that seems like a bad idea. Notably, it would be an ABI break if anyone is reaching into the innards of the intrinsic headers and passing __v16qi on an API boundary. I can't be completely confident that this wouldn't happen due to a macro expanding in a lambda, etc., so it seems much better to leave it alone. It also matches GCC's behavior exactly. A fun side note is that for both GCC and Clang, -funsigned-char really does change the semantics of __v16qi. To observe this, consider: % cat x.cc #include <smmintrin.h> #include <iostream> int main() { __v16qi a = { 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}; __v16qi b = _mm_set1_epi8(-1); std::cout << (int)(a / b)[0] << ", " << (int)(a / b)[1] << '\n'; } % clang++ -o x x.cc && ./x -1, 1 % clang++ -funsigned-char -o x x.cc && ./x 0, 1 However, while this may be surprising, both Clang and GCC agree. Differential Revision: http://reviews.llvm.org/D13324 llvm-svn: 249097
* [Headers] Require x86-registered for r245987 codegen tests.Ahmed Bougacha2015-08-251-0/+2
| | | | llvm-svn: 245992
* [Headers][X86] Add -O0 assembly tests for avx2 intrinsics.Ahmed Bougacha2015-08-251-0/+229
| | | | | | | | | | | | We agreed for r245605 that, as long as we don't affect -O0 codegen too much, it's OK to use native constructs rather than intrinsics. Let's test that, starting with AVX2 here. See PR24580. Differential Revision: http://reviews.llvm.org/D12212 llvm-svn: 245987
* [Headers][X86] Use __builtin_shufflevector in AVX2 broadcasts.Ahmed Bougacha2015-08-201-11/+33
| | | | | | | | | | This lets us optimize them better. We agreed to remove the intrinsics, instead of combining them later, as, at -O0, we generate the expected instructions. Plus, it's a nice cleanup. Differential Revision: http://reviews.llvm.org/D10556 llvm-svn: 245605
* [X86] Add _mm_broadcastsd_pd intrinsicMichael Kuperstein2015-05-191-0/+5
| | | | | | _mm_broadcastsd_pd is basically an alias for _mm_movedup_pd, however the alias is only available from AVX2 forward. llvm-svn: 237698
* [X86] Added _mm256_bslli_epi128 and _mm256_bsrli_epi128.Michael Kuperstein2015-05-191-0/+10
| | | | | | These two intrinsics are alternative names for _mm256_slli_si256 and _mm256_srli_si256, respectively. llvm-svn: 237693
* [X86, AVX2] Replace inserti128 and extracti128 intrinsics with generic shufflesSanjay Patel2015-03-121-4/+32
| | | | | | | | | | | | This is nearly identical to the v*f128_si256 parts of r231792 and r232052. AVX2 introduced proper integer variants of the hacked integer insert/extract C intrinsics that were created for this same functionality with AVX1. This should complete the front end fixes for insert/extract128 intrinsics. Corresponding LLVM patch to follow. llvm-svn: 232109
* Lower _mm256_broadcastsi128_si256 directly to a vector shuffle.Juergen Ributzka2015-03-031-1/+1
| | | | | | | | | | | | | | Originally we were using the same GCC builtins to lower this AVX2 vector intrinsic. Instead we will now lower it directly to a vector shuffle. This will not only allow LLVM to generate better code, but it will also allow us to remove the GCC intrinsics. Reviewed by Andrea This is related to rdar://problem/18742778. llvm-svn: 231081
* Make tests independent of llvm variable naming.Manuel Klimek2015-02-171-1/+1
| | | | llvm-svn: 229484
* [X86] Convert palignr builtin handling to use shuffle form of right shift ↵Craig Topper2015-02-171-1/+1
| | | | | | instead of intrinsics. This should allow the instrinsics to removed from the backend. llvm-svn: 229474
* [X86] Teach clang to lower __builtin_ia32_psrldqi256 and ↵Craig Topper2015-02-161-2/+2
| | | | | | __builtin_ia32_pslldqi256 to vector shuffles the backend recognizes. This is a step towards removing the corresponding intrinsics from the backend. llvm-svn: 229348
* [x86] Clean up the x86 builtin specs to reflect r217310 in LLVM whichChandler Carruth2014-09-061-1/+1
| | | | | | | | | | | | | | | made the 8-bit masks actually 8-bit arguments to these intrinsics. These builtins are a mess. Many were missing the I qualifier which I added where obviously correct. Most aren't tested, but I've updated the relevant tests. I've tried to catch all the things that should become 'c' in this round. It's also frustrating because the set of these is really ad-hoc and doesn't really map that cleanly to the set supported by either GCC or LLVM. Oh well... llvm-svn: 217311
* Fixed a few tests and moved a comment to its proper placeFilipe Cabecinhas2014-05-131-5/+8
| | | | llvm-svn: 208665
* Patched clang to emit x86 blends as shufflevectors.Filipe Cabecinhas2014-05-131-5/+11
| | | | | | | | | | | | | | | | | Summary: Most of the clang header patch by Simon Pilgrim @ SCEE. Also fixed (or added) clang tests for these intrinsics. LLVM tests to make sure we get the blend instruction out of these shufflevectors are at http://reviews.llvm.org/D3600 Reviewers: eli.friedman, craig.topper, rafael Subscribers: cfe-commits Differential Revision: http://reviews.llvm.org/D3601 llvm-svn: 208664
* Fix test to not depend on llvm optimizations.Michael J. Spencer2014-04-241-4/+4
| | | | llvm-svn: 207062
OpenPOWER on IntegriCloud