path: root/clang/test/CodeGen/avx2-builtins.c
Commit message | Author | Age | Files | Lines
* [X86] Use native shuffle vector for the perm2f128 intrinsics
  Craig Topper | 2017-09-15 | 1 file | -2/+2
  This patch replaces the perm2f128 intrinsics with native shuffle vectors. It takes a fairly simple approach: source 0 is allocated to the lower half input and source 1 to the upper half input; then it's just a matter of filling in the indices to use either the lower or upper half of that specific source. This can result in the same source being used by both operands, but InstCombine or SelectionDAGBuilder should be able to clean that up.
  Differential Revision: https://reviews.llvm.org/D37892
  llvm-svn: 313418
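  For illustration, the immediate's two 2-bit selectors each pick a 128-bit lane of either source. A minimal sketch of one case, with a hypothetical helper name (not the actual header change):

    /* _mm256_permute2f128_pd(a, b, 0x31) selects the high lane of each
       source. With __builtin_shufflevector, indices 0-3 address a and
       4-7 address b, so the high lanes are {2,3} and {6,7}. */
    #include <immintrin.h>
    static __m256d hi_lanes(__m256d a, __m256d b) {
      return __builtin_shufflevector(a, b, 2, 3, 6, 7);
    }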
* [X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (clang)
  Uriel Korach | 2017-09-13 | 1 file | -3/+9
  This patch, together with a matching llvm patch (https://reviews.llvm.org/D37693), implements the lowering of X86 ABS intrinsics to IR.
  Differential Revision: https://reviews.llvm.org/D37694
  llvm-svn: 313133
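  The generic form amounts to a per-lane compare and negate. A minimal sketch of the same semantics using clang vector extensions (illustrative names; the lowering itself emits compare/select IR rather than this xor trick):

    /* A vector compare yields an all-ones/all-zeros lane mask, and
       (a ^ m) - m negates exactly the lanes where m is all-ones. */
    typedef int v8si __attribute__((__vector_size__(32)));
    static v8si abs_epi32(v8si a) {
      v8si m = a < 0;        /* -1 where negative, 0 elsewhere */
      return (a ^ m) - m;    /* conditional two's-complement negate */
    }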
* [X86] Lower _mm[256|512]_[mask[z]]_avg_epu[8|16] intrinsics to native llvm IR
  Yael Tsafrir | 2017-09-12 | 1 file | -2/+14
  Differential Revision: https://reviews.llvm.org/D37562
  llvm-svn: 313011
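  PAVG rounds up and must not lose the carry out of the element type, so the native-IR form widens before adding. A per-element sketch of the semantics (illustrative helper, not the header code):

    /* Unsigned rounding average: avg = (a + b + 1) >> 1, computed in a
       wider type so the intermediate sum cannot overflow. */
    static unsigned char avg_epu8_elem(unsigned char a, unsigned char b) {
      return (unsigned char)(((unsigned)a + (unsigned)b + 1u) >> 1);
    }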
* [X86][SSE] Update MOVNTDQA non-temporal loads to generic implementation (clang)
  Simon Pilgrim | 2017-04-14 | 1 file | -1/+1
  MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics.
  LLVM companion patch: D31767.
  Differential Revision: https://reviews.llvm.org/D31766
  llvm-svn: 300326
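  The generic builtin here is clang's __builtin_nontemporal_load, which carries the non-temporal hint as IR metadata. A simplified sketch of the shape this takes (not the exact header text):

    #include <immintrin.h>
    typedef long long v4di __attribute__((__vector_size__(32)));
    static __m256i stream_load(__m256i const *p) {
      /* The load carries !nontemporal metadata in the emitted IR. */
      return (__m256i)__builtin_nontemporal_load((const v4di *)p);
    }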
* [x86] these aren't the undefs you're looking for (PR32176)
  Sanjay Patel | 2017-03-12 | 1 file | -17/+17
  x86 has undef SSE/AVX intrinsics that should represent a bogus register operand. This is not the same as LLVM's undef value, which can take on multiple bit patterns. There are better solutions / follow-ups to this discussed here:
  https://bugs.llvm.org/show_bug.cgi?id=32176
  ...but this should prevent miscompiles with a one-line code change.
  Differential Revision: https://reviews.llvm.org/D30834
  llvm-svn: 297588
* [X86] Remove the mm_malloc.h include guard hack from the X86 builtins tests
  Elad Cohen | 2016-09-28 | 1 file | -4/+2
  The X86 clang/test/CodeGen/*builtins.c tests define the mm_malloc.h include guard as a hack to avoid its inclusion (mm_malloc.h requires a hosted environment, since it expects stdlib.h to be available - which is not the case in these internal clang codegen tests). This patch removes the hack and instead passes -ffreestanding to clang cc1.
  Differential Revision: https://reviews.llvm.org/D24825
  llvm-svn: 282581
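  In lit terms this swaps a #define for a driver flag. A hypothetical before/after of a test preamble (the exact guard name and RUN line in the tree may differ):

    // Before: pre-define the include guard so mm_malloc.h is skipped.
    //   #define __MM_MALLOC_H
    //   #include <immintrin.h>
    // After: tell the compiler not to expect hosted headers at all.
    //   RUN: %clang_cc1 -ffreestanding %s -triple=x86_64-apple-darwin \
    //   RUN:   -target-feature +avx2 -emit-llvm -o - | FileCheck %s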
* After PR28761 use -Wall with -Werror in builtins tests to identify possible problems in headers.
  Eric Christopher | 2016-08-04 | 1 file | -2/+2
  llvm-svn: 277696
* [X86] Use native IR for immediate values 0-7 of packed fp cmp builtins.
  Craig Topper | 2016-07-06 | 1 file | -3/+9
  This makes them the same as what is done when using the SSE builtins for these same encodings.
  llvm-svn: 274608
* [X86] Use undefined instead of setzero in shufflevector-based intrinsics when the second source is unused.
  Craig Topper | 2016-07-04 | 1 file | -5/+5
  Rewrite immediate extractions in shuffle intrinsics to be in ((c >> x) & y) form instead of ((c & z) >> x). This way only x varies between each use, instead of having to vary both x and z.
  llvm-svn: 274525
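  The gain from the ((c >> x) & y) form is that the mask stays constant while only the shift varies. A comment-style illustration for the 2-bit fields of an 8-bit immediate:

    /* Old form: mask first, then shift - both constants vary per field:
         ((imm & 0x03) >> 0), ((imm & 0x0c) >> 2), ((imm & 0x30) >> 4)
       New form: shift first, then mask - the mask is always 0x3:
         ((imm >> 0) & 0x3), ((imm >> 2) & 0x3), ((imm >> 4) & 0x3) */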
* [x86] generate IR for AVX2 integer min/max builtins
  Sanjay Patel | 2016-06-16 | 1 file | -12/+24
  Sibling patch to r272932: http://reviews.llvm.org/rL272932
  llvm-svn: 272933
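  The native-IR pattern for integer min/max is a compare feeding a select. A sketch of the same semantics with clang vector extensions (illustrative names; the builtins themselves lower to compare/select IR):

    typedef int v8si __attribute__((__vector_size__(32)));
    static v8si max_epi32(v8si a, v8si b) {
      v8si m = a > b;               /* lane mask: -1 where a wins */
      return (a & m) | (b & ~m);    /* branchless per-lane select */
    }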
* [x86] translate SSE packed FP comparison builtins to IR
  Sanjay Patel | 2016-06-15 | 1 file | -5/+15
  As noted in the code comment, a potential follow-on would be to remove the builtins themselves. Other than ord/unord, this already works as expected. E.g.:

    typedef float v4sf __attribute__((__vector_size__(16)));
    v4sf fcmpgt(v4sf a, v4sf b) { return a > b; }

  Differential Revision: http://reviews.llvm.org/D21268
  llvm-svn: 272840
* [X86] Handle AVX2 pslldqi and psrldqi intrinsics' shufflevector creation directly in the header file instead of in CGBuiltin.cpp.
  Craig Topper | 2016-06-09 | 1 file | -4/+4
  Simplify the sse2 equivalents as well.
  llvm-svn: 272246
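  A byte-granularity shift maps onto a shuffle against a zero vector. A sketch for a 16-byte left shift by 4 (illustrative, not the header's macro):

    typedef char v16qi __attribute__((__vector_size__(16)));
    static v16qi bslli_4(v16qi a) {
      v16qi zero = {0};
      /* Indices 0-15 pick from zero and 16-31 pick from a, so result[i]
         is 0 for i < 4 and a[i-4] otherwise: a left shift by 4 bytes. */
      return __builtin_shufflevector(zero, a,
                                     0, 1, 2, 3, 16, 17, 18, 19,
                                     20, 21, 22, 23, 24, 25, 26, 27);
    }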
* [X86][SSE] Replace VPMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (clang)
  Simon Pilgrim | 2016-05-28 | 1 file | -12/+18
  The VPMOVSX and (V)PMOVZX sign/zero extension intrinsics can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics. This patch removes the clang builtins and their use in the sse2/avx headers - a companion patch will remove/auto-upgrade the llvm intrinsics.
  Note: We already did this for SSE41 PMOVSX some time ago.
  Differential Revision: http://reviews.llvm.org/D20684
  llvm-svn: 271106
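  __builtin_convertvector converts element-wise, and the extension kind follows the signedness of the source element type. A sketch of a sign-extending conversion (illustrative types and name):

    typedef short v8hi __attribute__((__vector_size__(16)));
    typedef int   v8si __attribute__((__vector_size__(32)));
    static v8si sext_epi16_epi32(v8hi a) {
      /* Signed source lanes make this a sign extension (vpmovsxwd);
         unsigned lanes would give a zero extension instead. */
      return __builtin_convertvector(a, v8si);
    }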
* [X86][AVX2] Improved checks for float/double mask generation for non-masked gathers
  Simon Pilgrim | 2016-05-26 | 1 file | -0/+8
  llvm-svn: 270833
* [X86][AVX2] Full set of AVX2 intrinsics tests
  Simon Pilgrim | 2016-05-25 | 1 file | -600/+745
  llvm/test/CodeGen/X86/avx2-intrinsics-fast-isel.ll will be synced to this.
  llvm-svn: 270708
* [X86][AVX2] Stripped backend codegen tests
  Simon Pilgrim | 2015-12-08 | 1 file | -207/+1
  As discussed on the mailing list, backend tests need to be put in llvm/test/CodeGen/X86 as fast-isel tests using IR that is as close as possible to what is generated here. The llvm tests will be (re)added in a future commit.
  llvm-svn: 255050
* [X86] _mm256_permutevar8x32_ps should take an integer vector for its shuffle index input.
  Craig Topper | 2015-11-29 | 1 file | -1/+1
  llvm-svn: 254270
* Canonicalize some of the x86 builtin tests and either remove or comment about optimization options.
  Eric Christopher | 2015-10-14 | 1 file | -5/+5
  llvm-svn: 250271
* [Headers][X86] Fix stream_load (movntdqa) to accept const*.
  Ahmed Bougacha | 2015-10-02 | 1 file | -1/+1
  Per Intel intrinsics guide:
  - _mm256_stream_load_si256 takes `__m256i const *'
  - _mm_stream_load_si128 takes `__m128i *', for no good reason.
  Let's accept const* for both.
  llvm-svn: 249213
* Make test more resilient to FastIsel changes. NFC.
  Andrea Di Biagio | 2015-10-02 | 1 file | -6/+6
  Currently FastISel doesn't know how to select vector bitcasts. During instruction selection, fast-isel always falls back to SelectionDAG every time it encounters a vector bitcast. As a consequence of this, all the 'packed vector shift by immediate count' test cases in avx2-builtins.c are optimized by the DAGCombiner. In particular, the DAGCombiner would always fold trivial stack loads of constant shift counts into the operands of packed shift builtins.
  This behavior would start changing as soon as I reapply revision 249121. That revision would teach x86 fast-isel how to select bitcasts between vector types of the same size. As a consequence of that change, fast-isel would less often fall back to SelectionDAG. More importantly, the DAGCombiner would no longer be able to simplify the code by folding the stack reload of a constant.
  No functional change.
  llvm-svn: 249142
* Fix the SSE4 byte sign extension in a cleaner way, and more thoroughly test that our intrinsics behave the same under -fsigned-char and -funsigned-char.
  Chandler Carruth | 2015-10-01 | 1 file | -0/+2
  This further testing uncovered that AVX-2 has a broken cmpgt for 8-bit elements, and has for a long time. This is fixed in the same way as SSE4 handles the case.
  The other ISA extensions currently work correctly because they use specific instruction intrinsics. As soon as they are rewritten in terms of generic IR, they will need to add these special casts. I've added the necessary testing to catch this however, so we shouldn't have to chase it down again.
  I considered changing the core typedef to be signed, but that seems like a bad idea. Notably, it would be an ABI break if anyone is reaching into the innards of the intrinsic headers and passing __v16qi on an API boundary. I can't be completely confident that this wouldn't happen due to a macro expanding in a lambda, etc., so it seems much better to leave it alone. It also matches GCC's behavior exactly.
  A fun side note is that for both GCC and Clang, -funsigned-char really does change the semantics of __v16qi. To observe this, consider:

    % cat x.cc
    #include <smmintrin.h>
    #include <iostream>

    int main() {
      __v16qi a = { 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
      __v16qi b = _mm_set1_epi8(-1);
      std::cout << (int)(a / b)[0] << ", " << (int)(a / b)[1] << '\n';
    }
    % clang++ -o x x.cc && ./x
    -1, 1
    % clang++ -funsigned-char -o x x.cc && ./x
    0, 1

  However, while this may be surprising, both Clang and GCC agree.
  Differential Revision: http://reviews.llvm.org/D13324
  llvm-svn: 249097
* [Headers] Require x86-registered for r245987 codegen tests.
  Ahmed Bougacha | 2015-08-25 | 1 file | -0/+2
  llvm-svn: 245992
* [Headers][X86] Add -O0 assembly tests for avx2 intrinsics.
  Ahmed Bougacha | 2015-08-25 | 1 file | -0/+229
  We agreed for r245605 that, as long as we don't affect -O0 codegen too much, it's OK to use native constructs rather than intrinsics. Let's test that, starting with AVX2 here. See PR24580.
  Differential Revision: http://reviews.llvm.org/D12212
  llvm-svn: 245987
* [Headers][X86] Use __builtin_shufflevector in AVX2 broadcasts.
  Ahmed Bougacha | 2015-08-20 | 1 file | -11/+33
  This lets us optimize them better. We agreed to remove the intrinsics instead of combining them later, as, at -O0, we generate the expected instructions. Plus, it's a nice cleanup.
  Differential Revision: http://reviews.llvm.org/D10556
  llvm-svn: 245605
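  A broadcast is just a shuffle that repeats source lanes. Two sketches, with illustrative helper names (not the header definitions):

    typedef long long v2di __attribute__((__vector_size__(16)));
    typedef long long v4di __attribute__((__vector_size__(32)));

    /* vpbroadcastq-style: repeat element 0 across all four lanes. */
    static v4di broadcast_elem(v4di a) {
      return __builtin_shufflevector(a, a, 0, 0, 0, 0);
    }

    /* vbroadcasti128-style: widen by repeating the 128-bit input. */
    static v4di broadcast_lane(v2di a) {
      return __builtin_shufflevector(a, a, 0, 1, 0, 1);
    }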
* [X86] Add _mm_broadcastsd_pd intrinsic
  Michael Kuperstein | 2015-05-19 | 1 file | -0/+5
  _mm_broadcastsd_pd is basically an alias for _mm_movedup_pd; however, the alias is only available from AVX2 onward.
  llvm-svn: 237698
* [X86] Added _mm256_bslli_epi128 and _mm256_bsrli_epi128.
  Michael Kuperstein | 2015-05-19 | 1 file | -0/+10
  These two intrinsics are alternative names for _mm256_slli_si256 and _mm256_srli_si256, respectively.
  llvm-svn: 237693
* [X86, AVX2] Replace inserti128 and extracti128 intrinsics with generic shuffles
  Sanjay Patel | 2015-03-12 | 1 file | -4/+32
  This is nearly identical to the v*f128_si256 parts of r231792 and r232052. AVX2 introduced proper integer variants of the hacked integer insert/extract C intrinsics that were created for this same functionality with AVX1.
  This should complete the front end fixes for insert/extract128 intrinsics. Corresponding LLVM patch to follow.
  llvm-svn: 232109
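  The extract half of the pair narrows through a shuffle. A minimal sketch (illustrative name):

    typedef long long v4di __attribute__((__vector_size__(32)));
    typedef long long v2di __attribute__((__vector_size__(16)));

    /* vextracti128-style: the high 128-bit lane is elements 2 and 3. */
    static v2di extract_hi(v4di a) {
      return __builtin_shufflevector(a, a, 2, 3);
    }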
* Lower _mm256_broadcastsi128_si256 directly to a vector shuffle.
  Juergen Ributzka | 2015-03-03 | 1 file | -1/+1
  Originally we were using the same GCC builtins to lower this AVX2 vector intrinsic. Instead we will now lower it directly to a vector shuffle. This will not only allow LLVM to generate better code, but it will also allow us to remove the GCC intrinsics.
  Reviewed by Andrea.
  This is related to rdar://problem/18742778.
  llvm-svn: 231081
* Make tests independent of llvm variable naming.
  Manuel Klimek | 2015-02-17 | 1 file | -1/+1
  llvm-svn: 229484
* [X86] Convert palignr builtin handling to use shuffle form of right shift instead of intrinsics.
  Craig Topper | 2015-02-17 | 1 file | -1/+1
  This should allow the intrinsics to be removed from the backend.
  llvm-svn: 229474
* [X86] Teach clang to lower __builtin_ia32_psrldqi256 and __builtin_ia32_pslldqi256 to vector shuffles the backend recognizes.
  Craig Topper | 2015-02-16 | 1 file | -2/+2
  This is a step towards removing the corresponding intrinsics from the backend.
  llvm-svn: 229348
* [x86] Clean up the x86 builtin specs to reflect r217310 in LLVM, which made the 8-bit masks actually 8-bit arguments to these intrinsics.
  Chandler Carruth | 2014-09-06 | 1 file | -1/+1
  These builtins are a mess. Many were missing the I qualifier, which I added where obviously correct. Most aren't tested, but I've updated the relevant tests. I've tried to catch all the things that should become 'c' in this round.
  It's also frustrating because the set of these is really ad-hoc and doesn't map that cleanly to the set supported by either GCC or LLVM. Oh well...
  llvm-svn: 217311
* Fixed a few tests and moved a comment to its proper place
  Filipe Cabecinhas | 2014-05-13 | 1 file | -5/+8
  llvm-svn: 208665
* Patched clang to emit x86 blends as shufflevectors.
  Filipe Cabecinhas | 2014-05-13 | 1 file | -5/+11
  Summary: Most of the clang header patch is by Simon Pilgrim @ SCEE. Also fixed (or added) clang tests for these intrinsics. LLVM tests to make sure we get the blend instruction out of these shufflevectors are at http://reviews.llvm.org/D3600
  Reviewers: eli.friedman, craig.topper, rafael
  Subscribers: cfe-commits
  Differential Revision: http://reviews.llvm.org/D3601
  llvm-svn: 208664
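  An immediate blend picks each result lane from one of two sources according to a constant mask, which maps directly onto a shuffle. A sketch for a 4-lane blend with mask 0b0101 (illustrative helper):

    typedef float v4sf __attribute__((__vector_size__(16)));

    /* blendps-style, imm = 0b0101: set bits take the lane from b.
       Lanes 0 and 2 come from b (indices 4 and 6); 1 and 3 from a. */
    static v4sf blend_0101(v4sf a, v4sf b) {
      return __builtin_shufflevector(a, b, 4, 1, 6, 3);
    }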
* Fix test to not depend on llvm optimizations.
  Michael J. Spencer | 2014-04-24 | 1 file | -4/+4
  llvm-svn: 207062
* Fix argument types of some AVX2 intrinsics.
  Eli Friedman | 2013-09-23 | 1 file | -9/+9
  This fix makes our headers consistent with gcc. PR17312.
  llvm-svn: 191248
* Fix the name and the type of the argument for intrinsic _mm256_broadcastsi128_si256 to align with the Intel documentation.
  Juergen Ributzka | 2013-08-17 | 1 file | -2/+2
  This fixes PR16581 and rdar://14747994.
  llvm-svn: 188609
* X86: add more GATHER intrinsics in Clang
  Manman Ren | 2012-06-29 | 1 file | -0/+66
  Support the following intrinsics:
    _mm_i32gather_pd, _mm256_i32gather_pd, _mm_i64gather_pd, _mm256_i64gather_pd,
    _mm_i32gather_ps, _mm256_i32gather_ps, _mm_i64gather_ps, _mm256_i64gather_ps,
    _mm_i32gather_epi64, _mm256_i32gather_epi64, _mm_i64gather_epi64, _mm256_i64gather_epi64,
    _mm_i32gather_epi32, _mm256_i32gather_epi32, _mm_i64gather_epi32, _mm256_i64gather_epi32
  llvm-svn: 159410
* X86: add more GATHER intrinsics in Clang
  Manman Ren | 2012-06-29 | 1 file | -3/+50
  Corrected the type of the index of _mm256_mask_i32gather_pd from 256-bit to 128-bit.
  Corrected the types of src|dst|mask of _mm256_mask_i64gather_ps from 256-bit to 128-bit.
  Support the following intrinsics:
    _mm_mask_i32gather_epi64, _mm256_mask_i32gather_epi64, _mm_mask_i64gather_epi64, _mm256_mask_i64gather_epi64,
    _mm_mask_i32gather_epi32, _mm256_mask_i32gather_epi32, _mm_mask_i64gather_epi32, _mm256_mask_i64gather_epi32
  llvm-svn: 159403
* X86: add GATHER intrinsics (AVX2) in Clang
  Manman Ren | 2012-06-26 | 1 file | -0/+43
  Support the following intrinsics:
    _mm_mask_i32gather_pd, _mm256_mask_i32gather_pd, _mm_mask_i64gather_pd, _mm256_mask_i64gather_pd,
    _mm_mask_i32gather_ps, _mm256_mask_i32gather_ps, _mm_mask_i64gather_ps, _mm256_mask_i64gather_ps
  llvm-svn: 159222
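  For reference, a masked gather loads each enabled lane from base + index*scale bytes and keeps the src lane where the mask element's sign bit is clear. A usage sketch (illustrative values; compile with -mavx2):

    #include <immintrin.h>
    /* Gather d[idx[0]] and d[idx[1]]; scale 8 = sizeof(double). */
    static __m128d gather2(const double *d, __m128i idx,
                           __m128d src, __m128d mask) {
      return _mm_mask_i32gather_pd(src, d, idx, mask, 8);
    }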
* Convert vperm2f128 and vperm2i128 intrinsics back to using llvm intrinsics.
  Craig Topper | 2012-04-17 | 1 file | -1/+1
  Unfortunately, these instructions have behavior that can't be modeled with shufflevector.
  llvm-svn: 154906
* Change _mm256_permute4x64_epi64 and _mm256_permute4x64_pd to use builtin_shufflevector instead of specific builtins.
  Craig Topper | 2012-04-15 | 1 file | -2/+2
  Old builtins will be removed from llvm now that vpermq/vpermpd are supported by the shuffle lowering code.
  llvm-svn: 154777
* Remove vperm2f* and vperm2i builtins. Same effect can be achieved with builtin_shufflevector.
  Craig Topper | 2012-02-08 | 1 file | -2/+2
  llvm-svn: 150064
* Add last of the AVX2 intrinsics except for gather.
  Craig Topper | 2011-12-24 | 1 file | -0/+100
  llvm-svn: 147253
* Add AVX2 permute intrinsics. Also add parentheses on some macro arguments in other intrinsic headers.
  Craig Topper | 2011-12-24 | 1 file | -1/+66
  llvm-svn: 147242
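  The parenthesization matters whenever an expression is passed as a macro argument. A generic illustration (hypothetical macro, not the header's):

    /* HALF_BAD(a + b) expands to a + b / 2, not (a + b) / 2. */
    #define HALF_BAD(x)  x / 2
    #define HALF_GOOD(x) ((x) / 2)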
* Add AVX2 intrinsics for FP vbroadcast, vbroadcasti128, and vpblendd.
  Craig Topper | 2011-12-24 | 1 file | -0/+35
  llvm-svn: 147240
* Intrinsics for AVX2 unpack instructions.
  Craig Topper | 2011-12-24 | 1 file | -0/+40
  llvm-svn: 147237
* More AVX2 intrinsics for shift, psign, some shuffles, and psadbw.
  Craig Topper | 2011-12-24 | 1 file | -82/+207
  llvm-svn: 147236
* Add AVX2 multiply intrinsics.
  Craig Topper | 2011-12-23 | 1 file | -0/+35
  llvm-svn: 147219
* Add AVX2 intrinsics for max, min, sign extend, and zero extend.
  Craig Topper | 2011-12-22 | 1 file | -0/+125
  llvm-svn: 147141