path: root/clang/test/CodeGen/sse2-builtins.c
* [X86] Add test cases for _mm_movepi64_pi64 and _mm_movpi64_epi64.
  Craig Topper, 2019-08-15 (1 file changed, -0/+17)
  llvm-svn: 368969
* [X86] Add guards to some of the x86 intrinsic tests to skip 64-bit mode only intrinsics when compiled for 32-bit mode.
  Craig Topper, 2019-07-10 (1 file changed, -0/+10)
  All the command lines are for 64-bit mode, but sometimes I compile the tests in 32-bit mode to see what assembly we get, and we need to skip these to do that.
  llvm-svn: 365668
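  For illustration, a minimal sketch of the guard pattern this describes, assuming a 64-bit-only intrinsic such as _mm_cvtsi64_sd (the function name below is hypothetical, not from the test file):

      #include <immintrin.h>

      #ifdef __x86_64__
      // Only compiled in 64-bit mode; _mm_cvtsi64_sd takes a 64-bit
      // integer operand and is not available as a 32-bit-mode intrinsic.
      __m128d test_mm_cvtsi64_sd(__m128d a, long long b) {
        return _mm_cvtsi64_sd(a, b);
      }
      #endif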
* [X86] Restore the pavg intrinsics.
  Craig Topper, 2019-04-15 (1 file changed, -14/+2)
  The pattern we replaced these with may be too hard to match, as demonstrated by PR41496 and PR41316. This patch restores the intrinsics so that we can start focusing on optimizing them. I've mostly reverted the original patch that removed them, though I modified the avx512 intrinsics to not have masking built in.
  Differential Revision: https://reviews.llvm.org/D60674
  llvm-svn: 358427
* [X86] Follow up to r353878, add MSVC compatibility command lines to other intrinsic tests that use packed structs to control alignment.
  Craig Topper, 2019-02-12 (1 file changed, -0/+1)
  r353878 fixed a bug in _mm_loadu_ps and added a command line to catch it. This adds additional command lines to prevent breaking other intrinsics in the future.
  llvm-svn: 353887
* [X86] Add shift-by-immediate tests for non-immediate/out-of-range values
  Simon Pilgrim, 2019-01-08 (1 file changed, -0/+96)
  As noted on PR40203, for gcc compatibility we need to support non-immediate values in the 'slli/srli/srai' shift-by-immediate vector intrinsics.
  llvm-svn: 350619
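  For illustration, the gcc-compatible usage this enables is a shift count that is not a compile-time constant; a minimal sketch (helper name hypothetical):

      #include <immintrin.h>

      // n may be a runtime value; with this intrinsic, counts >= 32
      // produce an all-zero result rather than a compile error.
      __m128i shift_left_dwords(__m128i v, int n) {
        return _mm_slli_epi32(v, n);
      }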
* [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (clang)
  Simon Pilgrim, 2018-12-20 (1 file changed, -4/+4)
  This emits SADD_SAT/SSUB_SAT generic intrinsics for the SSE signed saturated math intrinsics.
  LLVM counterpart: https://reviews.llvm.org/D55894
  Differential Revision: https://reviews.llvm.org/D55890
  llvm-svn: 349743
* [X86][SSE] Auto upgrade PADDUS/PSUBUS intrinsics to UADD_SAT/USUB_SAT generic intrinsics (clang)
  Simon Pilgrim, 2018-12-19 (1 file changed, -12/+4)
  Sibling patch to D55855, this emits UADD_SAT/USUB_SAT generic intrinsics for the SSE saturated math intrinsics instead of expanding to an IR code sequence that could be difficult to reassemble.
  Differential Revision: https://reviews.llvm.org/D55879
  llvm-svn: 349631
* [X86] Add more of the icc unaligned load/store to/from 128 bit vector intrinsics
  Craig Topper, 2018-09-29 (1 file changed, -0/+48)
  Summary: This patch adds _mm_loadu_si32, _mm_loadu_si16, _mm_storeu_si64, _mm_storeu_si32, and _mm_storeu_si16. We already had _mm_loadu_si64.
  Reviewers: spatel, RKSimon
  Reviewed By: RKSimon
  Subscribers: cfe-commits
  Differential Revision: https://reviews.llvm.org/D52665
  llvm-svn: 343388
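  A minimal usage sketch of the new unaligned intrinsics (helper name hypothetical); neither pointer needs any particular alignment:

      #include <immintrin.h>

      void copy_u32(void *dst, const void *src) {
        __m128i v = _mm_loadu_si32(src); // low 32 bits loaded, rest zeroed
        _mm_storeu_si32(dst, v);         // stores only the low 32 bits
      }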
* [X86] Lowering addus/subus intrinsics to native IR
  Tomasz Krupa, 2018-08-14 (1 file changed, -4/+16)
  Summary: This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations.
  Reviewers: craig.topper, spatel, RKSimon
  Reviewed By: craig.topper
  Subscribers: cfe-commits
  Differential Revision: https://reviews.llvm.org/D46892
  llvm-svn: 339651
* [X86] Add __builtin_ia32_selectss_128 and __builtin_ia32_selectsd_128, suitable for use in scalar mask intrinsics.
  Craig Topper, 2018-07-10 (1 file changed, -3/+1)
  These convert the i8 mask argument to <8 x i1>, extract an i1, and then emit a select instruction. This replaces the '(__U & 1)' ternary operator used in some of the intrinsics. The old sequence was lowered to a scalar 'and' plus a compare; the new sequence uses an i1 vector that will interoperate better with other mask intrinsics. This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow-up patch will add the GCCBuiltin name back in llvm and remove the custom handling.
  I made some adjustments to the legacy move_ss/sd intrinsics, which we reuse here, to do a simpler extract and insert instead of two extracts and two inserts or a shuffle.
  llvm-svn: 336622
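  A rough C model of the one-bit-mask select semantics described above (an illustration of the behaviour, not the builtin's implementation; the helper name is hypothetical):

      #include <immintrin.h>

      // Bit 0 of the mask picks element 0 from the operation's result or
      // from the pass-through value; upper elements come from the result.
      static __m128 mask_select_ss(__m128 op_result, unsigned char mask,
                                   __m128 passthru) {
        return (mask & 1) ? op_result : _mm_move_ss(op_result, passthru);
      }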
* [X86] Update handling in CGBuiltin to be tolerant of out of range immediates.
  Craig Topper, 2018-06-21 (1 file changed, -2/+2)
  D48464 contains changes that will loosen some of the range checks in SemaChecking to a DefaultError warning that can be disabled. This patch adds explicit masking to avoid using the upper bits of immediates, to gracefully handle the warning being disabled.
  llvm-svn: 335308
* [X86] Lowering sqrt intrinsics to native IR
  Tomasz Krupa, 2018-06-15 (1 file changed, -6/+4)
  Reviewers: craig.topper, spatel, RKSimon, igorb, uriel.k
  Reviewed By: craig.topper
  Subscribers: tkrupa, cfe-commits
  Differential Revision: https://reviews.llvm.org/D41168
  llvm-svn: 334850
* [X86] Add builtins for pshufd, pshuflw, and pshufhw to enable target feature and immediate range checking.
  Craig Topper, 2018-06-08 (1 file changed, -3/+3)
  llvm-svn: 334265
* [X86] Add back builtins for _mm_slli_si128/_mm_srli_si128 and similar intrinsics.
  Craig Topper, 2018-06-07 (1 file changed, -6/+6)
  We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits. This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle; it looks like it got lost behind a store+load previously.
  llvm-svn: 334208
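  For illustration, a typical use of the byte-shift intrinsic whose immediate the builtin can now verify (function name hypothetical):

      #include <immintrin.h>

      __m128i shift_in_four_zero_bytes(__m128i a) {
        return _mm_slli_si128(a, 4); // whole-register shift left by 4 bytes;
                                     // the count must be a constant
      }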
* [X86] Add builtins for vector element insert and extract for different 128 and 256 bit vector types. Use them to implement the extract and insert intrinsics.
  Craig Topper, 2018-06-06 (1 file changed, -6/+4)
  Previously we were just using extended vector operations in the header file. This unfortunately allowed non-constant indices to be used with the intrinsics, which is incompatible with gcc, icc, and MSVC. It also introduces a different performance characteristic because a non-constant index gets lowered to a vector store and an element-sized load. By adding the builtins we can check that the index is a constant and ensure it's in range of the vector element count. User code still has the option to use extended vector operations itself if it needs non-constant indexing.
  llvm-svn: 334057
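  A sketch of the constant-index rule this enforces (function names hypothetical):

      #include <immintrin.h>

      int get_word_3(__m128i v) {
        return _mm_extract_epi16(v, 3);   // OK: constant index in [0, 7]
      }

      __m128i set_word_3(__m128i v, int x) {
        return _mm_insert_epi16(v, x, 3); // OK; a runtime index would now
                                          // be rejected at compile time
      }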
* [X86] NFC Include immintrin.h in CodeGen tests
  Gabor Buella, 2018-05-24 (1 file changed, -1/+1)
  Following r333110: "Move all Intel defined intrinsic includes into immintrin.h"
  llvm-svn: 333160
* [X86] Use __builtin_convertvector to implement some of the packed integer to packed float conversion intrinsics.
  Craig Topper, 2018-05-21 (1 file changed, -1/+1)
  I believe this is safe assuming the default FP environment. The conversion might be inexact, but it can never overflow the FP type, so this shouldn't be undefined behavior for the uitofp/sitofp instructions. We already do something similar for scalar conversions.
  Differential Revision: https://reviews.llvm.org/D46863
  llvm-svn: 332882
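  A minimal sketch of the __builtin_convertvector pattern, which lowers to an element-wise sitofp for signed input:

      typedef int   v4si __attribute__((__vector_size__(16)));
      typedef float v4sf __attribute__((__vector_size__(16)));

      // element-wise int -> float, the shape used for _mm_cvtepi32_ps
      v4sf cvt_i32_f32(v4si a) {
        return __builtin_convertvector(a, v4sf);
      }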
* [x86] Revert r330322 (& r330323): Lowering x86 adds/addus/subs/subus intrinsics
  Chandler Carruth, 2018-04-26 (1 file changed, -58/+8)
  The LLVM commit introduces a crash in LLVM's instruction selection. I filed http://llvm.org/PR37260 with the test case.
  llvm-svn: 330997
* Lowering x86 adds/addus/subs/subus intrinsics (clang)
  Alexander Ivchenko, 2018-04-19 (1 file changed, -8/+58)
  This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations.
  Patch by tkrupa
  Differential Revision: https://reviews.llvm.org/D44786
  llvm-svn: 330323
* [X86] Emit native IR for pmuldq/pmuludq builtins.
  Craig Topper, 2018-04-09 (1 file changed, -1/+3)
  I believe all the pieces are now in place in the backend to make this work correctly. We can either mask the input to 32 bits for pmuludq, or shl/ashr for pmuldq, and use a regular mul instruction. The backend should combine this to PMULUDQ/PMULDQ, and then SimplifyDemandedBits will remove the and/shifts.
  Differential Revision: https://reviews.llvm.org/D45421
  llvm-svn: 329605
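  A scalar reference, for illustration, of what pmuludq computes per 64-bit lane; the explicit masks model the 'and' instructions the new IR emits (helper name hypothetical):

      #include <stdint.h>

      // multiply the low 32 bits of each operand as a full 64-bit product
      static uint64_t pmuludq_lane(uint64_t a, uint64_t b) {
        return (a & 0xffffffffu) * (b & 0xffffffffu);
      }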
* [X86] Lower _mm[256|512]_[mask[z]]_avg_epu[8|16] intrinsics to native llvm IR
  Yael Tsafrir, 2017-09-12 (1 file changed, -2/+14)
  Differential Revision: https://reviews.llvm.org/D37562
  llvm-svn: 313011
* [X86][SSE] Add _mm_set_pd1 (PR32827)
  Simon Pilgrim, 2017-04-28 (1 file changed, -0/+7)
  Matches the _mm_set_ps1 implementation.
  llvm-svn: 301637
* [x86] these aren't the undefs you're looking for (PR32176)
  Sanjay Patel, 2017-03-12 (1 file changed, -2/+2)
  x86 has undef SSE/AVX intrinsics that should represent a bogus register operand. This is not the same as LLVM's undef value, which can take on multiple bit patterns. There are better solutions / follow-ups to this discussed here: https://bugs.llvm.org/show_bug.cgi?id=32176 ...but this should prevent miscompiles with a one-line code change.
  Differential Revision: https://reviews.llvm.org/D30834
  llvm-svn: 297588
* [X86] Remove the mm_malloc.h include guard hack from the X86 builtins tests
  Elad Cohen, 2016-09-28 (1 file changed, -4/+2)
  The X86 clang/test/CodeGen/*builtins.c tests define the mm_malloc.h include guard as a hack for avoiding its inclusion (mm_malloc.h requires a hosted environment since it expects stdlib.h to be available - which is not the case in these internal clang codegen tests). This patch removes this hack and instead passes -ffreestanding to clang cc1.
  Differential Revision: https://reviews.llvm.org/D24825
  llvm-svn: 282581
* [X86] Use v2i64 vectors to implement _mm_and/andn/or/xor_pd.
  Craig Topper, 2016-08-31 (1 file changed, -5/+5)
  These will be reused when removing some builtins from avx512vldqintrin.h, and this will make the tests for that change show a better number of vector elements.
  llvm-svn: 280196
* After PR28761 use -Wall with -Werror in builtins tests to identify possible problems in headers.
  Eric Christopher, 2016-08-04 (1 file changed, -2/+2)
  llvm-svn: 277696
* [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR
  Simon Pilgrim, 2016-07-20 (1 file changed, -6/+4)
  D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead. It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems: INF/NaN/out-of-range values are guaranteed to result in a 0x80000000 value, which plays havoc with constant folding that converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).
  This patch changes both scalar and packed versions back to using x86-specific builtins. It also deals with the other scalar conversion cases that are runtime rounding-mode dependent and can have similar issues with constant folding.
  Differential Revision: https://reviews.llvm.org/D22105
  llvm-svn: 276102
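  For illustration, the x86-specific result the reinstated builtins guarantee; with generic fptosi an out-of-range input would be undefined instead (function name hypothetical):

      #include <immintrin.h>
      #include <math.h>

      int nan_to_int(void) {
        __m128 v = _mm_set1_ps(NAN);
        __m128i r = _mm_cvttps_epi32(v); // each lane becomes 0x80000000
        return _mm_cvtsi128_si32(r);
      }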
* [X86][SSE2] Updated tests to match llvm\test\CodeGen\X86\sse2-intrinsics-fast-isel-x86_64.ll
  Simon Pilgrim, 2016-06-29 (1 file changed, -9/+8)
  llvm-svn: 274126
* [X86] add _mm_loadu_si64
  Asaf Badouh, 2016-06-26 (1 file changed, -0/+9)
  Differential Revision: http://reviews.llvm.org/D21504
  llvm-svn: 273812
* [X86] Fix pslldq/psrldq intrinsics to not fail compilation with immediates larger than 16.
  Craig Topper, 2016-06-25 (1 file changed, -0/+12)
  This was accidentally broken in r272246.
  llvm-svn: 273775
* [x86] translate SSE packed FP comparison builtins to IR
  Sanjay Patel, 2016-06-15 (1 file changed, -12/+48)
  As noted in the code comment, a potential follow-on would be to remove the builtins themselves. Other than ord/unord, this already works as expected. Eg:

      typedef float v4sf __attribute__((__vector_size__(16)));
      v4sf fcmpgt(v4sf a, v4sf b) { return a > b; }

  Differential Revision: http://reviews.llvm.org/D21268
  llvm-svn: 272840
* [x86] generate IR for SSE integer min/max builtins
  Sanjay Patel, 2016-06-15 (1 file changed, -4/+8)
  Sibling patch to r272806: http://reviews.llvm.org/rL272806
  llvm-svn: 272807
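  A usage sketch; the per-element behaviour now expressed as compare-plus-select in IR is r[i] = a[i] > b[i] ? a[i] : b[i] (function name hypothetical):

      #include <immintrin.h>

      __m128i max_words(__m128i a, __m128i b) {
        return _mm_max_epi16(a, b); // emits icmp sgt + select in IR
      }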
* [X86][SSE] Replace (V)CVTTPS2DQ and VCVTTPD2DQ truncating (round to zero) f32/f64 to i32 with generic IR (clang)
  Simon Pilgrim, 2016-06-01 (1 file changed, -1/+1)
  The 'cvtt' truncation (round to zero) conversions can be safely represented as generic __builtin_convertvector (fptosi) calls instead of x86 intrinsics. We already do this (implicitly) for the scalar equivalents.
  Note: I looked at updating _mm_cvttpd_epi32 as well but this still requires a lot more backend work to correctly lower (both for debug and optimized builds).
  Differential Revision: http://reviews.llvm.org/D20859
  llvm-svn: 271436
* [X86] Ensure load/store tests unaligned pointers really are align 1
  Simon Pilgrim, 2016-05-30 (1 file changed, -5/+5)
  llvm-svn: 271227
* [X86][SSE] _mm_store1_ps/_mm_store1_pd should require an aligned pointer
  Simon Pilgrim, 2016-05-30 (1 file changed, -3/+9)
  According to the gcc headers, the Intel intrinsics docs, and MSDN codegen, _mm_store1_pd (and its _mm_store_pd1 equivalent) should use an aligned pointer - the clang headers are the only implementation I can find that assumes unaligned stores (by storing with _mm_storeu_pd). Additionally, according to the Intel intrinsics docs and MSDN codegen, _mm_store1_ps (_mm_store_ps1) requires a similarly aligned pointer.
  This patch raises the alignment requirements to match the other implementations by calling _mm_store_ps/_mm_store_pd instead. I've also added the missing _mm_store_pd1 intrinsic (which maps to _mm_store1_pd like _mm_store_ps1 does to _mm_store1_ps).
  As a follow-up I'll update the llvm fast-isel tests to match this codegen.
  Differential Revision: http://reviews.llvm.org/D20617
  llvm-svn: 271218
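  A sketch of correct usage under the stricter alignment contract (function name hypothetical):

      #include <immintrin.h>

      // p must now be 16-byte aligned, as with _mm_store_pd
      void splat_store(double *p, double x) {
        _mm_store1_pd(p, _mm_set_sd(x)); // p[0] = p[1] = x
      }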
* [X86] Replace unaligned store builtins in SSE/AVX intrinsic files with code that will compile to a native unaligned store. Remove the builtins since they are no longer used.
  Craig Topper, 2016-05-30 (1 file changed, -2/+4)
  Intrinsics will be removed from llvm in a future commit.
  llvm-svn: 271214
* [X86] Update test cases to make sure storeu builtins use the storeu intrinsics.
  Craig Topper, 2016-05-25 (1 file changed, -2/+2)
  We were previously matching on other stores in the IR from this being an -O0 test. We should probably look into making the storeu builtins just emit a normal store with an alignment of 1.
  llvm-svn: 270664
* [X86][SSE] Replace lossless i32/f32 to f64 conversion intrinsics with generic IR
  Simon Pilgrim, 2016-05-23 (1 file changed, -2/+4)
  Both the (V)CVTDQ2PD(Y) (i32 to f64) and (V)CVTPS2PD(Y) (f32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics without affecting final codegen. This patch removes the clang builtins and their use in the sse2/avx headers - a future patch will deal with removing the llvm intrinsics, but that will require a bit more work.
  Differential Revision: http://reviews.llvm.org/D20528
  llvm-svn: 270499
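  A minimal sketch of the generic pattern for the f32 to f64 case, close to what the header change describes (names hypothetical):

      typedef float  v4sf __attribute__((__vector_size__(16)));
      typedef float  v2sf __attribute__((__vector_size__(8)));
      typedef double v2df __attribute__((__vector_size__(16)));

      // take the low two floats, then widen losslessly (an fpext in IR)
      v2df cvt_low_ps_pd(v4sf a) {
        v2sf lo = __builtin_shufflevector(a, a, 0, 1);
        return __builtin_convertvector(lo, v2df);
      }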
* [X86][SSE2] Fixed shuffle of results in _mm_cmpnge_sd/_mm_cmpngt_sd tests
  Simon Pilgrim, 2016-05-19 (1 file changed, -0/+8)
  llvm-svn: 270079
* [X86][SSE2] Added _mm_move_* tests
  Simon Pilgrim, 2016-05-19 (1 file changed, -0/+15)
  llvm-svn: 270043
* [X86][SSE2] Added _mm_cast* and _mm_set* tests
  Simon Pilgrim, 2016-05-19 (1 file changed, -0/+236)
  llvm-svn: 270042
* [X86][SSE2] Sync with llvm/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll
  Simon Pilgrim, 2016-05-19 (1 file changed, -53/+107)
  llvm-svn: 270034
* Revert r269967 (SSE2 builtin checks) due to failed buildbots
  Simon Pilgrim, 2016-05-18 (1 file changed, -98/+52)
  llvm-svn: 269970
* [X86][SSE2] Sync with llvm/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll
  Simon Pilgrim, 2016-05-18 (1 file changed, -52/+98)
  llvm-svn: 269967
* [X86][SSE] Tidied up MMX/SSE/SSE2 builtin tests to the correct test file
  Simon Pilgrim, 2016-05-17 (1 file changed, -0/+54)
  llvm-svn: 269852
* [X86] Stripped backend codegen tests
  Simon Pilgrim, 2015-12-03 (1 file changed, -926/+375)
  As discussed on the mailing list, backend tests need to be put in llvm/test/CodeGen/X86 as fast-isel tests using IR that is as close as possible to what is generated here. The llvm tests will be (re)added in a future commit. I will update PR24580 on this new plan.
  llvm-svn: 254594
* [X86][SSE2] Added SSE2 IR + assembly codegen builtin tests
  Simon Pilgrim, 2015-11-29 (1 file changed, -0/+1656)
  Improved tests as discussed in PR24580.
  llvm-svn: 254262