path: root/clang/test/CodeGen/avx-builtins.c
Commit message | Author | Date | Files | Lines
* Fix reliance on -flax-vector-conversions in AVX intrinsics headers and corresponding tests.
  Author: Richard Smith | Date: 2019-09-17 | Files: 1 | Lines: -6/+6
  llvm-svn: 372063
* [X86] Add guards to some of the x86 intrinsic tests to skip 64-bit mode only intrinsics when compiled for 32-bit mode.
  Author: Craig Topper | Date: 2019-07-10 | Files: 1 | Lines: -0/+4
  All the command lines are for 64-bit mode, but sometimes I compile the tests in 32-bit mode to see what assembly we get and we need to skip these to do that.
  llvm-svn: 365668
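The guard described above can be sketched with a preprocessor check. This is a minimal illustration that assumes the guards test the predefined `__x86_64__` macro; the function name `enabled_test_groups` is hypothetical and does not appear in the test file.

```c
/* Hypothetical sketch: count which groups of tests get compiled in.
   Assumes the guard keys off the predefined __x86_64__ macro. */
int enabled_test_groups(void) {
    int count = 1;   /* tests that are valid in both 32-bit and 64-bit mode */
#ifdef __x86_64__
    count += 1;      /* tests for intrinsics that exist only in 64-bit mode */
#endif
    return count;
}
```

Compiling the same file for a 32-bit target simply drops the guarded group instead of producing errors for intrinsics that are unavailable there.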
* [X86] Follow up to r353878, add MSVC compatibility command lines to other intrinsic tests that use packed structs to control alignment.
  Author: Craig Topper | Date: 2019-02-12 | Files: 1 | Lines: -0/+1
  r353878 fixed a bug in _mm_loadu_ps and added a command line to catch it. Adding additional command lines to prevent breaking other intrinsics in the future.
  llvm-svn: 353887
* [X86] Fix various type mismatches in intrinsic headers and intrinsic tests that cause extra bitcasts to be emitted in the IR.
  Author: Craig Topper | Date: 2018-07-07 | Files: 1 | Lines: -7/+7
  Found via imprecise grepping of the -O0 IR. There could still be more bugs out there.
  llvm-svn: 336487
* [X86] Fix some vector cmp builtins - TRUE/FALSE predicates
  Author: Gabor Buella | Date: 2018-07-05 | Files: 1 | Lines: -32/+16
  This patch removes an optimization used with the TRUE/FALSE predicates, as was suggested in https://reviews.llvm.org/D45616 for r335339. The optimization was buggy, since r335339 used it also for *_mask builtins, without actually applying the mask -- the mask argument was just ignored.
  Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
  Reviewed By: spatel
  Differential Revision: https://reviews.llvm.org/D48715
  llvm-svn: 336355
* [X86] NFC - add more test cases for vector cmp intrinsics
  Author: Gabor Buella | Date: 2018-07-05 | Files: 1 | Lines: -112/+776
  Add test cases with each predicate using the following intrinsics:
    _mm_cmp_pd
    _mm_cmp_ps
    _mm256_cmp_pd
    _mm256_cmp_ps
    _mm_cmp_pd_mask
    _mm_cmp_ps_mask
    _mm256_cmp_pd_mask
    _mm256_cmp_ps_mask
    _mm512_cmp_pd_mask
    _mm512_cmp_ps_mask
    _mm_mask_cmp_pd_mask
    _mm_mask_cmp_ps_mask
    _mm256_mask_cmp_pd_mask
    _mm256_mask_cmp_ps_mask
    _mm512_mask_cmp_pd_mask
    _mm512_mask_cmp_ps_mask
  Some of these are marked with FIXME, as there is a bug in lowering e.g. _mm512_mask_cmp_ps_mask.
  llvm-svn: 336346
* [X86] Lower _mm[256|512]_cmp[.]_mask intrinsics to native llvm IR
  Author: Gabor Buella | Date: 2018-06-22 | Files: 1 | Lines: -33/+81
  Summary: Lowering some vector comparison builtins to fcmp IR instructions. This ignores the signaling behaviour specified in the predicate argument of said builtins.
  Affected AVX512 builtins:
    __builtin_ia32_cmpps128_mask
    __builtin_ia32_cmpps256_mask
    __builtin_ia32_cmpps512_mask
    __builtin_ia32_cmppd128_mask
    __builtin_ia32_cmppd256_mask
    __builtin_ia32_cmppd512_mask
  Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
  Reviewed By: craig.topper, spatel, efriedma
  Differential Revision: https://reviews.llvm.org/D45616
  llvm-svn: 335339
* [X86] Update handling in CGBuiltin to be tolerant of out of range immediates.
  Author: Craig Topper | Date: 2018-06-21 | Files: 1 | Lines: -8/+8
  D48464 contains changes that will loosen some of the range checks in SemaChecking to a DefaultError warning that can be disabled. This patch adds explicit masking to avoid using the upper bits of immediates to gracefully handle the warning being disabled.
  llvm-svn: 335308
* [X86] Lowering sqrt intrinsics to native IR
  Author: Tomasz Krupa | Date: 2018-06-15 | Files: 1 | Lines: -2/+2
  Reviewers: craig.topper, spatel, RKSimon, igorb, uriel.k
  Reviewed By: craig.topper
  Subscribers: tkrupa, cfe-commits
  Differential Revision: https://reviews.llvm.org/D41168
  llvm-svn: 334850
* [X86] Add subvector insert and extract builtins to enable target feature checking and immediate range checking.
  Author: Craig Topper | Date: 2018-06-08 | Files: 1 | Lines: -14/+14
  Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc.
  llvm-svn: 334261
* [X86] Add builtins for vpermilps/pd instructions to enable target feature checking.
  Author: Craig Topper | Date: 2018-06-08 | Files: 1 | Lines: -5/+5
  llvm-svn: 334256
* [X86] Add builtins for blend with immediate control to enforce target feature requirements and check immediate range.
  Author: Craig Topper | Date: 2018-06-08 | Files: 1 | Lines: -1/+1
  llvm-svn: 334249
* [X86] Add builtins for vector element insert and extract for different 128 and 256 bit vector types.
  Author: Craig Topper | Date: 2018-06-06 | Files: 1 | Lines: -21/+13
  Use them to implement the extract and insert intrinsics. Previously we were just using extended vector operations in the header file. This unfortunately allowed non-constant indices to be used with the intrinsics. This is incompatible with gcc, icc, and MSVC. It also introduces a different performance characteristic because a non-constant index gets lowered to a vector store and an element sized load. By adding the builtins we can check for the index to be a constant and ensure it's in range of the vector element count. User code still has the option to use extended vector operations themselves if they need non-constant indexing.
  llvm-svn: 334057
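The last point, that user code can still index a vector with a runtime value through extended vector operations, might look roughly like the sketch below. It assumes the GCC/Clang `vector_size` extension; the type `v4si_example` and the function `extract_dynamic` are hypothetical names chosen for illustration.

```c
/* Assumption: GCC/Clang vector extension; names are illustrative only. */
typedef int v4si_example __attribute__((vector_size(16)));

/* Subscript a vector with a runtime index, which the insert/extract
   intrinsics no longer accept once they require a constant immediate. */
int extract_dynamic(const int vals[4], unsigned idx) {
    v4si_example v = { vals[0], vals[1], vals[2], vals[3] };
    return v[idx & 3];   /* keep the index inside the 4-element vector */
}
```

As the commit message notes, a runtime index of this kind is typically lowered through memory rather than a single extract instruction, so it has a different cost profile from the intrinsic.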
* [X86] Remove __extension__ from macro intrinsics when it's not needed.
  Author: Craig Topper | Date: 2018-05-31 | Files: 1 | Lines: -12/+12
  I think this is a holdover from when we used to declare variables inside the macros. And then it's been copied and pasted forward for years every time a new macro intrinsic gets added. Interestingly this caused some tests for IRGen to be slightly more optimized. We now return a zeroinitializer directly instead of going through a store+load. It also removed a bogus error message on another test.
  llvm-svn: 333613
* [X86] NFC Include immintrin.h in CodeGen tests
  Author: Gabor Buella | Date: 2018-05-24 | Files: 1 | Lines: -1/+1
  Following r333110: "Move all Intel defined intrinsic includes into immintrin.h"
  llvm-svn: 333160
* [X86] Use __builtin_convertvector to implement some of the packed integer to packed float conversion intrinsics.
  Author: Craig Topper | Date: 2018-05-21 | Files: 1 | Lines: -1/+1
  I believe this is safe assuming the default FP environment. The conversion might be inexact, but it can never overflow the FP type so this shouldn't be undefined behavior for the uitofp/sitofp instructions. We already do something similar for scalar conversions.
  Differential Revision: https://reviews.llvm.org/D46863
  llvm-svn: 332882
* [X86] Use native shuffle vector for the perm2f128 intrinsics
  Author: Craig Topper | Date: 2017-09-15 | Files: 1 | Lines: -3/+3
  This patch replaces the perm2f128 intrinsics with native shuffle vectors. This uses a pretty simple approach to allocate source 0 to the lower half input and source 1 to the upper half input. Then it's just a matter of filling in the indices to use either the lower or upper half of that specific source. This can result in the same source being used by both operands. InstCombine or SelectionDAGBuilder should be able to clean that up.
  Differential Revision: https://reviews.llvm.org/D37892
  llvm-svn: 313418
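To make the half-selection concrete, here is a scalar reference model of the permute itself. It is a sketch based on my reading of the VPERM2F128 control byte (bits 1:0 and 5:4 select a 128-bit half of either source, bits 3 and 7 zero the result half); it is not the shuffle-index code from the patch, and the name `perm2f128_ps_ref` is hypothetical.

```c
/* Hypothetical scalar model of a 256-bit permute of 128-bit halves.
   a, b: the two 8-float sources; imm: the 8-bit control; out: the result. */
void perm2f128_ps_ref(const float a[8], const float b[8],
                      unsigned imm, float out[8]) {
    for (int half = 0; half < 2; ++half) {
        unsigned ctrl = (imm >> (4 * half)) & 0xF;  /* 4 control bits per result half */
        const float *src = (ctrl & 2) ? b : a;      /* bit 1 picks the source vector */
        unsigned lane = ctrl & 1;                   /* bit 0 picks its low or high half */
        for (int i = 0; i < 4; ++i) {
            /* bit 3 forces the whole result half to zero */
            out[4 * half + i] = (ctrl & 8) ? 0.0f : src[4 * lane + i];
        }
    }
}
```

With `imm = 0x21`, for example, the low result half comes from the high half of `a` and the high result half from the low half of `b`.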
* Expand vector operation to IR constants, PR28129.
  Author: Dinar Temirbulatov | Date: 2017-06-16 | Files: 1 | Lines: -0/+48
  llvm-svn: 305551
* [X86][AVX] Added support for _mm256_zext* helper intrinsics (PR32839)
  Author: Simon Pilgrim | Date: 2017-04-29 | Files: 1 | Lines: -0/+21
  llvm-svn: 301749
* [x86] these aren't the undefs you're looking for (PR32176)
  Author: Sanjay Patel | Date: 2017-03-12 | Files: 1 | Lines: -14/+14
  x86 has undef SSE/AVX intrinsics that should represent a bogus register operand. This is not the same as LLVM's undef value which can take on multiple bit patterns.
  There are better solutions / follow-ups to this discussed here: https://bugs.llvm.org/show_bug.cgi?id=32176
  ...but this should prevent miscompiles with a one-line code change.
  Differential Revision: https://reviews.llvm.org/D30834
  llvm-svn: 297588
* [X86] Remove the mm_malloc.h include guard hack from the X86 builtins tests
  Author: Elad Cohen | Date: 2016-09-28 | Files: 1 | Lines: -4/+2
  The X86 clang/test/CodeGen/*builtins.c tests define the mm_malloc.h include guard as a hack for avoiding its inclusion (mm_malloc.h requires a hosted environment since it expects stdlib.h to be available - which is not the case in these internal clang codegen tests). This patch removes this hack and instead passes -ffreestanding to clang cc1.
  Differential Revision: https://reviews.llvm.org/D24825
  llvm-svn: 282581
* [X86][AVX] Ensure we only match against 1-byte alignment
  Author: Simon Pilgrim | Date: 2016-08-10 | Files: 1 | Lines: -2/+2
  llvm-svn: 278208
* [x86] Fix a really nasty bug introduced in r276417 where alignment constraints were added to _mm256_broadcast_{pd,ps} intel intrinsics.
  Author: Chandler Carruth | Date: 2016-08-10 | Files: 1 | Lines: -2/+2
  The spec for these intrinsics is ... pretty much silent on alignment. This is especially frustrating considering the amount of discussion of alignment in the load and store intrinsics. So I was forced to rely on the specification for the VBROADCASTF128 instruction.
  That instruction's spec is *also* completely silent on alignment. Fortunately, when it comes to the instruction's spec, silence is enough. There is no #GP fault option for an underaligned address so this instruction, and by inference the intrinsic, can read any alignment.
  As it happens, the old code worked exactly this way and in fact we have plenty of code that hands pointers with less than 16-byte alignment to these intrinsics. This code broke pretty spectacularly with this commit.
  Fortunately, the fix is super simple! Change a 16 to a 1, and ta da!
  Anyways, a lot of debugging for a really boring fix. =]
  llvm-svn: 278202
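The behaviour the fix restores, reading 128 bits from an address with no alignment guarantee and repeating them in both halves of a 256-bit result, can be modelled in portable C. This is only a sketch of the semantics described above, not the header implementation; `broadcast_128_ref` is a hypothetical name.

```c
#include <string.h>

/* Hypothetical model: copy four floats from a possibly underaligned
   address into both 128-bit halves of an 8-float result. memcpy makes
   no alignment assumption about p, mirroring "align 1". */
void broadcast_128_ref(const void *p, float out[8]) {
    memcpy(out, p, 4 * sizeof(float));       /* low half */
    memcpy(out + 4, p, 4 * sizeof(float));   /* high half, same data */
}
```

Calling it with a pointer one byte past the start of a buffer is well defined here, which is the case that broke when the load was annotated with 16-byte alignment.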
* After PR28761 use -Wall with -Werror in builtins tests to identify possible problems in headers.
  Author: Eric Christopher | Date: 2016-08-04 | Files: 1 | Lines: -2/+2
  llvm-svn: 277696
* [X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128 with generic IR
  Author: Simon Pilgrim | Date: 2016-07-22 | Files: 1 | Lines: -2/+4
  As discussed on D22460, I've updated the vbroadcastf128 pd256/ps256 builtins to map directly to generic IR - load+splat a 128-bit vector to both lanes of a 256-bit vector.
  Fix for PR28657.
  llvm-svn: 276417
* [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR
  Author: Simon Pilgrim | Date: 2016-07-20 | Files: 1 | Lines: -2/+2
  D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.
  It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems: INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).
  This patch changes both scalar and packed versions back to using x86-specific builtins.
  It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.
  Differential Revision: https://reviews.llvm.org/D22105
  llvm-svn: 276102
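The difference from a plain C cast can be shown with a scalar model of one lane. This sketch assumes a 32-bit `int` and encodes the rule stated above (NaN, infinities and out-of-range inputs all yield the bit pattern 0x80000000, which is `INT_MIN`); the function name `cvtt_f32_to_i32_ref` is hypothetical.

```c
#include <limits.h>
#include <math.h>

/* Hypothetical one-lane model of a truncating float-to-int32 conversion.
   A bare (int)x cast is undefined for these inputs in C, which is why a
   generic conversion cannot stand in for the instruction. */
int cvtt_f32_to_i32_ref(float x) {
    if (isnan(x) || x >= 2147483648.0f || x < -2147483648.0f) {
        return INT_MIN;   /* 0x80000000, the "integer indefinite" value */
    }
    return (int)x;        /* in range: truncate toward zero */
}
```

A constant folder that treats the out-of-range cases as zero or undef would disagree with this model, which is the problem the commit describes.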
* [X86] Use undefined instead of setzero in shufflevector based intrinsics when the second source is unused.
  Author: Craig Topper | Date: 2016-07-04 | Files: 1 | Lines: -11/+11
  Rewrite immediate extractions in shuffle intrinsics to be in ((c >> x) & y) form instead of ((c & z) >> x). This way only x varies between each use instead of having to vary x and z.
  llvm-svn: 274525
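The two extraction forms compute the same field; the difference is how many constants change per use. A small sketch with 2-bit fields, where both function names are hypothetical:

```c
/* Shift-then-mask: the mask 0x3 is the same for every field,
   only the shift amount x changes. */
unsigned field_shift_then_mask(unsigned c, unsigned x) {
    return (c >> x) & 0x3u;
}

/* Mask-then-shift: both the mask (0x3 << x) and the shift x
   have to change for every field. */
unsigned field_mask_then_shift(unsigned c, unsigned x) {
    return (c & (0x3u << x)) >> x;
}
```

For the control byte `0xB4` both forms return 0, 1, 3 and 2 for shifts of 0, 2, 4 and 6.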
* [X86][SSE] Replace (V)CVTTPS2DQ and VCVTTPD2DQ truncating (round to zero) f32/f64 to i32 with generic IR (clang)
  Author: Simon Pilgrim | Date: 2016-06-01 | Files: 1 | Lines: -2/+2
  The 'cvtt' truncation (round to zero) conversions can be safely represented as generic __builtin_convertvector (fptosi) calls instead of x86 intrinsics. We already do this (implicitly) for the scalar equivalents.
  Note: I looked at updating _mm_cvttpd_epi32 as well but this still requires a lot more backend work to correctly lower (both for debug and optimized builds).
  Differential Revision: http://reviews.llvm.org/D20859
  llvm-svn: 271436
* Adding front-end support to several intrinsics (bit scanning, conversion and state reading intrinsics)
  Author: Michael Zuckerman | Date: 2016-06-01 | Files: 1 | Lines: -0/+21
  Adding LLVM front-end support to two intrinsics dealing with bit scan: _bit_scan_forward and _bit_scan_reverse. Their functionality is as described in Intel intrinsics guide:
    https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_bit_scan_forward&expand=371,370
    https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_bit_scan_reverse&expand=371,370
  Furthermore, adding clang front-end support to these conversion intrinsics: _mm256_cvtsd_f64, _mm256_cvtsi256_si32 and _mm256_cvtss_f32.
  Finally, adding tests to all of the above, as well as to the state reading intrinsics _rdpmc and _rdtsc. Their functionality is also specified in the Intel intrinsics guide.
  Commit on behalf of Omer Paparo Bivas
  llvm-svn: 271387
* [X86] Ensure load/store tests unaligned pointers really are align 1
  Author: Simon Pilgrim | Date: 2016-05-30 | Files: 1 | Lines: -9/+9
  llvm-svn: 271227
* [X86] Replace unaligned store builtins in SSE/AVX intrinsic files with code that will compile to a native unaligned store. Remove the builtins since they are no longer used.
  Author: Craig Topper | Date: 2016-05-30 | Files: 1 | Lines: -9/+12
  Intrinsics will be removed from llvm in a future commit.
  llvm-svn: 271214
* [X86][SSE] Replace lossless i32/f32 to f64 conversion intrinsics with generic IR
  Author: Simon Pilgrim | Date: 2016-05-23 | Files: 1 | Lines: -2/+2
  Both the (V)CVTDQ2PD(Y) (i32 to f64) and (V)CVTPS2PD(Y) (f32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics without affecting final codegen.
  This patch removes the clang builtins and their use in the sse2/avx headers - a future patch will deal with removing the llvm intrinsics, but that will require a bit more work.
  Differential Revision: http://reviews.llvm.org/D20528
  llvm-svn: 270499
* [X86][AVX] Ensure zero-extension of _mm256_extract_epi8 and _mm256_extract_epi16
  Author: Simon Pilgrim | Date: 2016-05-21 | Files: 1 | Lines: -4/+2
  Ensure _mm256_extract_epi8 and _mm256_extract_epi16 zero extend their i8/i16 result to i32. This matches _mm_extract_epi8 and _mm_extract_epi16.
  Fix for PR27594
  Differential Revision: http://reviews.llvm.org/D20468
  llvm-svn: 270330
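The observable difference between zero and sign extension shows up only for elements with the top bit set. A scalar sketch of the two behaviours, using hypothetical function names and a plain array in place of the 256-bit vector:

```c
/* Zero-extend: go through unsigned char so 0xFF becomes 255. */
int extract_epi8_zext_ref(const signed char v[32], unsigned idx) {
    return (unsigned char)v[idx & 31];
}

/* Sign-extend: a signed char promotes directly, so 0xFF becomes -1. */
int extract_epi8_sext_ref(const signed char v[32], unsigned idx) {
    return v[idx & 31];
}
```

The commit makes the 256-bit extract intrinsics behave like the first function, matching their 128-bit counterparts.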
* [X86][AVX] Added _mm256_testc_si256/_mm256_testnzc_si256/_mm256_testz_si256 tests
  Author: Simon Pilgrim | Date: 2016-05-20 | Files: 1 | Lines: -0/+18
  llvm-svn: 270227
* [X86][AVX] Added _mm256_extract_epi64 test
  Author: Simon Pilgrim | Date: 2016-05-20 | Files: 1 | Lines: -0/+7
  llvm-svn: 270212
* [X86][AVX] Full set of AVX intrinsics tests
  Author: Simon Pilgrim | Date: 2016-05-20 | Files: 1 | Lines: -51/+1312
  llvm/test/CodeGen/X86/avx-intrinsics-fast-isel.ll will be synced to this
  llvm-svn: 270210
* Removed duplicate SSE42 builtin tests from avx-builtins.c
  Author: Simon Pilgrim | Date: 2016-05-18 | Files: 1 | Lines: -70/+0
  llvm-svn: 269932
* Canonicalize some of the x86 builtin tests and either remove or comment about optimization options.
  Author: Eric Christopher | Date: 2015-10-14 | Files: 1 | Lines: -4/+7
  llvm-svn: 250271
* [X86][SSE] Add _mm_undefined_* intrinsics
  Author: Simon Pilgrim | Date: 2015-08-26 | Files: 1 | Lines: -0/+18
  Added missing SSE/AVX 'undefined' intrinsics (PR24040):
    _mm_undefined_pd, _mm_undefined_ps + _mm_undefined_si128
    _mm256_undefined_pd, _mm256_undefined_ps + _mm256_undefined_si256
    _mm512_undefined, _mm512_undefined_ps, _mm512_undefined_pd + _mm512_undefined_epi32
  Added builtin intrinsics: __builtin_ia32_undef128, __builtin_ia32_undef256 + __builtin_ia32_undef512
  Differential Revision: http://reviews.llvm.org/D12052
  llvm-svn: 246083
* Update Clang tests to handle explicitly typed load changes in LLVM.
  Author: David Blaikie | Date: 2015-02-27 | Files: 1 | Lines: -3/+3
  llvm-svn: 230795
* [Headers] Add tests for _mm256_insert_epi64 and fix its definition
  Author: Filipe Cabecinhas | Date: 2015-02-19 | Files: 1 | Lines: -0/+24
  Summary: The definition for _mm256_insert_epi64 was taking an int, which would get truncated before being inserted in the vector.
  Original patch by Joshua Magee!
  Reviewers: bruno, craig.topper
  Subscribers: cfe-commits
  Differential Revision: http://reviews.llvm.org/D7179
  llvm-svn: 229811
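The truncation is easy to reproduce in scalar code: a 64-bit value that passes through an `int` parameter loses its upper 32 bits before it is stored. The sketch below uses hypothetical names and assumes a 32-bit `int` on a two's complement target, where the narrowing conversion keeps the low 32 bits.

```c
/* Buggy shape: the element travels through an int parameter. */
long long insert_via_int(int value) {
    return value;   /* upper 32 bits were already lost at the call */
}

/* Fixed shape: the parameter is as wide as the vector element. */
long long insert_via_i64(long long value) {
    return value;
}
```

Passing `0x100000001` to the first function yields 1 on such targets, while the second returns the value unchanged.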
* Patched clang to emit x86 blends as shufflevectors.
  Author: Filipe Cabecinhas | Date: 2014-05-13 | Files: 1 | Lines: -0/+12
  Summary: Most of the clang header patch by Simon Pilgrim @ SCEE.
  Also fixed (or added) clang tests for these intrinsics.
  LLVM tests to make sure we get the blend instruction out of these shufflevectors are at http://reviews.llvm.org/D3600
  Reviewers: eli.friedman, craig.topper, rafael
  Subscribers: cfe-commits
  Differential Revision: http://reviews.llvm.org/D3601
  llvm-svn: 208664
* Intrinsics: fix extract & insert when index is out of bound.
  Author: Manman Ren | Date: 2013-10-23 | Files: 1 | Lines: -0/+18
  Now, all extract & insert intrinsics should have the correct AND operation to ignore higher bits.
  rdar://15250497
  llvm-svn: 193267
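Ignoring the higher bits of an index amounts to an AND with the element count minus one. A scalar sketch for an 8-element vector, with the hypothetical name `extract_masked` and a plain array standing in for the vector:

```c
/* Keep only the low 3 bits of the index, so any value maps onto one
   of the 8 elements instead of reading out of bounds. */
int extract_masked(const int v[8], unsigned idx) {
    return v[idx & 7];
}
```

An index of 9 therefore selects element 1, the same element its low three bits name.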
* Re-enable pcmpistri/pcmpestri builtins in clang now that llvm supports them properly.
  Author: Craig Topper | Date: 2012-08-06 | Files: 1 | Lines: -0/+70
  llvm-svn: 161319
* test/CodeGen/avx-builtins.c: Fix more for -Asserts.
  Author: NAKAMURA Takumi | Date: 2012-01-25 | Files: 1 | Lines: -1/+1
  llvm-svn: 148944
* fix broken testcase.
  Author: Chris Lattner | Date: 2012-01-25 | Files: 1 | Lines: -2/+2
  llvm-svn: 148925
* Represent 256-bit unaligned loads natively and remove the builtins. Similar change was made for 128-bit versions a while back.
  Author: Craig Topper | Date: 2012-01-25 | Files: 1 | Lines: -0/+25
  llvm-svn: 148919