path: root/clang/lib/Headers/emmintrin.h
Commit message | Author | Age | Files | Lines
...
* [Clang][X86] Convert non-temporal store builtins to generic __builtin_nontemporal_store in headers | Simon Pilgrim | 2016-06-13 | 1 | -2/+2
  We can now use __builtin_nontemporal_store instead of target-specific builtins for naturally aligned nontemporal stores, which avoids the need for handling in CGBuiltin.cpp.
  The scalar integer nontemporal (unaligned) store builtins will have to wait, as __builtin_nontemporal_store currently assumes natural alignment and doesn't accept the 'packed struct' trick that we use for normal unaligned loads/stores.
  The nontemporal loads require further backend support before we can safely convert them to __builtin_nontemporal_load.
  Differential Revision: http://reviews.llvm.org/D21272
  llvm-svn: 272540
* [X86] Handle AVX2 pslldqi and psrldqi intrinsics shufflevector creation directly in the header file instead of in CGBuiltin.cpp. Simplify the sse2 equivalents as well. | Craig Topper | 2016-06-09 | 1 | -38/+42
  llvm-svn: 272246
* [X86] Add void to the argument list of intrinsics that don't take arguments, since an empty argument list means something else in C. | Craig Topper | 2016-06-09 | 1 | -2/+2
  llvm-svn: 272244
* [X86] Use unsigned types for vector arithmetic in intrinsics to avoid undefined behavior for signed integer overflow. | Craig Topper | 2016-06-04 | 1 | -18/+17
  This is really only needed for addition, subtraction, and multiplication, but I did the bitwise ops too for overall consistency.
  Clang currently doesn't set NSW for signed vector operations so the undefined behavior shouldn't happen today.
  llvm-svn: 271778
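The idea in this commit can be sketched with the GCC/Clang vector extensions, without including the header itself. The type and function names below are hypothetical stand-ins, not the header's own typedefs; the point is only that the sum is formed on unsigned lanes (where wraparound is defined) and then reinterpreted as signed.

```c
#include <limits.h>

/* Hypothetical type names for illustration only. */
typedef int          v4si_sketch __attribute__((__vector_size__(16)));
typedef unsigned int v4su_sketch __attribute__((__vector_size__(16)));

/* Lane-wise 32-bit add with wraparound: the addition happens on unsigned
   lanes, so overflow is well defined; the casts only reinterpret bits. */
static v4si_sketch add_epi32_sketch(v4si_sketch a, v4si_sketch b)
{
    return (v4si_sketch)((v4su_sketch)a + (v4su_sketch)b);
}

/* Helper: lane 0 of INT_MAX + 1 computed through the wrapper above. */
static int add_wraps_lane0(void)
{
    v4si_sketch a = { INT_MAX, 1, 2, 3 };
    v4si_sketch b = { 1, 1, 1, 1 };
    v4si_sketch r = add_epi32_sketch(a, b);
    return r[0];
}
```

On a two's complement target the helper yields INT_MIN, which is the wrapping behaviour the hardware instruction has anyway.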
* [X86][SSE] Replace (V)CVTTPS2DQ and VCVTTPD2DQ truncating (round to zero) f32/f64 to i32 with generic IR (clang) | Simon Pilgrim | 2016-06-01 | 1 | -1/+1
  The 'cvtt' truncation (round to zero) conversions can be safely represented as generic __builtin_convertvector (fptosi) calls instead of x86 intrinsics. We already do this (implicitly) for the scalar equivalents.
  Note: I looked at updating _mm_cvttpd_epi32 as well but this still requires a lot more backend work to correctly lower (both for debug and optimized builds).
  Differential Revision: http://reviews.llvm.org/D20859
  llvm-svn: 271436
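A minimal sketch of the generic conversion this commit refers to, assuming a compiler that provides __builtin_convertvector (Clang, and recent GCC releases). The typedef and function names are hypothetical. Only in-range inputs are considered here; how out-of-range values behave is not covered by this sketch.

```c
/* Hypothetical type names for illustration only. */
typedef float v4sf_sketch __attribute__((__vector_size__(16)));
typedef int   v4si_sketch __attribute__((__vector_size__(16)));

/* Convert four floats to four ints, truncating toward zero, using the
   generic builtin rather than a target-specific one. */
static v4si_sketch cvttps_epi32_sketch(v4sf_sketch a)
{
    return __builtin_convertvector(a, v4si_sketch);
}

/* Helper: convert one value placed in lane 0 and read it back. */
static int truncated_lane(float x)
{
    v4sf_sketch v = { x, 0.0f, 0.0f, 0.0f };
    v4si_sketch r = cvttps_epi32_sketch(v);
    return r[0];
}
```

With this helper, 2.75f becomes 2 and -2.75f becomes -2, matching truncation rather than rounding to nearest.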
* [X86][SSE] _mm_store1_ps/_mm_store1_pd should require an aligned pointer | Simon Pilgrim | 2016-05-30 | 1 | -7/+10
  According to the gcc headers, intel intrinsics docs and msdn codegen the _mm_store1_pd (and its _mm_store_pd1 equivalent) should use an aligned pointer - the clang headers are the only implementation I can find that assumes non-aligned stores (by storing with _mm_storeu_pd).
  Additionally, according to the intel intrinsics docs and msdn codegen the _mm_store1_ps (_mm_store_ps1) requires a similarly aligned pointer.
  This patch raises the alignment requirements to match the other implementations by calling _mm_store_ps/_mm_store_pd instead.
  I've also added the missing _mm_store_pd1 intrinsic (which maps to _mm_store1_pd like _mm_store_ps1 does to _mm_store1_ps).
  As a followup I'll update the llvm fast-isel tests to match this codegen.
  Differential Revision: http://reviews.llvm.org/D20617
  llvm-svn: 271218
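For callers, the practical consequence of the change above is that the destination passed to _mm_store1_pd must be 16-byte aligned. The sketch below assumes an x86 target with SSE2 and a GCC-compatible compiler; the function name is hypothetical.

```c
#include <emmintrin.h>

/* Broadcast x into both lanes of an aligned buffer and check the result.
   The buffer is explicitly 16-byte aligned because _mm_store1_pd is
   documented to require an aligned pointer. */
static int store1_pd_demo(double x)
{
    double out[2] __attribute__((aligned(16)));
    _mm_store1_pd(out, _mm_set_sd(x));
    return out[0] == x && out[1] == x;
}
```

A buffer without the alignment attribute might happen to work, but that would rely on luck rather than on the intrinsic's contract.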
* [X86] Replace unaligned store builtins in SSE/AVX intrinsic files with code that will compile to a native unaligned store. Remove the builtins since they are no longer used. | Craig Topper | 2016-05-30 | 1 | -2/+8
  Intrinsics will be removed from llvm in a future commit.
  llvm-svn: 271214
* [X86][SSE] Make unsigned integer vector types generally available | Simon Pilgrim | 2016-05-29 | 1 | -0/+6
  As discussed on http://reviews.llvm.org/D20684, move the unsigned integer vector types used for zero extension to make them available for general use.
  llvm-svn: 271187
* [X86][SSE] Replace lossless i32/f32 to f64 conversion intrinsics with generic IR | Simon Pilgrim | 2016-05-23 | 1 | -2/+4
  Both the (V)CVTDQ2PD(Y) (i32 to f64) and (V)CVTPS2PD(Y) (f32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics without affecting final codegen.
  This patch removes the clang builtins and their use in the sse2/avx headers - a future patch will deal with removing the llvm intrinsics, but that will require a bit more work.
  Differential Revision: http://reviews.llvm.org/D20528
  llvm-svn: 270499
* [X86] Add typecasts to remove most assumptions about what __m128i/__m256i is defined as. Add similar typecasts for the fp types as well. | Craig Topper | 2016-05-16 | 1 | -79/+79
  llvm-svn: 269632
* Add doxygen comments to emmintrin.h's intrinsics. Only around 25% of the intrinsics in this file are documented now. The patches for the rest of the intrinsics in this file will be sent out later. | Ekaterina Romanova | 2016-04-08 | 1 | -1/+939
  The doxygen comments are automatically generated based on Sony's intrinsics document.
  I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. This patch was internally reviewed by Paul Robinson.
  llvm-svn: 265844
* [X86] Add 'pause' builtin that's already in llvm and use it instead of inline assembly to implement _mm_pause. | Craig Topper | 2015-11-11 | 1 | -1/+1
  llvm-svn: 252712
* [X86] Use __builtin_ia32_paddq and __builtin_ia32_psubq to implement a couple intrinsics that were supposed to operate on MMX registers. Otherwise we end up operating on GPRs. Throw in a test for _mm_mul_su32 while I was there. | Craig Topper | 2015-11-11 | 1 | -2/+2
  llvm-svn: 252711
* [X86] Add missing typecasts in intrinsic macros. This should make them more robust against inputs that aren't already the right type. | Craig Topper | 2015-11-11 | 1 | -2/+2
  llvm-svn: 252700
* [X86] Use setzero instead of set1(0) in a few places in intrinsic headers. | Craig Topper | 2015-11-10 | 1 | -3/+3
  llvm-svn: 252587
* Fix the SSE4 byte sign extension in a cleaner way, and more thoroughly test that our intrinsics behave the same under -fsigned-char and -funsigned-char. | Chandler Carruth | 2015-10-01 | 1 | -2/+5
  This further testing uncovered that AVX-2 has a broken cmpgt for 8-bit elements, and has for a long time. This is fixed in the same way as SSE4 handles the case.
  The other ISA extensions currently work correctly because they use specific instruction intrinsics. As soon as they are rewritten in terms of generic IR, they will need to add these special casts. I've added the necessary testing to catch this however, so we shouldn't have to chase it down again.
  I considered changing the core typedef to be signed, but that seems like a bad idea. Notably, it would be an ABI break if anyone is reaching into the innards of the intrinsic headers and passing __v16qi on an API boundary. I can't be completely confident that this wouldn't happen due to a macro expanding in a lambda, etc., so it seems much better to leave it alone. It also matches GCC's behavior exactly.
  A fun side note is that for both GCC and Clang, -funsigned-char really does change the semantics of __v16qi. To observe this, consider:
      % cat x.cc
      #include <smmintrin.h>
      #include <iostream>

      int main() {
        __v16qi a = { 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
        __v16qi b = _mm_set1_epi8(-1);
        std::cout << (int)(a / b)[0] << ", " << (int)(a / b)[1] << '\n';
      }
      % clang++ -o x x.cc && ./x
      -1, 1
      % clang++ -funsigned-char -o x x.cc && ./x
      0, 1
  However, while this may be surprising, both Clang and GCC agree.
  Differential Revision: http://reviews.llvm.org/D13324
  llvm-svn: 249097
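The "special casts" mentioned above can be sketched with the GCC/Clang vector extensions: the comparison is done on an explicitly signed char vector, so the result does not depend on whether plain char is signed. The type and function names here are hypothetical and are not the header's own.

```c
/* Hypothetical type names for illustration only. */
typedef char        v16qi_sketch __attribute__((__vector_size__(16)));
typedef signed char v16qs_sketch __attribute__((__vector_size__(16)));

/* Signed 8-bit greater-than. Casting both operands to a signed char
   vector keeps the comparison signed under -funsigned-char as well. */
static v16qi_sketch cmpgt_epi8_sketch(v16qi_sketch a, v16qi_sketch b)
{
    return (v16qi_sketch)((v16qs_sketch)a > (v16qs_sketch)b);
}

/* Helper: is 0 > -1 in lane 0? Expected to hold for a signed compare. */
static int zero_gt_minus_one(void)
{
    v16qi_sketch a = { 0 };      /* every lane is 0 */
    v16qi_sketch b = { 0 };
    b[0] = (char)-1;             /* lane 0 holds the bit pattern 0xFF */
    v16qi_sketch r = cmpgt_epi8_sketch(a, b);
    return r[0] != 0;            /* a true lane is all ones */
}
```

Without the casts, a build with -funsigned-char would compare 0 against 255 in lane 0 and report false.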
* [X86] Make f16c intrinsics accessible through emmintrin.h, per Intel docs | Michael Kuperstein | 2015-09-21 | 1 | -0/+2
  Differential Revision: http://reviews.llvm.org/D13015
  llvm-svn: 248156
* [X86] Fix some non-reserved parameter names in intrinsic headers | Michael Kuperstein | 2015-09-21 | 1 | -18/+18
  Differential Revision: http://reviews.llvm.org/D13009
  llvm-svn: 248150
* [X86][SSE] Add _mm_undefined_* intrinsics | Simon Pilgrim | 2015-08-26 | 1 | -0/+12
  Added missing SSE/AVX 'undefined' intrinsics (PR24040):
    _mm_undefined_pd, _mm_undefined_ps + _mm_undefined_si128
    _mm256_undefined_pd, _mm256_undefined_ps + _mm256_undefined_si256
    _mm512_undefined, _mm512_undefined_ps, _mm512_undefined_pd + _mm512_undefined_epi32
  Added builtin intrinsics: __builtin_ia32_undef128, __builtin_ia32_undef256 + __builtin_ia32_undef512
  Differential Revision: http://reviews.llvm.org/D12052
  llvm-svn: 246083
* [X86] Rename DEFAULT_FN_ATTR macro to __DEFAULT_FN_ATTR | Michael Kuperstein | 2015-06-30 | 1 | -216/+216
  llvm-svn: 241065
* Update the intel intrinsic headers to use the target attribute support. | Eric Christopher | 2015-06-17 | 1 | -7/+1
  This involved removing the conditional inclusion and replacing them with target attributes matching the original conditional inclusion and checks. The testcase update removes the macro checks for each file and replaces them with usage of the __target__ attribute, e.g.:
      int __attribute__((__target__(("sse3")))) foo(int a) {
        _mm_mwait(0, 0);
        return 4;
      }
  This usage does require the enclosing function have the requisite __target__ attribute for inlining and code generation - also for any macro intrinsic uses in the enclosing function. There's no change for existing uses of the intrinsic headers.
  llvm-svn: 239883
* Use a define for per-file function attributes for the Intel intrinsic headers. | Eric Christopher | 2015-06-17 | 1 | -214/+219
  This is a precursor to changing them to use the new target attribute code.
  llvm-svn: 239882
* [X86] Add _mm_bslli_si128 and _mm_bsrli_si128 as aliases of _mm_slli_si128 and _mm_srli_si128. This matches Intel documentation and gcc. | Craig Topper | 2015-02-13 | 1 | -0/+6
  llvm-svn: 229066
* [X86] Simplify some code and remove some -Wshadow disables from intrinsic header. | Craig Topper | 2015-02-13 | 1 | -65/+46
  llvm-svn: 229065
* Make the byte-shift SSE intrinsics emit vector shuffles which we know the backend can handle. | Filipe Cabecinhas | 2015-02-07 | 1 | -11/+48
  Also removed unused builtins.
  Original patch by Andrea Di Biagio!
  Reviewers: craig.topper, nadav
  Subscribers: cfe-commits
  Differential Revision: http://reviews.llvm.org/D7199
  llvm-svn: 228481
* Headers: Don't use attribute keywords which aren't reserved | David Majnemer | 2015-02-04 | 1 | -2/+2
  Instead of using 'unavailable', use '__unavailable__'
  llvm-svn: 228087
* [x86] Add the (v)cmpps/pd/ss/sd builtins to match gcc. Use them in the sse intrinsic files. | Craig Topper | 2014-12-27 | 1 | -24/+24
  This still lowers to the same intrinsics as before.
  This is preparation for bounds checking the immediate on the avx version of the builtin so we don't pass illegal immediates into the backend. Since SSE uses a smaller size immediate it's not possible to bounds check when using a shared builtin. Rather than creating a clang specific builtin for the different immediate, I decided (after consulting with Chandler) that it was better to match gcc.
  llvm-svn: 224879
* Fix a SSE2 intrinsics typo | Alp Toker | 2013-11-23 | 1 | -1/+1
  Full discourse at:
    http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20131104/092514.html
    http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-November/068124.html
  Patch by Dimitry Andric and Alexey Dokuchaev!
  llvm-svn: 195558
* _mm_extract_epi16: use "& 7" when index is out of bounds. | Manman Ren | 2013-10-22 | 1 | -1/+1
  This is in line with the implementation of _mm_extract_pi16.
  rdar://15250497
  llvm-svn: 193187
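The masking described in this commit can be illustrated with a generic vector type: the index is reduced modulo 8 with "& 7" so that an out-of-range value still selects one of the eight 16-bit lanes. This is a sketch with hypothetical names, written against the GCC/Clang vector extensions rather than the header itself.

```c
/* Hypothetical type name for illustration only. */
typedef short v8hi_sketch __attribute__((__vector_size__(16)));

/* Extract a 16-bit lane, zero-extended to int. The index is masked to
   the range 0..7, so an out-of-range index wraps around. */
static int extract_epi16_sketch(v8hi_sketch a, int imm)
{
    return (unsigned short)a[imm & 7];
}

/* Helper: extract from a fixed vector whose lane i holds 10 + i. */
static int extract_demo(int imm)
{
    v8hi_sketch v = { 10, 11, 12, 13, 14, 15, 16, 17 };
    return extract_epi16_sketch(v, imm);
}
```

For instance, index 9 is treated as index 1 and yields 11, while index 3 yields 13 as usual.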
* Suppress useless -Wshadow warning when using _mm* macros from emmintrin.h | Ted Kremenek | 2013-10-07 | 1 | -0/+12
  Fixes <rdar://problem/10679282>.
  I'm not completely satisfied with this patch. Sprinkling "diagnostic ignored" _Pragmas throughout this file is gross, but I couldn't suppress it for the entire file.
  llvm-svn: 192143
* Add _mm_stream_si64 intrinsic. | Eli Friedman | 2013-09-23 | 1 | -0/+8
  While I'm here, also fix the alignment computation for the whole family of intrinsics.
  PR17298.
  llvm-svn: 191243
* X86 intrinsics: cmpge|gt|nge|ngt_ss|_sd | Manman Ren | 2013-06-17 | 1 | -4/+8
  These intrinsics should return the comparison result in the low bits and keep the high bits of the first source operand. When calling the builtin functions, the source operands are swapped and the high bits of the second source operand are kept.
  To fix the issue, an extra shufflevector is used.
  rdar://14153896
  llvm-svn: 184110
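The operand-swap problem described above can be sketched with public SSE2 intrinsics. The commit itself uses a shufflevector; the sketch below substitutes _mm_move_sd for that step, which is an assumption made here for readability, and the function names are hypothetical. It assumes an x86 target with SSE2.

```c
#include <emmintrin.h>

/* a > b on the low lane, keeping the high lane of a.
   The comparison is evaluated as b < a, which on its own would leave
   the high lane of b in the result; _mm_move_sd then takes the low lane
   (the comparison mask) from c and the high lane from a. */
static __m128d cmpgt_sd_sketch(__m128d a, __m128d b)
{
    __m128d c = _mm_cmplt_sd(b, a);
    return _mm_move_sd(a, c);
}

/* Helper: check that the mask is set and the high lane came from a. */
static int cmpgt_sd_demo(void)
{
    __m128d a = _mm_set_pd(7.0, 2.0);   /* high = 7.0, low = 2.0 */
    __m128d b = _mm_set_pd(9.0, 1.0);   /* high = 9.0, low = 1.0 */
    __m128d r = cmpgt_sd_sketch(a, b);
    double hi = _mm_cvtsd_f64(_mm_unpackhi_pd(r, r));
    int low_mask_set = _mm_movemask_pd(r) & 1;
    return low_mask_set && hi == 7.0;
}
```

If the final move were omitted, the high lane would read 9.0 (from b) instead of 7.0, which is the bug the commit describes.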
* Avoid names like __in that conflict with SAL in builtin headers | Reid Kleckner | 2013-04-19 | 1 | -12/+12
  Microsoft's Source Annotation Language (SAL) defines a bunch of keywords for annotating the inputs and outputs of functions. Empty definitions for the keywords are provided by <stdlib.h> -> <crtdefs.h> -> <sal.h>. This makes it basically impossible to include MSVC's stdlib.h and Clang's *mmintrin.h headers at the same time if they have variables named __in. As a workaround, I've renamed those variables.
  This fixes the Modules/compiler_builtins.m test which was XFAILed, presumably due to this conflict.
  llvm-svn: 179860
* PR14964: intrinsic headers using non-reserved identifiers | David Blaikie | 2013-01-16 | 1 | -430/+430
  Several of the intrinsic headers were using plain non-reserved identifiers. C++11 17.6.4.3.2 [global.names] p1 reserves names containing a double underscore or beginning with an underscore followed by an uppercase letter for any use.
  I think I got them all, but open to being corrected. For the most part I didn't bother updating function-like macro parameter names because I don't believe they're subject to any such collision - though some function-like macros already follow this convention (I didn't update them in part because the churn was more significant as several function-like macros use the double underscore prefixed version of the same name as a parameter in their implementation)
  llvm-svn: 172666
* Get rid of storelv4si builtin as it can be expressed directly. | Chad Rosier | 2012-05-01 | 1 | -1/+4
  This is general goodness because it provides opportunities to clean things up. For example,
      uint64_t t1(__m128i vA) {
        uint64_t Alo;
        _mm_storel_epi64((__m128i*)&Alo, vA);
        return Alo;
      }
  was generating
      movq %xmm0, -8(%rbp)
      movq -8(%rbp), %rax
  and now generates
      movd %xmm0, %rax
  rdar://11282581
  llvm-svn: 155924
* Comment mystery code. | Nick Lewycky | 2012-02-04 | 1 | -0/+2
  llvm-svn: 149742
* Make _mm_cmpgt_epi8 immune to -funsigned-char. | Nick Lewycky | 2012-02-03 | 1 | -1/+2
  llvm-svn: 149725
* Fix vector macros to correctly check argument types. <rdar://problem/10261670> | Bob Wilson | 2011-11-05 | 1 | -22/+29
  llvm-svn: 143792
* Add _mm_comige_sd to emmintrin.h, since I apparently forgot to do this in r138769. <rdar://problem/10230751> | Eli Friedman | 2011-10-06 | 1 | -0/+6
  llvm-svn: 141310
* Tweak *mmintrin.h so that they don't make any bad assumptions about alignment (which probably has little effect in practice, but better to get it right). Make the load in _mm_loadh_pi and _mm_loadl_pi a single LLVM IR instruction to make optimizing easier for CodeGen. | Eli Friedman | 2011-09-15 | 1 | -13/+45
  rdar://10054986
  llvm-svn: 139874
* Add missing function _mm_ucomige_sd to emmintrin.h. PR10803. | Eli Friedman | 2011-08-29 | 1 | -0/+6
  llvm-svn: 138769
* Add 'may_alias' attribute. Noticed by Eli. | Bill Wendling | 2011-05-13 | 1 | -2/+2
  llvm-svn: 131278
* Represent the unaligned loads natively. These are converted into a call to the correct unaligned load. | Bill Wendling | 2011-05-13 | 1 | -2/+8
  llvm-svn: 131268
* LLVM doesn't always optimize away the four loads from this: | Bill Wendling | 2011-05-12 | 1 | -1/+1
      (__m128){ p[0], p[1], p[2], p[3] }
  which produces really bad code. This could be done in instcombine, but it's probably better to do it in the front-end instead.
  <rdar://problem/9424836>
  llvm-svn: 131237
* PR9866: Fix the implementation of _mm_loadl_pd and _mm_loadh_pd to not make bad assumptions about the alignment of the double* argument. | Eli Friedman | 2011-05-07 | 1 | -2/+2
  llvm-svn: 131052
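One common way to load through a pointer without assuming its alignment is the "packed struct" trick that other entries in this log mention, combined with the may_alias attribute. The sketch below uses hypothetical names and GCC-compatible attributes, and is not a copy of the header's code.

```c
#include <string.h>

/* Hypothetical wrapper type. __packed__ drops the alignment requirement
   to 1 byte; __may_alias__ permits access through this type regardless
   of the type of the underlying object. */
struct loadu_double_sketch {
    double v;
} __attribute__((__packed__, __may_alias__));

/* Load a double from a pointer that may not be 8-byte aligned. */
static double load_double_unaligned(const void *p)
{
    return ((const struct loadu_double_sketch *)p)->v;
}

/* Helper: place a double at an odd byte offset and read it back. */
static double load_from_odd_offset(void)
{
    unsigned char buf[sizeof(double) + 1];
    double x = 42.5;
    memcpy(buf + 1, &x, sizeof x);
    return load_double_unaligned(buf + 1);
}
```

Dereferencing a plain double* at the same address would instead tell the compiler the address is naturally aligned, which is the kind of assumption the fix above removes.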
* don't use compound literals in MM macros, since they will be instantiated into user code which may warn about them with -pedantic. Patch by Jonathan Sauer! | Chris Lattner | 2011-04-25 | 1 | -3/+5
  llvm-svn: 130149
* Just use a native "load" instead of translating the builtin later. Clang can take it! | Bill Wendling | 2011-04-13 | 1 | -1/+1
  I wasn't able to get __builtin_ia32_loaddqu to transform into an unaligned load...I'll have to look into it further.
  llvm-svn: 129427
* __builtin_ia32_psrldqi128 too | Chris Lattner | 2010-10-01 | 1 | -6/+4
  llvm-svn: 115301
* the second argument to __builtin_ia32_pslldqi128 must be an immediate, so it needs to be called from a macro, not a function. This is a necessary but insufficient step towards fixing PR8221 | Chris Lattner | 2010-10-01 | 1 | -5/+2
  llvm-svn: 115299
* Move some type defines from smmintrin.h to emmintrin.h to match where gcc defines them. | Eric Christopher | 2010-08-26 | 1 | -0/+3
  llvm-svn: 112146