path: root/clang/lib/Headers
...
* [X86] Fold masking into subvector extract builtins. (Craig Topper, 2018-06-08; 4 files, -84/+126)
  I'm looking into making the select builtins require avx512f, avx512bw, or avx512vl, since masking operations generally require those features. The extract builtins are funny because the 512-bit versions return a 128- or 256-bit vector with masking even when avx512vl is not supported.
  llvm-svn: 334330
* [X86] Add builtins for vpermq/vpermpd instructions to enable target feature checking. (Craig Topper, 2018-06-08; 3 files, -40/+6)
  llvm-svn: 334311
* [X86] Change immediate type for some builtins from char to int. (Craig Topper, 2018-06-08; 2 files, -4/+4)
  These builtins are all handled by CGBuiltin.cpp, so it doesn't much matter what the immediate type is, but int matches the intrinsic spec.
  llvm-svn: 334310
* [X86] Add builtins for shufps and shufpd to enable target feature and immediate range checking. (Craig Topper, 2018-06-08; 3 files, -24/+8)
  llvm-svn: 334266
* [X86] Add builtins for pshufd, pshuflw, and pshufhw to enable target feature and immediate range checking. (Craig Topper, 2018-06-08; 4 files, -112/+9)
  llvm-svn: 334265
* [X86] Fix some typecasts in intrinsic headers that I messed up in r334261. (Craig Topper, 2018-06-08; 1 file, -3/+3)
  This was caught by the Header tests, but not the CodeGen tests.
  llvm-svn: 334264
* [X86] Add subvector insert and extract builtins to enable target feature checking and immediate range checking. (Craig Topper, 2018-06-08; 6 files, -268/+48)
  Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed, which this patch resurrects. This also matches gcc.
  llvm-svn: 334261
* [X86] Add builtins for vpermilps/pd instructions to enable target feature checking. (Craig Topper, 2018-06-08; 2 files, -51/+6)
  llvm-svn: 334256
* [X86] Add builtins for blend with immediate control to enforce target feature requirements and check immediate range. (Craig Topper, 2018-06-08; 3 files, -69/+16)
  llvm-svn: 334249
* [X86] Add builtins for shuff32x4/shuff64x2/shufi32x4/shufi64x2 to enable target feature checking and immediate range checking. (Craig Topper, 2018-06-07; 2 files, -76/+16)
  llvm-svn: 334244
* [MS] Re-add support for the ARM interlocked bittest intrinsics. (Reid Kleckner, 2018-06-07; 1 file, -0/+17)
  Adds support for these intrinsics, which are ARM and ARM64 only:
    _interlockedbittestandreset_acq
    _interlockedbittestandreset_rel
    _interlockedbittestandreset_nf
    _interlockedbittestandset_acq
    _interlockedbittestandset_rel
    _interlockedbittestandset_nf
  Refactor the bittest intrinsic handling to decompose each intrinsic into its action, its width, and its atomicity.
  llvm-svn: 334239
* [X86] Add builtins for VALIGNQ/VALIGND to enable proper target feature checking. (Craig Topper, 2018-06-07; 2 files, -54/+12)
  We still emit shufflevector instructions, we just do it from CGBuiltin.cpp now. This ensures the intrinsics that use this are only available on CPUs that support the feature. I also added range checking to the immediate, but only checked that it is 8 bits or smaller. We should maybe be stricter, since we never use all 8 bits, but gcc doesn't seem to do that.
  llvm-svn: 334237
* [X86] Add back builtins for _mm_slli_si128/_mm_srli_si128 and similar intrinsics. (Craig Topper, 2018-06-07; 3 files, -248/+14)
  We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits. This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously.
  llvm-svn: 334208
* [X86] Add back _mask, _maskz, and _mask3 builtins for some 512-bit fmadd/fmsub/fmaddsub/fmsubadd builtins. (Craig Topper, 2018-06-07; 1 file, -524/+432)
  Summary: We recently switched to using selects in the intrinsic header files for FMA instructions. But the 512-bit versions support flavors with a rounding mode, which must be an Integer Constant Expression. This has forced those intrinsics to be implemented as macros. As it stands now, the mask and mask3 intrinsics evaluate one of their macro arguments twice. If that argument is itself another intrinsic macro, we can end up over-expanding macros. Or if it's something we can CSE later, it would show up multiple times when it shouldn't.
  I tried adding __extension__ around the macro, making it an expression statement, and declaring a local variable. But whatever name you choose for the local variable can never be used as the name of an input to the macro in user code. If that happens, you would end up with the same name on the LHS and RHS of an assignment after expansion. We might be safe if we use __ in front of the variable names, because those names are reserved and user code shouldn't use them, but I wasn't sure I wanted to make that claim.
  The other option, which I've chosen here, is to add back _mask, _maskz, and _mask3 flavors of the builtin, which we will expand in CGBuiltin.cpp to replicate the argument as needed and insert any fneg needed on the third operand to make a subtract. The _maskz isn't truly necessary if we have an unmasked version, or if we use the masked version with a -1 mask and wrap a select around it. But I've chosen to make things more uniform.
  I separated out the scalar builtin handling to avoid too many things going on in EmitX86FMAExpr. It was different enough, due to the extract and insert, that the minor duplication of the CreateCall was probably worth it.
  Reviewers: tkrupa, RKSimon, spatel, GBuella
  Reviewed By: tkrupa
  Subscribers: cfe-commits
  Differential Revision: https://reviews.llvm.org/D47724
  llvm-svn: 334159
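  A minimal C sketch of the double-expansion hazard described above (hypothetical demo_ name, not the actual header code; the select builtin and _mm512_fmadd_pd are real):

      #include <immintrin.h>

      /* A masked-FMA macro that names its passthru argument __W twice. */
      #define demo_fmadd_pd_mask(__W, __U, __A, __B)                       \
        ((__m512d)__builtin_ia32_selectpd_512(                             \
            (__mmask8)(__U),                                               \
            (__v8df)_mm512_fmadd_pd((__m512d)(__W), (__A), (__B)),         \
            (__v8df)(__m512d)(__W)))  /* second expansion of __W */

      /* A call such as demo_fmadd_pd_mask(_mm512_loadu_pd(p), m, a, b)
       * duplicates the load in the token stream; a dedicated builtin
       * evaluates each argument exactly once. */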
* [CUDA] Replace 'nv_weak' attributes in CUDA headers with 'weak'. (Artem Belevich, 2018-06-06; 1 file, -0/+6)
  Differential Revision: https://reviews.llvm.org/D47804
  llvm-svn: 334108
* [X86] Add builtins for vector element insert and extract for different 128- and 256-bit vector types. Use them to implement the extract and insert intrinsics. (Craig Topper, 2018-06-06; 3 files, -92/+49)
  Previously we were just using extended vector operations in the header file. This unfortunately allowed non-constant indices to be used with the intrinsics. This is incompatible with gcc, icc, and MSVC. It also introduces a different performance characteristic, because a non-constant index gets lowered to a vector store and an element-sized load.
  By adding the builtins we can check that the index is a constant and ensure it's in range of the vector element count. User code still has the option to use extended vector operations itself if it needs non-constant indexing.
  llvm-svn: 334057
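  A short C sketch of the two styles (assumes a target with SSE4.1, e.g. -msse4.1):

      #include <immintrin.h>

      int constant_lane(__m128i v) {
          return _mm_extract_epi32(v, 2);  /* index must be a constant; range-checked */
      }

      int runtime_lane(__m128i v, int i) {
          __v4si lanes = (__v4si)v;        /* clang extended vector type */
          return lanes[i & 3];             /* lowered to a store + element-sized load */
      }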
* [X86] Mark all the builtins and intrinsics that require MMX and an SSE feature as requiring both mmx and the sse feature. (Craig Topper, 2018-06-05; 3 files, -45/+51)
  Previously we only checked the sse feature, but this means that if you passed -mno-mmx, the builtins/intrinsics wouldn't be disabled in the frontend and would instead fail backend isel.
  llvm-svn: 333980
* Reimplement the bittest intrinsic family as builtins with inline asm. (Reid Kleckner, 2018-06-05; 1 file, -81/+0)
  We need to implement _interlockedbittestandset as a builtin for windows.h, so we might as well do the whole family. It reduces code duplication anyway.
  Fixes PR33188, a long-standing bug in our bittest implementation encountered by Chakra.
  llvm-svn: 333978
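  A hedged sketch of the kind of x86 inline asm a bittest-and-set can lower to (hypothetical demo_ helper, not the actual CGBuiltin output; int mirrors the 32-bit MS 'long'):

      #include <stdint.h>

      static inline unsigned char demo_bittestandset(int32_t *base, int32_t idx) {
          unsigned char old;
          /* bts sets CF to the old bit; 'lock' gives the interlocked flavor */
          __asm__ __volatile__("lock btsl %2, %1"
                               : "=@ccc"(old), "+m"(*base)
                               : "Ir"(idx)
                               : "memory");
          return old;
      }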
* Revert r333791 "Cap "voluntary" vector alignment at 16 for all Darwin platforms." (Reid Kleckner, 2018-06-04; 2 files, -36/+43)
  Adding __attribute__((aligned(32))) to __m256 breaks the implementation of _mm256_loadu_ps on Windows. On Windows, alignment attributes have higher precedence than packing attributes.
  We also might want to carefully consider the consequences of changing our vector typedefs, since many users copy them and invent their own new, non-Intel-specific vector type names.
  llvm-svn: 333958
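  A hedged sketch of the unaligned-load idiom at stake (hypothetical demo_ name; the real header uses a similar packed wrapper struct):

      #include <immintrin.h>

      /* The packed struct drops __m256's natural alignment so the load is
       * emitted as unaligned. If __m256 itself carried aligned(32), the
       * MS rule that alignment beats packing would defeat this. */
      static inline __m256 demo_loadu_ps(const float *p) {
          struct loader {
              __m256 v;
          } __attribute__((__packed__, __may_alias__));
          return ((const struct loader *)p)->v;
      }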
* [X86] Avoid passing _mm_undefined* to builtin_shufflevector if we are able to pass the first input a second time. (Craig Topper, 2018-06-04; 4 files, -22/+11)
  This is more consistent with other usages of builtin_shufflevector. Later optimization passes or codegen will detect the duplicate vector and replace it with undef. Using _mm_undefined just puts a zeroinitializer there that still needs to be optimized out later.
  llvm-svn: 333944
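  A short C sketch contrasting the two patterns (hypothetical function names):

      #include <immintrin.h>

      static inline __m128i dup_input(__m128i __a) {
          /* preferred: second operand is __a again; unused lanes fold to undef */
          return (__m128i)__builtin_shufflevector((__v4si)__a, (__v4si)__a, 0, 0, 1, 1);
      }

      static inline __m128i undefined_input(__m128i __a) {
          /* _mm_undefined_si128() materializes a zeroinitializer that later
           * passes must clean up */
          return (__m128i)__builtin_shufflevector((__v4si)__a,
                                                  (__v4si)_mm_undefined_si128(),
                                                  0, 0, 1, 1);
      }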
* [X86] Fix a couple places that were using macro arguments twice when one of the usages could just be undefined. (Craig Topper, 2018-06-04; 1 file, -2/+4)
  One of the arguments was being used even when the passthru argument is unused due to the mask being all 1s. But in that case the actual value doesn't matter, so we should use undef instead to avoid expanding the macro argument unnecessarily.
  llvm-svn: 333865
* [X86] Remove superfluous escaped new lines from intrinsic files. (Craig Topper, 2018-06-03; 1 file, -4/+4)
  llvm-svn: 333858
* [X86] Explicitly make the arguments to the __slwpcb intrinsic 'void'. (Craig Topper, 2018-06-03; 1 file, -1/+1)
  This is the correct way to say it takes no arguments in C.
  llvm-svn: 333855
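  A short C illustration of the distinction (hypothetical declarations, not the header's):

      void *slwpcb_unspecified();   /* pre-C23 C: parameters are unspecified, not absent */
      void *slwpcb_none(void);      /* true zero-argument prototype */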
* [X86] Replace __builtin_ia32_vbroadcastf128_pd256 and __builtin_ia32_vbroadcastf128_ps256 with an unaligned load intrinsic and a __builtin_shufflevector call. (Craig Topper, 2018-06-03; 1 file, -2/+6)
  llvm-svn: 333853
* Cap "voluntary" vector alignment at 16 for all Darwin platforms.John McCall2018-06-012-43/+36
| | | | | | | | | | | | | | | | | | | | | This fixes two major problems: - We were not capping vector alignment as desired on 32-bit ARM. - We were using different alignments based on the AVX settings on Intel, so we did not have a consistent ABI. This is an ABI break, but we think we can get away with it because vectors tend to be used mostly in inline code (which is why not having a consistent ABI has not proven disastrous on Intel). Intel's AVX types are specified as having 32-byte / 64-byte alignment, so align them explicitly instead of relying on the base ABI rule. Note that this sort of attribute is stripped from template arguments in template substitution, so there's a possibility that code templated over vectors will produce inadequately-aligned objects. The right long-term solution for this is for alignment attributes to be interpreted as true qualifiers and thus preserved in the canonical type. llvm-svn: 333791
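  A hedged sketch of what aligning the types explicitly looks like on a typedef (hypothetical demo_ names, not the header's):

      typedef float  demo_m256  __attribute__((__vector_size__(32), __aligned__(32)));
      typedef double demo_m512d __attribute__((__vector_size__(64), __aligned__(64)));

      _Static_assert(_Alignof(demo_m256)  == 32, "explicit, not base-ABI, alignment");
      _Static_assert(_Alignof(demo_m512d) == 64, "explicit, not base-ABI, alignment");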
* [X86] Rewrite avx512vbmi unmasked and maskz macro intrinsics to be wrappers around their __builtin function with appropriate arguments, rather than just passing arguments to the masked intrinsic. (Craig Topper, 2018-06-01; 2 files, -36/+180)
  This is more consistent with all of our other avx512 macro intrinsics. It also fixes a bad cast where an argument was cast to mmask8 when it should have been mmask16.
  llvm-svn: 333778
* [X86] Remove leftover semicolons at end of macros. (Martin Storsjo, 2018-06-01; 5 files, -13/+13)
  This was missed in a few places in SVN r333613, causing compilation errors if these macros are used e.g. as a parameter to a function.
  llvm-svn: 333734
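  A minimal C sketch of the failure mode (hypothetical demo_ names):

      static inline int demo_setzero(void) { return 0; }
      #define demo_zero_bad()  (demo_setzero());  /* leftover ';' in the macro */
      #define demo_zero_good() (demo_setzero())

      /* demo_zero_good() works anywhere; passing demo_zero_bad() as a
       * function argument expands to  f((demo_setzero());)  -> syntax error */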
* [X86] Make 512-bit unmasked load/store builtins more like their 128/256-bit equivalents. (Craig Topper, 2018-05-31; 1 file, -16/+18)
  Previously we were just passing a -1 mask to the masked builtin. This changes it to the more generic way that the 128/256-bit versions use.
  llvm-svn: 333626
* [X86] Fix wrong intrinsic semantics. (Tim Shen, 2018-05-31; 2 files, -17/+17)
  llvm-svn: 333617
* [X86] Fix some places where macro arguments to intrinsics weren't cast to _m512(i|d)/_m256(i|d)/_m128(i|d) first. (Craig Topper, 2018-05-31; 7 files, -104/+104)
  The majority of the cases were correct. This fixes the few that weren't. I also removed some superfluous parentheses in non-macros that confused my attempts at grepping for missing casts.
  llvm-svn: 333615
* [X86] Remove __extension__ from macro intrinsics when it's not needed. (Craig Topper, 2018-05-31; 21 files, -2926/+2926)
  I think this is a holdover from when we used to declare variables inside the macros, and then it's been copied and pasted forward for years every time a new macro intrinsic gets added.
  Interestingly, this caused some tests for IRGen to be slightly more optimized. We now return a zeroinitializer directly instead of going through a store+load. It also removed a bogus error message on another test.
  llvm-svn: 333613
* [X86] Use C style comments in intrinsic headers for overall consistency. (Craig Topper, 2018-05-30; 6 files, -91/+100)
  Most of the original comments used C style /* */ comments, but some C++ // comments had snuck in over time. We still need to convert all the doxygen comments, which is much harder to do.
  llvm-svn: 333603
* [X86] Add __extension__ to a bunch of places in our intrinsic headers that fail if you run them through -pedantic -ansi. (Craig Topper, 2018-05-30; 4 files, -103/+109)
  All of these are lines that create a 'compound literal' to concatenate elements together.
  llvm-svn: 333593
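  A hedged sketch of the pattern (hypothetical demo_ macro; the real headers use the _mm_set_* names):

      #include <emmintrin.h>

      /* A compound literal is a C99 feature; __extension__ silences the
       * -pedantic -ansi (C89) warning at each use site. */
      #define demo_set_epi32(a, b, c, d) \
        (__extension__ (__m128i)(__v4si){ (d), (c), (b), (a) })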
* [X86] Simplify the implementation of _mm_sqrt_ss, _mm_rcp_ss, and _mm_rsqrt_ss. (Craig Topper, 2018-05-30; 1 file, -7/+4)
  We don't need the insertion back into the original vector at the end. The builtin already understands that. This is different than _mm_sqrt_sd, which takes two arguments and where we do need to insert.
  llvm-svn: 333572
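  A hedged before/after sketch (hypothetical function names; __builtin_ia32_sqrtss already returns the source vector with only element 0 replaced):

      #include <xmmintrin.h>

      static inline __m128 sqrt_ss_old(__m128 __a) {
          __v4sf __c = __builtin_ia32_sqrtss((__v4sf)__a);
          return (__m128)(__v4sf){ __c[0], __a[1], __a[2], __a[3] };  /* redundant re-insert */
      }

      static inline __m128 sqrt_ss_new(__m128 __a) {
          return (__m128)__builtin_ia32_sqrtss((__v4sf)__a);
      }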
* [X86] Reduce the number of setzero intrinsics to just the set defined by the Intel Intrinsics Guide. (Craig Topper, 2018-05-30; 11 files, -179/+148)
  We had quite a few for different element sizes of integers, sometimes with strange target features attached to them. We only need a single version for each of _m128i, _m256i, and _m512i, with the target feature that first introduced those types.
  llvm-svn: 333568
* [X86] Remove 'return' from a bunch of intrinsics that return void and use a builtin that returns void. (Craig Topper, 2018-05-30; 9 files, -19/+19)
  Found by running the intrinsic headers through -pedantic -ansi.
  llvm-svn: 333563
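  A minimal C illustration of the complaint (hypothetical demo_ names):

      #include <xmmintrin.h>

      static inline void demo_fence_bad(void)  { return _mm_sfence(); } /* -pedantic -ansi warns:
                                                                           returning a void expression */
      static inline void demo_fence_good(void) { _mm_sfence(); }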
* [X86] Lowering FMA intrinsics to native IR (Clang part). (Gabor Buella, 2018-05-30; 2 files, -870/+1025)
  This patch replaces all packed (and scalar without rounding mode) fused intrinsics with fmadd/fmaddsub variations. Then fmadd/fmaddsub are lowered to native IR.
  Patch by tkrupa
  Reviewers: craig.topper, sroland, spatel, RKSimon
  Reviewed By: craig.topper
  Differential Revision: https://reviews.llvm.org/D47444
  llvm-svn: 333555
* Add missing curly from r333509. (Hans Wennborg, 2018-05-30; 1 file, -1/+1)
  llvm-svn: 333515
* [X86] Remove masking from the AVX512VNNI builtins. Use a select in IR instead. (Craig Topper, 2018-05-30; 2 files, -168/+118)
  llvm-svn: 333509
* [X86] Fix the names of a bunch of icelake intrinsics. (Craig Topper, 2018-05-30; 3 files, -119/+108)
  Mostly this fixes the names of all the 128-bit intrinsics to start with _mm_ instead of _mm128_, as is the convention and what the Intel docs say. This also fixes the names of the bitshuffle intrinsics to say epi64 for the 128- and 256-bit versions.
  llvm-svn: 333497
* [X86] Merge the 3 different flavors of masked vpermi2var/vpermt2var builtins to a single version without masking. Use select builtins with appropriate operands instead. (Craig Topper, 2018-05-29; 6 files, -532/+350)
  llvm-svn: 333387
* Revert r333347 "[X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits where possible." (Craig Topper, 2018-05-26; 1 file, -118/+221)
  This wasn't supposed to be committed yet.
  llvm-svn: 333349
* [X86] Remove mask from avx512ifma builtins. Use a select instruction instead. (Craig Topper, 2018-05-26; 2 files, -80/+52)
  This reduces from 12 builtins to 6, since we no longer need a mask and maskz version.
  llvm-svn: 333348
* [X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits where possible. (Craig Topper, 2018-05-26; 1 file, -221/+118)
  Summary: We only need to use 512-bit vectors all the way through for v8i64 reductions, since those max instructions are new to avx512f and only available in 512 bits until SKX. For v16i32 and floating point we have legacy 128/256-bit instructions we can use.
  I've tried to use other intrinsics to reduce the verbosity of the code and avoid having to mention all the shuffles. I've also removed all the -1 shuffle indices so the output sequence is fully specified and not left to backend optimization.
  Reviewers: RKSimon, spatel, GBuella
  Subscribers: cfe-commits
  Differential Revision: https://reviews.llvm.org/D47401
  llvm-svn: 333347
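  A hedged C sketch of the width-reduction idea for a v16i32 max reduction (hypothetical demo_ name, not the header's sequence; assumes an AVX-512F target):

      #include <immintrin.h>

      static inline int demo_reduce_max_epi32(__m512i v) {
          /* halve to 256 then 128 bits before finishing with 128-bit shuffles */
          __m256i h256 = _mm256_max_epi32(_mm512_castsi512_si256(v),
                                          _mm512_extracti64x4_epi64(v, 1));
          __m128i h128 = _mm_max_epi32(_mm256_castsi256_si128(h256),
                                       _mm256_extracti128_si256(h256, 1));
          h128 = _mm_max_epi32(h128, _mm_shuffle_epi32(h128, _MM_SHUFFLE(1, 0, 3, 2)));
          h128 = _mm_max_epi32(h128, _mm_shuffle_epi32(h128, _MM_SHUFFLE(2, 3, 0, 1)));
          return _mm_cvtsi128_si32(h128);
      }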
* [x86] invpcid intrinsic. (Gabor Buella, 2018-05-25; 5 files, -0/+44)
  An intrinsic for an old instruction, as described in the Intel SDM.
  Reviewers: craig.topper, rnk
  Reviewed By: craig.topper, rnk
  Differential Revision: https://reviews.llvm.org/D47142
  llvm-svn: 333256
* [X86] Fix a bad cast in _mm512_mask_abs_epi32 and _mm512_maskz_abs_epi32. (Craig Topper, 2018-05-24; 1 file, -2/+2)
  llvm-svn: 333211
* [X86] Move the include of clzerointrin.h from immintrin.h back to x86intrin.h. (Craig Topper, 2018-05-23; 2 files, -4/+5)
  This is an AMD intrinsic, not an Intel intrinsic, so it shouldn't be in immintrin.h.
  llvm-svn: 333124
* [modules] Mark __wmmintrin_pclmul.h/__wmmintrin_aes.h as textual. (Raphael Isemann, 2018-05-23; 1 file, -8/+3)
  Summary: Since clang r332929, these two headers throw errors when included from somewhere other than their wrapper header. It seems marking them as textual is the best way to fix the builds.
  Fixes this new module build error:
    While building module '_Builtin_intrinsics' imported from ...:
    In file included from <module-includes>:2:
    In file included from lib/clang/7.0.0/include/immintrin.h:54:
    In file included from lib/clang/7.0.0/include/wmmintrin.h:29:
    lib/clang/7.0.0/include/__wmmintrin_aes.h:25:2: error: "Never use <__wmmintrin_aes.h> directly; include <wmmintrin.h> instead."
    #error "Never use <__wmmintrin_aes.h> directly; include <wmmintrin.h> instead."
  Reviewers: rsmith, v.g.vassilev, craig.topper
  Reviewed By: craig.topper
  Subscribers: craig.topper, cfe-commits
  Differential Revision: https://reviews.llvm.org/D47277
  llvm-svn: 333123
* [X86] Move all Intel defined intrinsic includes into immintrin.h. (Craig Topper, 2018-05-23; 11 files, -66/+50)
  This matches the Intel documentation, which shows them available by importing immintrin.h. x86intrin.h also includes immintrin.h, so anyone including x86intrin.h will still get them. This is different than gcc, but I don't think we were a perfect match there already. I'm unclear what gcc's policy is about which header new things get added to.
  Differential Revision: https://reviews.llvm.org/D47182
  llvm-svn: 333110
* [DOXYGEN] Formatting changes for better intrinsics documentation rendering. (Ekaterina Romanova, 2018-05-23; 5 files, -31/+43)
  (1) I added some \see cross-references to a few select intrinsics that are related (and have the same or similar semantics).
  (2) pmmintrin.h, smmintrin.h, and xmmintrin.h have a few minor formatting changes. They make the rendering of our intrinsics documentation better.
  llvm-svn: 333065