path: root/clang/lib/CodeGen/CGBuiltin.cpp
Commit message | Author | Age | Files | Lines
...
* [X86] Add back some masked vector truncate builtins. Custom IRgen a few others.
  Craig Topper, 2018-06-08, 1 file, -0/+29
  I'd like to make the select builtins require an avx512f, avx512bw, or avx512vl feature to match what is normally required to get masking. Truncate is special in that there are instructions with a 128/256-bit masked result even without avx512vl. By using special builtins we can emit a select without using the 128/256-bit select builtins.
  llvm-svn: 334331
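Masking in these entries means a per-lane select around an unmasked result: lanes whose mask bit is set take the new value, the others keep a passthrough value. A minimal scalar C sketch of that merge-masking behavior for a 32-to-16-bit truncate, assuming 8 lanes; `mask_trunc_epi32_epi16_model` is a hypothetical name, not an Intel intrinsic or a Clang builtin.

```c
#include <stdint.h>

/* Hypothetical scalar model (not an Intel intrinsic): truncate eight 32-bit
   lanes to 16 bits, then merge with `passthru` under `mask`:
   dst[i] = bit i of mask ? (uint16_t)src[i] : passthru[i]. */
static void mask_trunc_epi32_epi16_model(uint16_t dst[8], const uint16_t passthru[8],
                                         uint8_t mask, const uint32_t src[8]) {
    for (int i = 0; i < 8; i++) {
        uint16_t truncated = (uint16_t)src[i];                 /* keep the low 16 bits */
        dst[i] = ((mask >> i) & 1u) ? truncated : passthru[i]; /* the select */
    }
}
```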
* [X86] Fold masking into subvector extract builtins.
  Craig Topper, 2018-06-08, 1 file, -16/+21
  I'm looking into making the select builtins require avx512f, avx512bw, or avx512vl since masking operations generally require those features. The extract builtins are funny because the 512-bit versions return a 128 or 256 bit vector with masking even when avx512vl is not supported.
  llvm-svn: 334330
* [X86] Add builtins for vpermq/vpermpd instructions to enable target feature checking.
  Craig Topper, 2018-06-08, 1 file, -0/+18
  llvm-svn: 334311
* [X86] Add builtins for shufps and shufpd to enable target feature and immediate range checking.
  Craig Topper, 2018-06-08, 1 file, -0/+30
  llvm-svn: 334266
* [X86] Add builtins for pshufd, pshuflw, and pshufhw to enable target feature and immediate range checking.
  Craig Topper, 2018-06-08, 1 file, -0/+51
  llvm-svn: 334265
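PSHUFD's immediate packs four 2-bit source-lane selectors into one byte, which is why an 8-bit range check on the constant is the natural limit. A scalar C model of that decoding, as a sketch for illustration only (CGBuiltin.cpp emits a shufflevector, not a loop); `pshufd_model` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical scalar model of PSHUFD: destination lane i takes source lane
   (imm8 >> (2*i)) & 3. A temporary keeps it correct if dst aliases src. */
static void pshufd_model(int32_t dst[4], const int32_t src[4], unsigned imm8) {
    int32_t tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm8 >> (2 * i)) & 3u]; /* 2-bit selector for lane i */
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];
}
```

With imm8 = 0x1B the four selectors are 3, 2, 1, 0, so the lanes come out reversed.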
* [X86] Add subvector insert and extract builtins to enable target feature checking and immediate range checking.
  Craig Topper, 2018-06-08, 1 file, -0/+69
  Test changes are due to differences in how we generate undef elements now.

  We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc.
  llvm-svn: 334261
* [X86] Add builtins for vpermilps/pd instructions to enable target feature checking.
  Craig Topper, 2018-06-08, 1 file, -0/+27
  llvm-svn: 334256
* [X86] Add builtins for blend with immediate control to enforce target feature requirements and check immediate range.
  Craig Topper, 2018-06-08, 1 file, -0/+21
  llvm-svn: 334249
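For the immediate-controlled blends, bit i of the immediate picks lane i from the second source (bit set) or from the first (bit clear), following Intel's description of BLENDPS. A scalar C sketch of the 4-lane single-precision case; `blend_ps_model` is a hypothetical name.

```c
/* Hypothetical scalar model of BLENDPS: dst[i] = bit i of imm ? b[i] : a[i].
   Only the low 4 bits of imm matter for 4 lanes. */
static void blend_ps_model(float dst[4], const float a[4], const float b[4], unsigned imm) {
    for (int i = 0; i < 4; i++)
        dst[i] = ((imm >> i) & 1u) ? b[i] : a[i];
}
```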
* [X86] Add builtins for shuff32x4/shuff64x2/shufi32x4/shufi64x2 to enable target feature checking and immediate range checking.
  Craig Topper, 2018-06-07, 1 file, -0/+29
  llvm-svn: 334244
* [MS] Re-add support for the ARM interlocked bittest intrinsics
  Reid Kleckner, 2018-06-07, 1 file, -68/+117
  Adds support for these intrinsics, which are ARM and ARM64 only:
    _interlockedbittestandreset_acq
    _interlockedbittestandreset_rel
    _interlockedbittestandreset_nf
    _interlockedbittestandset_acq
    _interlockedbittestandset_rel
    _interlockedbittestandset_nf

  Refactor the bittest intrinsic handling to decompose each intrinsic into its action, its width, and its atomicity.
  llvm-svn: 334239
* [X86] Add builtins for VALIGNQ/VALIGND to enable proper target feature checking.
  Craig Topper, 2018-06-07, 1 file, -0/+20
  We still emit shufflevector instructions, we just do it from CGBuiltin.cpp now. This ensures the intrinsics that use this are only available on CPUs that support the feature.

  I also added range checking to the immediate, but only checked it is 8 bits or smaller. We should maybe be stricter since we never use all 8 bits, but gcc doesn't seem to do that.
  llvm-svn: 334237
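VALIGND concatenates its two sources and shifts the pair right by a whole number of 32-bit elements, which Clang expresses as a shufflevector. A scalar C sketch of the 128-bit form, assuming (as in Intel's pseudocode for that width) that only imm8[1:0] is used, with the second source in the low half; this is also why an 8-bit range check is looser than strictly necessary. `valignd128_model` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical scalar model of 128-bit VALIGND: concat = b (low) : a (high),
   shifted right by (imm & 3) elements; the low 4 elements are the result. */
static void valignd128_model(int32_t dst[4], const int32_t a[4], const int32_t b[4],
                             unsigned imm) {
    int32_t concat[8];
    for (int i = 0; i < 4; i++) {
        concat[i] = b[i];     /* low half from the second source */
        concat[i + 4] = a[i]; /* high half from the first source */
    }
    unsigned shift = imm & 3u; /* only imm8[1:0] matters at this width */
    for (int i = 0; i < 4; i++)
        dst[i] = concat[i + shift];
}
```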
* [X86] Add back builtins for _mm_slli_si128/_mm_srli_si128 and similar intrinsics.
  Craig Topper, 2018-06-07, 1 file, -0/+62
  We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits.

  This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously.
  llvm-svn: 334208
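_mm_slli_si128 shifts the whole 128-bit value left by a byte count, shifting in zero bytes; as a shuffle, that is a selection between the source and a zero vector (the zeroinitializer this entry mentions). A scalar C sketch, assuming little-endian byte numbering as in the SSE2 description, where counts of 16 or more yield all zeros; `slli_si128_model` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of PSLLDQ: byte i of the result is byte
   (i - imm) of the source, or zero when i < imm. For imm >= 16 no
   byte qualifies, so the result is all zeros. */
static void slli_si128_model(uint8_t dst[16], const uint8_t src[16], unsigned imm) {
    uint8_t tmp[16];
    for (unsigned i = 0; i < 16; i++)
        tmp[i] = (i >= imm) ? src[i - imm] : 0;
    memcpy(dst, tmp, sizeof tmp); /* temporary keeps dst == src safe */
}
```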
* [X86] Add back _mask, _maskz, and _mask3 builtins for some 512-bit fmadd/fmsub/fmaddsub/fmsubadd builtins.
  Craig Topper, 2018-06-07, 1 file, -61/+112
  Summary:
  We recently switched to using selects in the intrinsics header files for FMA instructions. But the 512-bit versions support flavors with rounding mode which must be an Integer Constant Expression. This has forced those intrinsics to be implemented as macros. As it stands now the mask and mask3 intrinsics evaluate one of their macro arguments twice. If that argument itself is another intrinsic macro, we can end up over expanding macros. Or if it's something we can CSE later it would show up multiple times when it shouldn't.

  I tried adding __extension__ around the macro and making it an expression statement and declaring a local variable. But whatever name you choose for the local variable can never be used as the name of an input to the macro in user code. If that happens you would end up with the same name on the LHS and RHS of an assignment after expansion. We might be safe if we use __ in front of the variable names because those names are reserved and user code shouldn't use that, but I wasn't sure I wanted to make that claim.

  The other option, which I've chosen here, is to add back _mask, _maskz, and _mask3 flavors of the builtin which we will expand in CGBuiltin.cpp to replicate the argument as needed and insert any fneg needed on the third operand to make a subtract. The _maskz isn't truly necessary if we have an unmasked version or if we use the masked version with a -1 mask and wrap a select around it. But I've chosen to make things more uniform.

  I separated out the scalar builtin handling to avoid too many things going on in EmitX86FMAExpr. It was different enough due to the extract and insert that the minor duplication of the CreateCall was probably worth it.

  Reviewers: tkrupa, RKSimon, spatel, GBuella
  Reviewed By: tkrupa
  Subscribers: cfe-commits
  Differential Revision: https://reviews.llvm.org/D47724
  llvm-svn: 334159
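The double-evaluation problem described in this entry is easy to reproduce with ordinary C. The sketch below uses plain ints instead of vectors, and every name in it is hypothetical: the macro mirrors the shape of the old _mask3 FMA macros, in which the third argument is both the addend and the passthrough value, while the function mirrors the builtin approach, where each argument is evaluated once at the call site.

```c
/* Counts how many times the "C" argument expression is evaluated. */
static int evaluations = 0;
static int next_c(void) { evaluations++; return 10; }

/* Stand-in for a select builtin: both value operands are always computed. */
static int select_int(int mask, int if_true, int if_false) {
    return mask ? if_true : if_false;
}

/* Hypothetical macro shaped like the old _mask3 FMA macros: C appears twice,
   once as the addend and once as the passthrough for masked-off lanes. */
#define FMADD_MASK3_LIKE(A, B, C, U) select_int((U), (A) * (B) + (C), (C))

/* Builtin-style alternative: C is a parameter, so the caller's expression
   is evaluated exactly once before the call. */
static int fmadd_mask3_like(int a, int b, int c, int u) {
    return select_int(u, a * b + c, c);
}
```

With `next_c()` as the third argument, the macro calls it twice and the function calls it once.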
* [MS][ARM64]: Promote _setjmp to _setjmpex as there is no _setjmp in the ARM64 libvcruntime.lib
  Reid Kleckner, 2018-06-06, 1 file, -49/+55
  Factor out the common setjmp call emission code.

  Based on a patch by Chris January

  Differential Revision: https://reviews.llvm.org/D47784
  llvm-svn: 334112
* Fix std::tuple errors
  Reid Kleckner, 2018-06-06, 1 file, -12/+12
  llvm-svn: 334060
* Implement bittest intrinsics generically for non-x86 platforms
  Reid Kleckner, 2018-06-06, 1 file, -26/+142
  I tested these locally on an x86 machine by disabling the inline asm codepath and confirming that it does the same bitflips as we do with the inline asm.

  Addresses code review feedback.
  llvm-svn: 334059
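A rough C model of a generic (non-inline-asm) bittest lowering: find the byte holding the bit, build a one-bit mask, then test and update that byte, using an atomic read-modify-write for the interlocked forms. This is a sketch, not Clang's code; it takes a byte pointer and assumes a non-negative bit index, whereas the MSVC intrinsics take a `long *` base, and the `*_model` names are hypothetical. `__atomic_fetch_or` and `__atomic_fetch_and` are GCC/Clang builtins.

```c
/* Plain (non-atomic) test-and-set of bit `bit` counted from `base`. */
static unsigned char bittest_and_set_model(unsigned char *base, long bit) {
    unsigned char *byte = base + (bit >> 3);               /* byte holding the bit */
    unsigned char mask = (unsigned char)(1u << (bit & 7)); /* bit within that byte */
    unsigned char old = *byte;
    *byte = (unsigned char)(old | mask);
    return (old & mask) != 0;                              /* previous bit value */
}

/* Interlocked forms: same addressing, atomic read-modify-write. */
static unsigned char interlocked_bittest_and_set_model(unsigned char *base, long bit) {
    unsigned char *byte = base + (bit >> 3);
    unsigned char mask = (unsigned char)(1u << (bit & 7));
    unsigned char old = __atomic_fetch_or(byte, mask, __ATOMIC_SEQ_CST);
    return (old & mask) != 0;
}

static unsigned char interlocked_bittest_and_reset_model(unsigned char *base, long bit) {
    unsigned char *byte = base + (bit >> 3);
    unsigned char mask = (unsigned char)(1u << (bit & 7));
    unsigned char old = __atomic_fetch_and(byte, (unsigned char)~mask, __ATOMIC_SEQ_CST);
    return (old & mask) != 0;
}
```

The ARM-only `_acq`, `_rel` and `_nf` variants listed in an entry above differ in memory ordering; mapping them to `__ATOMIC_ACQUIRE`, `__ATOMIC_RELEASE` and `__ATOMIC_RELAXED` in a sketch like this is an assumption, not something these commit messages state.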
* [X86] Add builtins for vector element insert and extract for different 128 and 256 bit vector types. Use them to implement the extract and insert intrinsics.
  Craig Topper, 2018-06-06, 1 file, -2/+23
  Previously we were just using extended vector operations in the header file. This unfortunately allowed non-constant indices to be used with the intrinsics. This is incompatible with gcc, icc, and MSVC. It also introduces a different performance characteristic because a non-constant index gets lowered to a vector store and an element sized load.

  By adding the builtins we can check for the index to be a constant and ensure it's in range of the vector element count.

  User code still has the option to use extended vector operations themselves if they need non-constant indexing.
  llvm-svn: 334057
* [X86] Implement __builtin_ia32_vec_ext_v2si correctly even though we only use it with an index of 0.
  Craig Topper, 2018-06-05, 1 file, -1/+1
  This builtin takes an index as its second operand, but the codegen hardcodes an index of 0 and doesn't use the operand. The only use of the builtin in the header file passes 0 to the operand so this works for that usage. But it's more correct to use the real operand.
  llvm-svn: 334054
* Reimplement the bittest intrinsic family as builtins with inline asm
  Reid Kleckner, 2018-06-05, 1 file, -19/+56
  We need to implement _interlockedbittestandset as a builtin for windows.h, so we might as well do the whole family. It reduces code duplication anyway.

  Fixes PR33188, a long standing bug in our bittest implementation encountered by Chakra.
  llvm-svn: 333978
* Revert r333791 "Cap "voluntary" vector alignment at 16 for all Darwin platforms."
  Reid Kleckner, 2018-06-04, 1 file, -19/+18
  Adding __attribute__((aligned(32))) to __m256 breaks the implementation of _mm256_loadu_ps on Windows. On Windows, alignment attributes have higher precedence than packing attributes.

  We also might want to carefully consider the consequences of changing our vector typedefs, since many users copy them and invent their own new, non-Intel specific vector type names.
  llvm-svn: 333958
* [X86] Replace __builtin_ia32_vbroadcastf128_pd256 and __builtin_ia32_vbroadcastf128_ps256 with an unaligned load intrinsic and a __builtin_shufflevector call.
  Craig Topper, 2018-06-03, 1 file, -26/+0
  llvm-svn: 333853
* [X86] Pass ArrayRef instead of SmallVectorImpl& to the X86 builtin helper functions. NFC
  Craig Topper, 2018-06-03, 1 file, -10/+10
  llvm-svn: 333851
* Revert r333848 "[X86] Pass ArrayRef instead of SmallVectorImpl& to the X86 builtin helper functions. NFC"
  Craig Topper, 2018-06-03, 1 file, -8/+8
  Looks like I missed some changes to make this work.
  llvm-svn: 333850
* [X86] Pass ArrayRef instead of SmallVectorImpl& to the X86 builtin helper functions. NFC
  Craig Topper, 2018-06-03, 1 file, -8/+8
  llvm-svn: 333848
* [X86] When emitting masked loads/stores don't check for all ones mask.
  Craig Topper, 2018-06-03, 1 file, -10/+0
  This seems like a premature optimization. It's unlikely a user would pass something the frontend can tell is all ones to the masked load/store intrinsics.

  We do this optimization for emitting select for masking because we have builtin calls in header files that pass an all ones mask in. Though at this point we may no longer have any builtins that emit some IR and a select. We may only have the select builtins so maybe we can remove that optimization too.
  llvm-svn: 333847
* [NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)
  Ivan A. Kosarev, 2018-06-02, 1 file, -30/+27
  We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics.

  Differential Revision: https://reviews.llvm.org/D47121
  llvm-svn: 333829
* Cap "voluntary" vector alignment at 16 for all Darwin platforms.
  John McCall, 2018-06-01, 1 file, -18/+19
  This fixes two major problems:
  - We were not capping vector alignment as desired on 32-bit ARM.
  - We were using different alignments based on the AVX settings on Intel, so we did not have a consistent ABI.

  This is an ABI break, but we think we can get away with it because vectors tend to be used mostly in inline code (which is why not having a consistent ABI has not proven disastrous on Intel).

  Intel's AVX types are specified as having 32-byte / 64-byte alignment, so align them explicitly instead of relying on the base ABI rule. Note that this sort of attribute is stripped from template arguments in template substitution, so there's a possibility that code templated over vectors will produce inadequately-aligned objects. The right long-term solution for this is for alignment attributes to be interpreted as true qualifiers and thus preserved in the canonical type.
  llvm-svn: 333791
* [WebAssembly] Update to the new names for the memory builtin functions.
  Dan Gohman, 2018-06-01, 1 file, -0/+15
  The WebAssembly committee has decided on the names `memory.size` and `memory.grow` for the memory intrinsics, so update the clang builtin functions to follow those names, keeping both sets of old names in place for compatibility.
  llvm-svn: 333712
* [X86] Lowering FMA intrinsics to native IR (Clang part)
  Gabor Buella, 2018-05-30, 1 file, -0/+94
  This patch replaces all packed (and scalar without rounding mode) fused intrinsics with fmadd/fmaddsub variations. Then fmadd/fmaddsub are lowered to native IR.

  Patch by tkrupa

  Reviewers: craig.topper, sroland, spatel, RKSimon
  Reviewed By: craig.topper
  Differential Revision: https://reviews.llvm.org/D47444
  llvm-svn: 333555
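For reference, the packed fmaddsub pattern subtracts the third operand in even lanes and adds it in odd lanes, and fmsubadd does the reverse, going by Intel's intrinsic descriptions. A scalar C sketch for 4 lanes; the `*_model` names are hypothetical, and unlike the real instructions this sketch does not guarantee a single rounding (the test values are small integers, so the results are exact either way).

```c
/* Hypothetical scalar model of fmaddsub: even lanes a*b - c, odd lanes a*b + c. */
static void fmaddsub_model(float dst[4], const float a[4], const float b[4],
                           const float c[4]) {
    for (int i = 0; i < 4; i++)
        dst[i] = (i % 2 == 0) ? a[i] * b[i] - c[i] : a[i] * b[i] + c[i];
}

/* Hypothetical scalar model of fmsubadd: even lanes a*b + c, odd lanes a*b - c. */
static void fmsubadd_model(float dst[4], const float a[4], const float b[4],
                           const float c[4]) {
    for (int i = 0; i < 4; i++)
        dst[i] = (i % 2 == 0) ? a[i] * b[i] + c[i] : a[i] * b[i] - c[i];
}
```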
* Support __iso_volatile_load8 etc on aarch64-win32.
  Simon Tatham, 2018-05-30, 1 file, -26/+42
  These intrinsics are used by MSVC's header files on AArch64 Windows as well as AArch32, so we should support them for both targets. I've factored them out of CodeGenFunction::EmitARMBuiltinExpr into separate functions that EmitAArch64BuiltinExpr can call as well.

  Reviewers: javed.absar, mstorsjo
  Reviewed By: mstorsjo
  Subscribers: kristof.beyls, cfe-commits
  Differential Revision: https://reviews.llvm.org/D47476
  llvm-svn: 333513
* [X86] Remove mask argument from more builtins that are handled completely in CGBuiltin.cpp.
  Craig Topper, 2018-05-23, 1 file, -38/+33
  Just wrap a select builtin around them in the header file instead.
  llvm-svn: 333061
* [CodeGen] use nsw negation for builtin abs
  Sanjay Patel, 2018-05-22, 1 file, -1/+2
  The clang builtins have the same semantics as the stdlib functions. The stdlib functions are defined in section 7.20.6.1 of the C standard with: "If the result cannot be represented, the behavior is undefined."

  That lets us mark the negation with 'nsw' because "sub i32 0, INT_MIN" would be UB/poison.

  Differential Revision: https://reviews.llvm.org/D47202
  llvm-svn: 333038
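In C terms, the builtin computes the familiar compare-and-negate below. The only input whose negation overflows is INT_MIN, which the C standard leaves undefined for abs, so marking the negation nsw loses nothing. A minimal sketch; `abs_model` is a hypothetical name, and the `x < 0 ? -x : x` shape corresponds to the 'slt' form named in the "produce the LLVM canonical form of abs" entry.

```c
#include <limits.h>

/* Hypothetical model of builtin abs: signed compare, negate, select.
   Calling it with INT_MIN is undefined behavior, exactly as for abs(). */
static int abs_model(int x) {
    return x < 0 ? -x : x;
}
```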
* [X86] Remove mask argument from some builtins that are handled completely in CGBuiltin.cpp.
  Craig Topper, 2018-05-22, 1 file, -19/+11
  Just wrap a select builtin around them in the header file instead.
  llvm-svn: 333027
* [CodeGen] produce the LLVM canonical form of abs
  Sanjay Patel, 2018-05-22, 1 file, -8/+4
  We chose the 'slt' form as canonical in IR with: rL332819 ...so we should generate that form directly for efficiency.
  llvm-svn: 332989
* [X86] Remove masking from pternlog llvm intrinsics and use a select instruction instead.
  Craig Topper, 2018-05-21, 1 file, -0/+47
  Because the intrinsics in the headers are implemented as macros, we can't just use a select builtin and pternlog builtin. This would require one of the macro arguments to be used twice. Depending on what was passed to the macro we could expand an expression twice leading to weird behavior. We could maybe declare our local variable in the macro, but that would need to worry about name collisions. To avoid that just generate IR directly in CGBuiltin.cpp.

  Differential Revision: https://reviews.llvm.org/D47125
  llvm-svn: 332891
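VPTERNLOG treats its immediate as an 8-entry truth table: for each bit position, the bits of the three operands form a 3-bit index (first operand most significant) that selects one bit of imm8. A bitwise C model of one 32-bit lane, as a sketch for illustration; `ternlog32_model` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical model of VPTERNLOGD on one 32-bit lane. */
static uint32_t ternlog32_model(uint32_t a, uint32_t b, uint32_t c, uint8_t imm8) {
    uint32_t result = 0;
    for (int bit = 0; bit < 32; bit++) {
        /* Index into the truth table: a supplies bit 2, b bit 1, c bit 0. */
        unsigned idx = (((a >> bit) & 1u) << 2) | (((b >> bit) & 1u) << 1) | ((c >> bit) & 1u);
        result |= (uint32_t)((imm8 >> idx) & 1u) << bit;
    }
    return result;
}
```

For example, imm8 = 0x96 gives a three-way XOR and 0xE8 gives the bitwise majority function.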
* [AMDGPU] fixes for lds f32 builtins
  Daniil Fukalov, 2018-05-21, 1 file, -0/+43
  1. added restrictions to memory scope, order and volatile parameters
  2. added custom processing for these builtins (currently unused code, needed to switch off the GCCBuiltin link to the builtins; ongoing change to the llvm tree)
  3. builtins renamed as requested

  Differential Revision: https://reviews.llvm.org/D43281
  llvm-svn: 332848
* [X86] Change the implementation of scalar masked load/store intrinsics to not use a 512-bit intermediate vector.
  Craig Topper, 2018-05-10, 1 file, -2/+2
  This is unnecessary for AVX512VL supporting CPUs like SKX. We can just emit a 128-bit masked load/store here no matter what. The backend will widen it to 512-bits on KNL CPUs.

  Fixes the frontend portion of PR37386. Need to fix the backend to optimize the new sequences well.
  llvm-svn: 331958
* [Builtins] Improve the IR emitted for MSVC compatible rotr/rotl builtins to match what the middle and backends understand
  Craig Topper, 2018-05-10, 1 file, -24/+12
  Previously we emitted something like

    rotl(x, n) {
      n &= bitwidth-1;
      return n != 0 ? ((x << n) | (x >> (bitwidth - n))) : x;
    }

  We use a select to avoid the undefined behavior on the (bitwidth - n) shift.

  The middle and backend don't really recognize this as a rotate and end up emitting a cmov or control flow because of the select. A better pattern is (x << (n & mask)) | (x >> (-n & mask)) where mask is bitwidth - 1.

  Fixes the main complaint in PR37387. There's still some work to be done if the user writes that sequence directly on a short or char where type promotion rules can prevent it from being recognized. The builtin is emitting direct IR with unpromoted types so that isn't a problem for it.

  Differential Revision: https://reviews.llvm.org/D46656
  llvm-svn: 331943
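The select-free rotate pattern this entry recommends, written out in C for 32-bit values as a sketch: both shift counts are masked with 31, the second term is a right shift by `-n & 31`, and no select is needed because a rotate by a multiple of 32 makes both shift counts 0. `rotl32` is a hypothetical helper name, not the MSVC `_rotl` builtin itself.

```c
#include <stdint.h>

/* Hypothetical helper: rotate left without a select. Neither shift count can
   reach 32, so there is no undefined behavior for any n. */
static uint32_t rotl32(uint32_t x, uint32_t n) {
    const uint32_t mask = 31; /* bitwidth - 1 */
    return (x << (n & mask)) | (x >> (-n & mask));
}
```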
* [OpenCL] Fix typos in emitted enqueue kernel function names
  Yaxun Liu, 2018-05-09, 1 file, -4/+4
  Two typos:
    vaarg => vararg
    get_kernel_preferred_work_group_multiple => get_kernel_preferred_work_group_size_multiple

  Differential Revision: https://reviews.llvm.org/D46601
  llvm-svn: 331895
* Remove \brief commands from doxygen comments.
  Adrian Prantl, 2018-05-09, 1 file, -3/+3
  This is similar to the LLVM change https://reviews.llvm.org/D46290.

  We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers in our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all.

  Patch produced by

    for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done
    for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

  Differential Revision: https://reviews.llvm.org/D46320
  llvm-svn: 331834
* [ARM,AArch64] Add intrinsics for dot product instructions
  Oliver Stannard, 2018-04-27, 1 file, -0/+12
  The ACLE spec which describes these intrinsics hasn't been published yet, but this is based on the final draft which will be published soon, and these have already been implemented by GCC.

  Differential revision: https://reviews.llvm.org/D46109
  llvm-svn: 331039
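Going by the published ACLE description of the unsigned dot-product intrinsic (vdot_u32), each 32-bit accumulator lane adds the sum of four byte products taken from the matching group of four bytes. A scalar C sketch of the 64-bit form with two lanes; `udot_u32_model` is a hypothetical name, and the lane grouping is an assumption based on the spec rather than something stated in this commit.

```c
#include <stdint.h>

/* Hypothetical scalar model: acc[lane] += sum of a[4*lane + j] * b[4*lane + j]
   for j = 0..3, with the byte products widened to 32 bits. */
static void udot_u32_model(uint32_t acc[2], const uint8_t a[8], const uint8_t b[8]) {
    for (int lane = 0; lane < 2; lane++) {
        uint32_t sum = 0;
        for (int j = 0; j < 4; j++)
            sum += (uint32_t)a[4 * lane + j] * b[4 * lane + j];
        acc[lane] += sum;
    }
}
```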
* [OpenCL] Add separate read_only and write_only pipe IR types
  Sven van Haastregt, 2018-04-27, 1 file, -3/+6
  SPIR-V encodes the read_only and write_only access qualifiers of pipes, so separate LLVM IR types are required to target SPIR-V. Other backends may also find this useful.

  These new types are `opencl.pipe_ro_t` and `opencl.pipe_wo_t`, which replace `opencl.pipe_t`.

  This replaces __get_pipe_num_packets(...) and __get_pipe_max_packets(...) which took a read_only pipe with separate versions for read_only and write_only pipes, namely:
    * __get_pipe_num_packets_ro(...)
    * __get_pipe_num_packets_wo(...)
    * __get_pipe_max_packets_ro(...)
    * __get_pipe_max_packets_wo(...)

  These separate versions exist to avoid needing a bitcast to one of the two qualified pipe types.

  Patch by Stuart Brady.

  Differential Revision: https://reviews.llvm.org/D46015
  llvm-svn: 331026
* [x86] Revert r330322 (& r330323): Lowering x86 adds/addus/subs/subus intrinsics
  Chandler Carruth, 2018-04-26, 1 file, -98/+1
  The LLVM commit introduces a crash in LLVM's instruction selection.

  I filed http://llvm.org/PR37260 with the test case.
  llvm-svn: 330997
* Lowering x86 adds/addus/subs/subus intrinsics (clang)
  Alexander Ivchenko, 2018-04-19, 1 file, -1/+98
  This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations.

  Patch by tkrupa

  Differential Revision: https://reviews.llvm.org/D44786
  llvm-svn: 330323
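The adds/addus intrinsics are per-lane saturating adds: the result clamps to the lane type's range instead of wrapping. A scalar C sketch of the 8-bit unsigned and signed cases using a widen, add, clamp formulation; the names are hypothetical, and the exact IR this patch emitted is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical model of one paddusb lane: unsigned add, clamped to 255. */
static uint8_t addus_u8(uint8_t a, uint8_t b) {
    unsigned sum = (unsigned)a + b; /* cannot overflow in unsigned int */
    return sum > UINT8_MAX ? UINT8_MAX : (uint8_t)sum;
}

/* Hypothetical model of one paddsb lane: signed add, clamped to [-128, 127]. */
static int8_t adds_i8(int8_t a, int8_t b) {
    int sum = a + b; /* operands are promoted to int */
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}
```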
* [NVPTX, CUDA] Added support for m8n32k16 and m32n8k16 variants of wmma instructions.
  Artem Belevich, 2018-04-18, 1 file, -16/+134
  The new instructions were added for sm_70+ GPUs in CUDA-9.1.

  Differential Revision: https://reviews.llvm.org/D45068
  llvm-svn: 330296
* [XRay] Add clang builtin for xray typed events.
  Keith Wyss, 2018-04-17, 1 file, -0/+38
  Summary:
  A clang builtin for xray typed events. Differs from __xray_customevent(...) by the presence of a type tag that is vended by compiler-rt in typical usage. This allows xray handlers to expand logged events with their type description and plugins to process traced events based on type.

  This change depends on D45633 for the intrinsic definition.

  Reviewers: dberris, pelikan, rnk, eizan
  Subscribers: cfe-commits, llvm-commits
  Differential Revision: https://reviews.llvm.org/D45716
  llvm-svn: 330220
* Add modifiers for unsigned char and signed char field printing for __builtin_dump_struct.
  Aaron Ballman, 2018-04-17, 1 file, -0/+2
  Patch by Paul Semel.
  llvm-svn: 330188
* Add checks for format specifiers used by __builtin_dump_struct and added a new specifier for null-terminated constant strings.
  Aaron Ballman, 2018-04-17, 1 file, -0/+1
  Patch by Paul Semel.
  llvm-svn: 330185
* [NEON] Support vrndns_f32 intrinsic
  Ivan A. Kosarev, 2018-04-13, 1 file, -0/+6
  Differential Revision: https://reviews.llvm.org/D45515
  llvm-svn: 330012
* [XRay][clang] Add flag to choose instrumentation bundles
  Dean Michael Berris, 2018-04-13, 1 file, -0/+5
  Summary:
  This change addresses http://llvm.org/PR36926 by allowing users to pick which instrumentation bundles to use, when instrumenting with XRay. In particular, the flag `-fxray-instrumentation-bundle=` has four valid values:

  - `all`: the default, emits all instrumentation kinds
  - `none`: equivalent to -fnoxray-instrument
  - `function`: emits the entry/exit instrumentation
  - `custom`: emits the custom event instrumentation

  These can be combined either as comma-separated values, or as repeated flag values.

  Reviewers: echristo, kpw, eizan, pelikan
  Reviewed By: pelikan
  Subscribers: mgorny, cfe-commits
  Differential Revision: https://reviews.llvm.org/D44970
  llvm-svn: 329985
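A small C sketch of how the comma-separated and repeated forms of the flag can both be folded into one bitmask. The enum values, the function name and the treatment of `none` (it contributes no bits) are all hypothetical; this is not Clang's option-parsing code.

```c
#include <string.h>

/* Hypothetical bit assignments for the bundle names in this entry. */
enum {
    XRAY_BUNDLE_NONE = 0,
    XRAY_BUNDLE_FUNCTION = 1 << 0,
    XRAY_BUNDLE_CUSTOM = 1 << 1,
    XRAY_BUNDLE_ALL = XRAY_BUNDLE_FUNCTION | XRAY_BUNDLE_CUSTOM,
};

/* Parse one flag value such as "function,custom" and OR its bundles into
   *mask. Returns 0 on success, -1 on an unknown or empty name. Calling it
   once per repeated flag accumulates into the same mask. */
static int parse_xray_bundles(const char *value, unsigned *mask) {
    const char *p = value;
    for (;;) {
        size_t len = strcspn(p, ","); /* length of the current name */
        if (len == 3 && strncmp(p, "all", 3) == 0)
            *mask |= XRAY_BUNDLE_ALL;
        else if (len == 4 && strncmp(p, "none", 4) == 0)
            ; /* contributes no bits in this sketch */
        else if (len == 8 && strncmp(p, "function", 8) == 0)
            *mask |= XRAY_BUNDLE_FUNCTION;
        else if (len == 6 && strncmp(p, "custom", 6) == 0)
            *mask |= XRAY_BUNDLE_CUSTOM;
        else
            return -1;
        if (p[len] == '\0')
            return 0;
        p += len + 1; /* skip the comma */
    }
}
```

Under these assumptions, `-fxray-instrumentation-bundle=function,custom` corresponds to a single call with "function,custom", and two separate flags correspond to two calls on the same mask.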