path: root/clang/lib/CodeGen/CGBuiltin.cpp
Commit message | Author | Age | Files | Lines
...
* [X86] Implement __builtin_ia32_vfmaddss and __builtin_ia32_vfmaddsd with native IR using llvm.fma intrinsic. | Craig Topper | 2018-07-06 | 1 | -0/+10
  This generates some extra zeroing currently, but we should be able to quickly address that with some isel patterns.
  llvm-svn: 336417
* [X86] Use shufflevector instead of a select with a constant mask for fmaddsub/fmsubadd IR emission. | Craig Topper | 2018-07-05 | 1 | -8/+4
  Shufflevector is easier to generate and matches what the backend pattern matches without relying on constant selects being turned into shuffles.
  While I was there I also made the IR regular expressions a little stricter to ensure operand order on the shuffle.
  llvm-svn: 336388
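The shuffle-based fmaddsub emission described above can be modeled in plain C++. This is only an illustrative sketch, not the Clang code: `shuffle` and `fmaddsub` are hypothetical helpers that operate on `std::array` rather than LLVM IR values. It assumes the usual fmaddsub semantics, where even lanes compute a*b - c and odd lanes compute a*b + c.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical scalar model of a two-input shuffle: indices below N pick
// from `lo`, indices N..2N-1 pick from `hi`.
template <std::size_t N>
std::array<float, N> shuffle(const std::array<float, N> &lo,
                             const std::array<float, N> &hi,
                             const std::array<std::size_t, N> &idx) {
  std::array<float, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    assert(idx[i] < 2 * N && "shuffle index out of range");
    out[i] = idx[i] < N ? lo[idx[i]] : hi[idx[i] - N];
  }
  return out;
}

// fmaddsub model: compute both a*b - c and a*b + c for every lane, then
// interleave them with a single shuffle instead of a select on a mask.
template <std::size_t N>
std::array<float, N> fmaddsub(const std::array<float, N> &a,
                              const std::array<float, N> &b,
                              const std::array<float, N> &c) {
  std::array<float, N> sub{}, add{};
  std::array<std::size_t, N> idx{};
  for (std::size_t i = 0; i < N; ++i) {
    sub[i] = std::fma(a[i], b[i], -c[i]);
    add[i] = std::fma(a[i], b[i], c[i]);
    idx[i] = i + (i % 2) * N; // even lane: sub[i]; odd lane: add[i]
  }
  return shuffle(sub, add, idx);
}
```

The index vector depends only on the lane count, which is why it is simple to generate as a constant shuffle mask.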
* [X86] Fix some vector cmp builtins - TRUE/FALSE predicates | Gabor Buella | 2018-07-05 | 1 | -37/+32
  This patch removes an optimization used with the TRUE/FALSE predicates, as was suggested in https://reviews.llvm.org/D45616 for r335339.
  The optimization was buggy, since r335339 used it also for *_mask builtins, without actually applying the mask -- the mask argument was just ignored.
  Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
  Reviewed By: spatel
  Differential Revision: https://reviews.llvm.org/D48715
  llvm-svn: 336355
* [X86] Remove masking from the avx512 packed sqrt builtins. Use select builtins instead. | Craig Topper | 2018-06-29 | 1 | -16/+14
  llvm-svn: 335945
* [X86] Rename llvm.x86.avx512.mask.fpclass.p* to exclude 'mask.' from the name to match llvm. | Craig Topper | 2018-06-27 | 1 | -6/+6
  llvm-svn: 335745
* [NEON] Support vldNq intrinsics in AArch32 (Clang part) | Ivan A. Kosarev | 2018-06-27 | 1 | -63/+16
  This patch reworks the support for dup NEON intrinsics as described in D48439.
  Differential Revision: https://reviews.llvm.org/D48440
  llvm-svn: 335734
* [X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction. | Craig Topper | 2018-06-26 | 1 | -0/+37
  Additional IR is emitted to convert between scalar and vXi1 type to match the expected software interface for the builtin that clang exposes.
  llvm-svn: 335564
* [X86] Lower _mm[256|512]_cmp[.]_mask intrinsics to native llvm IR | Gabor Buella | 2018-06-22 | 1 | -91/+74
  Summary: Lowering some vector comparison builtins to fcmp IR instructions. This ignores the signaling behaviour specified in the predicate argument of said builtins.
  Affected AVX512 builtins:
    __builtin_ia32_cmpps128_mask
    __builtin_ia32_cmpps256_mask
    __builtin_ia32_cmpps512_mask
    __builtin_ia32_cmppd128_mask
    __builtin_ia32_cmppd256_mask
    __builtin_ia32_cmppd512_mask
  Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
  Reviewed By: craig.topper, spatel, efriedma
  Differential Revision: https://reviews.llvm.org/D45616
  llvm-svn: 335339
* [X86] Update handling in CGBuiltin to be tolerant of out of range immediates. | Craig Topper | 2018-06-21 | 1 | -13/+29
  D48464 contains changes that will loosen some of the range checks in SemaChecking to a DefaultError warning that can be disabled. This patch adds explicit masking to avoid using the upper bits of immediates to gracefully handle the warning being disabled.
  llvm-svn: 335308
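As a rough illustration of the explicit masking described above, the sketch below models a 4-lane shuffle controlled by an 8-bit immediate. The function `shuffle4` and its lane encoding are assumptions made for the example and are not taken from CGBuiltin.cpp. The point is only that masking the immediate first makes an out-of-range value behave like its low 8 bits.

```cpp
#include <array>

// Hypothetical model of a 4-lane shuffle controlled by an 8-bit immediate:
// lane i takes the source element named by bits [2i+1:2i] of the immediate.
std::array<int, 4> shuffle4(const std::array<int, 4> &src, unsigned imm) {
  imm &= 0xff; // explicit masking: ignore any bits above the low 8
  std::array<int, 4> out{};
  for (unsigned i = 0; i < 4; ++i)
    out[i] = src[(imm >> (2 * i)) & 0x3];
  return out;
}
```

With this masking, `shuffle4(v, 0x11B)` and `shuffle4(v, 0x1B)` give the same result, so code generation stays well defined even if the range warning has been disabled.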
* Fix a bug introduced by rL334850 | Tomasz Krupa | 2018-06-18 | 1 | -2/+2
  Summary: All *_sqrt_round_s[s|d] intrinsics should execute a square root on the zeroth element from B (Ops[1]) and insert it into A (Ops[0]), not the other way around.
  Reviewers: itaraban, craig.topper
  Reviewed By: craig.topper
  Subscribers: craig.topper, cfe-commits
  Differential Revision: https://reviews.llvm.org/D48288
  llvm-svn: 334964
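The intended operand roles can be written out as a small scalar model. This is a sketch under assumptions: `Vec4` and `sqrt_ss` are hypothetical names, and rounding-mode and masking arguments are left out.

```cpp
#include <array>
#include <cmath>

using Vec4 = std::array<float, 4>;

// Scalar model of the fixed behaviour: element 0 is sqrt(b[0]) and
// elements 1..3 are passed through from a.
Vec4 sqrt_ss(const Vec4 &a, const Vec4 &b) {
  Vec4 out = a;
  out[0] = std::sqrt(b[0]);
  return out;
}
```

Swapping the roles of `a` and `b` here reproduces the bug being fixed: the square root would be taken from A and the upper elements from B.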
* [X86] Lowering sqrt intrinsics to native IR | Tomasz Krupa | 2018-06-15 | 1 | -1/+50
  Reviewers: craig.topper, spatel, RKSimon, igorb, uriel.k
  Reviewed By: craig.topper
  Subscribers: tkrupa, cfe-commits
  Differential Revision: https://reviews.llvm.org/D41168
  llvm-svn: 334850
* [AArch64] Reverted rC334696 with Clang VCVTA test fix | Luke Geeson | 2018-06-15 | 1 | -0/+3
  llvm-svn: 334820
* [X86] Rename __builtin_ia32_pslldqi128 to __builtin_ia32_pslldqi128_byteshift and similar for other sizes. Remove the multiply by 8 from the header files. | Craig Topper | 2018-06-14 | 1 | -10/+8
  The previous names took the shift amount in bits to match gcc and required a multiply by 8 in the header. This creates a misleading error message when we check the range of the immediate to the builtin since the allowed range also got multiplied by 8.
  This commit changes the builtins to use a byte shift amount to match the underlying instruction and the Intel intrinsic.
  Fixes the remaining issue from PR37795.
  llvm-svn: 334773
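To make the byte-versus-bit distinction concrete, here is a minimal model of a 128-bit left shift by whole bytes. `Bytes16` and `shift_left_bytes` are hypothetical names used only for this sketch. It assumes element 0 is the least significant byte and that zeros are shifted in.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Model of a 128-bit left shift by `bytes` whole bytes, with element 0 as
// the least significant byte. Zeros are shifted in, and any amount of 16
// or more yields an all-zero result.
Bytes16 shift_left_bytes(const Bytes16 &v, unsigned bytes) {
  Bytes16 out{}; // value-initialized to all zeros
  for (std::size_t i = bytes; i < 16; ++i)
    out[i] = v[i - bytes];
  return out;
}
```

With a byte count as the argument, a range diagnostic on the immediate refers to the same number the user wrote. Under the older bit-count convention the header had to pass `bytes * 8`, so the reported range was scaled by 8 as well.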
* [X86] Lowering Mask Scalar intrinsics to native IR (Clang part) | Tomasz Krupa | 2018-06-14 | 1 | -0/+29
  Summary: Lowering add, sub, mul, and div mask scalar intrinsic calls to native IR.
  Reviewers: craig.topper, RKSimon, spatel, sroland
  Reviewed By: craig.topper
  Subscribers: cfe-commits
  Differential Revision: https://reviews.llvm.org/D47979
  llvm-svn: 334741
* [AArch64] reverting rC334693 due to build failures | Luke Geeson | 2018-06-14 | 1 | -3/+0
  llvm-svn: 334696
* [AArch64] Added support for the vcvta_u16_f16 intrinsic for FP16 Armv8.2-A | Luke Geeson | 2018-06-14 | 1 | -0/+3
  llvm-svn: 334693
* [COFF] Add ARM64 intrinsics: __yield, __wfe, __wfi, __sev, __sevl | Mandeep Singh Grang | 2018-06-13 | 1 | -0/+5
  Summary: These intrinsics result in hint instructions. They are provided here for MSVC ARM64 compatibility.
  Reviewers: mstorsjo, compnerd, javed.absar
  Reviewed By: mstorsjo
  Subscribers: kristof.beyls, chrib, cfe-commits
  Differential Revision: https://reviews.llvm.org/D48132
  llvm-svn: 334639
* [X86] Fix operand order in the shuffle created for blend builtins. | Craig Topper | 2018-06-11 | 1 | -1/+1
  This was broken when the builtin was added in r334249.
  llvm-svn: 334422
* [X86] Use target independent masked expandload and compressstore intrinsics to implement expandload/compressstore builtins. | Craig Topper | 2018-06-10 | 1 | -0/+74
  Summary: We've had these target independent intrinsics for at least a year and a half. Looks like they do exactly what we need here and the backend already supports them.
  Reviewers: RKSimon, delena, spatel, GBuella
  Reviewed By: RKSimon
  Subscribers: cfe-commits, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47693
  llvm-svn: 334366
* [NEON] Support VST1xN intrinsics in AArch32 mode (Clang part) | Ivan A. Kosarev | 2018-06-10 | 1 | -28/+29
  We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics.
  Differential Revision: https://reviews.llvm.org/D47446
  llvm-svn: 334362
* [X86] Add back some masked vector truncate builtins. Custom IRgen a few others. | Craig Topper | 2018-06-08 | 1 | -0/+29
  I'd like to make the select builtins require an avx512f, avx512bw, or avx512vl feature to match what is normally required to get masking. Truncate is special in that there are instructions with a 128/256-bit masked result even without avx512vl. By using special builtins we can emit a select without using the 128/256-bit select builtins.
  llvm-svn: 334331
* [X86] Fold masking into subvector extract builtins. | Craig Topper | 2018-06-08 | 1 | -16/+21
  I'm looking into making the select builtins require avx512f, avx512bw, or avx512vl since masking operations generally require those features. The extract builtins are funny because the 512-bit versions return a 128 or 256 bit vector with masking even when avx512vl is not supported.
  llvm-svn: 334330
* [X86] Add builtins for vpermq/vpermpd instructions to enable target feature checking. | Craig Topper | 2018-06-08 | 1 | -0/+18
  llvm-svn: 334311
* [X86] Add builtins for shufps and shufpd to enable target feature and immediate range checking. | Craig Topper | 2018-06-08 | 1 | -0/+30
  llvm-svn: 334266
* [X86] Add builtins for pshufd, pshuflw, and pshufhw to enable target feature and immediate range checking. | Craig Topper | 2018-06-08 | 1 | -0/+51
  llvm-svn: 334265
* [X86] Add subvector insert and extract builtins to enable target feature checking and immediate range checking. | Craig Topper | 2018-06-08 | 1 | -0/+69
  Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc.
  llvm-svn: 334261
* [X86] Add builtins for vpermilps/pd instructions to enable target feature checking. | Craig Topper | 2018-06-08 | 1 | -0/+27
  llvm-svn: 334256
* [X86] Add builtins for blend with immediate control to enforce target feature requirements and check immediate range. | Craig Topper | 2018-06-08 | 1 | -0/+21
  llvm-svn: 334249
* [X86] Add builtins for shuff32x4/shuff64x2/shufi32x4/shufi64x2 to enable target feature checking and immediate range checking. | Craig Topper | 2018-06-07 | 1 | -0/+29
  llvm-svn: 334244
* [MS] Re-add support for the ARM interlocked bittest intrinsics | Reid Kleckner | 2018-06-07 | 1 | -68/+117
  Adds support for these intrinsics, which are ARM and ARM64 only:
    _interlockedbittestandreset_acq
    _interlockedbittestandreset_rel
    _interlockedbittestandreset_nf
    _interlockedbittestandset_acq
    _interlockedbittestandset_rel
    _interlockedbittestandset_nf
  Refactor the bittest intrinsic handling to decompose each intrinsic into its action, its width, and its atomicity.
  llvm-svn: 334239
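One way to picture the decomposition into action, width, and atomicity is a small decoder over the intrinsic name. The sketch below is hypothetical throughout: the enum names, `BitTestInfo`, and `decodeBitTestName` are invented for illustration, and the assumed naming scheme (optional `_interlocked` prefix, optional `64` width suffix, optional `_acq`/`_rel`/`_nf` ordering suffix) is inferred from the names listed above rather than taken from the Clang sources.

```cpp
#include <string>

enum class Action { Test, Complement, Reset, Set };
enum class Ordering { Unlocked, SeqCst, Acquire, Release, NoFence };

struct BitTestInfo {
  Action action;
  bool is64;
  Ordering ordering;
};

// Strips `suffix` from the end of `s` if present.
static bool consumeSuffix(std::string &s, const std::string &suffix) {
  if (s.size() < suffix.size() ||
      s.compare(s.size() - suffix.size(), suffix.size(), suffix) != 0)
    return false;
  s.erase(s.size() - suffix.size());
  return true;
}

// Strips `prefix` from the start of `s` if present.
static bool consumePrefix(std::string &s, const std::string &prefix) {
  if (s.compare(0, prefix.size(), prefix) != 0)
    return false;
  s.erase(0, prefix.size());
  return true;
}

// Decodes a name such as "_interlockedbittestandreset_acq" into its parts.
// Returns false if the name does not follow the assumed scheme.
bool decodeBitTestName(std::string name, BitTestInfo &info) {
  info.ordering = Ordering::Unlocked;
  if (consumePrefix(name, "_interlocked")) {
    if (consumeSuffix(name, "_acq"))
      info.ordering = Ordering::Acquire;
    else if (consumeSuffix(name, "_rel"))
      info.ordering = Ordering::Release;
    else if (consumeSuffix(name, "_nf"))
      info.ordering = Ordering::NoFence;
    else
      info.ordering = Ordering::SeqCst;
  } else if (!consumePrefix(name, "_")) {
    return false;
  }
  info.is64 = consumeSuffix(name, "64");
  if (name == "bittest")
    info.action = Action::Test;
  else if (name == "bittestandcomplement")
    info.action = Action::Complement;
  else if (name == "bittestandreset")
    info.action = Action::Reset;
  else if (name == "bittestandset")
    info.action = Action::Set;
  else
    return false;
  return true;
}
```

Once the three properties are separated, the code that emits the operation can be written once and parameterized, instead of being repeated for every name.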
* [X86] Add builtins for VALIGNQ/VALIGND to enable proper target feature checking. | Craig Topper | 2018-06-07 | 1 | -0/+20
  We still emit shufflevector instructions we just do it from CGBuiltin.cpp now. This ensures the intrinsics that use this are only available on CPUs that support the feature.
  I also added range checking to the immediate, but only checked it is 8 bits or smaller. We should maybe be stricter since we never use all 8 bits, but gcc doesn't seem to do that.
  llvm-svn: 334237
* [X86] Add back builtins for _mm_slli_si128/_mm_srli_si128 and similar intrinsics. | Craig Topper | 2018-06-07 | 1 | -0/+62
  We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits.
  This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously.
  llvm-svn: 334208
* [X86] Add back _mask, _maskz, and _mask3 builtins for some 512-bit fmadd/fmsub/fmaddsub/fmsubadd builtins. | Craig Topper | 2018-06-07 | 1 | -61/+112
  Summary: We recently switched to using selects in the intrinsics header files for FMA instructions. But the 512-bit versions support flavors with rounding mode which must be an Integer Constant Expression. This has forced those intrinsics to be implemented as macros. As it stands now the mask and mask3 intrinsics evaluate one of their macro arguments twice. If that argument itself is another intrinsic macro, we can end up over expanding macros. Or if it's something we can CSE later it would show up multiple times when it shouldn't.
  I tried adding __extension__ around the macro and making it an expression statement and declaring a local variable. But whatever name you choose for the local variable can never be used as the name of an input to the macro in user code. If that happens you would end up with the same name on the LHS and RHS of an assignment after expansion. We might be safe if we use __ in front of the variable names because those names are reserved and user code shouldn't use that, but I wasn't sure I wanted to make that claim.
  The other option, which I've chosen here, is to add back _mask, _maskz, and _mask3 flavors of the builtin which we will expand in CGBuiltin.cpp to replicate the argument as needed and insert any fneg needed on the third operand to make a subtract. The _maskz isn't truly necessary if we have an unmasked version or if we use the masked version with a -1 mask and wrap a select around it. But I've chosen to make things more uniform.
  I separated out the scalar builtin handling to avoid too many things going on in EmitX86FMAExpr. It was different enough due to the extract and insert that the minor duplication of the CreateCall was probably worth it.
  Reviewers: tkrupa, RKSimon, spatel, GBuella
  Reviewed By: tkrupa
  Subscribers: cfe-commits
  Differential Revision: https://reviews.llvm.org/D47724
  llvm-svn: 334159
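The double-evaluation hazard described in this summary is a general property of function-like macros and can be shown without any intrinsics. The sketch below is a deliberately simplified stand-in: `MASKED_ADD_MACRO`, `masked_add_fn`, and `next_value` are hypothetical, and integer arithmetic replaces the vector select. It only demonstrates that an argument named twice in a macro body is evaluated twice, while a function (or a builtin expanded by the compiler) evaluates it once.

```cpp
// Counts how many times the argument expression is evaluated.
static int calls = 0;
static int next_value() {
  ++calls;
  return 5;
}

// Macro form: `a` appears twice in the expansion, so an argument with side
// effects (or an expensive nested macro) is evaluated twice.
#define MASKED_ADD_MACRO(m, a, b) ((m) * ((a) + (b)) + (1 - (m)) * (a))

// Function form: each argument is evaluated exactly once at the call site.
static inline int masked_add_fn(int m, int a, int b) {
  return m * (a + b) + (1 - m) * a;
}
```

Calling `MASKED_ADD_MACRO(1, next_value(), 2)` increments `calls` twice, whereas `masked_add_fn(1, next_value(), 2)` increments it once. Both return the same value here only because `next_value` always returns the same number.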
* [MS][ARM64]: Promote _setjmp to _setjmpex as there is no _setjmp in the ARM64 libvcruntime.lib | Reid Kleckner | 2018-06-06 | 1 | -49/+55
  Factor out the common setjmp call emission code.
  Based on a patch by Chris January
  Differential Revision: https://reviews.llvm.org/D47784
  llvm-svn: 334112
* Fix std::tuple errors | Reid Kleckner | 2018-06-06 | 1 | -12/+12
  llvm-svn: 334060
* Implement bittest intrinsics generically for non-x86 platforms | Reid Kleckner | 2018-06-06 | 1 | -26/+142
  I tested these locally on an x86 machine by disabling the inline asm codepath and confirming that it does the same bitflips as we do with the inline asm.
  Addresses code review feedback.
  llvm-svn: 334059
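A generic bit test can be expressed with ordinary byte addressing. The following is a minimal non-atomic sketch under assumptions, not the code this commit added: the function names are hypothetical, the bit position is taken as an unsigned value, and bit `pos` is assumed to live in byte `pos >> 3` at bit `pos & 7`.

```cpp
#include <cstddef>
#include <cstdint>

// Returns the previous value of bit `pos` in the buffer at `base`, then
// sets that bit. Bit `pos` lives in byte pos >> 3 at bit position pos & 7.
bool bit_test_and_set(std::uint8_t *base, std::size_t pos) {
  std::uint8_t *byte = base + (pos >> 3);
  const std::uint8_t mask = static_cast<std::uint8_t>(1u << (pos & 7));
  const bool old = (*byte & mask) != 0;
  *byte = static_cast<std::uint8_t>(*byte | mask);
  return old;
}

// Same addressing, but clears the bit after reading it.
bool bit_test_and_reset(std::uint8_t *base, std::size_t pos) {
  std::uint8_t *byte = base + (pos >> 3);
  const std::uint8_t mask = static_cast<std::uint8_t>(1u << (pos & 7));
  const bool old = (*byte & mask) != 0;
  *byte = static_cast<std::uint8_t>(*byte & ~mask);
  return old;
}
```

An interlocked variant would replace the read-modify-write with an atomic fetch-or or fetch-and using the requested memory ordering. That part is omitted here.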
* [X86] Add builtins for vector element insert and extract for different 128 and 256 bit vector types. Use them to implement the extract and insert intrinsics. | Craig Topper | 2018-06-06 | 1 | -2/+23
  Previously we were just using extended vector operations in the header file. This unfortunately allowed non-constant indices to be used with the intrinsics. This is incompatible with gcc, icc, and MSVC. It also introduces a different performance characteristic because a non-constant index gets lowered to a vector store and an element sized load.
  By adding the builtins we can check for the index to be a constant and ensure it's in range of the vector element count.
  User code still has the option to use extended vector operations themselves if they need non-constant indexing.
  llvm-svn: 334057
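The idea of requiring a constant, in-range index has a loose analogue in standard C++: making the index a template parameter and checking it with `static_assert`. This is only an analogy, and `extract_element` is a hypothetical helper. Clang's builtins do their checking in Sema, not through templates.

```cpp
#include <array>
#include <cstddef>

// The index is a compile-time constant by construction, and an
// out-of-range index is rejected at compile time.
template <std::size_t Index, typename T, std::size_t N>
T extract_element(const std::array<T, N> &v) {
  static_assert(Index < N, "index out of range for vector element count");
  return v[Index];
}
```

For a 4-element array `v`, `extract_element<2>(v)` compiles and returns the third element, while `extract_element<4>(v)` fails to compile.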
* [X86] Implement __builtin_ia32_vec_ext_v2si correctly even though we only use it with an index of 0. | Craig Topper | 2018-06-05 | 1 | -1/+1
  This builtin takes an index as its second operand, but the codegen hardcodes an index of 0 and doesn't use the operand. The only use of the builtin in the header file passes 0 to the operand so this works for that usage. But it's more correct to use the real operand.
  llvm-svn: 334054
* Reimplement the bittest intrinsic family as builtins with inline asm | Reid Kleckner | 2018-06-05 | 1 | -19/+56
  We need to implement _interlockedbittestandset as a builtin for windows.h, so we might as well do the whole family. It reduces code duplication anyway.
  Fixes PR33188, a long standing bug in our bittest implementation encountered by Chakra.
  llvm-svn: 333978
* Revert r333791 "Cap "voluntary" vector alignment at 16 for all Darwin platforms." | Reid Kleckner | 2018-06-04 | 1 | -19/+18
  Adding __attribute__((aligned(32))) to __m256 breaks the implementation of _mm256_loadu_ps on Windows. On Windows, alignment attributes have higher precedence than packing attributes.
  We also might want to carefully consider the consequences of changing our vector typedefs, since many users copy them and invent their own new, non-Intel specific vector type names.
  llvm-svn: 333958
* [X86] Replace __builtin_ia32_vbroadcastf128_pd256 and __builtin_ia32_vbroadcastf128_ps256 with an unaligned load intrinsic and a __builtin_shufflevector call. | Craig Topper | 2018-06-03 | 1 | -26/+0
  llvm-svn: 333853
* [X86] Pass ArrayRef instead of SmallVectorImpl& to the X86 builtin helper functions. NFC | Craig Topper | 2018-06-03 | 1 | -10/+10
  llvm-svn: 333851
* Revert r333848 "[X86] Pass ArrayRef instead of SmallVectorImpl& to the X86 builtin helper functions. NFC" | Craig Topper | 2018-06-03 | 1 | -8/+8
  Looks like I missed some changes to make this work.
  llvm-svn: 333850
* [X86] Pass ArrayRef instead of SmallVectorImpl& to the X86 builtin helper functions. NFC | Craig Topper | 2018-06-03 | 1 | -8/+8
  llvm-svn: 333848
* [X86] When emitting masked loads/stores don't check for all ones mask. | Craig Topper | 2018-06-03 | 1 | -10/+0
  This seems like a premature optimization. It's unlikely a user would pass something the frontend can tell is all ones to the masked load/store intrinsics.
  We do this optimization for emitting select for masking because we have builtin calls in header files that pass an all ones mask in. Though at this point we may no longer have any builtins that emit some IR and a select. We may only have the select builtins so maybe we can remove that optimization too.
  llvm-svn: 333847
* [NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part) | Ivan A. Kosarev | 2018-06-02 | 1 | -30/+27
  We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics.
  Differential Revision: https://reviews.llvm.org/D47121
  llvm-svn: 333829
* Cap "voluntary" vector alignment at 16 for all Darwin platforms. | John McCall | 2018-06-01 | 1 | -18/+19
  This fixes two major problems:
  - We were not capping vector alignment as desired on 32-bit ARM.
  - We were using different alignments based on the AVX settings on Intel, so we did not have a consistent ABI.
  This is an ABI break, but we think we can get away with it because vectors tend to be used mostly in inline code (which is why not having a consistent ABI has not proven disastrous on Intel).
  Intel's AVX types are specified as having 32-byte / 64-byte alignment, so align them explicitly instead of relying on the base ABI rule. Note that this sort of attribute is stripped from template arguments in template substitution, so there's a possibility that code templated over vectors will produce inadequately-aligned objects. The right long-term solution for this is for alignment attributes to be interpreted as true qualifiers and thus preserved in the canonical type.
  llvm-svn: 333791
* [WebAssembly] Update to the new names for the memory builtin functions. | Dan Gohman | 2018-06-01 | 1 | -0/+15
  The WebAssembly committee has decided on the names `memory.size` and `memory.grow` for the memory intrinsics, so update the clang builtin functions to follow those names, keeping both sets of old names in place for compatibility.
  llvm-svn: 333712
* [X86] Lowering FMA intrinsics to native IR (Clang part) | Gabor Buella | 2018-05-30 | 1 | -0/+94
  This patch replaces all packed (and scalar without rounding mode) fused intrinsics with fmadd/fmaddsub variations. Then fmadd/fmaddsub are lowered to native IR.
  Patch by tkrupa
  Reviewers: craig.topper, sroland, spatel, RKSimon
  Reviewed By: craig.topper
  Differential Revision: https://reviews.llvm.org/D47444
  llvm-svn: 333555
* Support __iso_volatile_load8 etc on aarch64-win32. | Simon Tatham | 2018-05-30 | 1 | -26/+42
  These intrinsics are used by MSVC's header files on AArch64 Windows as well as AArch32, so we should support them for both targets. I've factored them out of CodeGenFunction::EmitARMBuiltinExpr into separate functions that EmitAArch64BuiltinExpr can call as well.
  Reviewers: javed.absar, mstorsjo
  Reviewed By: mstorsjo
  Subscribers: kristof.beyls, cfe-commits
  Differential Revision: https://reviews.llvm.org/D47476
  llvm-svn: 333513