summaryrefslogtreecommitdiffstats
path: root/clang/lib/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86] Add subvector insert and extract builtins to enable target feature ↵Craig Topper2018-06-081-0/+69
| | | | | | | | checking and immediate range checking. Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc. llvm-svn: 334261
* [X86] Add builtins for vpermilps/pd instructions to enable target feature ↵Craig Topper2018-06-081-0/+27
| | | | | | checking. llvm-svn: 334256
* [CodeGen] Always use MSVC personality for windows-msvc targetsShoaib Meenai2018-06-081-6/+10
| | | | | | | | | | | | | | | | | The windows-msvc target is meant to be ABI compatible with MSVC, including the exception handling. Ensure that a windows-msvc triple always equates to the MSVC personality being used. This mostly affects the GNUStep and ObjFW Obj-C runtimes. To the best of my knowledge, those are normally not used with windows-msvc triples. I believe WinObjC is based on GNUStep (or it at least uses libobjc2), but that also takes the approach of wrapping Obj-C exceptions in C++ exceptions, so the MSVC personality function is the right one to use there as well. Differential Revision: https://reviews.llvm.org/D47862 llvm-svn: 334253
* [X86] Add builtins for blend with immediate control to enforce target ↵Craig Topper2018-06-081-0/+21
| | | | | | feature requirements and check immediate range. llvm-svn: 334249
* [X86] Add builtins for shuff32x4/shuff64x2/shufi32x4/shuff64x2 to enable ↵Craig Topper2018-06-071-0/+29
| | | | | | target feature checking and immediate range checking. llvm-svn: 334244
* [MS] Re-add support for the ARM interlocked bittest intrinscsReid Kleckner2018-06-071-68/+117
| | | | | | | | | | | | | | | Adds support for these intrinsics, which are ARM and ARM64 only: _interlockedbittestandreset_acq _interlockedbittestandreset_rel _interlockedbittestandreset_nf _interlockedbittestandset_acq _interlockedbittestandset_rel _interlockedbittestandset_nf Refactor the bittest intrinsic handling to decompose each intrinsic into its action, its width, and its atomicity. llvm-svn: 334239
* [X86] Add builtins for VALIGNQ/VALIGND to enable proper target feature checking.Craig Topper2018-06-071-0/+20
| | | | | | | | We still emit shufflevector instructions we just do it from CGBuiltin.cpp now. This ensures the intrinsics that use this are only available on CPUs that support the feature. I also added range checking to the immediate, but only checked it is 8 bits or smaller. We should maybe be stricter since we never use all 8 bits, but gcc doesn't seem to do that. llvm-svn: 334237
* [X86] Add back builtins for _mm_slli_si128/_mm_srli_si128 and similar ↵Craig Topper2018-06-071-0/+62
| | | | | | | | | | intrinsics. We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits. This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously. llvm-svn: 334208
* [CodeGen] Improve diagnostics related to target attributesGabor Buella2018-06-073-10/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: When requirement imposed by __target__ attributes on functions are not satisfied, prefer printing those requirements, which are explicitly mentioned in the attributes. This makes such messages more useful, e.g. printing avx512f instead of avx2 in the following scenario: ``` $ cat foo.c static inline void __attribute__((__always_inline__, __target__("avx512f"))) x(void) { } int main(void) { x(); } $ clang foo.c foo.c:7:2: error: always_inline function 'x' requires target feature 'avx2', but would be inlined into function 'main' that is compiled without support for 'avx2' x(); ^ 1 error generated. ``` bugzilla: https://bugs.llvm.org/show_bug.cgi?id=37338 Reviewers: craig.topper, echristo, dblaikie Reviewed By: craig.topper, echristo Differential Revision: https://reviews.llvm.org/D46541 llvm-svn: 334174
* [X86] Add back _mask, _maskz, and _mask3 builtins for some 512-bit ↵Craig Topper2018-06-071-61/+112
| | | | | | | | | | | | | | | | | | | | | | | fmadd/fmsub/fmaddsub/fmsubadd builtins. Summary: We recently switch to using a selects in the intrinsics header files for FMA instructions. But the 512-bit versions support flavors with rounding mode which must be an Integer Constant Expression. This has forced those intrinsics to be implemented as macros. As it stands now the mask and mask3 intrinsics evaluate one of their macro arguments twice. If that argument itself is another intrinsic macro, we can end up over expanding macros. Or if its something we can CSE later it would show up multiple times when it shouldn't. I tried adding __extension__ around the macro and making it an expression statement and declaring a local variable. But whatever name you choose for the local variable can never be used as the name of an input to the macro in user code. If that happens you would end up with the same name on the LHS and RHS of an assignment after expansion. We might be safe if we use __ in front of the variable names because those names are reserved and user code shouldn't use that, but I wasn't sure I wanted to make that claim. The other option which I've chosen here, is to add back _mask, _maskz, and _mask3 flavors of the builtin which we will expand in CGBuiltin.cpp to replicate the argument as needed and insert any fneg needed on the third operand to make a subtract. The _maskz isn't truly necessary if we have an unmasked version or if we use the masked version with a -1 mask and wrap a select around it. But I've chosen to make things more uniform. I separated out the scalar builtin handling to avoid too many things going on in EmitX86FMAExpr. It was different enough due to the extract and insert that the minor duplication of the CreateCall was probably worth it. Reviewers: tkrupa, RKSimon, spatel, GBuella Reviewed By: tkrupa Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D47724 llvm-svn: 334159
* [MS][ARM64]: Promote _setjmp to_setjmpex as there is no _setjmp in the ARM64 ↵Reid Kleckner2018-06-061-49/+55
| | | | | | | | | | | | libvcruntime.lib Factor out the common setjmp call emission code. Based on a patch by Chris January Differential Revision: https://reviews.llvm.org/D47784 llvm-svn: 334112
* Fix std::tuple errorsReid Kleckner2018-06-061-12/+12
| | | | llvm-svn: 334060
* Implement bittest intrinsics generically for non-x86 platformsReid Kleckner2018-06-061-26/+142
| | | | | | | | | | I tested these locally on an x86 machine by disabling the inline asm codepath and confirming that it does the same bitflips as we do with the inline asm. Addresses code review feedback. llvm-svn: 334059
* [X86] Add builtins for vector element insert and extract for different 128 ↵Craig Topper2018-06-061-2/+23
| | | | | | | | | | | | | | and 256 bit vector types. Use them to implement the extract and insert intrinsics. Previously we were just using extended vector operations in the header file. This unfortunately allowed non-constant indices to be used with the intrinsics. This is incompatible with gcc, icc, and MSVC. It also introduces a different performance characteristic because non-constant index gets lowered to a vector store and an element sized load. By adding the builtins we can check for the index to be a constant and ensure its in range of the vector element count. User code still has the option to use extended vector operations themselves if they need non-constant indexing. llvm-svn: 334057
* [X86] Implement __builtin_ia32_vec_ext_v2si correctly even though we only ↵Craig Topper2018-06-051-1/+1
| | | | | | | | use it with an index of 0. This builtin takes an index as its second operand, but the codegen hardcodes an index of 0 and doesn't use the operand. The only use of the builtin in the header file passes 0 to the operand so this works for that usage. But its more correct to use the real operand. llvm-svn: 334054
* [CUDA][HIP] Do not emit type info when compiling for deviceYaxun Liu2018-06-051-1/+1
| | | | | | | | | | | | | CUDA/HIP does not support RTTI on device side, therefore there is no point of emitting type info when compiling for device. Emitting type info for device not only clutters the IR with useless global variables, but also causes undefined symbol at linking since vtable for cxxabiv1::class_type_info has external linkage. Differential Revision: https://reviews.llvm.org/D47694 llvm-svn: 334021
* Reimplement the bittest intrinsic family as builtins with inline asmReid Kleckner2018-06-051-19/+56
| | | | | | | | | | | We need to implement _interlockedbittestandset as a builtin for windows.h, so we might as well do the whole family. It reduces code duplication anyway. Fixes PR33188, a long standing bug in our bittest implementation encountered by Chakra. llvm-svn: 333978
* Revert r333791 "Cap "voluntary" vector alignment at 16 for all Darwin ↵Reid Kleckner2018-06-041-19/+18
| | | | | | | | | | | | | | platforms." Adding __attribute__((aligned(32))) to __m256 breaks the implementation of _mm256_loadu_ps on Windows. On Windows, alignment attributes have higher precedence than packing attributes. We also might want to carefully consider the consequences of changing our vector typedefs, since many users copy them and invent their own new, non-Intel specific vector type names. llvm-svn: 333958
* Update for an LLVM header file moveDavid Blaikie2018-06-041-1/+1
| | | | llvm-svn: 333955
* Remove llvm::Triple argument from get***Personality() functions. NFC.Heejin Ahn2018-06-041-17/+18
| | | | | | | | | | | | | | Summary: Because `llvm::Triple` can be derived from `TargetInfo`, it is simpler to take only `TargetInfo` argument. Reviewers: sbc100 Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D47620 llvm-svn: 333938
* This diff includes changes for supporting the following types.Leonard Chan2018-06-043-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | // Primary fixed point types signed short _Accum s_short_accum; signed _Accum s_accum; signed long _Accum s_long_accum; unsigned short _Accum u_short_accum; unsigned _Accum u_accum; unsigned long _Accum u_long_accum; // Aliased fixed point types short _Accum short_accum; _Accum accum; long _Accum long_accum; This diff only allows for declaration of the fixed point types. Assignment and other operations done on fixed point types according to http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf will be added in future patches. The saturated versions of these types and the equivalent _Fract types will also be added in future patches. The tests included are for asserting that we can declare these types. Fixed the test that was failing by not checking for dso_local on some targets. Differential Revision: https://reviews.llvm.org/D46084 llvm-svn: 333923
* [X86] Replace __builtin_ia32_vbroadcastf128_pd256 and ↵Craig Topper2018-06-031-26/+0
| | | | | | __builtin_ia32_vbroadcastf128_ps256 with an unaligned load intrinsics and a __builtin_shufflevector call. llvm-svn: 333853
* [X86] Pass ArrayRef instead of SmallVectorImpl& to the X86 builtin helper ↵Craig Topper2018-06-031-10/+10
| | | | | | functions. NFC llvm-svn: 333851
* Revert r333848 "[X86] Pass ArrayRef instead of SmallVectorImpl& to the X86 ↵Craig Topper2018-06-031-8/+8
| | | | | | | | builtin helper functions. NFC" Looks like I missed some changes to make this work. llvm-svn: 333850
* [X86] Pass ArrayRef instead of SmallVectorImpl& to the X86 builtin helper ↵Craig Topper2018-06-031-8/+8
| | | | | | functions. NFC llvm-svn: 333848
* [X86] When emitting masked loads/stores don't check for all ones mask.Craig Topper2018-06-031-10/+0
| | | | | | | | This seems like a premature optimization. It's unlikely a user would pass something the frontend can tell is all ones to the masked load/store intrinsics. We do this optimization for emitting select for masking because we have builtin calls in header files that pass an all ones mask in. Though at this point we may not longer have any builtins that emit some IR and a select. We may only have the select builtins so maybe we can remove that optimization too. llvm-svn: 333847
* [NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)Ivan A. Kosarev2018-06-021-30/+27
| | | | | | | | | We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics. Differential Revision: https://reviews.llvm.org/D47121 llvm-svn: 333829
* Revert "This diff includes changes for supporting the following types."Leonard Chan2018-06-023-22/+0
| | | | | | | This reverts commit r333814, which fails for a test checking the bit width on ubuntu. llvm-svn: 333815
* This diff includes changes for supporting the following types.Leonard Chan2018-06-023-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | ``` // Primary fixed point types signed short _Accum s_short_accum; signed _Accum s_accum; signed long _Accum s_long_accum; unsigned short _Accum u_short_accum; unsigned _Accum u_accum; unsigned long _Accum u_long_accum; // Aliased fixed point types short _Accum short_accum; _Accum accum; long _Accum long_accum; ``` This diff only allows for declaration of the fixed point types. Assignment and other operations done on fixed point types according to http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf will be added in future patches. The saturated versions of these types and the equivalent `_Fract` types will also be added in future patches. The tests included are for asserting that we can declare these types. Differential Revision: https://reviews.llvm.org/D46084 llvm-svn: 333814
* Cap "voluntary" vector alignment at 16 for all Darwin platforms.John McCall2018-06-011-18/+19
| | | | | | | | | | | | | | | | | | | | | This fixes two major problems: - We were not capping vector alignment as desired on 32-bit ARM. - We were using different alignments based on the AVX settings on Intel, so we did not have a consistent ABI. This is an ABI break, but we think we can get away with it because vectors tend to be used mostly in inline code (which is why not having a consistent ABI has not proven disastrous on Intel). Intel's AVX types are specified as having 32-byte / 64-byte alignment, so align them explicitly instead of relying on the base ABI rule. Note that this sort of attribute is stripped from template arguments in template substitution, so there's a possibility that code templated over vectors will produce inadequately-aligned objects. The right long-term solution for this is for alignment attributes to be interpreted as true qualifiers and thus preserved in the canonical type. llvm-svn: 333791
* [WebAssembly] Hide new Wasm EH behind its feature flagHeejin Ahn2018-06-012-10/+17
| | | | | | | | | | | | | | | | Summary: clang's current wasm EH implementation is a non-MVP feature in progress. We had a `-mexception-handling` wasm feature but were not using it. This patch hides the non-MVP wasm EH behind a flag, so it does not affect other code for now. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits Differential Revision: https://reviews.llvm.org/D47614 llvm-svn: 333716
* [Coverage] End deferred regions before labels, fixes PR35867Vedant Kumar2018-06-011-0/+1
| | | | | | | | | A deferred region should end before the start of a label, and should not extend to the start of the label sub-statement. Fixes llvm.org/PR35867. llvm-svn: 333715
* [WebAssembly] Update to the new names for the memory builtin functions.Dan Gohman2018-06-011-0/+15
| | | | | | | | | The WebAssembly committee has decided on the names `memory.size` and `memory.grow` for the memory intrinsics, so update the clang builtin functions to follow those names, keeping both sets of old names in place for compatibility. llvm-svn: 333712
* [WebAssembly] Use Windows EH instructions for Wasm EHHeejin Ahn2018-05-317-23/+204
| | | | | | | | | | | | | | | | | | | | | | | Summary: Because wasm control flow needs to be structured, using WinEH instructions to support wasm EH brings several benefits. This patch makes wasm EH uses Windows EH instructions, with some changes: 1. Because wasm uses a single catch block to catch all C++ exceptions, this merges all catch clauses into a single catchpad, within which we test the EH selector as in Itanium EH. 2. Generates a call to `__clang_call_terminate` in case a cleanup throws. Wasm does not have a runtime to handle this. 3. In case there is no catch-all clause, inserts a call to `__cxa_rethrow` at the end of a catchpad in order to unwind to an enclosing EH scope. Reviewers: majnemer, dschuff Subscribers: jfb, sbc100, jgravelle-google, sunfish, cfe-commits Differential Revision: https://reviews.llvm.org/D44931 llvm-svn: 333703
* Fix null MSInheritanceAttr deref in CXXRecordDecl::getMSInheritanceModel()Reid Kleckner2018-05-311-1/+1
| | | | | | | | | | | | | | Ensure latest MPT decl has a MSInheritanceAttr when instantiating templates, to avoid null MSInheritanceAttr deref in CXXRecordDecl::getMSInheritanceModel(). See PR#37399 for repo / details. Patch by Andrew Rogers! Differential Revision: https://reviews.llvm.org/D46664 llvm-svn: 333680
* IRGen: Write .dwo files when -split-dwarf-file is used together with ↵Peter Collingbourne2018-05-311-0/+1
| | | | | | | | -fthinlto-index. Differential Revision: https://reviews.llvm.org/D47597 llvm-svn: 333677
* [Coverage] Discard the last uncompleted deferred region in a declVedant Kumar2018-05-301-25/+5
| | | | | | | | | | | | | | | | | | | Discard the last uncompleted deferred region in a decl, if one exists. This prevents lines at the end of a function containing only whitespace or closing braces from being marked as uncovered, if they follow a region terminator (return/break/etc). The previous behavior was to heuristically complete deferred regions at the end of a decl. In practice this ended up being too brittle for too little gain. Users would complain that there was no way to reach full code coverage because whitespace at the end of a function would be marked uncovered. rdar://40238228 Differential Revision: https://reviews.llvm.org/D46918 llvm-svn: 333609
* IRGen: Rename bitsets -> type metadata. NFC.Peter Collingbourne2018-05-301-18/+17
| | | | | | | "Type metadata" is the term that we've been using for the CFI-related information on vtables for a while now. llvm-svn: 333602
* [X86] Lowering FMA intrinsics to native IR (Clang part)Gabor Buella2018-05-301-0/+94
| | | | | | | | | | | | | | | | This patch replaces all packed (and scalar without rounding mode) fused intrinsics with fmadd/fmaddsub variations. Then fmadd/fmaddsub are lowered to native IR. Patch by tkrupa Reviewers: craig.topper, sroland, spatel, RKSimon Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D47444 llvm-svn: 333555
* Support __iso_volatile_load8 etc on aarch64-win32.Simon Tatham2018-05-302-26/+46
| | | | | | | | | | | | | | | | | These intrinsics are used by MSVC's header files on AArch64 Windows as well as AArch32, so we should support them for both targets. I've factored them out of CodeGenFunction::EmitARMBuiltinExpr into separate functions that EmitAArch64BuiltinExpr can call as well. Reviewers: javed.absar, mstorsjo Reviewed By: mstorsjo Subscribers: kristof.beyls, cfe-commits Differential Revision: https://reviews.llvm.org/D47476 llvm-svn: 333513
* Make the mangled name collision diagnostic a bit more useful by listing the ↵Richard Smith2018-05-301-5/+6
| | | | | | | | | | mangling. This helps especially when the collision is for a template specialization, where the template arguments are not available from anywhere else in the diagnostic, and are likely relevant to the problem. llvm-svn: 333489
* Revert r332839.Richard Smith2018-05-301-14/+4
| | | | | | | This is causing miscompiles and "definition with same mangled name as another definition" errors. llvm-svn: 333482
* [CodeGen][Darwin] Set the calling-convention of thread-local variableAkira Hatanaka2018-05-291-1/+5
| | | | | | | | | | | | | | | initialization functions to 'cxx_fast_tlscc'. This fixes a bug where instructions calling initialization functions for thread-local static members of c++ template classes were using calling convention 'cxx_fast_tlscc' while the called functions weren't annotated with the calling convention. rdar://problem/40447463 Differential Revision: https://reviews.llvm.org/D47354 llvm-svn: 333447
* Revert "[DebugInfo] Don't bother with MD5 checksums of preprocessed files."Paul Robinson2018-05-252-13/+4
| | | | | | | This reverts commit d734f2aa3f76fbf355ecd2bbe081d0c1f49867ab. Also known as r333311. A very small but nonzero number of bots fail. llvm-svn: 333319
* Support Swift calling convention for PPC64 targetsBob Wilson2018-05-251-2/+11
| | | | | | | This adds basic support for the Swift calling convention with PPC64 targets. Patch provided by Atul Sowani in bug report #37223 llvm-svn: 333316
* [DebugInfo] Don't bother with MD5 checksums of preprocessed files.Paul Robinson2018-05-252-4/+13
| | | | | | | | | | The checksum will not reflect the real source, so there's no clear reason to include them in the debug info. Also this was causing a crash on the DWARF side. Differential Revision: https://reviews.llvm.org/D47260 llvm-svn: 333311
* [OPENMP, NVPTX] Fixed codegen for orphaned parallel region.Alexey Bataev2018-05-251-25/+19
| | | | | | | | | | | | | | If orphaned parallel region is found, the next code must be emitted: ``` if(__kmpc_is_spmd_exec_mode() || __kmpc_parallel_level(loc, gtid)) Serialized execution. else if (IsMasterThread()) Prepare and signal worker. else Outined function call. ``` llvm-svn: 333301
* Use zeroinitializer for (trailing zero portion of) large array initializersRichard Smith2018-05-231-79/+91
| | | | | | | | more reliably. This re-commits r333044 with a fix for PR37560. llvm-svn: 333141
* Revert r333044 "Use zeroinitializer for (trailing zero portion of) large ↵Hans Wennborg2018-05-231-78/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | array initializers" It caused asserts, see PR37560. > Use zeroinitializer for (trailing zero portion of) large array initializers > more reliably. > > Clang has two different ways it emits array constants (from InitListExprs and > from APValues), and both had some ability to emit zeroinitializer, but neither > was able to catch all cases where we could use zeroinitializer reliably. In > particular, emitting from an APValue would fail to notice if all the explicit > array elements happened to be zero. In addition, for large arrays where only an > initial portion has an explicit initializer, we would emit the complete > initializer (which could be huge) rather than emitting only the non-zero > portion. With this change, when the element would have a suffix of more than 8 > zero elements, we emit the array constant as a packed struct of its initial > portion followed by a zeroinitializer constant for the trailing zero portion. > > In passing, I found a bug where SemaInit would sometimes walk the entire array > when checking an initializer that only covers the first few elements; that's > fixed here to unblock testing of the rest. > > Differential Revision: https://reviews.llvm.org/D47166 llvm-svn: 333067
* [X86] Remove mask argument from more builtins that are handled completely in ↵Craig Topper2018-05-231-38/+33
| | | | | | CGBuiltin.cpp. Just wrap a select builtin around them in the header file instead. llvm-svn: 333061
OpenPOWER on IntegriCloud