path: root/clang/lib/CodeGen/CGBuiltin.cpp
* Add __warn_memset_zero_len builtin as a workaround for glibc issue (serge-sans-paille, 2020-01-17, 1 file, -0/+2)

  Glibc issue: https://sourceware.org/bugzilla/show_bug.cgi?id=25399
  The fix consists in treating the missing function as a builtin lowered to a nop.

  Differential Revision: https://reviews.llvm.org/D72869
  (cherry picked from commit d293417931d3a9d46799b42795988ca3b5cfd766)
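
  For illustration (a sketch of the glibc fortify pattern this works around;
  simplified, the exact glibc declaration may differ): the function is declared
  but never defined, so a call surviving to link time would fail. Clang now
  recognizes the name as a builtin and lowers the call to a nop.

      /* Declared by glibc's fortified string headers, never defined: */
      void __warn_memset_zero_len(void)
          __attribute__((__warning__("memset used with constant zero length "
                                     "parameter; this could be due to "
                                     "transposed parameters")));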
* Make helper functions static or move them into anonymous namespaces. NFC. (Benjamin Kramer, 2020-01-14, 1 file, -0/+2)
* Add builtins for aligning and checking alignment of pointers and integers (Alex Richardson, 2020-01-09, 1 file, -0/+95)

  This change introduces three new builtins (which work on both pointers and
  integers) that can be used instead of common bitwise arithmetic:
  __builtin_align_up(x, alignment), __builtin_align_down(x, alignment) and
  __builtin_is_aligned(x, alignment).

  I originally added these builtins to the CHERI fork of LLVM a few years ago
  to handle the slightly different C semantics that we use for CHERI [1].
  Until recently these builtins (or sequences of other builtins) were
  required to generate correct code. I have since made changes to the default
  C semantics so that they are no longer strictly necessary (but using them
  does generate slightly more efficient code). However, based on our
  experience using them in various projects over the past few years, I
  believe that adding these builtins to clang would be useful.

  These builtins have the following benefits over bit-manipulation and casts
  via uintptr_t:

  - The named builtins clearly convey the semantics of the operation. While
    checking alignment using __builtin_is_aligned(x, 16) versus
    ((x & 15) == 0) is probably not a huge win in readability, I personally
    find __builtin_align_up(x, N) a lot easier to read than
    (x+(N-1))&~(N-1).
  - They preserve the type of the argument (including const qualifiers).
    When using casts via uintptr_t, it is easy to cast to the wrong type or
    strip qualifiers such as const.
  - If the alignment argument is a constant value, clang can check that it
    is a power-of-two and within the range of the type. Since the semantics
    of these builtins are well defined compared to arbitrary
    bit-manipulation, it is possible to add a UBSAN checker that the
    run-time value is a valid power-of-two. I intend to add this as a
    follow-up to this change.
  - The builtins avoid int-to-pointer casts both in C and LLVM IR. In the
    future (i.e. once most optimizations handle it), we could use the new
    llvm.ptrmask intrinsic to avoid the ptrtoint instruction that would
    normally be generated.
  - They can be used to round up/down to the next aligned value for both
    integers and pointers without requiring two separate macros.
  - In many projects the alignment operations are already wrapped in macros
    (e.g. roundup2 and rounddown2 in FreeBSD), so by replacing the macro
    implementation with a builtin call, we get improved diagnostics for many
    call-sites while only having to change a few lines.
  - Finally, the builtins also emit assume_aligned metadata when used on
    pointers. This can improve code generation compared to the uintptr_t
    casts.

  [1] In our CHERI compiler we have a compilation mode where all pointers
  are implemented as capabilities (essentially unforgeable 128-bit fat
  pointers). In our original model, casts from uintptr_t (which is a 128-bit
  capability) to an integer value returned the "offset" of the capability
  (i.e. the difference between the virtual address and the base of the
  allocation). This causes problems for cases such as checking the
  alignment: for example, the expression `if (((uintptr_t)ptr & 63) == 0)`
  is generally used to check if the pointer is aligned to a multiple of 64
  bytes. The problem with offsets is that any pointer to the beginning of an
  allocation will have an offset of zero, so this check always succeeds in
  that case (even if the address is not correctly aligned). The same issues
  also exist when aligning up or down. Using the alignment builtins ensures
  that the address is used instead of the offset. While I have since changed
  the default C semantics to return the address instead of the offset when
  casting, this offset compilation mode can still be used by passing a
  command-line flag.

  Reviewers: rsmith, aaron.ballman, theraven, fhahn, lebedev.ri, nlopes, aqjune
  Reviewed By: aaron.ballman, lebedev.ri
  Differential Revision: https://reviews.llvm.org/D71499
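
  For illustration (not from the commit), a minimal usage sketch of the
  three builtins:

      #include <assert.h>
      #include <stdint.h>

      void demo(char *p) {              /* e.g. p == buf + 3 */
        char *up   = __builtin_align_up(p, 16);   /* next 16-byte boundary */
        char *down = __builtin_align_down(p, 16); /* previous 16-byte boundary */
        assert(__builtin_is_aligned(up, 16) && __builtin_is_aligned(down, 16));

        /* Works on integers too, preserving the operand's type: */
        uint64_t v = __builtin_align_up((uint64_t)41, 8); /* (41+7) & ~7 == 48 */
        assert(__builtin_is_aligned(v, 8));
      }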
* [Clang] Handle target-specific builtins returning aggregates. (Simon Tatham, 2020-01-09, 1 file, -3/+23)

  Summary:
  A few of the ARM MVE builtins directly return a structure type. This
  causes an assertion failure at code-gen time if you try to assign the
  result of the builtin to a variable, because the `RValue` created in
  `EmitBuiltinExpr` from the `llvm::Value` produced by codegen is always
  made by `RValue::get()`, which creates a non-aggregate `RValue` that will
  fail an assertion when `AggExprEmitter::withReturnValueSlot` calls
  `Src.getAggregatePointer()`. A similar failure occurs if you try to use
  the struct return value directly to extract one field, e.g.
  `vld2q(address).val[0]`.

  The existing code-gen tests for those MVE builtins pass the returned
  structure type directly to the C `return` statement, which apparently
  managed to avoid that particular code path, so we didn't notice the crash.

  Now `EmitBuiltinExpr` checks the evaluation kind of the builtin's return
  value, and does the necessary handling for aggregate returns. I've added
  two extra test cases, both of which crashed before this change.

  Reviewers: dmgreen, rjmccall
  Reviewed By: rjmccall
  Subscribers: kristof.beyls, cfe-commits
  Tags: #clang
  Differential Revision: https://reviews.llvm.org/D72271
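
  A sketch of the two failure modes described above (assumes an MVE target
  and <arm_mve.h>; both previously hit the RValue assertion):

      #include <arm_mve.h>

      uint32x4_t first_of_pair(const uint32_t *addr) {
        uint32x4x2_t v = vld2q(addr); /* builtin returns a structure */
        return v.val[0];
      }

      uint32x4_t direct_field(const uint32_t *addr) {
        return vld2q(addr).val[0];    /* extract a field from the call result */
      }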
* Improve support of GNU mempcpy (serge-sans-paille, 2020-01-09, 1 file, -2/+8)

  - Lower to the memcpy intrinsic
  - Raise warnings when size/bounds are known

  Differential Revision: https://reviews.llvm.org/D71374
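
  For reference (standard GNU semantics, not from the commit): mempcpy
  returns dest + n rather than dest, which makes chained copies cheap, and
  clang now lowers it to the same memcpy intrinsic:

      #define _GNU_SOURCE
      #include <string.h>

      /* Appends b after a in dst and returns the end pointer. */
      char *concat2(char *dst, const char *a, size_t na,
                    const char *b, size_t nb) {
        return mempcpy(mempcpy(dst, a, na), b, nb);
      }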
* Add Triple::isX86() (Fangrui Song, 2020-01-06, 1 file, -2/+1)

  Reviewed By: craig.topper, skan
  Differential Revision: https://reviews.llvm.org/D72247
* [SystemZ] Use FNeg in s390x clang builtins (Kevin P. Neal, 2020-01-02, 1 file, -9/+5)

  The s390x builtins are still using FSub instead of FNeg. Correct that.
* [CodeGen] Emit conj/conjf/conjl libcalls as fneg instructions if possible. (Craig Topper, 2019-12-31, 1 file, -1/+4)

  We already recognize the __builtin versions of these; might as well
  recognize the libcall versions.

  Differential Revision: https://reviews.llvm.org/D72028
* [CodeGen] Use IRBuilder::CreateFNeg for __builtin_conj (Craig Topper, 2019-12-30, 1 file, -6/+1)

  This replaces the fsub -0.0 idiom with an fneg instruction. We didn't seem
  to have a test that showed the current codegen, just some tests for
  constant folding and a test that was only checking the declare lines for
  libcalls. The latter just checked that we did not have a declare for @conj
  when using __builtin_conj.

  Differential Revision: https://reviews.llvm.org/D72012
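
  For illustration (not from the commit): conjugation only flips the sign of
  the imaginary part, which is why a single fneg suffices.

      #include <complex.h>

      double _Complex conjugate(double _Complex z) {
        return __builtin_conj(z); /* 1.0 + 2.0i -> 1.0 - 2.0i */
      }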
* [NFC] Remove some dead code from CGBuiltin.cpp. (Kevin P. Neal, 2019-12-24, 1 file, -14/+0)
* ConstrainedFP: use API compatible with opaque pointers. (Tim Northover, 2019-12-19, 1 file, -6/+6)

  This just updates an IRBuilder interface to take Functions instead of
  Values so the type can be derived, and fixes some callsites in Clang to
  call the updated API.
* [WebAssembly] Add avgr_u intrinsics and require nuw in patterns (Thomas Lively, 2019-12-18, 1 file, -0/+8)

  Summary:
  The vector pattern `(a + b + 1) / 2` was previously selected to an avgr_u
  instruction regardless of nuw flags, but this is incorrect in the case
  where either addition may have an unsigned wrap. This CL changes the
  existing pattern to require both adds to have nuw flags and adds builtin
  functions and intrinsics for the avgr_u instructions because the corrected
  pattern is not representable in C.

  Reviewers: aheejin
  Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D71648
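
  A worked example (not from the commit) of why the nuw flags matter in an
  8-bit lane:

      #include <stdint.h>

      /* With a = b = 200: a + b + 1 == 401, which wraps to 145 in 8 bits,
         giving 145 / 2 == 72 -- but the true rounding average is 200,
         which is what avgr_u computes. */
      uint8_t wrapped_avg(uint8_t a, uint8_t b) {
        uint8_t sum = (uint8_t)(a + b + 1);
        return sum / 2;
      }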
* [WebAssembly] Replace SIMD int min/max builtins with patterns (Thomas Lively, 2019-12-16, 1 file, -43/+1)

  Summary:
  The instructions were originally implemented via builtins and intrinsics
  so users would have to explicitly opt-in to using them. This was useful
  while we were validating whether these instructions should have been
  merged into the spec proposal. Now that they have been, we can use normal
  codegen patterns, so the intrinsics and builtins are no longer useful.

  Reviewers: aheejin
  Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D71500
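
  For illustration (a sketch, not from the commit): ordinary element-wise
  min code like this is what the codegen patterns are now expected to match
  when targeting WebAssembly SIMD.

      void vmin_u8(unsigned char *restrict dst, const unsigned char *a,
                   const unsigned char *b, int n) {
        for (int i = 0; i < n; i++)
          dst[i] = a[i] < b[i] ? a[i] : b[i];
      }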
* [IR] Split out target specific intrinsic enums into separate headers (Reid Kleckner, 2019-12-11, 1 file, -0/+11)

  This has two main effects:
  - Optimizes debug info size by saving 221.86 MB of obj file size in a
    Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
    object file size.
  - Incremental step towards decoupling target intrinsics.

  The enums are still compact, so adding and removing a single
  target-specific intrinsic will trigger a rebuild of all of LLVM. Assigning
  distinct target id spaces is potential future work.

  Part of PR34259

  Reviewers: efriedma, echristo, MaskRay
  Reviewed By: echristo, MaskRay
  Differential Revision: https://reviews.llvm.org/D71320
* [ARM][MVE] Add intrinsics for immediate shifts. (reland) (Simon Tatham, 2019-12-11, 1 file, -0/+30)

  This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which
  shift every lane of a vector left or right by a compile-time immediate.
  They mostly work by expanding to the IR `shl`, `lshr` and `ashr`
  operations, with their second operand being a vector splat of the
  immediate.

  There's a fiddly special case, though. ACLE specifies that the immediate
  in `vshrq_n` can take values up to //and including// the bit size of the
  vector lane. But LLVM IR thinks that shifting right by the full size of
  the lane is UB, and feels free to replace the `lshr` with an `undef` half
  way through the optimization pipeline. Hence, to keep this legal in source
  code, I have to detect it at codegen time. Logical (unsigned) right shifts
  by the element size are handled by simply emitting the zero vector;
  arithmetic ones are converted into a shift of one bit less, which will
  always give the same output.

  In order to do that check, I also had to enhance the tablegen MveEmitter
  so that it can cope with converting a builtin function's operand into a
  bare integer to pass to a code-generating subfunction. Previously the only
  bare integers it knew how to handle were flags generated from within
  `arm_mve.td`.

  Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard
  Reviewed By: dmgreen, MarkMurrayARM
  Subscribers: echristo, hokein, rdhindsa, kristof.beyls, hiraditya, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D71065
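
  For illustration (not from the commit; assumes an MVE target and intrinsic
  names per ACLE), the edge case described above:

      #include <arm_mve.h>

      /* ACLE allows an immediate shift equal to the lane width. */
      uint16x8_t shr_full_u(uint16x8_t v) {
        return vshrq_n_u16(v, 16); /* unsigned: emitted as the zero vector */
      }
      int16x8_t shr_full_s(int16x8_t v) {
        return vshrq_n_s16(v, 16); /* signed: emitted as a shift by 15 */
      }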
* [FPEnv] clang support for constrained FP builtins (Kevin P. Neal, 2019-12-10, 1 file, -34/+147)

  Change the IRBuilder and clang so that constrained FP intrinsics will be
  emitted for builtins when appropriate. Only non-target-specific builtins
  are affected in this patch.

  Differential Revision: https://reviews.llvm.org/D70256
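
  A minimal sketch of what "constrained" means here (not from the commit;
  the exact driver flag that enables strict mode is an assumption):

      /* In default mode this lowers to llvm.sqrt.f64; in a strict FP mode
         (e.g. -ffp-exception-behavior=strict, flag name assumed) it lowers
         to llvm.experimental.constrained.sqrt.f64 so the optimizer cannot
         move it across rounding-mode or FP-exception changes. */
      double strict_sqrt(double x) { return __builtin_sqrt(x); }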
* [Alignment][NFC] CreateMemSet use MaybeAlign (Guillaume Chatelet, 2019-12-10, 1 file, -8/+9)

  Summary:
  This patch is part of a series to introduce an Alignment type.
  See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type: https://reviews.llvm.org/D64790

  Reviewers: courbet
  Subscribers: arsenm, jvesely, nhaehnle, hiraditya, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D71213
* Revert "[ARM][MVE] Add intrinsics for immediate shifts."Eric Christopher2019-12-091-29/+0
| | | | | | | | | | | | | | and two follow-on commits: one warning fix and one functionality. As it's breaking at least the lto bot: http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/15132/steps/test-stage1-compiler/logs/stdio This reverts commits: 8d70f3c933a5b81a87a5ab1af0e3e98ee2cd7c67 ff4dceef9201c5ae3924e92f6955977f243ac71d d97b3e3e65cd77a81b39732af84a1a4229e95091
* Avoid Attr.h includes, CodeGen edition (Reid Kleckner, 2019-12-09, 1 file, -0/+1)

  This saves around 20 includes of Attr.h. Not much.
* Fix the compiler warnings "-Winconsistent-missing-override" and "-Wunused-variable" for d97b3e3e65cd77a81b39732af84a1a4229e95091 (Haojian Wu, 2019-12-09, 1 file, -2/+2)
* [ARM][MVE] Add intrinsics for immediate shifts. (Simon Tatham, 2019-12-09, 1 file, -0/+29)

  Summary:
  This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which
  shift every lane of a vector left or right by a compile-time immediate.
  They mostly work by expanding to the IR `shl`, `lshr` and `ashr`
  operations, with their second operand being a vector splat of the
  immediate.

  There's a fiddly special case, though. ACLE specifies that the immediate
  in `vshrq_n` can take values up to //and including// the bit size of the
  vector lane. But LLVM IR thinks that shifting right by the full size of
  the lane is UB, and feels free to replace the `lshr` with an `undef` half
  way through the optimization pipeline. Hence, to keep this legal in source
  code, I have to detect it at codegen time. Logical (unsigned) right shifts
  by the element size are handled by simply emitting the zero vector;
  arithmetic ones are converted into a shift of one bit less, which will
  always give the same output.

  In order to do that check, I also had to enhance the tablegen MveEmitter
  so that it can cope with converting a builtin function's operand into a
  bare integer to pass to a code-generating subfunction. Previously the only
  bare integers it knew how to handle were flags generated from within
  `arm_mve.td`.

  Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard
  Reviewed By: MarkMurrayARM
  Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D71065
* [NFC] Pass a reference to CodeGenFunction to methods of LValue and AggValueSlot (Akira Hatanaka, 2019-12-03, 1 file, -7/+7)

  This reapplies 8a5b7c35709d9ce1f44a99f0c5b084bf2696ea17 after fixing a
  null dereference bug in CGOpenMPRuntime::emitUserDefinedMapper.

  Original commit message:
  This is needed for the pointer authentication work we plan to do in the
  near future.

  https://github.com/apple/llvm-project/blob/a63a81bd9911f87a0b5dcd5bdd7ccdda7124af87/clang/docs/PointerAuthentication.rst
* Revert "[NFC] Pass a reference to CodeGenFunction to methods of LValue and"Akira Hatanaka2019-12-031-7/+7
| | | | | This reverts commit 8a5b7c35709d9ce1f44a99f0c5b084bf2696ea17. This seems to have broken UBSan because of a null dereference.
* [NFC] Pass a reference to CodeGenFunction to methods of LValue and AggValueSlot (Akira Hatanaka, 2019-12-03, 1 file, -7/+7)

  This is needed for the pointer authentication work we plan to do in the
  near future.

  https://github.com/apple/llvm-project/blob/a63a81bd9911f87a0b5dcd5bdd7ccdda7124af87/clang/docs/PointerAuthentication.rst
* [ARM][AArch64] Complex addition Neon intrinsics for Armv8.3-A (Victor Campos, 2019-12-02, 1 file, -0/+8)

  Summary:
  Add support for the vcadd_* family of intrinsics. This set of intrinsics
  is available in Armv8.3-A. The fp16 versions require the FP16 extension,
  which has been available (opt-in) since Armv8.2-A.

  Reviewers: t.p.northover
  Reviewed By: t.p.northover
  Subscribers: t.p.northover, kristof.beyls, hiraditya, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D70862
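
  For illustration (not from the commit; intrinsic name per ACLE, requires
  a target with the Armv8.3-A complex-number extension):

      #include <arm_neon.h>

      /* Adds b rotated by 90 degrees in the complex plane, treating each
         pair of lanes as (real, imaginary). */
      float32x4_t complex_add(float32x4_t a, float32x4_t b) {
        return vcaddq_rot90_f32(a, b);
      }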
* [ARM] Replace arm_neon_vqadds with sadd_sat (David Green, 2019-11-27, 1 file, -6/+6)

  This replaces the A32 NEON vqadds, vqaddu, vqsubs and vqsubu intrinsics
  with the target independent sadd_sat, uadd_sat, ssub_sat and usub_sat.
  This helps generate vqadds from standard IR nodes, which might be produced
  from the vectoriser. The old variants are removed in the process.

  Differential Revision: https://reviews.llvm.org/D69350
* NeonEmitter: remove special case on casting polymorphic builtins. (Tim Northover, 2019-11-20, 1 file, -0/+5)

  For some reason we were not casting a fairly obscure class of builtin
  calls we expected to be polymorphic to vectors of char. It worked because
  the only affected intrinsics weren't actually polymorphic after all, but
  it is unnecessarily complicated.
* [ARM,MVE] Add intrinsics for vector comparisons. (Simon Tatham, 2019-11-18, 1 file, -0/+8)

  This adds the `vcmp` family of ACLE MVE intrinsics: vector/vector,
  vector/scalar, and the predicated forms of both. All are represented using
  standard existing IR: vector/scalar comparisons are represented by making
  a vector out of the scalar first, and predicated forms are represented by
  taking the bitwise AND of the input predicate and the output of the
  comparison. Existing LLVM-side tests demonstrate that ISel will
  pattern-match all of that back down to single MVE VCMPs.

  The idiom of handling a vector/scalar operation by generating IR to expand
  the scalar into a second vector is going to be needed for a lot of MVE
  intrinsics, so to make that easy, I've provided a helper function that
  automatically works out the element count.

  The comparison intrinsics are the first ones that have to //return// a
  predicate, in the user-facing `mve_pred16_t` format. This means we have to
  use the `arm_mve_pred_v2i` low-level intrinsic to convert it back from the
  logical `<n x i1>` form used in IR. I've done that explicitly in the code
  gen specification for the builtins, because it happens much more rarely in
  the ACLE API than passing a Predicate as input, so it didn't seem worth
  automating in MveEmitter.

  Reviewers: ostannard, MarkMurrayARM, dmgreen
  Reviewed By: dmgreen
  Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D70297
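
  A usage sketch (not from the commit; assumes an MVE target, polymorphic
  names per ACLE):

      #include <arm_mve.h>

      /* Vector/scalar compare: the scalar is splat into a vector first. */
      mve_pred16_t eq_scalar(uint32x4_t a, uint32_t b) {
        return vcmpeqq(a, b);
      }

      /* Use the resulting predicate to select lanes. */
      uint32x4_t pick(uint32x4_t a, uint32x4_t b, uint32x4_t x, uint32x4_t y) {
        return vpselq(x, y, vcmpeqq(a, b));
      }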
* [ARM,MVE] Add intrinsics for contiguous load/stores. (Simon Tatham, 2019-11-13, 1 file, -0/+7)

  This patch adds the ACLE intrinsics for all the MVE load and store
  instructions not already handled by D69791. These ones don't need new IR
  intrinsics, because they can be implemented in terms of standard LLVM IR
  constructions.

  Some of the load and store instructions access less than 128 bits of
  memory, sign/zero extending each value to a wider vector lane on load or
  truncating it on store. These are represented in IR by a load of a shorter
  vector followed by a zext/sext, and conversely, a trunc followed by a
  short store. Existing ISel patterns already recognize those combinations
  and turn them into the right MVE instructions.

  The predicated forms of all these instructions are represented in the same
  way, except that the ordinary load/store operation is replaced with the
  existing intrinsics @llvm.masked.{load,store}. These are currently only
  code-generated as predicated MVE load/store instructions if you give LLVM
  the `-enable-arm-maskedldst` option; so I've done that in the LLVM codegen
  test. When we make that the default, that option can be removed.

  In the Tablegen backend, I've had to add a handful of extra support
  features:

  * We need to be able to make clang::Address objects out of a pointer and
    an alignment (previously we only needed these when the user passed us an
    existing one).
  * We can now specify vector types that aren't 128 bits wide (for use in
    those intermediate values in IR); the parametrized type system can make
    one starting from two existing vector types (using the lane count of one
    and the element type of the other).
  * I've added support for code generation of pointer casts, and for
    specifying LLVM types as operands to IRBuilder operations (for zext and
    sext, though I think they'll come in useful again).
  * Now not all IR construction operations need to be specified as
    Builder.CreateFoo; some don't involve a Builder at all, and one passes
    it as a parameter to a tiny static helper function in CGBuiltin.cpp.

  Reviewers: ostannard, MarkMurrayARM, dmgreen
  Subscribers: kristof.beyls, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D70088
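
  For illustration (not from the commit; assumes an MVE target, intrinsic
  name per ACLE), one of the widening loads described above:

      #include <arm_mve.h>

      /* Loads eight bytes and sign-extends each to a 16-bit lane; in IR
         this is a narrow vector load followed by a sext, which ISel folds
         back into a single MVE load instruction. */
      int16x8_t load_widen(const int8_t *p) {
        return vldrbq_s16(p);
      }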
* AArch64: add arm64_32 support to Clang. (Tim Northover, 2019-11-12, 1 file, -1/+3)
* [WebAssembly] Add experimental SIMD dot product instruction (Thomas Lively, 2019-11-01, 1 file, -0/+6)

  Summary:
  This instruction is not merged to the spec proposal, but we need it to be
  implemented in the toolchain to experiment with it. It is available only
  on an opt-in basis through a clang builtin.

  Defined in https://github.com/WebAssembly/simd/pull/127.

  Depends on D69696.

  Reviewers: aheejin
  Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D69697
* [WebAssembly] SIMD integer min and max instructions (Thomas Lively, 2019-10-31, 1 file, -0/+42)

  Summary:
  Introduces clang builtins and LLVM intrinsics representing integer min/max
  instructions. These instructions have not been merged to the SIMD spec
  proposal yet, so they are currently opt-in only via builtins and not
  produced by general pattern matching. If these instructions are accepted
  into the spec proposal the builtins and intrinsics will be replaced with
  normal pattern matching.

  Defined in https://github.com/WebAssembly/simd/pull/27.

  Reviewers: aheejin
  Reviewed By: aheejin
  Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D69696
* [ARM][AArch64] Implement __cls, __clsl and __clsll intrinsics from ACLE (vhscampos, 2019-10-28, 1 file, -0/+21)

  Summary:
  Writing support for three ACLE functions:

    unsigned int __cls(uint32_t x)
    unsigned int __clsl(unsigned long x)
    unsigned int __clsll(uint64_t x)

  CLS stands for "Count number of leading sign bits". In AArch64, these
  intrinsics can be translated into the 'cls' instruction directly. In
  AArch32, on the other hand, this functionality is achieved by implementing
  it in terms of clz (count number of leading zeros).

  Reviewers: compnerd
  Reviewed By: compnerd
  Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D69250
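
  A worked example (not from the commit; requires an Arm target and
  <arm_acle.h>):

      #include <arm_acle.h>
      #include <stdint.h>

      /* 0xFFFFFFF0 = 0b1111...1111_0000: after the sign bit, 27 more bits
         match it, so __cls returns 27. */
      unsigned leading_sign_bits(void) {
        return __cls((uint32_t)0xFFFFFFF0u);
      }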
* Fix compilation warning. NFC. (Michael Liao, 2019-10-25, 1 file, -0/+1)
* [clang] Fixup clang -Werror,-Wcovered-switch-default build failures (Jinsong Ji, 2019-10-24, 1 file, -3/+0)

  llvm/clang/lib/CodeGen/CGBuiltin.cpp:6877:3: error: default label in
  switch which covers all enumeration values [-Werror,-Wcovered-switch-default]

  Similar to https://reviews.llvm.org/rG7b3de1e811972b874d91554642ccb2ef5b32eed6
* [clang,ARM] Initial ACLE intrinsics for MVE. (Simon Tatham, 2019-10-24, 1 file, -5/+115)

  This commit sets up the infrastructure for auto-generating <arm_mve.h> and
  doing clang-side code generation for the builtins it relies on, and
  demonstrates that it works by implementing a representative sample of the
  ACLE intrinsics, more or less matching the ones introduced in LLVM IR by
  D67158, D68699, D68700.

  Like NEON, that header file will provide a set of vector types like
  uint16x8_t and C functions with names like vaddq_u32(). Unlike NEON, the
  ACLE spec for <arm_mve.h> includes a polymorphism system, so that you can
  write plain vaddq() and disambiguate by the vector types you pass to it.

  Unlike the corresponding NEON code, I've arranged to make every
  user-facing ACLE intrinsic into a clang builtin, and implement all the
  code generation inside clang. So <arm_mve.h> itself contains nothing but
  typedefs and function declarations, with the latter all using the new
  `__attribute__((__clang_builtin))` system to arrange that the user-facing
  function names correspond to the right internal BuiltinIDs.

  So the new MveEmitter tablegen system specifies the full sequence of
  IRBuilder operations that each user-facing ACLE intrinsic should translate
  into. Where possible, the ACLE intrinsics map to standard IR operations
  such as vector-typed `add` and `fadd`; where no standard representation
  exists, I call down to the sample IR intrinsics introduced in an earlier
  commit.

  Doing it like this means that you get the polymorphism for free just by
  using __attribute__((overloadable)): the clang overload resolution decides
  which function declaration is the relevant one, and _then_ its BuiltinID
  is looked up, so by the time we're doing code generation, that's all been
  resolved by the standard system. It also means that you get really nice
  error messages if the user passes the wrong combination of types: clang
  will show the declarations from the header file and explain why each one
  doesn't match.

  (The obvious alternative approach would be to have wrapper functions in
  <arm_mve.h> which pass their arguments to the underlying builtins. But
  that doesn't work in the case where one of the arguments has to be a
  constant integer: the wrapper function can't pass the constantness
  through. So you'd have to do that case using a macro instead, and then use
  C11 `_Generic` to handle the polymorphism. Then you have to add horrible
  workarounds because `_Generic` requires even the untaken branches to
  type-check successfully, and //then// if the user gets the types wrong,
  the error message is totally unreadable!)

  Reviewers: dmgreen, miyuki, ostannard
  Subscribers: mgorny, javed.absar, kristof.beyls, cfe-commits
  Tags: #clang
  Differential Revision: https://reviews.llvm.org/D67161
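
  For illustration (not from the commit): the polymorphism described above
  means the same spelling works across element types, resolved by ordinary
  overload resolution.

      #include <arm_mve.h>

      uint32x4_t  add_u32(uint32x4_t a, uint32x4_t b)   { return vaddq(a, b); }
      float32x4_t add_f32(float32x4_t a, float32x4_t b) { return vaddq(a, b); }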
* CGBuiltin - silence static analyzer getAs<> null dereference warnings. NFCI. (Simon Pilgrim, 2019-10-16, 1 file, -5/+4)

  The static analyzer is warning about potential null dereferences, but in
  these cases we should be able to use castAs<> directly and, if not, an
  assert will fire for us.

  llvm-svn: 374987
* [WebAssembly] Trapping fptoint builtins and intrinsics (Thomas Lively, 2019-10-15, 1 file, -0/+20)

  Summary:
  The WebAssembly backend lowers fptoint instructions to a code sequence
  that checks for overflow to avoid traps because fptoint is supposed to be
  speculatable. These new builtins and intrinsics give users a way to depend
  on the trapping semantics of the underlying instructions and avoid the
  extra code generated normally.

  Patch by coffee and tlively.

  Reviewers: aheejin
  Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D68902
  llvm-svn: 374856
* Improve __builtin_constant_p lowering (Joerg Sonnenberger, 2019-10-13, 1 file, -4/+0)

  __builtin_constant_p used to be short-cut evaluated to false when building
  with -O0. This is undesirable as it means that constant folding in the
  front-end can give different results than folding in the back-end. It can
  also create conditional branches on constant conditions that don't get
  folded away. With the pending improvements to the llvm.is.constant
  handling on the LLVM side, the short-cut is no longer useful.

  Adjust various codegen tests to not depend on the short-cut or the backend
  optimisations.

  Differential Revision: https://reviews.llvm.org/D67638
  llvm-svn: 374742
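
  For illustration (not from the commit): code like this now behaves the
  same at -O0 and -O2, because the front end emits llvm.is.constant instead
  of folding the call to false itself.

      extern int slow_path(int);

      static inline int fold_if_constant(int x) {
        return __builtin_constant_p(x) ? x * 2 : slow_path(x);
      }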
* Reland r374450 with Richard Smith's comments and test fixed. (Erich Keane, 2019-10-11, 1 file, -2/+4)

  The behavior from the original patch has changed, since we're no longer
  allowing LLVM to just ignore the alignment. Instead, we're just assuming
  the maximum possible alignment.

  Differential Revision: https://reviews.llvm.org/D68824
  llvm-svn: 374562
* Revert 374450 "Fix __builtin_assume_aligned with too large values." (Nico Weber, 2019-10-10, 1 file, -1/+2)

  The test fails on Windows, with

    error: 'warning' diagnostics expected but not seen:
      File builtin-assume-aligned.c Line 62: requested alignment must be
      268435456 bytes or smaller; assumption ignored
    error: 'warning' diagnostics seen but not expected:
      File builtin-assume-aligned.c Line 62: requested alignment must be
      8192 bytes or smaller; assumption ignored

  llvm-svn: 374456
* Fix __builtin_assume_aligned with too large values. (Erich Keane, 2019-10-10, 1 file, -2/+1)

  Code to handle __builtin_assume_aligned was allowing larger values, but
  would convert this to unsigned along the way. This patch removes the
  EmitAssumeAligned overloads that take unsigned to do away with this
  problem. Additionally, it adds a warning that values greater than 1 << 29
  are ignored by LLVM.

  Differential Revision: https://reviews.llvm.org/D68824
  llvm-svn: 374450
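
  For illustration (not from the commit):

      /* Valid alignments emit an assumption the optimizer can use; values
         above 1 << 29 (536870912) now warn and are ignored. */
      int load_first(int *p) {
        int *q = __builtin_assume_aligned(p, 64); /* align-64 assumed */
        return q[0];
      }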
* [WebAssembly] Add builtin and intrinsic for v8x16.swizzle (Thomas Lively, 2019-10-09, 1 file, -0/+6)

  Summary:
  This clang builtin and corresponding LLVM intrinsic are necessary to
  expose the exact semantics of the underlying WebAssembly instruction to
  users. LLVM produces a poison value if the dynamic swizzle indices are
  greater than the vector size, but the WebAssembly instruction sets the
  corresponding output lane to zero. Users who depend on this behavior can
  safely use this builtin.

  Depends on D68527.

  Reviewers: aheejin, dschuff
  Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D68531
  llvm-svn: 374189
* [BPF] do compile-once run-everywhere relocation for bitfields (Yonghong Song, 2019-10-08, 1 file, -0/+34)

  A bpf specific clang intrinsic is introduced:

      u32 __builtin_preserve_field_info(member_access, info_kind)

  Depending on info_kind, different information will be returned to the
  program. A relocation is also recorded for this builtin so that the bpf
  loader can patch the instruction on the target host. This clang intrinsic
  is used to get certain information to facilitate struct/union member
  relocations.

  The offset relocation is extended by 4 bytes to include relocation kind.
  Currently supported relocation kinds are

      enum {
          FIELD_BYTE_OFFSET = 0,
          FIELD_BYTE_SIZE,
          FIELD_EXISTENCE,
          FIELD_SIGNEDNESS,
          FIELD_LSHIFT_U64,
          FIELD_RSHIFT_U64,
      };

  for __builtin_preserve_field_info. The old access offset relocation is
  covered by FIELD_BYTE_OFFSET = 0.

  An example:

      struct s {
          int a;
          int b1:9;
          int b2:4;
      };
      enum {
          FIELD_BYTE_OFFSET = 0,
          FIELD_BYTE_SIZE,
          FIELD_EXISTENCE,
          FIELD_SIGNEDNESS,
          FIELD_LSHIFT_U64,
          FIELD_RSHIFT_U64,
      };
      void bpf_probe_read(void *, unsigned, const void *);
      int field_read(struct s *arg) {
        unsigned long long ull = 0;
        unsigned offset = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_OFFSET);
        unsigned size = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_SIZE);
      #ifdef USE_PROBE_READ
        bpf_probe_read(&ull, size, (const void *)arg + offset);
        unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
      #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        lshift = lshift + (size << 3) - 64;
      #endif
      #else
        switch(size) {
        case 1:
          ull = *(unsigned char *)((void *)arg + offset); break;
        case 2:
          ull = *(unsigned short *)((void *)arg + offset); break;
        case 4:
          ull = *(unsigned int *)((void *)arg + offset); break;
        case 8:
          ull = *(unsigned long long *)((void *)arg + offset); break;
        }
        unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
      #endif
        ull <<= lshift;
        if (__builtin_preserve_field_info(arg->b2, FIELD_SIGNEDNESS))
          return (long long)ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
        return ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
      }

  There is a minor overhead for bpf_probe_read() on big endian.

  The code and relocation generated for field_read where bpf_probe_read() is
  used to access argument data on little endian mode:

      r3 = r1
      r1 = 0
      r1 = 4   <=== relocation (FIELD_BYTE_OFFSET)
      r3 += r1
      r1 = r10
      r1 += -8
      r2 = 4   <=== relocation (FIELD_BYTE_SIZE)
      call bpf_probe_read
      r2 = 51  <=== relocation (FIELD_LSHIFT_U64)
      r1 = *(u64 *)(r10 - 8)
      r1 <<= r2
      r2 = 60  <=== relocation (FIELD_RSHIFT_U64)
      r0 = r1
      r0 >>= r2
      r3 = 1   <=== relocation (FIELD_SIGNEDNESS)
      if r3 == 0 goto LBB0_2
      r1 s>>= r2
      r0 = r1
      LBB0_2:
      exit

  Compared to the above code, between the FIELD_LSHIFT_U64 and
  FIELD_RSHIFT_U64 relocations the big endian code has four more
  instructions:

      r1 = 41  <=== relocation (FIELD_LSHIFT_U64)
      r6 += r1
      r6 += -64
      r6 <<= 32
      r6 >>= 32
      r1 = *(u64 *)(r10 - 8)
      r1 <<= r6
      r2 = 60  <=== relocation (FIELD_RSHIFT_U64)

  The code and relocation generated when using direct load:

      r2 = 0
      r3 = 4
      r4 = 4
      if r4 s> 3 goto LBB0_3
      if r4 == 1 goto LBB0_5
      if r4 == 2 goto LBB0_6
      goto LBB0_9
      LBB0_6:                # %sw.bb1
      r1 += r3
      r2 = *(u16 *)(r1 + 0)
      goto LBB0_9
      LBB0_3:                # %entry
      if r4 == 4 goto LBB0_7
      if r4 == 8 goto LBB0_8
      goto LBB0_9
      LBB0_8:                # %sw.bb9
      r1 += r3
      r2 = *(u64 *)(r1 + 0)
      goto LBB0_9
      LBB0_5:                # %sw.bb
      r1 += r3
      r2 = *(u8 *)(r1 + 0)
      goto LBB0_9
      LBB0_7:                # %sw.bb5
      r1 += r3
      r2 = *(u32 *)(r1 + 0)
      LBB0_9:                # %sw.epilog
      r1 = 51
      r2 <<= r1
      r1 = 60
      r0 = r2
      r0 >>= r1
      r3 = 1
      if r3 == 0 goto LBB0_11
      r2 s>>= r1
      r0 = r2
      LBB0_11:               # %sw.epilog
      exit

  Considering the verifier is able to do limited constant propagation
  following branches, the following is the code actually traversed:

      r2 = 0
      r3 = 4   <=== relocation
      r4 = 4   <=== relocation
      if r4 s> 3 goto LBB0_3
      LBB0_3:                # %entry
      if r4 == 4 goto LBB0_7
      LBB0_7:                # %sw.bb5
      r1 += r3
      r2 = *(u32 *)(r1 + 0)
      LBB0_9:                # %sw.epilog
      r1 = 51  <=== relocation
      r2 <<= r1
      r1 = 60  <=== relocation
      r0 = r2
      r0 >>= r1
      r3 = 1
      if r3 == 0 goto LBB0_11
      r2 s>>= r1
      r0 = r2
      LBB0_11:               # %sw.epilog
      exit

  For the native load case, the load size is calculated to be the same as
  the size of load width LLVM otherwise used to load the value, which is
  then used to extract the bitfield value.

  Differential Revision: https://reviews.llvm.org/D67980
  llvm-svn: 374099
* Properly handle instantiation-dependent array bounds. (Richard Smith, 2019-10-04, 1 file, -1/+1)

  We previously failed to treat an array with an instantiation-dependent but
  not value-dependent bound as being an instantiation-dependent type. We now
  track the array bound expression as part of a constant array type if it's
  an instantiation-dependent expression.

  llvm-svn: 373685
* [Alignment][NFC] Remove StoreInst::setAlignment(unsigned) (Guillaume Chatelet, 2019-10-03, 1 file, -1/+1)

  Summary:
  This patch is part of a series to introduce an Alignment type.
  See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type: https://reviews.llvm.org/D64790

  Reviewers: courbet, bollu, jdoerfert
  Subscribers: hiraditya, asbirlea, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D68268
  llvm-svn: 373595
* [Alignment][NFC] Remove AllocaInst::setAlignment(unsigned) (Guillaume Chatelet, 2019-09-30, 1 file, -2/+2)

  Summary:
  This patch is part of a series to introduce an Alignment type.
  See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type: https://reviews.llvm.org/D64790

  Reviewers: courbet
  Subscribers: jholewinski, arsenm, jvesely, nhaehnle, eraman, hiraditya, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D68141
  llvm-svn: 373207
* [WebAssembly] Narrowing and widening SIMD ops (Thomas Lively, 2019-09-13, 1 file, -0/+57)

  Summary:
  Implements target-specific LLVM intrinsics and clang builtins for these
  new SIMD operations, as described at
  https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#integer-to-integer-narrowing.

  Reviewers: aheejin
  Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D67425
  llvm-svn: 371906
* [WebAssembly] Add SIMD QFMA/QFMS (Thomas Lively, 2019-08-31, 1 file, -1/+23)

  Summary:
  Adds clang builtins and LLVM intrinsics for these experimental
  instructions. They are not implemented in engines yet, but that is ok
  because the user must opt into using them by calling the builtins.

  Reviewers: aheejin, dschuff
  Reviewed By: aheejin
  Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D67020
  llvm-svn: 370556
* Builtins: Start adding half versions of math builtins (Matt Arsenault, 2019-08-06, 1 file, -0/+21)

  The implementation of the OpenCL builtin library currently uses 2
  different hacks to get to the corresponding IR intrinsics from the source.
  This will allow removal of those. This is the set that is currently used
  (minus a few vector ones).

  llvm-svn: 367973
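
  For illustration (not from the commit; requires a target with _Float16
  support, name follows clang's f16 suffix convention):

      /* Lowers to the half-precision llvm.sqrt intrinsic directly, with no
         round-trip through float. */
      _Float16 half_sqrt(_Float16 x) {
        return __builtin_sqrtf16(x);
      }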