bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[RISCV] Fix passing two floating-point values in complex separately by two ↵	Jim Lin	2020-06-25	2	-0/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	GPRs on RV64 Summary: This patch fixed the error of counting the remaining FPRs. Complex floating-point values should be passed by two FPRs for the hard-float ABI. If no two FPRs are available, it should be passed via a 64-bit GPR (fp+fp). `ArgFPRsLeft` is only decreased one while the type is complex floating-point. It causes two floating-point values in the complex are passed separately by two GPRs. Reviewers: asb, luismarques, lenary Reviewed By: asb Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, s.egerton, pzheng, sameer.abuasal, apazos, evandro, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D79770 (cherry picked from commit 7ee479a760e0a4402b4eb7fb6168768a44f66945)
*	[PowerPC] Treat 'Z' inline asm constraint as a true memory constraint	Nemanja Ivanovic	2020-06-22	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \|	We currently emit incorrect codegen for this constraint because we set it as a constraint that allows registers. This will cause the value to be copied to the stack and that address to be passed as the address. This is not what we want. Fixes: https://bugs.llvm.org/show_bug.cgi?id=42762 Differential revision: https://reviews.llvm.org/D77542 (cherry picked from commit aede24ecaa08db806fb173faf2de9cff95df8cee)
*	[CodeGen] fix inline builtin-related breakage from D78162	George Burgess IV	2020-05-07	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \|	In cases where we have multiple decls of an inline builtin, we may need to go hunting for the one with a definition when setting function attributes. An additional test-case was provided on https://github.com/ClangBuiltLinux/linux/issues/979 (cherry picked from commit 94908088a831141cfbdd15fc5837dccf30cfeeb6)
*	[CodeGen] only add nobuiltin to inline builtins if we'll emit them	George Burgess IV	2020-05-07	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are some inline builtin definitions that we can't emit (isTriviallyRecursive & callers go into why). Marking these nobuiltin is only useful if we actually emit the body, so don't mark these as such unless we _do_ plan on emitting that. This suboptimality was encountered in Linux (see some discussion on D71082, and https://github.com/ClangBuiltLinux/linux/issues/979). Differential Revision: https://reviews.llvm.org/D78162 (cherry picked from commit 2dd17ff08165e6118e70f00e22b2c36d2d4e0a9a)
*	[remark][diagnostics] [codegen] Fix PR44896	Rong Xu	2020-02-26	1	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes PR44896. For IR input files, option fdiscard-value-names should be ignored as we need named values in loadModule(). Commit 60d3947922 sets this option after loadModule() where valued names already created. This creates an inconsistent state in setNameImpl() that leads to a seg fault. This patch forces fdiscard-value-names to be false for IR input files. This patch also emits a warning of "ignoring -fdiscard-value-names" if option fdiscard-value-names is explictly enabled in the commandline for IR input files. Differential Revision: https://reviews.llvm.org/D74878 (cherry picked from commit 11857d49948b845dcfd7c7f78595095e3add012d)
*	[RISCV] Pass target-abi via module flag metadata	Zakk Chen	2020-01-27	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: lenary, asb Reviewed By: lenary Tags: #clang Differential Revision: https://reviews.llvm.org/D72755 (cherry picked from commit e15fb06e2d0a068de549464d72081811e7fac612)
*	[Driver][CodeGen] Support -fpatchable-function-entry=N,M and ↵	Fangrui Song	2020-01-24	1	-4/+14
\| \| \| \| \| \| \| \| \| \|	__attribute__((patchable_function_entry(N,M))) where M>0 Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D73072 (cherry picked from commit 69bf40c45fd7f6dfe11b47de42571d8bff5ef94f)
*	Revert 9007f06af0e "Revert "Allow system header to provide their own ↵	Hans Wennborg	2020-01-17	2	-0/+34
\| \| \| \| \| \| \|	implementation of some builtin"" This should no longer be necessary after cd4c65f91d5 "Add __warn_memset_zero_len builtin as a workaround for glibc issue"
*	Revert "Allow system header to provide their own implementation of some builtin"	Amy Huang	2020-01-17	2	-34/+0
\| \| \| \| \| \| \| \| \|	This reverts commit 921f871ac438175ca8fcfcafdfcfac4d7ddf3905 because it causes libc++ code to trigger __warn_memset_zero_len. See https://reviews.llvm.org/D71082. (cherry picked from commit 3d210ed3d1880c615776b07d1916edb400c245a6)
*	Add __warn_memset_zero_len builtin as a workaround for glibc issue	serge-sans-paille	2020-01-17	1	-0/+7
\| \| \| \| \| \| \| \| \|	Glibc issue: https://sourceware.org/bugzilla/show_bug.cgi?id=25399 The fix consist in considering the missing function as a builtin lowered to a nop. Differential Revision: https://reviews.llvm.org/D72869 (cherry picked from commit d293417931d3a9d46799b42795988ca3b5cfd766)
*	[X86] ABI compat bugfix for MSVC vectorcall	Reid Kleckner	2020-01-14	1	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Before this change, X86_32ABIInfo::classifyArgument would be called twice on vector arguments to vectorcall functions. This function has side effects to track GPR register usage, and this would lead to incorrect GPR usage in some cases. The specific case I noticed is from running out of XMM registers with mixed FP and vector arguments and no aggregates of any kind. Consider this prototype: void __vectorcall vectorcall_indirect_vec( double xmm0, double xmm1, double xmm2, double xmm3, double xmm4, __m128 xmm5, __m128 ecx, int edx, __m128 mem); classifyArgument has no effects when called on a plain FP type, but when called on a vector type, it modifies FreeRegs to model GPR consumption. However, this should not happen during the vector call first pass. I refactored the code to unify vectorcall HVA logic with regcall HVA logic. The conventions pass HVAs in registers differently (expanded vs. not expanded), but if they do not fit in registers, they both pass them indirectly by address. Reviewers: erichkeane, craig.topper Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D72110
*	Fix windows bot failures in c410adb092c9cb51ddb0b55862b70f2aa8c5b16f	Rong Xu	2020-01-14	1	-1/+1
\| \| \| \|	(clang diagnostic handler for IR input files)
*	[remark][diagnostics] Using clang diagnostic handler for IR input files	Rong Xu	2020-01-14	4	-1/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For IR input files, we currently use LLVM diagnostic handler even the compilation is from clang. As a result, we are not able to use -Rpass to get the transformation reports. Some warnings are not handled properly either: We found many mysterious warnings in our ThinLTO backend compilations in SamplePGO and CSPGO. An example of the warning: "warning: net/proto2/public/metadata_lite.h:51:21: 0.02% (1 / 4999)" This turns out to be a warning by Wmisexpect, which is supposed to be filtered out by default. But since the filter is in clang's diagnostic hander, we emit these incomplete warnings from LLVM's diagnostic handler. This patch uses clang diagnostic handler for IR input files. We create a fake backendconsumer just to install the diagnostic handler. With this change, we will have proper handling of all the warnings and we can use -Rpass* options in IR input files compilation. Also note that with is patch, LLVM's diagnostic options, like "-mllvm -pass-remarks=*", are no longer be able to get optimization remarks. Differential Revision: https://reviews.llvm.org/D72523
*	[RISCV] Fix ILP32D lowering for double+double/double+int return types	James Clarke	2020-01-14	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Previously, since these aggregates are > 2*XLen, Clang would think they were being returned indirectly and thus would decrease the number of available GPRs available by 1. For long argument lists this could lead to a struct argument incorrectly being passed indirectly. Reviewers: asb, lenary Reviewed By: asb, lenary Subscribers: luismarques, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, lenary, s.egerton, pzheng, sameer.abuasal, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D69590
*	Revert "[ThinLTO] Add additional ThinLTO pipeline testing with new PM"	Teresa Johnson	2020-01-13	1	-236/+0
\| \| \| \| \| \| \| \| \| \| \|	This reverts commit 2af97be8027a0823b88d4b6a07fc5eedb440bc1f. After attempting to fix bot failures from matching issues (mostly due to inconsistent printing of "llvm::" prefixes on objects, and AnalysisManager objects being printed differntly, I am now seeing some differences I don't understand (real differences in the passes being printed). Giving up at this point to allow the bots to recover. Will revisit later.
*	Add a couple of missed wildcards in debug-pass-manager output checking	Teresa Johnson	2020-01-13	1	-1/+1
\| \| \| \| \| \| \|	Along with the previous fix for bot failures from 2af97be8027a0823b88d4b6a07fc5eedb440bc1f, need to add a wildcard in a couple of places where my local output did not print "llvm::" but the bot is.
*	Hopefully last fix for bot failures	Teresa Johnson	2020-01-13	1	-43/+43
\| \| \| \| \| \| \| \| \|	Hopefully final bot fix for last few failures from 2af97be8027a0823b88d4b6a07fc5eedb440bc1f. Looks like sometimes the "llvm::" preceeding objects get printed in the debug pass manager output and sometimes they don't. Replace with wildcard matching.
*	Try number 2 for fixing bot failures	Teresa Johnson	2020-01-13	1	-8/+8
\| \| \| \| \| \| \| \|	Additional fixes for bot failures from 2af97be8027a0823b88d4b6a07fc5eedb440bc1f. Remove more exact matching on AnalyisManagers, as they can vary. Also allow different orders between LoopAnalysis and BranchProbabilityAnalysis as that can vary due to both being accessed in the parameter list of a call.
*	Fix tests for builtbot failures	Teresa Johnson	2020-01-13	1	-10/+10
\| \| \| \| \| \| \| \| \|	Should fix most of the buildbot failures from 2af97be8027a0823b88d4b6a07fc5eedb440bc1f, by loosening up the matching on the AnalysisProxy output. Added in --dump-input=fail on the one test that appears to be something different, so I can hopefully debug it better.
*	[ThinLTO] Add additional ThinLTO pipeline testing with new PM	Teresa Johnson	2020-01-13	1	-0/+236
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: I've added some more extensive ThinLTO pipeline testing with the new PM, motivated by the bug fixed in D72386. I beefed up llvm/test/Other/new-pm-pgo.ll a little so that it tests ThinLTO pre and post link with PGO, similar to the testing for the default pipelines with PGO. Added new pre and post link PGO tests for both instrumentation and sample PGO that exhaustively test the pipelines at different optimization levels via opt. Added a clang test to exhaustively test the post link pipeline invoked for distributed builds. I am currently only testing O2 and O3 since these are the most important for performance. It would be nice to add similar exhaustive testing for full LTO, and for the old PM, but I don't have the bandwidth now and this is a start to cover some of the situations that are not currently default and were under tested. Reviewers: wmi Subscribers: mehdi_amini, inglorion, hiraditya, steven_wu, dexonsmith, jfb, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72538
*	driver: Allow -fdebug-compilation-dir=foo in joined form.	Nico Weber	2020-01-10	1	-0/+1
\| \| \| \| \| \| \|	All 130+ f_Group flags that take an argument allow it after a '=', except for fdebug-complation-dir. Add a Joined<> alias so that it behaves consistently with all the other f_Group flags. (Keep the old Separate flag for backwards compat.)
*	[Driver][CodeGen] Add -fpatchable-function-entry=N[,0]	Fangrui Song	2020-01-10	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \|	In the backend, this feature is implemented with the function attribute "patchable-function-entry". Both the attribute and XRay use TargetOpcode::PATCHABLE_FUNCTION_ENTER, so the two features are incompatible. Reviewed By: ostannard, MaskRay Differential Revision: https://reviews.llvm.org/D72222
*	Support function attribute patchable_function_entry	Fangrui Song	2020-01-10	1	-0/+21
\| \| \| \| \| \| \| \| \|	This feature is generic. Make it applicable for AArch64 and X86 because the backend has only implemented NOP insertion for AArch64 and X86. Reviewed By: nickdesaulniers, aaron.ballman Differential Revision: https://reviews.llvm.org/D72221
*	Add support for __declspec(guard(nocf))	Andrew Paverd	2020-01-10	1	-0/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Avoid using the `nocf_check` attribute with Control Flow Guard. Instead, use a new `"guard_nocf"` function attribute to indicate that checks should not be added on indirect calls within that function. Add support for `__declspec(guard(nocf))` following the same syntax as MSVC. Reviewers: rnk, dmajor, pcc, hans, aaron.ballman Reviewed By: aaron.ballman Subscribers: aaron.ballman, tomrittervg, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72167
*	[FPEnv] Generate constrained FP comparisons from clang	Ulrich Weigand	2020-01-10	2	-0/+302
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Update the IRBuilder to generate constrained FP comparisons in CreateFCmp when IsFPConstrained is true, similar to the other places in the IRBuilder. Also, add a new CreateFCmpS to emit signaling FP comparisons, and use it in clang where comparisons are supposed to be signaling (currently, only when emitting code for the <, <=, >, >= operators). Note that there is currently no way to add fast-math flags to a constrained FP comparison, since this is implemented as an intrinsic call that returns a boolean type, and FMF are only allowed for calls returning a floating-point type. However, given the discussion around https://bugs.llvm.org/show_bug.cgi?id=42179, it seems that FCmp itself really shouldn't have any FMF either, so this is probably OK. Reviewed by: craig.topper Differential Revision: https://reviews.llvm.org/D71467
*	[ARM,MVE] Make `vqrshrun` generate the right instruction.	Simon Tatham	2020-01-10	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: A copy-paste error in `arm_mve.td` meant that the MVE `vqrshrun` intrinsic family was generating the `vqshrun` machine instruction, because in the IR intrinsic call, the rounding flag argument was set to 0 rather than 1. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D72496
*	Allow system header to provide their own implementation of some builtin	serge-sans-paille	2020-01-10	2	-0/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a system header provides an (inline) implementation of some of their function, clang still matches on the function name and generate the appropriate llvm builtin, e.g. memcpy. This behavior is in line with glibc recommendation « users may not provide their own version of symbols » but doesn't account for the fact that glibc itself can provide inline version of some functions. It is the case for the memcpy function when -D_FORTIFY_SOURCE=1 is on. In that case an inline version of memcpy calls __memcpy_chk, a function that performs extra runtime checks. Clang currently ignores the inline version and thus provides no runtime check. This code fixes the issue by detecting functions whose name is a builtin name but also have an inline implementation. Differential Revision: https://reviews.llvm.org/D71082
*	[ThinLTO] Pass CodeGenOpts like UnrollLoops/VectorizeLoop/VectorizeSLP	Wei Mi	2020-01-09	1	-0/+50
\| \| \| \| \| \| \| \| \| \| \| \| \|	down to pass builder in ltobackend. Currently CodeGenOpts like UnrollLoops/VectorizeLoop/VectorizeSLP in clang are not passed down to pass builder in ltobackend when new pass manager is used. This is inconsistent with the behavior when new pass manager is used and thinlto is not used. Such inconsistency causes slp vectorization pass not being enabled in ltobackend for O3 + thinlto right now. This patch fixes that. Differential Revision: https://reviews.llvm.org/D72386
*	Add builtins for aligning and checking alignment of pointers and integers	Alex Richardson	2020-01-09	3	-0/+217
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change introduces three new builtins (which work on both pointers and integers) that can be used instead of common bitwise arithmetic: __builtin_align_up(x, alignment), __builtin_align_down(x, alignment) and __builtin_is_aligned(x, alignment). I originally added these builtins to the CHERI fork of LLVM a few years ago to handle the slightly different C semantics that we use for CHERI [1]. Until recently these builtins (or sequences of other builtins) were required to generate correct code. I have since made changes to the default C semantics so that they are no longer strictly necessary (but using them does generate slightly more efficient code). However, based on our experience using them in various projects over the past few years, I believe that adding these builtins to clang would be useful. These builtins have the following benefit over bit-manipulation and casts via uintptr_t: - The named builtins clearly convey the semantics of the operation. While checking alignment using __builtin_is_aligned(x, 16) versus ((x & 15) == 0) is probably not a huge win in readably, I personally find __builtin_align_up(x, N) a lot easier to read than (x+(N-1))&~(N-1). - They preserve the type of the argument (including const qualifiers). When using casts via uintptr_t, it is easy to cast to the wrong type or strip qualifiers such as const. - If the alignment argument is a constant value, clang can check that it is a power-of-two and within the range of the type. Since the semantics of these builtins is well defined compared to arbitrary bit-manipulation, it is possible to add a UBSAN checker that the run-time value is a valid power-of-two. I intend to add this as a follow-up to this change. - The builtins avoids int-to-pointer casts both in C and LLVM IR. In the future (i.e. once most optimizations handle it), we could use the new llvm.ptrmask intrinsic to avoid the ptrtoint instruction that would normally be generated. - They can be used to round up/down to the next aligned value for both integers and pointers without requiring two separate macros. - In many projects the alignment operations are already wrapped in macros (e.g. roundup2 and rounddown2 in FreeBSD), so by replacing the macro implementation with a builtin call, we get improved diagnostics for many call-sites while only having to change a few lines. - Finally, the builtins also emit assume_aligned metadata when used on pointers. This can improve code generation compared to the uintptr_t casts. [1] In our CHERI compiler we have compilation mode where all pointers are implemented as capabilities (essentially unforgeable 128-bit fat pointers). In our original model, casts from uintptr_t (which is a 128-bit capability) to an integer value returned the "offset" of the capability (i.e. the difference between the virtual address and the base of the allocation). This causes problems for cases such as checking the alignment: for example, the expression `if ((uintptr_t)ptr & 63) == 0` is generally used to check if the pointer is aligned to a multiple of 64 bytes. The problem with offsets is that any pointer to the beginning of an allocation will have an offset of zero, so this check always succeeds in that case (even if the address is not correctly aligned). The same issues also exist when aligning up or down. Using the alignment builtins ensures that the address is used instead of the offset. While I have since changed the default C semantics to return the address instead of the offset when casting, this offset compilation mode can still be used by passing a command-line flag. Reviewers: rsmith, aaron.ballman, theraven, fhahn, lebedev.ri, nlopes, aqjune Reviewed By: aaron.ballman, lebedev.ri Differential Revision: https://reviews.llvm.org/D71499
*	[clang] Enforce triple in mempcpy test	serge-sans-paille	2020-01-09	1	-1/+1
\| \| \| \|	Fixes http://lab.llvm.org:8011/builders/llvm-clang-win-x-armv7l/builds/2597
*	[ms] [X86] Use "P" modifier on all branch-target operands in inline X86 ↵	Eric Astor	2020-01-09	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	assembly. Summary: Extend D71677 to apply to all branch-target operands, rather than special-casing call instructions. Also add a regression test for llvm.org/PR44272, since this finishes fixing it. Reviewers: thakis, rnk Reviewed By: thakis Subscribers: merge_guards_bot, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72417
*	[Clang] Handle target-specific builtins returning aggregates.	Simon Tatham	2020-01-09	1	-0/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: A few of the ARM MVE builtins directly return a structure type. This causes an assertion failure at code-gen time if you try to assign the result of the builtin to a variable, because the `RValue` created in `EmitBuiltinExpr` from the `llvm::Value` produced by codegen is always made by `RValue::get()`, which creates a non-aggregate `RValue` that will fail an assertion when `AggExprEmitter::withReturnValueSlot` calls `Src.getAggregatePointer()`. A similar failure occurs if you try to use the struct return value directly to extract one field, e.g. `vld2q(address).val[0]`. The existing code-gen tests for those MVE builtins pass the returned structure type directly to the C `return` statement, which apparently managed to avoid that particular code path, so we didn't notice the crash. Now `EmitBuiltinExpr` checks the evaluation kind of the builtin's return value, and does the necessary handling for aggregate returns. I've added two extra test cases, both of which crashed before this change. Reviewers: dmgreen, rjmccall Reviewed By: rjmccall Subscribers: kristof.beyls, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D72271
*	Improve support of GNU mempcpy	serge-sans-paille	2020-01-09	1	-0/+12
\| \| \| \| \| \| \|	- Lower to the memcpy intrinsic - Raise warnings when size/bounds are known Differential Revision: https://reviews.llvm.org/D71374
*	[ARM][MVE] MVE-I should not be disabled by -mfpu=none	Momchil Velikov	2020-01-09	1	-14/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Architecturally, it's allowed to have MVE-I without an FPU, thus -mfpu=none should not disable MVE-I, or moves to/from FP-registers. This patch removes `+/-fpregs` from features unconditionally added to target feature list, depending on FPU and moves the logic to Clang driver, where the negative form (`-fpregs`) is conditionally added to the target features list for the cases of `-mfloat-abi=soft`, or `-mfpu=none` without either `+mve` or `+mve.fp`. Only the negative form is added by the driver, the positive one is derived from other features in the backend. Differential Revision: https://reviews.llvm.org/D71843
*	[ARM,MVE] Intrinsics for variable shift instructions.	Simon Tatham	2020-01-08	1	-0/+1638
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This batch of intrinsics fills in all the shift instructions that take a variable shift distance in a register, instead of an immediate. Some of these instructions take a single shift distance in a scalar register and apply it to all lanes; others take a vector of per-lane distances. These instructions are all basically one family, varying in whether they saturate out-of-range values, and whether they round when bits are shifted off the bottom. I've implemented them at the IR level by a much smaller family of IR intrinsics, which take flag parameters to indicate saturating and/or rounding (along with the usual one to specify signed/unsigned integers). An oddity is that all of them are //left// shift instructions – but if you pass a negative shift count, they'll shift right. So the vector shift distances are always vectors of //signed// integers, regardless of whether you're considering the other input vector to be of signed or unsigned. Also, even the simplest `vshlq` instruction in this family (neither saturating nor rounding) has to be implemented as an IR intrinsic, because the ordinary LLVM IR `shl` operation would consider an out-of-range shift count to be undefined behavior. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72329
*	[ARM,MVE] Intrinsics for partial-overwrite imm shifts.	Simon Tatham	2020-01-08	1	-0/+1565
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This batch of intrinsics covers two sets of immediate shift instructions, which have in common that they only overwrite part of their output register and so they need an extra input giving its previous value. The VSLI and VSRI instructions shift each lane of the input vector left or right just as if they were normal immediate VSHL/VSHR, but then they only overwrite the output bits that correspond to actual shifted bits of the input. So VSLI will leave the low n bits of each output lane unchanged, and VSRI the same with the top n bits. The V[Q][R]SHR[U]N family are all narrowing shifts: they take an input vector of 2n-bit integers, shift each lane right by a constant, and then narrowing the shifted result to only n bits. So they only overwrite half of the n-bit lanes in the output register, and the B/T suffix indicates whether it's the bottom or top half of each 2n-bit lane. I've implemented the whole of the latter family using a single IR intrinsic `vshrn`, which takes a lot of i32 parameters indicating which instruction it expands to (by specifying signedness of the input and output types, whether it saturates and/or rounds, etc). Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72328
*	Disallow an empty string literal in an asm label	Aaron Ballman	2020-01-08	1	-12/+0
\| \| \| \| \| \| \| \|	An empty string literal in an asm label does not make a whole lot of sense. GCC does not diagnose such a construct, but it also generates code that cannot be assembled by gas should two symbols have an empty asm label within the same TU. This does not affect an asm statement with an empty string literal, which is still a useful construct.
*	[ARM,MVE] Fix many signedness errors in MVE intrinsics.	Simon Tatham	2020-01-06	14	-103/+103
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Running an end-to-end test last week I noticed that a lot of the ACLE intrinsics that operate differently on vectors of signed and unsigned integers were ending up generating the signed version of the instruction unconditionally. This is because the IR intrinsics had no way to distinguish signed from unsigned: the LLVM type system just calls them both `v8i16` (or whatever), so you need either separate intrinsics for signed and unsigned, or a flag parameter that tells ISel which one to choose. This patch fixes all the problems of that kind that I've noticed, by adding an i32 flag parameter to many of the IR intrinsics which is set to 1 for unsigned (matching the existing practice in cases where we got it right), and conditioning all the isel patterns on that flag. So the fundamental change is in `IntrinsicsARM.td`, changing the low-level IR intrinsics API; there are knock-on changes in `arm_mve.td` (adjusting code gen for the ACLE intrinsics to use the modified API) and in `ARMInstrMVE.td` (adjusting isel to expect the new unsigned flags). The rest of this patch is boringly updating tests. Reviewers: dmgreen, miyuki, MarkMurrayARM Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72270
*	[ARM,MVE] Support -ve offsets in gather-load intrinsics.	Simon Tatham	2020-01-06	1	-30/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The ACLE intrinsics with `gather_base` or `scatter_base` in the name are wrappers on the MVE load/store instructions that take a vector of base addresses and an immediate offset. The immediate offset can be up to 127 times the alignment unit, and it can be positive or negative. At the MC layer, we got that right. But in the Sema error checking for the wrapping intrinsics, the offset was erroneously constrained to be positive. To fix this I've adjusted the `imm_mem7bit` class in the Tablegen that defines the intrinsics. But that causes integer literals like `0xfffffffffffffe04` to appear in the autogenerated calls to `SemaBuiltinConstantArgRange`, which provokes a compiler warning because that's out of the non-overflowing range of an `int64_t`. So I've also tweaked `MveEmitter` to emit that as `-0x1fc` instead. Updated the tests of the Sema checks themselves, and also adjusted a random sample of the CodeGen tests to actually use negative offsets and prove they get all the way through code generation without causing a crash. Reviewers: dmgreen, miyuki, MarkMurrayARM Reviewed By: dmgreen Subscribers: kristof.beyls, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72268
*	[SystemZ] Use FNeg in s390x clang builtins	Kevin P. Neal	2020-01-02	4	-22/+22
\| \| \| \|	The s390x builtins are still using FSub instead of FNeg. Correct that.
*	[CodeGen] Emit conj/conjf/confjl libcalls as fneg instructions if possible.	Craig Topper	2019-12-31	2	-7/+25
\| \| \| \| \| \| \|	We already recognize the __builtin versions of these, might as well recognize the libcall version. Differential Revision: https://reviews.llvm.org/D72028
*	[CodeGen] Use IRBuilder::CreateFNeg for __builtin_conj	Craig Topper	2019-12-30	1	-0/+20
\| \| \| \| \| \|	This replaces the fsub -0.0 idiom with an fneg instruction. We didn't see to have a test that showed the current codegen. Just some tests for constant folding and a test that was only checking the declare lines for libcalls. The latter just checked that we did not have a declare for @conj when using __builtin_conj. Differential Revision: https://reviews.llvm.org/D72012
*	[CodeGen] Use CreateFNeg in buildFMulAdd	Craig Topper	2019-12-30	1	-0/+15
\| \| \| \| \| \|	We have an fneg instruction now and should use it instead of the fsub -0.0 idiom. Looks like we had no test that showed that we handled the negation cases here so I've added new tests. Differential Revision: https://reviews.llvm.org/D72010
*	[X86][AsmParser] re-introduce 'offset' operator	Eric Astor	2019-12-30	3	-6/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Amend MS offset operator implementation, to more closely fit with its MS counterpart: 1. InlineAsm: evaluate non-local source entities to their (address) location 2. Provide a mean with which one may acquire the address of an assembly label via MS syntax, rather than yielding a memory reference (i.e. "offset asm_label" and "$asm_label" should be synonymous 3. address PR32530 Based on http://llvm.org/D37461 Fix broken test where the break appears unrelated. - Set up appropriate memory-input rewrites for variable references. - Intel-dialect assembly printing now correctly handles addresses by adding "offset". - Pass offsets as immediate operands (using "r" constraint for offsets of locals). Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D71436
*	Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" ↵	Fangrui Song	2019-12-24	1	-1/+1
\| \| \| \|	as cleanups after D56351
*	[Sema][X86] Consider target attribute into the checks in validateOutputSize ↵	Craig Topper	2019-12-23	1	-0/+32
\| \| \| \| \| \| \| \| \| \| \| \| \|	and validateInputSize. The validateOutputSize and validateInputSize need to check whether AVX or AVX512 are enabled. But this can be affected by the target attribute so we need to factor that in. This patch moves some of the code from CodeGen to create an appropriate feature map that we can pass to the function. Differential Revision: https://reviews.llvm.org/D68627
*	reland "[DebugInfo] Support to emit debugInfo for extern variables"	Yonghong Song	2019-12-22	4	-0/+84
\| \| \| \| \| \| \| \| \| \| \| \| \|	Commit d77ae1552fc21a9f3877f3ed7e13d631f517c825 ("[DebugInfo] Support to emit debugInfo for extern variables") added deebugInfo for extern variables for BPF target. The commit is reverted by 891e25b02d760d0de18c7d46947913b3166047e7 as the committed tests using %clang instead of %clang_cc1 causing test failed in certain scenarios as reported by Reid Kleckner. This patch fixed the tests by using %clang_cc1. Differential Revision: https://reviews.llvm.org/D71818
*	Revert "[DebugInfo] Support to emit debugInfo for extern variables"	Reid Kleckner	2019-12-22	4	-88/+0
\| \| \| \| \| \| \|	This reverts commit d77ae1552fc21a9f3877f3ed7e13d631f517c825. The tests committed along with this change do not pass, and should be changed to use %clang_cc1.
*	[ms] [X86] Use "P" modifier on operands to call instructions in inline X86 ↵	Eric Astor	2019-12-22	3	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	assembly. Summary: This is documented as the appropriate template modifier for call operands. Fixes PR44272, and adds a regression test. Also adds support for operand modifiers in Intel-style inline assembly. Reviewers: rnk Reviewed By: rnk Subscribers: merge_guards_bot, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71677
*	[Driver] Verify -mrecord-mcount in Driver, instead of CodeGen after D71627	Fangrui Song	2019-12-21	2	-6/+0
\| \| \| \| \| \| \| \| \| \| \|	GCC's x86 and s390 ports support -mrecord-mcount. Other ports reject the option. aarch64-linux-gnu-gcc: error: unrecognized command line option ‘-mrecord-mcount’ Allowing this option can cause failures when building Linux kernel for aarch64, powerpc64, etc, which will think the feature is available if the clang command returns 0.