| Commit message | Author | Age | Files | Lines |
...
|
| |
structure contents at runtime in circumstances where debuggers may not be easily available (such as in kernel work).
Patch by Paul Semel.
llvm-svn: 329762
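A minimal usage sketch, assuming this refers to the __builtin_dump_struct builtin and a printf-style callback (the struct and function names are illustrative):

  #include <cstdio>

  struct Point { int x, y; };

  void dump(Point *p) {
    // Walks the fields of *p and prints each name/value pair
    // through the supplied printf-like callback.
    __builtin_dump_struct(p, &printf);
  }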
|
| |
I believe all the pieces are now in place in the backend to make this work correctly. We can either mask the input to 32 bits for pmuludq or shl/ashr for pmuldq and use a regular mul instruction. The backend should combine this to PMULUDQ/PMULDQ and then SimplifyDemandedBits will remove the and/shifts.
Differential Revision: https://reviews.llvm.org/D45421
llvm-svn: 329605
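A scalar, per-lane sketch of the mask/shift idea described above (illustrative only; the real lowering operates on vectors):

  #include <cstdint>

  // pmuludq: multiply the zero-extended low 32 bits of each 64-bit lane.
  uint64_t pmuludq_lane(uint64_t a, uint64_t b) {
    return (a & 0xffffffffu) * (b & 0xffffffffu);
  }

  // pmuldq: multiply the sign-extended low 32 bits of each 64-bit lane
  // (the shl/ashr pair in IR performs this sign extension).
  int64_t pmuldq_lane(int64_t a, int64_t b) {
    return (int64_t)(int32_t)a * (int64_t)(int32_t)b;
  }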
|
| |
Found via codespell -q 3 -I ../clang-whitelist.txt
where the whitelist consists of:
archtype
cas
classs
checkk
compres
definit
frome
iff
inteval
ith
lod
methode
nd
optin
ot
pres
statics
te
thru
Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few
files that have dubious fixes reverted.)
Differential revision: https://reviews.llvm.org/D44188
llvm-svn: 329399
|
| |
llvm-svn: 329394
|
| |
A recent addition to the Coroutines TS (https://wg21.link/p0913) adds a pre-defined
coroutine, noop_coroutine, that does nothing. To implement this feature, we added
an llvm.coro.noop intrinsic that returns a coroutine handle to a coroutine that
does nothing when resumed or destroyed.
This patch adds a builtin __builtin_coro_noop() that maps to llvm.coro.noop intrinsic.
Related llvm change: https://reviews.llvm.org/D45114
llvm-svn: 328993
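A minimal sketch of the builtin (assumes a clang with coroutine support enabled, e.g. -fcoroutines-ts at the time of this commit):

  void use_noop_handle() {
    // Handle to a coroutine that does nothing when resumed or destroyed.
    void *h = __builtin_coro_noop();
    (void)h;
  }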
|
| |
The conversion of operations to bitcode helps to eliminate an additional
store in certain cases. We used to lower these load intrinsics during
DAG-to-DAG conversion, by which time the "Dead Store Elimination" pass has
already been run. There is an associated LLVM patch.
Patch by Sumanth Gundapaneni.
llvm-svn: 328776
|
| |
These instructions have been around for a long time, but we
haven't supported intrinsics for them. The "new" versions use
the CSx register for the start of the buffer instead of the K
field in the Mx register.
There is a related llvm patch.
Patch by Brendon Cahoon.
llvm-svn: 328725
|
| |
Putting back the code in commit r327189 that was reverted in r327437. The code is being committed in three stages and this one is the last stage: 1) r327455 fp16 feature flags, 2) r327836 pass half type or i16 based on FullFP16, and 3) the code here, which adds the front-end fp16 vector intrinsics for ARM.
Differential revision https://reviews.llvm.org/D43650
llvm-svn: 328277
|
| |
This is needed for the upcoming implementation of the
new 8x32x16 and 32x8x16 variants of WMMA instructions
introduced in CUDA 9.1.
Differential Revision: https://reviews.llvm.org/D44719
llvm-svn: 328158
|
| |
usual allocation/deallocation functions.
Summary:
Libc++'s default allocator uses `__builtin_operator_new` and `__builtin_operator_delete` in order to allow the calls to new/delete to be elided. However, libc++ now needs to support over-aligned types in the default allocator. In order to support this without disabling the existing optimization, Clang needs to support calling the aligned new overloads from the builtins.
See llvm.org/PR22634 for more information about the libc++ bug.
This patch changes `__builtin_operator_new`/`__builtin_operator_delete` to call any usual `operator new`/`operator delete` function. It does this by performing overload resolution with the arguments passed to the builtin to determine which allocation function to call. If the selected function is not a usual allocation function a diagnostic is issued.
One open issue is whether the `align_val_t` overloads should be considered "usual" when `LangOpts::AlignedAllocation` is disabled.
In order to allow libc++ to detect this new behavior the value for `__has_builtin(__builtin_operator_new)` has been updated to `201802`.
Reviewers: rsmith, majnemer, aaron.ballman, erik.pilkington, bogner, ahatanak
Reviewed By: rsmith
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D43047
llvm-svn: 328134
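A sketch of the detection pattern this enables (assumes C++17 aligned allocation and a clang providing __has_builtin):

  #include <new>
  #include <cstddef>

  void *alloc_overaligned(std::size_t n) {
  #if __has_builtin(__builtin_operator_new) >= 201802L
    // Overload resolution selects the usual
    // operator new(size_t, align_val_t) allocation function.
    return __builtin_operator_new(n, std::align_val_t(64));
  #else
    return ::operator new(n, std::align_val_t(64));
  #endif
  }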
|
| |
https://reviews.llvm.org/D44591
llvm-svn: 328038
|
| |
This way we can support address-space specific variants without explicitly
encoding the space in the name of the intrinsic. Fewer intrinsics to deal with ->
less boilerplate.
Added a bit of tablegen magic to match/replace an intrinsic with a pointer
argument in a particular address space with the space-specific instruction
variant.
Updated tests to use non-default address spaces.
Differential Revision: https://reviews.llvm.org/D43268
llvm-svn: 328006
|
| |
For generating NEON intrinsics, this determines the NEON data type, and whether
it should be a half type or an i16 type. I.e., we always pass a half type for
AArch64 (this hasn't changed), and now also for ARM, but only when FullFP16 is
enabled; we pass i16 otherwise.
This is intended to be a non-functional change, but together with the backend
work in D44538 which adds support for f16 vectors, this enables adding the
AArch32 FP16 (vector) intrinsics.
Differential Revision: https://reviews.llvm.org/D44561
llvm-svn: 327836
|
| |
This is causing problems in testing, and PR36683 was raised.
Reverting it until we have sorted out how to pass f16 vectors.
llvm-svn: 327437
|
| |
Add the fp16 NEON vector intrinsics for ARM as described in the ARM ACLE document.
Review in https://reviews.llvm.org/D43650
llvm-svn: 327189
|
| |
The second operand needs to be in the lower bits of the concatenation. This matches llvm 5.0, gcc, and icc behavior.
Fixes PR36360.
llvm-svn: 324954
|
| |
https://reviews.llvm.org/D42993
llvm-svn: 324940
|
| |
return vXi1 mask. Make bitcasts to scalar explicit in IR
Summary: This is the clang equivalent of r324827
Reviewers: zvi, delena, RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43143
llvm-svn: 324828
|
| |
llvm-svn: 324647
|
| |
The MSVC runtime library does not provide a definition of wmemcmp,
so we need an inline implementation.
Differential Revision: https://reviews.llvm.org/D42441
llvm-svn: 323362
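A sketch of what such an inline implementation can look like (hypothetical name, not the exact header code):

  #include <cstddef>

  static inline int wmemcmp_inline(const wchar_t *a, const wchar_t *b,
                                   std::size_t n) {
    for (; n; --n, ++a, ++b) {
      if (*a != *b)
        return *a < *b ? -1 : 1;
    }
    return 0;
  }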
|
| |
This corresponds to r323222 in LLVM. The new names are not yet
finalized, so use them at your own risk.
llvm-svn: 323224
|
| |
https://reviews.llvm.org/D41792
llvm-svn: 323006
|
| |
integer shift/and/or
Summary:
kunpck intrinsics were removed in favor of native IR a few months ago. The implementation lowered them by operating on the integer types passed to the intrinsic and then just shifting, masking, and ORing them together. A special X86 DAG combine was added to recognize this pattern and turn it into a concat_vector operation.
I think it makes more sense to keep the IR implementation closer to vector operations on vXi1, given that we expect these builtins to be used around other builtins that operate on k-registers, which we try to represent in IR with vXi1. InstCombine should be able to get rid of the bitcasts between integers and vXi1, leaving only the vector operations.
Reviewers: RKSimon, spatel, zvi, jina.nahias
Reviewed By: RKSimon
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D42016
llvm-svn: 322461
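For reference, a scalar sketch of the mask concatenation these builtins express, shown for the 8-bit-to-16-bit case (the operand order is an assumption here):

  #include <cstdint>

  // Concatenate two 8-bit masks into one 16-bit mask.
  uint16_t kunpack_ref(uint8_t hi, uint8_t lo) {
    return (uint16_t)(((uint16_t)hi << 8) | lo);
  }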
|
| |
zeroinitializer.
llvm-svn: 322038
|
| |
These are just overloads for _Float128. They're supported by GCC 7 and used
by glibc. APFloat support is already there, so just add the overloads.
__builtin_copysignf128
__builtin_fabsf128
__builtin_huge_valf128
__builtin_inff128
__builtin_nanf128
__builtin_nansf128
This is the same support that GCC has, according to the documentation,
but limited to _Float128.
llvm-svn: 321948
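A minimal sketch, assuming a target where _Float128 is available (e.g. x86-64 Linux):

  _Float128 examples(void) {
    _Float128 inf = __builtin_inff128();
    _Float128 mag = __builtin_fabsf128(-inf);              // +inf
    _Float128 neg = __builtin_copysignf128(mag, (_Float128)-1.0);
    _Float128 qnan = __builtin_nanf128("");                // quiet NaN
    return inf + mag + neg + qnan;
  }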
|
| |
r320902 fixed the IRGen for some types of checked multiplications. It
did not handle unsigned overflow correctly in the case where the signed
operand is negative (PR35750).
Eli pointed out that on overflow, the result must be equal to the unique
value that is equivalent to the mathematically-correct result modulo 2^k,
where k is the number of bits in the result type.
This patch fixes the specialized IRGen from r320902 accordingly.
Testing: Apart from check-clang, I modified the test harness from
r320902 to validate the results of all multiplications -- not just the
ones which don't overflow:
https://gist.github.com/vedantk/3eb9c88f82e5c32f2e590555b4af5081
llvm.org/PR35750, rdar://34963321
Differential Revision: https://reviews.llvm.org/D41717
llvm-svn: 321771
|
| |
Added BITALG feature recognition.
Added intrinsics support for the BITALG instructions:
_mm512_popcnt_epi16
_mm512_mask_popcnt_epi16
_mm512_maskz_popcnt_epi16
_mm512_popcnt_epi8
_mm512_mask_popcnt_epi8
_mm512_maskz_popcnt_epi8
_mm512_mask_bitshuffle_epi64_mask
_mm512_bitshuffle_epi64_mask
_mm256_popcnt_epi16
_mm256_mask_popcnt_epi16
_mm256_maskz_popcnt_epi16
_mm128_popcnt_epi16
_mm128_mask_popcnt_epi16
_mm128_maskz_popcnt_epi16
_mm256_popcnt_epi8
_mm256_mask_popcnt_epi8
_mm256_maskz_popcnt_epi8
_mm128_popcnt_epi8
_mm128_mask_popcnt_epi8
_mm128_maskz_popcnt_epi8
_mm256_mask_bitshuffle_epi32_mask
_mm256_bitshuffle_epi32_mask
_mm128_mask_bitshuffle_epi16_mask
_mm128_bitshuffle_epi16_mask
matching similar work in the backend (D40222)
Differential Revision: https://reviews.llvm.org/D41564
llvm-svn: 321483
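A small usage sketch (assumes <immintrin.h> and -mavx512bitalg -mavx512vl):

  #include <immintrin.h>

  __m256i per_byte_popcount(__m256i v) {
    // Population count within each 8-bit lane.
    return _mm256_popcnt_epi8(v);
  }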
|
| |
accept bit 2, which indicates that the prefetched addresses will be written to.
Add the appropriate _MM_HINT_ET0/ET1 defines to match gcc.
llvm-svn: 321325
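A usage sketch of the write-intent hint (assumes <xmmintrin.h>):

  #include <xmmintrin.h>

  void warm_for_write(const char *p) {
    // T0 locality with bit 2 set: the data will be written to.
    _mm_prefetch(p, _MM_HINT_ET0);
  }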
|
| |
Differential Revision: https://reviews.llvm.org/D41360
llvm-svn: 321301
|
| |
Putting back the code that was reverted a few weeks ago.
Differential Revision: https://reviews.llvm.org/D34161
llvm-svn: 321294
|
| |
Diagnose 'unreachable' UB when a noreturn function returns.
1. Insert a check at the end of functions marked noreturn.
2. A decl may be marked noreturn in the caller TU, but not marked in
the TU where it's defined. To diagnose this scenario, strip away the
noreturn attribute on the callee and insert a check after calls to it.
Testing: check-clang, check-ubsan, check-ubsan-minimal, D40700
rdar://33660464
Differential Revision: https://reviews.llvm.org/D40698
llvm-svn: 321231
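A sketch of scenario 2, split across two translation units (file and function names are illustrative):

  // caller.cpp: the declaration is marked noreturn here.
  [[noreturn]] void f();

  void call_f() {
    f();  // a check is inserted after this call
  }

  // impl.cpp: the definition is not marked noreturn and may return,
  // which the inserted check diagnoses:
  //   void f() { }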
|
| |
llvm-svn: 321115
|
| |
llvm-svn: 320919
|
| |
llvm-svn: 320915
|
| |
This patch introduces a specialized way to lower overflow-checked
multiplications with mixed-sign operands. This fixes link failures and
ICEs on code like this:
void mul(int64_t a, uint64_t b) {
  int64_t res;
  __builtin_mul_overflow(a, b, &res);
}
The generic checked-binop irgen would use a 65-bit multiplication
intrinsic here, which requires runtime support for _muloti4 (128-bit
multiplication), and therefore fails to link on i386. To get an ICE
on x86_64, change the example to use __int128_t / __uint128_t.
Adding runtime and backend support for 65-bit or 129-bit checked
multiplication on all of our supported targets is infeasible.
This patch solves the problem by using simpler, specialized irgen for
the mixed-sign case.
llvm.org/PR34920, rdar://34963321
Testing: Apart from check-clang, I compared the output from this fairly
comprehensive test driver using unpatched & patched clangs:
https://gist.github.com/vedantk/3eb9c88f82e5c32f2e590555b4af5081
Differential Revision: https://reviews.llvm.org/D41149
llvm-svn: 320902
|
| |
Summary:
InterlockedCompareExchange128 is a bit more complicated than the other
InterlockedCompareExchange functions, so it requires a bit more work. It
doesn't directly refer to 128bit ints, instead it takes pointers to
64bit ints for Destination and ComparandResult, and exchange is taken as
two 64bit ints (high & low). The previous value is written to
ComparandResult, and success is returned. This implementation does the
following in order to produce a cmpxchg instruction:
1. Cast everything to 128bit ints or int pointers, and glues together
the Exchange values
2. Reads from CompareandResult to get the comparand
3. Calls cmpxchg volatile (on X86 this will produce a lock cmpxchg16b
instruction)
1. Result 0 (previous value) is written back to ComparandResult
2. Result 1 (success bool) is zext'ed to a uchar and returned
Resolves bug https://llvm.org/PR35251
Patch by Colden Cullen!
Reviewers: rnk, agutowski
Reviewed By: rnk
Subscribers: majnemer, cfe-commits
Differential Revision: https://reviews.llvm.org/D41032
llvm-svn: 320730
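A usage sketch (MSVC environment on x86-64; assumes <intrin.h>):

  #include <intrin.h>

  bool cas128(__int64 volatile *dest, __int64 exch_hi, __int64 exch_lo,
              __int64 *comparand_result) {
    // Returns 1 on success; the previous value is written back to
    // comparand_result either way.
    return _InterlockedCompareExchange128(dest, exch_hi, exch_lo,
                                          comparand_result) != 0;
  }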
|
| |
llvm-svn: 320609
|
| |
Similar to D40044 and discussed in D40594.
llvm-svn: 319619
|
| |
The libm functions with LLVM intrinsic twins were moved above this blob with:
https://reviews.llvm.org/rL319593
llvm-svn: 319618
|
| |
There are 20 LLVM math intrinsics that correspond to mathlib calls according to the LangRef:
http://llvm.org/docs/LangRef.html#standard-c-library-intrinsics
We were only converting 3 mathlib calls (sqrt, fma, pow) and 12 builtin calls (ceil, copysign,
fabs, floor, fma, fmax, fmin, nearbyint, pow, rint, round, trunc) to their intrinsic-equivalents.
This patch pulls the transforms together and handles all 20 cases. The switch is guarded by a
check for const-ness to make sure we're not doing the transform if errno could possibly be set by
the libcall or builtin.
Differential Revision: https://reviews.llvm.org/D40044
llvm-svn: 319593
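A small illustration of the effect (a sketch; assumes errno can be proven unused, e.g. -fno-math-errno):

  #include <cmath>

  double shape(double x) {
    // With errno out of the picture these lower to
    // llvm.ceil.f64, llvm.trunc.f64, and llvm.round.f64.
    return std::ceil(x) + std::trunc(x) + std::round(x);
  }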
|
| |
Summary:
The -fxray-always-emit-customevents flag instructs clang to always emit
the LLVM IR for calls to the `__xray_customevent(...)` built-in
function. The default behaviour currently respects whether the function
has an `[[clang::xray_never_instrument]]` attribute, and thus does not lower
the appropriate IR code for the custom event built-in.
This change allows users calling through to the
`__xray_customevent(...)` built-in to always see those calls lowered to
the corresponding LLVM IR to lay down instrumentation points for these
custom event calls.
Using this flag enables us to emit just the user-provided custom
events even while never instrumenting the start/end of the function
where they appear. This is useful in cases where "phase markers" using
__xray_customevent(...) can have very few instructions and must never be
instrumented when entered/exited.
Reviewers: rnk, dblaikie, kpw
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D40601
llvm-svn: 319388
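A sketch of such a phase marker (the flag is from this commit; the rest is illustrative):

  #include <cstddef>

  // Build with: -fxray-instrument -fxray-always-emit-customevents
  [[clang::xray_never_instrument]]
  void phase_marker(const char *msg, std::size_t len) {
    // Emitted even though the function itself is never instrumented.
    __xray_customevent(msg, len);
  }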
|
| |
llvm-svn: 318815
|
| |
LLVM exposes a file in the backend (X86TargetParser.def) that
contains information about the correct list of values for __builtin_cpu_is.
This patch removes 2 of the copied and pasted versions of this
list from clang and instead includes the data from the .def file.
Differential Revision: https://reviews.llvm.org/D40054
llvm-svn: 318234
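For context, a sketch of the consumer of that list (the CPU name is just an example):

  int running_on_skylake(void) {
    // Valid names are sourced from X86TargetParser.def.
    return __builtin_cpu_is("skylake");
  }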
|
| |
cbrt() is always constant because it can't overflow or underflow. Therefore, it can't set errno.
fma() is not always constant because it can overflow or underflow. Therefore, it can set errno.
But we know that it never sets errno on GNU / MSVC, so make it constant in those environments.
Differential Revision: https://reviews.llvm.org/D39641
llvm-svn: 318093
|
| |
Patch by Bharathi Seshadri!
llvm-svn: 317776
|
| |
Summary:
This just seems to have been an oversight. We already supported the f64
atomic add with an explicit scope (e.g. "cta"), but not the scopeless
version.
Reviewers: tra
Subscribers: jholewinski, sanjoy, cfe-commits, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D39638
llvm-svn: 317623
|
| |
macros that just pass the right comparison predicate value to the regular cmp intrinsic. Remove mask cmpeq/cmpgt builtins that are now unused.
This shortens the intrinsic headers a little and allows us to get rid of the cmpeq and cmpgt handling from CGBuiltin.cpp.
llvm-svn: 317506
|
| |
The LLVM sqrt intrinsic definition changed with:
D28797
...so we don't have to use any relaxed FP settings other than errno handling.
This patch sidesteps a question raised in PR27435:
https://bugs.llvm.org/show_bug.cgi?id=27435
Is a programmer using __builtin_sqrt() invoking the compiler's intrinsic definition of sqrt or the mathlib definition of sqrt?
But we have an answer now: the builtin should match the behavior of the libm function including errno handling.
Differential Revision: https://reviews.llvm.org/D39204
llvm-svn: 317031
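A sketch of the resulting contract:

  double root(double x) {
    // Behaves like libm sqrt, including errno; with -fno-math-errno
    // it can lower directly to llvm.sqrt.f64.
    return __builtin_sqrt(x);
  }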
|
| |
In OpenCL, kernel functions and non-kernel functions have different calling conventions.
For certain targets they have different argument ABIs. Kernels also have special function
attributes and metadata for the runtime to launch them.
The blocks passed to enqueue_kernel are supposed to be executed as kernels. As such,
the block invoke function should be emitted as a kernel with the proper calling convention and
argument ABI.
This patch emits enqueued blocks as kernels. If a block is both called directly and passed
to enqueue_kernel, separate functions will be generated.
Differential Revision: https://reviews.llvm.org/D38134
llvm-svn: 315804
|
| |
Differential Revision: https://reviews.llvm.org/D38742
llvm-svn: 315624
|