llvm-svn: 275622
Differential Revision: http://reviews.llvm.org/D22380
llvm-svn: 275577
Reviewers: tstellardAMD
Differential Revision: http://reviews.llvm.org/D20299
llvm-svn: 275030
llvm::VectorType and calling getNumElements. This is equivalent and shorter.
llvm-svn: 274823
builtin handling. NFC
llvm-svn: 274821
handling. Just get the type from the operand of the builtin instead. NFC
llvm-svn: 274820
makes them the same as what is done when using the SSE builtins for these same encodings.
llvm-svn: 274608
llvm-svn: 274603
s6.13.17.
- Added new Builtins: enqueue_kernel, get_kernel_work_group_size
and get_kernel_preferred_work_group_size_multiple.
These Builtins use a custom check to diagnose parameters of the passed Blocks,
i.e. a variable number of 'local void*' type params, and to check the different
overloads specified in Table 6.31 of OpenCL v2.0.
- IR is generated as an internal library call for each OpenCL Builtin,
reusing ObjC Block implementation.
Review: http://reviews.llvm.org/D20249
llvm-svn: 274540
Currently we only have OpenCL 2.0 Builtins, i.e. pipes or address space conversions.
They have to be added only in the version 2.0 compilation mode to make the identifiers
available for use in the other versions.
Review: http://reviews.llvm.org/D20249
llvm-svn: 274509
extension of the result of a v2i1 or v4i1 masked compare. This way we emit something that the backend easily interprets as a concatenation rather than a true shuffle. This delivers slightly better codegen with the current backend capabilities.
llvm-svn: 274484
This is important for building libclc. Since r273039 tests are failing
due to now emitting calls to these functions instead of emitting the
DAG node. The libm function names are implemented for OpenCL, and should
call the locally defined versions, so -fno-builtin is used. The IR
Some functions use the __builtins and expect the intrinsics to be
emitted. Without this we end up with nobuiltin calls to intrinsics
or to unsupported library calls.
llvm-svn: 274370
Differential Revision: http://reviews.llvm.org/D21746
llvm-svn: 274110
llvm-svn: 273965
llvm-svn: 273378
Depends on llvm side commit r273002.
llvm-svn: 273003
Reapplying patch in r272777 which was reverted
because the llvm patch which added support
for generating the mcrr/mcrr2 instructions
from the intrinsic was causing an assertion
failure. This has now been fixed in llvm.
llvm-svn: 272983
Sibling patch to r272932:
http://reviews.llvm.org/rL272932
llvm-svn: 272933
This is now supported for ARM, AArch64, PowerPC, SystemZ, SPARC, Mips.
Differential Revision: http://reviews.llvm.org/D19589
llvm-svn: 272893
As noted in the code comment, a potential follow-on would be to remove
the builtins themselves. Other than ord/unord, this already works as
expected. E.g.:
  typedef float v4sf __attribute__((__vector_size__(16)));
  v4sf fcmpgt(v4sf a, v4sf b) { return a > b; }
Differential Revision: http://reviews.llvm.org/D21268
llvm-svn: 272840
Sibling patch to r272806:
http://reviews.llvm.org/rL272806
llvm-svn: 272807
added in the llvm patch is causing an assertion
to fail.
llvm-svn: 272790
llvm-svn: 272787
This patch adds intrinsics for mrrc/mrrc2. The
intrinsics for mrrc/mrrc2 return a single
uint64_t to represent two 32-bit values.
The mcrr/mcrr2 intrinsic was changed to
accept a single uint64_t instead of two
32-bit values as the input, for consistency.
Differential Revision: http://reviews.llvm.org/D21179
llvm-svn: 272777
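
The "single uint64_t for two 32-bit values" convention can be sketched in plain C. The helper names are hypothetical, and which value lands in the low half is an assumption made here, not something the commit message states:

```c
#include <stdint.h>

/* Hypothetical helpers, not part of the patch or of any ARM header.
   Assumption: the first 32-bit value occupies the low half. */
static inline uint64_t pack_u32_pair(uint32_t lo, uint32_t hi) {
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Recover the low 32 bits of a packed pair. */
static inline uint32_t low_u32(uint64_t v) {
    return (uint32_t)v;
}

/* Recover the high 32 bits of a packed pair. */
static inline uint32_t high_u32(uint64_t v) {
    return (uint32_t)(v >> 32);
}
```

A caller would pack the two values before handing them to an mcrr-style intrinsic and split the result of an mrrc-style one with the two accessors.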
llvm-svn: 272541
__builtin_nontemporal_store in headers
We can now use __builtin_nontemporal_store instead of target-specific builtins for naturally aligned nontemporal stores, which avoids the need for handling in CGBuiltin.cpp.
The scalar integer nontemporal (unaligned) store builtins will have to wait, as __builtin_nontemporal_store currently assumes natural alignment and doesn't accept the 'packed struct' trick that we use for normal unaligned loads/stores.
The nontemporal loads require further backend support before we can safely convert them to __builtin_nontemporal_load.
Differential Revision: http://reviews.llvm.org/D21272
llvm-svn: 272540
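
For readers unfamiliar with the 'packed struct' trick mentioned above, a minimal sketch follows. It assumes GCC/Clang attribute syntax; the type and function names are hypothetical and not taken from the headers:

```c
/* Hypothetical helper type. Packing lowers the alignment requirement to 1;
   may_alias exempts the access from strict-aliasing assumptions.
   Assumption: GCC/Clang attribute syntax. */
struct unaligned_int {
    int v;
} __attribute__((__packed__, __may_alias__));

/* Load an int from an address that need not be int-aligned. */
static int loadu_int(const void *p) {
    return ((const struct unaligned_int *)p)->v;
}

/* Store an int to an address that need not be int-aligned. */
static void storeu_int(void *p, int value) {
    ((struct unaligned_int *)p)->v = value;
}
```

Because the access goes through a member of a packed struct, the compiler emits an ordinary load or store that makes no alignment assumption. Per the message above, this is what __builtin_nontemporal_store does not accept yet.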
CreateShuffleVector to match llvm interface change.
llvm-svn: 272492
directly in the header file instead of in CGBuiltin.cpp. Simplify the sse2 equivalents as well.
llvm-svn: 272246
palignr too.
llvm-svn: 272245
This will allow us to remove the x86 intrinsics from the backend.
Differential Revision: http://reviews.llvm.org/D21060
llvm-svn: 272141
the other palignr builtins, but with a select to handle masking.
llvm-svn: 271873
instead of the x86 specific ones.
This will allow the x86 intrinsics to be removed from the backend.
llvm-svn: 271253
intrinsics.
This will allow us to remove the x86 intrinsics from the backend.
llvm-svn: 271246
always 16. NFC
llvm-svn: 271176
ConstantVectors or ConstantDataVectors and calling the other form.
llvm-svn: 271165
llvm-svn: 271080
_InterlockedIncrement and _InterlockedDecrement have 'long' in their
prototypes. We assumed 'long' was the same size as an i32 which is
incorrect for other targets.
This fixes PR27892.
llvm-svn: 270953
OpenCL builtin functions to_{global|local|private} accept an argument of pointer type to an arbitrary pointee type, and return a pointer to the same pointee type in a different address space, i.e.
global gentype *to_global(gentype *p);
It is not desirable to declare it as
global void *to_global(void *);
in the OpenCL header file, since it misses diagnostics.
This patch implements these builtin functions as Clang builtin functions. In the builtin def file they are defined to have the signature void*(void*). When handling call expressions, their declarations are rewritten to have the correct parameter type and return type corresponding to the call argument.
In codegen, a call to addr void *to_addr(void*) is generated with addrcasts or bitcasts to facilitate implementation in the builtin library.
Differential Revision: http://reviews.llvm.org/D19932
llvm-svn: 270261
This is matching what trunk gcc is accepting. Also adds a missing ssse3
case. PR27779. The amount of duplication here is annoying, maybe it
should be factored into a separate .def file?
llvm-svn: 270224
Summary:
Previously it was implemented as inline asm in the CUDA headers.
This change allows us to use the [addr+imm] addressing mode when
executing ld.global.nc instructions. This translates into a 1.3x
speedup on some benchmarks that call this instruction from within an
unrolled loop.
Reviewers: tra, rsmith
Subscribers: jhen, cfe-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D19990
llvm-svn: 270150
This follows the recent change in the wasm spec.
llvm-svn: 268256
The intrinsic is now called llvm.thread.pointer, not
llvm.aarch64.thread.pointer. Also, the code handling it in CGBuiltin.cpp
is dead - it's already covered by GCCBuiltin. Remove it.
Differential Revision: http://reviews.llvm.org/D19099
llvm-svn: 266817
r259537 added vfma/vfms to armv7, but the builtin was only lowered
on the AArch64 side. Instead of supporting it on ARM, get rid of it.
The vfms builtin lowered to:
%nb = fsub float -0.0, %b
%r = @llvm.fma.f32(%a, %nb, %c)
Instead, define the operation in terms of vfma, and swap the
multiplicands. It now lowers to:
%na = fsub float -0.0, %a
%r = @llvm.fma.f32(%na, %b, %c)
This matches the instruction more closely, and lets current LLVM
generate the "natural" operand ordering:
fmls.2s v0, v1, v2
instead of the crooked (but equivalent):
fmls.2s v0, v2, v1
Except for these changes, assembly is identical.
LLVM accepts both commutations, and the LLVM tests in:
test/CodeGen/AArch64/arm64-fmadd.ll
test/CodeGen/AArch64/fp-dp3.ll
test/CodeGen/AArch64/neon-fma.ll
test/CodeGen/ARM/fusedMAC.ll
already check either the new one only, or both.
Also verified against the test-suite unittests.
llvm-svn: 266807
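
The two lowerings differ only in which multiplicand is negated. A scalar sketch of one lane, using plain (unfused) arithmetic as a stand-in for llvm.fma so that it needs no library support, shows why they are interchangeable. The function names are hypothetical:

```c
/* Old lowering: negate the second multiplicand.
   Assumption: unfused a*b+c stands in for the fused llvm.fma call. */
static float fms_negate_b(float a, float b, float c) {
    float nb = -b;
    return a * nb + c;
}

/* New lowering: negate the first multiplicand instead. */
static float fms_negate_a(float a, float b, float c) {
    float na = -a;
    return na * b + c;
}
```

Negation is exact in IEEE arithmetic, so both products equal -(a*b) and both functions compute c - a*b. Only the operand order presented to the backend changes.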
isinf (is infinite) and isfinite should be implemented with the same function
except we change the comparison operator.
See PR27145 for more details:
https://llvm.org/bugs/show_bug.cgi?id=27145
Ref: forked off of the discussion in D18513.
Differential Revision: http://reviews.llvm.org/D18648
llvm-svn: 265675
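
A sketch of the "same function, different comparison operator" idea in C follows. It assumes the shared step is a comparison of fabs(x) against infinity and that the comparison is ordered, so NaN is reported as neither infinite nor finite. The helper names are hypothetical and this is not the CGBuiltin.cpp code:

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical shared helper: one code path, only the comparison differs. */
static bool compare_with_infinity(double x, bool want_infinite) {
    double ax = fabs(x);
    if (want_infinite)
        return ax == INFINITY;                /* false for NaN */
    /* C's != is true for NaN, so use the ordered "less or greater" test. */
    return islessgreater(ax, INFINITY) != 0;  /* false for NaN */
}

static bool my_isinf(double x) {
    return compare_with_infinity(x, true);
}

static bool my_isfinite(double x) {
    return compare_with_infinity(x, false);
}
```

Both predicates return false for NaN, which is why a plain `!=` would be the wrong operator for the finite case.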
Summary: See LLVM change D18775 for details, this change depends on it.
Reviewers: jyknight, reames
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D18776
llvm-svn: 265569
llvm-svn: 264960
"C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC.
llvm-svn: 264932
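
The code that triggered the warning is not shown here. The usual way to address C4334 is to widen the left operand before shifting, as in this sketch (the function name is hypothetical):

```c
#include <stdint.h>

/* Widen the 1 first so the shift itself is performed in 64 bits.
   Writing (uint64_t)(1 << n) instead would shift a 32-bit int and
   only then convert the result. */
static uint64_t bit64(unsigned n) {
    return (uint64_t)1 << n;
}
```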
Also add documentation for bitreverse builtins
llvm-svn: 264203
These functions cannot be implemented as atomicrmw or cmpxchg
instructions, so they are implemented as a call to the NVVM intrinsics
@llvm.nvvm.atomic.load.inc.32.p0i32 and
@llvm.nvvm.atomic.load.dec.32.p0i32.
Patch by Jason Henline.
Reviewers: jlebar
Differential Revision: http://reviews.llvm.org/D18322
llvm-svn: 264009
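
Assuming the functions in question are CUDA's atomicInc and atomicDec (their names do not appear in the message above), the wrap-around update each performs can be modelled as below. This is a non-atomic reference model with hypothetical names, intended only to show why a plain atomicrmw add/sub does not fit:

```c
/* Non-atomic model of the wrapping increment: reset to 0 once the old
   value has reached the limit. Returns the old value. */
static unsigned inc_wrap(unsigned *addr, unsigned limit) {
    unsigned old = *addr;
    *addr = (old >= limit) ? 0u : old + 1u;
    return old;
}

/* Non-atomic model of the wrapping decrement: reload the limit when the
   old value is 0 or already above the limit. Returns the old value. */
static unsigned dec_wrap(unsigned *addr, unsigned limit) {
    unsigned old = *addr;
    *addr = (old == 0u || old > limit) ? limit : old - 1u;
    return old;
}
```

The new value depends on a comparison against a second operand, which none of the atomicrmw operations express as a single read-modify-write.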
As part of this, make the function-arrangement interfaces
a little simpler and more semantic.
NFC.
llvm-svn: 263191