path: root/clang/lib/Headers/__clang_cuda_intrinsics.h
Commit message | Author | Age | Files | Lines
* NVPTX: Rename __builtin_ptx_shfl -> __nvvm_shfl | Justin Bogner | 2016-07-06 | 1 | -8/+4
  To match "NVPTX: Make the llvm.nvvm.shfl intrinsics and builtin names consistent" in LLVM.
  llvm-svn: 274663
* [CUDA] Implement __shfl* intrinsics in clang headers. | Justin Lebar | 2016-06-09 | 1 | -0/+70
  Summary: Clang changes to make use of the LLVM intrinsics added in D21160.
  Reviewers: tra
  Subscribers: jholewinski, cfe-commits
  Differential Revision: http://reviews.llvm.org/D21162
  llvm-svn: 272299
* [CUDA] Fix order of vectorized ldg intrinsics' elements. | Justin Lebar | 2016-05-30 | 1 | -28/+28
  Summary: The order is [x, y, z, w], not [w, x, y, z].
  Subscribers: cfe-commits, tra
  Differential Revision: http://reviews.llvm.org/D20794
  llvm-svn: 271215
* [CUDA] Implement __ldg using intrinsics. | Justin Lebar | 2016-05-19 | 1 | -0/+256
  Summary: Previously it was implemented as inline asm in the CUDA headers. This change allows us to use the [addr+imm] addressing mode when executing ld.global.nc instructions. This translates into a 1.3x speedup on some benchmarks that call this instruction from within an unrolled loop.
  Reviewers: tra, rsmith
  Subscribers: jhen, cfe-commits, jholewinski
  Differential Revision: http://reviews.llvm.org/D19990
  llvm-svn: 270150