path: root/clang/lib/Headers/__clang_cuda_intrinsics.h
Commit message | Author | Age | Files | Lines
* [CUDA] Fix order of memcpy arguments in __shfl_*(<64-bit type>). | Artem Belevich | 2020-01-24 | 1 | -2/+2
  Wrong argument order resulted in broken shfl ops for 64-bit types.
  (cherry picked from commit cc14de88da27a8178976972bdc8211c31f7ca9ae)
* [CUDA] Use activemask.b32 instruction to implement __activemask w/ CUDA-9.2+ | Artem Belevich | 2019-09-03 | 1 | -1/+9
  vote.ballot instruction is gone in recent CUDA versions and vote.sync.ballot can not be used because it needs a thread mask parameter. Fortunately PTX 6.2 (introduced with CUDA-9.2) provides activemask.b32 instruction for this.
  Differential Revision: https://reviews.llvm.org/D66665
  llvm-svn: 370792
* Move the builtin headers to use the new license file header. | Chandler Carruth | 2019-04-08 | 1 | -17/+3
  Summary: These all had somewhat custom file headers with different text from the ones I searched for previously, and so I missed them. Thanks to Hal and Kristina and others who prompted me to fix this, and sorry it took so long.
  Reviewers: hfinkel
  Subscribers: mcrosier, javed.absar, cfe-commits
  Tags: #clang
  Differential Revision: https://reviews.llvm.org/D60406
  llvm-svn: 357941
* [CUDA] added missing __ldg(const signed char *) | Artem Belevich | 2018-04-18 | 1 | -0/+3
  Differential Revision: https://reviews.llvm.org/D45780
  llvm-svn: 330280
* [CUDA] More fixes for __shfl_* intrinsics. | Artem Belevich | 2017-12-21 | 1 | -28/+49
  * __shfl_{up,down}* uses unsigned int for the third parameter.
  * added [unsigned] long overloads for non-sync shuffles.
  Differential Revision: https://reviews.llvm.org/D41521
  llvm-svn: 321326
* [NVPTX,CUDA] Added llvm.nvvm.fns intrinsic and matching __nvvm_fns builtin in clang. | Artem Belevich | 2017-12-06 | 1 | -0/+4
  Differential Revision: https://reviews.llvm.org/D40872
  llvm-svn: 319909
* [CUDA] Added overloads for '[unsigned] long' variants of shfl builtins. | Artem Belevich | 2017-12-06 | 1 | -0/+18
  Differential Revision: https://reviews.llvm.org/D40871
  llvm-svn: 319908
* [CUDA] Fix name of __activemask() | Jonas Hahnfeld | 2017-10-02 | 1 | -1/+1
  The name has two underscores in the official CUDA documentation:
  http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-vote-functions
  Differential Revision: https://reviews.llvm.org/D38468
  llvm-svn: 314691
* [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins. | Artem Belevich | 2017-09-26 | 1 | -3/+29
  Differential Revision: https://reviews.llvm.org/D38191
  llvm-svn: 314223
* Revert "[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.", rL314135. | Justin Lebar | 2017-09-25 | 1 | -29/+3
  Causing assertion failures on macos:
  > Assertion failed: (Num < NumOperands && "Invalid child # of SDNode!"),
  > function getOperand, file
  > /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm/include/llvm/CodeGen/SelectionDAGNodes.h,
  > line 835.
  http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/42739/testReport/LLVM/CodeGen_NVPTX/surf_read_cuda_ll/
  llvm-svn: 314142
* [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins. | Artem Belevich | 2017-09-25 | 1 | -3/+29
  Differential Revision: https://reviews.llvm.org/D38191
  llvm-svn: 314135
* [CUDA] Fix names of __nvvm_vote* intrinsics. | Artem Belevich | 2017-09-25 | 1 | -5/+5
  Also fixed a syntax error in activemask().
  Differential Revision: https://reviews.llvm.org/D38188
  llvm-svn: 314129
* [CUDA] Fixed order of words in the names of shfl builtins. | Artem Belevich | 2017-09-21 | 1 | -4/+3
  Differential Revision: https://reviews.llvm.org/D38147
  llvm-svn: 313899
* [NVPTX] Implemented bar.warp.sync, barrier.sync, and vote{.sync} instructions/intrinsics/builtins. | Artem Belevich | 2017-09-21 | 1 | -0/+31
  Differential Revision: https://reviews.llvm.org/D38148
  llvm-svn: 313898
* [NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins. | Artem Belevich | 2017-09-20 | 1 | -0/+68
  Differential Revision: https://reviews.llvm.org/D38090
  llvm-svn: 313820
* [CUDA] Rename keywords used in macro so they don't conflict with MSVC. | Justin Lebar | 2017-01-05 | 1 | -21/+21
  Summary: MSVC seems to use "__in" and "__out" for its own purposes, so we have to pick different names in this macro.
  Reviewers: tra
  Subscribers: cfe-commits
  Differential Revision: https://reviews.llvm.org/D28325
  llvm-svn: 291138
* NVPTX: Rename __builtin_ptx_shfl -> __nvvm_shfl | Justin Bogner | 2016-07-06 | 1 | -8/+4
  To match "NVPTX: Make the llvm.nvvm.shfl intrinsics and builtin names consistent" in LLVM.
  llvm-svn: 274663
* [CUDA] Implement __shfl* intrinsics in clang headers. | Justin Lebar | 2016-06-09 | 1 | -0/+70
  Summary: Clang changes to make use of the LLVM intrinsics added in D21160.
  Reviewers: tra
  Subscribers: jholewinski, cfe-commits
  Differential Revision: http://reviews.llvm.org/D21162
  llvm-svn: 272299
* [CUDA] Fix order of vectorized ldg intrinsics' elements. | Justin Lebar | 2016-05-30 | 1 | -28/+28
  Summary: The order is [x, y, z, w], not [w, x, y, z].
  Subscribers: cfe-commits, tra
  Differential Revision: http://reviews.llvm.org/D20794
  llvm-svn: 271215
* [CUDA] Implement __ldg using intrinsics. | Justin Lebar | 2016-05-19 | 1 | -0/+256
  Summary: Previously it was implemented as inline asm in the CUDA headers. This change allows us to use the [addr+imm] addressing mode when executing ld.global.nc instructions. This translates into a 1.3x speedup on some benchmarks that call this instruction from within an unrolled loop.
  Reviewers: tra, rsmith
  Subscribers: jhen, cfe-commits, jholewinski
  Differential Revision: http://reviews.llvm.org/D19990
  llvm-svn: 270150
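Note on the 2020-01-24 entry (memcpy argument order in __shfl_* for 64-bit types): the log does not show the diff, so the following is only a host-side sketch of the general pattern such a wrapper might use, not the header's actual code. It assumes a 64-bit value is split into two 32-bit halves with memcpy, each half is shuffled separately, and the halves are then copied back. `fake_shfl32`, `Halves`, and `shuffle64` are hypothetical names; `fake_shfl32` stands in for a real 32-bit warp shuffle and simply returns its input so the split/recombine logic can be checked on a CPU.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 32-bit warp shuffle. On the host it returns
// its input unchanged, which is enough to exercise the copy logic.
static int fake_shfl32(int v) { return v; }

// Two 32-bit halves that together hold one 64-bit value.
struct Halves {
  int a;
  int b;
};
static_assert(sizeof(Halves) == sizeof(std::int64_t),
              "Halves must cover exactly 64 bits");

// Sketch of a 64-bit shuffle built from two 32-bit shuffles.
// memcpy takes (destination, source, size); swapping the first two
// arguments would read from an uninitialized object instead of the input.
std::int64_t shuffle64(std::int64_t val) {
  Halves tmp;
  std::memcpy(&tmp, &val, sizeof(val));  // destination: tmp, source: val
  tmp.a = fake_shfl32(tmp.a);
  tmp.b = fake_shfl32(tmp.b);
  std::int64_t ret;
  std::memcpy(&ret, &tmp, sizeof(tmp));  // destination: ret, source: tmp
  return ret;
}
```

With the identity stand-in, `shuffle64` should return its argument bit-for-bit; with the memcpy arguments reversed it would not, which is consistent with the "broken shfl ops for 64-bit types" symptom the commit message describes.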