Wrong argument order resulted in broken shfl ops for 64-bit types.
(cherry picked from commit cc14de88da27a8178976972bdc8211c31f7ca9ae)
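To show why argument order matters for the 64-bit overloads, the host-side C++ model below builds a 64-bit "shuffle down" from two 32-bit shuffles by splitting each lane's value into halves. This is a sketch under assumptions, not the header's code. The warp is simulated as a 32-element array, and `shfl_down32` and `shfl_down64` are hypothetical names. The commit message does not show how the real overloads are written. Here both halves simply pass through the 32-bit routine with the same arguments, in the same order.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical host-side model of one warp: index = lane ID, element = that lane's value.
constexpr unsigned int kWarpSize = 32;
using Lanes32 = std::array<std::uint32_t, kWarpSize>;
using Lanes64 = std::array<std::uint64_t, kWarpSize>;

static_assert(sizeof(std::uint64_t) == 2 * sizeof(std::uint32_t),
              "a 64-bit value must split into exactly two 32-bit halves");

// 32-bit "shuffle down" over the whole warp: lane i takes the value of lane
// i + delta, or keeps its own value when that source lane does not exist.
Lanes32 shfl_down32(const Lanes32& val, unsigned int delta) {
  Lanes32 out{};
  for (unsigned int lane = 0; lane < kWarpSize; ++lane)
    out[lane] = (delta < kWarpSize - lane) ? val[lane + delta] : val[lane];
  return out;
}

// 64-bit shuffle made of two 32-bit shuffles. Each half is passed with the
// same arguments, in the same order as the 32-bit call: (value, offset).
Lanes64 shfl_down64(const Lanes64& val, unsigned int delta) {
  Lanes32 lo{}, hi{};
  for (unsigned int lane = 0; lane < kWarpSize; ++lane) {
    std::uint32_t halves[2];
    std::memcpy(halves, &val[lane], sizeof(halves));  // split into two words
    lo[lane] = halves[0];
    hi[lane] = halves[1];
  }
  Lanes32 lo_out = shfl_down32(lo, delta);
  Lanes32 hi_out = shfl_down32(hi, delta);
  Lanes64 out{};
  for (unsigned int lane = 0; lane < kWarpSize; ++lane) {
    std::uint32_t halves[2] = {lo_out[lane], hi_out[lane]};
    std::memcpy(&out[lane], halves, sizeof(halves));  // reassemble in the same order
  }
  return out;
}
```

Because the halves are split and rejoined with the same `memcpy` layout, the round trip is exact regardless of host endianness.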
The vote.ballot instruction is gone in recent CUDA versions, and
vote.sync.ballot cannot be used because it needs a thread mask parameter.
Fortunately, PTX 6.2 (introduced with CUDA 9.2) provides the activemask.b32
instruction for this.
Differential Revision: https://reviews.llvm.org/D66665
llvm-svn: 370792
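The reasoning above can be checked with a small host-side C++ model. It simulates the semantics and is not the PTX; `WarpState`, `ballot` and `activemask` are hypothetical names. A non-sync ballot sets bit i only for lanes that are active and whose predicate is true. A ballot over an always-true predicate therefore yields exactly the active-lane mask, which is what `activemask.b32` returns. By contrast, `vote.sync.ballot` needs that mask as an input, so it cannot stand in here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned int kWarpSize = 32;

// Hypothetical host-side model: which lanes are currently executing,
// plus one predicate value per lane.
struct WarpState {
  std::array<bool, kWarpSize> active{};
  std::array<bool, kWarpSize> pred{};
};

// Non-sync ballot: bit i is set iff lane i is active and its predicate is true.
std::uint32_t ballot(const WarpState& w) {
  std::uint32_t mask = 0;
  for (unsigned int lane = 0; lane < kWarpSize; ++lane)
    if (w.active[lane] && w.pred[lane]) mask |= (1u << lane);
  return mask;
}

// activemask: bit i is set iff lane i is active. Since only active lanes
// contribute to a ballot, this equals a ballot over an always-true predicate.
std::uint32_t activemask(const WarpState& w) {
  WarpState all_true = w;
  all_true.pred.fill(true);
  return ballot(all_true);
}
```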
Summary:
These all had somewhat custom file headers with different text from the
ones I searched for previously, and so I missed them. Thanks to Hal and
Kristina and others who prompted me to fix this, and sorry it took so
long.
Reviewers: hfinkel
Subscribers: mcrosier, javed.absar, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D60406
llvm-svn: 357941
Differential Revision: https://reviews.llvm.org/D45780
llvm-svn: 330280
* __shfl_{up,down}* uses unsigned int for the third parameter.
* added [unsigned] long overloads for non-sync shuffles.
Differential Revision: https://reviews.llvm.org/D41521
llvm-svn: 321326
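The first bullet above concerns the lane offset of `__shfl_up` and `__shfl_down`. Below is a minimal host-side C++ model of their semantics. It assumes the behavior described in the CUDA programming guide: a lane reads from `lane + delta` or `lane - delta` within its `width`-sized segment, and keeps its own value if the source falls outside that segment. The model shows why an unsigned offset is natural. The direction is fixed by the function, so the offset is never negative. `shfl_down`, `shfl_up` and `warp_sum` are hypothetical names for this simulation, not CUDA APIs.

```cpp
#include <array>
#include <cassert>

// Hypothetical host-side model of one warp: index = lane ID, element = that lane's value.
constexpr unsigned int kWarpSize = 32;
using Lanes = std::array<int, kWarpSize>;

// __shfl_down-style semantics: each lane reads from lane + delta inside its
// width-sized segment and keeps its own value if that would leave the segment.
// width is assumed to be a power of two in [1, 32], as CUDA requires.
Lanes shfl_down(const Lanes& var, unsigned int delta, unsigned int width = kWarpSize) {
  Lanes out{};
  for (unsigned int lane = 0; lane < kWarpSize; ++lane) {
    unsigned int pos = lane % width;  // position inside this lane's segment
    out[lane] = (delta < width - pos) ? var[lane + delta] : var[lane];
  }
  return out;
}

// __shfl_up-style semantics: read from lane - delta inside the segment, else keep own value.
Lanes shfl_up(const Lanes& var, unsigned int delta, unsigned int width = kWarpSize) {
  Lanes out{};
  for (unsigned int lane = 0; lane < kWarpSize; ++lane) {
    unsigned int pos = lane % width;
    out[lane] = (delta <= pos) ? var[lane - delta] : var[lane];
  }
  return out;
}

// Usage: a warp-wide tree reduction with shfl_down; lane 0 ends up with the total.
int warp_sum(Lanes v) {
  for (unsigned int offset = kWarpSize / 2; offset > 0; offset /= 2) {
    Lanes shifted = shfl_down(v, offset);
    for (unsigned int lane = 0; lane < kWarpSize; ++lane) v[lane] += shifted[lane];
  }
  return v[0];
}
```

The tree reduction is a common use of a shuffle-down. After log2(32) = 5 steps, lane 0 holds the sum over the whole warp. The other lanes hold partial or duplicated sums that are simply ignored.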
in clang.
Differential Revision: https://reviews.llvm.org/D40872
llvm-svn: 319909
Differential Revision: https://reviews.llvm.org/D40871
llvm-svn: 319908
The name has two underscores in the official CUDA documentation:
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-vote-functions
Differential Revision: https://reviews.llvm.org/D38468
llvm-svn: 314691
Differential Revision: https://reviews.llvm.org/D38191
llvm-svn: 314223
builtins.", rL314135.
It caused assertion failures on macOS:
> Assertion failed: (Num < NumOperands && "Invalid child # of SDNode!"),
> function getOperand, file
> /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm/include/llvm/CodeGen/SelectionDAGNodes.h,
> line 835.
http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/42739/testReport/LLVM/CodeGen_NVPTX/surf_read_cuda_ll/
llvm-svn: 314142
Differential Revision: https://reviews.llvm.org/D38191
llvm-svn: 314135
Also fixed a syntax error in activemask().
Differential Revision: https://reviews.llvm.org/D38188
llvm-svn: 314129
Differential Revision: https://reviews.llvm.org/D38147
llvm-svn: 313899
instructions/intrinsics/builtins.
Differential Revision: https://reviews.llvm.org/D38148
llvm-svn: 313898
Differential Revision: https://reviews.llvm.org/D38090
llvm-svn: 313820
Summary:
MSVC seems to use "__in" and "__out" for its own purposes, so we have to
pick different names in this macro.
Reviewers: tra
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D28325
llvm-svn: 291138
To match "NVPTX: Make the llvm.nvvm.shfl intrinsics and builtin names
consistent" in LLVM.
llvm-svn: 274663
Summary: Clang changes to make use of the LLVM intrinsics added in D21160.
Reviewers: tra
Subscribers: jholewinski, cfe-commits
Differential Revision: http://reviews.llvm.org/D21162
llvm-svn: 272299
Summary: The order is [x, y, z, w], not [w, x, y, z].
Subscribers: cfe-commits, tra
Differential Revision: http://reviews.llvm.org/D20794
llvm-svn: 271215
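Here is the element order in concrete terms. When four consecutive floats are loaded as one vector, element 0 (the lowest address) is `.x` and element 3 is `.w`. The host-side C++ sketch below uses a hypothetical `Float4` stand-in for CUDA's `float4`, together with hypothetical `unpack` and `as_elements` helpers. It only illustrates the [x, y, z, w] mapping and is not the header's implementation.

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Hypothetical stand-in for CUDA's float4: same member names, same order.
struct Float4 {
  float x, y, z, w;
};
static_assert(sizeof(Float4) == 4 * sizeof(float), "no padding expected");

// Maps the elements of a 4-wide vector onto Float4 members in [x, y, z, w] order.
Float4 unpack(const std::array<float, 4>& v) {
  return Float4{v[0], v[1], v[2], v[3]};
}

// Reads a Float4 back as raw memory: element 0 is the lowest address, i.e. x.
std::array<float, 4> as_elements(const Float4& f) {
  std::array<float, 4> out{};
  std::memcpy(out.data(), &f, sizeof(f));
  return out;
}
```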
Summary:
Previously it was implemented as inline asm in the CUDA headers.
This change allows us to use the [addr+imm] addressing mode when
executing ld.global.nc instructions. This translates into a 1.3x
speedup on some benchmarks that call this instruction from within an
unrolled loop.
Reviewers: tra, rsmith
Subscribers: jhen, cfe-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D19990
llvm-svn: 270150