path: root/clang/test/CodeGen/builtins-nvptx-mma.py
Commit message | Author | Age | Files | Lines
[CUDA] Implemented _[bi]mma* builtins. | Artem Belevich | 2019-04-25 | 1 | -0/+343
These builtins provide access to the new integer and sub-integer variants of the MMA (matrix multiply-accumulate) instructions provided by CUDA 10.x on sm_75 (AKA Turing) GPUs.

Also added a feature for PTX 6.4. While Clang/LLVM does not generate any PTX instructions that need it, we still need to pass it through to ptxas in order to compile code that uses the new 'mma' instruction as inline assembly (e.g. as used by NVIDIA's CUTLASS library: https://github.com/NVIDIA/cutlass/blob/master/cutlass/arch/mma.h#L101).

Differential Revision: https://reviews.llvm.org/D60279

llvm-svn: 359248
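
For readers unfamiliar with the operation, the following is a minimal reference sketch of what a matrix multiply-accumulate computes, D = A x B + C. It is not the Clang builtin interface and does not model fragment layout, tile shapes, or saturation. The function names `imma_reference` and `bmma_xor_popc_reference` are hypothetical, chosen only for this illustration. The single-bit variant is assumed to combine each row/column pair with XOR followed by a population count, as in the PTX `xor.popc` form.

```python
def imma_reference(a, b, c):
    """Integer MMA reference: return D = A x B + C.

    a is an m x k matrix, b is k x n, c is m x n, all given as lists of rows.
    Hypothetical helper for illustration; not part of Clang or CUDA.
    """
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "A must be m x k"
    assert len(c) == m and all(len(row) == n for row in c), "C must be m x n"
    return [
        [c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


def bmma_xor_popc_reference(a_rows, b_cols, c):
    """Single-bit MMA reference (assumed XOR + popcount semantics).

    a_rows[i] holds row i of A packed into a non-negative int, b_cols[j]
    holds column j of B packed the same way, and c is an m x n accumulator.
    Hypothetical helper for illustration; not part of Clang or CUDA.
    """
    return [
        [c[i][j] + bin(a_rows[i] ^ b_cols[j]).count("1")
         for j in range(len(b_cols))]
        for i in range(len(a_rows))
    ]


if __name__ == "__main__":
    print(imma_reference([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[1, 1], [1, 1]]))
    print(bmma_xor_popc_reference([0b1010], [0b0110], [[0]]))
```

On real hardware the inputs are narrow (8-bit, 4-bit, or 1-bit) and the accumulators are 32-bit integers; Python integers are unbounded, so this sketch ignores overflow.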