summaryrefslogtreecommitdiffstats
path: root/clang/test/CodeGenCUDA
Commit message (Collapse)AuthorAgeFilesLines
* [HIP] Add option --gpu-max-threads-per-block=nYaxun (Sam) Liu2020-01-071-6/+16
| | | | | | Add this option to change the default launch bounds. Differential Revision: https://reviews.llvm.org/D71221
* clang: Add -fconvergent-functions flagMatt Arsenault2019-11-191-2/+3
| | | | | | | | The CUDA builtin library is apparently compiled in C++ mode, so the assumption of convergent needs to be made in a typically non-SPMD language. The functions in the library should still be assumed convergent. Currently they are not, which is potentially incorrect and this happens to work after the library is linked.
* [HIP] Fix visibility for 'extern' device variables.Michael Liao2019-11-051-0/+10
| | | | | | | | | | | | | | Summary: - Fix a bug which misses the change for a variable to be set with target-specific attributes. Reviewers: yaxunl Subscribers: jvesely, nhaehnle, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D63020
* [hip] Enable pointer argument lowering through coercing type.Michael Liao2019-11-051-0/+69
| | | | | | | | | | Reviewers: tra, rjmccall, yaxunl Subscribers: jvesely, nhaehnle, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D69826
* [CUDA][HIP] Disable emitting llvm.linker.options in device compilationYaxun (Sam) Liu2019-11-041-0/+19
| | | | | | | The linker options (e.g. pragma detect_mismatch) are intended for host compilation only, therefore disable it for device compilation. Differential Revision: https://reviews.llvm.org/D57829
* [HIP] Add option -fgpu-allow-device-initYaxun (Sam) Liu2019-10-221-0/+19
| | | | | | | | | | | | | | Add this option to allow device side class type global variables with non-trivial ctor/dtor. device side init/fini functions will be emitted, which will be executed by HIP runtime when the fat binary is loaded/unloaded. This feature is to facilitate implementation of device side sanitizer which requires global vars with non-trival ctors. By default this option is disabled. Differential Revision: https://reviews.llvm.org/D69268
* [hip][cuda] Fix the extended lambda name mangling issue.Michael Liao2019-10-191-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: - HIP/CUDA host side needs to use device kernel symbol name to match the device side binaries. Without a consistent naming between host- and device-side compilations, it's risky that wrong device binaries are executed. Consistent naming is usually not an issue until unnamed types are used, especially the lambda. In this patch, the consistent name mangling is addressed for the extended lambdas, i.e. the lambdas annotated with `__device__`. - In [Itanium C++ ABI][1], the mangling of the lambda is generally unspecified unless, in certain cases, ODR rule is required to ensure consisent naming cross TUs. The extended lambda is such a case as its name may be part of a device kernel function, e.g., the extended lambda is used as a template argument and etc. Thus, we need to force ODR for extended lambdas as they are referenced in both device- and host-side TUs. Furthermore, if a extended lambda is nested in other (extended or not) lambdas, those lambdas are required to follow ODR naming as well. This patch revises the current lambda mangle numbering to force ODR from an extended lambda to all its parent lambdas. - On the other side, the aforementioned ODR naming should not change those lambdas' original linkages, i.e., we cannot replace the original `internal` with `linkonce_odr`; otherwise, we may violate ODR in general. This patch introduces a new field `HasKnownInternalLinkage` in lambda data to decouple the current linkage calculation based on mangling number assigned. [1]: https://itanium-cxx-abi.github.io/cxx-abi/abi.html Reviewers: tra, rsmith, yaxunl, martong, shafik Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D68818 llvm-svn: 375309
* [CUDA][HIP] Fix host/device check with -fopenmpYaxun Liu2019-10-091-0/+20
| | | | | | | | | | | | | | | | CUDA/HIP program may be compiled with -fopenmp. In this case, -fopenmp is only passed to host compilation to take advantages of multi-threads computation. CUDA/HIP and OpenMP both use Sema::DeviceCallGraph to store functions to be analyzed and remove them once they decide the function is sure to be emitted. CUDA/HIP and OpenMP have different functions to determine if a function is sure to be emitted. To check host/device correctly for CUDA/HIP when -fopenmp is enabled, there needs a unified logic to determine whether a function is to be emitted. The logic needs to be aware of both CUDA and OpenMP logic. Differential Revision: https://reviews.llvm.org/D67837 llvm-svn: 374263
* [HIP] Support new kernel launching APIYaxun Liu2019-09-242-8/+21
| | | | | | Differential Revision: https://reviews.llvm.org/D67947 llvm-svn: 372773
* [AMDGPU] Set default flat work group size to (1,256) for HIPYaxun Liu2019-09-031-2/+3
| | | | | | Differential Revision: https://reviews.llvm.org/D67048 llvm-svn: 370808
* IR: print value numbers for unnamed function argumentsTim Northover2019-08-031-11/+11
| | | | | | | | | | For consistency with normal instructions and clarity when reading IR, it's best to print the %0, %1, ... names of function arguments in definitions. Also modifies the parser to accept IR in that form for obvious reasons. llvm-svn: 367755
* [clang] Preserve names of addrspacecast'ed values.Vyacheslav Zakharin2019-07-101-3/+3
| | | | | | Differential Revision: https://reviews.llvm.org/D63846 llvm-svn: 365666
* [AMDGPU] Increased the number of implicit argument bytes for both OpenCL and ↵Christudasan Devadasan2019-07-101-1/+1
| | | | | | | | | | | | | HIP (CLANG). To enable a new implicit kernel argument, increased the number of argument bytes from 48 to 56. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D63756 llvm-svn: 365643
* [HIP] Support attribute hip_pinned_shadowYaxun Liu2019-06-261-0/+23
| | | | | | | | | | | | | | | This patch introduces support of hip_pinned_shadow variable for HIP. A hip_pinned_shadow variable is a global variable with attribute hip_pinned_shadow. It has external linkage on device side and has no initializer. It has internal linkage on host side and has initializer or static constructor. It can be accessed in both device code and host code. This allows HIP runtime to implement support of HIP texture reference. Differential Revision: https://reviews.llvm.org/D62738 llvm-svn: 364381
* [AMDGPU] Enable the implicit arguments for HIP (CLANG)Yaxun Liu2019-06-141-0/+8
| | | | | | | | Enable 48-bytes of implicit arguments for HIP as well. Earlier it was enabled for OpenCL. This code is specific to AMDGPU target. Differential Revision: https://reviews.llvm.org/D62244 llvm-svn: 363414
* LLVM IR: Generate new-style byval-with-Type from ClangTim Northover2019-06-052-5/+5
| | | | | | | | | | | LLVM IR recently added a Type parameter to the byval Attribute, so that when pointers become opaque and no longer have an element type the information will still be present in IR. For now the Type parameter is optional (which is why Clang didn't need this change at the time), but it will become mandatory soon. llvm-svn: 362652
* [CUDA][HIP] Skip setting `externally_initialized` for static device variables.Michael Liao2019-05-291-0/+10
| | | | | | | | | | | | | | | | | Summary: - By declaring device variables as `static`, we assume they won't be addressable from the host side. Thus, no `externally_initialized` is required. Reviewers: yaxunl Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D62603 llvm-svn: 361994
* Fix failure of lit test dependent-libs.cuYaxun Liu2019-05-291-2/+4
| | | | llvm-svn: 361905
* [CUDA][HIP] Emit dependent libs for host onlyYaxun Liu2019-05-281-0/+6
| | | | | | | | | | | | | Recently D60274 was introduced to allow lld to handle dependent libs. However current usage of dependent libs (e.g. pragma comment(lib, *) in windows header files) are intended for host only. Emitting the metadata in device IR causes link error in device path. Until there is a way to different it dependent libs for device or host, metadata for dependent libs should be emitted for host only. This patch enforces that. Differential Revision: https://reviews.llvm.org/D62483 llvm-svn: 361880
* [HIP] Fix visibility of `__constant__` variables.Michael Liao2019-04-261-0/+21
| | | | | | | | | | | | | | | | Summary: - `__constant__` variables should not be `hidden` as the linker may turn them into `LOCAL` symbols. Reviewers: yaxunl Subscribers: jvesely, nhaehnle, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D61194 llvm-svn: 359344
* [HIP-Clang] Fat binary should not be produced for non GPU code 2Aaron Enye Shi2019-04-021-11/+5
| | | | | | | | | | Also for CUDA, we need to disable producing these fat binary functions when there is no GPU code. Reviewers: yaxunl, tra Differential Revision: https://reviews.llvm.org/D60141 llvm-svn: 357526
* [HIP-Clang] Fat binary should not be produced for non GPU codeAaron Enye Shi2019-04-021-5/+11
| | | | | | | | | | Skip producing the fat binary functions for HIP when no device code is present. Reviewers: yaxunl Differential Review: https://reviews.llvm.org/D60141 llvm-svn: 357520
* [CUDA][HIP][DebugInfo] Skip reference device functionMichael Liao2019-03-061-0/+10
| | | | | | | | | | | | | | | | Summary: - A device functions could be used as a non-type template parameter in a global/host function template. However, we should not try to retrieve that device function and reference it in the host-side debug info as it's only valid at device side. Subscribers: aprantl, jdoerfert, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D58992 llvm-svn: 355551
* [HIP] change kernel stub nameYaxun Liu2019-02-271-0/+20
| | | | | | | | | | Add .stub to kernel stub function name so that it is different from kernel name in device code. This is necessary to let debugger find correct symbol for kernel. Differential Revision: https://reviews.llvm.org/D58518 llvm-svn: 354948
* revert r354615: [HIP] change kernel stub nameYaxun Liu2019-02-221-2/+1
| | | | | | | | It caused regressions. Differential Revision: https://reviews.llvm.org/D58518 llvm-svn: 354651
* [HIP] change kernel stub nameYaxun Liu2019-02-211-1/+2
| | | | | | | | | | Add .stub to kernel stub function name so that it is different from kernel name in device code. This is necessary to let debugger find correct symbol for kernel Differential Revision: https://reviews.llvm.org/D58518 llvm-svn: 354615
* [CUDA][HIP] Use device side kernel and variable names when registering themYaxun Liu2019-02-141-30/+51
| | | | | | | | | | | | | | | | | | __hipRegisterFunction and __hipRegisterVar need to accept device side kernel and variable names so that HIP runtime can associate kernel stub functions in host code with kernel symbols in fat binaries, and associate shadow variables in host code with device variables in fat binaries. Currently, clang assumes kernel functions and device variables have the same name as the kernel stub functions and shadow variables. However, when host is compiled in windows with MSVC C++ ABI and device is compiled with Itanium C++ ABI (e.g. AMDGPU), kernels and device symbols in fat binary are mangled differently than host. This patch gets the device side kernel and variable name by mangling them in the mangle context of aux target. Differential Revision: https://reviews.llvm.org/D58163 llvm-svn: 354004
* [DEBUG_INFO][NVPTX] Generate correct data about variable address class.Alexey Bataev2019-02-051-0/+25
| | | | | | | | | | | | | | | | | | | | | Summary: Added ability to generate correct debug info data about the variable address class. Currently, for all the locals and globals the default values are used, ADDR_local_space(6) for locals and ADDR_global_space(5) for globals. The values are taken from the table in https://docs.nvidia.com/cuda/archive/10.0/ptx-writers-guide-to-interoperability/index.html#cuda-specific-dwarf. We need to emit correct data for address classes of, at least, shared and constant globals. Currently, all these variables are treated by the cuda-gdb debugger as the variables in the global address space and, thus, it require manual data type casting. Reviewers: echristo, probinson Subscribers: jholewinski, aprantl, cfe-commits Differential Revision: https://reviews.llvm.org/D57162 llvm-svn: 353204
* Do not copy long double and 128-bit fp format from aux target for AMDGPUYaxun Liu2019-01-311-0/+10
| | | | | | | | | | | | | | | rC352620 caused regressions because it copied floating point format from aux target. floating point format decides whether extended long double is supported. It is x86_fp80 on x86 but IEEE double on amdgcn. Document usage of long doubel type in HIP programming guide https://github.com/ROCm-Developer-Tools/HIP/pull/890 Differential Revision: https://reviews.llvm.org/D57527 llvm-svn: 352801
* [CUDA] add support for the new kernel launch API in CUDA-9.2+.Artem Belevich2019-01-314-26/+85
| | | | | | | | | | | | | Instead of calling CUDA runtime to arrange function arguments, the new API constructs arguments in a local array and the kernels are launched with __cudaLaunchKernel(). The old API has been deprecated and is expected to go away in the next CUDA release. Differential Revision: https://reviews.llvm.org/D57488 llvm-svn: 352799
* [CUDA] Treat extern global variable shadows same as regular extern vars.Artem Belevich2018-12-221-6/+13
| | | | | | | | | | | This fixes compiler crash when we attempted to compile this code: extern __device__ int data; __device__ int data = 1; Differential Revision: https://reviews.llvm.org/D56033 llvm-svn: 349981
* [CUDA] Make all host-side shadows of device-side variables undef.Artem Belevich2018-12-131-83/+120
| | | | | | | | | | The host-side code can't (and should not) access the values that may only exist on the device side. E.g. address of a __device__ function does not exist on the host side as we don't generate the code for it there. Differential Revision: https://reviews.llvm.org/D55663 llvm-svn: 349087
* Revert "[CodeGenCXX] Treat 'this' as noalias in constructors"Sean Fertile2018-10-151-18/+18
| | | | | | | This reverts commit https://reviews.llvm.org/rL344150 which causes MachineOutliner related failures on the ppc64le multistage buildbot. llvm-svn: 344526
* [CodeGenCXX] Treat 'this' as noalias in constructorsAnton Bikineev2018-10-101-18/+18
| | | | | | | | | This is currently a clang extension and a resolution of the defect report in the C++ Standard. Differential Revision: https://reviews.llvm.org/D46441 llvm-svn: 344150
* [HIP] Support early finalization of device code for -fno-gpu-rdcYaxun Liu2018-10-021-9/+11
| | | | | | | | | | | | | | | | | | | | | | | | This patch renames -f{no-}cuda-rdc to -f{no-}gpu-rdc and keeps the original options as aliases. When -fgpu-rdc is off, clang will assume the device code in each translation unit does not call external functions except those in the device library, therefore it is possible to compile the device code in each translation unit to self-contained kernels and embed them in the host object, so that the host object behaves like usual host object which can be linked by lld. The benefits of this feature is: 1. allow users to create static libraries which can be linked by host linker; 2. amortized device code linking time. This patch modifies HIP action builder to insert actions for linking device code and generating HIP fatbin, and pass HIP fatbin to host backend action. It extracts code for constructing command for generating HIP fatbin as a function so that it can be reused by early finalization. It also modifies codegen of HIP host constructor functions to embed the device fatbin when it is available. Differential Revision: https://reviews.llvm.org/D52377 llvm-svn: 343611
* [CUDA] Ignore uncallable functions when we check for usual deallocators.Artem Belevich2018-09-211-0/+133
| | | | | | | | | Previously clang considered function variants from both sides of compilation and that resulted in picking up wrong deallocation function. Differential Revision: https://reviews.llvm.org/D51808 llvm-svn: 342749
* Rename -mlink-cuda-bitcode to -mlink-builtin-bitcodeMatt Arsenault2018-08-202-6/+12
| | | | | | | The same semantics work for OpenCL, and probably any offload language. Keep the old name around as an alias. llvm-svn: 340193
* [HIP] Make __hip_gpubin_handle hidden to avoid being merged across different ↵Yaxun Liu2018-08-171-1/+1
| | | | | | | | | | | | | | | | shared libraries Different shared libraries contain different fat binary, which is stored in a global variable __hip_gpubin_handle. Since different compilation units share the same fat binary, this variable has linkonce linkage. However, it should not be merged across different shared libraries. This patch set the visibility of the global variable to be hidden, which will make it invisible in the shared library, therefore preventing it from being merged. Differential Revision: https://reviews.llvm.org/D50596 llvm-svn: 340056
* Try to make builtin address space declarations not uselessMatt Arsenault2018-08-021-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The way address space declarations for builtins currently work is nearly useless. The code assumes the address spaces used for builtins is a confusingly named "target address space" from user code using __attribute__((address_space(N))) that matches the builtin declaration. There's no way to use this to declare a builtin that returns a language specific address space. The terminology used is highly cofusing since it has nothing to do with the the address space selected by the target to use for a language address space. This feature is essentially unused as-is. AMDGPU and NVPTX are the only in-tree targets attempting to use this. The AMDGPU builtins certainly do not behave as intended (i.e. all of the builtins returning pointers can never compile because the numbered address space never matches the expected named address space). The NVPTX builtins are missing tests for some, and the others seem to rely on an implicit addrspacecast. Change the used address space for builtins based on a target hook to allow using a language address space for a builtin. This allows the same builtin declaration to be used for multiple languages with similarly purposed address spaces (e.g. the same AMDGPU builtin can be used in OpenCL and CUDA even though the constant address spaces are arbitarily different). This breaks the possibility of using arbitrary numbered address spaces alongside the named address spaces for builtins. If this is an issue we probably need to introduce another builtin declaration character to distinguish language address spaces from so-called "target address spaces". llvm-svn: 338707
* [CUDA][HIP] Allow function-scope static const variableYaxun Liu2018-07-281-0/+6
| | | | | | | | | | | | | | | | | | | | | | CUDA 8.0 E.3.9.4 says: Within the body of a __device__ or __global__ function, only __shared__ variables or variables without any device memory qualifiers may be declared with static storage class. It is unclear how a function-scope non-const static variable without device memory qualifier is implemented, therefore only static const variable without device memory qualifier is allowed, which can be emitted as a global variable in constant address space. Currently clang only allows function-scope static variable with __shared__ qualifier. This patch also allows function-scope static const variable without device memory qualifier and emits it as a global variable in constant address space. Differential Revision: https://reviews.llvm.org/D49931 llvm-svn: 338188
* [HIP] Support -fcuda-flush-denormals-to-zero for amdgcnYaxun Liu2018-07-211-0/+15
| | | | | | Differential Revision: https://reviews.llvm.org/D48287 llvm-svn: 337639
* [HIP] Register/unregister device fat binary only onceYaxun Liu2018-07-201-4/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | HIP generates one fat binary for all devices after linking. However, for each compilation unit a ctor function is emitted which register the same fat binary. Measures need to be taken to make sure the fat binary is only registered once. Currently each ctor function calls __hipRegisterFatBinary and stores the returned value to __hip_gpubin_handle. This patch changes the linkage of __hip_gpubin_handle to be linkonce so that they are shared between LLVM modules. Then this patch adds check of value of __hip_gpubin_handle to make sure __hipRegisterFatBinary is only called once. The code is equivalent to void *_gpubin_handle; void ctor() { if (__hip_gpubin_handle == 0) { __hip_gpubin_handle = __hipRegisterFatBinary(...); } // register kernels and variables. } The patch also does similar change to dtors so that __hipUnregisterFatBinary is called once. Differential Revision: https://reviews.llvm.org/D49083 llvm-svn: 337631
* [FileCheck] Add -allow-deprecated-dag-overlap to failing clang testsJoel E. Denny2018-07-112-10/+10
| | | | | | | | | | See https://reviews.llvm.org/D47106 for details. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D47172 llvm-svn: 336844
* [CUDA] Use atexit() to call module destructor.Artem Belevich2018-06-271-2/+2
| | | | | | | | | | This matches the way NVCC does it. Doing module cleanup at global destructor phase used to work, but is, apparently, too late for the CUDA runtime in CUDA-9.2, which ends up crashing with double-free. Differential Revision: https://reviews.llvm.org/D48613 llvm-svn: 335763
* [CUDA][HIP] Allow CUDA __global__ functions to have amdgpu kernel attributesYaxun Liu2018-06-121-0/+37
| | | | | | | | | | | | There are HIP applications e.g. Tensorflow 1.3 using amdgpu kernel attributes, however currently they are only allowed on OpenCL kernel functions. This patch will allow amdgpu kernel attributes to be applied to CUDA/HIP __global__ functions. Differential Revision: https://reviews.llvm.org/D47958 llvm-svn: 334561
* [CUDA][HIP] Set kernel calling convention before arrange functionYaxun Liu2018-06-121-0/+39
| | | | | | | | | | | | | Currently clang set kernel calling convention for CUDA/HIP after arranging function, which causes incorrect kernel function type since it depends on calling convention. This patch moves setting kernel convention before arranging function. Differential Revision: https://reviews.llvm.org/D47733 llvm-svn: 334457
* [CUDA] Fix emission of constant strings in sectionsJonas Hahnfeld2018-06-081-3/+3
| | | | | | | | | | | | | | | | | | | | | | | CGM.GetAddrOfConstantCString() sets the adress of the created GlobalValue to unnamed. When emitting the object file LLVM will mark the surrounding section as SHF_MERGE iff the string is nul-terminated and contains no other nuls (see IsNullTerminatedString). This results in problems when saving temporaries because LLVM doesn't set an EntrySize, so reading in the serialized assembly file fails. This never happened for the GPU binaries because they usually contain a nul-character somewhere. Instead this only affected the module ID when compiling relocatable device code. However, this points to a potentially larger problem: If we put a constant string into a named section, we really want the data to end up in that section in the object file. To avoid LLVM merging sections this patch unmarks the GlobalVariable's address as unnamed which also fixes the problem of invalid serialized assembly files when saving temporaries. Differential Revision: https://reviews.llvm.org/D47902 llvm-svn: 334281
* [CUDA][HIP] Do not emit type info when compiling for deviceYaxun Liu2018-06-051-2/+9
| | | | | | | | | | | | | CUDA/HIP does not support RTTI on device side, therefore there is no point of emitting type info when compiling for device. Emitting type info for device not only clutters the IR with useless global variables, but also causes undefined symbol at linking since vtable for cxxabiv1::class_type_info has external linkage. Differential Revision: https://reviews.llvm.org/D47694 llvm-svn: 334021
* [HIP] Support offloading by linker scriptYaxun Liu2018-05-181-17/+23
| | | | | | | | | | | | To support linking device code in different source files, it is necessary to embed fat binary at host linking stage. This patch emits an external symbol for fat binary in host codegen, then embed the fat binary by lld through a linker script. Differential Revision: https://reviews.llvm.org/D46472 llvm-svn: 332724
* Fix failure in lit test kernel-call.cu due to name manglingYaxun Liu2018-04-251-2/+2
| | | | llvm-svn: 330821
OpenPOWER on IntegriCloud