summaryrefslogtreecommitdiffstats
path: root/clang/lib/CodeGen/CGCUDANV.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [Alignment][Clang][NFC] Add CharUnits::getAsAlignGuillaume Chatelet2019-10-031-3/+3
| | | | | | | | | | | | | | | | | | Summary: This is a prerequisite to removing `llvm::GlobalObject::setAlignment(unsigned)`. This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jholewinski, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D68274 llvm-svn: 373592
* [HIP] Support new kernel launching APIYaxun Liu2019-09-241-6/+11
| | | | | | Differential Revision: https://reviews.llvm.org/D67947 llvm-svn: 372773
* [clang][CodeGen] Remove std::move on temporaryKadir Cetinkaya2019-06-171-1/+1
| | | | llvm-svn: 363563
* [HIP] Add the interface deriving the stub name of device kernels.Michael Liao2019-06-171-4/+22
| | | | | | | | | | | | | | | | Summary: - Revise the interface to derive the stub name and simplify the assertion of it. Reviewers: yaxunl, tra Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D63335 llvm-svn: 363553
* [HIP-Clang] Fat binary should not be produced for non GPU codeAaron Enye Shi2019-04-021-2/+2
| | | | | | | | | | clang-format the changes to CUDA and HIP fat binary. Reviewers: yaxunl, tra Differential Revision: https://reviews.llvm.org/D60141 llvm-svn: 357532
* [HIP-Clang] Fat binary should not be produced for non GPU code 2Aaron Enye Shi2019-04-021-1/+3
| | | | | | | | | | Also for CUDA, we need to disable producing these fat binary functions when there is no GPU code. Reviewers: yaxunl, tra Differential Revision: https://reviews.llvm.org/D60141 llvm-svn: 357526
* [HIP-Clang] Fat binary should not be produced for non GPU codeAaron Enye Shi2019-04-021-0/+2
| | | | | | | | | | Skip producing the fat binary functions for HIP when no device code is present. Reviewers: yaxunl Differential Review: https://reviews.llvm.org/D60141 llvm-svn: 357520
* [HIP] change kernel stub nameYaxun Liu2019-02-271-0/+1
| | | | | | | | | | Add .stub to kernel stub function name so that it is different from kernel name in device code. This is necessary to let debugger find correct symbol for kernel. Differential Revision: https://reviews.llvm.org/D58518 llvm-svn: 354948
* revert r354615: [HIP] change kernel stub nameYaxun Liu2019-02-221-6/+0
| | | | | | | | It caused regressions. Differential Revision: https://reviews.llvm.org/D58518 llvm-svn: 354651
* [HIP] change kernel stub nameYaxun Liu2019-02-211-0/+6
| | | | | | | | | | Add .stub to kernel stub function name so that it is different from kernel name in device code. This is necessary to let debugger find correct symbol for kernel Differential Revision: https://reviews.llvm.org/D58518 llvm-svn: 354615
* [CUDA][HIP] Use device side kernel and variable names when registering themYaxun Liu2019-02-141-15/+53
| | | | | | | | | | | | | | | | | | __hipRegisterFunction and __hipRegisterVar need to accept device side kernel and variable names so that HIP runtime can associate kernel stub functions in host code with kernel symbols in fat binaries, and associate shadow variables in host code with device variables in fat binaries. Currently, clang assumes kernel functions and device variables have the same name as the kernel stub functions and shadow variables. However, when host is compiled in windows with MSVC C++ ABI and device is compiled with Itanium C++ ABI (e.g. AMDGPU), kernels and device symbols in fat binary are mangled differently than host. This patch gets the device side kernel and variable name by mangling them in the mangle context of aux target. Differential Revision: https://reviews.llvm.org/D58163 llvm-svn: 354004
* Basic CUDA-10 support.Artem Belevich2019-02-051-0/+10
| | | | | | Differential Revision: https://reviews.llvm.org/D57771 llvm-svn: 353232
* [opaque pointer types] Pass function types for runtime function calls.James Y Knight2019-02-051-14/+14
| | | | | | | | | | | | | Emit{Nounwind,}RuntimeCall{,OrInvoke} have been modified to take a FunctionCallee as an argument, and CreateRuntimeFunction has been modified to return a FunctionCallee. All callers have been updated. Additionally, CreateBuiltinFunction is removed, as it was redundant with CreateRuntimeFunction after some previous changes. Differential Revision: https://reviews.llvm.org/D57668 llvm-svn: 353184
* Remove redundant FunctionDecl argument from a couple functions.James Y Knight2019-02-021-1/+1
| | | | | | | | | | | | | | | | | | This argument was added in r254554 in order to support the pass_object_size attribute. However, in r296076, the attribute's presence is now also represented in FunctionProtoType's ExtParameterInfo, and thus it's unnecessary to pass along a separate FunctionDecl. The functions modified are: RequiredArgs::forPrototype{,Plus}, and CodeGenTypes::ConvertFunctionType. After this, it's also (again) unnecessary to have a separate ConvertFunctionType function ConvertType, so convert callers back to the latter, leaving the former as an internal helper function. llvm-svn: 352946
* [CUDA] add support for the new kernel launch API in CUDA-9.2+.Artem Belevich2019-01-311-4/+106
| | | | | | | | | | | | | Instead of calling CUDA runtime to arrange function arguments, the new API constructs arguments in a local array and the kernels are launched with __cudaLaunchKernel(). The old API has been deprecated and is expected to go away in the next CUDA release. Differential Revision: https://reviews.llvm.org/D57488 llvm-svn: 352799
* Cleanup: replace uses of CallSite with CallBase.James Y Knight2019-01-301-4/+3
| | | | llvm-svn: 352595
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* [CUDA] Use all 64 bits of GUID in __nv_module_idArtem Belevich2018-10-051-1/+1
| | | | | | | | | getGUID() returns an uint64_t and "%x" only prints 32 bits of it. Use PRIx64 format string to print all 64 bits. Differential Revision: https://reviews.llvm.org/D52938 llvm-svn: 343875
* [HIP] Support early finalization of device code for -fno-gpu-rdcYaxun Liu2018-10-021-10/+21
| | | | | | | | | | | | | | | | | | | | | | | | This patch renames -f{no-}cuda-rdc to -f{no-}gpu-rdc and keeps the original options as aliases. When -fgpu-rdc is off, clang will assume the device code in each translation unit does not call external functions except those in the device library, therefore it is possible to compile the device code in each translation unit to self-contained kernels and embed them in the host object, so that the host object behaves like usual host object which can be linked by lld. The benefits of this feature is: 1. allow users to create static libraries which can be linked by host linker; 2. amortized device code linking time. This patch modifies HIP action builder to insert actions for linking device code and generating HIP fatbin, and pass HIP fatbin to host backend action. It extracts code for constructing command for generating HIP fatbin as a function so that it can be reused by early finalization. It also modifies codegen of HIP host constructor functions to embed the device fatbin when it is available. Differential Revision: https://reviews.llvm.org/D52377 llvm-svn: 343611
* [HIP] Make __hip_gpubin_handle hidden to avoid being merged across different ↵Yaxun Liu2018-08-171-0/+2
| | | | | | | | | | | | | | | | shared libraries Different shared libraries contain different fat binary, which is stored in a global variable __hip_gpubin_handle. Since different compilation units share the same fat binary, this variable has linkonce linkage. However, it should not be merged across different shared libraries. This patch set the visibility of the global variable to be hidden, which will make it invisible in the shared library, therefore preventing it from being merged. Differential Revision: https://reviews.llvm.org/D50596 llvm-svn: 340056
* [HIP] Register/unregister device fat binary only onceYaxun Liu2018-07-201-17/+105
| | | | | | | | | | | | | | | | | | | | | | | | | | HIP generates one fat binary for all devices after linking. However, for each compilation unit a ctor function is emitted which register the same fat binary. Measures need to be taken to make sure the fat binary is only registered once. Currently each ctor function calls __hipRegisterFatBinary and stores the returned value to __hip_gpubin_handle. This patch changes the linkage of __hip_gpubin_handle to be linkonce so that they are shared between LLVM modules. Then this patch adds check of value of __hip_gpubin_handle to make sure __hipRegisterFatBinary is only called once. The code is equivalent to void *_gpubin_handle; void ctor() { if (__hip_gpubin_handle == 0) { __hip_gpubin_handle = __hipRegisterFatBinary(...); } // register kernels and variables. } The patch also does similar change to dtors so that __hipUnregisterFatBinary is called once. Differential Revision: https://reviews.llvm.org/D49083 llvm-svn: 337631
* [CUDA] Place all CUDA sections in __NV_CUDA segment on Mac.Artem Belevich2018-06-281-4/+6
| | | | | | | | That's where CUDA binaries appear to put them. Differential Revision: https://reviews.llvm.org/D48615 llvm-svn: 335880
* [CUDA] Use atexit() to call module destructor.Artem Belevich2018-06-271-0/+13
| | | | | | | | | | This matches the way NVCC does it. Doing module cleanup at global destructor phase used to work, but is, apparently, too late for the CUDA runtime in CUDA-9.2, which ends up crashing with double-free. Differential Revision: https://reviews.llvm.org/D48613 llvm-svn: 335763
* [CUDA] Fix emission of constant strings in sectionsJonas Hahnfeld2018-06-081-1/+5
| | | | | | | | | | | | | | | | | | | | | | | CGM.GetAddrOfConstantCString() sets the adress of the created GlobalValue to unnamed. When emitting the object file LLVM will mark the surrounding section as SHF_MERGE iff the string is nul-terminated and contains no other nuls (see IsNullTerminatedString). This results in problems when saving temporaries because LLVM doesn't set an EntrySize, so reading in the serialized assembly file fails. This never happened for the GPU binaries because they usually contain a nul-character somewhere. Instead this only affected the module ID when compiling relocatable device code. However, this points to a potentially larger problem: If we put a constant string into a named section, we really want the data to end up in that section in the object file. To avoid LLVM merging sections this patch unmarks the GlobalVariable's address as unnamed which also fixes the problem of invalid serialized assembly files when saving temporaries. Differential Revision: https://reviews.llvm.org/D47902 llvm-svn: 334281
* [HIP] Support offloading by linker scriptYaxun Liu2018-05-181-38/+77
| | | | | | | | | | | | To support linking device code in different source files, it is necessary to embed fat binary at host linking stage. This patch emits an external symbol for fat binary in host codegen, then embed the fat binary by lld through a linker script. Differential Revision: https://reviews.llvm.org/D46472 llvm-svn: 332724
* [HIP] Add hip input kind and codegen for kernel launchingYaxun Liu2018-04-251-16/+42
| | | | | | | | | | | | | | | | | | | | | | | HIP is a language similar to CUDA (https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md ). The language syntax is very similar, which allows a hip program to be compiled as a CUDA program by Clang. The main difference is the host API. HIP has a set of vendor neutral host API which can be implemented on different platforms. Currently there is open source implementation of HIP runtime on amdgpu target (https://github.com/ROCm-Developer-Tools/HIP). This patch adds support of input kind and language standard hip. When hip file is compiled, both LangOpts.CUDA and LangOpts.HIP is turned on. This allows compilation of hip program as CUDA in most cases and only special handling of hip program is needed LangOpts.HIP is checked. This patch also adds support of kernel launching of HIP program using HIP host API. When -x hip is not specified, there is no behaviour change for CUDA. Patch by Greg Rodgers. Revised and lit test added by Yaxun Liu. Differential Revision: https://reviews.llvm.org/D44984 llvm-svn: 330790
* [CUDA] Register relocatable GPU binariesJonas Hahnfeld2018-04-201-19/+102
| | | | | | | | | | nvcc generates a unique registration function for each object file that contains relocatable device code. Unique names are achieved with a module id that is also reflected in the function's name. Differential Revision: https://reviews.llvm.org/D42922 llvm-svn: 330425
* [CUDA] Include single GPU binary, NFCI.Jonas Hahnfeld2018-02-281-75/+60
| | | | | | | | | Binaries for multiple architectures are combined by fatbinary, so the current code was effectively not needed. Differential Revision: https://reviews.llvm.org/D43461 llvm-svn: 326342
* Suppress all uses of LLVM_END_WITH_NULL. NFC.Serge Guelton2017-05-091-1/+1
| | | | | | | | | | Use variadic templates instead of relying on <cstdarg> + sentinel. This enforces better type checking and makes code more readable. Differential revision: https://reviews.llvm.org/D32550 llvm-svn: 302572
* Promote ConstantInitBuilder to be a public CodeGen API; it'sJohn McCall2017-03-021-1/+1
| | | | | | a generally useful utility for other frontends. NFC. llvm-svn: 296806
* ConstantBuilder -> ConstantInitBuilder for clarity, andJohn McCall2016-11-281-1/+1
| | | | | | | move the member classes up to top level to allow forward declarations to name them. NFC. llvm-svn: 288079
* Introduce a helper class for building complex constant initializers. NFC.John McCall2016-11-191-14/+21
| | | | | | | I've adopted this in most of the places it makes sense, but v-tables and CGObjCMac will need a second pass. llvm-svn: 287437
* [CUDA] Use the right section and constant names for fatbins when compiling ↵Justin Lebar2016-11-181-3/+8
| | | | | | | | | | | | for macos. Reviewers: tra Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D26777 llvm-svn: 287287
* [CUDA] Place GPU binary into .nv_fatbin section and align it by 8.Artem Belevich2016-08-121-1/+10
| | | | | | | | | This matches the way nvcc encapsulates GPU binaries into host object file. Now cuobjdump can deal with clang-compiled object files. Differential Revision: https://reviews.llvm.org/D23429 llvm-svn: 278549
* [CUDA] Align kernel launch args correctly when the LLVM type's alignment is ↵Justin Lebar2016-07-271-25/+16
| | | | | | | | | | | | | | | | | | | | | | | different from the clang type's alignment. Summary: Before this patch, we computed the offsets in memory of args passed to GPU kernel functions by throwing all of the args into an LLVM struct. clang emits packed llvm structs basically whenever it feels like it, and packed structs have alignment 1. So we cannot rely on the llvm type's alignment matching the C++ type's alignment. This patch fixes our codegen so we always respect the clang types' alignments. Reviewers: rnk Subscribers: cfe-commits, tra Differential Revision: https://reviews.llvm.org/D22879 llvm-svn: 276927
* [CUDA] Move argument type lists to the stack. NFC.Benjamin Kramer2016-07-021-4/+4
| | | | llvm-svn: 274433
* Use arrays or initializer lists to feed ArrayRefs instead of SmallVector ↵Benjamin Kramer2016-07-021-4/+1
| | | | | | | | where possible. No functionality change intended llvm-svn: 274432
* [CUDA] Do not generate unnecessary runtime init code.Artem Belevich2016-03-021-1/+14
| | | | | | Differential Revision: http://reviews.llvm.org/D17780 llvm-svn: 262499
* [CUDA] Emit host-side 'shadows' for device-side global variablesArtem Belevich2016-03-021-15/+51
| | | | | | | | | | | | | ... and register them with CUDA runtime. This is needed for commonly used cudaMemcpy*() APIs that use address of host-side shadow to access their counterparts on device side. Fixes PR26340 Differential Revision: http://reviews.llvm.org/D17779 llvm-svn: 262498
* [CUDA] Invoke ptxas and fatbinary during compilation.Justin Lebar2016-01-141-0/+2
| | | | | | | | | | | | | | | | | | | | Summary: Previously we compiled CUDA device code to PTX assembly and embedded that asm as text in our host binary. Now we compile to PTX assembly and then invoke ptxas to assemble the PTX into a cubin file. We gather the ptx and cubin files for each of our --cuda-gpu-archs and combine them using fatbinary, and then embed that into the host binary. Adds two new command-line flags, -Xcuda_ptxas and -Xcuda_fatbinary, which pass args down to the external tools. Reviewers: tra, echristo Subscribers: cfe-commits, jhen Differential Revision: http://reviews.llvm.org/D16082 llvm-svn: 257809
* Compute and preserve alignment more faithfully in IR-generation.John McCall2015-09-081-10/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce an Address type to bundle a pointer value with an alignment. Introduce APIs on CGBuilderTy to work with Address values. Change core APIs on CGF/CGM to traffic in Address where appropriate. Require alignments to be non-zero. Update a ton of code to compute and propagate alignment information. As part of this, I've promoted CGBuiltin's EmitPointerWithAlignment helper function to CGF and made use of it in a number of places in the expression emitter. The end result is that we should now be significantly more correct when performing operations on objects that are locally known to be under-aligned. Since alignment is not reliably tracked in the type system, there are inherent limits to this, but at least we are no longer confused by standard operations like derived-to-base conversions and array-to-pointer decay. I've also fixed a large number of bugs where we were applying the complete-object alignment to a pointer instead of the non-virtual alignment, although most of these were hidden by the very conservative approach we took with member alignment. Also, because IRGen now reliably asserts on zero alignments, we should no longer be subject to an absurd but frustrating recurring bug where an incomplete type would report a zero alignment and then we'd naively do a alignmentAtOffset on it and emit code using an alignment equal to the largest power-of-two factor of the offset. We should also now be emitting much more aggressive alignment attributes in the presence of over-alignment. In particular, field access now uses alignmentAtOffset instead of min. Several times in this patch, I had to change the existing code-generation pattern in order to more effectively use the Address APIs. For the most part, this seems to be a strict improvement, like doing pointer arithmetic with GEPs instead of ptrtoint. That said, I've tried very hard to not change semantics, but it is likely that I've failed in a few places, for which I apologize. ABIArgInfo now always carries the assumed alignment of indirect and indirect byval arguments. In order to cut down on what was already a dauntingly large patch, I changed the code to never set align attributes in the IR on non-byval indirect arguments. That is, we still generate code which assumes that indirect arguments have the given alignment, but we don't express this information to the backend except where it's semantically required (i.e. on byvals). This is likely a minor regression for those targets that did provide this information, but it'll be trivial to add it back in a later patch. I partially punted on applying this work to CGBuiltin. Please do not add more uses of the CreateDefaultAligned{Load,Store} APIs; they will be going away eventually. llvm-svn: 246985
* Revert r240270 ("Fixed/added namespace ending comments using clang-tidy").Alexander Kornienko2015-06-221-1/+1
| | | | llvm-svn: 240353
* Fixed/added namespace ending comments using clang-tidy. NFCAlexander Kornienko2015-06-221-1/+1
| | | | | | | | | | | | The patch is generated using this command: $ tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \ -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \ work/llvm/tools/clang To reduce churn, not touching namespaces spanning less than 10 lines. llvm-svn: 240270
* [cuda] Include GPU binary into host object file and generate init/deinit code.Artem Belevich2015-05-071-13/+205
| | | | | | | | | | | | - added -fcuda-include-gpubinary option to incorporate results of device-side compilation into host-side one. - generate code to register GPU binaries and associated kernels with CUDA runtime and clean-up on exit. - added test case for init/deinit code generation. Differential Revision: http://reviews.llvm.org/D9507 llvm-svn: 236765
* [C++11] Add 'override' keyword to virtual methods that override their base ↵Craig Topper2014-03-121-1/+1
| | | | | | class. llvm-svn: 203643
* [Modules] Update to reflect the move of CallSite into the IR library inChandler Carruth2014-03-041-1/+1
| | | | | | LLVM r202816. llvm-svn: 202817
* Use the actual ABI-determined C calling convention for runtimeJohn McCall2013-02-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | calls and declarations. LLVM has a default CC determined by the target triple. This is not always the actual default CC for the ABI we've been asked to target, and so we sometimes find ourselves annotating all user functions with an explicit calling convention. Since these calling conventions usually agree for the simple set of argument types passed to most runtime functions, using the LLVM-default CC in principle has no effect. However, the LLVM optimizer goes into histrionics if it sees this kind of formal CC mismatch, since it has no concept of CC compatibility. Therefore, if this module happens to define the "runtime" function, or got LTO'ed with such a definition, we can miscompile; so it's quite important to get this right. Defining runtime functions locally is quite common in embedded applications. llvm-svn: 176286
* Remove useless 'llvm::' qualifier from names like StringRef and others that areDmitri Gribenko2013-01-121-1/+1
| | | | | | brought into 'clang' namespace by clang/Basic/LLVM.h llvm-svn: 172323
* Rewrite #includes for llvm/Foo.h to llvm/IR/Foo.h as appropriate toChandler Carruth2013-01-021-3/+3
| | | | | | | | reflect the migration in r171366. Re-sort the #include lines to reflect the new paths. llvm-svn: 171369
* Sort all of Clang's files under 'lib', and fix up the broken headersChandler Carruth2012-12-041-1/+0
| | | | | | | | | | | | | uncovered. This required manually correcting all of the incorrect main-module headers I could find, and running the new llvm/utils/sort_includes.py script over the files. I also manually added quite a few missing headers that were uncovered by shuffling the order or moving headers up to be main-module-headers. llvm-svn: 169237
OpenPOWER on IntegriCloud