path: root/clang/test/CodeGenCUDA
Commit message | Author | Age | Files | Lines
...
* Fix lit test kernel-call.cu failure on ps4 due to dso_local | Yaxun Liu | 2018-04-25 | 1 | -2/+2
  llvm-svn: 330795
* Fix failure in lit test kernel-call.cu | Yaxun Liu | 2018-04-25 | 1 | -1/+1
  There is signext on ppc64. Just remove check for function argument.
  llvm-svn: 330793
* [HIP] Add hip input kind and codegen for kernel launching | Yaxun Liu | 2018-04-25 | 3 | -45/+74
  HIP is a language similar to CUDA (https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md). The language syntax is very similar, which allows a hip program to be compiled as a CUDA program by Clang. The main difference is the host API. HIP has a vendor-neutral host API which can be implemented on different platforms. Currently there is an open source implementation of the HIP runtime on the amdgpu target (https://github.com/ROCm-Developer-Tools/HIP).
  This patch adds support for the input kind and language standard hip. When a hip file is compiled, both LangOpts.CUDA and LangOpts.HIP are turned on. This allows a hip program to be compiled as CUDA in most cases; LangOpts.HIP is checked only where special handling of hip programs is needed.
  This patch also adds support for kernel launching in HIP programs using the HIP host API.
  When -x hip is not specified, there is no behaviour change for CUDA.
  Patch by Greg Rodgers. Revised and lit test added by Yaxun Liu.
  Differential Revision: https://reviews.llvm.org/D44984
  llvm-svn: 330790
* [CUDA] Set LLVM calling convention for CUDA kernel | Yaxun Liu | 2018-04-20 | 1 | -0/+41
  Some targets need a special LLVM calling convention for CUDA kernels. This patch does that through a TargetCodeGenInfo hook. It only affects the amdgcn target.
  Patch by Greg Rodgers. Revised and lit tests added by Yaxun Liu.
  Differential Revision: https://reviews.llvm.org/D45223
  llvm-svn: 330447
* [CUDA] Register relocatable GPU binaries | Jonas Hahnfeld | 2018-04-20 | 1 | -45/+66
  nvcc generates a unique registration function for each object file that contains relocatable device code. Unique names are achieved with a module id that is also reflected in the function's name.
  Differential Revision: https://reviews.llvm.org/D42922
  llvm-svn: 330425
* Remove -cc1 option "-backend-option". | Eli Friedman | 2018-04-12 | 1 | -1/+1
  It means the same thing as -mllvm; there isn't any reason to have two options which do the same thing.
  Differential Revision: https://reviews.llvm.org/D45109
  llvm-svn: 329965
* Fix typos in clang | Alexander Kornienko | 2018-04-06 | 2 | -2/+2
  Found via codespell -q 3 -I ../clang-whitelist.txt
  Where whitelist consists of: archtype cas classs checkk compres definit frome iff inteval ith lod methode nd optin ot pres statics te thru
  Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few files that have dubious fixes reverted.)
  Differential revision: https://reviews.llvm.org/D44188
  llvm-svn: 329399
* Revert "Set calling convention for CUDA kernel" | Artem Belevich | 2018-04-03 | 1 | -29/+0
  This reverts r328795 which introduced an issue with referencing __global__ function templates. More details in the original review D44747.
  llvm-svn: 329099
* [CUDA] Let device-side shared variables be initialized with undef | Yaxun Liu | 2018-04-02 | 2 | -36/+93
  CUDA shared variables should be initialized with undef.
  Patch by Greg Rodgers. Revised and lit test added by Yaxun Liu.
  Differential Revision: https://reviews.llvm.org/D44985
  llvm-svn: 328994
* Set calling convention for CUDA kernel | Yaxun Liu | 2018-03-29 | 1 | -0/+29
  This patch sets the target-specific calling convention for CUDA kernels in IR.
  Patch by Greg Rodgers. Revised and lit test added by Yaxun Liu.
  Differential Revision: https://reviews.llvm.org/D44747
  llvm-svn: 328795
* Disable emitting static extern C aliases for amdgcn target for CUDA | Yaxun Liu | 2018-03-29 | 1 | -0/+3
  Patch by Greg Rodgers. Revised and lit test added by Yaxun Liu.
  Differential Revision: https://reviews.llvm.org/D44987
  llvm-svn: 328793
* Really fix test on windows. | Rafael Espindola | 2018-02-23 | 1 | -3/+3
  Sorry for the noise.
  llvm-svn: 325943
* Fix one last test on a windows host. | Rafael Espindola | 2018-02-23 | 1 | -1/+1
  llvm-svn: 325942
* [CUDA] CUDA has no device-side library builtins. | Artem Belevich | 2018-01-23 | 1 | -0/+22
  We should (almost) never consider a device-side declaration to match a library builtin function. Otherwise clang may ignore the implementation provided by the CUDA headers and emit clang's idea of the builtin.
  Differential Revision: https://reviews.llvm.org/D42319
  llvm-svn: 323239
* CodeGenModule: Always output wchar_size, check LLVM assumptions. | Matthias Braun | 2017-05-20 | 1 | -2/+2
  Re-commit r303463 now that LLVM is fixed and adjust some lit tests.
  llvm::TargetLibraryInfo needs to know the size of wchar_t to work on functions like `wcslen`. This patch changes clang to always emit the wchar_size module flag (it would only do so for ARM previously). This also adds an `assert()` to ensure the LLVM defaults based on the target triple are in sync with clang.
  Differential Revision: https://reviews.llvm.org/D32982
  llvm-svn: 303478
* Use FPContractModeKind universally | Adam Nemet | 2017-03-29 | 1 | -2/+2
  FPContractModeKind is the codegen option flag which is already ternary (off, on, fast). This makes it universally the type for the contractable info across the front-end:
  * In FPOptions (i.e. in the Sema + in the expression nodes).
  * In LangOpts::DefaultFPContractMode which is the option that initializes FPOptions in the Sema.
  Another way to look at this change is that previously fp-contractable on/off were the only states handed to the front-end:
  * For "on", FMA folding was performed by the front-end
  * For "fast", we simply forwarded the flag to TargetOptions to handle it in LLVM
  Now off/on/fast are all exposed because for fast we will generate fast-math-flags during CodeGen.
  This is toward moving fp-contraction=fast from an LLVM TargetOption to a FastMathFlag in order to fix PR25721.
  ---
  This is a recommit of r299027 with an adjustment to the test CodeGenCUDA/fp-contract.cu. The test assumed that even though -ffp-contract=on is passed, FE-based folding of FMA won't happen. This is obviously wrong since the user is asking for this explicitly with the option. CUDA is different in that -ffp-contract=fast is on by default.
  The test used to "work" because contract=fast and contract=on were maintained separately and we didn't fold in the FE because contract=fast was on due to the target-default. This patch consolidates the contract=on/fast/off state into a ternary state, hence the change in behavior.
  ---
  Differential Revision: https://reviews.llvm.org/D31167
  llvm-svn: 299033
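The ternary state described in this commit can be pictured with a small host-only C++ sketch. This is not clang's code: `FPContractMode`, `ContractPlan` and `planFor` are hypothetical names invented for illustration, and the mapping only restates the commit text ("on" folds in the front-end, "fast" is left to LLVM).

```cpp
#include <cassert>

// Hypothetical mirror of the ternary contraction state (off, on, fast).
enum class FPContractMode { Off, On, Fast };

// Who is responsible for forming fused multiply-adds in each state.
// Field names are illustrative assumptions, not clang identifiers.
struct ContractPlan {
    bool frontEndFoldsFMA;  // "on": the front-end folds a*b+c itself
    bool deferredToLLVM;    // "fast": the decision is handed to LLVM
};

ContractPlan planFor(FPContractMode mode) {
    switch (mode) {
    case FPContractMode::Off:  return {false, false};
    case FPContractMode::On:   return {true, false};
    case FPContractMode::Fast: return {false, true};
    }
    assert(false && "unknown contraction mode");
    return {false, false};
}
```

Keeping one enum instead of two independent booleans makes the "on and fast at the same time" combination unrepresentable, which is the situation the adjusted test had been relying on by accident.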
* [CodeGen] [CUDA] Add the ability to set default attrs on functions in linked modules. | Justin Lebar | 2017-01-25 | 1 | -0/+62
  Summary: Now when you ask clang to link in a bitcode module, you can tell it to set attributes on that module's functions to match what we would have set if we'd emitted those functions ourselves.
  This is particularly important for fast-math attributes in CUDA compilations. Each CUDA compilation links in libdevice, a bitcode library provided by nvidia as part of the CUDA distribution. Without this patch, if we have a user-function F that is compiled with -ffast-math that calls a function G from libdevice, F will have the unsafe-fp-math=true (etc.) attributes, but G will have no attributes.
  Since F calls G, the inliner will merge G's attributes into F's. It considers the lack of an unsafe-fp-math=true attribute on G to be tantamount to unsafe-fp-math=false, so it "merges" these by setting unsafe-fp-math=false on F.
  This then continues up the call graph, until every function that (transitively) calls something in libdevice gets unsafe-fp-math=false set, thus disabling fastmath in almost all CUDA code.
  Reviewers: echristo
  Subscribers: hfinkel, llvm-commits, mehdi_amini
  Differential Revision: https://reviews.llvm.org/D28538
  llvm-svn: 293097
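The propagation described above can be reproduced with a toy model. The following host-only C++ sketch is an assumption-laden illustration, not LLVM's inliner: `AttrMap`, `CallEdge`, `inlineAll` and `setDefaultAttrs` are hypothetical names, and a single boolean stands in for the unsafe-fp-math attribute.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model: true means the function carries unsafe-fp-math=true,
// false means the attribute is absent (treated as unsafe-fp-math=false).
using AttrMap = std::map<std::string, bool>;
// A call edge {caller, callee}; edges must be listed bottom-up.
using CallEdge = std::pair<std::string, std::string>;

// Merge rule from the commit text: after inlining, the caller keeps
// unsafe-fp-math=true only if the callee had it as well.
void inlineAll(AttrMap& unsafeFpMath, const std::vector<CallEdge>& edges) {
    for (const CallEdge& e : edges) {
        bool& caller = unsafeFpMath.at(e.first);
        caller = caller && unsafeFpMath.at(e.second);
    }
}

// Model of the fix: give functions from the linked module the attribute
// the compiler would have set had it emitted them itself.
void setDefaultAttrs(AttrMap& unsafeFpMath,
                     const std::vector<std::string>& linkedFunctions,
                     bool fastMath) {
    for (const std::string& name : linkedFunctions) {
        assert(unsafeFpMath.count(name) == 1 && "unknown linked function");
        unsafeFpMath[name] = fastMath;
    }
}
```

With a user kernel calling F, and F calling a library function G that has no attribute, `inlineAll` clears the flag on F and then on the kernel. Calling `setDefaultAttrs` on G first leaves the flag intact all the way up.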
* [CUDA] Improve target attribute checking for function templates. | Artem Belevich | 2016-12-07 | 1 | -3/+3
  * __host__ __device__ functions are no longer considered to be redeclarations of __host__ or __device__ functions. This prevents unintentional merging of target attributes across them.
  * Function target attributes are not considered (and must match) during explicit instantiation and specialization of function templates.
  Differential Revision: https://reviews.llvm.org/D25809
  llvm-svn: 288962
* [CUDA] Rename cuda_builtin_vars.h to __clang_cuda_builtin_vars.h. | Justin Lebar | 2016-10-08 | 1 | -1/+1
  Summary: This matches the idiom we use for our other CUDA wrapper headers.
  Reviewers: tra
  Subscribers: beanz, mgorny, cfe-commits
  Differential Revision: https://reviews.llvm.org/D24978
  llvm-svn: 283679
* [CUDA] Add missing ':' to noexcept.cu test. | Justin Lebar | 2016-10-05 | 1 | -1/+1
  llvm-svn: 283280
* [CUDA] Mark device functions as nounwind. | Justin Lebar | 2016-10-04 | 3 | -5/+44
  Summary: This prevents clang from emitting 'invoke's and catch statements.
  Things previously mostly worked thanks to TryToMarkNoThrow() in CodeGenFunction. But this is not a proper IPO, and it doesn't properly handle cases like mutual recursion.
  Fixes bug 30593.
  Reviewers: tra
  Subscribers: cfe-commits
  Differential Revision: https://reviews.llvm.org/D25166
  llvm-svn: 283272
* [CUDA] Disallow overloading destructors. | Justin Lebar | 2016-10-03 | 1 | -8/+0
  Summary: We'd attempted to allow this, but turns out we were doing a very bad job. :)
  Making this work properly would be a giant change in clang. For example, we'd need to make CXXRecordDecl::getDestructor() context-sensitive, because the destructor you end up with depends on where you're calling it from.
  For now (and hopefully for ever), just disallow overloading of destructors in CUDA.
  Reviewers: rsmith
  Subscribers: cfe-commits, tra
  Differential Revision: https://reviews.llvm.org/D24571
  llvm-svn: 283120
* [CUDA] Raise an error if a wrong-side call is codegen'ed. | Justin Lebar | 2016-08-15 | 1 | -32/+0
  Summary: Some function calls in CUDA are allowed to appear in semantically-correct programs but are an error if they're ever codegen'ed. Specifically, a host+device function may call a host function, but it's an error if such a function is ever codegen'ed in device mode (and vice versa).
  Previously, clang made no attempt to catch these errors. For the most part, they would be caught by ptxas, and reported as "call to unknown function 'foo'".
  Now we catch these errors and report them the same as we report other illegal calls (e.g. a call from a host function to a device function).
  This has a small change in error-message behavior for calls that were previously disallowed (e.g. calls from a host to a device function). Previously, we'd catch disallowed calls fairly early, before doing additional semantic checking e.g. of the call's arguments. Now we catch these illegal calls at the very end of our semantic checks, so we'll only emit an "illegal CUDA call" error if the call is otherwise well-formed.
  Reviewers: tra, rnk
  Subscribers: cfe-commits
  Differential Revision: https://reviews.llvm.org/D23242
  llvm-svn: 278759
* [CUDA] Place GPU binary into .nv_fatbin section and align it by 8. | Artem Belevich | 2016-08-12 | 1 | -2/+4
  This matches the way nvcc encapsulates GPU binaries into host object file. Now cuobjdump can deal with clang-compiled object files.
  Differential Revision: https://reviews.llvm.org/D23429
  llvm-svn: 278549
* [CUDA] Align kernel launch args correctly when the LLVM type's alignment is different from the clang type's alignment. | Justin Lebar | 2016-07-27 | 1 | -0/+36
  Summary: Before this patch, we computed the offsets in memory of args passed to GPU kernel functions by throwing all of the args into an LLVM struct.
  clang emits packed llvm structs basically whenever it feels like it, and packed structs have alignment 1. So we cannot rely on the llvm type's alignment matching the C++ type's alignment.
  This patch fixes our codegen so we always respect the clang types' alignments.
  Reviewers: rnk
  Subscribers: cfe-commits, tra
  Differential Revision: https://reviews.llvm.org/D22879
  llvm-svn: 276927
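The difference between a packed layout and one that respects each argument's alignment can be shown with a short host-only C++ sketch. It is not the patch's code: `ArgLayout`, `alignUp` and `argOffsets` are hypothetical helpers that only illustrate the offset arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Size and alignment of one kernel argument as the source-level type
// reports them (hypothetical struct, for illustration only).
struct ArgLayout {
    std::size_t size;
    std::size_t align;  // must be a power of two
};

// Round `offset` up to the next multiple of `align`.
std::size_t alignUp(std::size_t offset, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0 &&
           "alignment must be a power of two");
    return (offset + align - 1) & ~(align - 1);
}

// Byte offset of every argument in the launch buffer, honouring each
// argument's own alignment instead of assuming a packed layout.
std::vector<std::size_t> argOffsets(const std::vector<ArgLayout>& args) {
    std::vector<std::size_t> offsets;
    std::size_t cursor = 0;
    for (const ArgLayout& a : args) {
        cursor = alignUp(cursor, a.align);
        offsets.push_back(cursor);
        cursor += a.size;
    }
    return offsets;
}
```

For arguments with (size, alignment) of (1, 1), (8, 8) and (4, 4), this yields offsets 0, 8 and 16, whereas a packed struct with alignment 1 would place them at 0, 1 and 9.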
* NVPTX: Use the nvvm builtins to read SRegs rather than the legacy ptx ones | Justin Bogner | 2016-07-07 | 1 | -12/+12
  The ptx spellings were removed from LLVM in r274769.
  llvm-svn: 274770
* [CUDA] Give templated device functions internal linkage, templated kernels external linkage. | Justin Lebar | 2016-06-30 | 1 | -2/+2
  Summary: This lets LLVM perform IPO over these functions. In particular, it allows LLVM to emit ld.global.nc for loads to __restrict pointers in kernels that are never written to.
  Reviewers: rsmith
  Subscribers: cfe-commits, tra
  Differential Revision: http://reviews.llvm.org/D21337
  llvm-svn: 274261
* [CUDA] Add implicit conversion of __launch_bounds__ arguments to rvalue. | Artem Belevich | 2016-06-06 | 1 | -0/+5
  Fixes clang crash reported in PR27778.
  Differential Revision: http://reviews.llvm.org/D20985
  llvm-svn: 271951
* [CUDA] Conservatively mark inline asm as convergent. | Justin Lebar | 2016-05-31 | 1 | -0/+6
  Summary: This is particularly important because some convergent CUDA intrinsics (e.g. __shfl_down) are implemented in terms of inline asm.
  Reviewers: tra
  Subscribers: cfe-commits
  Differential Revision: http://reviews.llvm.org/D20836
  llvm-svn: 271336
* Avoid depending on test inputs that aren't in Inputs | Reid Kleckner | 2016-05-20 | 2 | -133/+157
  Some people have weird CI systems that run each test subdirectory independently without access to other parallel trees.
  Unfortunately, this means we have to suffer some duplication until Art can sort out how to share these types.
  llvm-svn: 270164
* [CUDA] Do not allow non-empty destructors for global device-side variables. | Artem Belevich | 2016-05-19 | 1 | -3/+79
  According to CUDA Programming guide (v7.5, E2.3.1):
  > __device__, __constant__ and __shared__ variables defined in namespace scope, that are of class type, cannot have a non-empty constructor or a non-empty destructor.
  Clang already deals with device-side constructors (see D15305). This patch enforces similar rules for destructors.
  Differential Revision: http://reviews.llvm.org/D20140
  llvm-svn: 270108
* [CUDA] Split device-var-init.cu tests into separate Sema and CodeGen parts. | Artem Belevich | 2016-05-19 | 1 | -199/+33
  Codegen tests for device-side variable initialization are a subset of the test cases used to verify Sema's part of the job. Including CodeGenCUDA/device-var-init.cu from SemaCUDA makes it easier to keep both sides in sync.
  Differential Revision: http://reviews.llvm.org/D20139
  llvm-svn: 270107
* [CUDA] Enable fusing FP ops (-ffp-contract=fast) for CUDA by default. | Artem Belevich | 2016-05-19 | 1 | -0/+32
  This matches default nvcc behavior and gives a substantial performance boost on GPU where fmad is much cheaper compared to add+mul.
  Differential Revision: http://reviews.llvm.org/D20341
  llvm-svn: 270094
* [CUDA] Fix flush-denormals.cu test so that it checks what it intends to CHECK. | Justin Lebar | 2016-05-10 | 1 | -3/+5
  FileCheck does not evaluate plain CHECKs if you pass -check-prefix; you have to ask for it explicitly.
  llvm-svn: 269000
* [CUDA] Restrict init of local __shared__ variables to empty constructors only. | Artem Belevich | 2016-05-09 | 2 | -33/+10
  Allow only empty constructors for local __shared__ variables in a way identical to restrictions imposed on dynamic initializers for global variables on device.
  Differential Revision: http://reviews.llvm.org/D20039
  llvm-svn: 268982
* [CUDA] Only __shared__ variables can be static local on device side. | Artem Belevich | 2016-05-09 | 2 | -8/+8
  According to CUDA programming guide (v7.5):
  > E.2.9.4: Within the body of a device or global function, only shared variables may be declared with static storage class.
  Differential Revision: http://reviews.llvm.org/D20034
  llvm-svn: 268962
* [CUDA] Make sure device-side __global__ functions are always visible. | Artem Belevich | 2016-05-02 | 1 | -2/+11
  __global__ functions are a special case in CUDA.
  Even when the symbol would normally not be externally visible according to C++ rules, they still must be visible in the CUDA GPU object so the host-side stub can launch them.
  Differential Revision: http://reviews.llvm.org/D19748
  llvm-svn: 268299
* [CUDA] Add -fcuda-flush-denormals-to-zero. | Justin Lebar | 2016-04-05 | 1 | -0/+23
  Summary: Setting this flag causes all functions to be annotated with the "nvvm-f32ftz" = "true" attribute.
  In addition, we annotate the module with "nvvm-reflect-ftz" set to 0 or 1, depending on whether -fcuda-flush-denormals-to-zero is set. This is read by the NVVMReflect pass.
  Reviewers: tra, rnk
  Subscribers: cfe-commits
  Differential Revision: http://reviews.llvm.org/D18671
  llvm-svn: 265435
* [CUDA] Add -disable-llvm-passes to CodeGenCUDA/link-device-bitcode.cu. NFC | Justin Lebar | 2016-03-30 | 1 | -4/+4
  We already have this flag in most of the file, but we need it everywhere else, to disable the NVVMReflect pass, which we're explicitly checking doesn't run here. (Upcoming changes to llvm will cause it to be run.)
  llvm-svn: 264969
* [CUDA] Remove three obsolete CUDA cc1 flags. | Justin Lebar | 2016-03-29 | 2 | -18/+4
  Summary:
  * -fcuda-target-overloads: Previously unconditionally set to true by the driver. Necessary for correct functioning of the compiler -- our CUDA headers wrapper won't compile without this.
  * -fcuda-disable-target-call-checks: Previously unconditionally set to true by the driver. Necessary to compile almost any external CUDA code -- almost all libraries assume that host+device code can call host or device functions.
  * -fcuda-allow-host-calls-from-host-device: No effect when target overloading is enabled.
  Reviewers: tra
  Subscribers: rsmith, cfe-commits
  Differential Revision: http://reviews.llvm.org/D18416
  llvm-svn: 264739
* [CUDA] Merge most of CodeGenCUDA/function-overload.cu into SemaCUDA/function-overload.cu. | Justin Lebar | 2016-03-23 | 1 | -230/+4
  Summary: Previously we were using the codegen test to ensure that we choose the right overload. But we can do this within sema, with a bit of cleverness.
  I left the constructor/destructor checks in CodeGen, because these overloads (particularly on the destructors) are hard to check in Sema.
  Reviewers: tra
  Subscribers: cfe-commits
  Differential Revision: http://reviews.llvm.org/D18386
  llvm-svn: 264207
* Fixed test failure on platforms with name mangling different from Linux. | Artem Belevich | 2016-03-02 | 1 | -3/+4
  * Run cc with -triple x86_64-linux-gnu to make symbol mangling predictable.
  * Use a temporary file as a fake GPU input so its content does not interfere with pattern matching.
  llvm-svn: 262516
* [CUDA] Do not generate unnecessary runtime init code. | Artem Belevich | 2016-03-02 | 1 | -0/+20
  Differential Revision: http://reviews.llvm.org/D17780
  llvm-svn: 262499
* [CUDA] Emit host-side 'shadows' for device-side global variables | Artem Belevich | 2016-03-02 | 2 | -7/+46
  ... and register them with CUDA runtime.
  This is needed for commonly used cudaMemcpy*() APIs that use the address of the host-side shadow to access their counterparts on the device side.
  Fixes PR26340
  Differential Revision: http://reviews.llvm.org/D17779
  llvm-svn: 262498
* [CUDA] Mark all CUDA device-side function defs, decls, and calls as convergent. | Justin Lebar | 2016-02-24 | 2 | -1/+40
  Summary: This is important for e.g. the following case:
    void sync() {
      __syncthreads();
    }
    void foo() {
      do_something();
      sync();
      do_something_else();
    }
  Without this change, if the optimizer does not inline sync() (which it won't because __syncthreads is also marked as noduplicate, for now anyway), it is free to perform optimizations on sync() that it would not be able to perform on __syncthreads(), because sync() is not marked as convergent.
  Similarly, we need a notion of convergent calls, since in the case when we can't statically determine a call's target(s), we need to know whether it's safe to perform optimizations around the call.
  This change is conservative; the optimizer will remove these attrs where it can, see r260318, r260319.
  Reviewers: majnemer
  Subscribers: cfe-commits, jhen, echristo, tra
  Differential Revision: http://reviews.llvm.org/D17056
  llvm-svn: 261779
* [CUDA] Tweak attribute-based overload resolution to match nvcc behavior. | Artem Belevich | 2016-02-12 | 1 | -17/+106
  This is an artefact of split-mode CUDA compilation that we need to mimic. HD functions are sometimes allowed to call H or D functions. Due to split compilation mode, device-side compilation will not see host-only functions and thus they will not be considered at all. For clang both H and D variants will become function overloads visible to the compiler. Normally the target attribute is considered only if C++ rules cannot determine which function is better. However in this case we need to ignore functions that would not be present during the current compilation phase before we apply normal overload resolution rules.
  Changes:
  * introduced another level of call preference to better describe possible call combinations.
  * removed WrongSide functions from consideration if the set contains a SameSide function.
  * disabled H->D, D->H and G->H calls. These combinations are not allowed by CUDA and we were reluctantly allowing them to work around device-side calls to math functions in std namespace. We no longer need it after r258880.
  Differential Revision: http://reviews.llvm.org/D16870
  llvm-svn: 260697
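The candidate pruning listed under "Changes" can be sketched in host-only C++. This is a simplified illustration under stated assumptions, not clang's Sema code: `CallPreference` has only three hypothetical levels, with `Never` standing in for the disabled H->D, D->H and G->H combinations, and `filterCandidates` is an invented helper.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified preference levels for a caller/callee pair.
enum class CallPreference { Never, WrongSide, SameSide };

struct Candidate {
    std::string name;
    CallPreference pref;
};

// Drop candidates that may never be called; if any SameSide candidate
// exists, also drop the WrongSide ones before ordinary overload
// resolution would run on what remains.
std::vector<Candidate> filterCandidates(std::vector<Candidate> cands) {
    const bool hasSameSide = std::any_of(
        cands.begin(), cands.end(), [](const Candidate& c) {
            return c.pref == CallPreference::SameSide;
        });
    cands.erase(std::remove_if(cands.begin(), cands.end(),
                               [hasSameSide](const Candidate& c) {
                                   return c.pref == CallPreference::Never ||
                                          (hasSameSide &&
                                           c.pref == CallPreference::WrongSide);
                               }),
                cands.end());
    // Postcondition: nothing uncallable survives the filter.
    assert(std::none_of(cands.begin(), cands.end(), [](const Candidate& c) {
        return c.pref == CallPreference::Never;
    }));
    return cands;
}
```

When no SameSide candidate is present, WrongSide candidates are kept, which mirrors the statement that HD functions are sometimes allowed to call H or D functions.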
* [CUDA] Don't crash when trying to printf a non-scalar object. | Justin Lebar | 2016-02-11 | 1 | -0/+17
  Summary: We can't do the right thing, since there's no right thing to do, but at least we can not crash the compiler.
  Reviewers: majnemer, rnk
  Subscribers: cfe-commits, jhen, tra
  Differential Revision: http://reviews.llvm.org/D17103
  llvm-svn: 260479
* [CUDA] Do not allow dynamic initialization of global device side variables. | Artem Belevich | 2016-02-02 | 1 | -0/+393
  In general CUDA does not allow dynamic initialization of global device-side variables. One exception is that CUDA allows records with empty constructors as described in section E2.2.1 of CUDA 7.5 Programming guide.
  This patch applies initializer checks for all device-side variables. Empty constructors are accepted, but no code is generated for them.
  Differential Revision: http://reviews.llvm.org/D15305
  llvm-svn: 259592
* [CUDA] Generate CUDA's printf alloca in its function's entry block. | Justin Lebar | 2016-01-28 | 1 | -33/+23
  Summary: This is necessary to prevent llvm from generating stacksave intrinsics around this alloca. NVVM doesn't have a stack, and we don't handle said intrinsics.
  Reviewers: rnk, echristo
  Subscribers: cfe-commits, jhen, tra
  Differential Revision: http://reviews.llvm.org/D16664
  llvm-svn: 259122
* [CUDA] Don't generate aliases for static extern "C" functions. | Justin Lebar | 2016-01-25 | 1 | -0/+17
  Summary: These aliases are done to support inline asm, but there's nothing we can do: NVPTX doesn't support aliases.
  Reviewers: tra
  Subscribers: cfe-commits, jhen, echristo
  Differential Revision: http://reviews.llvm.org/D16501
  llvm-svn: 258734