path: root/clang/test/CodeGenCUDA
Commit message (Author, Date; Files changed, Lines -/+)
* [CUDA] Raise an error if a wrong-side call is codegen'ed. (Justin Lebar, 2016-08-15; 1 file, -32/+0)
  Summary: Some function calls in CUDA are allowed to appear in semantically-correct programs but are an error if they're ever codegen'ed. Specifically, a host+device function may call a host function, but it's an error if such a function is ever codegen'ed in device mode (and vice versa).
  Previously, clang made no attempt to catch these errors. For the most part, they would be caught by ptxas, and reported as "call to unknown function 'foo'". Now we catch these errors and report them the same as we report other illegal calls (e.g. a call from a host function to a device function).
  This has a small change in error-message behavior for calls that were previously disallowed (e.g. calls from a host to a device function). Previously, we'd catch disallowed calls fairly early, before doing additional semantic checking, e.g. of the call's arguments. Now we catch these illegal calls at the very end of our semantic checks, so we'll only emit an "illegal CUDA call" error if the call is otherwise well-formed.
  Reviewers: tra, rnk
  Subscribers: cfe-commits
  Differential Revision: https://reviews.llvm.org/D23242
  llvm-svn: 278759
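  A minimal sketch of the wrong-side pattern this commit diagnoses (all names hypothetical): the host+device function is accepted by Sema and usable from host code, and the call only becomes an error once the function is codegen'ed for the device.

    __host__ void host_only() {}

    __host__ __device__ int maybe_calls_host() {
      host_only();  // accepted by Sema; an error only if this body is codegen'ed in device mode
      return 0;
    }

    void host_user() { maybe_calls_host(); }             // fine: host-side codegen

    // __global__ void kernel() { maybe_calls_host(); }  // would force device-side codegen -> error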
* [CUDA] Place GPU binary into .nv_fatbin section and align it by 8. (Artem Belevich, 2016-08-12; 1 file, -2/+4)
  This matches the way nvcc encapsulates GPU binaries into the host object file. Now cuobjdump can deal with clang-compiled object files.
  Differential Revision: https://reviews.llvm.org/D23429
  llvm-svn: 278549
* [CUDA] Align kernel launch args correctly when the LLVM type's alignment is different from the clang type's alignment. (Justin Lebar, 2016-07-27; 1 file, -0/+36)
  Summary: Before this patch, we computed the offsets in memory of args passed to GPU kernel functions by throwing all of the args into an LLVM struct. clang emits packed llvm structs basically whenever it feels like it, and packed structs have alignment 1. So we cannot rely on the llvm type's alignment matching the C++ type's alignment.
  This patch fixes our codegen so we always respect the clang types' alignments.
  Reviewers: rnk
  Subscribers: cfe-commits, tra
  Differential Revision: https://reviews.llvm.org/D22879
  llvm-svn: 276927
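  A hedged illustration of the kind of kernel signature affected (names hypothetical): the double argument must land at an 8-byte offset in the launch-argument buffer, even though a packed LLVM struct of these fields would have alignment 1.

    __global__ void kernel(char tag, double value) {
      // 'value' must be read from an 8-byte-aligned offset in the launch-arg
      // buffer, matching the clang type's alignment, regardless of how the
      // LLVM struct used to compute offsets happens to be packed.
    }

    void launch() { kernel<<<1, 1>>>('a', 3.5); }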
* NVPTX: Use the nvvm builtins to read SRegs rather than the legacy ptx ones. (Justin Bogner, 2016-07-07; 1 file, -12/+12)
  The ptx spellings were removed from LLVM in r274769.
  llvm-svn: 274770
* [CUDA] Give templated device functions internal linkage, templated kernels external linkage. (Justin Lebar, 2016-06-30; 1 file, -2/+2)
  Summary: This lets LLVM perform IPO over these functions. In particular, it allows LLVM to emit ld.global.nc for loads to __restrict pointers in kernels that are never written to.
  Reviewers: rsmith
  Subscribers: cfe-commits, tra
  Differential Revision: http://reviews.llvm.org/D21337
  llvm-svn: 274261
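  A sketch of the pattern the commit message refers to (kernel and names hypothetical): with internal linkage for instantiated device helpers, LLVM can see all uses, and data reached only through the read-only __restrict pointer becomes a candidate for ld.global.nc.

    template <typename T>
    __global__ void scale(const T* __restrict__ in, T* __restrict__ out, T k, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n)
        out[i] = k * in[i];   // 'in' is only read -> candidate for ld.global.nc
    }

    void launch(const float* in, float* out, int n) {
      scale<float><<<(n + 255) / 256, 256>>>(in, out, 2.0f, n);
    }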
* [CUDA] Add implicit conversion of __launch_bounds__ arguments to rvalue. (Artem Belevich, 2016-06-06; 1 file, -0/+5)
  Fixes clang crash reported in PR27778.
  Differential Revision: http://reviews.llvm.org/D20985
  llvm-svn: 271951
* [CUDA] Conservatively mark inline asm as convergent. (Justin Lebar, 2016-05-31; 1 file, -0/+6)
  Summary: This is particularly important because some convergent CUDA intrinsics (e.g. __shfl_down) are implemented in terms of inline asm.
  Reviewers: tra
  Subscribers: cfe-commits
  Differential Revision: http://reviews.llvm.org/D20836
  llvm-svn: 271336
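  A hedged sketch of device code using inline asm of the sort this protects (the PTX here just reads the %laneid special register; any warp-level asm would be treated the same way): the asm call is conservatively marked convergent so the optimizer will not move it across control flow in ways that could break cross-thread semantics.

    __device__ unsigned lane_id() {
      unsigned id;
      asm volatile("mov.u32 %0, %%laneid;" : "=r"(id));  // inline asm: treated as convergent
      return id;
    }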
* Avoid depending on test inputs that aren't in Inputs. (Reid Kleckner, 2016-05-20; 2 files, -133/+157)
  Some people have weird CI systems that run each test subdirectory independently without access to other parallel trees. Unfortunately, this means we have to suffer some duplication until Art can sort out how to share these types.
  llvm-svn: 270164
* [CUDA] Do not allow non-empty destructors for global device-side variables. (Artem Belevich, 2016-05-19; 1 file, -3/+79)
  According to the CUDA Programming Guide (v7.5, E2.3.1):
  > __device__, __constant__ and __shared__ variables defined in namespace
  > scope, that are of class type, cannot have a non-empty constructor or a
  > non-empty destructor.
  Clang already deals with device-side constructors (see D15305). This patch enforces similar rules for destructors.
  Differential Revision: http://reviews.llvm.org/D20140
  llvm-svn: 270108
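  A brief sketch of what the restriction means for namespace-scope device variables (types hypothetical):

    __device__ int teardown_count;

    struct Empty    { __device__ ~Empty() {} };                       // empty destructor: allowed
    struct Counting { __device__ ~Counting() { ++teardown_count; } }; // non-empty destructor

    __device__ Empty ok;          // fine
    // __device__ Counting bad;   // error: device-side global with a non-empty destructor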
* [CUDA] Split device-var-init.cu tests into separate Sema and CodeGen parts. (Artem Belevich, 2016-05-19; 1 file, -199/+33)
  Codegen tests for device-side variable initialization are a subset of the test cases used to verify Sema's part of the job. Including CodeGenCUDA/device-var-init.cu from SemaCUDA makes it easier to keep both sides in sync.
  Differential Revision: http://reviews.llvm.org/D20139
  llvm-svn: 270107
* [CUDA] Enable fusing FP ops (-ffp-contract=fast) for CUDA by default. (Artem Belevich, 2016-05-19; 1 file, -0/+32)
  This matches default nvcc behavior and gives a substantial performance boost on GPU, where fmad is much cheaper compared to add+mul.
  Differential Revision: http://reviews.llvm.org/D20341
  llvm-svn: 270094
* [CUDA] Fix flush-denormals.cu test so that it checks what it intends to CHECK. (Justin Lebar, 2016-05-10; 1 file, -3/+5)
  FileCheck does not evaluate plain CHECKs if you pass -check-prefix; you have to ask for it explicitly.
  llvm-svn: 269000
* [CUDA] Restrict init of local __shared__ variables to empty constructors only. (Artem Belevich, 2016-05-09; 2 files, -33/+10)
  Allow only empty constructors for local __shared__ variables, in a way identical to the restrictions imposed on dynamic initializers for global variables on the device.
  Differential Revision: http://reviews.llvm.org/D20039
  llvm-svn: 268982
* [CUDA] Only __shared__ variables can be static local on device side. (Artem Belevich, 2016-05-09; 2 files, -8/+8)
  According to the CUDA programming guide (v7.5):
  > E.2.9.4: Within the body of a device or global function, only
  > shared variables may be declared with static storage class.
  Differential Revision: http://reviews.llvm.org/D20034
  llvm-svn: 268962
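  A small sketch of what this rule allows and rejects inside a kernel (values hypothetical):

    __global__ void kernel(int* out) {
      static __shared__ int cache[32];   // OK: a function-scope static must be __shared__ in device code
      // static int counter = 0;         // error: only __shared__ may be declared static here
      cache[threadIdx.x % 32] = threadIdx.x;
      __syncthreads();
      out[threadIdx.x] = cache[threadIdx.x % 32];
    }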
* [CUDA] Make sure device-side __global__ functions are always visible. (Artem Belevich, 2016-05-02; 1 file, -2/+11)
  __global__ functions are a special case in CUDA. Even when the symbol would normally not be externally visible according to C++ rules, it still must be visible in the CUDA GPU object so the host-side stub can launch it.
  Differential Revision: http://reviews.llvm.org/D19748
  llvm-svn: 268299
* [CUDA] Add -fcuda-flush-denormals-to-zero. (Justin Lebar, 2016-04-05; 1 file, -0/+23)
  Summary: Setting this flag causes all functions to be annotated with the "nvvm-f32ftz" = "true" attribute. In addition, we annotate the module with "nvvm-reflect-ftz" set to 0 or 1, depending on whether -cuda-flush-denormals-to-zero is set. This is read by the NVVMReflect pass.
  Reviewers: tra, rnk
  Subscribers: cfe-commits
  Differential Revision: http://reviews.llvm.org/D18671
  llvm-svn: 265435
* [CUDA] Add -disable-llvm-passes to CodeGenCUDA/link-device-bitcode.cu. NFC (Justin Lebar, 2016-03-30; 1 file, -4/+4)
  We already have this flag in most of the file, but we need it everywhere else, to disable the NVVMReflect pass, which we're explicitly checking doesn't run here. (Upcoming changes to llvm will cause it to be run.)
  llvm-svn: 264969
* [CUDA] Remove three obsolete CUDA cc1 flags. (Justin Lebar, 2016-03-29; 2 files, -18/+4)
  Summary:
  * -fcuda-target-overloads
    Previously unconditionally set to true by the driver. Necessary for correct functioning of the compiler -- our CUDA headers wrapper won't compile without this.
  * -fcuda-disable-target-call-checks
    Previously unconditionally set to true by the driver. Necessary to compile almost any external CUDA code -- almost all libraries assume that host+device code can call host or device functions.
  * -fcuda-allow-host-calls-from-host-device
    No effect when target overloading is enabled.
  Reviewers: tra
  Subscribers: rsmith, cfe-commits
  Differential Revision: http://reviews.llvm.org/D18416
  llvm-svn: 264739
* [CUDA] Merge most of CodeGenCUDA/function-overload.cu into SemaCUDA/function-overload.cu. (Justin Lebar, 2016-03-23; 1 file, -230/+4)
  Summary: Previously we were using the codegen test to ensure that we choose the right overload. But we can do this within sema, with a bit of cleverness. I left the constructor/destructor checks in CodeGen, because these overloads (particularly on the destructors) are hard to check in Sema.
  Reviewers: tra
  Subscribers: cfe-commits
  Differential Revision: http://reviews.llvm.org/D18386
  llvm-svn: 264207
* Fixed test failure on platforms with name mangling different from Linux. (Artem Belevich, 2016-03-02; 1 file, -3/+4)
  * Run cc with -triple x86_64-linux-gnu to make symbol mangling predictable.
  * Use a temporary file as a fake GPU input so its content does not interfere with pattern matching.
  llvm-svn: 262516
* [CUDA] Do not generate unnecessary runtime init code. (Artem Belevich, 2016-03-02; 1 file, -0/+20)
  Differential Revision: http://reviews.llvm.org/D17780
  llvm-svn: 262499
* [CUDA] Emit host-side 'shadows' for device-side global variables. (Artem Belevich, 2016-03-02; 2 files, -7/+46)
  ... and register them with the CUDA runtime. This is needed for the commonly used cudaMemcpy*() APIs, which use the address of the host-side shadow to access their counterparts on the device side.
  Fixes PR26340
  Differential Revision: http://reviews.llvm.org/D17779
  llvm-svn: 262498
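  A hedged sketch of how the host-side shadow is used (variable names hypothetical): the host passes its shadow of a __device__ variable to cudaMemcpyToSymbol, and the registration emitted by this patch lets the runtime map the shadow's address to the real device-side symbol.

    #include <cuda_runtime.h>

    __device__ float scale_factor;   // device-side global; the host sees a registered 'shadow'

    void set_scale(float value) {
      // The shadow identifies the device symbol to the CUDA runtime.
      cudaMemcpyToSymbol(scale_factor, &value, sizeof(value));
    }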
* [CUDA] Mark all CUDA device-side function defs, decls, and calls as convergent. (Justin Lebar, 2016-02-24; 2 files, -1/+40)
  Summary: This is important for e.g. the following case:
    void sync() { __syncthreads(); }
    void foo() {
      do_something();
      sync();
      do_something_else();
    }
  Without this change, if the optimizer does not inline sync() (which it won't because __syncthreads is also marked as noduplicate, for now anyway), it is free to perform optimizations on sync() that it would not be able to perform on __syncthreads(), because sync() is not marked as convergent.
  Similarly, we need a notion of convergent calls, since in the case when we can't statically determine a call's target(s), we need to know whether it's safe to perform optimizations around the call.
  This change is conservative; the optimizer will remove these attrs where it can, see r260318, r260319.
  Reviewers: majnemer
  Subscribers: cfe-commits, jhen, echristo, tra
  Differential Revision: http://reviews.llvm.org/D17056
  llvm-svn: 261779
* [CUDA] Tweak attribute-based overload resolution to match nvcc behavior. (Artem Belevich, 2016-02-12; 1 file, -17/+106)
  This is an artefact of split-mode CUDA compilation that we need to mimic. HD functions are sometimes allowed to call H or D functions. Due to the split compilation mode, device-side compilation will not see host-only functions and thus they will not be considered at all. For clang, both H and D variants become function overloads visible to the compiler.
  Normally the target attribute is considered only if C++ rules cannot determine which function is better. However, in this case we need to ignore functions that would not be present during the current compilation phase before we apply normal overload resolution rules.
  Changes:
  * introduced another level of call preference to better describe possible call combinations.
  * removed WrongSide functions from consideration if the set contains a SameSide function.
  * disabled H->D, D->H and G->H calls. These combinations are not allowed by CUDA and we were reluctantly allowing them to work around device-side calls to math functions in the std namespace. We no longer need it after r258880.
  Differential Revision: http://reviews.llvm.org/D16870
  llvm-svn: 260697
* [CUDA] Don't crash when trying to printf a non-scalar object. (Justin Lebar, 2016-02-11; 1 file, -0/+17)
  Summary: We can't do the right thing, since there's no right thing to do, but at least we can not crash the compiler.
  Reviewers: majnemer, rnk
  Subscribers: cfe-commits, jhen, tra
  Differential Revision: http://reviews.llvm.org/D17103
  llvm-svn: 260479
* [CUDA] Do not allow dynamic initialization of global device side variables. (Artem Belevich, 2016-02-02; 1 file, -0/+393)
  In general, CUDA does not allow dynamic initialization of global device-side variables. One exception is that CUDA allows records with empty constructors, as described in section E2.2.1 of the CUDA 7.5 Programming Guide.
  This patch applies initializer checks for all device-side variables. Empty constructors are accepted, but no code is generated for them.
  Differential Revision: http://reviews.llvm.org/D15305
  llvm-svn: 259592
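  A short sketch of what is and isn't accepted for namespace-scope device variables (names hypothetical):

    __device__ int seed = 42;                 // OK: constant initializer

    struct Empty { __device__ Empty() {} };
    __device__ Empty tag;                     // OK: empty constructor, no init code emitted

    __device__ int next_id() { return seed + 1; }
    // __device__ int bad = next_id();        // error: dynamic initialization of a device-side global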
* [CUDA] Generate CUDA's printf alloca in its function's entry block. (Justin Lebar, 2016-01-28; 1 file, -33/+23)
  Summary: This is necessary to prevent llvm from generating stacksave intrinsics around this alloca. NVVM doesn't have a stack, and we don't handle said intrinsics.
  Reviewers: rnk, echristo
  Subscribers: cfe-commits, jhen, tra
  Differential Revision: http://reviews.llvm.org/D16664
  llvm-svn: 259122
* [CUDA] Don't generate aliases for static extern "C" functions. (Justin Lebar, 2016-01-25; 1 file, -0/+17)
  Summary: These aliases are done to support inline asm, but there's nothing we can do: NVPTX doesn't support aliases.
  Reviewers: tra
  Subscribers: cfe-commits, jhen, echristo
  Differential Revision: http://reviews.llvm.org/D16501
  llvm-svn: 258734
* [CUDA] Make printf work. (Justin Lebar, 2016-01-23; 2 files, -0/+55)
  Summary: The code in CGCUDACall is largely based on a patch written by Eli Bendersky: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140324/210218.html
  That patch implemented an LLVM pass lowering printf to vprintf; this one does something similar, but in Clang codegen.
  Reviewers: echristo
  Subscribers: cfe-commits, jhen, tra, majnemer
  Differential Revision: http://reviews.llvm.org/D16372
  llvm-svn: 258642
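  A minimal usage sketch (kernel hypothetical): a device-side printf call of this form is what the new codegen lowers to a vprintf call with the arguments packed into a single buffer.

    #include <cstdio>

    __global__ void report(int n) {
      // Lowered in Clang codegen to vprintf(format, packed_args).
      printf("thread %d of %d sees n = %d\n", threadIdx.x, blockDim.x, n);
    }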
* [CUDA] Make vtable construction aware of host/device side of CUDA compilation. (Artem Belevich, 2015-12-17; 1 file, -0/+61)
  C++ emits vtables for classes that have a key function present in the current TU. While we compile CUDA, the fact that the key function was found in this TU does not mean that we are going to generate code for it. E.g. a vtable for a class with host-only methods should not (and cannot) be generated on the device side, because we'll never generate code for those methods during device-side compilation.
  This patch adds an extra CUDA-specific check during key method computation and filters out potential key methods that are not suitable for this side of CUDA compilation. When we codegen a vtable, entries for unsuitable methods are set to null.
  Differential Revision: http://reviews.llvm.org/D15309
  llvm-svn: 255911
* Allow linking multiple bitcode files. (Artem Belevich, 2015-10-27; 2 files, -2/+32)
  Linking options for a particular file depend on the option that specifies the file. Currently there are two:
  * -mlink-bitcode-file links in the complete content of the specified file.
  * -mlink-cuda-bitcode links in only the symbols needed by the current TU. Linked symbols are internalized. This bitcode linking mode is used to link device-specific bitcode provided by CUDA.
  Files are linked in the order they are specified on the command line.
  -mlink-cuda-bitcode replaces the -fcuda-uses-libdevice flag.
  Differential Revision: http://reviews.llvm.org/D13913
  llvm-svn: 251427
* [CUDA] __global__ functions should always be visible externally. (Artem Belevich, 2015-09-23; 1 file, -6/+1)
  Adjust __global__ functions with DiscardableODR linkage to use StrongODR linkage instead, so they are visible externally.
  Differential Revision: http://reviews.llvm.org/D13067
  llvm-svn: 248400
* [CUDA] Allow function overloads in CUDA based on host/device attributes. (Artem Belevich, 2015-09-22; 1 file, -0/+214)
  The patch makes it possible to parse CUDA files that contain host/device functions with identical signatures but different attributes, without having to physically split the source into host-only and device-only parts. This change is needed in order to parse CUDA header files that have a lot of name clashes with standard include files.
  Gory details are in the design doc here: https://goo.gl/EXnymm
  Feel free to leave comments there or in this review thread.
  This feature is controlled with the CC1 option -fcuda-target-overloads and is disabled by default.
  Differential Revision: http://reviews.llvm.org/D12453
  llvm-svn: 248295
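  A hedged sketch of the kind of overloading this enables (function names hypothetical): the same signature can exist once for the host and once for the device, and each call resolves to the variant matching the calling function's side.

    __host__ int pick() { return 1; }     // host variant
    __device__ int pick() { return 2; }   // device variant: same signature, different target attribute

    __host__ __device__ int which() {
      return pick();   // resolves to the host or device overload depending on the compilation side
    }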
* [CUDA] Add implicit __attribute__((used)) to all __global__ functions. (Artem Belevich, 2015-09-22; 1 file, -0/+15)
  This makes sure that we emit kernels that were instantiated from the host code and which would never be explicitly referenced by anything else on the device side.
  Differential Revision: http://reviews.llvm.org/D11666
  llvm-svn: 248293
* [CUDA] Postprocess bitcode linked in during device-side CUDA compilation. (Artem Belevich, 2015-09-10; 2 files, -0/+94)
  Link in and internalize the symbols we need from the supplied bitcode library.
  Differential Revision: http://reviews.llvm.org/D11664
  llvm-svn: 247317
* [CUDA] Allow trivial constructors as initializer for __shared__ variables. (Artem Belevich, 2015-09-10; 1 file, -0/+14)
  Differential Revision: http://reviews.llvm.org/D12739
  llvm-svn: 247307
* [CUDA] Change initializer for CUDA device code based on CUDA documentation. (Jingyue Wu, 2015-08-22; 2 files, -7/+7)
  Summary: According to CUDA documentation, global variables declared with __device__ or __constant__ can be initialized from host code, so mark them as externally initialized. Because __shared__ variables cannot have an initialization as part of their declaration, and since the value may be kept across different kernel invocations, the value of a __shared__ variable is effectively undefined instead of zero-initialized.
  Wrongly using a zero initializer may cause illegitimate optimization, e.g. removing an unused __constant__ variable because it's not updated in the device code and the value is initialized with zero.
  Test Plan: test/CodeGenCUDA/address-spaces.cu
  Patch by Xuetian Weng
  Reviewers: jholewinski, eliben, tra, jingyue
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D12241
  llvm-svn: 245786
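  A short sketch of the __shared__ behavior described above (kernel hypothetical, assuming 256 threads per block): the shared buffer is not zero-initialized, so each block writes it before reading.

    __global__ void block_sum(const float* in, float* out) {
      __shared__ float partial[256];   // contents undefined on entry, not zeroed
      partial[threadIdx.x] = in[blockIdx.x * blockDim.x + threadIdx.x];
      __syncthreads();
      if (threadIdx.x == 0) {
        float s = 0.f;
        for (int i = 0; i < 256; ++i) s += partial[i];
        out[blockIdx.x] = s;
      }
    }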
* Revert "[CUDA] Add implicit __attribute__((used)) to all __global__ functions."Daniel Jasper2015-08-111-15/+0
| | | | | | This is breaking internal test. I'll provide a reproduction. llvm-svn: 244583
* [CUDA] Add implicit __attribute__((used)) to all __global__ functions. (Artem Belevich, 2015-08-10; 1 file, -0/+15)
  This allows emitting kernels that were instantiated from the host code and which would never be explicitly referenced otherwise.
  Differential Revision: http://reviews.llvm.org/D11666
  llvm-svn: 244501
* [cuda] Fixed test case failure on s390x. (Artem Belevich, 2015-05-11; 1 file, -2/+2)
  llvm-svn: 237007
* Fixed test failure on machines with 32-bit size_t. (Artem Belevich, 2015-05-07; 1 file, -1/+1)
  llvm-svn: 236773
* [cuda] Include GPU binary into host object file and generate init/deinit code. (Artem Belevich, 2015-05-07; 1 file, -1/+40)
  - added -fcuda-include-gpubinary option to incorporate results of device-side compilation into the host-side one.
  - generate code to register GPU binaries and associated kernels with the CUDA runtime and clean up on exit.
  - added a test case for init/deinit code generation.
  Differential Revision: http://reviews.llvm.org/D9507
  llvm-svn: 236765
* [cuda] treat file scope __asm as __host__ and ignore it during device-side compilation. (Artem Belevich, 2015-04-27; 1 file, -0/+6)
  Currently clang emits file-scope asm during *both* host and device compilation modes, which is usually a wrong thing to do. There's no way to attach any attribute to an __asm statement, so there's no way to differentiate between host-side and device-side file-scope asm. This patch makes clang match nvcc behavior and emit file-scope asm only during host-side compilation.
  Differential Revision: http://reviews.llvm.org/D9270
  llvm-svn: 235905
* [cuda] Allow using integral non-type template parameters as launch_bounds attribute arguments. (Artem Belevich, 2015-04-21; 1 file, -0/+51)
  - Changed CUDALaunchBounds arguments from integers to Expr* so they can be saved in the AST for instantiation.
  - Added support for template instantiation of the launch_bounds attribute.
  - Moved evaluation of launch_bounds arguments to NVPTXTargetCodeGenInfo::SetTargetAttributes(), where it can be done after template instantiation.
  - Added a warning on negative launch_bounds arguments.
  - Amended test cases.
  Differential Revision: http://reviews.llvm.org/D8985
  llvm-svn: 235452
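  A small sketch of the pattern this enables (template and values hypothetical): the launch_bounds arguments come from non-type template parameters and are only known after instantiation.

    template <int MaxThreads, int MinBlocks>
    __global__ void __launch_bounds__(MaxThreads, MinBlocks)
    tuned_kernel(float* data) {
      data[blockIdx.x * blockDim.x + threadIdx.x] *= 2.0f;
    }

    void launch(float* data) {
      tuned_kernel<256, 2><<<128, 256>>>(data);   // instantiates the kernel with concrete bounds
    }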
* [cuda] Added support for CUDA built-in variables. (Artem Belevich, 2015-04-21; 1 file, -0/+28)
  Added cuda_builtin_vars.h, which implements built-in CUDA variables using __declspec(property).
  Fields of built-in variables (except for warpSize) are implemented using __declspec(property), which replaces read/write of a member field with a call to a getter/setter member function, in this case with the appropriate NVPTX builtin.
  Added a test case to check diagnostics on attempts to construct or improperly access a built-in variable.
  Differential Revision: http://reviews.llvm.org/D9064
  llvm-svn: 235448
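  A small usage sketch (kernel hypothetical): the built-in variables read below are the ones cuda_builtin_vars.h models, with each field access lowered to the corresponding NVPTX builtin.

    __global__ void describe(int* out) {
      int global_id = blockIdx.x * blockDim.x + threadIdx.x;  // each field access becomes an NVPTX builtin call
      if (global_id == 0) {
        out[0] = gridDim.x;    // number of blocks
        out[1] = blockDim.x;   // threads per block
        out[2] = warpSize;     // plain value, not a property
      }
    }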
* Revert r235398 "[cuda] Added support for CUDA built-in variables." (Artem Belevich, 2015-04-21; 1 file, -28/+0)
  r235398 was causing a buildbot break due to missing Makefile changes.
  llvm-svn: 235401
* [cuda] Added support for CUDA built-in variables. (Artem Belevich, 2015-04-21; 1 file, -0/+28)
  Added cuda_builtin_vars.h, which implements built-in CUDA variables using __declspec(property).
  Fields of built-in variables (except for warpSize) are implemented using __declspec(property), which replaces read/write of a member field with a call to a getter/setter member function, in this case with the appropriate NVPTX builtin.
  Added a test case to check diagnostics on attempts to construct or improperly access a built-in variable.
  Differential Revision: http://reviews.llvm.org/D9064
  llvm-svn: 235398
* Fix addrspace when emitting constructors of static local variables. (Jingyue Wu, 2015-03-25; 1 file, -0/+17)
  Summary: Due to CUDA's implicit address space casting, the type of a static local variable may be more specific (i.e. with address space qualifiers) than the type expected by the constructor. Emit an addrspacecast in that case.
  Test Plan: Clang used to crash on the added test.
  Reviewers: nlewycky, pcc, eliben, rsmith
  Reviewed By: eliben, rsmith
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D8575
  llvm-svn: 233208
* Test case updates for explicit type parameter to the gep operator. (David Blaikie, 2015-03-13; 1 file, -1/+1)
  llvm-svn: 232187
* Update Clang tests to handle explicitly typed load changes in LLVM. (David Blaikie, 2015-02-27; 1 file, -6/+6)
  llvm-svn: 230795