summaryrefslogtreecommitdiffstats
path: root/clang/test/CodeGenCUDA
Commit message (Collapse)AuthorAgeFilesLines
* [CUDA] Make sure device-side __global__ functions are always visible.Artem Belevich2016-05-021-2/+11
| | | | | | | | | | | | __global__ functions are a special case in CUDA. Even when the symbol would normally not be externally visible according to C++ rules, they still must be visible in CUDA GPU object so host-side stub can launch them. Differential Revision: http://reviews.llvm.org/D19748 llvm-svn: 268299
* [CUDA] Add -fcuda-flush-denormals-to-zero.Justin Lebar2016-04-051-0/+23
| | | | | | | | | | | | | | | | | | Summary: Setting this flag causes all functions are annotated with the "nvvm-f32ftz" = "true" attribute. In addition, we annotate the module with "nvvm-reflect-ftz" set to 0 or 1, depending on whether -cuda-flush-denormals-to-zero is set. This is read by the NVVMReflect pass. Reviewers: tra, rnk Subscribers: cfe-commits Differential Revision: http://reviews.llvm.org/D18671 llvm-svn: 265435
* [CUDA] Add -disable-llvm-passes to CodeGenCUDA/link-device-bitcode.cu. NFCJustin Lebar2016-03-301-4/+4
| | | | | | | | We already have this flag in most of the file, but we need it everywhere else, to disable the NVVMReflect pass, which we're explicitly checking doesn't run here. (Upcoming changes to llvm will cause it to be run.) llvm-svn: 264969
* [CUDA] Remove three obsolete CUDA cc1 flags.Justin Lebar2016-03-292-18/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: * -fcuda-target-overloads Previously unconditionally set to true by the driver. Necessary for correct functioning of the compiler -- our CUDA headers wrapper won't compile without this. * -fcuda-disable-target-call-checks Previously unconditionally set to true by the driver. Necessary to compile almost any external CUDA code -- almost all libraries assume that host+device code can call host or device functions. * -fcuda-allow-host-calls-from-host-device No effect when target overloading is enabled. Reviewers: tra Subscribers: rsmith, cfe-commits Differential Revision: http://reviews.llvm.org/D18416 llvm-svn: 264739
* [CUDA] Merge most of CodeGenCUDA/function-overload.cu into ↵Justin Lebar2016-03-231-230/+4
| | | | | | | | | | | | | | | | | | | | SemaCUDA/function-overload.cu. Summary: Previously we were using the codegen test to ensure that we choose the right overload. But we can do this within sema, with a bit of cleverness. I left the constructor/destructor checks in CodeGen, because these overloads (particularly on the destructors) are hard to check in Sema. Reviewers: tra Subscribers: cfe-commits Differential Revision: http://reviews.llvm.org/D18386 llvm-svn: 264207
* Fixed test failure platforms with name mangling different from Linux.Artem Belevich2016-03-021-3/+4
| | | | | | | | * Run cc with -triple x86_64-linux-gnu to make symbol mangling predictable. * Use temporary file as a fake GPU input so its content does not interfere with pattern matching. llvm-svn: 262516
* [CUDA] Do not generate unnecessary runtime init code.Artem Belevich2016-03-021-0/+20
| | | | | | Differential Revision: http://reviews.llvm.org/D17780 llvm-svn: 262499
* [CUDA] Emit host-side 'shadows' for device-side global variablesArtem Belevich2016-03-022-7/+46
| | | | | | | | | | | | | ... and register them with CUDA runtime. This is needed for commonly used cudaMemcpy*() APIs that use address of host-side shadow to access their counterparts on device side. Fixes PR26340 Differential Revision: http://reviews.llvm.org/D17779 llvm-svn: 262498
* [CUDA] Mark all CUDA device-side function defs, decls, and calls as convergent.Justin Lebar2016-02-242-1/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is important for e.g. the following case: void sync() { __syncthreads(); } void foo() { do_something(); sync(); do_something_else(): } Without this change, if the optimizer does not inline sync() (which it won't because __syncthreads is also marked as noduplicate, for now anyway), it is free to perform optimizations on sync() that it would not be able to perform on __syncthreads(), because sync() is not marked as convergent. Similarly, we need a notion of convergent calls, since in the case when we can't statically determine a call's target(s), we need to know whether it's safe to perform optimizations around the call. This change is conservative; the optimizer will remove these attrs where it can, see r260318, r260319. Reviewers: majnemer Subscribers: cfe-commits, jhen, echristo, tra Differential Revision: http://reviews.llvm.org/D17056 llvm-svn: 261779
* [CUDA] Tweak attribute-based overload resolution to match nvcc behavior.Artem Belevich2016-02-121-17/+106
| | | | | | | | | | | | | | | | | | | | | | | | | | This is an artefact of split-mode CUDA compilation that we need to mimic. HD functions are sometimes allowed to call H or D functions. Due to split compilation mode device-side compilation will not see host-only function and thus they will not be considered at all. For clang both H and D variants will become function overloads visible to compiler. Normally target attribute is considered only if C++ rules can not determine which function is better. However in this case we need to ignore functions that would not be present during current compilation phase before we apply normal overload resolution rules. Changes: * introduced another level of call preference to better describe possible call combinations. * removed WrongSide functions from consideration if the set contains SameSide function. * disabled H->D, D->H and G->H calls. These combinations are not allowed by CUDA and we were reluctantly allowing them to work around device-side calls to math functions in std namespace. We no longer need it after r258880. Differential Revision: http://reviews.llvm.org/D16870 llvm-svn: 260697
* [CUDA] Don't crash when trying to printf a non-scalar object.Justin Lebar2016-02-111-0/+17
| | | | | | | | | | | | | | Summary: We can't do the right thing, since there's no right thing to do, but at least we can not crash the compiler. Reviewers: majnemer, rnk Subscribers: cfe-commits, jhen, tra Differential Revision: http://reviews.llvm.org/D17103 llvm-svn: 260479
* [CUDA] Do not allow dynamic initialization of global device side variables.Artem Belevich2016-02-021-0/+393
| | | | | | | | | | | | | | In general CUDA does not allow dynamic initialization of global device-side variables. One exception is that CUDA allows records with empty constructors as described in section E2.2.1 of CUDA 7.5 Programming guide. This patch applies initializer checks for all device-side variables. Empty constructors are accepted, but no code is generated for them. Differential Revision: http://reviews.llvm.org/D15305 llvm-svn: 259592
* [CUDA] Generate CUDA's printf alloca in its function's entry block.Justin Lebar2016-01-281-33/+23
| | | | | | | | | | | | | | | Summary: This is necessary to prevent llvm from generating stacksave intrinsics around this alloca. NVVM doesn't have a stack, and we don't handle said intrinsics. Reviewers: rnk, echristo Subscribers: cfe-commits, jhen, tra Differential Revision: http://reviews.llvm.org/D16664 llvm-svn: 259122
* [CUDA] Don't generate aliases for static extern "C" functions.Justin Lebar2016-01-251-0/+17
| | | | | | | | | | | | | | Summary: These aliases are done to support inline asm, but there's nothing we can do: NVPTX doesn't support aliases. Reviewers: tra Subscribers: cfe-commits, jhen, echristo Differential Revision: http://reviews.llvm.org/D16501 llvm-svn: 258734
* [CUDA] Make printf work.Justin Lebar2016-01-232-0/+55
| | | | | | | | | | | | | | | | | | Summary: The code in CGCUDACall is largely based on a patch written by Eli Bendersky: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140324/210218.html That patch implemented an LLVM pass lowering printf to vprintf; this one does something similar, but in Clang codegen. Reviewers: echristo Subscribers: cfe-commits, jhen, tra, majnemer Differential Revision: http://reviews.llvm.org/D16372 llvm-svn: 258642
* [CUDA] Make vtable construction aware of host/device side of CUDA compilation.Artem Belevich2015-12-171-0/+61
| | | | | | | | | | | | | | | | | | | C++ emits vtables for classes that have key function present in the current TU. While we compile CUDA the fact that key function was found in this TU does not mean that we are going to generate code for it. E.g. vtable for a class with host-only methods should not (and can not) be generated on device side, because we'll never generate code for them during device-side compilation. This patch adds an extra CUDA-specific check during key method computation and filters out potential key methods that are not suitable for this side of CUDA compilation. When we codegen vtable, entries for unsuitable methods are set to null. Differential Revision: http://reviews.llvm.org/D15309 llvm-svn: 255911
* Allow linking multiple bitcode files.Artem Belevich2015-10-272-2/+32
| | | | | | | | | | | | | | | | | | Linking options for particular file depend on the option that specifies the file. Currently there are two: * -mlink-bitcode-file links in complete content of the specified file. * -mlink-cuda-bitcode links in only the symbols needed by current TU. Linked symbols are internalized. This bitcode linking mode is used to link device-specific bitcode provided by CUDA. Files are linked in order they are specified on command line. -mlink-cuda-bitcode replaces -fcuda-uses-libdevice flag. Differential Revision: http://reviews.llvm.org/D13913 llvm-svn: 251427
* [CUDA] __global__ functions should always be visible externally.Artem Belevich2015-09-231-6/+1
| | | | | | | | | Adjust __global__ functions with DiscardableODR linkage to use StrongODR linkage instead, so they are visible externally. Differential Revision: http://reviews.llvm.org/D13067 llvm-svn: 248400
* [CUDA] Allow function overloads in CUDA based on host/device attributes.Artem Belevich2015-09-221-0/+214
| | | | | | | | | | | | | | | | | | | The patch makes it possible to parse CUDA files that contain host/device functions with identical signatures, but different attributes without having to physically split source into host-only and device-only parts. This change is needed in order to parse CUDA header files that have a lot of name clashes with standard include files. Gory details are in design doc here: https://goo.gl/EXnymm Feel free to leave comments there or in this review thread. This feature is controlled with CC1 option -fcuda-target-overloads and is disabled by default. Differential Revision: http://reviews.llvm.org/D12453 llvm-svn: 248295
* [CUDA] Add implicit __attribute__((used)) to all __global__ functions.Artem Belevich2015-09-221-0/+15
| | | | | | | | | | This makes sure that we emit kernels that were instantiated from the host code and which would never be explicitly referenced by anything else on device side. Differential Revision: http://reviews.llvm.org/D11666 llvm-svn: 248293
* [CUDA] Postprocess bitcode linked in during device-side CUDA compilation.Artem Belevich2015-09-102-0/+94
| | | | | | | | Link in and internalize the symbols we need from supplied bitcode library. Differential Revision: http://reviews.llvm.org/D11664 llvm-svn: 247317
* [CUDA] Allow trivial constructors as initializer for __shared__ variables.Artem Belevich2015-09-101-0/+14
| | | | | | Differential Revision: http://reviews.llvm.org/D12739 llvm-svn: 247307
* [CUDA] Change initializer for CUDA device code based on CUDA documentation.Jingyue Wu2015-08-222-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: According to CUDA documentation, global variables declared with __device__, __constant__ can be initialized from host code, so mark them as externally initialized. Because __shared__ variables cannot have an initialization as part of their declaration and since the value maybe kept across different kernel invocation, the value of __shared__ is effectively undefined instead of zero initialized. Wrongly using zero initializer may cause illegitimate optimization, e.g. removing unused __constant__ variable because it's not updated in the device code and the value is initialized with zero. Test Plan: test/CodeGenCUDA/address-spaces.cu Patch by Xuetian Weng Reviewers: jholewinski, eliben, tra, jingyue Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12241 llvm-svn: 245786
* Revert "[CUDA] Add implicit __attribute__((used)) to all __global__ functions."Daniel Jasper2015-08-111-15/+0
| | | | | | This is breaking internal test. I'll provide a reproduction. llvm-svn: 244583
* [CUDA] Add implicit __attribute__((used)) to all __global__ functions.Artem Belevich2015-08-101-0/+15
| | | | | | | | | This allows emitting kernels that were instantiated from the host code and which would never be explicitly referenced otherwise. Differential Revision: http://reviews.llvm.org/D11666 llvm-svn: 244501
* [cuda] Fixed test case failure on s390xArtem Belevich2015-05-111-2/+2
| | | | llvm-svn: 237007
* Fixed test failure on machines with 32-bit size_t.Artem Belevich2015-05-071-1/+1
| | | | llvm-svn: 236773
* [cuda] Include GPU binary into host object file and generate init/deinit code.Artem Belevich2015-05-071-1/+40
| | | | | | | | | | | | - added -fcuda-include-gpubinary option to incorporate results of device-side compilation into host-side one. - generate code to register GPU binaries and associated kernels with CUDA runtime and clean-up on exit. - added test case for init/deinit code generation. Differential Revision: http://reviews.llvm.org/D9507 llvm-svn: 236765
* [cuda] treat file scope __asm as __host__ and ignore it during device-side ↵Artem Belevich2015-04-271-0/+6
| | | | | | | | | | | | | | | | compilation. Currently clang emits file-scope asm during *both* host and device compilation modes which is usually a wrong thing to do. There's no way to attach any attribute to an __asm statement, so there's no way to differentiate between host-side and device-side file-scope asm. This patch makes clang to match nvcc behavior and emit file-scope-asm only during host-side compilation. Differential Revision: http://reviews.llvm.org/D9270 llvm-svn: 235905
* [cuda] Allow using integral non-type template parameters as launch_bounds ↵Artem Belevich2015-04-211-0/+51
| | | | | | | | | | | | | | | | attribute arguments. - Changed CUDALaunchBounds arguments from integers to Expr* so they can be saved in AST for instantiation. - Added support for template instantiation of launch_bounds attrubute. - Moved evaluation of launch_bounds arguments to NVPTXTargetCodeGenInfo:: SetTargetAttributes() where it can be done after template instantiation. - Added a warning on negative launch_bounds arguments. - Amended test cases. Differential Revision: http://reviews.llvm.org/D8985 llvm-svn: 235452
* [cuda] Added support for CUDA built-in variables.Artem Belevich2015-04-211-0/+28
| | | | | | | | | | | | | | | | | Added cuda_builtin_vars.h which implements built-in CUDA variables using __declattr(property). Fields of built-in variables (except for warpSize) are implemented using __declattr(property) which replaces read/write of a member field with a call to a getter/setter member function, in this case with appropriate NVPTX builtin. Added a test case to check diagnostics on attempt to construct or improperly access a built-in variable. Differential Revision: http://reviews.llvm.org/D9064 llvm-svn: 235448
* Revert r235398 "[cuda] Added support for CUDA built-in variables."Artem Belevich2015-04-211-28/+0
| | | | | | r235398 was causing buildbot break due to missing Makefile changes. llvm-svn: 235401
* [cuda] Added support for CUDA built-in variables.Artem Belevich2015-04-211-0/+28
| | | | | | | | | | | | | | | | | Added cuda_builtin_vars.h which implements built-in CUDA variables using __declattr(property). Fields of built-in variables (except for warpSize) are implemented using __declattr(property) which replaces read/write of a member field with a call to a getter/setter member function, in this case with appropriate NVPTX builtin. Added a test case to check diagnostics on attempt to construct or improperly access a built-in variable. Differential Revision: http://reviews.llvm.org/D9064 llvm-svn: 235398
* Fix addrspace when emitting constructors of static local variablesJingyue Wu2015-03-251-0/+17
| | | | | | | | | | | | | | | | | | | | Summary: Due to CUDA's implicit address space casting, the type of a static local variable may be more specific (i.e. with address space qualifiers) than the type expected by the constructor. Emit an addrspacecast in that case. Test Plan: Clang used to crash on the added test. Reviewers: nlewycky, pcc, eliben, rsmith Reviewed By: eliben, rsmith Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8575 llvm-svn: 233208
* Test case updates for explicit type parameter to the gep operatorDavid Blaikie2015-03-131-1/+1
| | | | llvm-svn: 232187
* Update Clang tests to handle explicitly typed load changes in LLVM.David Blaikie2015-02-271-6/+6
| | | | llvm-svn: 230795
* CUDA: Add option to allow host device functions to call host functionsJacques Pienaar2015-02-241-0/+32
| | | | | | Commiting code from review http://reviews.llvm.org/D7841 llvm-svn: 230385
* When generating llvm.used, we may need an addrspacecast instead of a bitcast.Justin Holewinski2015-02-021-0/+8
| | | | | | | | | | | | | | | | | | Summary: This is especially important for targets that use multiple address spaces, and commonly place global variables in address spaces other than zero. Fixes PR22383 Test Plan: New test case added: llvm-used.cu Reviewers: jingyue Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7345 llvm-svn: 227861
* IR: Make metadata typeless in assembly, clang sideDuncan P. N. Exon Smith2014-12-152-4/+4
| | | | | | Match LLVM changes from r224257. llvm-svn: 224259
* Move all CUDA testing inputs to Inputs/ subdirectory inside the tests.Eli Bendersky2014-04-287-6/+26
| | | | llvm-svn: 207453
* Add test case for r206302Eli Bendersky2014-04-151-0/+30
| | | | llvm-svn: 206303
* Proper handling of static local variables with address space qualifiers.Eli Bendersky2014-03-241-3/+69
| | | | | | | | Similar to the implementation for globals in r157167. Patch by Jingyue Wu. llvm-svn: 204677
* Remove the -cxx-abi command-line flag.Hans Wennborg2014-01-141-2/+2
| | | | | | | | | | | | | | | This makes the C++ ABI depend entirely on the target: MS ABI for -win32 triples, Itanium otherwise. It's no longer possible to do weird combinations. To be able to run a test with a specific ABI without constraining it to a specific triple, new substitutions are added to lit: %itanium_abi_triple and %ms_abi_triple can be used to get the current target triple adjusted to the desired ABI. For example, if the test suite is running with the i686-pc-win32 target, %itanium_abi_triple will expand to i686-pc-mingw32. Differential Revision: http://llvm-reviews.chandlerc.com/D2545 llvm-svn: 199250
* Update tests in preparation for using the MS ABI for Win32 targetsHans Wennborg2014-01-131-2/+2
| | | | | | | | | | In preparation for making the Win32 triple imply MS ABI mode, make all tests pass in this mode, or make them use the Itanium mode explicitly. Differential Revision: http://llvm-reviews.chandlerc.com/D2401 llvm-svn: 199130
* Fix test failures after addrspacecast added.Matt Arsenault2013-11-151-3/+3
| | | | | | Bitcasts between address spaces are no longer allowed. llvm-svn: 194765
* CHECK-LABEL-ify some code gen tests to improve diagnostic experience when ↵Stephen Lin2013-08-151-2/+2
| | | | | | tests fail. llvm-svn: 188447
* Use kernel metadata to differentiate between kernel and deviceJustin Holewinski2013-03-301-3/+7
| | | | | | functions for the NVPTX target. llvm-svn: 178418
* CUDA: give static storage class to __shared__ and __constant__Peter Collingbourne2012-08-281-0/+8
| | | | | | | | variables without a storage class within a function, to implement CUDA B.2.5: "__shared__ and __constant__ variables have implied static storage [duration]." llvm-svn: 162788
* CUDA: give correct address space to globals declared in functionsPeter Collingbourne2012-08-281-0/+4
| | | | llvm-svn: 162787
* Replace PTX back-end with NVPTX back-end in all places where Clang caresJustin Holewinski2012-05-242-8/+8
| | | | | | NV_CONTRIB llvm-svn: 157403
OpenPOWER on IntegriCloud