summaryrefslogtreecommitdiffstats
path: root/clang/test/OpenMP/nvptx_target_codegen.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [OPENMP]Fix PR41826: symbols visibility in device code.Alexey Bataev2019-11-251-1/+1
| | | | | | | | | | | | | | | | | | Summary: Currently, we ignore all locality attributes/info when building for the device and thus all symblos are externally visible and can be preemted at the runtime. It may lead to incorrect results. We need to follow the same logic, compiler uses for static/pie builds. But in some cases changing of dso locality may lead to problems with codegen, so instead mark external symbols as hidden instead in the device code. Reviewers: jdoerfert Subscribers: guansong, caomhin, kkwli0, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D70549
* [OPENMP]Use different addresses for zeroed thread_id/bound_id.Alexey Bataev2019-10-161-1/+3
| | | | | | | | When the parallel region is called directly in the sequential region, the zeroed tid/bound id are used. But they must point to the different memory locations as the parameters are marked as noalias. llvm-svn: 375017
* [OPENMP]Use the attributes for dso locality when building for device.Alexey Bataev2019-05-211-1/+1
| | | | | | | | | Currently, we ignore all dso locality attributes/info when building for the device and thus all symblos are externally visible and can be preemted at the runtime. It may lead to incorrect results. We need to follow the same logic, compiler uses for static/pie builds. llvm-svn: 361283
* [OPENMP][NVPTX]Mark more functions as always_inline for betterAlexey Bataev2019-05-211-3/+3
| | | | | | | | | | | performance. Internally generated functions must be marked as always_inlines in most cases. Patch marks some extra reduction function + outlined parallel functions as always_inline for better performance, but only if the optimization is requested. llvm-svn: 361269
* [OPENMP][NVPTX]Run parallel regions with num_threads clauses in SPMDAlexey Bataev2019-04-151-41/+27
| | | | | | | | | | mode. After the previous patch with the more correct handling of the number of threads in parallel regions, the parallel regions with num_threads clauses can be executed in SPMD mode. llvm-svn: 358445
* [OPENMP][NVPTX]Use new functions from the runtime library.Alexey Bataev2019-01-041-1/+1
| | | | | | Updated codegen to use the new functions from the runtime library. llvm-svn: 350415
* [OPENMP][NVPTX]Use __kmpc_barrier_simple_spmd(nullptr, 0) instead ofAlexey Bataev2019-01-031-22/+22
| | | | | | | | | | nvvm_barrier0. Use runtime functions instead of the direct call to the nvvm intrinsics. It allows to prevent some dangerous LLVM optimizations, that breaks the code for the NVPTX target. llvm-svn: 350328
* [OPENMP][NVPTX]Emit shared memory buffer for reduction as 128 bytesAlexey Bataev2018-12-181-1/+1
| | | | | | | | | | | buffer. Seems to me, nvlink has a bug with the proper support of the weakly linked symbols. It does not allow to define several shared memory buffer with the different sizes even with the weak linkage. Instead we always use 128 bytes buffer to prevent nvlink from the error message emission. llvm-svn: 349540
* [OPENMP][NVPTX] Fix globalization of the mapped array sections.Alexey Bataev2018-12-061-19/+52
| | | | | | | | | If the array section is based on pointer and this sections is mapped in target region + then it is used in the inner parallel region, it also must be globalized as the pointer itself is passed by value, not by reference. llvm-svn: 348492
* [OPENMP][NVPTX]Call get __kmpc_global_thread_num in worker afterAlexey Bataev2018-11-291-1/+1
| | | | | | | | | initialization. Function __kmpc_global_thread_num should be called only after initialization, not earlier. llvm-svn: 347919
* [OPENMP][NVPTX]Emit default locations with the correct Exec|RuntimeAlexey Bataev2018-11-261-24/+26
| | | | | | | | | | | modes. If the region is inside target|teams|distribute region, we can emit the locations with the correct info for execution mode and runtime mode. Patch adds this ability to the NVPTX codegen to help the optimizer to produce better code. llvm-svn: 347583
* [OPENMP][NVPTX]Use __kmpc_data_sharing_coalesced_push_stack function.Alexey Bataev2018-11-021-1/+1
| | | | | | | | Coalesced memory access requires use of the new function `__kmpc_data_sharing_coalesced_push_stack` instead of the `__kmpc_data_sharing_push_stack`. llvm-svn: 345991
* [OPENMP][NVPTX]Reduce memory usage in orphaned functions.Alexey Bataev2018-10-121-3/+9
| | | | | | | | | | | | if the function has globalized variables and called in context of target/teams/distribute regions, it does not need to globalize 32 copies of the same variables for memory coalescing, it is enough to have just one copy, because there is parallel region. Patch does this by adding call for `__kmpc_parallel_level` function and checking its return value. If the code sees that the parallel level is 0, then only one variable is allocated, not 32. llvm-svn: 344356
* [OPENMP][NVPTX] Support memory coalescing for globalized variables.Alexey Bataev2018-10-091-5/+10
| | | | | | | | | Added support for memory coalescing for better performance for globalized variables. From now on all the globalized variables are represented as arrays of 32 elements and each thread accesses these elements using `tid & 31` as index. llvm-svn: 344049
* [OpenMP][NVPTX] Simplify codegen for orphaned parallel, NFCI.Jonas Hahnfeld2018-10-021-8/+0
| | | | | | | | | | | Worker threads fork off to the compiler generated worker function directly after entering the kernel function. Hence, there is no need to check whether the current thread is the master if we are outside of a parallel region (neither SPMD nor parallel_level > 0). Differential Revision: https://reviews.llvm.org/D52732 llvm-svn: 343618
* [OPENMP][NVPTX] Add support for lightweight runtime.Alexey Bataev2018-08-291-3/+15
| | | | | | | | If the target construct can be executed in SPMD mode + it is a loop based directive with static scheduling, we can use lightweight runtime support. llvm-svn: 340953
* [OPENMP] ThreadId in serialized parallel regions is 0.Alexey Bataev2018-07-251-2/+2
| | | | | | | | The first argument for the parallel outlined functions, called as serialized parallel regions, should be a pointer to the global thread id that always is 0. llvm-svn: 337957
* [OPENMP, NVPTX] Globalize only captured variables.Alexey Bataev2018-07-161-18/+20
| | | | | | | Sometimes we can try to globalize non-variable declarations, which may lead to compiler crash. llvm-svn: 337191
* [OPENMP, NVPTX] Fixed codegen for orphaned parallel region.Alexey Bataev2018-05-251-9/+4
| | | | | | | | | | | | | | If orphaned parallel region is found, the next code must be emitted: ``` if(__kmpc_is_spmd_exec_mode() || __kmpc_parallel_level(loc, gtid)) Serialized execution. else if (IsMasterThread()) Prepare and signal worker. else Outined function call. ``` llvm-svn: 333301
* [OPENMP, NVPTX] Add check for SPMD mode in orphaned parallel directives.Alexey Bataev2018-05-161-0/+10
| | | | | | | | If the orphaned directive is executed in SPMD mode, we need to emit the check for the SPMD mode and run the orphaned parallel directive in sequential mode. llvm-svn: 332467
* [OPENMP, NVPTX] Do not globalize variables with reference/pointer types.Alexey Bataev2018-05-151-9/+9
| | | | | | | | In generic data-sharing mode we do not need to globalize variables/parameters of reference/pointer types. They already are placed in the global memory. llvm-svn: 332380
* [OPENMP, NVPTX] Added support for L2 parallelism.Alexey Bataev2018-05-071-18/+70
| | | | | | | Added initial codegen for level 2, 3 etc. parallelism. Currently, all the second, the third etc. parallel regions will run sequentially. llvm-svn: 331642
* [OPENMP, NVPTX] Fix codegen for the teams reduction.Alexey Bataev2018-04-061-12/+12
| | | | | | | Added NUW flags for all the add|mul|sub operations + replaced sdiv by udiv as we operate on unsigned values only (addresses, converted to integers) llvm-svn: 329411
* [OpenMP] Adjust arguments of nvptx runtime functionsJonas Hahnfeld2017-11-221-6/+6
| | | | | | | | | | | | | In the future the compiler will analyze whether the OpenMP runtime needs to be (fully) initialized and avoid that overhead if possible. The functions already take an argument to transfer that information to the runtime, so pass in the default value 1. (This is needed for binary compatibility with libomptarget-nvptx currently being upstreamed.) Differential Revision: https://reviews.llvm.org/D40354 llvm-svn: 318836
* [OPENMP] Allow value of thread local variables in target regions.Alexey Bataev2017-05-241-18/+21
| | | | | | | | | If the variable is marked as TLS variable and target device does not support TLS, the error is emitted for the variable even if it is not used in target regions. Patch fixes this and allows to use the values of the TLS variables in target regions. llvm-svn: 303768
* [OpenMP] Codegen for the 'target parallel' directive on the NVPTX device.Arpith Chacko Jacob2017-01-181-12/+20
| | | | | | | | | | | | | | | | | | This patch adds codegen for the 'target parallel' directive on the NVPTX device. We term offload OpenMP directives such as 'target parallel' and 'target teams distribute parallel for' as SPMD constructs. SPMD constructs, in contrast to Generic ones like the plain 'target', can never contain a serial region. SPMD constructs can be handled more efficiently on the GPU and do not require the Warp Loop of the Generic codegen scheme. This patch adds SPMD codegen support for 'target parallel' on the NVPTX device and can be reused for other SPMD constructs. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28755 llvm-svn: 292428
* [OpenMP] Update target codegen for NVPTX device.Arpith Chacko Jacob2017-01-051-161/+193
| | | | | | | | | | | | | | | | | This patch includes updates for codegen of the target region for the NVPTX device. It moves initializers from the compiler to the runtime and updates the worker loop to assume parallel work is retrieved from the runtime. A subsequent patch will update the codegen to retrieve the parallel work using calls to the runtime. It includes the removal of the inline attribute for the worker loop and disabling debug info in it. This allows codegen for a target directive and serial execution on the NVPTX device. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28125 llvm-svn: 291121
* Reverting commit r290983 while debugging test failure on windows.Arpith Chacko Jacob2017-01-041-211/+161
| | | | llvm-svn: 290989
* [OpenMP] Update target codegen for NVPTX device.Arpith Chacko Jacob2017-01-041-161/+211
| | | | | | | | | | | | | | | | | This patch includes updates for codegen of the target region for the NVPTX device. It moves initializers from the compiler to the runtime and updates the worker loop to assume parallel work is retrieved from the runtime. A subsequent patch will update the codegen to retrieve the parallel work using calls to the runtime. It includes the removal of the inline attribute for the worker loop and disabling debug info in it. This allows codegen for a target directive and serial execution on the NVPTX device. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28125 llvm-svn: 290983
* [OPENMP] Fix for PR31417: assert failure when compiling trivial openmpAlexey Bataev2016-12-221-18/+19
| | | | | | | | | | program Offload related code is not quite ready yet, but some simple examples must not crash the compiler. Patch fixes the problem in offloading code with exceptions. llvm-svn: 290364
* [OpenMP] Use fopenmp prefix for all options introduced by the offloading ↵Samuel Antao2016-06-301-4/+4
| | | | | | | | | | | | | | implementation. Summary: This patch changes the options used by offloading to start with -fopenmp instead of -fomp. This makes the option naming more consistent and materializes a suggestion by Richard Smith in http://reviews.llvm.org/D9888. Reviewers: hfinkel, carlo.bertolli, arpith-jacob, ABataev Subscribers: kkwli0, cfe-commits, caomhin Differential Revision: http://reviews.llvm.org/D21841 llvm-svn: 274283
* [OpenMP] Base support for target directive codegen on NVPTX device.Arpith Chacko Jacob2016-03-221-0/+581
| | | | | | | | | | | | | | Summary: This patch adds base support for codegen of the target directive on the NVPTX device. Reviewers: ABataev Differential Revision: http://reviews.llvm.org/D17877 Reworked test case after buildbot failure on windows. Updated patch to integrate r263837 and test case nvptx_target_firstprivate_codegen.cpp. llvm-svn: 264018
* Revert r263783 as buildbot failure is being investigated.Arpith Chacko Jacob2016-03-181-587/+0
| | | | llvm-svn: 263784
* [OpenMP] Base support for target directive codegen on NVPTX device.Arpith Chacko Jacob2016-03-181-0/+587
| | | | | | | | | | | | | Summary: Reworked test case after buildbot failure on windows. This patch adds base support for codegen of the target directive on the NVPTX device. Reviewers: ABataev Differential Revision: http://reviews.llvm.org/D17877 llvm-svn: 263783
* Revert commit http://reviews.llvm.org/D17877 to fix tests on x86.Arpith Chacko Jacob2016-03-151-587/+0
| | | | llvm-svn: 263589
* [OpenMP] Base support for target directive codegen on NVPTX device.Arpith Chacko Jacob2016-03-151-0/+587
| | | | | | | | | | | Summary: This patch adds base support for codegen of the target directive on the NVPTX device. Reviewers: ABataev Differential Revision: http://reviews.llvm.org/D17877 llvm-svn: 263587
* Reverted http://reviews.llvm.org/D17877 to fix tests.Arpith Chacko Jacob2016-03-151-587/+0
| | | | llvm-svn: 263555
* [OpenMP] Base support for target directive codegen on NVPTX device.Arpith Chacko Jacob2016-03-151-0/+587
Summary: This patch adds base support for codegen of the target directive on the NVPTX device. Reviewers: ABataev Differential Revision: http://reviews.llvm.org/D17877 llvm-svn: 263552
OpenPOWER on IntegriCloud