summaryrefslogtreecommitdiffstats
path: root/clang/test/OpenMP/nvptx_parallel_codegen.cpp
Commit message (Collapse)AuthorAgeFilesLines
* OpenMP: Add convergent to more runtime functionsMatt Arsenault2019-10-271-6/+10
| | | | | Several of these other functions are probably also convergent, but these two seem obviously convergent.
* [OPENMP][NVPTX]Fix critical region codegen.Alexey Bataev2019-08-261-1/+2
| | | | | | | | | | | | | | | | | Summary: Previously critical regions were emitted with the barrier making it a worksharing construct though it is not. Also, it leads to incorrect behavior in Cuda9+. Patch fixes this problem. Reviewers: ABataev, jdoerfert Subscribers: jholewinski, guansong, cfe-commits, grokos Tags: #clang Differential Revision: https://reviews.llvm.org/D66673 llvm-svn: 369946
* [OPENMP][NVPTX]Mark barrier functions calls as convergent.Alexey Bataev2019-07-181-4/+4
| | | | | | | Added convergent attribute to the barrier functions calls for correct optimizations. llvm-svn: 366437
* [OPENMP][NVPTX]Mark more functions as always_inline for betterAlexey Bataev2019-05-211-3/+3
| | | | | | | | | | | performance. Internally generated functions must be marked as always_inlines in most cases. Patch marks some extra reduction function + outlined parallel functions as always_inline for better performance, but only if the optimization is requested. llvm-svn: 361269
* [OPENMP][NVPTX]Use new functions from the runtime library.Alexey Bataev2019-01-041-2/+2
| | | | | | Updated codegen to use the new functions from the runtime library. llvm-svn: 350415
* [OPENMP][NVPTX]Use __kmpc_barrier_simple_spmd(nullptr, 0) instead ofAlexey Bataev2019-01-031-12/+12
| | | | | | | | | | nvvm_barrier0. Use runtime functions instead of the direct call to the nvvm intrinsics. It allows to prevent some dangerous LLVM optimizations, that breaks the code for the NVPTX target. llvm-svn: 350328
* [OPENMP][NVPTX]Emit shared memory buffer for reduction as 128 bytesAlexey Bataev2018-12-181-1/+1
| | | | | | | | | | | buffer. Seems to me, nvlink has a bug with the proper support of the weakly linked symbols. It does not allow to define several shared memory buffer with the different sizes even with the weak linkage. Instead we always use 128 bytes buffer to prevent nvlink from the error message emission. llvm-svn: 349540
* [OPENMP][NVPTX]Fixed emission of the critical region.Alexey Bataev2018-12-041-1/+7
| | | | | | | | | | | | | Critical regions in NVPTX are the constructs, which, generally speaking, are not supported by the NVPTX target. Instead we're using special technique to handle the critical regions. Currently they are supported only within the loop and all the threads in the loop must execute the same critical region. Inside of this special regions the regions still must be emitted as critical, to avoid possible data races between the teams + synchronization must use __kmpc_barrier functions. llvm-svn: 348272
* [OPENMP][NVPTX]Mark __kmpc_barrier functions as convergent.Alexey Bataev2018-12-041-2/+8
| | | | | | | | __kmpc_barrier runtime functions must be marked as convergent to prevent some dangerous optimizations. Also, for NVPTX target all barriers must be emitted as simple barriers. llvm-svn: 348271
* [OPENMP][NVPTX]Emit correct reduction code for teams/parallelAlexey Bataev2018-11-161-1/+1
| | | | | | | | | | | reductions. Fixed previously committed code for the reduction support in teams/parallel constructs taking into account new design of the NVPTX support in the compiler. Teams reduction are not fully functional yet, it is going to be fixed in the following patches. llvm-svn: 347081
* [OPENMP][NVPTX]Allow to use shared memory for theAlexey Bataev2018-11-091-5/+6
| | | | | | | | | | target|teams|distribute variables. If the total size of the variables, declared in target|teams|distribute regions, is less than the maximal size of shared memory available, the buffer is allocated in the shared memory. llvm-svn: 346507
* [OPENMP][NVPTX]Improve emission of the globalized variables forAlexey Bataev2018-11-021-1/+12
| | | | | | | | | | | | | | | | | | | target/teams/distribute regions. Target/teams/distribute regions exist for all the time the kernel is executed. Thus, if the variable is declared in their context and then escape it, we can allocate global memory statically instead of allocating it dynamically. Patch captures all the globalized variables in target/teams/distribute contexts, merges them into the records, one per each target region. Those records are then joined into the union, one per compilation unit (to save the global memory). Those units are organized into 2 x dimensional arrays, where the first dimension is the number of blocks per SM and the second one is the number of SMs. Runtime functions manage this global memory space between the executing teams. llvm-svn: 345978
* [OPENMP][NVPTX]Reduce memory usage in target region.Alexey Bataev2018-10-121-5/+2
| | | | | | | Additional reduction of the global memory usage in the target regions without parallel regions. llvm-svn: 344413
* [OPENMP][NVPTX] Support memory coalescing for globalized variables.Alexey Bataev2018-10-091-2/+5
| | | | | | | | | Added support for memory coalescing for better performance for globalized variables. From now on all the globalized variables are represented as arrays of 32 elements and each thread accesses these elements using `tid & 31` as index. llvm-svn: 344049
* [OPENMP, NVPTX] Fix globalization of the variables passed to orphanedAlexey Bataev2018-06-211-0/+9
| | | | | | | | | | parallel region. If the current construct requires sharing of the local variable in the inner parallel region, this variable must be globalized to avoid runtime crash. llvm-svn: 335285
* [OPENMP, NVPTX] Initial support for L2 parallelism in SPMD mode.Alexey Bataev2018-05-101-0/+1
| | | | | | | | Added initial support for L2 parallelism in SPMD mode. Note, though, that the orphaned parallel directives are not currently supported in SPMD mode. llvm-svn: 332016
* [OPENMP, NVPTX] Small test fix, NFC.Alexey Bataev2018-05-071-2/+2
| | | | llvm-svn: 331654
* [OPENMP, NVPTX] Codegen for critical construct.Alexey Bataev2018-05-071-2/+24
| | | | | | Added correct codegen for the critical construct on NVPTX devices. llvm-svn: 331652
* [OPENMP] Emit names of the globals depending on target.Alexey Bataev2018-05-021-1/+11
| | | | | | | | Some symbols are not allowed to be used as names on some targets. Patch ries to unify the emission of the names of LLVM globals so they could be used on different targets. llvm-svn: 331358
* [OPENMP, NVPTX] Fix codegen for the teams reduction.Alexey Bataev2018-04-061-4/+4
| | | | | | | Added NUW flags for all the add|mul|sub operations + replaced sdiv by udiv as we operate on unsigned values only (addresses, converted to integers) llvm-svn: 329411
* [OpenMP] Add OpenMP data sharing infrastructure using global memoryGheorghe-Teodor Bercea2018-03-141-243/+232
| | | | | | | | | | | | | | | | | Summary: This patch handles the Clang code generation phase for the OpenMP data sharing infrastructure. TODO: add a more detailed description. Reviewers: ABataev, carlo.bertolli, caomhin, hfinkel, Hahnfeld Reviewed By: ABataev Subscribers: jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D43660 llvm-svn: 327513
* [OpenMP] Remove implicit data sharing code gen that aims to use device ↵Gheorghe-Teodor Bercea2018-03-071-11/+11
| | | | | | | | | | | | | | | | shared memory Summary: Remove this scheme for now since it will be covered by another more generic scheme using global memory. This code will be worked into an optimization for the generic data sharing scheme. Removing this completely and then adding it via future patches will make all future data sharing patches cleaner. Reviewers: ABataev, carlo.bertolli, caomhin Reviewed By: ABataev Subscribers: jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D43625 llvm-svn: 326948
* [OpenMP] Adjust arguments of nvptx runtime functionsJonas Hahnfeld2017-11-221-2/+2
| | | | | | | | | | | | | In the future the compiler will analyze whether the OpenMP runtime needs to be (fully) initialized and avoid that overhead if possible. The functions already take an argument to transfer that information to the runtime, so pass in the default value 1. (This is needed for binary compatibility with libomptarget-nvptx currently being upstreamed.) Differential Revision: https://reviews.llvm.org/D40354 llvm-svn: 318836
* [OpenMP] Add implicit data sharing support when offloading to NVIDIA GPUs ↵Gheorghe-Teodor Bercea2017-11-211-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | using OpenMP device offloading Summary: This patch is part of the development effort to add support in the current OpenMP GPU offloading implementation for implicitly sharing variables between a target region executed by the team master thread and the worker threads within that team. This patch is the first of three required for successfully performing the implicit sharing of master thread variables with the worker threads within a team. The remaining two patches are: - Patch D38978 to the LLVM NVPTX backend which ensures the lowering of shared variables to an device memory which allows the sharing of references; - Patch (coming soon) is a patch to libomptarget runtime library which ensures that a list of references to shared variables is properly maintained. A simple code snippet which illustrates an implicit data sharing situation is as follows: ``` #pragma omp target { // master thread only int v; #pragma omp parallel { // worker threads // use v } } ``` Variable v is implicitly shared from the team master thread which executes the code in between the target and parallel directives. The worker threads must operate on the latest version of v, including any updates performed by the master. The code generated in this patch relies on the LLVM NVPTX patch (mentioned above) which prevents v from being lowered in the thread local memory of the master thread thus making the reference to this variable un-shareable with the workers. This ensures that the code generated by this patch is correct. Since the parallel region is outlined the passing of arguments to the outlined regions must preserve the original order of arguments. The runtime therefore maintains a list of references to shared variables thus ensuring their passing in the correct order. The passing of arguments to the outlined parallel function is performed in a separate function which the data sharing infrastructure constructs in this patch. The function is inlined when optimizations are enabled. Reviewers: hfinkel, carlo.bertolli, arpith-jacob, Hahnfeld, ABataev, caomhin Reviewed By: ABataev Subscribers: cfe-commits, jholewinski Differential Revision: https://reviews.llvm.org/D38976 llvm-svn: 318773
* [OpenMP] Basic support for a parallel directive in a target region on an ↵Arpith Chacko Jacob2017-01-101-0/+317
NVPTX device Summary: This patch introduces support for the execution of parallel constructs in a target region on the NVPTX device. Parallel regions must be in the lexical scope of the target directive. The master thread in the master warp signals parallel work for worker threads in worker warps on encountering a parallel region. Note: The patch does not yet support capture of arguments in a parallel region so the test cases are simple. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28145 llvm-svn: 291565
OpenPOWER on IntegriCloud