path: root/clang/test/OpenMP/nvptx_data_sharing.cpp
* [OPENMP][NVPTX] Mark more functions as always_inline for better performance (Alexey Bataev, 2019-05-21; 1 file, -1/+1)
  Internally generated functions must be marked as always_inline in most cases. This patch marks some extra reduction functions and outlined parallel functions as always_inline for better performance, but only when optimization is requested. llvm-svn: 361269
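  As an aside, a minimal sketch of the idea in source form; the real attribute is attached by CodeGen to compiler-generated functions, and the name here is hypothetical:

  ```
  // Hypothetical analog of an internally generated reduction helper, marked
  // always_inline so that, when optimization is requested, the backend folds
  // it into its caller instead of paying a call on the GPU.
  __attribute__((always_inline)) static void
  __omp_outlined_reduce(int *lhs, const int *rhs) {
    *lhs += *rhs; // combine two partial reduction results
  }
  ```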
* [OPENMP][NVPTX] Use new functions from the runtime library (Alexey Bataev, 2019-01-04; 1 file, -2/+2)
  Updated codegen to use the new functions from the runtime library. llvm-svn: 350415
* [OPENMP][NVPTX] Use __kmpc_barrier_simple_spmd(nullptr, 0) instead of nvvm_barrier0 (Alexey Bataev, 2019-01-03; 1 file, -5/+5)
  Use runtime functions instead of direct calls to the nvvm intrinsics. This prevents some dangerous LLVM optimizations that break the code for the NVPTX target. llvm-svn: 350328
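  For reference, a sketch of the substitution; the signature is assumed from the libomptarget NVPTX runtime of this period, with ident_t treated as opaque:

  ```
  struct ident_t; // source-location descriptor, opaque here

  extern "C" void __kmpc_barrier_simple_spmd(ident_t *loc, int tid);

  void synchronize_team() {
    // Before: __nvvm_barrier0();           // raw intrinsic, transparent to LLVM
    __kmpc_barrier_simple_spmd(nullptr, 0); // opaque runtime call: optimizers
                                            // cannot move code across it
  }
  ```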
* [OPENMP][NVPTX] Emit shared memory buffer for reduction as a 128-byte buffer (Alexey Bataev, 2018-12-18; 1 file, -1/+1)
  nvlink appears to have a bug in its support for weakly linked symbols: it does not allow defining several shared memory buffers with different sizes, even with weak linkage. Instead, we always use a 128-byte buffer to keep nvlink from emitting an error. llvm-svn: 349540
* [OPENMP][NVPTX] Emit correct reduction code for teams/parallel reductions (Alexey Bataev, 2018-11-16; 1 file, -1/+1)
  Fixed previously committed code for reduction support in teams/parallel constructs, taking into account the new design of the NVPTX support in the compiler. Teams reductions are not fully functional yet; this will be fixed in the following patches. llvm-svn: 347081
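  For reference, a minimal example of the kind of construct this codegen serves (illustrative, not taken from the patch):

  ```
  #include <cstdio>

  int main() {
    int sum = 0;
    // Combined teams + parallel reduction offloaded to the device.
  #pragma omp target teams distribute parallel for reduction(+ : sum) \
      map(tofrom : sum)
    for (int i = 0; i < 1000; ++i)
      sum += i;
    std::printf("sum = %d\n", sum); // expect 499500
  }
  ```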
* [OPENMP][NVPTX] Allow use of shared memory for target|teams|distribute variables (Alexey Bataev, 2018-11-09; 1 file, -5/+6)
  If the total size of the variables declared in target|teams|distribute regions is less than the maximum available shared memory, the buffer is allocated in shared memory. llvm-svn: 346507
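  A minimal example of the situation this targets (illustrative only): v is declared in the teams region and escapes into the parallel region, so it is globalized; when the combined size of such variables fits, the buffer can now live in shared memory instead of global memory.

  ```
  #include <cstdio>

  int main() {
    int result = 0;
  #pragma omp target teams num_teams(1) map(tofrom : result)
    {
      int v = 0; // declared in the teams region; escapes into the nested
                 // parallel region, so it must be globalized; small totals
                 // like this can be placed in shared memory
  #pragma omp parallel
      {
  #pragma omp atomic
        v += 1;
      }
      result = v;
    }
    std::printf("result = %d\n", result);
  }
  ```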
* [OPENMP][NVPTX] Improve emission of the globalized variables for target/teams/distribute regions (Alexey Bataev, 2018-11-02; 1 file, -2/+11)
  Target/teams/distribute regions exist for the whole time the kernel is executed. Thus, if a variable is declared in their context and then escapes it, we can allocate the global memory statically instead of dynamically. The patch captures all the globalized variables in target/teams/distribute contexts and merges them into records, one per target region. Those records are then joined into a union, one per compilation unit (to save global memory). Those unions are organized into two-dimensional arrays, where the first dimension is the number of blocks per SM and the second is the number of SMs. Runtime functions manage this global memory space between the executing teams. llvm-svn: 345978
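  A rough C++ picture of that layout; all names and bounds are illustrative, not the generated IR:

  ```
  // One record per target region: all globalized variables of that region.
  struct KernelARecord { int v; double buf[8]; };
  struct KernelBRecord { float x[16]; };

  // One union per compilation unit: the records overlay each other, since a
  // slot serves only one kernel at a time, which saves global memory.
  union GlobalizedRecords {
    KernelARecord kernel_a;
    KernelBRecord kernel_b;
  };

  // Statically allocated pool indexed by [block-per-SM][SM]; the runtime
  // hands slots out to the executing teams.
  constexpr int kMaxBlocksPerSM = 8; // assumed bound
  constexpr int kNumSMs = 56;        // assumed bound
  GlobalizedRecords global_pool[kMaxBlocksPerSM][kNumSMs];
  ```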
* [OPENMP][NVPTX] Reduce memory usage in target region (Alexey Bataev, 2018-10-12; 1 file, -9/+3)
  Additional reduction of the global memory usage in target regions without parallel regions. llvm-svn: 344413
* [OPENMP][NVPTX] Support memory coalescing for globalized variables (Alexey Bataev, 2018-10-09; 1 file, -3/+9)
  Added support for memory coalescing of globalized variables for better performance. From now on, all globalized variables are represented as arrays of 32 elements, and each thread accesses these elements using `tid & 31` as the index. llvm-svn: 344049
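  In source-level terms, the transformation looks roughly like this (an illustration, not the actual IR):

  ```
  struct Globalized {
    int v[32]; // was a single int; now one slot per warp lane
  };

  int load_my_copy(const Globalized *g, unsigned tid) {
    // tid & 31 picks the lane within the warp, so the 32 threads of a warp
    // read consecutive elements and their accesses coalesce into a few
    // memory transactions.
    return g->v[tid & 31];
  }
  ```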
* [OpenMP] Initialize data sharing stack for SPMD case (Gheorghe-Teodor Bercea, 2018-07-13; 1 file, -1/+1)
  In the SPMD case, we need to initialize the data sharing and globalization infrastructure. This covers the case when an SPMD region calls a function in a different compilation unit.
  Reviewers: ABataev, carlo.bertolli, caomhin. Reviewed By: ABataev. Subscribers: Hahnfeld, jholewinski, guansong, cfe-commits. Differential Revision: https://reviews.llvm.org/D49188. llvm-svn: 337015
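  A hedged sketch of the emitted prologue; the entry-point name follows the libomptarget NVPTX runtime of this period, but treat the exact signature as an assumption:

  ```
  // In SPMD mode every thread executes the target region, so the kernel
  // prologue initializes the data sharing stack before any user code runs,
  // including calls into functions from other compilation units.
  extern "C" void __kmpc_data_sharing_init_stack_spmd(); // assumed signature

  extern "C" void spmd_kernel_prologue() { // hypothetical name
    __kmpc_data_sharing_init_stack_spmd();
  }
  ```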
* [OPENMP, NVPTX] Do not globalize local variables in parallel regions (Alexey Bataev, 2018-07-09; 1 file, -4/+3)
  In generic data-sharing mode we are allowed to not globalize local variables that escape their declaration context iff they are declared inside the parallel region. We can do this because L2 parallel regions are executed sequentially and, thus, we do not need to put shared local variables in global memory. llvm-svn: 336567
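  An illustrative case: x is declared inside the outer (L1) parallel region and escapes into the nested one, but since nested (L2) parallel regions are serialized on this target, x can stay local instead of being globalized.

  ```
  #include <cstdio>

  int main() {
    int out = 0;
  #pragma omp target map(tofrom : out)
  #pragma omp parallel num_threads(2)
    {
      int x = 0; // escapes into the nested region below, yet needs no
                 // globalization: the L2 region runs sequentially
  #pragma omp parallel shared(x)
  #pragma omp atomic
      x += 1;

  #pragma omp atomic
      out += x;
    }
    std::printf("out = %d\n", out);
  }
  ```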
* [OPENMP, NVPTX] Fix linkage of the global entries (Alexey Bataev, 2018-05-08; 1 file, -1/+1)
  The linkage of the global entries must be weak to support redefinition of the same target regions in multiple compilation units. llvm-svn: 331768
* [OpenMP][Clang] Add call to global data sharing stack initialization on the workers side (Gheorghe-Teodor Bercea, 2018-03-22; 1 file, -0/+5)
  The workers also need to initialize the global stack. The call to the initialization function needs to happen after the kernel_init() function is called by the master. This ensures that the per-team data structures of the runtime have been initialized.
  Reviewers: ABataev, grokos, carlo.bertolli, caomhin. Reviewed By: ABataev. Subscribers: jholewinski, guansong, cfe-commits. Differential Revision: https://reviews.llvm.org/D44749. llvm-svn: 328219
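  A hedged sketch of that ordering constraint in generic mode; the runtime entry points are assumed from this period's libomptarget, and the control flow is illustrative only:

  ```
  extern "C" void __kmpc_kernel_init(int threads, short requires_runtime); // assumed
  extern "C" void __kmpc_data_sharing_init_stack();                        // assumed

  void generic_kernel(bool is_master) {
    if (is_master) {
      __kmpc_kernel_init(/*threads=*/32, /*requires_runtime=*/1);
    }
    // ... device-wide barrier in the real kernel ...
    if (!is_master) {
      __kmpc_data_sharing_init_stack(); // only after the master's kernel_init,
                                        // so per-team structures already exist
    }
  }
  ```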
* [OPENMP, NVPTX] Globalization of the private redeclarations (Alexey Bataev, 2018-03-20; 1 file, -1/+12)
  If generic codegen is enabled and the private copy of the original variable escapes its declaration context, this private copy should be globalized just as the original variable would be. llvm-svn: 327985
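  For example (illustrative): the firstprivate(v) copy of v escapes via the pointer p into a nested parallel region, so the copy itself has to be globalized.

  ```
  int main() {
    int v = 7;
  #pragma omp target map(to : v)
  #pragma omp teams num_teams(1) firstprivate(v)
    {
      int *p = &v; // the private copy of v escapes its declaration context...
  #pragma omp parallel
      {
  #pragma omp atomic
        *p += 1;   // ...and is shared across threads, so it is globalized
      }
    }
  }
  ```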
* [OpenMP] Add OpenMP data sharing infrastructure using global memory (Gheorghe-Teodor Bercea, 2018-03-14; 1 file, -0/+91)
  This patch handles the Clang code generation phase for the OpenMP data sharing infrastructure. TODO: add a more detailed description.
  Reviewers: ABataev, carlo.bertolli, caomhin, hfinkel, Hahnfeld. Reviewed By: ABataev. Subscribers: jholewinski, guansong, cfe-commits. Differential Revision: https://reviews.llvm.org/D43660. llvm-svn: 327513
* [OpenMP] Remove implicit data sharing code gen that aims to use device shared memory (Gheorghe-Teodor Bercea, 2018-03-07; 1 file, -57/+0)
  Remove this scheme for now since it will be covered by another, more generic scheme using global memory. This code will be reworked into an optimization for the generic data sharing scheme. Removing it completely and then re-adding it via future patches will make all future data sharing patches cleaner.
  Reviewers: ABataev, carlo.bertolli, caomhin. Reviewed By: ABataev. Subscribers: jholewinski, guansong, cfe-commits. Differential Revision: https://reviews.llvm.org/D43625. llvm-svn: 326948
* [OpenMP] Further adjustments of nvptx runtime functions (Jonas Hahnfeld, 2017-12-27; 1 file, -2/+2)
  Pass in a default value of 1, similar to previous commit r318836.
  Differential Revision: https://reviews.llvm.org/D41012. llvm-svn: 321486
* [OpenMP] Add function attribute for triggering data sharing (Gheorghe-Teodor Bercea, 2017-12-12; 1 file, -3/+8)
  The backend should only emit data sharing code for the cases where it is needed. A new function attribute is used by Clang to enable data sharing only for the cases where OpenMP semantics require it and there are variables that need to be shared.
  Reviewers: hfinkel, Hahnfeld, ABataev, carlo.bertolli, caomhin. Reviewed By: ABataev. Subscribers: cfe-commits, jholewinski. Differential Revision: https://reviews.llvm.org/D41123. llvm-svn: 320527
* Fix test/OpenMP/nvptx_data_sharing.cpp (Jonas Hahnfeld, 2017-11-21; 1 file, -2/+2)
  This was an oversight that stayed in the test from development. llvm-svn: 318779
* [OpenMP] Add implicit data sharing support when offloading to NVIDIA GPUs using OpenMP device offloading (Gheorghe-Teodor Bercea, 2017-11-21; 1 file, -0/+52)
  This patch is part of the development effort to add support in the current OpenMP GPU offloading implementation for implicitly sharing variables between a target region executed by the team master thread and the worker threads within that team.
  This patch is the first of three required for successfully performing the implicit sharing of master thread variables with the worker threads within a team. The remaining two patches are:
  - Patch D38978 to the LLVM NVPTX backend, which ensures the lowering of shared variables to a device memory which allows the sharing of references;
  - A patch (coming soon) to the libomptarget runtime library, which ensures that a list of references to shared variables is properly maintained.
  A simple code snippet which illustrates an implicit data sharing situation is as follows:
  ```
  #pragma omp target
  {
    // master thread only
    int v;
    #pragma omp parallel
    {
      // worker threads
      // use v
    }
  }
  ```
  Variable v is implicitly shared from the team master thread, which executes the code in between the target and parallel directives. The worker threads must operate on the latest version of v, including any updates performed by the master.
  The code generated in this patch relies on the LLVM NVPTX patch mentioned above, which prevents v from being lowered into the thread-local memory of the master thread; such lowering would make the reference to this variable un-shareable with the workers. This ensures that the code generated by this patch is correct.
  Since the parallel region is outlined, the passing of arguments to the outlined region must preserve the original order of arguments. The runtime therefore maintains a list of references to shared variables, ensuring they are passed in the correct order. The passing of arguments to the outlined parallel function is performed in a separate function which the data sharing infrastructure constructs in this patch. That function is inlined when optimizations are enabled.
  Reviewers: hfinkel, carlo.bertolli, arpith-jacob, Hahnfeld, ABataev, caomhin. Reviewed By: ABataev. Subscribers: cfe-commits, jholewinski. Differential Revision: https://reviews.llvm.org/D38976. llvm-svn: 318773