summaryrefslogtreecommitdiffstats
path: root/clang/test/OpenMP/nvptx_teams_codegen.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [OPENMP][NVPTX]Mark more functions as always_inline for betterAlexey Bataev2019-05-211-26/+22
| | | | | | | | | | | performance. Internally generated functions must be marked as always_inlines in most cases. Patch marks some extra reduction function + outlined parallel functions as always_inline for better performance, but only if the optimization is requested. llvm-svn: 361269
* [OPENMP][NVPTX]Use new functions from the runtime library.Alexey Bataev2019-01-041-4/+4
| | | | | | Updated codegen to use the new functions from the runtime library. llvm-svn: 350415
* [OPENMP][NVPTX]Emit shared memory buffer for reduction as 128 bytesAlexey Bataev2018-12-181-2/+2
| | | | | | | | | | | buffer. Seems to me, nvlink has a bug with the proper support of the weakly linked symbols. It does not allow to define several shared memory buffer with the different sizes even with the weak linkage. Instead we always use 128 bytes buffer to prevent nvlink from the error message emission. llvm-svn: 349540
* [OPENMP][NVPTX]Emit correct reduction code for teams/parallelAlexey Bataev2018-11-161-2/+2
| | | | | | | | | | | reductions. Fixed previously committed code for the reduction support in teams/parallel constructs taking into account new design of the NVPTX support in the compiler. Teams reduction are not fully functional yet, it is going to be fixed in the following patches. llvm-svn: 347081
* [OPENMP][NVPTX]Allow to use shared memory for theAlexey Bataev2018-11-091-12/+14
| | | | | | | | | | target|teams|distribute variables. If the total size of the variables, declared in target|teams|distribute regions, is less than the maximal size of shared memory available, the buffer is allocated in the shared memory. llvm-svn: 346507
* [OPENMP][NVPTX]Improve emission of the globalized variables forAlexey Bataev2018-11-021-4/+34
| | | | | | | | | | | | | | | | | | | target/teams/distribute regions. Target/teams/distribute regions exist for all the time the kernel is executed. Thus, if the variable is declared in their context and then escape it, we can allocate global memory statically instead of allocating it dynamically. Patch captures all the globalized variables in target/teams/distribute contexts, merges them into the records, one per each target region. Those records are then joined into the union, one per compilation unit (to save the global memory). Those units are organized into 2 x dimensional arrays, where the first dimension is the number of blocks per SM and the second one is the number of SMs. Runtime functions manage this global memory space between the executing teams. llvm-svn: 345978
* [OPENMP][NVPTX]Reduce memory usage in target region.Alexey Bataev2018-10-121-20/+8
| | | | | | | Additional reduction of the global memory usage in the target regions without parallel regions. llvm-svn: 344413
* [OPENMP][NVPTX] Support memory coalescing for globalized variables.Alexey Bataev2018-10-091-8/+20
| | | | | | | | | Added support for memory coalescing for better performance for globalized variables. From now on all the globalized variables are represented as arrays of 32 elements and each thread accesses these elements using `tid & 31` as index. llvm-svn: 344049
* [OPENMP][NVPTX] Fix emission of __kmpc_global_thread_num() for non-SPMDAlexey Bataev2018-10-051-2/+2
| | | | | | | | | mode. __kmpc_global_thread_num() should be called before initialization of the runtime. llvm-svn: 343857
* [OPENMP] General code improvements.Alexey Bataev2018-04-161-4/+4
| | | | llvm-svn: 330140
* [OPENMP, NVPTX] Emit correct thread id.Alexey Bataev2018-03-191-2/+2
| | | | | | | We emitted fake thread id for the outined function in NVPTX codegen. Patch adds emission of the real thread id. llvm-svn: 327867
* [OPENMP, NVPTX] Improve globalization of the variables captured by value.Alexey Bataev2018-03-151-4/+20
| | | | | | | | | | | | | If the variable is captured by value and the corresponding parameter in the outlined function escapes its declaration context, this parameter must be globalized. To globalize it we need to get the address of the original parameter, load the value, store it to the global address and use this global address instead of the original. Patch improves globalization for parallel|teams regions + functions in declare target regions. llvm-svn: 327654
* [OpenMP] Use fopenmp prefix for all options introduced by the offloading ↵Samuel Antao2016-06-301-8/+8
| | | | | | | | | | | | | | implementation. Summary: This patch changes the options used by offloading to start with -fopenmp instead of -fomp. This makes the option naming more consistent and materializes a suggestion by Richard Smith in http://reviews.llvm.org/D9888. Reviewers: hfinkel, carlo.bertolli, arpith-jacob, ABataev Subscribers: kkwli0, cfe-commits, caomhin Differential Revision: http://reviews.llvm.org/D21841 llvm-svn: 274283
* [OPENMP] Pass scalar firstprivate vars by value.Alexey Bataev2016-05-171-17/+13
| | | | | | | | For better performance and to unify code with offloading part we pass scalar firstprivate values by value, instead of by reference. It will remove some extra copying operations. llvm-svn: 269751
* [OPENMP] Codegen for teams directive for NVPTXCarlo Bertolli2016-04-041-0/+136
This patch implements the teams directive for the NVPTX backend. It is different from the host code generation path as it: Does not call kmpc_fork_teams. All necessary teams and threads are started upon touching the target region, when launching a CUDA kernel, and their execution is coordinated through sequential and parallel regions within the target region. Does not call kmpc_push_num_teams even if a num_teams of thread_limit clause is present. Setting the number of teams and the thread limit is implemented by the nvptx-related runtime. Please note that I am now passing a Clang Expr * to emitPushNumTeams instead of the originally chosen llvm::Value * type. The reason for that is that I want to avoid emitting expressions for num_teams and thread_limit if they are not needed in the target region. http://reviews.llvm.org/D17963 llvm-svn: 265304
OpenPOWER on IntegriCloud