path: root/clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
Commit message | Author | Age | Files | Lines
* [OPENMP][NVPTX] Mark more functions as always_inline for better performance. | Alexey Bataev | 2019-05-21 | 1 | -6/+11
  Internally generated functions must be marked as always_inline in most cases. The patch marks some additional reduction functions and outlined parallel functions as always_inline for better performance, but only if optimization is requested.
  llvm-svn: 361269
* [OPENMP][NVPTX] Run combined constructs with if clause in SPMD mode. | Alexey Bataev | 2019-04-17 | 1 | -13/+6
  All target-parallel-based constructs can be run in SPMD mode from now on. Even if num_threads clauses or if clauses are used, such constructs can be executed in SPMD mode.
  llvm-svn: 358595
* [OPENMP][NVPTX] Use new functions from the runtime library. | Alexey Bataev | 2019-01-04 | 1 | -2/+2
  Updated codegen to use the new functions from the runtime library.
  llvm-svn: 350415
* [OPENMP][NVPTX] Emit shared memory buffer for reduction as 128 bytes buffer. | Alexey Bataev | 2018-12-18 | 1 | -1/+1
  It seems that nvlink has a bug in its support for weakly linked symbols: it does not allow several shared memory buffers of different sizes to be defined, even with weak linkage. Instead, we always use a 128-byte buffer to keep nvlink from emitting the error.
  llvm-svn: 349540
* [OPENMP][NVPTX] Emit correct reduction code for teams/parallel reductions. | Alexey Bataev | 2018-11-16 | 1 | -1/+1
  Fixed the previously committed reduction support in teams/parallel constructs to take into account the new design of the NVPTX support in the compiler. Teams reductions are not fully functional yet; this will be fixed in the following patches.
  llvm-svn: 347081
* [OPENMP][NVPTX] Allow using shared memory for target|teams|distribute variables. | Alexey Bataev | 2018-11-09 | 1 | -5/+3
  If the total size of the variables declared in target|teams|distribute regions is less than the maximum size of the available shared memory, the buffer is allocated in shared memory.
  llvm-svn: 346507
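  A minimal C++ sketch of the placement rule in this commit, assuming the only inputs are the sizes of the region's variables and a shared-memory budget. The names `BufferPlacement`, `totalSize`, and `choosePlacement` are hypothetical and do not come from the clang sources; alignment padding is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: where to allocate the buffer holding all variables
// declared in a target|teams|distribute region.
enum class BufferPlacement { SharedMemory, GlobalMemory };

// Total size of the region's variables (alignment padding ignored).
std::size_t totalSize(const std::vector<std::size_t>& varSizes) {
  std::size_t total = 0;
  for (std::size_t size : varSizes) total += size;
  return total;
}

// Shared memory is used only when the whole buffer is smaller than the
// available shared memory; otherwise the buffer goes to global memory.
BufferPlacement choosePlacement(const std::vector<std::size_t>& varSizes,
                                std::size_t maxSharedBytes) {
  assert(maxSharedBytes > 0 && "shared memory budget must be positive");
  return totalSize(varSizes) < maxSharedBytes ? BufferPlacement::SharedMemory
                                              : BufferPlacement::GlobalMemory;
}
```

  With a hypothetical 48 KiB budget, variables of 8, 4 and 4 bytes (16 bytes total) go to shared memory, while a single 64 KiB variable falls back to global memory.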
* [OPENMP][NVPTX] Improve emission of the globalized variables for target/teams/distribute regions. | Alexey Bataev | 2018-11-02 | 1 | -3/+10
  Target/teams/distribute regions exist for the whole time the kernel is executed. Thus, if a variable is declared in their context and then escapes it, we can allocate global memory for it statically instead of dynamically. The patch captures all the globalized variables in target/teams/distribute contexts and merges them into records, one per target region. Those records are then joined into a union, one per compilation unit (to save global memory). These unions are organized into two-dimensional arrays, where the first dimension is the number of blocks per SM and the second one is the number of SMs. Runtime functions manage this global memory space among the executing teams.
  llvm-svn: 345978
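  The record/union/array layout described in this commit can be sketched in C++ as follows. The two record types, the device dimensions `kBlocksPerSM` and `kNumSMs`, and the `slotFor` helper are all hypothetical; in the actual scheme it is the runtime functions, not a helper like this, that decide which slot a team uses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical records: the globalized variables of each target region in
// one compilation unit are merged into one record per region.
struct Region1Record {
  int counter;
  double partialSum;
};
struct Region2Record {
  float scratch[16];
};

// One union per compilation unit: its size is that of the largest record,
// not the sum of all records (assumption: the records of different target
// regions never need the same slot at the same time).
union UnitBuffer {
  Region1Record region1;
  Region2Record region2;
};

// Hypothetical device dimensions, for illustration only.
constexpr std::size_t kBlocksPerSM = 2;
constexpr std::size_t kNumSMs = 4;

// Statically allocated storage: first dimension is blocks per SM,
// second dimension is the number of SMs.
UnitBuffer gUnitBuffers[kBlocksPerSM][kNumSMs];

// Hypothetical helper returning the slot for the team running as block
// `blockInSM` on multiprocessor `sm`.
UnitBuffer& slotFor(std::size_t blockInSM, std::size_t sm) {
  assert(blockInSM < kBlocksPerSM && sm < kNumSMs);
  return gUnitBuffers[blockInSM][sm];
}
```

  The union is only as large as its largest record, which is where the saving in global memory comes from compared with laying all records out side by side.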
* [OPENMP][NVPTX] Reduce memory use for globalized vars in target/teams/distribute regions. | Alexey Bataev | 2018-10-11 | 1 | -5/+2
  The previously introduced globalization scheme, which uses memory coalescing, may increase memory usage for the variables that are declared in target/teams/distribute contexts. We don't need 32 copies of such variables, just 1. The patch reduces memory use in this case.
  llvm-svn: 344273
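  A hedged sketch of the footprint difference: under the coalescing scheme a globalized variable occupies 32 copies, while one declared in a target/teams/distribute context needs only a single copy. The enum and function names are hypothetical, and treating every other globalized variable as a 32-copy variable is an assumption drawn from the commit message.

```cpp
#include <cassert>
#include <cstddef>

// Warp width assumed by the coalescing scheme (one copy per lane).
constexpr std::size_t kWarpSize = 32;

// Hypothetical classification of a globalized variable.
enum class GlobalizedKind {
  TargetTeamsDistribute,  // declared in a target/teams/distribute context
  OtherGlobalized         // any other globalized variable (assumption)
};

// Number of copies kept in global memory for one variable.
std::size_t copiesNeeded(GlobalizedKind kind) {
  return kind == GlobalizedKind::TargetTeamsDistribute ? 1 : kWarpSize;
}

// Global memory footprint of one variable of `varSize` bytes.
std::size_t bytesNeeded(std::size_t varSize, GlobalizedKind kind) {
  assert(varSize > 0);
  return varSize * copiesNeeded(kind);
}
```

  For an 8-byte variable this gives 256 bytes in the 32-copy layout and 8 bytes with a single copy.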
* [OPENMP][NVPTX] Support memory coalescing for globalized variables. | Alexey Bataev | 2018-10-09 | 1 | -2/+5
  Added support for memory coalescing of globalized variables for better performance. From now on, all the globalized variables are represented as arrays of 32 elements, and each thread accesses these elements using `tid & 31` as the index.
  llvm-svn: 344049
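  A minimal sketch of the 32-element layout, assuming a warp size of 32; `CoalescedVar` and `forThread` are hypothetical names. Because 32 is a power of two, `tid & 31` equals `tid % 32`, and the 32 threads of a warp land on 32 consecutive elements, which is the access pattern that lets the hardware coalesce memory transactions.

```cpp
#include <cassert>

// Warp width assumed by the scheme.
constexpr unsigned kWarpSize = 32;
static_assert((kWarpSize & (kWarpSize - 1)) == 0,
              "tid & (kWarpSize - 1) equals tid % kWarpSize only for powers of two");

// Hypothetical layout of one globalized variable: 32 elements, one per lane.
template <typename T>
struct CoalescedVar {
  T slots[kWarpSize];

  // Lane index tid & 31: neighboring threads of a warp get neighboring
  // elements, so their accesses fall on consecutive addresses.
  T& forThread(unsigned tid) {
    unsigned lane = tid & (kWarpSize - 1);
    assert(lane < kWarpSize);  // always holds; documents the invariant
    return slots[lane];
  }
};
```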
* [OPENMP, NVPTX] Do not globalize local variables in parallel regions. | Alexey Bataev | 2018-07-09 | 1 | -2/+2
  In generic data-sharing mode, we are allowed not to globalize local variables that escape their declaration context if and only if they are declared inside the parallel region. We can do this because L2 (nested) parallel regions are executed sequentially and, thus, we do not need to put shared local variables in global memory.
  llvm-svn: 336567
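  The rule in this commit can be summarized as a predicate. This is a hypothetical C++ model, not clang's analysis: `LocalVar` and `mustGlobalize` are illustrative names, and the two flags stand in for whatever the compiler actually computes.

```cpp
#include <cassert>

// Hypothetical summary of a local variable, as an analysis might see it.
struct LocalVar {
  bool escapesDeclContext;        // referenced outside its declaration context
  bool declaredInParallelRegion;  // declared inside a parallel region
};

// Generic data-sharing mode: an escaping local normally has to be moved to
// global memory, except when it is declared inside a parallel region. L2
// (nested) parallel regions execute sequentially, so such a variable does
// not need to live in global memory.
bool mustGlobalize(const LocalVar& var) {
  return var.escapesDeclContext && !var.declaredInParallelRegion;
}
```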
* [OPENMP, NVPTX] Reduce the number of the globalized variables. | Alexey Bataev | 2018-06-26 | 1 | -0/+51
  The patch improves the analysis of the variables that should be globalized. From now on, instead of all parallel directives, it checks only distribute parallel .. directives, and it checks only firstprivate/lastprivate variables to see whether they must be globalized.
  llvm-svn: 335632