summaryrefslogtreecommitdiffstats
path: root/clang/test/OpenMP/nvptx_parallel_for_codegen.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [OPENMP][NVPTX]Mark more functions as always_inline for betterAlexey Bataev2019-05-211-1/+1
| | | | | | | | | | | performance. Internally generated functions must be marked as always_inlines in most cases. Patch marks some extra reduction function + outlined parallel functions as always_inline for better performance, but only if the optimization is requested. llvm-svn: 361269
* [OPENMP][NVPTX]Use new functions from the runtime library.Alexey Bataev2019-01-041-2/+2
| | | | | | Updated codegen to use the new functions from the runtime library. llvm-svn: 350415
* [OPENMP][NVPTX]Use __kmpc_barrier_simple_spmd(nullptr, 0) instead ofAlexey Bataev2019-01-031-3/+3
| | | | | | | | | | nvvm_barrier0. Use runtime functions instead of the direct call to the nvvm intrinsics. It allows to prevent some dangerous LLVM optimizations, that breaks the code for the NVPTX target. llvm-svn: 350328
* [OPENMP][NVPTX]Emit shared memory buffer for reduction as 128 bytesAlexey Bataev2018-12-181-1/+1
| | | | | | | | | | | buffer. Seems to me, nvlink has a bug with the proper support of the weakly linked symbols. It does not allow to define several shared memory buffer with the different sizes even with the weak linkage. Instead we always use 128 bytes buffer to prevent nvlink from the error message emission. llvm-svn: 349540
* [OPENMP][NVPTX]Emit correct reduction code for teams/parallelAlexey Bataev2018-11-161-1/+1
| | | | | | | | | | | reductions. Fixed previously committed code for the reduction support in teams/parallel constructs taking into account new design of the NVPTX support in the compiler. Teams reduction are not fully functional yet, it is going to be fixed in the following patches. llvm-svn: 347081
* [OPENMP][NVPTX]Allow to use shared memory for theAlexey Bataev2018-11-091-5/+6
| | | | | | | | | | target|teams|distribute variables. If the total size of the variables, declared in target|teams|distribute regions, is less than the maximal size of shared memory available, the buffer is allocated in the shared memory. llvm-svn: 346507
* [OPENMP][NVPTX]Improve emission of the globalized variables forAlexey Bataev2018-11-021-2/+12
| | | | | | | | | | | | | | | | | | | target/teams/distribute regions. Target/teams/distribute regions exist for all the time the kernel is executed. Thus, if the variable is declared in their context and then escape it, we can allocate global memory statically instead of allocating it dynamically. Patch captures all the globalized variables in target/teams/distribute contexts, merges them into the records, one per each target region. Those records are then joined into the union, one per compilation unit (to save the global memory). Those units are organized into 2 x dimensional arrays, where the first dimension is the number of blocks per SM and the second one is the number of SMs. Runtime functions manage this global memory space between the executing teams. llvm-svn: 345978
* [OpenMP][NVPTX] Enable default scheduling for parallel for in non-SPMD cases.Gheorghe-Teodor Bercea2018-10-291-2/+24
| | | | | | | | | | | | | | Summary: This patch enables the choosing of the default schedule for parallel for loops even in non-SPMD cases. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D53443 llvm-svn: 345507
* [NFC][OpenMP] Add new test for parallel for code generation.Gheorghe-Teodor Bercea2018-10-261-0/+101
Summary: This is a simple test of the parallel for code generation. It will be used to showcase the change introduced by patch D53443. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D53772 llvm-svn: 345417
OpenPOWER on IntegriCloud