summaryrefslogtreecommitdiffstats
path: root/openmp/libomptarget/deviceRTLs/nvptx/test/parallel
Commit message (Collapse)AuthorAgeFilesLines
* [OpenMP] NFC: Fix trivial typos in commentsKazuaki Ishizaki2020-01-071-2/+2
| | | | | | | | | | | | Reviewers: jdoerfert, Jim Reviewed By: Jim Subscribers: Jim, mgorny, guansong, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D72285
* [OPENMP][NVPTX]Fix parallel level counter in non-SPMD mode.Alexey Bataev2019-09-031-0/+12
| | | | | | | | | | | | | | | | | | | Summary: In non-SPMD mode we may end up with the divergent threads when trying to increment/decrement parallel level counter. It may lead to incorrect calculations of the parallel level and wrong results when threads are divergent. We need to reconverge the threads before trying to modify the parallel level counter. Reviewers: grokos, jdoerfert Subscribers: guansong, openmp-commits, caomhin, kkwli0 Tags: #openmp Differential Revision: https://reviews.llvm.org/D66802 llvm-svn: 370803
* [OPENMP][NVPTX]Perform memory flush if number of threads to sync is 1 or less.Alexey Bataev2019-07-251-0/+37
| | | | | | | | | | | | | | | | | Summary: According to the OpenMP standard, barrier operation must perform implicit flush operation. Currently, if there is only one thread in the team, barrier does not flush the memory. Patch fixes this problem. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D62398 llvm-svn: 367024
* [OPENMP]Make __kmpc_push_tripcount thread safe.Alexey Bataev2019-07-081-0/+22
| | | | | | | | | | | | | | | | | | | | | | Summary: __kmpc_push_tripcount function is not thread safe and may lead to data race when the target regions are executed in parallel threads. The patch makes loopTripCnt counter thread aware and stores the tripcount value per thread in the map. Access to map is guarded by mutex to prevent data race in the map itself. Test is for NVPTX target because it does not work correctly on the host. Seems to me, there is a problem in libomp with target regions in the parallel threads. Reviewers: grokos Subscribers: guansong, jfb, jdoerfert, openmp-commits, kkwli0, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D64080 llvm-svn: 365332
* [OPENMP][NVPTX]Relax flush directive.Alexey Bataev2019-06-271-0/+35
| | | | | | | | | | | | | | | | | | | | | | | Summary: According to the OpenMP standard, flush makes a thread’s temporary view of memory consistent with memory and enforces an order on the memory operations of the variables explicitly specified or implied. According to the Cuda toolkit documentation (https://docs.nvidia.com/cuda/archive/8.0/cuda-c-programming-guide/index.html#memory-fence-functions), __threadfence() functions provides required functionality. __threadfence_system() also provides required functionality, but it also includes some extra functionality, like synchronization of page-locked host memory, synchronization for the host, etc. It is not required per the standard and we can use more relaxed version of memory fence operation. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D62397 llvm-svn: 364572
* [OPENMP][NVPTX]Improve code by using parallel level counter.Alexey Bataev2019-05-021-7/+71
| | | | | | | | | | | | | | | | | | Summary: Previously for the different purposes we need to get the active/common parallel level and with full runtime we iterated over all the records to calculate this level. Instead, we can used the warp-based parallel level counters used in no-runtime mode. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jfb, jdoerfert, caomhin, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D61395 llvm-svn: 359822
* [OPENMP][NVPTX]Correctly handle L2 parallelism in SPMD mode.Alexey Bataev2019-04-261-11/+22
| | | | | | | | | | | | | | | | | | | Summary: The parallelLevel counter must be on per-thread basis to fully support L2+ parallelism, otherwise we may end up with undefined behavior. Introduce the parallelLevel on per-warp basis using shared memory. It allows to avoid the problems with the synchronization and allows fully support L2+ parallelism in SPMD mode with no runtime. Reviewers: gtbercea, grokos Subscribers: guansong, jdoerfert, caomhin, kkwli0, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D60918 llvm-svn: 359341
* [OPENMP][NVPTX] Fix the test, NFC.Alexey Bataev2019-04-221-7/+17
| | | | | | | | Fix the test to run it really in SPMD mode without runtime. Previously it was run in SPMD + full runtime mode and does not allow to cehck the functionality correctly. llvm-svn: 358902
* [OPENMP][NVPTX]Fix dynamic scheduling in L2+ SPMD parallel regions.Alexey Bataev2019-04-151-0/+30
| | | | | | | | | | | | | | | | | | | | Summary: If the kernel is executed in SPMD mode and the L2+ parallel for region with the dynamic scheduling is executed, dynamic scheduling functions are called. They expect full runtime support, but SPMD kernels may be executed without the full runtime. It leads to the runtime crash of the compiled program. Patch fixes this problem + fixes handling of the parallelism level in SPMD mode, which is required as part of this patch. Reviewers: gtbercea, kkwli0, grokos Subscribers: guansong, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D60578 llvm-svn: 358442
* [libomptarget-nvptx] Fix ancestor_thread_num and team_size (non-SPMD)Jonas Hahnfeld2018-09-301-7/+77
| | | | | | | | | | | | | | According to OpenMP 4.5, p250:12-14: If the requested nest level is outside the range of 0 and the nest level of the current thread, as returned by the omp_get_level routine, the routine returns -1. The SPMD code path will need a similar fix. Differential Revision: https://reviews.llvm.org/D51787 llvm-svn: 343401
* [libomptarget-nvptx] Add tests for nested parallelismJonas Hahnfeld2018-09-292-0/+141
| | | | | | | | | Clang trunk will serialize nested parallel regions. Check that this is correctly reflected in various API methods. Differential Revision: https://reviews.llvm.org/D51786 llvm-svn: 343382
* [libomptarget-nvptx] Fix number of threads in parallelJonas Hahnfeld2018-09-291-0/+102
| | | | | | | | | | | | If there is no num_threads() clause we must consider the nthreads-var ICV. Its value is set by omp_set_num_threads() and can be queried using omp_get_max_num_threads(). The rewritten code now closely resembles the algorithm given in the OpenMP standard. Differential Revision: https://reviews.llvm.org/D51783 llvm-svn: 343380
* [libomptarget-nvptx] Add testing infrastructureJonas Hahnfeld2018-09-281-0/+77
This patch also introduces testing for libomptarget-nvptx which has been missing until now. I propose to add tests for all bugs that are fixed in the future. The target check-libomptarget-nvptx is not run by default because - we can't determine if there is a GPU plugged into the system. - it will require the latest Clang compiler. Keeping compatibility with older releases would prevent testing newer code generation developed in trunk. Differential Revision: https://reviews.llvm.org/D51687 llvm-svn: 343324
OpenPOWER on IntegriCloud