summaryrefslogtreecommitdiffstats
path: root/openmp/libomptarget/deviceRTLs
Commit message (Collapse)AuthorAgeFilesLines
...
* Use named constant to indicate all lanes, to handle 32 and 64 wide architecturesJon Chesterfield2019-10-043-8/+10
| | | | | | | | | | | | | | | | Summary: Use named constant to indicate all lanes, to handle 32 and 64 wide architectures Reviewers: ABataev, jdoerfert, grokos, ronlieb Reviewed By: grokos Subscribers: ronlieb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D68369 llvm-svn: 373793
* [OPENMP][NVPTX]Fix parallel level counter in non-SPMD mode.Alexey Bataev2019-09-034-16/+45
| | | | | | | | | | | | | | | | | | | Summary: In non-SPMD mode we may end up with the divergent threads when trying to increment/decrement parallel level counter. It may lead to incorrect calculations of the parallel level and wrong results when threads are divergent. We need to reconverge the threads before trying to modify the parallel level counter. Reviewers: grokos, jdoerfert Subscribers: guansong, openmp-commits, caomhin, kkwli0 Tags: #openmp Differential Revision: https://reviews.llvm.org/D66802 llvm-svn: 370803
* [libomptarget] Refactor activemask macro to inline functionJon Chesterfield2019-09-038-30/+29
| | | | | | | | | | | | | | | | | | Summary: [libomptarget] Refactor activemask macro to inline function See also abandoned D66846, split into this diff and others. Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers Reviewed By: jdoerfert, ABataev Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D66851 llvm-svn: 370781
* Use target_impl functions to replace more inline asmJon Chesterfield2019-08-285-39/+38
| | | | | | | | | | | | | | | | | Summary: Use target_impl functions to replace more inline asm Follow on from D65836. Removes remaining asm shuffles and lanemask accessors Also changes the types of target_impl bitwise functions to unsigned. Reviewers: jdoerfert, ABataev, grokos, Hahnfeld, gregrodgers, ronlieb, hfinkel Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D66809 llvm-svn: 370216
* [libomptarget] Refactor syncthreads macro to inline functionJon Chesterfield2019-08-285-19/+16
| | | | | | | | | | | | | | | | | | Summary: [libomptarget] Refactor syncthreads macro to inline function See also abandoned D66846, split into this diff and others. Rev 2 of D66855 Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D66861 llvm-svn: 370210
* [libomptarget] Refactor syncwarp macro to inline functionJon Chesterfield2019-08-283-8/+13
| | | | | | | | | | | | | | | | Summary: [libomptarget] Refactor syncwarp macro to inline function See also abandoned D66846, split into this diff and others. Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D66857 llvm-svn: 370149
* Fix build break due to close brace lost in mergeJon Chesterfield2019-08-281-0/+1
| | | | llvm-svn: 370148
* [libomptarget] Refactor shfl_down_sync macro to inline functionJon Chesterfield2019-08-283-7/+14
| | | | | | | | | | | | | | | | Summary: [libomptarget] Refactor shfl_down_sync macro to inline function See also abandoned D66846, split into this diff and others. Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D66853 llvm-svn: 370146
* [libomptarget] Refactor shfl_sync macro to inline functionJon Chesterfield2019-08-285-8/+24
| | | | | | | | | | | | | | | | Summary: [libomptarget] Refactor shfl_sync macro to inline function See also abandoned D66846, split into this diff and others. Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D66852 llvm-svn: 370144
* [OPENMP][NVPTX]Add __kmpc_syncwarp(int32_t) function.Alexey Bataev2019-08-263-0/+14
| | | | | | | | | | | | | | | | | | | Summary: Added function void __kmpc_syncwarp(int32_t) to expose it to the compiler. It is required to fix the problem with the critical regions in Cuda9.0+. We cannot use barrier in the critical region, but still need to reconverge the threads in the warp after. This function allows to do this. Reviewers: grokos, jdoerfert Subscribers: guansong, openmp-commits, kkwli0, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D66672 llvm-svn: 369933
* [OPENMP][NVPTX]Use __syncwarp() to reconverge the threads.Alexey Bataev2019-08-232-10/+19
| | | | | | | | | | | | | | | | | | | | | | | Summary: In Cuda 9.0 it is not guaranteed that threads in the warps are convergent. We need to use __syncwarp() function to reconverge the threads and to guarantee the memory ordering among threads in the warps. This is the first patch to fix the problem with the test libomptarget/deviceRTLs/nvptx/src/sync.cu on Cuda9+. This patch just replaces calls to __shfl_sync() function with the call of __syncwarp() function where we need to reconverge the threads when we try to modify the value of the parallel level counter. Reviewers: grokos Subscribers: guansong, jfb, jdoerfert, caomhin, kkwli0, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D65013 llvm-svn: 369796
* Factor architecture dependent code out of loop.cuJon Chesterfield2019-08-132-9/+49
| | | | | | | | | | | | | | | | | | | | | Summary: [libomptarget] Factor architecture dependent code out of loop.cu Related to the patch series starting D64217. Added subscribers to said series as reviewers. This effort is smaller in scope. This patch factors out just enough architecture dependent code from loop.cu to allow the same source to be used with amdgcn, given a different target_impl.h. Testing is that the same bitcode (modulo variable names) is generated for libomptarget before and after the refactor, for nvptx and the out of tree amdgcn. Reviewers: jdoerfert, ABataev, bollu, jfb, tra, grokos, Hahnfeld, guansong, xtian, gregrodgers, ronlieb, hfinkel, gtbercea, guraypp, arpith-jacob Reviewed By: jdoerfert, ABataev Subscribers: dexonsmith, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D65836 llvm-svn: 368751
* Use forceinline. Necessary for nvcc to inline small functions within the ↵Jon Chesterfield2019-08-071-1/+1
| | | | | | | | | | | | | | | | | | bitcode library Summary: [libomptarget] Use forceinline. Necessary for nvcc to inline small functions within the bitcode library Suggested in D65836 Reviewers: ABataev, jdoerfert, grokos, gregrodgers Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D65876 llvm-svn: 368177
* [OPENMP][NVPTX]Perform memory flush if number of threads to sync is 1 or less.Alexey Bataev2019-07-252-0/+40
| | | | | | | | | | | | | | | | | Summary: According to the OpenMP standard, barrier operation must perform implicit flush operation. Currently, if there is only one thread in the team, barrier does not flush the memory. Patch fixes this problem. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D62398 llvm-svn: 367024
* [OPENMP][NVPTX]Fixed checks for cuda versions.Alexey Bataev2019-07-161-2/+4
| | | | | | | | | | | | | | | | | | | | | Summary: We used CUDART_VERSION macro to check for the installed cuda version but this macro is defined in cuda_runtime_api.h, which is not used by project. Better to use CUDA_VERSION macro, which is defined in cuda.h. Also, added the check if this macro is defined. If macro is undefined, there is something wrong with the cuda configuration and we should not continue the compilation. This also fixes problems with runtime building in cuda 10+. Reviewers: grokos Subscribers: guansong, jdoerfert, caomhin, kkwli0, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D64648 llvm-svn: 366224
* [libomptarget-nvptx] Remove dead functionsJonas Hahnfeld2019-07-112-85/+1
| | | | | | | | | These entry points are never called by Clang trunk nor clang-ykt. If XL doesn't use them either, they can finally go away. Differential Revision: https://reviews.llvm.org/D52700 llvm-svn: 365817
* [OPENMP]Make __kmpc_push_tripcount thread safe.Alexey Bataev2019-07-081-0/+22
| | | | | | | | | | | | | | | | | | | | | | Summary: __kmpc_push_tripcount function is not thread safe and may lead to data race when the target regions are executed in parallel threads. The patch makes loopTripCnt counter thread aware and stores the tripcount value per thread in the map. Access to map is guarded by mutex to prevent data race in the map itself. Test is for NVPTX target because it does not work correctly on the host. Seems to me, there is a problem in libomp with target regions in the parallel threads. Reviewers: grokos Subscribers: guansong, jfb, jdoerfert, openmp-commits, kkwli0, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D64080 llvm-svn: 365332
* [OPENMP][NVPTX]Relax flush directive.Alexey Bataev2019-06-272-1/+36
| | | | | | | | | | | | | | | | | | | | | | | Summary: According to the OpenMP standard, flush makes a thread’s temporary view of memory consistent with memory and enforces an order on the memory operations of the variables explicitly specified or implied. According to the Cuda toolkit documentation (https://docs.nvidia.com/cuda/archive/8.0/cuda-c-programming-guide/index.html#memory-fence-functions), __threadfence() functions provides required functionality. __threadfence_system() also provides required functionality, but it also includes some extra functionality, like synchronization of page-locked host memory, synchronization for the host, etc. It is not required per the standard and we can use more relaxed version of memory fence operation. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D62397 llvm-svn: 364572
* [OPENMP][CUDA]Use __syncthreads when compiled by nvcc and clang >= 9.0.Alexey Bataev2019-06-191-0/+5
| | | | | | | | | | | | | | | | Summary: The problems with __syncthreads() were fixed in clang >= 9.0 and the original __syncthreads() can be used instead of the ptx instruction. Reviewers: grokos Subscribers: guansong, jdoerfert, openmp-commits, kkwli0, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D63515 llvm-svn: 363807
* Revert "[OPENMP][NVPTX]Fix barriers and parallel level counters, NFC."Alexey Bataev2019-05-243-6/+3
| | | | | | This reverts commit r361421 to split the patch into 3 parts. llvm-svn: 361638
* [OPENMP][NVPTX]Fix barriers and parallel level counters, NFC.Alexey Bataev2019-05-223-3/+6
| | | | | | | | | | | | | | | | | | | | | Summary: Parallel level counter should be volatile to prevent some dangerous optimiations by the ptxas. Otherwise, ptxas optimizations lead to undefined behaviour in some cases. Also, use __threadfence() for #pragma omp flush and if the barrier should not be used (we have only one thread in the team), still perform flush operation since the standard requires implicit flush when executing barriers. Reviewers: gtbercea, kkwli0, grokos Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D62199 llvm-svn: 361421
* [OPENMP][NVPTX]Simplify handling of thread limit, NFC.Alexey Bataev2019-05-136-26/+19
| | | | | | | | | | | | | | | | | Summary: Patch improves performance of the full runtime mode by moving threads limit counter to the shared memory. It also allows to save global memory. Reviewers: grokos, kkwli0, gtbercea Subscribers: guansong, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D61801 llvm-svn: 360584
* [OPENMP][NVPTX]Improve number of threads counter, NFC.Alexey Bataev2019-05-1011-83/+47
| | | | | | | | | | | | | | | | | Summary: Patch improves performance of the full runtime mode by moving number-of-threads counter to the shared memory. It also allows to save global memory. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D61785 llvm-svn: 360457
* [OPENMP][NVPTX]Improve thread limit counter, NFC.Alexey Bataev2019-05-035-12/+6
| | | | | | | | | | | | | | | | | Summary: Patch improves performance of the full runtime mode by moving thread-limit counter to the shared memory. It also allows to save global memory. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jdoerfert, caomhin, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D61526 llvm-svn: 359922
* [OPENMP][NVPTX]Improved several standard OpenMP functions, NFC.Alexey Bataev2019-05-031-20/+19
| | | | | | | | | | | | | | | | | Summary: Used parallelLevel[] counter to simplify and improve implementation of the existing standard OpenMP functions. Functions are tested already in several tests, the patch is NFC. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jdoerfert, caomhin, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D61459 llvm-svn: 359892
* [OPENMP][NVPTX]Improve code by using parallel level counter.Alexey Bataev2019-05-0210-197/+200
| | | | | | | | | | | | | | | | | | Summary: Previously for the different purposes we need to get the active/common parallel level and with full runtime we iterated over all the records to calculate this level. Instead, we can used the warp-based parallel level counters used in no-runtime mode. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jfb, jdoerfert, caomhin, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D61395 llvm-svn: 359822
* [OPENMP][NVPTX]Improve omp_get_max_threads() function.Alexey Bataev2019-05-022-5/+48
| | | | | | | | | | | | | | | | Summary: Function omp_get_max_threads() can always return 1 if current execution mode is SPMD. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jdoerfert, caomhin, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D61379 llvm-svn: 359792
* [OPENMP][NVPTX]Improved omp_get_thread_limit() function.Alexey Bataev2019-05-022-6/+75
| | | | | | | | | | | | | | | | Summary: Function omp_get_thread_limit() in SPMD mode can return the maximum available number of threads as a result. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D61378 llvm-svn: 359790
* [OPENMP][NVPTX]Correctly handle L2 parallelism in SPMD mode.Alexey Bataev2019-04-268-26/+50
| | | | | | | | | | | | | | | | | | | Summary: The parallelLevel counter must be on per-thread basis to fully support L2+ parallelism, otherwise we may end up with undefined behavior. Introduce the parallelLevel on per-warp basis using shared memory. It allows to avoid the problems with the synchronization and allows fully support L2+ parallelism in SPMD mode with no runtime. Reviewers: gtbercea, grokos Subscribers: guansong, jdoerfert, caomhin, kkwli0, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D60918 llvm-svn: 359341
* [OPENMP][NVPTX] Fix the test, NFC.Alexey Bataev2019-04-221-7/+17
| | | | | | | | Fix the test to run it really in SPMD mode without runtime. Previously it was run in SPMD + full runtime mode and does not allow to cehck the functionality correctly. llvm-svn: 358902
* [OPENMP][NVPTX]Fix dynamic scheduling in L2+ SPMD parallel regions.Alexey Bataev2019-04-154-6/+48
| | | | | | | | | | | | | | | | | | | | Summary: If the kernel is executed in SPMD mode and the L2+ parallel for region with the dynamic scheduling is executed, dynamic scheduling functions are called. They expect full runtime support, but SPMD kernels may be executed without the full runtime. It leads to the runtime crash of the compiled program. Patch fixes this problem + fixes handling of the parallelism level in SPMD mode, which is required as part of this patch. Reviewers: gtbercea, kkwli0, grokos Subscribers: guansong, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D60578 llvm-svn: 358442
* [OpenMP][libomptarget] New reduction scheme for team reductionsGheorghe-Teodor Bercea2019-02-202-0/+148
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch adds a more sophisticated team reduction scheme to the OpenMP libomptarget-nvptx runtime. The scheme uses a fixed size global memory buffer whose length can be adjusted via compiler flag: ``` -fopenmp-cuda-teams-reduction-recs-num=1024 ``` The global buffer is a structure of arrays (with default size of 1024 each and controlled by the above flag), one array for each reduction variable. Values in the buffer are processed by the last team to finish executing the body of the target region. In addition to adding support for the new flag, the compiler also emits special functions used for the reduction of the intermediate reduction values. These changes will be added in a separate compiler patch following this one. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: guansong, jfb, jdoerfert, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D58409 llvm-svn: 354471
* Update more file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-1922-88/+66
| | | | | | | | | | | | | | | | | | to reflect the new license. These used slightly different spellings that defeated my regular expressions. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351648
* [OpenMP][libomptarget] Use shared memory variable for tracking parallel levelGheorghe-Teodor Bercea2019-01-096-69/+21
| | | | | | | | | | | | | | Summary: Replace existing infrastructure for tracking parallel level using global memory with a per-team shared memory variable. This minimizes the impact of the overhead of tracking the parallel level for non-nested cases. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D55773 llvm-svn: 350747
* [OPENMP][NVPTX]Fix dynamic scheduling.Alexey Bataev2019-01-072-22/+47
| | | | | | | | | | | | | | | Summary: Previous implementation may cause the runtime crash when the number of teams is > 1024. Patch fixes this problem + reduces number of the atomic operations by 32 times. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jfb, openmp-commits, caomhin Differential Revision: https://reviews.llvm.org/D56332 llvm-svn: 350524
* [OPENMP][NVPTX]General formatting/code improvement, NFC.Alexey Bataev2019-01-0410-81/+57
| | | | | | | | | | | | Summary: Formatting. Reviewers: gtbercea, grokos, kkwli0 Subscribers: guansong, openmp-commits, caomhin Differential Revision: https://reviews.llvm.org/D56290 llvm-svn: 350431
* [OPENMP][NVPTX]Improve performance + reduce number of used registers.Alexey Bataev2019-01-0413-83/+124
| | | | | | | | | | | | | | | | | Summary: Reduced number of the used register + improved performance propagating the information about current execution/data sharing mode directly from the compiler, where it is possible. In some cases, it requires new/reworked interfaces of the runtime external functions. Old functions are marked as deprecated. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jfb, openmp-commits, caomhin Differential Revision: https://reviews.llvm.org/D56278 llvm-svn: 350405
* [OPENMP][NVPTX]Fix incompatibility of __syncthreads with LLVM, NFC.Alexey Bataev2019-01-034-6/+15
| | | | | | | | | | | | | | | | | | Summary: One of the LLVM optimizations, split critical edges, also clones tail instructions. This is a dangerous operation for __syncthreads() functions and this transformation leads to undefined behavior or incorrect results. Patch fixes this problem by replacing __syncthreads() function with the assembler instruction, which cost is too high and wich cannot be copied. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, openmp-commits, caomhin Differential Revision: https://reviews.llvm.org/D56274 llvm-svn: 350333
* [OPENMP][NVPTX]Added/fixed debugging messages, NFC.Alexey Bataev2018-12-283-3/+7
| | | | | | | | | | | | | | Summary: Added or fixed new/old debugging messages for the better diagnostics. Reviewers: gtbercea, kkwli0, grokos Reviewed By: grokos Subscribers: caomhin, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D56102 llvm-svn: 350137
* [OPENMP][NVPTX]Fixed initialization of the data-sharing interface.Alexey Bataev2018-12-281-6/+9
| | | | | | | | | | | | | | | | Summary: Avoid using of the atomic loop to wait for the completion of the data-sharing interface initialization, use __shfl_sync instead for the communication within the warp to signal other threads in the warp about completion of the initialization. Reviewers: gtbercea, kkwli0, grokos Subscribers: guansong, jfb, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D56100 llvm-svn: 350129
* [OPENMP][NVPTX]Outline assert into noinline function, NFC.Alexey Bataev2018-12-281-8/+19
| | | | | | | | | | | | | | | | Summary: At high optimization level asserts lead to some unexpected results because of auto-inserted unreachable instructions. This outlining prevents some of such dangerous optimizations and leads to better stability. Reviewers: gtbercea, kkwli0, grokos Subscribers: guansong, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D56101 llvm-svn: 350128
* [OPENMP][NVPTX]Revert __kmpc_shuffle_int64 to its original form.Alexey Bataev2018-12-101-5/+0
| | | | | | | | | | | | | | Summary: Use the original shuffle implementation for __kmpc_shuffle_int64 since default implementation uses the same implementation. Reviewers: gtbercea Subscribers: guansong, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D55514 llvm-svn: 348772
* [OPENMP][NVPTX]Enable fast shuffles on 64bit values only if CUDA >= 9.Alexey Bataev2018-12-101-1/+11
| | | | | | | | | | | | | | Summary: Shuffle on 64bit data is allowed only for CUDA >= 9.0. Also, fixed the constant for the mask, need one extra L in the end. Reviewers: gtbercea, kkwli0 Subscribers: guansong, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D55440 llvm-svn: 348758
* [OPENMP][NVPTX]Save registers for optimized builds with enabled logging.Alexey Bataev2018-12-071-18/+14
| | | | | | | | | | | | | | | | | Summary: Introduced special noinline function log that allows to save some registers for optimized builds but with enabled logging. Also, it increases the stability of the optimized builds with inlined runtime. Reviewers: gtbercea, kkwli0 Reviewed By: gtbercea Subscribers: caomhin, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D55436 llvm-svn: 348606
* [OPENMP][NVPTX]Correct type casting for printf args + simplified shfl64 ↵Alexey Bataev2018-12-0611-108/+140
| | | | | | | | | | | | | | | | function. Summary: Explicitly casted printf's args to the required types + simplified shfl64 function. Reviewers: gtbercea, kkwli0 Subscribers: guansong, jfb, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D55379 llvm-svn: 348521
* [OPENMP][NVPTX]Fix __kmpc_flush to flush the memory per system, not per block.Alexey Bataev2018-12-061-1/+1
| | | | | | | | | | | | | | | Summary: According to the standard, after memory flushing the changes in the memory must be visible to all the threads in all teams. Patch fixes this. Reviewers: gtbercea, kkwli0 Subscribers: guansong, jfb, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D55370 llvm-svn: 348491
* [OpenMP][libomptarget] Flush intermediate values during team reduction Gheorghe-Teodor Bercea2018-12-031-0/+1
| | | | | | | | | | | | | | Summary: Ensure intermediate values of a team reduction are flushed to memory. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: guansong, jfb, openmp-commits Differential Revision: https://reviews.llvm.org/D55219 llvm-svn: 348148
* [OPENMP][NVPTX]Make runtime compatible with the original runtime.Alexey Bataev2018-11-305-159/+87
| | | | | | | | | | | | | | | Summary: Reworked runtime to make it compatible with the requirements of the original runtime library. Also, simplified some code to reduce number of function calls. Reviewers: gtbercea, kkwli0 Subscribers: guansong, jfb, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D55130 llvm-svn: 348003
* [OpenMP][libomptarget] Add new version of SPMD deinit kernel function with ↵Gheorghe-Teodor Bercea2018-11-272-3/+8
| | | | | | | | | | | | | | | | argument Summary: To enable the compiler to optimize parts of the function that are not needed when runtime can be omitted, a new version of the SPMD deinit kernel function is needed. This function takes the runtime required flag as an argument. Reviewers: ABataev, kkwli0, caomhin Reviewed By: ABataev Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D54969 llvm-svn: 347714
* [OPENMP][NVPTX]Basic support for reductions across the teams.Alexey Bataev2018-11-272-0/+24
| | | | | | | | | | | | | | | Summary: Added functions __kmpc_nvptx_teams_reduce_nowait_simple and __kmpc_nvptx_teams_end_reduce_nowait_simple to implement basic support for reductions across the teams. Reviewers: gtbercea, kkwli0 Subscribers: guansong, jfb, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D54967 llvm-svn: 347710
OpenPOWER on IntegriCloud