| author | Alexey Bataev <a.bataev@hotmail.com> | 2018-07-26 17:53:45 +0000 |
|---|---|---|
| committer | Alexey Bataev <a.bataev@hotmail.com> | 2018-07-26 17:53:45 +0000 |
| commit | 3bdd60095f637118b7227065868a9206577e4867 (patch) | |
| tree | dc79b646760cffb60f7b5bcca2842c0e671216a2 /clang/docs/OpenMPSupport.rst | |
| parent | 66d405d31f5cb778902315fbefd6421fa85aa6d1 (diff) | |
[OPENMP] What's new for OpenMP in clang.
Updated ReleaseNotes + Status of the OpenMP support in clang.
llvm-svn: 338049
Diffstat (limited to 'clang/docs/OpenMPSupport.rst')
| -rw-r--r-- | clang/docs/OpenMPSupport.rst | 75 |
1 file changed, 69 insertions, 6 deletions
diff --git a/clang/docs/OpenMPSupport.rst b/clang/docs/OpenMPSupport.rst
index f34049473f8..e8ec1e371b0 100644
--- a/clang/docs/OpenMPSupport.rst
+++ b/clang/docs/OpenMPSupport.rst
@@ -10,13 +10,15 @@
 .. role:: partial
 .. role:: good

+.. contents::
+   :local:
+
 ==================
 OpenMP Support
 ==================

-Clang fully supports OpenMP 3.1 + some elements of OpenMP 4.5. Clang supports offloading to X86_64, AArch64 and PPC64[LE] devices.
-Support for Cuda devices is not ready yet.
-The status of major OpenMP 4.5 features support in Clang.
+Clang fully supports OpenMP 4.5. Clang supports offloading to X86_64, AArch64,
+PPC64[LE] and has `basic support for Cuda devices`_.

 Standalone directives
 =====================
@@ -35,7 +37,7 @@ Standalone directives

 * #pragma omp target: :good:`Complete`.

-* #pragma omp declare target: :partial:`Partial`. No full codegen support.
+* #pragma omp declare target: :good:`Complete`.

 * #pragma omp teams: :good:`Complete`.

@@ -64,5 +66,66 @@ Combined directives

 * #pragma omp target teams distribute parallel for [simd]: :good:`Complete`.

-Clang does not support any constructs/updates from upcoming OpenMP 5.0 except for `reduction`-based clauses in the `task` and `target`-based directives.
-In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and mac OS.
+Clang does not support any constructs/updates from upcoming OpenMP 5.0 except
+for `reduction`-based clauses in the `task` and `target`-based directives.
+
+In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools
+Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and mac OS.
+
+.. _basic support for Cuda devices:
+
+Cuda devices support
+====================
+
+Directives execution modes
+--------------------------
+
+Clang code generation for target regions supports two modes: the SPMD and
+non-SPMD modes.
+Clang chooses one of these two modes automatically, based on the
+way directives and clauses on those directives are used. The SPMD mode uses a
+simplified set of runtime functions, thus increasing performance at the cost of
+not supporting some OpenMP features. The non-SPMD mode is the most generic mode
+and supports all currently available OpenMP features. The compiler will always
+attempt to use the SPMD mode wherever possible. SPMD mode will not be used if:
+
+ - The target region contains an `if()` clause that refers to a `parallel`
+   directive.
+
+ - The target region contains a `parallel` directive with a `num_threads()`
+   clause.
+
+ - The target region contains user code (other than OpenMP-specific
+   directives) in between the `target` and the `parallel` directives.
+
+Data-sharing modes
+------------------
+
+Clang supports two data-sharing models for Cuda devices: `Generic` and `Cuda`
+modes. The default mode is `Generic`. `Cuda` mode can give additional
+performance and can be activated using the `-fopenmp-cuda-mode` flag. In
+`Generic` mode all local variables that can be shared in the parallel regions
+are stored in the global memory. In `Cuda` mode local variables are not shared
+between the threads and it is the user's responsibility to share the required
+data between the threads in the parallel regions.
+
+Features not supported or with limited support for Cuda devices
+---------------------------------------------------------------
+
+- Reductions across the teams are not supported yet.
+
+- Cancellation constructs are not supported.
+
+- Doacross loop nests are not supported.
+
+- User-defined reductions are supported only for trivial types.
+
+- Nested parallelism: inner parallel regions are executed sequentially.
+
+- Static linking of libraries containing device code is not supported yet.
+
+- Automatic translation of math functions in target regions to device-specific
+  math functions is not implemented yet.
+
+- Debug information for OpenMP target regions is not supported yet.

