| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements parsing and sema for "omp declare mapper"
directive. User defined mapper, i.e., declare mapper directive, is a new
feature in OpenMP 5.0. It is introduced to extend existing map clauses
for the purpose of simplifying the copy of complex data structures
between host and device (i.e., deep copy). An example is shown below:
struct S { int len; int *d; };
#pragma omp declare mapper(struct S s) map(s, s.d[0:s.len]) // Memory region that d points to is also mapped using this mapper.
Contributed-by: Lingda Li <lildmh@gmail.com>
Differential Revision: https://reviews.llvm.org/D56326
llvm-svn: 352906
|
|
|
|
|
|
|
| |
In case of the empty module, the ptxas tool may emit error message about
empty debug info sections. This patch fixes this bug.
llvm-svn: 352421
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
|
|
|
|
|
|
|
|
|
|
| |
Each we create the target regions with the teams distribute inner
region, we can better estimate number of the teams required to execute
the target region. Function __kmpc_push_target_tripcount() is used for
purpose, which accepts device_id and the number of the iterations,
performed by the associated loop.
llvm-svn: 350571
|
|
|
|
|
|
|
| |
After the fix for the syncthreads we don't need to generate extra
barriers for the parallel reductions.
llvm-svn: 350530
|
|
|
|
|
|
| |
Updated codegen to use the new functions from the runtime library.
llvm-svn: 350415
|
|
|
|
|
|
|
|
|
|
| |
nvvm_barrier0.
Use runtime functions instead of the direct call to the nvvm intrinsics.
It allows to prevent some dangerous LLVM optimizations, that breaks the
code for the NVPTX target.
llvm-svn: 350328
|
|
|
|
|
|
|
|
|
|
|
| |
buffer.
Seems to me, nvlink has a bug with the proper support of the weakly
linked symbols. It does not allow to define several shared memory buffer
with the different sizes even with the weak linkage. Instead we always
use 128 bytes buffer to prevent nvlink from the error message emission.
llvm-svn: 349540
|
|
|
|
|
|
|
| |
The parallel reduction operation requires an extra synchronization point
in the inter-warp copy function to avoid divergence.
llvm-svn: 349525
|
|
|
|
|
|
|
|
|
| |
Inlined runtime with the current implementation of the interwarp copy
function leads to the undefined behavior because of the not quite
correct implementation of the barriers. Start using generic
__kmpc_barier function instead of the custom made barriers.
llvm-svn: 349192
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Found via `codespell -q 3 -I ../clang-whitelist.txt -L uint,importd,crasher,gonna,cant,ue,ons,orign,ned`
Reviewers: teemperor
Reviewed By: teemperor
Subscribers: teemperor, jholewinski, jvesely, nhaehnle, whisperity, jfb, cfe-commits
Differential Revision: https://reviews.llvm.org/D55475
llvm-svn: 348755
|
|
|
|
|
|
|
|
|
| |
If the array section is based on pointer and this sections is mapped in
target region + then it is used in the inner parallel region, it also
must be globalized as the pointer itself is passed by value, not by
reference.
llvm-svn: 348492
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Critical regions in NVPTX are the constructs, which, generally speaking,
are not supported by the NVPTX target. Instead we're using special
technique to handle the critical regions. Currently they are supported
only within the loop and all the threads in the loop must execute the
same critical region.
Inside of this special regions the regions still must be emitted as
critical, to avoid possible data races between the teams +
synchronization must use __kmpc_barrier functions.
llvm-svn: 348272
|
|
|
|
|
|
|
|
| |
__kmpc_barrier runtime functions must be marked as convergent to prevent
some dangerous optimizations. Also, for NVPTX target all barriers must
be emitted as simple barriers.
llvm-svn: 348271
|
|
|
|
|
|
|
|
|
| |
initialization.
Function __kmpc_global_thread_num should be called only after
initialization, not earlier.
llvm-svn: 347919
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This patch adds a new runtime for the SPMD deinit kernel function which replaces the previous function. The new function takes as argument the flag which signals whether the runtime is required or not. This enables the compiler to optimize out the part of the deinit function which are not needed.
Reviewers: ABataev, caomhin
Reviewed By: ABataev
Subscribers: jholewinski, guansong, cfe-commits
Differential Revision: https://reviews.llvm.org/D54970
llvm-svn: 347915
|
|
|
|
|
|
| |
Added basic codegen support for the reductions across the teams.
llvm-svn: 347715
|
|
|
|
|
|
|
|
|
|
|
| |
modes.
If the region is inside target|teams|distribute region, we can emit the
locations with the correct info for execution mode and runtime mode.
Patch adds this ability to the NVPTX codegen to help the optimizer to
produce better code.
llvm-svn: 347583
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the NVPTX target default locations should be emitted as constants +
additional info must be emitted in the reserved_2 field of the ident_t
structure. The 1st bit controls the execution mode and the 2nd bit
controls use of the lightweight runtime. The combination of the bits for
Non-SPMD mode + lightweight runtime represents special undefined mode,
used outside of the target regions for orphaned directives or functions.
Should allow and additional optimization inside of the target regions.
llvm-svn: 347425
|
|
|
|
|
|
| |
requires directive. Differential Review: https://reviews.llvm.org/D54493
llvm-svn: 347214
|
|
|
|
| |
llvm-svn: 347133
|
|
|
|
|
|
|
|
|
|
|
| |
reductions.
Fixed previously committed code for the reduction support in
teams/parallel constructs taking into account new design of the NVPTX
support in the compiler. Teams reduction are not fully functional yet,
it is going to be fixed in the following patches.
llvm-svn: 347081
|
|
|
|
|
|
|
|
|
| |
If the statements between target|teams|distribute directives does not
require execution in master thread, like constant expressions, null
statements, simple declarations, etc., such construct can be xecuted in
SPMD mode.
llvm-svn: 346551
|
|
|
|
|
|
|
|
|
|
| |
target|teams|distribute variables.
If the total size of the variables, declared in target|teams|distribute
regions, is less than the maximal size of shared memory available, the
buffer is allocated in the shared memory.
llvm-svn: 346507
|
|
|
|
|
|
|
|
| |
Coalesced memory access requires use of the new function
`__kmpc_data_sharing_coalesced_push_stack` instead of the
`__kmpc_data_sharing_push_stack`.
llvm-svn: 345991
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
target/teams/distribute regions.
Target/teams/distribute regions exist for all the time the kernel is
executed. Thus, if the variable is declared in their context and then
escape it, we can allocate global memory statically instead of
allocating it dynamically.
Patch captures all the globalized variables in target/teams/distribute
contexts, merges them into the records, one per each target region.
Those records are then joined into the union, one per compilation unit
(to save the global memory). Those units are organized into
2 x dimensional arrays, where the first dimension is
the number of blocks per SM and the second one is the number of SMs.
Runtime functions manage this global memory space between the executing
teams.
llvm-svn: 345978
|
|
|
|
|
|
|
|
|
|
| |
Added support for mapping of lambdas in the target regions. It scans all
the captures by reference in the lambda, implicitly maps those variables
in the target region and then later reinstate the addresses of
references in lambda to the correct addresses of the captured|privatized
variables.
llvm-svn: 345609
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
parallel for
Summary: This patch adds a new code generation path for bound sharing directives containing distribute parallel for. The new code generation scheme applies to chunked schedules on distribute and parallel for directives. The scheme simplifies the code that is being generated by eliminating the need for an outer for loop over chunks for both distribute and parallel for directives. In the case of distribute it applies to any sized chunk while in the parallel for case it only applies when chunk size is 1.
Reviewers: ABataev, caomhin
Reviewed By: ABataev
Subscribers: jholewinski, guansong, cfe-commits
Differential Revision: https://reviews.llvm.org/D53448
llvm-svn: 345509
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This patch enables the choosing of the default schedule for parallel for loops even in non-SPMD cases.
Reviewers: ABataev, caomhin
Reviewed By: ABataev
Subscribers: jholewinski, guansong, cfe-commits
Differential Revision: https://reviews.llvm.org/D53443
llvm-svn: 345507
|
|
|
|
| |
llvm-svn: 344574
|
|
|
|
|
|
|
| |
Additional reduction of the global memory usage in the target regions
without parallel regions.
llvm-svn: 344413
|
|
|
|
|
|
|
|
|
|
|
|
| |
if the function has globalized variables and called in context of
target/teams/distribute regions, it does not need to globalize 32
copies of the same variables for memory coalescing, it is enough to
have just one copy, because there is parallel region.
Patch does this by adding call for `__kmpc_parallel_level` function and
checking its return value. If the code sees that the parallel level is
0, then only one variable is allocated, not 32.
llvm-svn: 344356
|
|
|
|
|
|
|
|
|
|
|
| |
target/teams/distribute regions.
Previously introduced globalization scheme that uses memory coalescing
scheme may increase memory usage fr the variables that are devlared in
target/teams/distribute contexts. We don't need 32 copies of such
variables, just 1. Patch reduces memory use in this case.
llvm-svn: 344273
|
|
|
|
|
|
|
|
|
| |
Added support for memory coalescing for better performance for
globalized variables. From now on all the globalized variables are
represented as arrays of 32 elements and each thread accesses these
elements using `tid & 31` as index.
llvm-svn: 344049
|
|
|
|
|
|
|
|
|
| |
mode.
__kmpc_global_thread_num() should be called before initialization of the
runtime.
llvm-svn: 343857
|
|
|
|
|
|
|
|
|
| |
Fixed emission of the __kmpc_global_thread_num() so that it is not
messed up with alloca instructions anymore. Plus, fixes emission of the
__kmpc_global_thread_num() functions in the target outlined regions so
that they are not called before runtime is initialized.
llvm-svn: 343856
|
|
|
|
|
|
|
|
|
|
|
| |
Worker threads fork off to the compiler generated worker function
directly after entering the kernel function. Hence, there is no
need to check whether the current thread is the master if we are
outside of a parallel region (neither SPMD nor parallel_level > 0).
Differential Revision: https://reviews.llvm.org/D52732
llvm-svn: 343618
|
|
|
|
|
|
|
|
| |
lightweight runtime.
The datasharing flag must be set to `1` when executing SPMD-mode compatible directive with reduction|lastprivate clauses.
llvm-svn: 343492
|
|
|
|
| |
llvm-svn: 343483
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
mode achieve coalescing
Summary: Set default schedule for parallel for loops to schedule(static, 1) when using SPMD mode on the NVPTX device offloading toolchain to ensure coalescing.
Reviewers: ABataev, Hahnfeld, caomhin
Reviewed By: ABataev
Subscribers: jholewinski, guansong, cfe-commits
Differential Revision: https://reviews.llvm.org/D52629
llvm-svn: 343260
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
mode achieve coalescing
Summary: For the OpenMP NVPTX toolchain choose a default distribute schedule that ensures coalescing on the GPU when in SPMD mode. This significantly increases the performance of offloaded target code and reduces the number of registers used on the GPU side.
Reviewers: ABataev, caomhin, Hahnfeld
Reviewed By: ABataev, Hahnfeld
Subscribers: Hahnfeld, jholewinski, guansong, cfe-commits
Differential Revision: https://reviews.llvm.org/D52434
llvm-svn: 343253
|
|
|
|
|
|
|
|
|
| |
Add support for OMP5.0 requires directive and unified_address clause.
Patches to follow will include support for additional clauses.
Differential Revision: https://reviews.llvm.org/D52359
llvm-svn: 343063
|
|
|
|
|
|
|
| |
Previously we could not use lastprivates in SPMD constructs, patch
allows supporting lastprivates in SPMD with uninitialized runtime.
llvm-svn: 342738
|
|
|
|
|
|
|
|
|
| |
'declare target'.
All the functions, referenced in implicit|explicit target regions must
be emitted during code emission for the device.
llvm-svn: 341093
|
|
|
|
|
|
|
| |
Added options -f[no-]openmp-cuda-force-full-runtime to [not] force use
of the full runtime for OpenMP offloading to CUDA devices.
llvm-svn: 341073
|
|
|
|
|
|
|
|
| |
If the target construct can be executed in SPMD mode + it is a loop
based directive with static scheduling, we can use lightweight runtime
support.
llvm-svn: 340953
|
|
|
|
|
|
|
| |
The attribute marked as inheritable since OpenMP 5.0 supports it +
additional fixes to support new functionality.
llvm-svn: 339704
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: teemperor!
Subscribers: jholewinski, whisperity, jfb, cfe-commits
Differential Revision: https://reviews.llvm.org/D50350
llvm-svn: 339385
|
|
|
|
|
|
|
|
| |
The first argument for the parallel outlined functions, called as
serialized parallel regions, should be a pointer to the global thread id
that always is 0.
llvm-svn: 337957
|
|
|
|
|
|
|
| |
Sometimes we can try to globalize non-variable declarations, which may
lead to compiler crash.
llvm-svn: 337191
|