Commit log (newest first)
target|teams|distribute variables.
If the total size of the variables declared in target|teams|distribute
regions is less than the maximum size of shared memory available, the
buffer is allocated in shared memory.
llvm-svn: 346507
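As a rough illustration of the kind of code affected (mine, not taken from the patch or its tests): the variable below is declared in the teams context and escapes into the inner parallel region, so the NVPTX codegen globalizes it; with this change, a small variable like this can be placed in the shared-memory buffer.
```
#include <cstdio>

int main() {
  int result = 0;
  #pragma omp target teams num_teams(1) map(tofrom: result)
  {
    // 'partial' escapes the sequential teams context into the parallel
    // region, so it must be globalized; if it fits, it can now live in a
    // shared-memory buffer instead of dynamically allocated global memory.
    int partial = 0;
    #pragma omp parallel reduction(+ : partial)
    partial += 1;
    result = partial;
  }
  std::printf("threads seen: %d\n", result);
  return 0;
}
```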
Coalesced memory access requires use of the new function
`__kmpc_data_sharing_coalesced_push_stack` instead of
`__kmpc_data_sharing_push_stack`.
llvm-svn: 345991
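For context, a minimal sketch of the device-runtime entry points involved and the rough shape of the emitted sequence; the signatures below are written from memory and should be treated as assumptions, and the helper is purely illustrative, not compiler output.
```
#include <cstddef>
#include <cstdint>

// Device-runtime entry points named above (signatures are assumptions).
extern "C" void *__kmpc_data_sharing_push_stack(size_t size, int16_t use_shared_memory);
extern "C" void *__kmpc_data_sharing_coalesced_push_stack(size_t size, int16_t use_shared_memory);
extern "C" void __kmpc_data_sharing_pop_stack(void *ptr);

// Rough shape of the emitted sequence for a globalized 'int x' when the
// coalesced variant is used: one 32-slot record per warp, one slot per lane.
void sketch(unsigned tid) {
  void *rec = __kmpc_data_sharing_coalesced_push_stack(32 * sizeof(int),
                                                       /*use_shared_memory=*/0);
  static_cast<int *>(rec)[tid & 31] = 0;  // each lane uses its own slot
  __kmpc_data_sharing_pop_stack(rec);
}
```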
target/teams/distribute regions.
Target/teams/distribute regions exist for the entire time the kernel is
executed. Thus, if a variable is declared in their context and then
escapes it, we can allocate the global memory statically instead of
allocating it dynamically.
The patch captures all the globalized variables in target/teams/distribute
contexts and merges them into records, one per target region.
Those records are then joined into a union, one per compilation unit
(to save global memory). Those unions are organized into
two-dimensional arrays, where the first dimension is
the number of blocks per SM and the second is the number of SMs.
Runtime functions manage this global memory space between the executing
teams.
llvm-svn: 345978
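A purely illustrative sketch of the layout described above; the record contents, the union, and the SM/block counts are made-up placeholders, not the actual generated globals.
```
// Globalized variables of two hypothetical target regions.
struct TargetRegion1Record { double a; int b; };
struct TargetRegion2Record { float  c[4];      };

// One union per compilation unit: the records of different target regions
// can share the same storage slot.
union GlobalizedRecord {
  TargetRegion1Record r1;
  TargetRegion2Record r2;
};

// One slot per (block-on-an-SM, SM) pair; the device runtime hands each
// executing team one slot out of this statically sized pool.
constexpr int kMaxBlocksPerSM = 8;   // assumption for the sketch
constexpr int kNumSMs         = 80;  // assumption for the sketch
GlobalizedRecord globalized_records[kMaxBlocksPerSM][kNumSMs];
```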
Added support for mapping of lambdas in target regions. The compiler scans
all the by-reference captures in the lambda, implicitly maps those variables
in the target region, and later restores the addresses of the
references in the lambda to the correct addresses of the captured|privatized
variables.
llvm-svn: 345609
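A minimal example of the pattern this enables (illustrative, not from the patch's tests): a lambda with by-reference captures invoked inside a target region; the captured variables are implicitly mapped and the lambda's reference fields are rewired to the device copies.
```
#include <cstdio>

int main() {
  int x = 10, y = 0;
  // 'body' captures x and y by reference; using it in the target region
  // implicitly maps both variables and fixes up the captured references.
  auto body = [&]() { y = x * 2; };
  #pragma omp target map(to: x) map(tofrom: y)
  body();
  std::printf("y = %d\n", y);  // expected: 20
  return 0;
}
```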
parallel for
Summary: This patch adds a new code generation path for bound sharing directives containing distribute parallel for. The new code generation scheme applies to chunked schedules on the distribute and parallel for directives. The scheme simplifies the generated code by eliminating the need for an outer for loop over chunks for both the distribute and parallel for directives. In the case of distribute it applies to chunks of any size, while in the parallel for case it applies only when the chunk size is 1.
Reviewers: ABataev, caomhin
Reviewed By: ABataev
Subscribers: jholewinski, guansong, cfe-commits
Differential Revision: https://reviews.llvm.org/D53448
llvm-svn: 345509
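A sketch of source that should take the new path under the rule above (a chunked dist_schedule on distribute, chunk size 1 on parallel for); the example is mine, not taken from the patch.
```
#include <vector>

void saxpy(std::vector<float> &y, const std::vector<float> &x, float a) {
  const int n = static_cast<int>(y.size());
  float *yp = y.data();
  const float *xp = x.data();
  // Chunked distribute schedule plus schedule(static, 1) on the parallel for.
  #pragma omp target teams distribute parallel for \
      dist_schedule(static, 128) schedule(static, 1) \
      map(tofrom: yp[0:n]) map(to: xp[0:n])
  for (int i = 0; i < n; ++i)
    yp[i] = a * xp[i] + yp[i];
}
```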
Summary: This patch enables the choosing of the default schedule for parallel for loops even in non-SPMD cases.
Reviewers: ABataev, caomhin
Reviewed By: ABataev
Subscribers: jholewinski, guansong, cfe-commits
Differential Revision: https://reviews.llvm.org/D53443
llvm-svn: 345507
llvm-svn: 344574
Additional reduction of the global memory usage in the target regions
without parallel regions.
llvm-svn: 344413
If a function has globalized variables and is called in the context of
target/teams/distribute regions, it does not need to globalize 32
copies of the same variables for memory coalescing; one copy is
enough, because there is no parallel region.
The patch does this by adding a call to the `__kmpc_parallel_level` function
and checking its return value. If the parallel level is 0, then only one
copy is allocated, not 32.
llvm-svn: 344356
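A minimal sketch of the emitted check, using the `__kmpc_parallel_level` entry point named above; the exact signature and the `ident_t` type are written from memory and should be read as assumptions.
```
#include <cstddef>
#include <cstdint>

struct ident_t;  // opaque source-location descriptor used by the OpenMP runtime
extern "C" uint16_t __kmpc_parallel_level(ident_t *loc, uint32_t global_tid);

// Shape of the size computation for a globalized 'double v': at parallel
// level 0 a single copy suffices; inside a parallel region, 32 copies are
// kept so that warp lanes get coalesced accesses.
size_t globalized_size(ident_t *loc, uint32_t gtid) {
  const bool in_parallel = __kmpc_parallel_level(loc, gtid) > 0;
  return (in_parallel ? 32 : 1) * sizeof(double);
}
```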
target/teams/distribute regions.
The previously introduced globalization scheme that uses memory coalescing
may increase memory usage for the variables that are declared in
target/teams/distribute contexts. We don't need 32 copies of such
variables, just one. The patch reduces memory use in this case.
llvm-svn: 344273
Added support for memory coalescing for better performance of
globalized variables. From now on, all globalized variables are
represented as arrays of 32 elements, and each thread accesses its
element using `tid & 31` as the index.
llvm-svn: 344049
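In source-level terms the transformation looks roughly like this (a sketch, not actual compiler output): `tid & 31` is the lane id, so the 32 lanes of a warp touch 32 consecutive elements and the accesses coalesce.
```
// A globalized scalar 'int x' becomes a 32-element array in the globalized
// record, and every thread picks its slot with its lane id.
struct GlobalizedRec {
  int x[32];
};

int load_mine(const GlobalizedRec *rec, unsigned tid) {
  return rec->x[tid & 31];  // lane-indexed, warp-coalesced access
}
```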
mode.
__kmpc_global_thread_num() should be called before initialization of the
runtime.
llvm-svn: 343857
Fixed the emission of __kmpc_global_thread_num() so that it is no longer
interleaved with alloca instructions. Also fixed the emission of
__kmpc_global_thread_num() calls in the target outlined regions so
that they are not made before the runtime is initialized.
llvm-svn: 343856
Worker threads fork off to the compiler-generated worker function
directly after entering the kernel function. Hence, there is no
need to check whether the current thread is the master if we are
outside of a parallel region (neither SPMD nor parallel_level > 0).
Differential Revision: https://reviews.llvm.org/D52732
llvm-svn: 343618
lightweight runtime.
The data sharing flag must be set to `1` when executing an SPMD-mode-compatible directive with reduction|lastprivate clauses.
llvm-svn: 343492
llvm-svn: 343483
mode achieve coalescing
Summary: Set default schedule for parallel for loops to schedule(static, 1) when using SPMD mode on the NVPTX device offloading toolchain to ensure coalescing.
Reviewers: ABataev, Hahnfeld, caomhin
Reviewed By: ABataev
Subscribers: jholewinski, guansong, cfe-commits
Differential Revision: https://reviews.llvm.org/D52629
llvm-svn: 343260
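A quick illustration of why `schedule(static, 1)` coalesces (the example is mine, not the patch's): with chunk size 1, consecutive threads of a warp execute consecutive iterations, so in the same instruction they touch adjacent array elements.
```
void scale(float *a, int n, float s) {
  // With schedule(static, 1), thread t runs iterations t, t + num_threads,
  // ..., so neighbouring threads access a[i], a[i+1], ... together and the
  // loads/stores coalesce into wide memory transactions.
  #pragma omp target teams distribute parallel for schedule(static, 1) \
      map(tofrom: a[0:n])
  for (int i = 0; i < n; ++i)
    a[i] *= s;
}
```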
mode achieve coalescing
Summary: For the OpenMP NVPTX toolchain choose a default distribute schedule that ensures coalescing on the GPU when in SPMD mode. This significantly increases the performance of offloaded target code and reduces the number of registers used on the GPU side.
Reviewers: ABataev, caomhin, Hahnfeld
Reviewed By: ABataev, Hahnfeld
Subscribers: Hahnfeld, jholewinski, guansong, cfe-commits
Differential Revision: https://reviews.llvm.org/D52434
llvm-svn: 343253
Add support for OMP5.0 requires directive and unified_address clause.
Patches to follow will include support for additional clauses.
Differential Revision: https://reviews.llvm.org/D52359
llvm-svn: 343063
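For reference, the directive as it appears in user code (illustrative usage; the clause semantics come from the OpenMP 5.0 spec, not from this patch):
```
// Asserts that the host and all offload devices use a unified address space.
#pragma omp requires unified_address

void fill(int *p, int n) {
  #pragma omp target map(tofrom: p[0:n])
  for (int i = 0; i < n; ++i)
    p[i] = i;
}
```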
Previously we could not use lastprivates in SPMD constructs; this patch
allows supporting lastprivates in SPMD mode with the uninitialized runtime.
llvm-svn: 342738
'declare target'.
All functions referenced in implicit|explicit target regions must
be emitted during code emission for the device.
llvm-svn: 341093
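A small illustration of the rule (the example is mine): a function that is only referenced from a target region is implicitly 'declare target', and a device version of it still has to be emitted.
```
static int square(int v) { return v * v; }  // no explicit 'declare target'

int compute(int x) {
  int r = 0;
  #pragma omp target map(tofrom: r)
  r = square(x);  // 'square' must also be emitted for the device
  return r;
}
```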
Added options -f[no-]openmp-cuda-force-full-runtime to [not] force use
of the full runtime for OpenMP offloading to CUDA devices.
llvm-svn: 341073
If the target construct can be executed in SPMD mode and it is a
loop-based directive with static scheduling, we can use the lightweight
runtime support.
llvm-svn: 340953
The attribute is marked as inheritable since OpenMP 5.0 supports it, plus
additional fixes to support the new functionality.
llvm-svn: 339704
Reviewers: teemperor!
Subscribers: jholewinski, whisperity, jfb, cfe-commits
Differential Revision: https://reviews.llvm.org/D50350
llvm-svn: 339385
The first argument of the parallel outlined functions, when called as
serialized parallel regions, should be a pointer to the global thread id,
which is always 0.
llvm-svn: 337957
Sometimes we may try to globalize non-variable declarations, which can
lead to a compiler crash.
llvm-svn: 337191
Summary: In the SPMD case, we need to initialize the data sharing and globalization infrastructure. This covers the case when an SPMD region calls a function in a different compilation unit.
Reviewers: ABataev, carlo.bertolli, caomhin
Reviewed By: ABataev
Subscribers: Hahnfeld, jholewinski, guansong, cfe-commits
Differential Revision: https://reviews.llvm.org/D49188
llvm-svn: 337015
In generic data-sharing mode we are allowed not to globalize local
variables that escape their declaration context iff they are declared
inside a parallel region. We can do this because L2 parallel
regions are executed sequentially and, thus, we do not need to put
shared local variables in global memory.
llvm-svn: 336567
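A sketch of the case described above (illustrative only): `tmp` is declared inside the outer parallel region and escapes only into the inner, L2 parallel region, which is executed sequentially, so `tmp` can stay on the thread's stack instead of being globalized.
```
void example() {
  #pragma omp target teams
  #pragma omp parallel
  {
    int tmp = 0;                       // declared inside the parallel region
    #pragma omp parallel shared(tmp)   // L2 region, executed sequentially
    {
      #pragma omp atomic
      tmp += 1;
    }
  }
}
```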
The patch improves the analysis of the variables that should be
globalized. From now on, instead of all parallel directives it checks
only distribute parallel .. directives, and checks only
firstprivate/lastprivate variables for whether they must be globalized.
llvm-svn: 335632
If a shuffle is required for reduced structures/big data types, the
current code may cause a compiler crash because of the loading of
aggregate values. The patch fixes this problem.
llvm-svn: 335377
parallel region.
If the current construct requires sharing of a local variable with the
inner parallel region, this variable must be globalized to avoid a
runtime crash.
llvm-svn: 335285
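A sketch of the situation described above (the example is mine): `count` is declared in the sequential part of the target region and is shared with the inner parallel region, so worker threads must be able to see it and it has to be globalized rather than kept in the master thread's private stack.
```
int count_team_threads() {
  int result = 0;
  #pragma omp target map(tofrom: result)
  {
    int count = 0;                    // declared outside the parallel region
    #pragma omp parallel shared(count)
    {
      #pragma omp atomic
      count += 1;
    }
    result = count;
  }
  return result;
}
```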
If simple reduction is requested, use the simple reduction instead of
the runtime function calls.
llvm-svn: 334962
If an orphaned parallel region is found, the following code must be emitted:
```
if (__kmpc_is_spmd_exec_mode() || __kmpc_parallel_level(loc, gtid))
  Serialized execution.
else if (IsMasterThread())
  Prepare and signal worker.
else
  Outlined function call.
```
llvm-svn: 333301
If the orphaned directive is executed in SPMD mode, we need to emit the
check for the SPMD mode and run the orphaned parallel directive in
sequential mode.
llvm-svn: 332467
In generic data-sharing mode we do not need to globalize
variables/parameters of reference/pointer types. They are already placed
in global memory.
llvm-svn: 332380
distribute simd directives.
Directives `target simd` and `target teams distribute simd` must be
executed in non-SPMD mode.
llvm-svn: 332129
Added initial support for L2 parallelism in SPMD mode. Note, though,
that the orphaned parallel directives are not currently supported in
SPMD mode.
llvm-svn: 332016
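A shape of code this enables (illustrative; an orphaned inner `parallel` in a separately compiled function would still be unsupported at this point): an SPMD combined construct with a nested, second-level parallel region.
```
void nested(float *a, int n) {
  #pragma omp target teams distribute parallel for map(tofrom: a[0:n])
  for (int i = 0; i < n; ++i) {
    float s = 0.0f;
    #pragma omp parallel for reduction(+ : s)  // L2 parallelism in SPMD mode
    for (int j = 0; j < 4; ++j)
      s += j;
    a[i] += s;
  }
}
```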
This is similar to the LLVM change https://reviews.llvm.org/D46290.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers in our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46320
llvm-svn: 331834
Added correct codegen for the critical construct on NVPTX devices.
llvm-svn: 331652
Added initial codegen for level 2, 3, etc. parallelism. Currently, all
second-, third-, etc. level parallel regions will run sequentially.
llvm-svn: 331642
regions.
Added codegen for `simd reduction()` constructs in target directives.
llvm-svn: 331393
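An example of the construct this covers (mine, not from the patch's tests):
```
float dot(const float *x, const float *y, int n) {
  float sum = 0.0f;
  #pragma omp target map(to: x[0:n], y[0:n]) map(tofrom: sum)
  {
    #pragma omp simd reduction(+ : sum)
    for (int i = 0; i < n; ++i)
      sum += x[i] * y[i];
  }
  return sum;
}
```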
Some symbols are not allowed to be used as names on some targets. The patch
tries to unify the emission of the names of LLVM globals so they can be
used on different targets.
llvm-svn: 331358
NVPTX target.
When generating the wrapper function for the offloading region, we need
to call the outlined function and cast the arguments correctly to follow
the ABI. Usually, variables captured by value are cast to the `uintptr_t`
type, but this should not be done for variables of pointer type.
llvm-svn: 330620
llvm-svn: 330154
Added attributes for better optimization of the OpenMP code.
llvm-svn: 329751
Added NUW flags to all the add|mul|sub operations and replaced sdiv with
udiv, as we operate only on unsigned values (addresses converted to integers).
llvm-svn: 329411
variables.
Added emission of the offloading data sections for the variables within
declare target regions, and fixed the emission of variables marked as
declare target outside of a declare target region.
llvm-svn: 328888
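The two forms of 'declare target' variables mentioned above, in an illustrative example (names are mine):
```
#pragma omp declare target
int inside_region = 1;                         // within a declare target region
#pragma omp end declare target

int outside_region = 2;
#pragma omp declare target to(outside_region)  // marked without a region

int read_both() {
  int r = 0;
  #pragma omp target map(tofrom: r)
  r = inside_region + outside_region;
  return r;
}
```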
workers side
Summary: The workers also need to initialize the global stack. The call to the initialization function needs to happen after the kernel_init() function is called by the master. This ensures that the per-team data structures of the runtime have been initialized.
Reviewers: ABataev, grokos, carlo.bertolli, caomhin
Reviewed By: ABataev
Subscribers: jholewinski, guansong, cfe-commits
Differential Revision: https://reviews.llvm.org/D44749
llvm-svn: 328219
constructs in generic mode.
Fixed codegen for distribute parallel combined constructs. We have to
pass the shared lower and upper bounds from the distribute region and
read them in the inner parallel region. The patch is for generic mode.
llvm-svn: 327990
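A very rough sketch of the bound sharing described above (not the generated code; the names and chunking are made up): the distribute level computes a per-team chunk [lb, ub], and in the combined construct those exact bounds are handed to the inner parallel for, which splits them among the team's threads.
```
struct TeamBounds { int lb; int ub; };

// Inner 'parallel for' level: iterates only over the team's chunk, statically
// split among the team's threads.
void parallel_for_level(TeamBounds b, int thread, int num_threads) {
  for (int i = b.lb + thread; i <= b.ub; i += num_threads) {
    // loop body for iteration i
  }
}

// Distribute level: computes the team's chunk and passes it down instead of
// letting the parallel level recompute its own bounds.
void distribute_level(int n, int team, int num_teams, int threads_per_team) {
  const int chunk = (n + num_teams - 1) / num_teams;
  TeamBounds b;
  b.lb = team * chunk;
  const int last = b.lb + chunk - 1;
  b.ub = last < n - 1 ? last : n - 1;
  for (int t = 0; t < threads_per_team; ++t)
    parallel_for_level(b, t, threads_per_team);  // conceptually in parallel
}
```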