| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
| |
Used incorrect types for target specific parameters in debug mode,
should use original pointers rather than the pointee types.
llvm-svn: 313186
|
|
|
|
|
|
|
|
| |
When translating arguments for NVPTX target it is not taken into account
that function may have variable number of arguments. Patch fixes this
problem.
llvm-svn: 310920
|
|
|
|
|
|
| |
General improvement of the outlined functions calls.
llvm-svn: 310840
|
|
|
|
|
|
|
|
|
|
|
| |
Arguments, passed to the outlined function, must have correct address
space info for proper Debug info support. Patch sets global address
space for arguments that are mapped and passed by reference.
Also, cuda-gdb does not handle reference types correctly, so reference
arguments are represented as pointers.
llvm-svn: 310387
|
|
|
|
|
|
| |
This reverts commit r310377.
llvm-svn: 310379
|
|
|
|
|
|
|
|
|
|
|
| |
Arguments, passed to the outlined function, must have correct address
space info for proper Debug info support. Patch sets global address
space for arguments that are mapped and passed by reference.
Also, cuda-gdb does not handle reference types correctly, so reference
arguments are represented as pointers.
llvm-svn: 310377
|
|
|
|
|
|
| |
This reverts commit r310360.
llvm-svn: 310364
|
|
|
|
|
|
|
|
|
|
|
| |
Arguments, passed to the outlined function, must have correct address
space info for proper Debug info support. Patch sets global address
space for arguments that are mapped and passed by reference.
Also, cuda-gdb does not handle reference types correctly, so reference
arguments are represented as pointers.
llvm-svn: 310360
|
|
|
|
|
|
| |
This reverts commit r310104.
llvm-svn: 310135
|
|
|
|
|
|
|
|
|
|
|
| |
Arguments, passed to the outlined function, must have correct address
space info for proper Debug info support. Patch sets global address
space for arguments that are mapped and passed by reference.
Also, cuda-gdb does not handle reference types correctly, so reference
arguments are represented as pointers.
llvm-svn: 310104
|
|
|
|
| |
llvm-svn: 310098
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
If the first parameter of the function is the ImplicitParamDecl, codegen
automatically marks it as an implicit argument with `this` or `self`
pointer. Added internal kind of the ImplicitParamDecl to separate
'this', 'self', 'vtt' and other implicit parameters from other kind of
parameters.
Reviewers: rjmccall, aaron.ballman
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D33735
llvm-svn: 305075
|
|
|
|
|
|
|
|
|
|
|
| |
Amongst other, this will help LTO to correctly handle/honor files
compiled with O0, helping debugging failures.
It also seems in line with how we handle other options, like how
-fnoinline adds the appropriate attribute as well.
Differential Revision: https://reviews.llvm.org/D28404
llvm-svn: 304127
|
|
|
|
| |
llvm-svn: 304028
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements codegen for the reduction clause on
any teams construct for elementary data types. It builds
on parallel reductions on the GPU. Subsequently,
the team master writes to a unique location in a global
memory scratchpad. The last team to do so loads and
reduces this array to calculate the final result.
This patch emits two helper functions that are used by
the OpenMP runtime on the GPU to perform reductions across
teams.
Patch by Tian Jin in collaboration with Arpith Jacob
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29879
llvm-svn: 295335
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types. An efficient
implementation requires hierarchical reduction within a
warp and a threadblock. It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.
The patch creates a struct to hold reduction variables and
a number of helper functions. The OpenMP runtime on the GPU
implements reduction algorithms that uses these helper
functions to perform reductions within a team. Variables are
shared between CUDA threads using shuffle intrinsics.
An implementation of reductions on the NVPTX device is
substantially different to that of CPUs. However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.
The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions. The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.
While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.
Patch by Tian Jin in collaboration with Arpith Jacob
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29758
llvm-svn: 295333
|
|
|
|
| |
llvm-svn: 295323
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types. An efficient
implementation requires hierarchical reduction within a
warp and a threadblock. It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.
The patch creates a struct to hold reduction variables and
a number of helper functions. The OpenMP runtime on the GPU
implements reduction algorithms that uses these helper
functions to perform reductions within a team. Variables are
shared between CUDA threads using shuffle intrinsics.
An implementation of reductions on the NVPTX device is
substantially different to that of CPUs. However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.
The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions. The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.
While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.
Patch by Tian Jin in collaboration with Arpith Jacob
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29758
llvm-svn: 295319
|
|
|
|
|
|
|
|
|
|
| |
This is a simple patch to teach OpenMP codegen to emit the construct
in Generic mode.
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29143
llvm-svn: 293183
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
device.
This patch adds support for the proc_bind clause on the Spmd construct
'target parallel' on the NVPTX device. Since the parallel region is created
upon kernel launch, this clause can be safely ignored on the NVPTX device at
codegen time for level 0 parallelism.
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29128
llvm-svn: 293069
|
|
|
|
|
|
| |
patches.
llvm-svn: 293003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support for codegen of 'target teams' on the host.
This combined directive has two captured statements, one for the
'teams' region, and the other for the 'parallel'.
This target teams region is offloaded using the __tgt_target_teams()
call. The patch sets the number of teams as an argument to
this call.
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29084
llvm-svn: 293001
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
NVPTX device.
This patch adds support for the Spmd construct 'target parallel' on the
NVPTX device. This involves ignoring the num_threads clause on the device
since the number of threads in this combined construct is already set on
the host through the call to __tgt_target_teams().
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29083
llvm-svn: 292999
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds codegen for the 'target parallel' directive on the NVPTX
device. We term offload OpenMP directives such as 'target parallel' and
'target teams distribute parallel for' as SPMD constructs. SPMD constructs,
in contrast to Generic ones like the plain 'target', can never contain
a serial region.
SPMD constructs can be handled more efficiently on the GPU and do not
require the Warp Loop of the Generic codegen scheme. This patch adds
SPMD codegen support for 'target parallel' on the NVPTX device and can
be reused for other SPMD constructs.
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28755
llvm-svn: 292428
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support for codegen of 'target parallel' on the host.
It is also the first combined directive that requires two or more
captured statements. Support for this functionality is included in
the patch.
A combined directive such as 'target parallel' has two captured
statements, one for the 'target' and the other for the 'parallel'
region. Two captured statements are required because each has
different implicit parameters (see SemaOpenMP.cpp). For example,
the 'parallel' has 'global_tid' and 'bound_tid' while the 'target'
does not. The patch adds support for handling multiple captured
statements based on the combined directive.
When codegen'ing the 'target parallel' directive, the 'target'
outlined function is created using the outer captured statement
and the 'parallel' outlined function is created using the inner
captured statement.
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28753
llvm-svn: 292419
|
|
|
|
| |
llvm-svn: 292400
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support for codegen of 'target parallel' on the host.
It is also the first combined directive that requires two or more
captured statements. Support for this functionality is included in
the patch.
A combined directive such as 'target parallel' has two captured
statements, one for the 'target' and the other for the 'parallel'
region. Two captured statements are required because each has
different implicit parameters (see SemaOpenMP.cpp). For example,
the 'parallel' has 'global_tid' and 'bound_tid' while the 'target'
does not. The patch adds support for handling multiple captured
statements based on the combined directive.
When codegen'ing the 'target parallel' directive, the 'target'
outlined function is created using the outer captured statement
and the 'parallel' outlined function is created using the inner
captured statement.
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28753
llvm-svn: 292374
|
|
|
|
| |
llvm-svn: 291939
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
NVPTX device
Summary:
This patch introduces support for the execution of parallel constructs in a target
region on the NVPTX device. Parallel regions must be in the lexical scope of the
target directive.
The master thread in the master warp signals parallel work for worker threads in worker
warps on encountering a parallel region.
Note: The patch does not yet support capture of arguments in a parallel region so
the test cases are simple.
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28145
llvm-svn: 291565
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds two fields to the offload entry descriptor. One field is meant to signal Ctors/Dtors and `link` global variables, and the other is reserved for runtime library use.
Currently, these fields are only filled with zeros in the current code generation, but that will change when `declare target` is added.
The reason, we are adding these fields now is to make the code generation consistent with the runtime library proposal under review in https://reviews.llvm.org/D14031.
Reviewers: ABataev, hfinkel, carlo.bertolli, kkwli0, arpith-jacob, Hahnfeld
Subscribers: cfe-commits, caomhin, jholewinski
Differential Revision: https://reviews.llvm.org/D28298
llvm-svn: 291124
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch includes updates for codegen of the target region for the NVPTX
device. It moves initializers from the compiler to the runtime and updates
the worker loop to assume parallel work is retrieved from the runtime. A
subsequent patch will update the codegen to retrieve the parallel work using
calls to the runtime. It includes the removal of the inline attribute
for the worker loop and disabling debug info in it.
This allows codegen for a target directive and serial execution on the
NVPTX device.
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28125
llvm-svn: 291121
|
|
|
|
| |
llvm-svn: 290989
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch includes updates for codegen of the target region for the NVPTX
device. It moves initializers from the compiler to the runtime and updates
the worker loop to assume parallel work is retrieved from the runtime. A
subsequent patch will update the codegen to retrieve the parallel work using
calls to the runtime. It includes the removal of the inline attribute
for the worker loop and disabling debug info in it.
This allows codegen for a target directive and serial execution on the
NVPTX device.
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28125
llvm-svn: 290983
|
|
|
|
|
|
|
|
|
|
|
| |
This patch cleans up private methods for NVPTX OpenMP codegen. It converts private
members to static functions to follow the coding style of CGOpenMPRuntime.cpp and
declutter the header file.
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28124
llvm-svn: 290904
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
-fno-inline-functions, -O0, and optnone.
These were really, really tangled together:
- We used the noinline LLVM attribute for -fno-inline
- But not for -fno-inline-functions (breaking LTO)
- But we did use it for -finline-hint-functions (yay, LTO is happy!)
- But we didn't for -O0 (LTO is sad yet again...)
- We had weird structuring of CodeGenOpts with both an inlining
enumeration and a boolean. They interacted in weird ways and
needlessly.
- A *lot* of set smashing went on with setting these, and then got worse
when we considered optnone and other inlining-effecting attributes.
- A bunch of inline affecting attributes were managed in a completely
different place from -fno-inline.
- Even with -fno-inline we failed to put the LLVM noinline attribute
onto many generated function definitions because they didn't show up
as AST-level functions.
- If you passed -O0 but -finline-functions we would run the normal
inliner pass in LLVM despite it being in the O0 pipeline, which really
doesn't make much sense.
- Lastly, we used things like '-fno-inline' to manipulate the pass
pipeline which forced the pass pipeline to be much more
parameterizable than it really needs to be. Instead we can *just* use
the optimization level to select a pipeline and control the rest via
attributes.
Sadly, this causes a bunch of churn in tests because we don't run the
optimizer in the tests and check the contents of attribute sets. It
would be awesome if attribute sets were a bit more FileCheck friendly,
but oh well.
I think this is a significant improvement and should remove the semantic
need to change what inliner pass we run in order to comply with the
requested inlining semantics by relying completely on attributes. It
also cleans up tho optnone and related handling a bit.
One unfortunate aspect of this is that for generating alwaysinline
routines like those in OpenMP we end up removing noinline and then
adding alwaysinline. I tried a bunch of other approaches, but because we
recompute function attributes from scratch and don't have a declaration
here I couldn't find anything substantially cleaner than this.
Differential Revision: https://reviews.llvm.org/D28053
llvm-svn: 290398
|
|
|
|
|
|
|
|
|
|
| |
program
Offload related code is not quite ready yet, but some simple examples
must not crash the compiler. Patch fixes the problem in offloading code
with exceptions.
llvm-svn: 290364
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements the teams directive for the NVPTX backend. It is different from the host code generation path as it:
Does not call kmpc_fork_teams. All necessary teams and threads are started upon touching the target region, when launching a CUDA kernel, and their execution is coordinated through sequential and parallel regions within the target region.
Does not call kmpc_push_num_teams even if a num_teams of thread_limit clause is present. Setting the number of teams and the thread limit is implemented by the nvptx-related runtime.
Please note that I am now passing a Clang Expr * to emitPushNumTeams instead of the originally chosen llvm::Value * type. The reason for that is that I want to avoid emitting expressions for num_teams and thread_limit if they are not needed in the target region.
http://reviews.llvm.org/D17963
llvm-svn: 265304
|
|
|
|
|
|
|
|
|
|
| |
Solution unifies interface of RegionCodeGenTy type to allow insert
runtime-specific code before/after main codegen action defined in
CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy
for general OpenMP directives, but must be allowed to insert its own
(required) code to support target specific codegen.
llvm-svn: 264700
|
|
|
|
|
|
| |
Reverting because of failed tests.
llvm-svn: 264577
|
|
|
|
|
|
|
|
|
|
| |
Solution unifies interface of RegionCodeGenTy type to allow insert
runtime-specific code before/after main codegen action defined in
CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy
for general OpenMP directives, but must be allowed to insert its own
(required) code to support target specific codegen.
llvm-svn: 264576
|
|
|
|
|
|
| |
This reverts commit 3ee791165100607178073f14531a0dc90c622b36.
llvm-svn: 264570
|
|
|
|
|
|
|
|
|
|
| |
Solution unifies interface of RegionCodeGenTy type to allow insert
runtime-specific code before/after main codegen action defined in
CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy
for general OpenMP directives, but must be allowed to insert its own
(required) code to support target specific codegen.
llvm-svn: 264569
|
|
|
|
| |
llvm-svn: 264035
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds base support for codegen of the target directive on the NVPTX device.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D17877
Reworked test case after buildbot failure on windows.
Updated patch to integrate r263837 and test case nvptx_target_firstprivate_codegen.cpp.
llvm-svn: 264018
|
|
|
|
| |
llvm-svn: 263784
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Reworked test case after buildbot failure on windows.
This patch adds base support for codegen of the target directive on the NVPTX device.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D17877
llvm-svn: 263783
|
|
|
|
| |
llvm-svn: 263589
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds base support for codegen of the target directive on the NVPTX device.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D17877
llvm-svn: 263587
|
|
|
|
| |
llvm-svn: 263555
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds base support for codegen of the target directive on the NVPTX device.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D17877
llvm-svn: 263552
|