| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This patch adds support for the target enter data directive code generation.
Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev
Subscribers: cfe-commits, fraggamuffin, caomhin
Differential Revision: http://reviews.llvm.org/D17368
llvm-svn: 267812
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds support for the target data directive code generation.
Part of the already existent functionality related with data maps is moved to a new function so that it could be reused.
Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev
Subscribers: cfe-commits, fraggamuffin, caomhin
Differential Revision: http://reviews.llvm.org/D17367
llvm-svn: 267811
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Implement codegen for the map clause. All the new list items in 4.5 specification are supported.
Fix bug in the generation of array sections that was exposed by some of the map clause tests: for pointer types the offsets have to be calculated from the pointee not the pointer.
Reviewers: hfinkel, kkwli0, carlo.bertolli, arpith-jacob, ABataev
Subscribers: ABataev, cfe-commits, caomhin, fraggamuffin
Differential Revision: http://reviews.llvm.org/D16749
llvm-svn: 267808
|
|
|
|
|
|
|
|
| |
Currently there is a problem with codegen of inlined directives inside
lambdas, it may cause a crash during codegen because of incorrect
capturing of variables. Patch fixes this problem.
llvm-svn: 267677
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The taskloop construct specifies that the iterations of one or more associated loops will be executed in parallel using OpenMP tasks. The iterations are distributed across tasks created by the construct and scheduled to be executed.
The next code will be generated for the taskloop directive:
#pragma omp taskloop num_tasks(N) lastprivate(j)
for( i=0; i<N*GRAIN*STRIDE-1; i+=STRIDE ) {
int th = omp_get_thread_num();
#pragma omp atomic
counter++;
#pragma omp atomic
th_counter[th]++;
j = i;
}
Generated code:
task = __kmpc_omp_task_alloc(NULL,gtid,1,sizeof(struct
task),sizeof(struct shar),&task_entry);
psh = task->shareds;
psh->pth_counter = &th_counter;
psh->pcounter = &counter;
psh->pj = &j;
task->lb = 0;
task->ub = N*GRAIN*STRIDE-2;
task->st = STRIDE;
__kmpc_taskloop(
NULL, // location
gtid, // gtid
task, // task structure
1, // if clause value
&task->lb, // lower bound
&task->ub, // upper bound
STRIDE, // loop increment
0, // 1 if nogroup specified
2, // schedule type: 0-none, 1-grainsize, 2-num_tasks
N, // schedule value (ignored for type 0)
(void*)&__task_dup_entry // tasks duplication routine
);
llvm-svn: 267395
|
|
|
|
|
|
|
|
|
|
|
|
| |
LLVM really wants a debug location on every inlinable call in a function
with debug info, because it otherwise cannot set up inlining debug info.
This change applies an artificial line 0 debug location (which is how
DWARF marks automatically generated code that has no corresponding
source code) to the .__kmpc_global_dtor_. functions to avoid the
LLVM Verifier complaining.
llvm-svn: 267369
|
|
|
|
|
|
|
|
| |
If the untied clause is present on a task construct, any thread in the
team can resume the task region after a suspension. Patch adds proper
codegen for untied tasks.
llvm-svn: 266853
|
|
|
|
|
|
| |
This reverts commit r266754.
llvm-svn: 266755
|
|
|
|
|
|
|
|
| |
If the untied clause is present on a task construct, any thread in the
team can resume the task region after a suspension. Patch adds proper
codegen for untied tasks.
llvm-svn: 266754
|
|
|
|
|
|
| |
This reverts commit 266722.
llvm-svn: 266724
|
|
|
|
|
|
| |
If the untied clause is present on a task construct, any thread in the team can resume the task region after a suspension. Patch adds proper codegen for untied tasks.
llvm-svn: 266722
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: See LLVM change D18775 for details, this change depends on it.
Reviewers: jyknight, reames
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D18776
llvm-svn: 265569
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements the teams directive for the NVPTX backend. It is different from the host code generation path as it:
Does not call kmpc_fork_teams. All necessary teams and threads are started upon touching the target region, when launching a CUDA kernel, and their execution is coordinated through sequential and parallel regions within the target region.
Does not call kmpc_push_num_teams even if a num_teams of thread_limit clause is present. Setting the number of teams and the thread limit is implemented by the nvptx-related runtime.
Please note that I am now passing a Clang Expr * to emitPushNumTeams instead of the originally chosen llvm::Value * type. The reason for that is that I want to avoid emitting expressions for num_teams and thread_limit if they are not needed in the target region.
http://reviews.llvm.org/D17963
llvm-svn: 265304
|
|
|
|
|
|
|
|
|
|
| |
Solution unifies interface of RegionCodeGenTy type to allow insert
runtime-specific code before/after main codegen action defined in
CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy
for general OpenMP directives, but must be allowed to insert its own
(required) code to support target specific codegen.
llvm-svn: 264700
|
|
|
|
|
|
| |
Reverting because of failed tests.
llvm-svn: 264577
|
|
|
|
|
|
|
|
|
|
| |
Solution unifies interface of RegionCodeGenTy type to allow insert
runtime-specific code before/after main codegen action defined in
CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy
for general OpenMP directives, but must be allowed to insert its own
(required) code to support target specific codegen.
llvm-svn: 264576
|
|
|
|
|
|
| |
This reverts commit 3ee791165100607178073f14531a0dc90c622b36.
llvm-svn: 264570
|
|
|
|
|
|
|
|
|
|
| |
Solution unifies interface of RegionCodeGenTy type to allow insert
runtime-specific code before/after main codegen action defined in
CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy
for general OpenMP directives, but must be allowed to insert its own
(required) code to support target specific codegen.
llvm-svn: 264569
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds base support for codegen of the target directive on the NVPTX device.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D17877
Reworked test case after buildbot failure on windows.
Updated patch to integrate r263837 and test case nvptx_target_firstprivate_codegen.cpp.
llvm-svn: 264018
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements the following aspects:
It extends sema to check that a variable is not reference in both a map clause and firstprivate or private. This is needed to ensure correct functioning at codegen level, apart from being useful for the user.
It implements firstprivate for target in codegen. The implementation applies to both host and nvptx devices.
It adds regression tests for codegen of firstprivate, host and device side when using the host as device, and nvptx side.
Please note that the regression test for nvptx codegen is missing VLAs. This is because VLAs currently require saving and restoring the stack which appears not to be a supported operation by nvptx backend.
It adds a check in sema regression tests for target map, firstprivate, and private clauses.
http://reviews.llvm.org/D18203
llvm-svn: 263837
|
|
|
|
| |
llvm-svn: 263784
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Reworked test case after buildbot failure on windows.
This patch adds base support for codegen of the target directive on the NVPTX device.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D17877
llvm-svn: 263783
|
|
|
|
|
|
|
|
| |
OpenMP 4.0 allows to define custom reduction operations using '#pragma
omp declare reduction' construct. Patch allows to use this custom
defined reduction operations in 'reduction' clauses.
llvm-svn: 263701
|
|
|
|
|
|
|
|
| |
This patch adds support for codegen of private clause of target and a regression test for host code generation, when the host is used as target device. I believe that code generation for nvptx backend would not require anything additional or different to what is done for the host.
http://reviews.llvm.org/D18105
llvm-svn: 263654
|
|
|
|
| |
llvm-svn: 263589
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds base support for codegen of the target directive on the NVPTX device.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D17877
llvm-svn: 263587
|
|
|
|
| |
llvm-svn: 263555
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds base support for codegen of the target directive on the NVPTX device.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D17877
llvm-svn: 263552
|
|
|
|
|
|
|
|
|
| |
As part of this, make the function-arrangement interfaces
a little simpler and more semantic.
NFC.
llvm-svn: 263191
|
|
|
|
|
|
|
|
| |
This patch provide basic implementation of codegen for teams directive, excluding all clauses except dist_schedule. It also fixes parts of AST reader/writer to enable correct pre-compiled header handling.
http://reviews.llvm.org/D17170
llvm-svn: 262832
|
|
|
|
|
|
| |
Was causing a failure in one of the buildbot slaves.
llvm-svn: 262744
|
|
|
|
|
|
|
|
| |
This patch provide basic implementation of codegen for teams directive, excluding all clauses except dist_schedule. It also fixes parts of AST reader/writer to enable correct pre-compiled header handling.
http://reviews.llvm.org/D17170
llvm-svn: 262741
|
|
|
|
|
|
|
| |
Emit function for 'combiner' part of 'declare reduction' construct and
'initialilzer' part, if any.
llvm-svn: 262699
|
|
|
|
| |
llvm-svn: 262652
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch implements the launching of a target region in the presence of a nested teams region, i.e calls tgt_target_teams with the required arguments gathered from the enclosed teams directive.
The actual codegen of the region enclosed by the teams construct will be contributed in a separate patch.
Reviewers: hfinkel, arpith-jacob, kkwli0, carlo.bertolli, ABataev
Subscribers: cfe-commits, caomhin, fraggamuffin
Differential Revision: http://reviews.llvm.org/D17019
llvm-svn: 262625
|
|
|
|
| |
llvm-svn: 261315
|
|
|
|
|
|
| |
Cleanup for upcoming Clang warning -Wcomma. No functionality change intended.
llvm-svn: 261271
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Unlike other outlined regions in OpenMP, offloading entry points have to have be visible (external linkage) for the device side. Using dots in the names of the entries can be therefore problematic for some toolchains, e.g. NVPTX.
Also the patch drops the column information in the unique name of the entry points. The parsing of directives ignore unknown tokens, preventing several target regions to be implemented in the same line. Therefore, the line information is sufficient for the name to be unique. Also, the preprocessor printer does not preserve the column information, causing offloading-entry detection issues if the host uses an integrated preprocessor and the target doesn't (or vice versa).
Reviewers: hfinkel, arpith-jacob, carlo.bertolli, kkwli0, ABataev
Subscribers: cfe-commits, fraggamuffin, caomhin
Differential Revision: http://reviews.llvm.org/D17179
llvm-svn: 260837
|
|
|
|
|
|
|
|
| |
fixes.
Differential revision: http://reviews.llvm.org/D17060
llvm-svn: 260414
|
|
|
|
|
|
| |
Codegen for array sections/array subscripts worked only for expressions with arrays as base. Patch fixes codegen for bases with pointer/reference types.
llvm-svn: 259776
|
|
|
|
|
|
| |
from Driver to Frontend.
llvm-svn: 259489
|
|
|
|
|
|
| |
Differential revision: http://reviews.llvm.org/D16567
llvm-svn: 258836
|
|
|
|
|
|
| |
OpenMP 4.5, alogn with array sections, allows to use variables of array type in reductions.
llvm-svn: 258804
|
|
|
|
|
|
| |
If 'sections' directive has only one sub-section, the code for 'single'-based directive was emitted. Removed this codegen, because it causes crashes in different cases.
llvm-svn: 258495
|
|
|
|
|
|
| |
reworked codegen for reduction operation for complex types to avoid crash
llvm-svn: 258394
|
|
|
|
|
|
| |
Allow to emit code for 'cancel' directive within 'sections' directive with single sub-section.
llvm-svn: 258307
|
|
|
|
|
|
|
| |
The referenced llvm::function_ref<void(CodeGenFunction &)> object can go
away before CodeGen is used, resulting in a crash.
llvm-svn: 257516
|
|
|
|
|
|
|
|
|
|
|
|
| |
device codegen.
This patch attempts to fix the regressions identified when the patch was committed initially.
Thanks to Michael Liao for identifying the fix in the offloading metadata generation
related with side effects in evaluation of function arguments.
llvm-svn: 256933
|
|
|
|
|
|
|
|
| |
device codegen.
It was causing two regression, so I'm reverting until the cause is found.
llvm-svn: 256858
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In order to offloading work properly two things need to be in place:
- a descriptor with all the offloading information (device entry functions, and global variable) has to be created by the host and registered in the OpenMP offloading runtime library.
- all the device functions need to be emitted for the device and a convention has to be in place so that the runtime library can easily map the host ID of an entry point with the actual function in the device.
This patch adds support for these two things. However, only entry functions are being registered given that 'declare target' directive is not yet implemented.
About offloading descriptor:
The details of the descriptor are explained with more detail in http://goo.gl/L1rnKJ. Basically the descriptor will have fields that specify the number of devices, the pointers to where the device images begin and end (that will be defined by the linker), and also pointers to a the begin and end of table whose entries contain information about a specific entry point. Each entry has the type:
```
struct __tgt_offload_entry{
void *addr;
char *name;
int64_t size;
};
```
and will be implemented in a pre determined (ELF) section `.omp_offloading.entries` with 1-byte alignment, so that when all the objects are linked, the table is in that section with no padding in between entries (will be like a C array). The code generation ensures that all `__tgt_offload_entry` entries are emitted in the same order for both host and device so that the runtime can have the corresponding entries in both host and device in same index of the table, and efficiently implement the mapping.
The resulting descriptor is registered/unregistered with the runtime library using the calls `__tgt_register_lib` and `__tgt_unregister_lib`. The registration is implemented in a high priority global initializer so that the registration happens always before any initializer (that can potentially include target regions) is run.
The driver flag -omptargets= was created to specify a comma separated list of devices the user wants to support so that the new functionality can be exercised. Each device is specified with its triple.
About target codegen:
The target codegen is pretty much straightforward as it reuses completely the logic of the host version for the same target region. The tricky part is to identify the meaningful target regions in the device side. Unlike other programming models, like CUDA, there are no already outlined functions with attributes that mark what should be emitted or not. So, the information on what to emit is passed in the form of metadata in host bc file. This requires a new option to pass the host bc to the device frontend. Then everything is similar to what happens in CUDA: the global declarations emission is intercepted to check to see if it is an "interesting" declaration. The difference is that instead of checking an attribute, the metadata information in checked. Right now, there is only a form of metadata to pass information about the device entry points (target regions). A class `OffloadEntriesInfoManagerTy` was created to manage all the information and queries related with the metadata. The metadata looks like this:
```
!omp_offload.info = !{!0, !1, !2, !3, !4, !5, !6}
!0 = !{i32 0, i32 52, i32 77426347, !"_ZN2S12r1Ei", i32 479, i32 13, i32 4}
!1 = !{i32 0, i32 52, i32 77426347, !"_ZL7fstatici", i32 461, i32 11, i32 5}
!2 = !{i32 0, i32 52, i32 77426347, !"_Z9ftemplateIiET_i", i32 444, i32 11, i32 6}
!3 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 99, i32 11, i32 0}
!4 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 272, i32 11, i32 3}
!5 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 127, i32 11, i32 1}
!6 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 159, i32 11, i32 2}
```
The fields in each metadata entry are (in sequence):
Entry 1) an ID of the type of metadata - right now only zero is used meaning "OpenMP target region".
Entry 2) a unique ID of the device where the input source file that contain the target region lives.
Entry 3) a unique ID of the file where the input source file that contain the target region lives.
Entry 4) a mangled name of the function that encloses the target region.
Entries 5) and 6) line and column number where the target region was found.
Entry 7) is the order the entry was emitted.
Entry 2) and 3) are required to distinguish files that have the same function name.
Entry 4) is required to distinguish different instances of the same declaration (usually templated ones)
Entries 5) and 6) are required to distinguish the particular target region in body of the function (it is possible that a given target region is not an entry point - if clause can evaluate always to zero - and therefore we need to identify the "interesting" target regions. )
This patch replaces http://reviews.llvm.org/D12306.
Reviewers: ABataev, hfinkel, tra, rjmccall, sfantao
Subscribers: FBrygidyn, piotr.rak, Hahnfeld, cfe-commits
Differential Revision: http://reviews.llvm.org/D12614
llvm-svn: 256842
|