path: root/clang/lib/CodeGen/CGOpenMPRuntime.h
Commit message | Author | Age | Files | Lines
...
* Fix typos in clang | Alexander Kornienko | 2018-04-06 | 1 | -3/+3
    Found via codespell -q 3 -I ../clang-whitelist.txt
    Where whitelist consists of: archtype cas classs checkk compres definit frome iff inteval ith lod methode nd optin ot pres statics te thru
    Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few files that have dubious fixes reverted.)
    Differential revision: https://reviews.llvm.org/D44188
    llvm-svn: 329399
* [OPENMP] Added emission of offloading data sections for declare target variables. | Alexey Bataev | 2018-03-30 | 1 | -50/+131
    Added emission of the offloading data sections for the variables within declare target regions + fixes emission of the declare target variables marked as declare target not within the declare target region.
    llvm-svn: 328888
* [OPENMP] Codegen for ctor|dtor of declare target variables. | Alexey Bataev | 2018-03-28 | 1 | -19/+40
    When the declare target variables are emitted for the device, constructors|destructors for these variables must be emitted and registered by the runtime in the offloading sections.
    llvm-svn: 328705
* [OPENMP] Codegen for declare target with link clause. | Alexey Bataev | 2018-03-26 | 1 | -0/+5
    If the link clause is used on the declare target directive, the object should be linked on target or target data directives, not during the codegen. Patch adds support for this clause.
    llvm-svn: 328544
* [OPENMP, NVPTX] Emit correct thread id. | Alexey Bataev | 2018-03-19 | 1 | -4/+4
    We emitted a fake thread id for the outlined function in NVPTX codegen. Patch adds emission of the real thread id.
    llvm-svn: 327867
* [OPENMP] Codegen for `omp declare target` construct. | Alexey Bataev | 2018-03-15 | 1 | -0/+20
    Added initial codegen for device side of declarations inside `omp declare target` construct + codegen for implicit `declare target` functions, which are used in the target regions.
    llvm-svn: 327636
* [OpenMP] Add OpenMP data sharing infrastructure using global memory | Gheorghe-Teodor Bercea | 2018-03-14 | 1 | -1/+9
    Summary: This patch handles the Clang code generation phase for the OpenMP data sharing infrastructure. TODO: add a more detailed description.
    Reviewers: ABataev, carlo.bertolli, caomhin, hfinkel, Hahnfeld
    Reviewed By: ABataev
    Subscribers: jholewinski, guansong, cfe-commits
    Differential Revision: https://reviews.llvm.org/D43660
    llvm-svn: 327513
* [OPENMP] Fix generation of the unique names for task reduction variables. | Alexey Bataev | 2018-03-06 | 1 | -0/+2
    If the task has a reduction construct and this construct requires unique threadprivate storage for some variable, we may generate different names for variables used in the taskgroup task_reduction clause and in the task in_reduction clause. Patch fixes this problem.
    llvm-svn: 326827
* [OPENMP] Require valid SourceLocation in function call, NFC. | Alexey Bataev | 2018-02-22 | 1 | -3/+2
    Removed the default empty SourceLocation argument from the `emitCall` function; a valid location is now required.
    llvm-svn: 325812
* [OPENMP] Add codegen for `depend` clauses on `target` directive. | Alexey Bataev | 2018-01-15 | 1 | -6/+2
    Added basic support for codegen of `depend` clauses on `target` directive.
    llvm-svn: 322501
* [OPENMP] Add debug info for generated functions. | Alexey Bataev | 2018-01-04 | 1 | -1/+2
    Most of the functions generated for OpenMP were emitted with debug info disabled. Patch fixes this for better user experience.
    llvm-svn: 321816
* [OPENMP] Support for -fopenmp-simd option with compilation of simd loops only. | Alexey Bataev | 2017-12-29 | 1 | -0/+565
    Added support for -fopenmp-simd option that allows compilation of simd-based constructs without emission of OpenMP runtime calls.
    llvm-svn: 321560
* [OPENMP] Fix PR34916: Crash on mixing taskloop|tasks directives. | Alexey Bataev | 2017-10-11 | 1 | -0/+4
    If both taskloop and task directives are used at the same time in one program, we may run into the situation where the particular type for the task directive is reused for taskloop directives. Patch fixes this problem.
    llvm-svn: 315464
* [OPENMP] Fix for PR33922: New ident_t flags for __kmpc_for_static_fini(). | Alexey Bataev | 2017-09-06 | 1 | -1/+3
    Added special flags for calls of __kmpc_for_static_fini(), like previously for __kmpc_for_static_init(). Added flag OMP_IDENT_WORK_DISTRIBUTE for the distribute construct, OMP_IDENT_WORK_SECTIONS for sections-based constructs and OMP_IDENT_WORK_LOOP for loop-based constructs in location flags.
    llvm-svn: 312642
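A rough sketch of how a worksharing kind could be folded into the location flags described above. Only the three flag names come from the commit message; the numeric bit values, the `WorkshareKind` enum, and the `addWorkshareFlag` helper are assumptions made for illustration and are not claimed to match Clang's code.

```cpp
#include <cassert>
#include <cstdint>

// Flag names are from the commit message; the numeric values below are
// illustrative assumptions only.
constexpr std::uint32_t OMP_IDENT_WORK_LOOP = 0x200;
constexpr std::uint32_t OMP_IDENT_WORK_SECTIONS = 0x400;
constexpr std::uint32_t OMP_IDENT_WORK_DISTRIBUTE = 0x800;

// Hypothetical classification of the construct being lowered.
enum class WorkshareKind { Loop, Sections, Distribute };

// Hypothetical helper: returns the location flags with exactly one
// worksharing bit added, so a tool can tell the constructs apart.
inline std::uint32_t addWorkshareFlag(std::uint32_t baseFlags,
                                      WorkshareKind kind) {
  const std::uint32_t allWork = OMP_IDENT_WORK_LOOP |
                                OMP_IDENT_WORK_SECTIONS |
                                OMP_IDENT_WORK_DISTRIBUTE;
  // The base flags must not already carry a worksharing bit.
  assert((baseFlags & allWork) == 0);
  switch (kind) {
  case WorkshareKind::Loop:
    return baseFlags | OMP_IDENT_WORK_LOOP;
  case WorkshareKind::Sections:
    return baseFlags | OMP_IDENT_WORK_SECTIONS;
  case WorkshareKind::Distribute:
    return baseFlags | OMP_IDENT_WORK_DISTRIBUTE;
  }
  return baseFlags; // not reached for valid enum values
}
```

The same flag word would be passed for both the init and the fini call of a given construct, which is why the two commits mirror each other.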
* [OPENMP] Fix for PR33922: New ident_t flags for __kmpc_for_static_init(). | Alexey Bataev | 2017-08-14 | 1 | -35/+38
    OpenMP 5.0 will include the OpenMP Tools interface, which requires distinguishing different worksharing constructs. Since the same entry point (__kmpc_for_static_init(ident_t *loc, kmp_int32 global_tid,........)) is called for static loop/sections/distribute, it is suggested to use the 'flags' field of the ident_t structure to pass the type of the construct.
    llvm-svn: 310865
* [OPENMP] Generalization of calls of the outlined functions. | Alexey Bataev | 2017-08-14 | 1 | -1/+7
    General improvement of the outlined functions calls.
    llvm-svn: 310840
* [OPENMP][DEBUG] Set proper address space info if required by target. | Alexey Bataev | 2017-08-08 | 1 | -0/+17
    Arguments, passed to the outlined function, must have correct address space info for proper Debug info support. Patch sets global address space for arguments that are mapped and passed by reference. Also, cuda-gdb does not handle reference types correctly, so reference arguments are represented as pointers.
    llvm-svn: 310387
* Revert "[OPENMP][DEBUG] Set proper address space info if required by target." | Alexey Bataev | 2017-08-08 | 1 | -26/+0
    This reverts commit r310377.
    llvm-svn: 310379
* [OPENMP][DEBUG] Set proper address space info if required by target. | Alexey Bataev | 2017-08-08 | 1 | -0/+26
    Arguments, passed to the outlined function, must have correct address space info for proper Debug info support. Patch sets global address space for arguments that are mapped and passed by reference. Also, cuda-gdb does not handle reference types correctly, so reference arguments are represented as pointers.
    llvm-svn: 310377
* Revert "[OPENMP][DEBUG] Set proper address space info if required by target." | Alexey Bataev | 2017-08-08 | 1 | -26/+0
    This reverts commit r310360.
    llvm-svn: 310364
* [OPENMP][DEBUG] Set proper address space info if required by target. | Alexey Bataev | 2017-08-08 | 1 | -0/+26
    Arguments, passed to the outlined function, must have correct address space info for proper Debug info support. Patch sets global address space for arguments that are mapped and passed by reference. Also, cuda-gdb does not handle reference types correctly, so reference arguments are represented as pointers.
    llvm-svn: 310360
* Revert "[OPENMP][DEBUG] Set proper address space info if required by target." | Alexey Bataev | 2017-08-04 | 1 | -27/+0
    This reverts commit r310104.
    llvm-svn: 310135
* [OPENMP][DEBUG] Set proper address space info if required by target. | Alexey Bataev | 2017-08-04 | 1 | -0/+27
    Arguments, passed to the outlined function, must have correct address space info for proper Debug info support. Patch sets global address space for arguments that are mapped and passed by reference. Also, cuda-gdb does not handle reference types correctly, so reference arguments are represented as pointers.
    llvm-svn: 310104
* [OPENMP] Unify generation of outlined function calls. | Alexey Bataev | 2017-08-04 | 1 | -0/+6
    llvm-svn: 310098
* [OPENMP] Codegen for reduction clauses in 'taskloop' directives. | Alexey Bataev | 2017-07-17 | 1 | -6/+71
    Adds codegen for taskloop-based directives.
    llvm-svn: 308174
* Fix spelling mistakes in comments. NFCI. | Simon Pilgrim | 2017-07-13 | 1 | -14/+14
    llvm-svn: 307932
* Fix -Wdocumentation warning. NFCI | Simon Pilgrim | 2017-07-13 | 1 | -1/+0
    llvm-svn: 307931
* [OPENMP] Generalization of codegen for reduction clauses. | Alexey Bataev | 2017-07-13 | 1 | -0/+80
    Reworked codegen for reduction clauses for future support of reductions in task-based directives.
    llvm-svn: 307910
* Recommit of | Carlo Bertolli | 2017-04-25 | 1 | -3/+37
    [OpenMP] Initial implementation of code generation for pragma 'distribute parallel for' on host
    https://reviews.llvm.org/D29508
    This patch makes the following additions:
    It abstracts away loop bound generation code from procedures associated with pragma 'for' and loops in general, in such a way that the same procedures can be used for 'distribute parallel for' without the need for a full re-implementation.
    It implements code generation for 'distribute parallel for' and adds regression tests. It includes tests for clauses.
    It is important to notice that most of the clauses are implemented as part of existing procedures. For instance, firstprivate is already implemented for 'distribute' and 'for' as separate pragmas. As the implementation of 'distribute parallel for' is based on the same procedures, then we automatically obtain implementation for such clauses without the need to add new code. However, this requires regression tests that verify correctness of produced code.
    llvm-svn: 301340
* Revert r301223 | Carlo Bertolli | 2017-04-24 | 1 | -37/+3
    llvm-svn: 301233
* [OpenMP] Initial implementation of code generation for pragma 'distribute parallel for' on host | Carlo Bertolli | 2017-04-24 | 1 | -3/+37
    https://reviews.llvm.org/D29508
    This patch makes the following additions:
    1. It abstracts away loop bound generation code from procedures associated with pragma 'for' and loops in general, in such a way that the same procedures can be used for 'distribute parallel for' without the need for a full re-implementation.
    2. It implements code generation for 'distribute parallel for' and adds regression tests. It includes tests for clauses.
    It is important to notice that most of the clauses are implemented as part of existing procedures. For instance, firstprivate is already implemented for 'distribute' and 'for' as separate pragmas. As the implementation of 'distribute parallel for' is based on the same procedures, then we automatically obtain implementation for such clauses without the need to add new code. However, this requires regression tests that verify correctness of produced code.
    Looking forward to comments.
    llvm-svn: 301223
* Spelling mistakes in comments. NFCI. (PR27635) | Simon Pilgrim | 2017-03-30 | 1 | -1/+1
    llvm-svn: 299083
* [OpenMP] Parallel reduction on the NVPTX device. | Arpith Chacko Jacob | 2017-02-16 | 1 | -3/+33
    This patch implements codegen for the reduction clause on any parallel construct for elementary data types. An efficient implementation requires hierarchical reduction within a warp and a threadblock. It is complicated by the fact that variables declared in the stack of a CUDA thread cannot be shared with other threads.
    The patch creates a struct to hold reduction variables and a number of helper functions. The OpenMP runtime on the GPU implements reduction algorithms that use these helper functions to perform reductions within a team. Variables are shared between CUDA threads using shuffle intrinsics.
    An implementation of reductions on the NVPTX device is substantially different to that of CPUs. However, this patch is written so that there are minimal changes to the rest of OpenMP codegen.
    The implemented design allows the compiler and runtime to be decoupled, i.e., the runtime does not need to know of the reduction operation(s), the type of the reduction variable(s), or the number of reductions. The design also allows reuse of host codegen, with appropriate specialization for the NVPTX device.
    While the patch does introduce a number of abstractions, the expected use case calls for inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these abstractions are unwound and performance of OpenMP reductions is comparable to CUDA-canonical code.
    Patch by Tian Jin in collaboration with Arpith Jacob
    Reviewers: ABataev
    Differential Revision: https://reviews.llvm.org/D29758
    llvm-svn: 295333
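To make the idea of a shuffle-based reduction within a warp more concrete, here is a small CPU-side simulation. It is only a sketch: the function name and the use of a `std::vector` to stand in for the lanes of a warp are assumptions, and the halving-offset tree shown is a common pattern rather than something the commit message confirms the patch emits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU simulation of a "shuffle down" tree reduction.
// Each element of `lanes` stands for the private value held by one thread.
inline int simulateWarpReduceSum(std::vector<int> lanes) {
  const std::size_t width = lanes.size();
  // The tree below assumes a non-zero power-of-two lane count (e.g. 32).
  assert(width != 0 && (width & (width - 1)) == 0);
  for (std::size_t offset = width / 2; offset > 0; offset /= 2) {
    // Snapshot models the fact that all lanes read before any lane writes.
    const std::vector<int> snapshot = lanes;
    for (std::size_t lane = 0; lane + offset < width; ++lane)
      lanes[lane] = snapshot[lane] + snapshot[lane + offset];
  }
  // After log2(width) steps, lane 0 holds the sum of all lanes.
  return lanes[0];
}
```

No lane ever reads another lane's stack memory here; values only move through the simulated shuffle, which mirrors the constraint the commit message describes.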
* Revert r295319 while investigating buildbot failure. | Arpith Chacko Jacob | 2017-02-16 | 1 | -33/+3
    llvm-svn: 295323
* [OpenMP] Parallel reduction on the NVPTX device. | Arpith Chacko Jacob | 2017-02-16 | 1 | -3/+33
    This patch implements codegen for the reduction clause on any parallel construct for elementary data types. An efficient implementation requires hierarchical reduction within a warp and a threadblock. It is complicated by the fact that variables declared in the stack of a CUDA thread cannot be shared with other threads.
    The patch creates a struct to hold reduction variables and a number of helper functions. The OpenMP runtime on the GPU implements reduction algorithms that use these helper functions to perform reductions within a team. Variables are shared between CUDA threads using shuffle intrinsics.
    An implementation of reductions on the NVPTX device is substantially different to that of CPUs. However, this patch is written so that there are minimal changes to the rest of OpenMP codegen.
    The implemented design allows the compiler and runtime to be decoupled, i.e., the runtime does not need to know of the reduction operation(s), the type of the reduction variable(s), or the number of reductions. The design also allows reuse of host codegen, with appropriate specialization for the NVPTX device.
    While the patch does introduce a number of abstractions, the expected use case calls for inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these abstractions are unwound and performance of OpenMP reductions is comparable to CUDA-canonical code.
    Patch by Tian Jin in collaboration with Arpith Jacob
    Reviewers: ABataev
    Differential Revision: https://reviews.llvm.org/D29758
    llvm-svn: 295319
* [OpenMP] Codegen support for 'target parallel' on the host. | Arpith Chacko Jacob | 2017-01-18 | 1 | -1/+14
    This patch adds support for codegen of 'target parallel' on the host. It is also the first combined directive that requires two or more captured statements. Support for this functionality is included in the patch.
    A combined directive such as 'target parallel' has two captured statements, one for the 'target' and the other for the 'parallel' region. Two captured statements are required because each has different implicit parameters (see SemaOpenMP.cpp). For example, the 'parallel' has 'global_tid' and 'bound_tid' while the 'target' does not. The patch adds support for handling multiple captured statements based on the combined directive.
    When codegen'ing the 'target parallel' directive, the 'target' outlined function is created using the outer captured statement and the 'parallel' outlined function is created using the inner captured statement.
    Reviewers: ABataev
    Differential Revision: https://reviews.llvm.org/D28753
    llvm-svn: 292419
* Revert r292374 to debug Windows buildbot failure. | Arpith Chacko Jacob | 2017-01-18 | 1 | -14/+1
    llvm-svn: 292400
* [OpenMP] Codegen support for 'target parallel' on the host. | Arpith Chacko Jacob | 2017-01-18 | 1 | -1/+14
    This patch adds support for codegen of 'target parallel' on the host. It is also the first combined directive that requires two or more captured statements. Support for this functionality is included in the patch.
    A combined directive such as 'target parallel' has two captured statements, one for the 'target' and the other for the 'parallel' region. Two captured statements are required because each has different implicit parameters (see SemaOpenMP.cpp). For example, the 'parallel' has 'global_tid' and 'bound_tid' while the 'target' does not. The patch adds support for handling multiple captured statements based on the combined directive.
    When codegen'ing the 'target parallel' directive, the 'target' outlined function is created using the outer captured statement and the 'parallel' outlined function is created using the inner captured statement.
    Reviewers: ABataev
    Differential Revision: https://reviews.llvm.org/D28753
    llvm-svn: 292374
* [OpenMP] Basic support for a parallel directive in a target region on an NVPTX device | Arpith Chacko Jacob | 2017-01-10 | 1 | -13/+29
    Summary: This patch introduces support for the execution of parallel constructs in a target region on the NVPTX device. Parallel regions must be in the lexical scope of the target directive.
    The master thread in the master warp signals parallel work for worker threads in worker warps on encountering a parallel region.
    Note: The patch does not yet support capture of arguments in a parallel region so the test cases are simple.
    Reviewers: ABataev
    Differential Revision: https://reviews.llvm.org/D28145
    llvm-svn: 291565
* [OpenMP] Add fields for flags in the offload entry descriptor. | Samuel Antao | 2017-01-05 | 1 | -14/+22
    Summary: This patch adds two fields to the offload entry descriptor. One field is meant to signal Ctors/Dtors and `link` global variables, and the other is reserved for runtime library use.
    Currently, these fields are only filled with zeros by the code generation, but that will change when `declare target` is added.
    The reason we are adding these fields now is to make the code generation consistent with the runtime library proposal under review in https://reviews.llvm.org/D14031.
    Reviewers: ABataev, hfinkel, carlo.bertolli, kkwli0, arpith-jacob, Hahnfeld
    Subscribers: cfe-commits, caomhin, jholewinski
    Differential Revision: https://reviews.llvm.org/D28298
    llvm-svn: 291124
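A minimal sketch of what a descriptor with the two added fields might look like. The commit message only states that a flags field and a reserved field were added and that both are currently zero-filled; the struct name, the other three members, and the `makeOffloadEntry` helper are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout. Only `flags` and `reserved` are described in the
// commit message; `addr`, `name` and `size` are assumed here.
struct OffloadEntry {
  void *addr;            // assumed: address of the entry on the host
  const char *name;      // assumed: unique name of the entry
  std::size_t size;      // assumed: size in bytes (0 for functions)
  std::int32_t flags;    // meant to signal Ctors/Dtors and `link` globals
  std::int32_t reserved; // reserved for runtime library use
};

// Hypothetical helper mirroring the "filled with zeros" behaviour:
// both new fields start out as zero.
inline OffloadEntry makeOffloadEntry(void *addr, const char *name,
                                     std::size_t size) {
  assert(name != nullptr);
  return OffloadEntry{addr, name, size, 0, 0};
}
```

Keeping the reserved field in the layout from the start means the runtime library can later assign it a meaning without changing the size of the descriptor.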
* [OpenMP] Codegen for use_device_ptr clause. | Samuel Antao | 2016-07-28 | 1 | -4/+46
    Summary: This patch adds support for the use_device_ptr clause. It includes changes in SEMA that could not be tested without codegen, namely, the use of the first private logic and mappable expressions support.
    Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev
    Subscribers: caomhin, cfe-commits
    Differential Revision: https://reviews.llvm.org/D22691
    llvm-svn: 276977
* [OpenMP] Codegen for target update directive. | Samuel Antao | 2016-05-26 | 1 | -6/+6
    Summary: This patch implements the code generation for the `target update` directive. The implementation relies on the logic already in place for target data standalone directives, i.e. target enter/exit data.
    Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev
    Subscribers: caomhin, cfe-commits
    Differential Revision: http://reviews.llvm.org/D20650
    llvm-svn: 270886
* [OPENMP 4.5] Codegen for doacross loop synchronization constructs. | Alexey Bataev | 2016-05-25 | 1 | -0/+17
    OpenMP 4.5 adds support for doacross loop synchronization. Patch implements codegen for this construct.
    llvm-svn: 270690
* [OPENMP 4.5] Initial codegen for 'priority' clause in task-based directives. | Alexey Bataev | 2016-05-10 | 1 | -1/+1
    OpenMP 4.5 supports clause 'priority' in task-based directives. Patch adds initial codegen support for this clause.
    llvm-svn: 269050
* [OPENMP 4.0] Fixed codegen for destructors in task-based directives. | Alexey Bataev | 2016-05-10 | 1 | -0/+1
    If private variables require a destructor call when the task is deleted, an additional flag must be set in the task flags. Patch fixes this problem.
    llvm-svn: 269039
* [OPENMP 4.5] Add codegen support in runtime for '[non]monotonic' schedule modifiers. | Alexey Bataev | 2016-05-10 | 1 | -7/+6
    Runtime library expects some additional data in the schedule argument for loop-based directives that have the additional schedule modifiers 'monotonic|nonmonotonic'.
    llvm-svn: 269035
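One plausible way to carry that additional data is to OR a modifier bit into the schedule value, sketched below. The bit positions, the constant names, and the `encodeSchedule` helper are assumptions for illustration; the commit message does not spell out the encoding.

```cpp
#include <cassert>
#include <cstdint>

// Assumed bit positions for the modifiers, for illustration only.
constexpr std::uint32_t kModifierMonotonic = 1u << 29;
constexpr std::uint32_t kModifierNonmonotonic = 1u << 30;

// Hypothetical representation of the modifier written on the clause.
enum class ScheduleModifier { None, Monotonic, Nonmonotonic };

// Hypothetical helper: combines a base schedule kind with the modifier
// into the single integer passed as the schedule argument.
inline std::uint32_t encodeSchedule(std::uint32_t baseSchedule,
                                    ScheduleModifier modifier) {
  // The base schedule kind must not collide with the modifier bits.
  assert((baseSchedule & (kModifierMonotonic | kModifierNonmonotonic)) == 0);
  switch (modifier) {
  case ScheduleModifier::Monotonic:
    return baseSchedule | kModifierMonotonic;
  case ScheduleModifier::Nonmonotonic:
    return baseSchedule | kModifierNonmonotonic;
  case ScheduleModifier::None:
    break;
  }
  return baseSchedule;
}
```

Packing the modifier into high bits keeps the entry point's signature unchanged, since the schedule is still passed as one integer.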
* [OPENMP 4.0] Codegen for 'declare simd' directive. | Alexey Bataev | 2016-05-06 | 1 | -0/+7
    OpenMP 4.0 adds support for elemental functions using declarative directive '#pragma omp declare simd'. Patch adds mangling for simd functions in accordance with https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt
    llvm-svn: 268721
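As a rough illustration of the mangling scheme in the referenced x86 Vector ABI, vector variant names take the form `_ZGV<isa><mask><vlen><params>_<scalar name>`. The helper below is a hypothetical sketch that only concatenates those pieces; it is not Clang's implementation and ignores details such as linear steps and alignment suffixes.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: assembles a vector-variant name from its parts.
inline std::string mangleSimdVariant(char isa, bool masked, unsigned vlen,
                                     const std::string &paramKinds,
                                     const std::string &scalarName) {
  assert(vlen > 0);
  std::string out = "_ZGV";
  out += isa;                  // 'b' SSE, 'c' AVX, 'd' AVX2, 'e' AVX-512
  out += masked ? 'M' : 'N';   // masked or unmasked variant
  out += std::to_string(vlen); // number of lanes
  out += paramKinds;           // per parameter: 'v' vector, 'u' uniform, 'l' linear
  out += '_';
  out += scalarName;           // name of the scalar function
  return out;
}
```

For example, an unmasked 4-lane SSE variant of a function `foo` with one vector parameter would come out as `_ZGVbN4v_foo`.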
* [OPENMP 4.5] Codegen for 'lastprivate' clauses in 'taskloop' directives. | Alexey Bataev | 2016-05-05 | 1 | -0/+3
    OpenMP 4.5 adds taskloop/taskloop simd directives. These directives allow the use of the lastprivate clause. Patch adds codegen for this clause.
    llvm-svn: 268618
* [OPENMP] Simplified interface for codegen of tasks, NFC. | Alexey Bataev | 2016-04-28 | 1 | -93/+36
    Reduced number of arguments in member functions of runtime support library for task-based directives.
    llvm-svn: 267863
* [OPENMP 4.5] Codegen for 'grainsize/num_tasks' clauses of 'taskloop' directive. | Alexey Bataev | 2016-04-28 | 1 | -3/+7
    OpenMP 4.5 defines 'taskloop' directive and 2 additional clauses 'grainsize' and 'num_tasks' for this directive. Patch adds codegen for these clauses.
    These clauses are generated as arguments of the '__kmpc_taskloop' libcall and are encoded the following way:
    void __kmpc_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val, kmp_uint64 *lb, kmp_uint64 *ub, kmp_int64 st, int nogroup, int sched, kmp_uint64 grainsize, void *task_dup);
    If 'grainsize' is specified, 'sched' argument must be set to '1' and 'grainsize' argument must be set to the value of the 'grainsize' clause.
    If 'num_tasks' is specified, 'sched' argument must be set to '2' and 'grainsize' argument must be set to the value of the 'num_tasks' clause.
    This is possible because these 2 clauses are mutually exclusive and can't be used at the same time on the same directive.
    If none of these clauses is specified, 'sched' argument must be set to '0'.
    llvm-svn: 267862
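The encoding rules above can be sketched as a small helper. The 'sched' values 0, 1 and 2 come from the commit message; the type names, the helper itself, and the choice of 0 for the 'grainsize' argument when neither clause is present are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of which clause appeared on the directive.
enum class TaskloopClause { None, Grainsize, NumTasks };

// Hypothetical pair of values passed as the 'sched' and 'grainsize'
// arguments of the '__kmpc_taskloop' libcall.
struct TaskloopSchedule {
  int sched;               // 0 = neither clause, 1 = grainsize, 2 = num_tasks
  std::uint64_t grainsize; // value of whichever clause was given
};

inline TaskloopSchedule encodeTaskloopSchedule(TaskloopClause clause,
                                               std::uint64_t value) {
  switch (clause) {
  case TaskloopClause::Grainsize:
    return {1, value};
  case TaskloopClause::NumTasks:
    return {2, value};
  case TaskloopClause::None:
    break;
  }
  // Assumption: with no clause the value is ignored and passed as 0.
  return {0, 0};
}
```

Because the two clauses are mutually exclusive, a single enum value is enough to describe the directive, and one 'grainsize' argument can serve both clauses.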