summaryrefslogtreecommitdiffstats
path: root/clang/lib/CodeGen/CGStmtOpenMP.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* [OPENMP][DEBUG] Fix for PR33676: Debug info for OpenMP region is broken.Alexey Bataev2017-08-141-2/+2
| | | | | | | After some changes in clang/LLVM debug info for task-based regions was not generated at all. Patch fixes this problem. llvm-svn: 310850
* [OPENMP] Generalization of calls of the outlined functions.Alexey Bataev2017-08-141-5/+7
| | | | | | General improvement of the outlined functions calls. llvm-svn: 310840
* [OPENMP] Emit non-debug version of outlined functions with originalAlexey Bataev2017-08-091-16/+15
| | | | | | | | | | | name. If the host code is compiled with the debug info, while the target without, there is a problem that the compiler is unable to find the debug wrapper. Patch fixes this problem by emitting special name for the debug version of the code. llvm-svn: 310511
* [OPENMP][DEBUG] Set proper address space info if required by target.Alexey Bataev2017-08-081-13/+33
| | | | | | | | | | | Arguments, passed to the outlined function, must have correct address space info for proper Debug info support. Patch sets global address space for arguments that are mapped and passed by reference. Also, cuda-gdb does not handle reference types correctly, so reference arguments are represented as pointers. llvm-svn: 310387
* Revert "[OPENMP][DEBUG] Set proper address space info if required by target."Alexey Bataev2017-08-081-51/+18
| | | | | | This reverts commit r310377. llvm-svn: 310379
* [OPENMP][DEBUG] Set proper address space info if required by target.Alexey Bataev2017-08-081-18/+51
| | | | | | | | | | | Arguments, passed to the outlined function, must have correct address space info for proper Debug info support. Patch sets global address space for arguments that are mapped and passed by reference. Also, cuda-gdb does not handle reference types correctly, so reference arguments are represented as pointers. llvm-svn: 310377
* Revert "[OPENMP][DEBUG] Set proper address space info if required by target."Alexey Bataev2017-08-081-44/+13
| | | | | | This reverts commit r310360. llvm-svn: 310364
* [OPENMP][DEBUG] Set proper address space info if required by target.Alexey Bataev2017-08-081-13/+44
| | | | | | | | | | | Arguments, passed to the outlined function, must have correct address space info for proper Debug info support. Patch sets global address space for arguments that are mapped and passed by reference. Also, cuda-gdb does not handle reference types correctly, so reference arguments are represented as pointers. llvm-svn: 310360
* Revert "[OPENMP][DEBUG] Set proper address space info if required by target."Alexey Bataev2017-08-041-42/+13
| | | | | | This reverts commit r310104. llvm-svn: 310135
* Revert "[OPENMP] Fix for pacify buildbots, NFC."Alexey Bataev2017-08-041-12/+18
| | | | | | This reverts commit r310120. llvm-svn: 310134
* [OPENMP] Fix for pacify buildbots, NFC.Alexey Bataev2017-08-041-18/+12
| | | | llvm-svn: 310120
* [OPENMP][DEBUG] Set proper address space info if required by target.Alexey Bataev2017-08-041-13/+42
| | | | | | | | | | | Arguments, passed to the outlined function, must have correct address space info for proper Debug info support. Patch sets global address space for arguments that are mapped and passed by reference. Also, cuda-gdb does not handle reference types correctly, so reference arguments are represented as pointers. llvm-svn: 310104
* [OPENMP] Unify generation of outlined function calls.Alexey Bataev2017-08-041-3/+4
| | | | llvm-svn: 310098
* [OPENMP] Change the name of outer non-debug function in debug mode, NFC.Alexey Bataev2017-07-311-2/+4
| | | | llvm-svn: 309575
* [OPENMP] Codegen for 'in_reduction' clause.Alexey Bataev2017-07-271-0/+50
| | | | | | | | | | | | | | | | | | Added codegen for task-based directive with in_reduction clause. ``` <body> ``` The next code is emitted: ``` void *td; ... td = call i8* @__kmpc_task_reduction_init(); ... <type> *priv = (<type> *)call i8* @__kmpc_task_reduction_get_th_data(i32 GTID, i8* td, i8* <orig>) ``` llvm-svn: 309270
* [OPENMP] Codegen for 'task_reduction' clause.Alexey Bataev2017-07-251-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added codegen for taskgroup directive with task_reduction clause. ``` <body> ``` The next code is emitted: ``` %struct.kmp_task_red_input_t red_init[n]; void *td; call void @__kmpc_taskgroup(%ident_t id, i32 gtid) ... red_init[i].shar = &<item>; red_init[i].size = sizeof(<item>); red_init[i].init = (void*)initializer_function; red_init[i].fini = (void*)destructor_function; red_init[i].comb = (void*)combiner_function; red_init[i].flags = flags; ... td = call i8* @__kmpc_task_reduction_init(i32 gtid, i32 n, i8* (void*)red_init); call void @__kmpc_end_taskgroup(%ident_t id, i32 gtid) void initializer_function(i8* priv) { *(<type>*)priv = <red_init>; ret void; } void destructor_function(i8* priv) { (<type>*)priv->~(); ret void; } void combiner_function(i8* inout, i8* in) { *(<type>*)inout = *(<type>*)inout <red_id> *(<type>*)in; ret void; } ``` llvm-svn: 308979
* [OPENMP] Initial support for 'in_reduction' clause.Alexey Bataev2017-07-211-0/+1
| | | | | | | Parsing/sema analysis for 'in_reduction' clause for task-based directives. llvm-svn: 308768
* [OPENMP] Initial support for 'task_reduction' clause.Alexey Bataev2017-07-181-0/+1
| | | | | | Parsing/sema analysis of the 'task_reduction' clause. llvm-svn: 308352
* [OPENMP] Codegen for reduction clauses in 'taskloop' directives.Alexey Bataev2017-07-171-1/+50
| | | | | | Adds codegen for taskloop-based directives. llvm-svn: 308174
* Change dyn_casts with unused variables to isa statements to avoid unused ↵Eric Christopher2017-07-141-2/+2
| | | | | | variables. llvm-svn: 307988
* [OPENMP] Generalization of codegen for reduction clauses.Alexey Bataev2017-07-131-387/+90
| | | | | | | Reworked codegen for reduction clauses for future support of reductions in task-based directives. llvm-svn: 307910
* [OPENMP] Emit implicit taskgroup block around taskloop directives.Alexey Bataev2017-07-121-1/+12
| | | | | | | | | If taskloop directive has no associated nogroup clause, it must emitted inside implicit taskgroup block. Runtime supports it, but we need to generate implicit taskgroup block explicitly to support future reductions codegen. llvm-svn: 307822
* [OPENMP][DEBUG] Generate second function with correct arg types.Alexey Bataev2017-06-291-50/+153
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, if the some of the parameters are captured by value, this argument is converted to uintptr_t type and thus we loosing the debug info about real type of the argument (captured variable): ``` void @.outlined_function.(uintptr %par); ... %a = alloca i32 %a.casted = alloca uintptr %cast = bitcast uintptr* %a.casted to i32* %a.val = load i32, i32 *%a store i32 %a.val, i32 *%cast %a.casted.val = load uintptr, uintptr* %a.casted call void @.outlined_function.(uintptr %a.casted.val) ... ``` To resolve this problem, in debug mode a speciall external wrapper function is generated, that calls the outlined function with the correct parameters types: ``` void @.wrapper.(uintptr %par) { %a = alloca i32 %cast = bitcast i32* %a to uintptr* store uintptr %par, uintptr *%cast %a.val = load i32, i32* %a call void @.outlined_function.(i32 %a) ret void } void @.outlined_function.(i32 %par); ... %a = alloca i32 %a.casted = alloca uintptr %cast = bitcast uintptr* %a.casted to i32* %a.val = load i32, i32 *%a store i32 %a.val, i32 *%cast %a.casted.val = load uintptr, uintptr* %a.casted call void @.wrapper.(uintptr %a.casted.val) ... ``` llvm-svn: 306697
* [DebugInfo] Add kind of ImplicitParamDecl for emission of FlagObjectPointer.Alexey Bataev2017-06-091-2/+3
| | | | | | | | | | | | | | | | | Summary: If the first parameter of the function is the ImplicitParamDecl, codegen automatically marks it as an implicit argument with `this` or `self` pointer. Added internal kind of the ImplicitParamDecl to separate 'this', 'self', 'vtt' and other implicit parameters from other kind of parameters. Reviewers: rjmccall, aaron.ballman Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D33735 llvm-svn: 305075
* [CodeGen] Propagate LValueBaseInfo instead of AlignmentSourceKrzysztof Parzyszek2017-05-181-3/+4
| | | | | | | | | | | | | The functions creating LValues propagated information about alignment source. Extend the propagated data to also include information about possible unrestricted aliasing. A new class LValueBaseInfo will contain both AlignmentSource and MayAlias info. This patch should not introduce any functional changes. Differential Revision: https://reviews.llvm.org/D33284 llvm-svn: 303358
* Fix -Wpedantic about extra semicolons in CGStmtOpenMP.cppHans Wennborg2017-04-271-3/+3
| | | | llvm-svn: 301564
* Recommit ofCarlo Bertolli2017-04-251-92/+344
| | | | | | | | | | | | | | [OpenMP] Initial implementation of code generation for pragma 'distribute parallel for' on host https://reviews.llvm.org/D29508 This patch makes the following additions: It abstracts away loop bound generation code from procedures associated with pragma 'for' and loops in general, in such a way that the same procedures can be used for 'distribute parallel for' without the need for a full re-implementation. It implements code generation for 'distribute parallel for' and adds regression tests. It includes tests for clauses. It is important to notice that most of the clauses are implemented as part of existing procedures. For instance, firstprivate is already implemented for 'distribute' and 'for' as separate pragmas. As the implementation of 'distribute parallel for' is based on the same procedures, then we automatically obtain implementation for such clauses without the need to add new code. However, this requires regression tests that verify correctness of produced code. llvm-svn: 301340
* Revert r301223Carlo Bertolli2017-04-241-344/+92
| | | | llvm-svn: 301233
* [OpenMP] Initial implementation of code generation for pragma 'distribute ↵Carlo Bertolli2017-04-241-92/+344
| | | | | | | | | | | | | | | | | parallel for' on host https://reviews.llvm.org/D29508 This patch makes the following additions: 1. It abstracts away loop bound generation code from procedures associated with pragma 'for' and loops in general, in such a way that the same procedures can be used for 'distribute parallel for' without the need for a full re-implementation. 2. It implements code generation for 'distribute parallel for' and adds regression tests. It includes tests for clauses. It is important to notice that most of the clauses are implemented as part of existing procedures. For instance, firstprivate is already implemented for 'distribute' and 'for' as separate pragmas. As the implementation of 'distribute parallel for' is based on the same procedures, then we automatically obtain implementation for such clauses without the need to add new code. However, this requires regression tests that verify correctness of produced code. Looking forward to comments. llvm-svn: 301223
* [OPENMP] Fix for PR32333: Crash in call of outlined Function.Alexey Bataev2017-04-101-6/+12
| | | | | | | | | If the type of the captured variable is a pointer(s) to variably modified type, this type was not processed correctly. Need to drill into the type, find the innermost variably modified array type and convert it to canonical parameter type. llvm-svn: 299868
* Encapsulate FPOptions and use it consistentlyAdam Nemet2017-03-271-2/+1
| | | | | | | | | | | | | | | | | | Sema holds the current FPOptions which is adjusted by 'pragma STDC FP_CONTRACT'. This then gets propagated into expression nodes as they are built. This encapsulates FPOptions so that this propagation happens opaquely rather than directly with the fp_contractable on/off bit. This allows controlled transitioning of fp_contractable to a ternary value (off, on, fast). It will also allow adding more fast-math flags later. This is toward moving fp-contraction=fast from an LLVM TargetOption to a FastMathFlag in order to fix PR25721. Differential Revision: https://reviews.llvm.org/D31166 llvm-svn: 298877
* [OpenMP] Teams reduction on the NVPTX device.Arpith Chacko Jacob2017-02-161-0/+4
| | | | | | | | | | | | | | | | | | | | This patch implements codegen for the reduction clause on any teams construct for elementary data types. It builds on parallel reductions on the GPU. Subsequently, the team master writes to a unique location in a global memory scratchpad. The last team to do so loads and reduces this array to calculate the final result. This patch emits two helper functions that are used by the OpenMP runtime on the GPU to perform reductions across teams. Patch by Tian Jin in collaboration with Arpith Jacob Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29879 llvm-svn: 295335
* [OpenMP] Parallel reduction on the NVPTX device.Arpith Chacko Jacob2017-02-161-10/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements codegen for the reduction clause on any parallel construct for elementary data types. An efficient implementation requires hierarchical reduction within a warp and a threadblock. It is complicated by the fact that variables declared in the stack of a CUDA thread cannot be shared with other threads. The patch creates a struct to hold reduction variables and a number of helper functions. The OpenMP runtime on the GPU implements reduction algorithms that uses these helper functions to perform reductions within a team. Variables are shared between CUDA threads using shuffle intrinsics. An implementation of reductions on the NVPTX device is substantially different to that of CPUs. However, this patch is written so that there are minimal changes to the rest of OpenMP codegen. The implemented design allows the compiler and runtime to be decoupled, i.e., the runtime does not need to know of the reduction operation(s), the type of the reduction variable(s), or the number of reductions. The design also allows reuse of host codegen, with appropriate specialization for the NVPTX device. While the patch does introduce a number of abstractions, the expected use case calls for inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these abstractions are unwound and performance of OpenMP reductions is comparable to CUDA-canonical code. Patch by Tian Jin in collaboration with Arpith Jacob Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29758 llvm-svn: 295333
* Revert r295319 while investigating buildbot failure.Arpith Chacko Jacob2017-02-161-22/+10
| | | | llvm-svn: 295323
* [OpenMP] Parallel reduction on the NVPTX device.Arpith Chacko Jacob2017-02-161-10/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements codegen for the reduction clause on any parallel construct for elementary data types. An efficient implementation requires hierarchical reduction within a warp and a threadblock. It is complicated by the fact that variables declared in the stack of a CUDA thread cannot be shared with other threads. The patch creates a struct to hold reduction variables and a number of helper functions. The OpenMP runtime on the GPU implements reduction algorithms that uses these helper functions to perform reductions within a team. Variables are shared between CUDA threads using shuffle intrinsics. An implementation of reductions on the NVPTX device is substantially different to that of CPUs. However, this patch is written so that there are minimal changes to the rest of OpenMP codegen. The implemented design allows the compiler and runtime to be decoupled, i.e., the runtime does not need to know of the reduction operation(s), the type of the reduction variable(s), or the number of reductions. The design also allows reuse of host codegen, with appropriate specialization for the NVPTX device. While the patch does introduce a number of abstractions, the expected use case calls for inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these abstractions are unwound and performance of OpenMP reductions is comparable to CUDA-canonical code. Patch by Tian Jin in collaboration with Arpith Jacob Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29758 llvm-svn: 295319
* [OpenMP] Codegen support for 'target teams' on the host.Arpith Chacko Jacob2017-01-251-13/+52
| | | | | | | | | | | | | | | This patch adds support for codegen of 'target teams' on the host. This combined directive has two captured statements, one for the 'teams' region, and the other for the 'parallel'. This target teams region is offloaded using the __tgt_target_teams() call. The patch sets the number of teams as an argument to this call. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29084 llvm-svn: 293005
* Reverting commit because an NVPTX patch sneaked in. Break up into twoArpith Chacko Jacob2017-01-251-52/+13
| | | | | | patches. llvm-svn: 293003
* [OpenMP] Codegen support for 'target teams' on the host.Arpith Chacko Jacob2017-01-251-13/+52
| | | | | | | | | | | | | | | This patch adds support for codegen of 'target teams' on the host. This combined directive has two captured statements, one for the 'teams' region, and the other for the 'parallel'. This target teams region is offloaded using the __tgt_target_teams() call. The patch sets the number of teams as an argument to this call. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29084 llvm-svn: 293001
* [OpenMP] Support for the if-clause on the combined directive 'target parallel'.Arpith Chacko Jacob2017-01-181-11/+30
| | | | | | | | | | | | | | | | | | | | | | | The if-clause on the combined directive potentially applies to both the 'target' and the 'parallel' regions. Codegen'ing the if-clause on the combined directive requires additional support because the expression in the clause must be captured by the 'target' capture statement but not the 'parallel' capture statement. Note that this situation arises for other clauses such as num_threads. The OMPIfClause class inherits OMPClauseWithPreInit to support capturing of expressions in the clause. A member CaptureRegion is added to OMPClauseWithPreInit to indicate which captured statement (in this case 'target' but not 'parallel') captures these expressions. To ensure correct codegen of captured expressions in the presence of combined 'target' directives, OMPParallelScope was added to 'parallel' codegen. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28781 llvm-svn: 292437
* [OpenMP] Codegen support for 'target parallel' on the host.Arpith Chacko Jacob2017-01-181-9/+37
| | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for codegen of 'target parallel' on the host. It is also the first combined directive that requires two or more captured statements. Support for this functionality is included in the patch. A combined directive such as 'target parallel' has two captured statements, one for the 'target' and the other for the 'parallel' region. Two captured statements are required because each has different implicit parameters (see SemaOpenMP.cpp). For example, the 'parallel' has 'global_tid' and 'bound_tid' while the 'target' does not. The patch adds support for handling multiple captured statements based on the combined directive. When codegen'ing the 'target parallel' directive, the 'target' outlined function is created using the outer captured statement and the 'parallel' outlined function is created using the inner captured statement. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28753 llvm-svn: 292419
* Revert r292374 to debug Windows buildbot failure.Arpith Chacko Jacob2017-01-181-37/+9
| | | | llvm-svn: 292400
* [OpenMP] Codegen support for 'target parallel' on the host.Arpith Chacko Jacob2017-01-181-9/+37
| | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for codegen of 'target parallel' on the host. It is also the first combined directive that requires two or more captured statements. Support for this functionality is included in the patch. A combined directive such as 'target parallel' has two captured statements, one for the 'target' and the other for the 'parallel' region. Two captured statements are required because each has different implicit parameters (see SemaOpenMP.cpp). For example, the 'parallel' has 'global_tid' and 'bound_tid' while the 'target' does not. The patch adds support for handling multiple captured statements based on the combined directive. When codegen'ing the 'target parallel' directive, the 'target' outlined function is created using the outer captured statement and the 'parallel' outlined function is created using the inner captured statement. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28753 llvm-svn: 292374
* [OpenMP] Refactor code that calls codegen for target regions on the device.Arpith Chacko Jacob2017-01-161-32/+48
| | | | | | | | | | | | This patch refactors code that calls codegen for target regions. Currently the codebase only supports the 'target' directive. The patch pulls out common target processing code into a static function that can be called by codegen for any target directive. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28752 llvm-svn: 292134
* Remove unused lambda captures. NFCMalcolm Parsons2017-01-131-7/+6
| | | | llvm-svn: 291939
* [OpenMP] Sema and parsing for 'target teams distribute simd’ pragmaKelvin Li2017-01-101-0/+10
| | | | | | | | This patch is to implement sema and parsing for 'target teams distribute simd’ pragma. Differential Revision: https://reviews.llvm.org/D28252 llvm-svn: 291579
* [OPENMP] Private, firstprivate, and lastprivate clauses for distribute, host ↵Carlo Bertolli2017-01-031-0/+18
| | | | | | | | | | | | code generation https://reviews.llvm.org/D17840 This patch enables private, firstprivate, and lastprivate clauses for the OpenMP distribute directive. Regression tests differ from the similar case of the same clauses on the for directive, by removing a reference to two global variables g and g1. This is necessary because: 1. a distribute pragma is only allowed inside a target region; 2. referring a global variable (e.g. g and g1) in a target region requires the program to enclose the variable in a "declare target" region; 3. declare target pragmas, which are used to define a declare target region, are currently unavailable in clang (patch being prepared). For this reason, I moved the global declarations into local variables. llvm-svn: 290898
* [OpenMP] Sema and parsing for 'target teams distribute parallel for simd’ ↵Kelvin Li2017-01-031-0/+10
| | | | | | | | | | pragma This patch is to implement sema and parsing for 'target teams distribute parallel for simd’ pragma. Differential Revision: https://reviews.llvm.org/D28202 llvm-svn: 290862
* [OpenMP] Sema and parsing for 'target teams distribute parallel for’ pragmaKelvin Li2016-12-291-0/+10
| | | | | | | | This patch is to implement sema and parsing for 'target teams distribute parallel for’ pragma. Differential Revision: https://reviews.llvm.org/D28160 llvm-svn: 290725
* Fix format. NFCKelvin Li2016-12-281-5/+8
| | | | llvm-svn: 290673
* [OpenMP] Sema and parsing for 'target teams distribute' pragmaKelvin Li2016-12-251-0/+8
| | | | | | | | This patch is to implement sema and parsing for 'target teams distribute' pragma. Differential Revision: https://reviews.llvm.org/D28015 llvm-svn: 290508
OpenPOWER on IntegriCloud