summaryrefslogtreecommitdiffstats
path: root/clang/lib/CodeGen/CGOpenMPRuntime.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* Clang/libomptarget map interface flag renaming - NFC patchGeorge Rokos2017-11-071-41/+38
| | | | | | | | | | This patch renames some of the flag names of the clang/libomptarget map interface. The old names are slightly misleading, whereas the new ones describe in a better way what each flag is about. Only the macros within the enumeration are renamed, there is no change in functionality therefore there are no updated regression tests. Differential Revision: https://reviews.llvm.org/D39745 llvm-svn: 317598
* [OPENMP] Fix PR35152: Do not use getInvokeDest() function for EH checks.Alexey Bataev2017-11-021-1/+2
| | | | | | | The compiler may crash under some conditions if the getInvokeDest() is used, but later it is not used. Fixed this problem in OpenMP. llvm-svn: 317227
* [OPENMP] Fix PR35156: Get correct thread id with windows exceptions.Alexey Bataev2017-11-021-6/+8
| | | | | | | If the thread id is requested in windows mode within funclets, we may generate incorrect function call that could lead to broken codegen. llvm-svn: 317208
* [CodeGen] Propagate may-alias'ness of lvalues with TBAA infoIvan A. Kosarev2017-10-311-5/+4
| | | | | | | | | | | | | This patch fixes various places in clang to propagate may-alias TBAA access descriptors during construction of lvalues, thus eliminating the need for the LValueBaseInfo::MayAlias flag. This is part of D38126 reworked to be a separate patch to simplify review. Differential Revision: https://reviews.llvm.org/D39008 llvm-svn: 316988
* [CodeGen] Generate TBAA info for reference loadsIvan A. Kosarev2017-10-301-2/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D39177 llvm-svn: 316896
* [OpenMP] Avoid VLAs for some reductions on array sectionsJonas Hahnfeld2017-10-231-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some cases the compiler can deduce the length of an array section as constants. With this information, VLAs can be avoided in place of a constant sized array or even a scalar value if the length is 1. Example: int a[4], b[2]; pragma omp parallel reduction(+: a[1:2], b[1:1]) { } For chained array sections, this optimization is restricted to cases where all array sections except the last have a constant length 1. This trivially guarantees that there are no holes in the memory region that needs to be privatized. Example: int c[3][4]; pragma omp parallel reduction(+: c[1:1][1:2]) { } This relands commit r316229 that I reverted in r316235 because it failed on some bots. During investigation I found that this was because Clang and GCC evaluate the two arguments to emplace_back() in ReductionCodeGen::emitSharedLValue() in a different order, hence leading to a different order of generated instructions in the final LLVM IR. Fix this by passing in the arguments from temporary variables that are evaluated in a defined order. Differential Revision: https://reviews.llvm.org/D39136 llvm-svn: 316362
* Revert "[OpenMP] Avoid VLAs for some reductions on array sections"Jonas Hahnfeld2017-10-201-4/+6
| | | | | | | | | | This breaks at least two buildbots: http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/1175 http://lab.llvm.org:8011/builders/clang-atom-d525-fedora-rel/builds/10478 This reverts commit r316229 during local investigation. llvm-svn: 316235
* [OpenMP] Avoid VLAs for some reductions on array sectionsJonas Hahnfeld2017-10-201-6/+4
| | | | | | | | | | | | | | | | | | | | | | | In some cases the compiler can deduce the length of an array section as constants. With this information, VLAs can be avoided in place of a constant sized array or even a scalar value if the length is 1. Example: int a[4], b[2]; pragma omp parallel reduction(+: a[1:2], b[1:1]) { } For chained array sections, this optimization is restricted to cases where all array sections except the last have a constant length 1. This trivially guarantees that there are no holes in the memory region that needs to be privatized. Example: int c[3][4]; pragma omp parallel reduction(+: c[1:1][1:2]) { } Differential Revision: https://reviews.llvm.org/D39136 llvm-svn: 316229
* [OPENMP] Fix PR34927: Emit initializer for reduction array with declareAlexey Bataev2017-10-121-3/+8
| | | | | | | | | reduction. If the reduction is an array or an array section and reduction operation is declare reduction without initializer, it may lead to crash. llvm-svn: 315611
* [OPENMP] Fix PR34925: Fix getting thread_id lvalue for inlined regionsAlexey Bataev2017-10-121-0/+7
| | | | | | | | | | | in C. If we try to get the lvalue for thread_id variables in inlined regions, we did not use the correct version of function. Fixed this bug by adding overrided version of the function getThreadIDVariableLValue for inlined regions. llvm-svn: 315578
* [CodeGen] Generate TBAA info along with LValue base infoIvan A. Kosarev2017-10-121-3/+6
| | | | | | | | | | | | | | | | | | This patch enables explicit generation of TBAA information in all cases where LValue base info is propagated or constructed in non-trivial ways. Eventually, we will consider each of these cases to make sure the TBAA information is correct and not too conservative. For now, we just fall back to generating TBAA info from the access type. This patch should not bring in any functional changes. This is part of D38126 reworked to be a separate patch to simplify review. Differential Revision: https://reviews.llvm.org/D38733 llvm-svn: 315575
* [OPENMP] Remove extra if, NFC.Alexey Bataev2017-10-111-1/+1
| | | | llvm-svn: 315467
* [OPENMP] Fix PR34916: Crash on mixing taskloop|tasks directives.Alexey Bataev2017-10-111-3/+14
| | | | | | | | If both taskloop and task directives are used at the same time in one program, we may ran into the situation when the particular type for task directive is reused for taskloop directives. Patch fixes this problem. llvm-svn: 315464
* [CodeGen] Do not construct complete LValue base info in trivial casesIvan A. Kosarev2017-10-101-1/+1
| | | | | | | | | | | | | Besides obvious code simplification, avoiding explicit creation of LValueBaseInfo objects makes it easier to make TBAA information to be part of such objects. This is part of D38126 reworked to be a separate patch to simplify review. Differential Revision: https://reviews.llvm.org/D38695 llvm-svn: 315289
* Remove unused variables. No functionality change.Benjamin Kramer2017-10-081-2/+0
| | | | llvm-svn: 315196
* Remove unused variables. No functionality change.Benjamin Kramer2017-10-081-1/+0
| | | | llvm-svn: 315185
* [OPENMP] Simplify codegen for non-offloading code.Alexey Bataev2017-10-021-31/+20
| | | | | | | Simplified and generalized codegen for non-offloading part that works if offloading is failed or condition of the `if` clause is `false`. llvm-svn: 314670
* [OPENMP] Generate implicit map|firstprivate clauses for target-basedAlexey Bataev2017-09-261-46/+43
| | | | | | | | | | directives. If the variable is used in the target-based region but is not found in any private|mapping clause, then generate implicit firstprivate|map clauses for these implicitly mapped variables. llvm-svn: 314205
* [OPENMP] Fix for PR33922: New ident_t flags forAlexey Bataev2017-09-061-2/+10
| | | | | | | | | | | | __kmpc_for_static_fini(). Added special flags for calls of __kmpc_for_static_fini(), like previous ly for __kmpc_for_static_init(). Added flag OMP_IDENT_WORK_DISTRIBUTE for distribute cnstruct, OMP_IDENT_WORK_SECTIONS for sections-based constructs and OMP_IDENT_WORK_LOOP for loop-based constructs in location flags. llvm-svn: 312642
* [OPENMP] Fix for PR34445: Reduction initializer segfaults at runtime inAlexey Bataev2017-09-061-2/+12
| | | | | | | | | | move constructor. Previously user-defined reduction initializer was considered as an assignment expression, not as initializer. Fixed this by treating the initializer expression as an initializer. llvm-svn: 312638
* [OPRNMP] Fix for PR33445: ICE: OpenMP target containing ordered for.Alexey Bataev2017-08-161-12/+15
| | | | | | | | | | | If exceptions are enabled, there may be a problem with the codegen of the finalization functions from OpenMP runtime. It happens because of the problem with the getting of thread identifier value. Patch tries to fix it by using the result of the call of function __kmpc_global_thread_num() rather than loading of value of outlined function parameter. llvm-svn: 311007
* [OPENMP] Fix for PR33922: New ident_t flags forAlexey Bataev2017-08-141-60/+74
| | | | | | | | | | | | | __kmpc_for_static_init(). OpenMP 5.0 will include OpenMP Tools interface that requires distinguishing different worksharing constructs. Since the same entry point (__kmp_for_static_init(ident_t *loc, kmp_int32 global_tid,........)) is called in case static loop/sections/distribute it is suggested using 'flags' field of the ident_t structure to pass the type of the construct. llvm-svn: 310865
* [OPENMP][DEBUG] Fix for PR33676: Debug info for OpenMP region is broken.Alexey Bataev2017-08-141-1/+0
| | | | | | | After some changes in clang/LLVM debug info for task-based regions was not generated at all. Patch fixes this problem. llvm-svn: 310850
* [OPENMP] Generalization of calls of the outlined functions.Alexey Bataev2017-08-141-18/+28
| | | | | | General improvement of the outlined functions calls. llvm-svn: 310840
* [OPENMP][DEBUG] Set proper address space info if required by target.Alexey Bataev2017-08-081-0/+6
| | | | | | | | | | | Arguments, passed to the outlined function, must have correct address space info for proper Debug info support. Patch sets global address space for arguments that are mapped and passed by reference. Also, cuda-gdb does not handle reference types correctly, so reference arguments are represented as pointers. llvm-svn: 310387
* [OPENMP] Unify generation of outlined function calls.Alexey Bataev2017-08-041-4/+16
| | | | llvm-svn: 310098
* [OPENMP] Codegen for reduction clauses in 'taskloop' directives.Alexey Bataev2017-07-171-13/+439
| | | | | | Adds codegen for taskloop-based directives. llvm-svn: 308174
* [OPENMP] Generalization of codegen for reduction clauses.Alexey Bataev2017-07-131-0/+394
| | | | | | | Reworked codegen for reduction clauses for future support of reductions in task-based directives. llvm-svn: 307910
* [OPENMP] Emit implicit taskgroup block around taskloop directives.Alexey Bataev2017-07-121-7/+12
| | | | | | | | | If taskloop directive has no associated nogroup clause, it must emitted inside implicit taskgroup block. Runtime supports it, but we need to generate implicit taskgroup block explicitly to support future reductions codegen. llvm-svn: 307822
* [OPENMP][DEBUG] Generate second function with correct arg types.Alexey Bataev2017-06-291-14/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, if the some of the parameters are captured by value, this argument is converted to uintptr_t type and thus we loosing the debug info about real type of the argument (captured variable): ``` void @.outlined_function.(uintptr %par); ... %a = alloca i32 %a.casted = alloca uintptr %cast = bitcast uintptr* %a.casted to i32* %a.val = load i32, i32 *%a store i32 %a.val, i32 *%cast %a.casted.val = load uintptr, uintptr* %a.casted call void @.outlined_function.(uintptr %a.casted.val) ... ``` To resolve this problem, in debug mode a speciall external wrapper function is generated, that calls the outlined function with the correct parameters types: ``` void @.wrapper.(uintptr %par) { %a = alloca i32 %cast = bitcast i32* %a to uintptr* store uintptr %par, uintptr *%cast %a.val = load i32, i32* %a call void @.outlined_function.(i32 %a) ret void } void @.outlined_function.(i32 %par); ... %a = alloca i32 %a.casted = alloca uintptr %cast = bitcast uintptr* %a.casted to i32* %a.val = load i32, i32 *%a store i32 %a.val, i32 *%cast %a.casted.val = load uintptr, uintptr* %a.casted call void @.wrapper.(uintptr %a.casted.val) ... ``` llvm-svn: 306697
* [OPENMP] Use MapVector instead of DenseMap for stable codegen, NFC.Alexey Bataev2017-06-271-1/+1
| | | | llvm-svn: 306419
* Add comma to comment.Gheorghe-Teodor Bercea2017-06-131-1/+1
| | | | llvm-svn: 305294
* [DebugInfo] Add kind of ImplicitParamDecl for emission of FlagObjectPointer.Alexey Bataev2017-06-091-44/+47
| | | | | | | | | | | | | | | | | Summary: If the first parameter of the function is the ImplicitParamDecl, codegen automatically marks it as an implicit argument with `this` or `self` pointer. Added internal kind of the ImplicitParamDecl to separate 'this', 'self', 'vtt' and other implicit parameters from other kind of parameters. Reviewers: rjmccall, aaron.ballman Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D33735 llvm-svn: 305075
* IRGen: Add optnone attribute on function during O0Mehdi Amini2017-05-291-0/+2
| | | | | | | | | | | Amongst other, this will help LTO to correctly handle/honor files compiled with O0, helping debugging failures. It also seems in line with how we handle other options, like how -fnoinline adds the appropriate attribute as well. Differential Revision: https://reviews.llvm.org/D28404 llvm-svn: 304127
* [OpenMP] Create COMDAT group for OpenMP offload registration code to avoid ↵George Rokos2017-05-271-0/+13
| | | | | | | | | | multiple copies Thanks to Sergey Dmitriev for submitting the patch. Differential Revision: https://reviews.llvm.org/D33509 llvm-svn: 304056
* [CodeGen] Propagate LValueBaseInfo instead of AlignmentSourceKrzysztof Parzyszek2017-05-181-2/+4
| | | | | | | | | | | | | The functions creating LValues propagated information about alignment source. Extend the propagated data to also include information about possible unrestricted aliasing. A new class LValueBaseInfo will contain both AlignmentSource and MayAlias info. This patch should not introduce any functional changes. Differential Revision: https://reviews.llvm.org/D33284 llvm-svn: 303358
* Suppress all uses of LLVM_END_WITH_NULL. NFC.Serge Guelton2017-05-091-1/+1
| | | | | | | | | | Use variadic templates instead of relying on <cstdarg> + sentinel. This enforces better type checking and makes code more readable. Differential revision: https://reviews.llvm.org/D32550 llvm-svn: 302572
* Recommit ofCarlo Bertolli2017-04-251-12/+10
| | | | | | | | | | | | | | [OpenMP] Initial implementation of code generation for pragma 'distribute parallel for' on host https://reviews.llvm.org/D29508 This patch makes the following additions: It abstracts away loop bound generation code from procedures associated with pragma 'for' and loops in general, in such a way that the same procedures can be used for 'distribute parallel for' without the need for a full re-implementation. It implements code generation for 'distribute parallel for' and adds regression tests. It includes tests for clauses. It is important to notice that most of the clauses are implemented as part of existing procedures. For instance, firstprivate is already implemented for 'distribute' and 'for' as separate pragmas. As the implementation of 'distribute parallel for' is based on the same procedures, then we automatically obtain implementation for such clauses without the need to add new code. However, this requires regression tests that verify correctness of produced code. llvm-svn: 301340
* Revert r301223Carlo Bertolli2017-04-241-10/+12
| | | | llvm-svn: 301233
* [OpenMP] Initial implementation of code generation for pragma 'distribute ↵Carlo Bertolli2017-04-241-12/+10
| | | | | | | | | | | | | | | | | parallel for' on host https://reviews.llvm.org/D29508 This patch makes the following additions: 1. It abstracts away loop bound generation code from procedures associated with pragma 'for' and loops in general, in such a way that the same procedures can be used for 'distribute parallel for' without the need for a full re-implementation. 2. It implements code generation for 'distribute parallel for' and adds regression tests. It includes tests for clauses. It is important to notice that most of the clauses are implemented as part of existing procedures. For instance, firstprivate is already implemented for 'distribute' and 'for' as separate pragmas. As the implementation of 'distribute parallel for' is based on the same procedures, then we automatically obtain implementation for such clauses without the need to add new code. However, this requires regression tests that verify correctness of produced code. Looking forward to comments. llvm-svn: 301223
* Spelling mistakes in comments. NFCI. (PR27635)Simon Pilgrim2017-03-301-2/+2
| | | | llvm-svn: 299083
* Use arg_begin() instead of getArgumentList().begin(), the argument list is ↵Reid Kleckner2017-03-161-3/+1
| | | | | | an implementation detail llvm-svn: 297975
* Promote ConstantInitBuilder to be a public CodeGen API; it'sJohn McCall2017-03-021-1/+1
| | | | | | a generally useful utility for other frontends. NFC. llvm-svn: 296806
* [OpenMP] Fix cancellation point in task with no cancelJonas Hahnfeld2017-02-171-1/+3
| | | | | | | | | With tasks, the cancel may happen in another task. This has a different region info which means that we can't find it here. Differential Revision: https://reviews.llvm.org/D30091 llvm-svn: 295474
* [OpenMP] Remove barriers at cancel and cancellation pointJonas Hahnfeld2017-02-171-6/+0
| | | | | | | | | | | | | | | | This resolves a deadlock with the cancel directive when there is no explicit cancellation point. In that case, the implicit barrier acts as cancellation point. After removing the barrier after cancel, the now unmatched barrier for the explicit cancellation point has to go as well. This has probably worked before rL255992: With the calls for the explicit barrier, it was sure that all threads passed a barrier before exiting. Reported by Simon Convent and Joachim Protze! Differential Revision: https://reviews.llvm.org/D30088 llvm-svn: 295473
* [OpenMP] Parallel reduction on the NVPTX device.Arpith Chacko Jacob2017-02-161-14/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements codegen for the reduction clause on any parallel construct for elementary data types. An efficient implementation requires hierarchical reduction within a warp and a threadblock. It is complicated by the fact that variables declared in the stack of a CUDA thread cannot be shared with other threads. The patch creates a struct to hold reduction variables and a number of helper functions. The OpenMP runtime on the GPU implements reduction algorithms that uses these helper functions to perform reductions within a team. Variables are shared between CUDA threads using shuffle intrinsics. An implementation of reductions on the NVPTX device is substantially different to that of CPUs. However, this patch is written so that there are minimal changes to the rest of OpenMP codegen. The implemented design allows the compiler and runtime to be decoupled, i.e., the runtime does not need to know of the reduction operation(s), the type of the reduction variable(s), or the number of reductions. The design also allows reuse of host codegen, with appropriate specialization for the NVPTX device. While the patch does introduce a number of abstractions, the expected use case calls for inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these abstractions are unwound and performance of OpenMP reductions is comparable to CUDA-canonical code. Patch by Tian Jin in collaboration with Arpith Jacob Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29758 llvm-svn: 295333
* Revert r295319 while investigating buildbot failure.Arpith Chacko Jacob2017-02-161-17/+14
| | | | llvm-svn: 295323
* [OpenMP] Parallel reduction on the NVPTX device.Arpith Chacko Jacob2017-02-161-14/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements codegen for the reduction clause on any parallel construct for elementary data types. An efficient implementation requires hierarchical reduction within a warp and a threadblock. It is complicated by the fact that variables declared in the stack of a CUDA thread cannot be shared with other threads. The patch creates a struct to hold reduction variables and a number of helper functions. The OpenMP runtime on the GPU implements reduction algorithms that uses these helper functions to perform reductions within a team. Variables are shared between CUDA threads using shuffle intrinsics. An implementation of reductions on the NVPTX device is substantially different to that of CPUs. However, this patch is written so that there are minimal changes to the rest of OpenMP codegen. The implemented design allows the compiler and runtime to be decoupled, i.e., the runtime does not need to know of the reduction operation(s), the type of the reduction variable(s), or the number of reductions. The design also allows reuse of host codegen, with appropriate specialization for the NVPTX device. While the patch does introduce a number of abstractions, the expected use case calls for inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these abstractions are unwound and performance of OpenMP reductions is comparable to CUDA-canonical code. Patch by Tian Jin in collaboration with Arpith Jacob Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29758 llvm-svn: 295319
* [OpenMP] Codegen support for 'target teams' on the host.Arpith Chacko Jacob2017-01-251-22/+53
| | | | | | | | | | | | | | | This patch adds support for codegen of 'target teams' on the host. This combined directive has two captured statements, one for the 'teams' region, and the other for the 'parallel'. This target teams region is offloaded using the __tgt_target_teams() call. The patch sets the number of teams as an argument to this call. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29084 llvm-svn: 293005
* Reverting commit because an NVPTX patch sneaked in. Break up into twoArpith Chacko Jacob2017-01-251-53/+22
| | | | | | patches. llvm-svn: 293003
OpenPOWER on IntegriCloud