path: root/clang/test/OpenMP
Commit message (Author, Age, Files, Lines)
...
* [OpenMP] Generate better diagnostics for cancel and cancellation point (Jonas Hahnfeld, 2017-02-22, 2 files, -0/+16)
  checkNestingOfRegions uses CancelRegion to determine whether cancel and cancellation point are valid in the given nesting. This leads to unhelpful diagnostics if CancelRegion is invalid. The given test case produced:
    region cannot be closely nested inside 'parallel' region
  As a solution, introduce checkCancelRegion and call it first to get the expected error:
    one of 'for', 'parallel', 'sections' or 'taskgroup' is expected
  Differential Revision: https://reviews.llvm.org/D30135
  llvm-svn: 295808
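For context, a minimal sketch of a well-formed use of the two directives this commit diagnoses. The function name `contains` and the test data are hypothetical and not taken from the commit or its test case. The construct-type clause (`for` here) is the part the new diagnostic is about. Cancellation normally has to be enabled at run time (for example with OMP_CANCELLATION=true), and the code returns the same result when the pragmas are ignored.

```cpp
#include <cassert>

// Hypothetical helper: returns true if `needle` occurs in data[0..n).
bool contains(const int* data, int n, int needle) {
    assert(data != nullptr || n == 0);
    bool found = false;
    #pragma omp parallel for shared(found)
    for (int i = 0; i < n; ++i) {
        // Threads that observe a pending cancellation leave the loop here.
        #pragma omp cancellation point for
        if (data[i] == needle) {
            #pragma omp atomic write
            found = true;
            // 'for' names the region to cancel. Per the commit message, omitting
            // or misspelling it should yield: "one of 'for', 'parallel',
            // 'sections' or 'taskgroup' is expected".
            #pragma omp cancel for
        }
    }
    return found;
}
```

Called as `contains(xs, 6, 15)` on a six-element array, the function reports whether 15 is present regardless of whether cancellation is active; cancellation only lets threads stop early.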
* [OpenMP] Fix cancellation point in task with no cancel (Jonas Hahnfeld, 2017-02-17, 1 file, -0/+15)
  With tasks, the cancel may happen in another task. This has a different region info which means that we can't find it here.
  Differential Revision: https://reviews.llvm.org/D30091
  llvm-svn: 295474
* [OpenMP] Remove barriers at cancel and cancellation point (Jonas Hahnfeld, 2017-02-17, 2 files, -20/+4)
  This resolves a deadlock with the cancel directive when there is no explicit cancellation point. In that case, the implicit barrier acts as cancellation point. After removing the barrier after cancel, the now unmatched barrier for the explicit cancellation point has to go as well.
  This probably worked before rL255992: with the calls for the explicit barrier, it was guaranteed that all threads passed a barrier before exiting.
  Reported by Simon Convent and Joachim Protze!
  Differential Revision: https://reviews.llvm.org/D30088
  llvm-svn: 295473
* [OpenMP] Teams reduction on the NVPTX device. (Arpith Chacko Jacob, 2017-02-16, 1 file, -0/+1143)
  This patch implements codegen for the reduction clause on any teams construct for elementary data types. It builds on parallel reductions on the GPU. Subsequently, the team master writes to a unique location in a global memory scratchpad. The last team to do so loads and reduces this array to calculate the final result.
  This patch emits two helper functions that are used by the OpenMP runtime on the GPU to perform reductions across teams.
  Patch by Tian Jin in collaboration with Arpith Jacob
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D29879
  llvm-svn: 295335
* [OpenMP] Parallel reduction on the NVPTX device. (Arpith Chacko Jacob, 2017-02-16, 1 file, -0/+830)
  This patch implements codegen for the reduction clause on any parallel construct for elementary data types. An efficient implementation requires hierarchical reduction within a warp and a threadblock. It is complicated by the fact that variables declared in the stack of a CUDA thread cannot be shared with other threads.
  The patch creates a struct to hold reduction variables and a number of helper functions. The OpenMP runtime on the GPU implements reduction algorithms that use these helper functions to perform reductions within a team. Variables are shared between CUDA threads using shuffle intrinsics.
  An implementation of reductions on the NVPTX device is substantially different to that of CPUs. However, this patch is written so that there are minimal changes to the rest of OpenMP codegen.
  The implemented design allows the compiler and runtime to be decoupled, i.e., the runtime does not need to know of the reduction operation(s), the type of the reduction variable(s), or the number of reductions. The design also allows reuse of host codegen, with appropriate specialization for the NVPTX device.
  While the patch does introduce a number of abstractions, the expected use case calls for inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these abstractions are unwound and performance of OpenMP reductions is comparable to CUDA-canonical code.
  Patch by Tian Jin in collaboration with Arpith Jacob
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D29758
  llvm-svn: 295333
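To illustrate the intra-warp step mentioned above, here is a host-side simulation of a shuffle-down tree reduction. It is only a sketch under stated assumptions: the name `warp_reduce_sum`, the warp width of 32, and the use of integer addition are illustrative choices, and none of this is the code the patch emits or the runtime's helper functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed warp width; NVIDIA GPUs use 32 lanes per warp.
constexpr std::size_t kWarpSize = 32;

// Simulates on the host what a shuffle-down reduction does inside one warp:
// at each step, lane i adds the value held by lane i + offset, and the offset
// halves until lane 0 holds the total.
int warp_reduce_sum(std::vector<int> lanes) {
    assert(lanes.size() == kWarpSize);
    for (std::size_t offset = kWarpSize / 2; offset > 0; offset /= 2) {
        // Snapshot the registers so every lane reads its partner's value
        // from before this step, as a hardware shuffle would.
        const std::vector<int> before = lanes;
        for (std::size_t lane = 0; lane + offset < kWarpSize; ++lane) {
            lanes[lane] = before[lane] + before[lane + offset];
        }
        // Simplification: lanes without an in-range partner keep their value;
        // they never contribute to lane 0 again.
    }
    return lanes[0];
}
```

With 32 lanes the loop runs five steps (offsets 16, 8, 4, 2, 1), so no shared stack memory is needed to combine the per-thread values.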
* Revert r295319 while investigating buildbot failure. (Arpith Chacko Jacob, 2017-02-16, 1 file, -830/+0)
  llvm-svn: 295323
* [OpenMP] Parallel reduction on the NVPTX device. (Arpith Chacko Jacob, 2017-02-16, 1 file, -0/+830)
  This patch implements codegen for the reduction clause on any parallel construct for elementary data types. An efficient implementation requires hierarchical reduction within a warp and a threadblock. It is complicated by the fact that variables declared in the stack of a CUDA thread cannot be shared with other threads.
  The patch creates a struct to hold reduction variables and a number of helper functions. The OpenMP runtime on the GPU implements reduction algorithms that use these helper functions to perform reductions within a team. Variables are shared between CUDA threads using shuffle intrinsics.
  An implementation of reductions on the NVPTX device is substantially different to that of CPUs. However, this patch is written so that there are minimal changes to the rest of OpenMP codegen.
  The implemented design allows the compiler and runtime to be decoupled, i.e., the runtime does not need to know of the reduction operation(s), the type of the reduction variable(s), or the number of reductions. The design also allows reuse of host codegen, with appropriate specialization for the NVPTX device.
  While the patch does introduce a number of abstractions, the expected use case calls for inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these abstractions are unwound and performance of OpenMP reductions is comparable to CUDA-canonical code.
  Patch by Tian Jin in collaboration with Arpith Jacob
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D29758
  llvm-svn: 295319
* [CodeGen] Treat auto-generated __dso_handle symbol as HiddenVisibility (Reid Kleckner, 2017-02-13, 1 file, -1/+1)
  Fixes https://bugs.llvm.org/show_bug.cgi?id=31932
  Based on a patch by Roland McGrath
  Reviewed By: phosek
  Differential Revision: https://reviews.llvm.org/D29843
  llvm-svn: 294978
* [Lit Test] Make tests C++11 compatible - Parse OpenMP (Charles Li, 2017-02-08, 2 files, -5/+37)
  Differential Revision: https://reviews.llvm.org/D29725
  llvm-svn: 294504
* [OpenMP] Remove fixme comment in regression test and related unnecessary statement (Carlo Bertolli, 2017-02-06, 1 file, -2/+0)
  https://reviews.llvm.org/D29501
  It looks like I forgot to remove a FIXME comment with the associated statement. The test does not need it and it gives the wrong impression of being an incomplete test.
  llvm-svn: 294195
* [OpenMP] Add missing regression test for pragma distribute, clause firstprivate (Carlo Bertolli, 2017-02-03, 1 file, -0/+382)
  https://reviews.llvm.org/D28243
  The regression test was missing from the previous already accepted patch.
  llvm-svn: 294026
* [Lit Test] Make tests C++11 compatible - OpenMP constant expressions (Charles Li, 2017-02-03, 4 files, -6/+83)
  C++11 introduced constexpr, hence the change in diagnostics.
  Differential Revision: https://reviews.llvm.org/D29480
  llvm-svn: 294025
* [OpenMP][NVPTX][CUDA] Adding support for printf for an NVPTX OpenMP device. (Arpith Chacko Jacob, 2017-01-29, 1 file, -0/+116)
  Support for CUDA printf is exploited to support printf for an NVPTX OpenMP device.
  To reflect the support of both programming models, the file CGCUDABuiltin.cpp has been renamed to CGGPUBuiltin.cpp, and the call EmitCUDADevicePrintfCallExpr has been renamed to EmitGPUDevicePrintfCallExpr.
  Reviewers: jlebar
  Differential Revision: https://reviews.llvm.org/D17890
  llvm-svn: 293444
* [OpenMP] Codegen support for 'target teams' on the NVPTX device. (Arpith Chacko Jacob, 2017-01-26, 1 file, -0/+222)
  This is a simple patch to teach OpenMP codegen to emit the construct in Generic mode.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D29143
  llvm-svn: 293183
* [OpenMP] Support for the proc_bind-clause on 'target parallel' on the NVPTX device. (Arpith Chacko Jacob, 2017-01-25, 1 file, -0/+106)
  This patch adds support for the proc_bind clause on the SPMD construct 'target parallel' on the NVPTX device. Since the parallel region is created upon kernel launch, this clause can be safely ignored on the NVPTX device at codegen time for level 0 parallelism.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D29128
  llvm-svn: 293069
* [OpenMP] Support for thread_limit-clause on the 'target teams' directive. (Arpith Chacko Jacob, 2017-01-25, 1 file, -0/+357)
  The thread_limit-clause on the combined directive applies to the 'teams' region of this construct. We modify the ThreadLimitClause class to capture the clause expression within the 'target' region.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D29087
  llvm-svn: 293049
* [OpenMP] Support for num_teams-clause on the 'target teams' directive. (Arpith Chacko Jacob, 2017-01-25, 1 file, -0/+344)
  The num_teams-clause on the combined directive applies to the 'teams' region of this construct. We modify the NumTeamsClause class to capture the clause expression within the 'target' region.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D29085
  llvm-svn: 293048
* [OpenMP] Codegen support for 'target teams' on the host. (Arpith Chacko Jacob, 2017-01-25, 3 files, -0/+1305)
  This patch adds support for codegen of 'target teams' on the host. This combined directive has two captured statements, one for the 'teams' region, and the other for the 'parallel'.
  This target teams region is offloaded using the __tgt_target_teams() call. The patch sets the number of teams as an argument to this call.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D29084
  llvm-svn: 293005
* Reverting commit because an NVPTX patch sneaked in. Break up into two patches. (Arpith Chacko Jacob, 2017-01-25, 3 files, -1305/+0)
  llvm-svn: 293003
* [OpenMP] Codegen support for 'target teams' on the host. (Arpith Chacko Jacob, 2017-01-25, 3 files, -0/+1305)
  This patch adds support for codegen of 'target teams' on the host. This combined directive has two captured statements, one for the 'teams' region, and the other for the 'parallel'.
  This target teams region is offloaded using the __tgt_target_teams() call. The patch sets the number of teams as an argument to this call.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D29084
  llvm-svn: 293001
* [OpenMP] Support for the num_threads-clause on 'target parallel' on the NVPTX device. (Arpith Chacko Jacob, 2017-01-25, 1 file, -0/+126)
  This patch adds support for the SPMD construct 'target parallel' on the NVPTX device. This involves ignoring the num_threads clause on the device since the number of threads in this combined construct is already set on the host through the call to __tgt_target_teams().
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D29083
  llvm-svn: 292999
* [OpenMP] Support for the num_threads-clause on 'target parallel'. (Arpith Chacko Jacob, 2017-01-25, 3 files, -13/+357)
  The num_threads-clause on the combined directive applies to the 'parallel' region of this construct. We modify the NumThreadsClause class to capture the clause expression within the 'target' region.
  The offload runtime call for 'target parallel' is changed to __tgt_target_teams() with 1 team and the number of threads set by this clause or a default if none.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D29082
  llvm-svn: 292997
* [OpenMP] DSAChecker bug fix for combined directives. (Arpith Chacko Jacob, 2017-01-23, 1 file, -0/+3)
  The DSAChecker code in SemaOpenMP looks at the captured statement associated with an OpenMP directive. A combined directive such as 'target parallel' has nested capture statements, which have to be fully traversed before executing the DSAChecker. This is a patch to perform the traversal for such combined directives.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D29026
  llvm-svn: 292794
* [OPENMP] Fix for PR31643: Clang crashes when compiling code on Windows with SEH and openmp (Alexey Bataev, 2017-01-20, 1 file, -0/+18)
  In some situations (during codegen for Windows SEH constructs) a CodeGenFunction instance may have CurFn equal to nullptr. OpenMP-related code does not expect this situation during cleanup.
  llvm-svn: 292590
* [OpenMP] Support for the if-clause on the combined directive 'target parallel'. (Arpith Chacko Jacob, 2017-01-18, 1 file, -0/+413)
  The if-clause on the combined directive potentially applies to both the 'target' and the 'parallel' regions. Codegen'ing the if-clause on the combined directive requires additional support because the expression in the clause must be captured by the 'target' capture statement but not the 'parallel' capture statement. Note that this situation arises for other clauses such as num_threads.
  The OMPIfClause class inherits OMPClauseWithPreInit to support capturing of expressions in the clause. A member CaptureRegion is added to OMPClauseWithPreInit to indicate which captured statement (in this case 'target' but not 'parallel') captures these expressions.
  To ensure correct codegen of captured expressions in the presence of combined 'target' directives, OMPParallelScope was added to 'parallel' codegen.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D28781
  llvm-svn: 292437
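As a user-level illustration of the clauses discussed above, here is a small sketch of a combined 'target parallel for' whose if and num_threads expressions depend on a host variable. The function name `sum_squares`, the threshold of 1000, and the thread count of 4 are hypothetical choices, not values from the patch or its tests. The result is the same whether the region is offloaded, run on the host, or compiled with the pragma ignored.

```cpp
#include <cassert>

// Hypothetical example: sums i*i for i in [0, n).
long long sum_squares(int n) {
    assert(n >= 0);
    long long total = 0;
    // An if clause without a directive-name modifier applies to both the
    // 'target' and the 'parallel' parts of the combined directive, so its
    // expression (n > 1000) must be evaluated before the target region starts.
    // num_threads applies to the 'parallel' part only.
    #pragma omp target parallel for if(n > 1000) num_threads(4) \
        map(tofrom: total) reduction(+: total)
    for (int i = 0; i < n; ++i) {
        total += static_cast<long long>(i) * i;
    }
    return total;
}
```

A call such as `sum_squares(10)` takes the serial host path because the condition is false, while `sum_squares(2000)` requests offload and a parallel region.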
* [OpenMP] Codegen for the 'target parallel' directive on the NVPTX device. (Arpith Chacko Jacob, 2017-01-18, 2 files, -12/+156)
  This patch adds codegen for the 'target parallel' directive on the NVPTX device. We term offload OpenMP directives such as 'target parallel' and 'target teams distribute parallel for' as SPMD constructs. SPMD constructs, in contrast to Generic ones like the plain 'target', can never contain a serial region.
  SPMD constructs can be handled more efficiently on the GPU and do not require the Warp Loop of the Generic codegen scheme. This patch adds SPMD codegen support for 'target parallel' on the NVPTX device and can be reused for other SPMD constructs.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D28755
  llvm-svn: 292428
* [OpenMP] Codegen support for 'target parallel' on the host. (Arpith Chacko Jacob, 2017-01-18, 3 files, -0/+1305)
  This patch adds support for codegen of 'target parallel' on the host. It is also the first combined directive that requires two or more captured statements. Support for this functionality is included in the patch.
  A combined directive such as 'target parallel' has two captured statements, one for the 'target' and the other for the 'parallel' region. Two captured statements are required because each has different implicit parameters (see SemaOpenMP.cpp). For example, the 'parallel' has 'global_tid' and 'bound_tid' while the 'target' does not. The patch adds support for handling multiple captured statements based on the combined directive.
  When codegen'ing the 'target parallel' directive, the 'target' outlined function is created using the outer captured statement and the 'parallel' outlined function is created using the inner captured statement.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D28753
  llvm-svn: 292419
* Revert r292374 to debug Windows buildbot failure. (Arpith Chacko Jacob, 2017-01-18, 3 files, -1305/+0)
  llvm-svn: 292400
* [OpenMP] Codegen support for 'target parallel' on the host. (Arpith Chacko Jacob, 2017-01-18, 3 files, -0/+1305)
  This patch adds support for codegen of 'target parallel' on the host. It is also the first combined directive that requires two or more captured statements. Support for this functionality is included in the patch.
  A combined directive such as 'target parallel' has two captured statements, one for the 'target' and the other for the 'parallel' region. Two captured statements are required because each has different implicit parameters (see SemaOpenMP.cpp). For example, the 'parallel' has 'global_tid' and 'bound_tid' while the 'target' does not. The patch adds support for handling multiple captured statements based on the combined directive.
  When codegen'ing the 'target parallel' directive, the 'target' outlined function is created using the outer captured statement and the 'parallel' outlined function is created using the inner captured statement.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D28753
  llvm-svn: 292374
* Improve handling of instantiated thread_local variables in Itanium C++ ABI. (Richard Smith, 2017-01-13, 1 file, -7/+7)
    * Do not initialize these variables when initializing the rest of the thread_locals in the TU; they have unordered initialization so they can be initialized by themselves. This fixes a rejects-valid bug: we would make the per-variable initializer function internal, but put it in a comdat keyed off the variable, resulting in link errors when the comdat is selected from a different TU (as the per TU TLS init function tries to call an init function that does not exist).
    * On Darwin, when we decide that we're not going to emit a thread wrapper function at all, demote its linkage to External. Fixes a verifier failure on explicit instantiation of a thread_local variable on Darwin.
  llvm-svn: 291865
* [OpenMP] Sema and parsing for 'target teams distribute simd' pragma (Kelvin Li, 2017-01-10, 26 files, -1/+5370)
  This patch implements sema and parsing for the 'target teams distribute simd' pragma.
  Differential Revision: https://reviews.llvm.org/D28252
  llvm-svn: 291579
* [OpenMP] Basic support for a parallel directive in a target region on an NVPTX device (Arpith Chacko Jacob, 2017-01-10, 1 file, -0/+317)
  Summary: This patch introduces support for the execution of parallel constructs in a target region on the NVPTX device. Parallel regions must be in the lexical scope of the target directive.
  The master thread in the master warp signals parallel work for worker threads in worker warps on encountering a parallel region.
  Note: The patch does not yet support capture of arguments in a parallel region so the test cases are simple.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D28145
  llvm-svn: 291565
* [OpenMP] Support the 'is_device_ptr' clause with 'target parallel for' pragma (Kelvin Li, 2017-01-10, 2 files, -0/+626)
  This patch adds support for the 'is_device_ptr' clause with the 'target parallel for' pragma.
  Differential Revision: https://reviews.llvm.org/D28255
  llvm-svn: 291540
* [OpenMP] Support the 'is_device_ptr' clause with 'target parallel for simd' pragma (Kelvin Li, 2017-01-10, 2 files, -0/+655)
  This patch adds support for the 'is_device_ptr' clause with the 'target parallel for simd' pragma.
  Differential Revision: https://reviews.llvm.org/D28402
  llvm-svn: 291537
* Fixing test to work when the compiler defaults to a different C++ standard version. (Douglas Yung, 2017-01-09, 1 file, -3/+26)
  Differential Revision: https://reviews.llvm.org/D28418
  llvm-svn: 291491
* [Lit Test] Make tests C++11 compatible - nothrow destructors (Charles Li, 2017-01-09, 2 files, -5/+8)
  In C++11, a destructor's implicit exception-spec is nothrow. The IR for the destructor's invocation changed from invoke to call.
  Differential Revision: https://reviews.llvm.org/D28425
  llvm-svn: 291458
* [OpenMP] fix typo - the standalone 'distribute' pragma should be 'teams distribute' pragma (Kelvin Li, 2017-01-06, 1 file, -1/+2)
  llvm-svn: 291260
* [OpenMP] Add fields for flags in the offload entry descriptor. (Samuel Antao, 2017-01-05, 2 files, -28/+28)
  Summary: This patch adds two fields to the offload entry descriptor. One field is meant to signal Ctors/Dtors and `link` global variables, and the other is reserved for runtime library use.
  Currently, code generation only fills these fields with zeros, but that will change when `declare target` is added.
  The reason we are adding these fields now is to make the code generation consistent with the runtime library proposal under review in https://reviews.llvm.org/D14031.
  Reviewers: ABataev, hfinkel, carlo.bertolli, kkwli0, arpith-jacob, Hahnfeld
  Subscribers: cfe-commits, caomhin, jholewinski
  Differential Revision: https://reviews.llvm.org/D28298
  llvm-svn: 291124
* [OpenMP] Update target codegen for NVPTX device. (Arpith Chacko Jacob, 2017-01-05, 1 file, -161/+193)
  This patch includes updates for codegen of the target region for the NVPTX device. It moves initializers from the compiler to the runtime and updates the worker loop to assume parallel work is retrieved from the runtime. A subsequent patch will update the codegen to retrieve the parallel work using calls to the runtime. It includes the removal of the inline attribute for the worker loop and disabling debug info in it.
  This allows codegen for a target directive and serial execution on the NVPTX device.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D28125
  llvm-svn: 291121
* Reverting commit r290983 while debugging test failure on windows. (Arpith Chacko Jacob, 2017-01-04, 1 file, -211/+161)
  llvm-svn: 290989
* [OpenMP] Update target codegen for NVPTX device. (Arpith Chacko Jacob, 2017-01-04, 1 file, -161/+211)
  This patch includes updates for codegen of the target region for the NVPTX device. It moves initializers from the compiler to the runtime and updates the worker loop to assume parallel work is retrieved from the runtime. A subsequent patch will update the codegen to retrieve the parallel work using calls to the runtime. It includes the removal of the inline attribute for the worker loop and disabling debug info in it.
  This allows codegen for a target directive and serial execution on the NVPTX device.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D28125
  llvm-svn: 290983
* [OPENMP] Private, firstprivate, and lastprivate clauses for distribute, host code generation (Carlo Bertolli, 2017-01-03, 2 files, -0/+585)
  https://reviews.llvm.org/D17840
  This patch enables private, firstprivate, and lastprivate clauses for the OpenMP distribute directive.
  Regression tests differ from the similar case of the same clauses on the for directive, by removing a reference to two global variables g and g1. This is necessary because:
    1. a distribute pragma is only allowed inside a target region;
    2. referring to a global variable (e.g. g and g1) in a target region requires the program to enclose the variable in a "declare target" region;
    3. declare target pragmas, which are used to define a declare target region, are currently unavailable in clang (patch being prepared).
  For this reason, I moved the global declarations into local variables.
  llvm-svn: 290898
* [OpenMP] Sema and parsing for 'target teams distribute parallel for simd' pragma (Kelvin Li, 2017-01-03, 30 files, -0/+5681)
  This patch implements sema and parsing for the 'target teams distribute parallel for simd' pragma.
  Differential Revision: https://reviews.llvm.org/D28202
  llvm-svn: 290862
* [OpenMP] Add test cases for the proc_bind and schedule clauses with 'teams distribute parallel for' pragma. (Kelvin Li, 2017-01-02, 2 files, -0/+257)
  https://reviews.llvm.org/D28205
  llvm-svn: 290813
* Fix typo in test case. NFC (Kelvin Li, 2016-12-31, 1 file, -1/+1)
  llvm-svn: 290795
* [OpenMP] Sema and parsing for 'target teams distribute parallel for' pragma (Kelvin Li, 2016-12-29, 27 files, -0/+5107)
  This patch implements sema and parsing for the 'target teams distribute parallel for' pragma.
  Differential Revision: https://reviews.llvm.org/D28160
  llvm-svn: 290725
* [OpenMP] Sema and parsing for 'target teams distribute' pragma (Kelvin Li, 2016-12-25, 20 files, -3/+3783)
  This patch implements sema and parsing for the 'target teams distribute' pragma.
  Differential Revision: https://reviews.llvm.org/D28015
  llvm-svn: 290508
* Make '-disable-llvm-optzns' an alias for '-disable-llvm-passes'. (Chandler Carruth, 2016-12-23, 2 files, -6/+6)
  Much to my surprise, '-disable-llvm-optzns', which I thought was the magical flag I wanted to get at the raw LLVM IR coming out of Clang, doesn't do that. It still runs some passes over the IR. I don't want that, I really want the *raw* IR coming out of Clang and I strongly suspect everyone else using it is in the same camp.
  There is actually a flag that does what I want that I didn't know about called '-disable-llvm-passes'. I suspect many others don't know about it either. It both does what I want and is much simpler.
  This removes the confusing version and makes that spelling of the flag an alias for '-disable-llvm-passes'. I've also moved everything in Clang to use the 'passes' spelling as it seems both more accurate (*all* LLVM passes are disabled, not just optimizations) and much easier to remember and spell correctly.
  This is part of simplifying how Clang drives LLVM to make it cleaner to wire up to the new pass manager.
  Differential Revision: https://reviews.llvm.org/D28047
  llvm-svn: 290392
* [OPENMP] Fix for PR31417: assert failure when compiling trivial openmp program (Alexey Bataev, 2016-12-22, 1 file, -18/+19)
  Offload related code is not quite ready yet, but some simple examples must not crash the compiler. Patch fixes the problem in offloading code with exceptions.
  llvm-svn: 290364
* [OPENMP] Fix for PR31416: Clang crashes on OMPCapturedExpr during source based coverage compilation (Alexey Bataev, 2016-12-20, 1 file, -0/+3)
  Added source location info to captured expression declaration + fixed source location info for loop based directives.
  llvm-svn: 290181