path: root/clang/lib/CodeGen/CGStmtOpenMP.cpp
Each entry lists: commit message (author, date; files changed, lines -removed/+added)
...
* Encapsulate FPOptions and use it consistently (Adam Nemet, 2017-03-27; 1 file changed, -2/+1)
  Sema holds the current FPOptions, which is adjusted by '#pragma STDC FP_CONTRACT' and then
  propagated into expression nodes as they are built. This change encapsulates FPOptions so
  that the propagation happens opaquely rather than directly through the fp_contractable
  on/off bit. That allows a controlled transition of fp_contractable to a ternary value
  (off, on, fast) and makes it possible to add more fast-math flags later. It is a step
  toward moving fp-contraction=fast from an LLVM TargetOption to a FastMathFlag in order to
  fix PR25721.
  Differential Revision: https://reviews.llvm.org/D31166
  llvm-svn: 298877
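  For context, a minimal sketch (not from the patch) of the standard C pragma whose state
  Sema now carries in FPOptions:

    /* fp_contract.c -- toggling FP contraction per scope with the standard pragma. */
    double fused(double a, double b, double c) {
    #pragma STDC FP_CONTRACT ON    /* a * b + c may be contracted into a single fma */
      return a * b + c;
    }

    double unfused(double a, double b, double c) {
    #pragma STDC FP_CONTRACT OFF   /* keep the multiply and the add separate */
      return a * b + c;
    }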
* [OpenMP] Teams reduction on the NVPTX device. (Arpith Chacko Jacob, 2017-02-16; 1 file changed, -0/+4)
  This patch implements codegen for the reduction clause on any teams construct for
  elementary data types. It builds on parallel reductions on the GPU. Subsequently, the
  team master writes to a unique location in a global memory scratchpad. The last team to
  do so loads and reduces this array to calculate the final result.
  This patch emits two helper functions that are used by the OpenMP runtime on the GPU to
  perform reductions across teams.
  Patch by Tian Jin in collaboration with Arpith Jacob.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D29879
  llvm-svn: 295335
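  As a hedged illustration of the user-level pattern this serves (illustrative code, not
  from the patch): an elementary-type '+' reduction combined within each team and then
  across the teams of an offloaded region.

    /* teams_reduction.c -- reduction carried across teams on the device. */
    long sum_to(int n) {
      long sum = 0;
    #pragma omp target teams distribute parallel for reduction(+ : sum) map(tofrom : sum)
      for (int i = 0; i < n; ++i)
        sum += i;
      return sum;
    }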
* [OpenMP] Parallel reduction on the NVPTX device. (Arpith Chacko Jacob, 2017-02-16; 1 file changed, -10/+22)
  This patch implements codegen for the reduction clause on any parallel construct for
  elementary data types. An efficient implementation requires hierarchical reduction within
  a warp and a threadblock. It is complicated by the fact that variables declared on the
  stack of a CUDA thread cannot be shared with other threads.
  The patch creates a struct to hold reduction variables and a number of helper functions.
  The OpenMP runtime on the GPU implements reduction algorithms that use these helper
  functions to perform reductions within a team. Variables are shared between CUDA threads
  using shuffle intrinsics.
  An implementation of reductions on the NVPTX device is substantially different from that
  on CPUs. However, this patch is written so that there are minimal changes to the rest of
  OpenMP codegen. The implemented design allows the compiler and runtime to be decoupled:
  the runtime does not need to know the reduction operation(s), the type of the reduction
  variable(s), or the number of reductions. The design also allows reuse of host codegen,
  with appropriate specialization for the NVPTX device.
  While the patch does introduce a number of abstractions, the expected use case calls for
  inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these
  abstractions are unwound and the performance of OpenMP reductions is comparable to
  CUDA-canonical code.
  Patch by Tian Jin in collaboration with Arpith Jacob.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D29758
  llvm-svn: 295333
* Revert r295319 while investigating buildbot failure. (Arpith Chacko Jacob, 2017-02-16; 1 file changed, -22/+10)
  llvm-svn: 295323
* [OpenMP] Parallel reduction on the NVPTX device. (Arpith Chacko Jacob, 2017-02-16; 1 file changed, -10/+22)
  Original commit of the patch above; the commit message is identical to that of r295333.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D29758
  llvm-svn: 295319
* [OpenMP] Codegen support for 'target teams' on the host. (Arpith Chacko Jacob, 2017-01-25; 1 file changed, -13/+52)
  This patch adds support for codegen of 'target teams' on the host. This combined
  directive has two captured statements, one for the 'teams' region, and the other for the
  'parallel'. This target teams region is offloaded using the __tgt_target_teams() call.
  The patch sets the number of teams as an argument to this call.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D29084
  llvm-svn: 293005
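  A hedged sketch of the kind of source this host codegen handles; the requested team count
  is what gets forwarded to the __tgt_target_teams() call:

    /* target_teams.c -- a 'target teams' region with an explicit team count. */
    void zero(float *a, int n) {
    #pragma omp target teams num_teams(8) map(tofrom : a[0:n])
    #pragma omp distribute
      for (int i = 0; i < n; ++i)
        a[i] = 0.0f;
    }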
* Reverting commit because an NVPTX patch sneaked in. Break up into two patches. (Arpith Chacko Jacob, 2017-01-25; 1 file changed, -52/+13)
  llvm-svn: 293003
* [OpenMP] Codegen support for 'target teams' on the host. (Arpith Chacko Jacob, 2017-01-25; 1 file changed, -13/+52)
  This patch adds support for codegen of 'target teams' on the host. This combined
  directive has two captured statements, one for the 'teams' region, and the other for the
  'parallel'. This target teams region is offloaded using the __tgt_target_teams() call.
  The patch sets the number of teams as an argument to this call.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D29084
  llvm-svn: 293001
* [OpenMP] Support for the if-clause on the combined directive 'target parallel'. (Arpith Chacko Jacob, 2017-01-18; 1 file changed, -11/+30)
  The if-clause on the combined directive potentially applies to both the 'target' and the
  'parallel' regions. Codegen'ing the if-clause on the combined directive requires
  additional support because the expression in the clause must be captured by the 'target'
  captured statement but not the 'parallel' captured statement. Note that this situation
  arises for other clauses as well, such as num_threads.
  The OMPIfClause class inherits OMPClauseWithPreInit to support capturing of expressions
  in the clause. A member CaptureRegion is added to OMPClauseWithPreInit to indicate which
  captured statement (in this case 'target' but not 'parallel') captures these expressions.
  To ensure correct codegen of captured expressions in the presence of combined 'target'
  directives, OMPParallelScope was added to 'parallel' codegen.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D28781
  llvm-svn: 292437
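  For illustration (a hedged sketch, not code from the patch), a combined 'target parallel'
  whose if-clause expression must be evaluated where the 'target' captured statement can
  see it:

    /* target_parallel_if.c -- the if-clause guards both offloading and parallel execution. */
    void scale(float *a, int n) {
    #pragma omp target parallel if(n > 1024) num_threads(64) map(tofrom : a[0:n])
      {
    #pragma omp for
        for (int i = 0; i < n; ++i)
          a[i] *= 2.0f;
      }
    }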
* [OpenMP] Codegen support for 'target parallel' on the host. (Arpith Chacko Jacob, 2017-01-18; 1 file changed, -9/+37)
  This patch adds support for codegen of 'target parallel' on the host. It is also the
  first combined directive that requires two or more captured statements; support for this
  functionality is included in the patch.
  A combined directive such as 'target parallel' has two captured statements, one for the
  'target' and the other for the 'parallel' region. Two captured statements are required
  because each has different implicit parameters (see SemaOpenMP.cpp). For example, the
  'parallel' has 'global_tid' and 'bound_tid' while the 'target' does not. The patch adds
  support for handling multiple captured statements based on the combined directive.
  When codegen'ing the 'target parallel' directive, the 'target' outlined function is
  created using the outer captured statement and the 'parallel' outlined function is
  created using the inner captured statement.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D28753
  llvm-svn: 292419
* Revert r292374 to debug Windows buildbot failure. (Arpith Chacko Jacob, 2017-01-18; 1 file changed, -37/+9)
  llvm-svn: 292400
* [OpenMP] Codegen support for 'target parallel' on the host. (Arpith Chacko Jacob, 2017-01-18; 1 file changed, -9/+37)
  Original commit of the patch above; the commit message is identical to that of r292419.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D28753
  llvm-svn: 292374
* [OpenMP] Refactor code that calls codegen for target regions on the device. (Arpith Chacko Jacob, 2017-01-16; 1 file changed, -32/+48)
  This patch refactors code that calls codegen for target regions. Currently the codebase
  only supports the 'target' directive. The patch pulls out common target processing code
  into a static function that can be called by codegen for any target directive.
  Reviewers: ABataev
  Differential Revision: https://reviews.llvm.org/D28752
  llvm-svn: 292134
* Remove unused lambda captures. NFC (Malcolm Parsons, 2017-01-13; 1 file changed, -7/+6)
  llvm-svn: 291939
* [OpenMP] Sema and parsing for 'target teams distribute simd' pragma (Kelvin Li, 2017-01-10; 1 file changed, -0/+10)
  This patch implements sema and parsing for the 'target teams distribute simd' pragma.
  Differential Revision: https://reviews.llvm.org/D28252
  llvm-svn: 291579
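  As a sketch of what the newly accepted spelling looks like in user code (one illustrative
  example standing in for the family of similar combined-directive patches below):

    /* combined_directive.c -- the combined 'target teams distribute simd' directive. */
    void axpy(float a, float *x, float *y, int n) {
    #pragma omp target teams distribute simd map(to : x[0:n]) map(tofrom : y[0:n])
      for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
    }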
* [OPENMP] Private, firstprivate, and lastprivate clauses for distribute, host code generation (Carlo Bertolli, 2017-01-03; 1 file changed, -0/+18)
  https://reviews.llvm.org/D17840
  This patch enables private, firstprivate, and lastprivate clauses for the OpenMP
  distribute directive. Regression tests differ from the similar case of the same clauses
  on the for directive by removing a reference to two global variables g and g1. This is
  necessary because:
  1. a distribute pragma is only allowed inside a target region;
  2. referring to a global variable (e.g. g and g1) in a target region requires the program
     to enclose the variable in a "declare target" region;
  3. declare target pragmas, which are used to define a declare target region, are
     currently unavailable in clang (patch being prepared).
  For this reason, I moved the global declarations into local variables.
  llvm-svn: 290898
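  A hedged sketch (illustrative names, not the regression tests) of the clause combination
  this enables on 'distribute'; the variables are locals, matching the declare-target
  limitation described above:

    /* distribute_privates.c -- data-sharing clauses now lowered for 'distribute'. */
    void work(int n) {
      float tmp = 0.0f, last = 0.0f, seed = 1.0f;
    #pragma omp target teams map(tofrom : last)
    #pragma omp distribute private(tmp) firstprivate(seed) lastprivate(last)
      for (int i = 0; i < n; ++i) {
        tmp = seed + i;   /* tmp is private to each team; seed is copied in */
        last = tmp;       /* the sequentially last value is copied back to 'last' */
      }
    }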
* [OpenMP] Sema and parsing for 'target teams distribute parallel for simd' pragma (Kelvin Li, 2017-01-03; 1 file changed, -0/+10)
  This patch implements sema and parsing for the 'target teams distribute parallel for
  simd' pragma.
  Differential Revision: https://reviews.llvm.org/D28202
  llvm-svn: 290862
* [OpenMP] Sema and parsing for 'target teams distribute parallel for' pragma (Kelvin Li, 2016-12-29; 1 file changed, -0/+10)
  This patch implements sema and parsing for the 'target teams distribute parallel for'
  pragma.
  Differential Revision: https://reviews.llvm.org/D28160
  llvm-svn: 290725
* Fix format. NFC (Kelvin Li, 2016-12-28; 1 file changed, -5/+8)
  llvm-svn: 290673
* [OpenMP] Sema and parsing for 'target teams distribute' pragma (Kelvin Li, 2016-12-25; 1 file changed, -0/+8)
  This patch implements sema and parsing for the 'target teams distribute' pragma.
  Differential Revision: https://reviews.llvm.org/D28015
  llvm-svn: 290508
* [OpenMP] Sema and parsing for 'target teams' pragma (Kelvin Li, 2016-12-17; 1 file changed, -0/+8)
  This patch implements sema and parsing for the 'target teams' pragma.
  Differential Revision: https://reviews.llvm.org/D27818
  llvm-svn: 290038
* Fix typo in comment. NFC. (Kelvin Li, 2016-12-15; 1 file changed, -1/+1)
  llvm-svn: 289836
* [OpenMP] Sema and parsing for 'teams distribute parallel for' pragma (Kelvin Li, 2016-12-09; 1 file changed, -0/+12)
  This patch implements sema and parsing for the 'teams distribute parallel for' pragma.
  Differential Revision: https://reviews.llvm.org/D27345
  llvm-svn: 289179
* [OpenMP] Sema and parsing for 'teams distribute parallel for simd' pragma (Kelvin Li, 2016-11-30; 1 file changed, -0/+11)
  This patch implements sema and parsing for the 'teams distribute parallel for simd'
  pragma.
  Differential Revision: https://reviews.llvm.org/D27084
  llvm-svn: 288294
* [OPENMP] Fixed codegen for 'omp cancel' construct. (Alexey Bataev, 2016-11-17; 1 file changed, -8/+24)
  If the 'omp cancel' construct is used in a worksharing construct, the program may hang
  when a reduction clause is also used. The patch fixes this problem by avoiding extra
  reduction processing for branches that were cancelled.
  llvm-svn: 287227
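  A hedged sketch of the problematic shape, cancellation inside a worksharing loop that
  also carries a reduction (illustrative code, not the reproducer; cancellation also
  requires OMP_CANCELLATION to be enabled at run time):

    /* cancel_reduction.c -- 'omp cancel' inside a 'for' region with a reduction clause. */
    int sum_until(const int *a, int n, int stop) {
      int sum = 0;
    #pragma omp parallel
    #pragma omp for reduction(+ : sum)
      for (int i = 0; i < n; ++i) {
        if (a[i] == stop) {
    #pragma omp cancel for             /* abandon the remaining iterations */
        }
    #pragma omp cancellation point for
        sum += a[i];
      }
      return sum;
    }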
* Revert "[OPENMP] Fixed codegen for 'omp cancel' construct." (Vitaly Buka, 2016-11-16; 1 file changed, -34/+6)
  Summary: r286944 introduced bugs detected by ASAN as use-after-return. r287025 has not
  fixed them completely. This reverts commits r286944 and r287025.
  Reviewers: ABataev
  Subscribers: cfe-commits
  Differential Revision: https://reviews.llvm.org/D26720
  llvm-svn: 287069
* [OPENMP] Fix stack use after delete, NFC. (Alexey Bataev, 2016-11-15; 1 file changed, -5/+4)
  Fixed possible use of a stack variable after deletion.
  llvm-svn: 287025
* [OPENMP] Fixed codegen for 'omp cancel' construct. (Alexey Bataev, 2016-11-15; 1 file changed, -6/+35)
  If the 'omp cancel' construct is used in a worksharing construct, the program may hang
  when a reduction clause is also used. The patch fixes this problem by avoiding extra
  reduction processing for branches that were cancelled.
  llvm-svn: 286944
* Add the loop end location to the loop metadata. (Amara Emerson, 2016-11-10; 1 file changed, -2/+6)
  This additional information can be used to improve the locations when generating remarks
  for loops. Depends on the companion LLVM change r286227.
  Patch by Florian Hahn.
  Differential Revision: https://reviews.llvm.org/D25764
  llvm-svn: 286456
* [OPENMP] Fixed capturing of VLA variables. (Alexey Bataev, 2016-11-07; 1 file changed, -1/+1)
  After some changes in codegen, capturing of VLA variables in OpenMP regions was broken,
  causing a compiler crash. The patch fixes this issue.
  llvm-svn: 286103
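  For illustration, a minimal sketch of the affected pattern, a variable length array
  referenced inside an OpenMP region (illustrative code, not the crashing test case):

    /* vla_capture.c -- the VLA and its bound must both be captured into the outlined region. */
    void fill(int n) {
      double buf[n];                  /* variable length array */
    #pragma omp parallel for
      for (int i = 0; i < n; ++i)
        buf[i] = 0.5 * i;
    }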
* Revert "[OPENMP] Fixed capturing of VLA variables." (Diana Picus, 2016-11-07; 1 file changed, -1/+1)
  This reverts commit r286098 because the modified test breaks on many of the buildbots.
  llvm-svn: 286102
* [OPENMP] Fixed capturing of VLA variables. (Alexey Bataev, 2016-11-07; 1 file changed, -1/+1)
  After some changes in codegen, capturing of VLA variables in OpenMP regions was broken,
  causing a compiler crash. The patch fixes this issue.
  llvm-svn: 286098
* Re-apply patch r279045. (Kelvin Li, 2016-10-25; 1 file changed, -0/+13)
  llvm-svn: 285066
* [CodeGen][ObjC] Do not call objc_storeStrong when initializing a constexpr variable. (Akira Hatanaka, 2016-10-18; 1 file changed, -2/+2)
  When compiling a constexpr NSString initialized with an Objective-C string literal,
  CodeGen emits objc_storeStrong on an uninitialized alloca, which causes a crash. This
  patch folds the code in EmitScalarInit into EmitStoreThroughLValue and fixes the crash by
  calling objc_retain on the string instead of using objc_storeStrong.
  rdar://problem/28562009
  Differential Revision: https://reviews.llvm.org/D25547
  llvm-svn: 284516
* Fix for PR30639: CGDebugInfo null dereference with OpenMP array access, by Erich Keane (Alexey Bataev, 2016-10-13; 1 file changed, -4/+17)
  OpenMP creates a variable array type with a null size-expr, and debug info generation
  failed because of this. This patch corrects the OpenMP implementation, updates the tests,
  and adds a new one for this condition.
  Differential Revision: https://reviews.llvm.org/D25373
  llvm-svn: 284110
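  As a hedged illustration of the kind of construct involved, an OpenMP array section
  (which clang models with a variable array type), compiled with debug info:

    /* array_section_debug.c -- build with -fopenmp -g to exercise debug info generation. */
    void saxpy(float a, float *x, float *y, int n) {
    #pragma omp target map(to : x[0:n]) map(tofrom : y[0:n])
      for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
    }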
* Revert "[OpenMP] Sema and parsing for 'teams distribute simd' pragma" (Diana Picus, 2016-08-18; 1 file changed, -13/+0)
  This reverts commit r279003 as it breaks some of our buildbots (e.g.
  clang-cmake-aarch64-quick, clang-x86_64-linux-selfhost-modules). The error is in
  OpenMP/teams_distribute_simd_ast_print.cpp:
  clang: /home/buildslave/buildslave/clang-cmake-aarch64-quick/llvm/include/llvm/ADT/DenseMap.h:527:
  bool llvm::DenseMapBase<DerivedT, KeyT, ValueT, KeyInfoT, BucketT>::LookupBucketFor(const
  LookupKeyT&, const BucketT*&) const [with LookupKeyT = clang::Stmt*; DerivedT =
  llvm::DenseMap<clang::Stmt*, long unsigned int>; KeyT = clang::Stmt*; ValueT = long
  unsigned int; KeyInfoT = llvm::DenseMapInfo<clang::Stmt*>; BucketT =
  llvm::detail::DenseMapPair<clang::Stmt*, long unsigned int>]: Assertion
  `!KeyInfoT::isEqual(Val, EmptyKey) && !KeyInfoT::isEqual(Val, TombstoneKey) &&
  "Empty/Tombstone value shouldn't be inserted into map!"' failed.
  llvm-svn: 279045
* [OpenMP] Sema and parsing for 'teams distribute simd' pragma (Kelvin Li, 2016-08-17; 1 file changed, -0/+13)
  This patch implements sema and parsing for the 'teams distribute simd' pragma. The patch
  originated from Carlo Bertolli.
  Differential Revision: https://reviews.llvm.org/D23528
  llvm-svn: 279003
* [OpenMP] Sema and parsing for 'teams distribute' pragma (Kelvin Li, 2016-08-05; 1 file changed, -0/+12)
  This patch implements sema and parsing for the 'teams distribute' pragma.
  Differential Revision: https://reviews.llvm.org/D23189
  llvm-svn: 277818
* [OpenMP] Codegen for use_device_ptr clause. (Samuel Antao, 2016-07-28; 1 file changed, -9/+129)
  Summary: This patch adds support for the use_device_ptr clause. It includes changes in
  Sema that could not be tested without codegen, namely the use of the firstprivate logic
  and mappable-expressions support.
  Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev
  Subscribers: caomhin, cfe-commits
  Differential Revision: https://reviews.llvm.org/D22691
  llvm-svn: 276977
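  A hedged sketch of how use_device_ptr is typically written; launch_kernel is a
  hypothetical device-aware helper, not part of the patch:

    /* use_device_ptr.c -- obtain the device address of a mapped pointer in 'target data'. */
    extern void launch_kernel(float *device_ptr, int n);   /* hypothetical helper */

    void run(float *a, int n) {
    #pragma omp target data map(tofrom : a[0:n]) use_device_ptr(a)
      {
        /* within this region 'a' holds the corresponding device pointer */
        launch_kernel(a, n);
      }
    }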
* [OpenMP] Add support for mapping array sections through pointer references. (Samuel Antao, 2016-07-27; 1 file changed, -1/+11)
  Summary: This patch fixes a bug in the mapping of array sections whose base is a
  reference to a pointer. The existing mapping support was not prepared to deal with it,
  causing the compiler to crash. Mapping a reference to a pointer has the same
  characteristics as mapping a regular pointer, i.e., it is passed by value. Therefore, the
  reference has to be materialized in the target region.
  Reviewers: hfinkel, carlo.bertolli, kkwli0, ABataev
  Subscribers: caomhin, cfe-commits
  Differential Revision: https://reviews.llvm.org/D22690
  llvm-svn: 276933
* [OpenMP] Sema and parsing for 'target simd' pragma (Kelvin Li, 2016-07-20; 1 file changed, -0/+11)
  This patch implements sema and parsing for the 'target simd' pragma.
  Differential Revision: https://reviews.llvm.org/D22479
  llvm-svn: 276203
* [OPENMP] Improved processing of 'priority' clause, NFC. (Alexey Bataev, 2016-07-19; 1 file changed, -3/+1)
  Removed some old comments and improved handling of the 'priority' clause value during
  codegen, after comments from Richard Smith.
  llvm-svn: 275945
* [OpenMP] Sema and parsing for 'target parallel for simd' pragma (Kelvin Li, 2016-07-14; 1 file changed, -0/+12)
  This patch implements sema and parsing for the 'target parallel for simd' pragma.
  Differential Revision: http://reviews.llvm.org/D22096
  llvm-svn: 275365
* [OpenMP] Initial implementation of parse+sema for OpenMP clause 'is_device_ptr' of target (Carlo Bertolli, 2016-07-13; 1 file changed, -0/+1)
  http://reviews.llvm.org/D22070
  llvm-svn: 275282
* [OpenMP] Initial implementation of parse+sema for clause use_device_ptr of 'target data' (Carlo Bertolli, 2016-07-13; 1 file changed, -0/+1)
  http://reviews.llvm.org/D21904
  This patch is similar to the implementation of the 'private' clause: it adds a list of
  private pointers to be used within the target data region to store the device pointers
  returned by the runtime. Please refer to the following document for a full description of
  what the runtime will return in this case (pages 10 and 11):
  https://github.com/clang-omp/OffloadingDesign
  I am happy to answer any question related to the runtime interface to help review this
  patch.
  llvm-svn: 275271
* [OpenMP] Sema and parsing for 'distribute simd' pragma (Kelvin Li, 2016-07-06; 1 file changed, -0/+13)
  Summary: This patch is an implementation of sema and parsing for the OpenMP composite
  pragma 'distribute simd'.
  Differential Revision: http://reviews.llvm.org/D22007
  llvm-svn: 274604
* [OpenMP] Sema and parse for 'distribute parallel for simd' (Kelvin Li, 2016-07-05; 1 file changed, -0/+11)
  Summary: This patch is an implementation of sema and parsing for the OpenMP composite
  pragma 'distribute parallel for simd'.
  Differential Revision: http://reviews.llvm.org/D21977
  llvm-svn: 274530
* Resubmission of http://reviews.llvm.org/D21564 after fixes. (Carlo Bertolli, 2016-06-27; 1 file changed, -0/+12)
  [OpenMP] Initial implementation of parse and sema for composite pragma 'distribute
  parallel for'
  This patch is an initial implementation for the 'distribute parallel for' composite
  pragma. The main differences that affect other pragmas are:
  - The implementation of 'distribute parallel for' requires blocking of the associated
    loop, where blocks are "distributed" to different teams and iterations within each
    block are scheduled to parallel threads within each team. To implement blocking, sema
    creates two additional worksharing-directive fields that are used to pass the
    team-assigned block lower and upper bounds through the outlined function resulting
    from 'parallel'. In this way, the scheduling of 'for' iterations to threads can use
    those bounds.
  - As a consequence of blocking, the stride of 'distribute' is not 1 but is equal to the
    blocking size. This value is returned by the runtime, and sema prepares a DistIncrExpr
    variable to hold it.
  - As a consequence of blocking, the global upper bound (EnsureUpperBound) expression of
    the 'for' is not the original loop upper bound (e.g. in for(i = 0; i < N; i++) this is
    'N') but the team-assigned block upper bound. Sema creates a new expression holding
    the calculation of the actual upper bound for 'for' as UB = min(UB, PrevUB), where UB
    is the loop upper bound and PrevUB is the team-assigned block upper bound.
  llvm-svn: 273884
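  A purely conceptual sketch of the blocking scheme described above (illustrative C, not
  the generated code or the actual runtime interface):

    /* distribute_blocking_sketch.c -- how a team's block bounds clamp the inner 'for'. */
    void sketch(int N, int num_teams, int team_id) {
      /* 'distribute': the runtime hands each team one block of iterations. */
      int block_size = (N + num_teams - 1) / num_teams;  /* also the 'distribute' stride */
      int PrevLB = team_id * block_size;                  /* team-assigned lower bound */
      int PrevUB = PrevLB + block_size - 1;               /* team-assigned upper bound */

      /* 'parallel for' inside the team: clamp the global bound to the team's block. */
      int LB = PrevLB;
      int UB = N - 1;
      if (UB > PrevUB)
        UB = PrevUB;                                      /* UB = min(UB, PrevUB) */

      for (int i = LB; i <= UB; ++i) {
        /* loop body, executed by the team's threads (thread scheduling elided) */
      }
    }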
* Revert r273705 (Carlo Bertolli, 2016-06-24; 1 file changed, -12/+0)
  [OpenMP] Initial implementation of parse and sema for composite pragma 'distribute
  parallel for'
  llvm-svn: 273709
* [OpenMP] Initial implementation of parse and sema for composite pragma 'distribute parallel for' (Carlo Bertolli, 2016-06-24; 1 file changed, -0/+12)
  http://reviews.llvm.org/D21564
  This patch is an initial implementation for the 'distribute parallel for' composite
  pragma; the detailed description is identical to that of the resubmission (r273884)
  above.
  llvm-svn: 273705