summaryrefslogtreecommitdiffstats
path: root/clang/lib/CodeGen/CGOpenMPRuntime.cpp
Commit message (Collapse)AuthorAgeFilesLines
* Recommit ofCarlo Bertolli2017-04-251-12/+10
| | | | | | | | | | | | | | [OpenMP] Initial implementation of code generation for pragma 'distribute parallel for' on host https://reviews.llvm.org/D29508 This patch makes the following additions: It abstracts away loop bound generation code from procedures associated with pragma 'for' and loops in general, in such a way that the same procedures can be used for 'distribute parallel for' without the need for a full re-implementation. It implements code generation for 'distribute parallel for' and adds regression tests. It includes tests for clauses. It is important to notice that most of the clauses are implemented as part of existing procedures. For instance, firstprivate is already implemented for 'distribute' and 'for' as separate pragmas. As the implementation of 'distribute parallel for' is based on the same procedures, then we automatically obtain implementation for such clauses without the need to add new code. However, this requires regression tests that verify correctness of produced code. llvm-svn: 301340
* Revert r301223Carlo Bertolli2017-04-241-10/+12
| | | | llvm-svn: 301233
* [OpenMP] Initial implementation of code generation for pragma 'distribute ↵Carlo Bertolli2017-04-241-12/+10
| | | | | | | | | | | | | | | | | parallel for' on host https://reviews.llvm.org/D29508 This patch makes the following additions: 1. It abstracts away loop bound generation code from procedures associated with pragma 'for' and loops in general, in such a way that the same procedures can be used for 'distribute parallel for' without the need for a full re-implementation. 2. It implements code generation for 'distribute parallel for' and adds regression tests. It includes tests for clauses. It is important to notice that most of the clauses are implemented as part of existing procedures. For instance, firstprivate is already implemented for 'distribute' and 'for' as separate pragmas. As the implementation of 'distribute parallel for' is based on the same procedures, then we automatically obtain implementation for such clauses without the need to add new code. However, this requires regression tests that verify correctness of produced code. Looking forward to comments. llvm-svn: 301223
* Spelling mistakes in comments. NFCI. (PR27635)Simon Pilgrim2017-03-301-2/+2
| | | | llvm-svn: 299083
* Use arg_begin() instead of getArgumentList().begin(), the argument list is ↵Reid Kleckner2017-03-161-3/+1
| | | | | | an implementation detail llvm-svn: 297975
* Promote ConstantInitBuilder to be a public CodeGen API; it'sJohn McCall2017-03-021-1/+1
| | | | | | a generally useful utility for other frontends. NFC. llvm-svn: 296806
* [OpenMP] Fix cancellation point in task with no cancelJonas Hahnfeld2017-02-171-1/+3
| | | | | | | | | With tasks, the cancel may happen in another task. This has a different region info which means that we can't find it here. Differential Revision: https://reviews.llvm.org/D30091 llvm-svn: 295474
* [OpenMP] Remove barriers at cancel and cancellation pointJonas Hahnfeld2017-02-171-6/+0
| | | | | | | | | | | | | | | | This resolves a deadlock with the cancel directive when there is no explicit cancellation point. In that case, the implicit barrier acts as cancellation point. After removing the barrier after cancel, the now unmatched barrier for the explicit cancellation point has to go as well. This has probably worked before rL255992: With the calls for the explicit barrier, it was sure that all threads passed a barrier before exiting. Reported by Simon Convent and Joachim Protze! Differential Revision: https://reviews.llvm.org/D30088 llvm-svn: 295473
* [OpenMP] Parallel reduction on the NVPTX device.Arpith Chacko Jacob2017-02-161-14/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements codegen for the reduction clause on any parallel construct for elementary data types. An efficient implementation requires hierarchical reduction within a warp and a threadblock. It is complicated by the fact that variables declared in the stack of a CUDA thread cannot be shared with other threads. The patch creates a struct to hold reduction variables and a number of helper functions. The OpenMP runtime on the GPU implements reduction algorithms that uses these helper functions to perform reductions within a team. Variables are shared between CUDA threads using shuffle intrinsics. An implementation of reductions on the NVPTX device is substantially different to that of CPUs. However, this patch is written so that there are minimal changes to the rest of OpenMP codegen. The implemented design allows the compiler and runtime to be decoupled, i.e., the runtime does not need to know of the reduction operation(s), the type of the reduction variable(s), or the number of reductions. The design also allows reuse of host codegen, with appropriate specialization for the NVPTX device. While the patch does introduce a number of abstractions, the expected use case calls for inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these abstractions are unwound and performance of OpenMP reductions is comparable to CUDA-canonical code. Patch by Tian Jin in collaboration with Arpith Jacob Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29758 llvm-svn: 295333
* Revert r295319 while investigating buildbot failure.Arpith Chacko Jacob2017-02-161-17/+14
| | | | llvm-svn: 295323
* [OpenMP] Parallel reduction on the NVPTX device.Arpith Chacko Jacob2017-02-161-14/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements codegen for the reduction clause on any parallel construct for elementary data types. An efficient implementation requires hierarchical reduction within a warp and a threadblock. It is complicated by the fact that variables declared in the stack of a CUDA thread cannot be shared with other threads. The patch creates a struct to hold reduction variables and a number of helper functions. The OpenMP runtime on the GPU implements reduction algorithms that uses these helper functions to perform reductions within a team. Variables are shared between CUDA threads using shuffle intrinsics. An implementation of reductions on the NVPTX device is substantially different to that of CPUs. However, this patch is written so that there are minimal changes to the rest of OpenMP codegen. The implemented design allows the compiler and runtime to be decoupled, i.e., the runtime does not need to know of the reduction operation(s), the type of the reduction variable(s), or the number of reductions. The design also allows reuse of host codegen, with appropriate specialization for the NVPTX device. While the patch does introduce a number of abstractions, the expected use case calls for inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these abstractions are unwound and performance of OpenMP reductions is comparable to CUDA-canonical code. Patch by Tian Jin in collaboration with Arpith Jacob Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29758 llvm-svn: 295319
* [OpenMP] Codegen support for 'target teams' on the host.Arpith Chacko Jacob2017-01-251-22/+53
| | | | | | | | | | | | | | | This patch adds support for codegen of 'target teams' on the host. This combined directive has two captured statements, one for the 'teams' region, and the other for the 'parallel'. This target teams region is offloaded using the __tgt_target_teams() call. The patch sets the number of teams as an argument to this call. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29084 llvm-svn: 293005
* Reverting commit because an NVPTX patch sneaked in. Break up into twoArpith Chacko Jacob2017-01-251-53/+22
| | | | | | patches. llvm-svn: 293003
* [OpenMP] Codegen support for 'target teams' on the host.Arpith Chacko Jacob2017-01-251-22/+53
| | | | | | | | | | | | | | | This patch adds support for codegen of 'target teams' on the host. This combined directive has two captured statements, one for the 'teams' region, and the other for the 'parallel'. This target teams region is offloaded using the __tgt_target_teams() call. The patch sets the number of teams as an argument to this call. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29084 llvm-svn: 293001
* [OpenMP] Support for the num_threads-clause on 'target parallel'.Arpith Chacko Jacob2017-01-251-25/+98
| | | | | | | | | | | | | | | The num_threads-clause on the combined directive applies to the 'parallel' region of this construct. We modify the NumThreadsClause class to capture the clause expression within the 'target' region. The offload runtime call for 'target parallel' is changed to __tgt_target_teams() with 1 team and the number of threads set by this clause or a default if none. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29082 llvm-svn: 292997
* [OpenMP] Codegen support for 'target parallel' on the host.Arpith Chacko Jacob2017-01-181-5/+25
| | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for codegen of 'target parallel' on the host. It is also the first combined directive that requires two or more captured statements. Support for this functionality is included in the patch. A combined directive such as 'target parallel' has two captured statements, one for the 'target' and the other for the 'parallel' region. Two captured statements are required because each has different implicit parameters (see SemaOpenMP.cpp). For example, the 'parallel' has 'global_tid' and 'bound_tid' while the 'target' does not. The patch adds support for handling multiple captured statements based on the combined directive. When codegen'ing the 'target parallel' directive, the 'target' outlined function is created using the outer captured statement and the 'parallel' outlined function is created using the inner captured statement. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28753 llvm-svn: 292419
* Revert r292374 to debug Windows buildbot failure.Arpith Chacko Jacob2017-01-181-25/+5
| | | | llvm-svn: 292400
* [OpenMP] Codegen support for 'target parallel' on the host.Arpith Chacko Jacob2017-01-181-5/+25
| | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for codegen of 'target parallel' on the host. It is also the first combined directive that requires two or more captured statements. Support for this functionality is included in the patch. A combined directive such as 'target parallel' has two captured statements, one for the 'target' and the other for the 'parallel' region. Two captured statements are required because each has different implicit parameters (see SemaOpenMP.cpp). For example, the 'parallel' has 'global_tid' and 'bound_tid' while the 'target' does not. The patch adds support for handling multiple captured statements based on the combined directive. When codegen'ing the 'target parallel' directive, the 'target' outlined function is created using the outer captured statement and the 'parallel' outlined function is created using the inner captured statement. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28753 llvm-svn: 292374
* [OpenMP] Refactor code that calls codegen for target regions on the device.Arpith Chacko Jacob2017-01-161-15/+17
| | | | | | | | | | | | This patch refactors code that calls codegen for target regions. Currently the codebase only supports the 'target' directive. The patch pulls out common target processing code into a static function that can be called by codegen for any target directive. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28752 llvm-svn: 292134
* Remove unused lambda captures. NFCMalcolm Parsons2017-01-131-11/+10
| | | | llvm-svn: 291939
* [OpenMP] Basic support for a parallel directive in a target region on an ↵Arpith Chacko Jacob2017-01-101-7/+9
| | | | | | | | | | | | | | | | | | | | | NVPTX device Summary: This patch introduces support for the execution of parallel constructs in a target region on the NVPTX device. Parallel regions must be in the lexical scope of the target directive. The master thread in the master warp signals parallel work for worker threads in worker warps on encountering a parallel region. Note: The patch does not yet support capture of arguments in a parallel region so the test cases are simple. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28145 llvm-svn: 291565
* [OpenMP] Add fields for flags in the offload entry descriptor.Samuel Antao2017-01-051-5/+18
| | | | | | | | | | | | | | | | | Summary: This patch adds two fields to the offload entry descriptor. One field is meant to signal Ctors/Dtors and `link` global variables, and the other is reserved for runtime library use. Currently, these fields are only filled with zeros in the current code generation, but that will change when `declare target` is added. The reason, we are adding these fields now is to make the code generation consistent with the runtime library proposal under review in https://reviews.llvm.org/D14031. Reviewers: ABataev, hfinkel, carlo.bertolli, kkwli0, arpith-jacob, Hahnfeld Subscribers: cfe-commits, caomhin, jholewinski Differential Revision: https://reviews.llvm.org/D28298 llvm-svn: 291124
* Cleanup the handling of noinline function attributes, -fno-inline,Chandler Carruth2016-12-231-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | -fno-inline-functions, -O0, and optnone. These were really, really tangled together: - We used the noinline LLVM attribute for -fno-inline - But not for -fno-inline-functions (breaking LTO) - But we did use it for -finline-hint-functions (yay, LTO is happy!) - But we didn't for -O0 (LTO is sad yet again...) - We had weird structuring of CodeGenOpts with both an inlining enumeration and a boolean. They interacted in weird ways and needlessly. - A *lot* of set smashing went on with setting these, and then got worse when we considered optnone and other inlining-effecting attributes. - A bunch of inline affecting attributes were managed in a completely different place from -fno-inline. - Even with -fno-inline we failed to put the LLVM noinline attribute onto many generated function definitions because they didn't show up as AST-level functions. - If you passed -O0 but -finline-functions we would run the normal inliner pass in LLVM despite it being in the O0 pipeline, which really doesn't make much sense. - Lastly, we used things like '-fno-inline' to manipulate the pass pipeline which forced the pass pipeline to be much more parameterizable than it really needs to be. Instead we can *just* use the optimization level to select a pipeline and control the rest via attributes. Sadly, this causes a bunch of churn in tests because we don't run the optimizer in the tests and check the contents of attribute sets. It would be awesome if attribute sets were a bit more FileCheck friendly, but oh well. I think this is a significant improvement and should remove the semantic need to change what inliner pass we run in order to comply with the requested inlining semantics by relying completely on attributes. It also cleans up tho optnone and related handling a bit. One unfortunate aspect of this is that for generating alwaysinline routines like those in OpenMP we end up removing noinline and then adding alwaysinline. I tried a bunch of other approaches, but because we recompute function attributes from scratch and don't have a declaration here I couldn't find anything substantially cleaner than this. Differential Revision: https://reviews.llvm.org/D28053 llvm-svn: 290398
* Fix typo and remove unnecessary statement.Samuel Antao2016-12-121-2/+1
| | | | llvm-svn: 289458
* Fix format and a few typos in comments.Samuel Antao2016-12-121-6/+5
| | | | llvm-svn: 289450
* Hide the result of building a constant initializer. NFC.John McCall2016-11-281-1/+1
| | | | llvm-svn: 288080
* ConstantBuilder -> ConstantInitBuilder for clarity, andJohn McCall2016-11-281-4/+4
| | | | | | | move the member classes up to top level to allow forward declarations to name them. NFC. llvm-svn: 288079
* [CodeGen] Pass objects that are expensive to copy by const ref.Benjamin Kramer2016-11-241-1/+1
| | | | | | | No functionality change. Found by clang-tidy's performance-unnecessary-value-param. llvm-svn: 287894
* Introduce a helper class for building complex constant initializers. NFC.John McCall2016-11-191-44/+51
| | | | | | | I've adopted this in most of the places it makes sense, but v-tables and CGObjCMac will need a second pass. llvm-svn: 287437
* Bitcode: Change module reader functions to return an llvm::Expected.Peter Collingbourne2016-11-131-1/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D26562 llvm-svn: 286752
* Mirror the llvm changes that split Bitcode/ReaderWriter.hTeresa Johnson2016-11-111-1/+1
| | | | | | | | | | The change in D26502 splits ReaderWriter.h, which contains the APIs into both the BitReader and BitWriter libraries, into BitcodeReader.h and BitcodeWriter.h. Change clang uses to the appropriate split header(s). llvm-svn: 286567
* Add FIXMEs for MSVC 2013 hacks in r277211. NFC.Paul Robinson2016-08-011-0/+5
| | | | llvm-svn: 277396
* Fix VS2013 build of CGOpenMPRuntime.cppHans Wennborg2016-07-301-3/+7
| | | | | | | | It seems the compiler was getting confused by the in-class initializers in local struct MapInfo, so moving those to a default constructor instead. llvm-svn: 277256
* Fix CGOpenMPRuntime.cpp for VS2013. NFC.Paul Robinson2016-07-291-10/+11
| | | | | | I don't know why these changes work but they do. llvm-svn: 277211
* [OpenMP] Change name of variable in mappble expression.Samuel Antao2016-07-281-7/+7
| | | | | | This attempts to fix a failure in Windows bots pottentially related with a reserved keyword. llvm-svn: 276988
* [OpenMP] Do not use default argument in lambda from mappable expressions ↵Samuel Antao2016-07-281-4/+7
| | | | | | | | handlers. Windows bots were complaining about that. llvm-svn: 276981
* [OpenMP] Code generation for the is_device_ptr clauseSamuel Antao2016-07-281-4/+40
| | | | | | | | | | | | Summary: This patch adds support for the is_device_ptr clause. It expands SEMA to use the mappable expression logic that can only be tested with code generation in place and check conflicts with other data sharing related clauses using the mappable expressions infrastructure. Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev Subscribers: caomhin, cfe-commits Differential Revision: https://reviews.llvm.org/D22788 llvm-svn: 276978
* [OpenMP] Codegen for use_device_ptr clause.Samuel Antao2016-07-281-131/+247
| | | | | | | | | | | | Summary: This patch adds support for the use_device_ptr clause. It includes changes in SEMA that could not be tested without codegen, namely, the use of the first private logic and mappable expressions support. Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev Subscribers: caomhin, cfe-commits Differential Revision: https://reviews.llvm.org/D22691 llvm-svn: 276977
* [OpenMP] Add support to map member expressions with references to pointers.Samuel Antao2016-07-271-3/+23
| | | | | | | | | | | | Summary: This patch add support to map pointers through references in class members. Although a reference does not have storage that a user can access, it still has to be mapped in order to get the deep copy right and the dereferencing code work properly. Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev Subscribers: caomhin, cfe-commits Differential Revision: https://reviews.llvm.org/D22787 llvm-svn: 276934
* [OpenMP] Add support for mapping array sections through pointer references.Samuel Antao2016-07-271-8/+6
| | | | | | | | | | | | | | | Summary: This patch fixes a bug in the map of array sections whose base is a reference to a pointer. The existing mapping support was not prepared to deal with it, causing the compiler to crash. Mapping a reference to a pointer enjoys the same characteristics of a regular pointer, i.e., it is passed by value. Therefore, the reference has to be materialized in the target region. Reviewers: hfinkel, carlo.bertolli, kkwli0, ABataev Subscribers: caomhin, cfe-commits Differential Revision: https://reviews.llvm.org/D22690 llvm-svn: 276933
* Use more ArrayRefsDavid Majnemer2016-06-241-1/+1
| | | | | | No functional change is intended, just a small refactoring. llvm-svn: 273647
* Re-apply r272900 - [OpenMP] Cast captures by copy when passed to fork call ↵Samuel Antao2016-06-161-26/+5
| | | | | | | | so that they are compatible to what the runtime library expects. An issue in one of the regression tests was fixed for 32-bit hosts. llvm-svn: 272931
* Revert r272900 - [OpenMP] Cast captures by copy when passed to fork call so ↵Samuel Antao2016-06-161-5/+26
| | | | | | | | that they are compatible to what the runtime library expects. Was causing trouble in one of the regression tests for a 32-bit address space. llvm-svn: 272908
* [OpenMP] Cast captures by copy when passed to fork call so that they are ↵Samuel Antao2016-06-161-26/+5
| | | | | | | | | | | | | | | | | compatible to what the runtime library expects. Summary: This patch fixes an issue detected when firstprivate variables are passed to an OpenMP outlined function vararg list. Currently they are not compatible with what the runtime library expects causing malfunction in some targets. This patch fixes the issue by moving the casting logic already in place for offloading to the common code that creates the outline function and arguments and updates the regression tests accordingly. Reviewers: hfinkel, arpith-jacob, carlo.bertolli, kkwli0, ABataev Subscribers: cfe-commits, caomhin Differential Revision: http://reviews.llvm.org/D21150 llvm-svn: 272900
* Update clang for D20348Peter Collingbourne2016-06-141-5/+5
| | | | | | Differential Revision: http://reviews.llvm.org/D20339 llvm-svn: 272710
* Remove a few gendered pronouns.Nico Weber2016-06-101-1/+1
| | | | llvm-svn: 272415
* [OPENMP 4.5] Additional codegen for statically scheduled loops withAlexey Bataev2016-05-301-6/+22
| | | | | | | | 'simd' modifier. Runtime library defines new schedule constant kmp_sch_static_balanced_chunked = 45 for static loop-based directives static with chunk adjustment (e.g., simd). Added codegen for this kind of schedule. llvm-svn: 271204
* [OPENMP 4.5] Fixed codegen for 'priority' and destructors in task-basedAlexey Bataev2016-05-301-15/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | directives. 'kmp_task_t' record type added a new field for 'priority' clause and changed the representation of pointer to destructors for privates used within loop-based directives. Old representation: typedef struct kmp_task { /* GEH: Shouldn't this be aligned somehow? */ void *shareds; /**< pointer to block of pointers to shared vars */ kmp_routine_entry_t routine; /**< pointer to routine to call for executing task */ kmp_int32 part_id; /**< part id for the task */ kmp_routine_entry_t destructors; /* pointer to function to invoke deconstructors of firstprivate C++ objects */ /* private vars */ } kmp_task_t; New representation: typedef struct kmp_task { /* GEH: Shouldn't this be aligned somehow? */ void *shareds; /**< pointer to block of pointers to shared vars */ kmp_routine_entry_t routine; /**< pointer to routine to call for executing task */ kmp_int32 part_id; /**< part id for the task */ kmp_cmplrdata_t data1; /* Two known optional additions: destructors and priority */ kmp_cmplrdata_t data2; /* Process destructors first, priority second */ /* future data */ /* private vars */ } kmp_task_t; Also excessive initialization of 'destructors' fields to 'null' was removed from codegen if it is known that no destructors shal be used. Currently a special bit is used in 'kmp_tasking_flags_t' bitfields ('destructors_thunk' bitfield). llvm-svn: 271201
* [OpenMP] Codegen for target update directive.Samuel Antao2016-05-261-16/+62
| | | | | | | | | | | | Summary: This patch implements the code generation for the `target update` directive. The implemntation relies on the logic already in place for target data standalone directives, i.e. target enter/exit data. Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev Subscribers: caomhin, cfe-commits Differential Revision: http://reviews.llvm.org/D20650 llvm-svn: 270886
* [OpenMP] Add support for the 'private pointer' flag to signal variables ↵Samuel Antao2016-05-261-71/+117
| | | | | | | | | | | | | | captured in target regions and used in first-private clauses. Summary: If a variable is implicitly mapped (doesn't show in a map clause), the runtime library has to be informed if the corresponding capture shows up in first-private clause, so that the storage previously allocated in the device is used. This patch adds the support for that. Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev Subscribers: caomhin, cfe-commits Differential Revision: http://reviews.llvm.org/D20112 llvm-svn: 270870
OpenPOWER on IntegriCloud