summaryrefslogtreecommitdiffstats
path: root/clang/lib/CodeGen/CGStmtOpenMP.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* Add OpenMP dist_schedule clause to distribute directive and related ↵Carlo Bertolli2016-01-151-0/+1
| | | | | | regression tests. llvm-svn: 257917
* [OpenMP] Reapply rL256842: [OpenMP] Offloading descriptor registration and ↵Samuel Antao2016-01-061-9/+30
| | | | | | | | | | | | device codegen. This patch attempts to fix the regressions identified when the patch was committed initially. Thanks to Michael Liao for identifying the fix in the offloading metadata generation related with side effects in evaluation of function arguments. llvm-svn: 256933
* [OpenMP] Revert rL256842: [OpenMP] Offloading descriptor registration and ↵Samuel Antao2016-01-051-30/+9
| | | | | | | | device codegen. It was causing two regression, so I'm reverting until the cause is found. llvm-svn: 256858
* [OpenMP] Offloading descriptor registration and device codegen.Samuel Antao2016-01-051-9/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: In order to offloading work properly two things need to be in place: - a descriptor with all the offloading information (device entry functions, and global variable) has to be created by the host and registered in the OpenMP offloading runtime library. - all the device functions need to be emitted for the device and a convention has to be in place so that the runtime library can easily map the host ID of an entry point with the actual function in the device. This patch adds support for these two things. However, only entry functions are being registered given that 'declare target' directive is not yet implemented. About offloading descriptor: The details of the descriptor are explained with more detail in http://goo.gl/L1rnKJ. Basically the descriptor will have fields that specify the number of devices, the pointers to where the device images begin and end (that will be defined by the linker), and also pointers to a the begin and end of table whose entries contain information about a specific entry point. Each entry has the type: ``` struct __tgt_offload_entry{ void *addr; char *name; int64_t size; }; ``` and will be implemented in a pre determined (ELF) section `.omp_offloading.entries` with 1-byte alignment, so that when all the objects are linked, the table is in that section with no padding in between entries (will be like a C array). The code generation ensures that all `__tgt_offload_entry` entries are emitted in the same order for both host and device so that the runtime can have the corresponding entries in both host and device in same index of the table, and efficiently implement the mapping. The resulting descriptor is registered/unregistered with the runtime library using the calls `__tgt_register_lib` and `__tgt_unregister_lib`. The registration is implemented in a high priority global initializer so that the registration happens always before any initializer (that can potentially include target regions) is run. The driver flag -omptargets= was created to specify a comma separated list of devices the user wants to support so that the new functionality can be exercised. Each device is specified with its triple. About target codegen: The target codegen is pretty much straightforward as it reuses completely the logic of the host version for the same target region. The tricky part is to identify the meaningful target regions in the device side. Unlike other programming models, like CUDA, there are no already outlined functions with attributes that mark what should be emitted or not. So, the information on what to emit is passed in the form of metadata in host bc file. This requires a new option to pass the host bc to the device frontend. Then everything is similar to what happens in CUDA: the global declarations emission is intercepted to check to see if it is an "interesting" declaration. The difference is that instead of checking an attribute, the metadata information in checked. Right now, there is only a form of metadata to pass information about the device entry points (target regions). A class `OffloadEntriesInfoManagerTy` was created to manage all the information and queries related with the metadata. The metadata looks like this: ``` !omp_offload.info = !{!0, !1, !2, !3, !4, !5, !6} !0 = !{i32 0, i32 52, i32 77426347, !"_ZN2S12r1Ei", i32 479, i32 13, i32 4} !1 = !{i32 0, i32 52, i32 77426347, !"_ZL7fstatici", i32 461, i32 11, i32 5} !2 = !{i32 0, i32 52, i32 77426347, !"_Z9ftemplateIiET_i", i32 444, i32 11, i32 6} !3 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 99, i32 11, i32 0} !4 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 272, i32 11, i32 3} !5 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 127, i32 11, i32 1} !6 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 159, i32 11, i32 2} ``` The fields in each metadata entry are (in sequence): Entry 1) an ID of the type of metadata - right now only zero is used meaning "OpenMP target region". Entry 2) a unique ID of the device where the input source file that contain the target region lives. Entry 3) a unique ID of the file where the input source file that contain the target region lives. Entry 4) a mangled name of the function that encloses the target region. Entries 5) and 6) line and column number where the target region was found. Entry 7) is the order the entry was emitted. Entry 2) and 3) are required to distinguish files that have the same function name. Entry 4) is required to distinguish different instances of the same declaration (usually templated ones) Entries 5) and 6) are required to distinguish the particular target region in body of the function (it is possible that a given target region is not an entry point - if clause can evaluate always to zero - and therefore we need to identify the "interesting" target regions. ) This patch replaces http://reviews.llvm.org/D12306. Reviewers: ABataev, hfinkel, tra, rjmccall, sfantao Subscribers: FBrygidyn, piotr.rak, Hahnfeld, cfe-commits Differential Revision: http://reviews.llvm.org/D12614 llvm-svn: 256842
* [OPENMP 4.5] Codegen for 'schedule' clause with monotonic/nonmonotonic ↵Alexey Bataev2015-12-311-26/+51
| | | | | | | | modifiers. OpenMP 4.5 adds support for monotonic/nonmonotonic modifiers in 'schedule' clause. Add codegen for these modifiers. llvm-svn: 256666
* [OPENMP] Remove explicit call for implicit barrierAlexey Bataev2015-12-181-20/+0
| | | | | | | #pragma omp parallel needs an implicit barrier that is currently done by an explicit call to __kmpc_barrier. However, the runtime already ensures a barrier in __kmpc_fork_call which currently leads to two barriers per region per thread. Differential Revision: http://reviews.llvm.org/D15561 llvm-svn: 255992
* [OPENMP] Fix for http://llvm.org/PR25878: Error compiling an OpenMP programAlexey Bataev2015-12-181-6/+32
| | | | | | OpenMP codegen tried to emit the code for its constructs even if it was detected as a dead-code. Added checks to ensure that the code is emitted if the code is not dead. llvm-svn: 255990
* [OPENMP 4.5] Codegen for 'hint' clause of 'critical' directiveAlexey Bataev2015-12-151-2/+6
| | | | | | OpenMP 4.5 defines 'hint' clause for 'critical' directive. Patch adds codegen for this clause. llvm-svn: 255639
* [OPENMP 4.5] Parsing/sema for 'hint' clause of 'critical' directive.Alexey Bataev2015-12-151-0/+1
| | | | | | OpenMP 4.5 adds 'hint' clause to critical directive. Patch adds parsing/semantic analysis for this clause. llvm-svn: 255625
* Add parse and sema of OpenMP distribute directive with all clauses except ↵Carlo Bertolli2015-12-141-0/+5
| | | | | | dist_schedule llvm-svn: 255498
* [OPENMP] Fix debug info for 'atomic' construct.Alexey Bataev2015-12-141-1/+2
| | | | | | Debug info for statement under 'atomic' construct must point exactly to that statement, not the directive itself. llvm-svn: 255487
* Revert r255001, "Add parse and sema for OpenMP distribute directive and all ↵NAKAMURA Takumi2015-12-091-5/+0
| | | | | | | | its clauses excluding dist_schedule." It causes memory leak. Some tests in test/OpenMP would fail. llvm-svn: 255094
* [OPENMP 4.5] Parsing/sema for 'num_tasks' clause.Alexey Bataev2015-12-081-0/+1
| | | | | | OpenMP 4.5 adds directives 'taskloop' and 'taskloop simd'. These directives support clause 'num_tasks'. Patch adds parsing/semantic analysis for this clause. llvm-svn: 255008
* Add parse and sema for OpenMP distribute directive and all its clauses ↵Carlo Bertolli2015-12-081-0/+5
| | | | | | excluding dist_schedule. llvm-svn: 255001
* [OPENMP 4.5] parsing/sema support for 'grainsize' clause.Alexey Bataev2015-12-071-0/+1
| | | | | | OpenMP 4.5 adds 'taksloop' and 'taskloop simd' directives, which have 'grainsize' clause. Patch adds parsing/sema analysis of this clause. llvm-svn: 254903
* [OPENMP 4.5] parsing/sema support for 'nogroup' clause.Alexey Bataev2015-12-071-0/+1
| | | | | | OpenMP 4.5 adds 'taskloop' and 'taskloop simd' directives. These directives have new 'nogroup' clause. Patch adds basic parsing/sema support for this clause. llvm-svn: 254899
* [PGO] Instrument only base constructors and destructors.Serge Pavlov2015-12-061-1/+1
| | | | | | | | | | | | | | | | Constructors and destructors may be represented by several functions in IR. Only base structors correspond to source code, others are small pieces of code and eventually call the base variant. In this case instrumentation of non-base structors has little sense, this fix remove it. Now profile data of a declaration corresponds to exactly one function in IR, it agrees with the current logic of the profile data loading. This change fixes PR24996. Differential Revision: http://reviews.llvm.org/D15158 llvm-svn: 254876
* [OPENMP 4.5] Parsing/sema support for 'omp taskloop simd' directive.Alexey Bataev2015-12-031-0/+9
| | | | | | OpenMP 4.5 adds directive 'taskloop simd'. Patch adds parsing/sema analysis for 'taskloop simd' directive and its clauses. llvm-svn: 254597
* [OpenMP] Update target directive codegen to use 4.5 implicit data mappings.Samuel Antao2015-12-021-23/+67
| | | | | | | | | | | | | | | Summary: This patch implements the 4.5 specification for the implicit data maps. OpenMP 4.5 specification changes the default way data is captured into a target region. All the non-aggregate kinds are passed by value by default. This required activating the capturing by value during SEMA for the target region. All the non-aggregate values that can be encoded in the size of a pointer are properly casted and forwarded to the runtime library. On top of fixing the previous weird behavior for mapping pointers in nested data regions (an explicit map was always required), this also improves performance as the number of allocations/transactions to the device per non-aggregate map are reduced from two to only one - instead of passing a reference and the value, only the value passed. Explicit maps will be added later on once firstprivate, private, and map clauses' SEMA and parsing are available. Reviewers: hfinkel, rjmccall, ABataev Subscribers: cfe-commits, carlo.bertolli Differential Revision: http://reviews.llvm.org/D14940 llvm-svn: 254521
* [OPENMP 4.5] Parsing/sema analysis for 'priority' clause.Alexey Bataev2015-12-011-0/+1
| | | | | | OpenMP 4.5 defines new clause 'priority' for 'task', 'taskloop' and 'taskloop simd' directives. Added parsing and sema analysis for 'priority' clause in 'task' and 'taskloop' directives. llvm-svn: 254398
* [OPENMP 4.5] Parsing/sema analysis for 'taskloop' directive.Alexey Bataev2015-12-011-1/+9
| | | | | | Adds initial parsing and semantic analysis for 'taskloop' directive. llvm-svn: 254367
* [OpenMP] Parsing and sema support for thread_limit clause.Kelvin Li2015-11-271-0/+1
| | | | | | http://reviews.llvm.org/D15029 llvm-svn: 254207
* [OpenMP] Parsing and sema support for num_teams clauseKelvin Li2015-11-241-0/+1
| | | | | | http://reviews.llvm.org/D14802 llvm-svn: 254019
* [OpenMP] Parsing and sema support for map clauseKelvin Li2015-11-231-0/+1
| | | | | | http://reviews.llvm.org/D14134 llvm-svn: 253849
* CGStmtOpenMP.cpp: Prune redundant \param. [-Wdocumentation]NAKAMURA Takumi2015-10-081-1/+0
| | | | llvm-svn: 249698
* [OPENMP 4.1] Codegen for array sections/subscripts in 'reduction' clause.Alexey Bataev2015-10-081-22/+207
| | | | | | OpenMP 4.1 adds support for array sections/subscripts in 'reduction' clause. Patch adds codegen for this feature. llvm-svn: 249672
* [OpenMP] Target directive host codegen.Samuel Antao2015-10-021-5/+52
| | | | | | | | | | | This patch implements the outlining for offloading functions for code annotated with the OpenMP target directive. It uses a temporary naming of the outlined functions that will have to be updated later on once target side codegen and registration of offloading libraries is implemented - the naming needs to be made unique in the produced library. llvm-svn: 249148
* [OPENMP 4.1] Codegen for ‘simd’ clause in ‘ordered’ directive.Alexey Bataev2015-09-291-3/+23
| | | | | | | | | | | Description. If the simd clause is specified, the ordered regions encountered by any thread will use only a single SIMD lane to execute the ordered regions in the order of the loop iterations. Restrictions. An ordered construct with the simd clause is the only OpenMP construct that can appear in the simd region. An ordered directive with ‘simd’ clause is generated as an outlined function and corresponding function call to prevent this part of code from vectorization later in backend. llvm-svn: 248772
* [OPENMP 4.1] Add 'simd' clause for 'ordered' directive.Alexey Bataev2015-09-281-0/+1
| | | | | | | | | | | Parsing and sema analysis for 'simd' clause in 'ordered' directive. Description If the simd clause is specified, the ordered regions encountered by any thread will use only a single SIMD lane to execute the ordered regions in the order of the loop iterations. Restrictions An ordered construct with the simd clause is the only OpenMP construct that can appear in the simd region llvm-svn: 248696
* [OPENMP 4.1] Add 'threads' clause for '#pragma omp ordered'.Alexey Bataev2015-09-251-0/+1
| | | | | | | | OpenMP 4.1 extends format of '#pragma omp ordered'. It adds 3 additional clauses: 'threads', 'simd' and 'depend'. If no clause is specified, the ordered construct behaves as if the threads clause had been specified. If the threads clause is specified, the threads in the team executing the loop region execute ordered regions sequentially in the order of the loop iterations. The loop region to which an ordered region without any clause or with a threads clause binds must have an ordered clause without the parameter specified on the corresponding loop directive. llvm-svn: 248569
* [OPENMP 4.0] Add 'if' clause for 'cancel' directive.Alexey Bataev2015-09-181-1/+9
| | | | | | Add parsing, sema analysis and codegen for 'if' clause in 'cancel' directive. llvm-svn: 247976
* [OPENMP] Emit __kmpc_cancel_barrier() and code for 'cancellation point' only ↵Alexey Bataev2015-09-151-22/+38
| | | | | | | | | | | | | | if 'cancel' is found. Patch improves codegen for OpenMP constructs. If the OpenMP region does not have internal 'cancel' construct, a call to 'void __kmpc_barrier()' runtime function is generated for all implicit/explicit barriers. If the region has inner 'cancel' directive, then ``` if (__kmpc_cancel_barrier()) exit from outer construct; ``` code is generated. Also, the code for 'canellation point' directive is not generated if parent directive does not have 'cancel' directive. llvm-svn: 247681
* [OPENMP] Preserve alignment of the original variables for the captured ↵Alexey Bataev2015-09-111-1/+2
| | | | | | | | references. Patch makes codegen to preserve alignment of the shared variables captured and used in OpenMP regions. llvm-svn: 247401
* [OPENMP] Outlined function for parallel and other regions with list of ↵Alexey Bataev2015-09-101-2/+113
| | | | | | | | | captured variables. Currently all variables used in OpenMP regions are captured into a record and passed to outlined functions in this record. It may result in some poor performance because of too complex analysis later in optimization passes. Patch makes to emit outlined functions for parallel-based regions with a list of captured variables. It reduces code for 2*n GEPs, stores and loads at least. Codegen for task-based regions remains unchanged because runtime requires that all captured variables are passed in captured record. llvm-svn: 247251
* Compute and preserve alignment more faithfully in IR-generation.John McCall2015-09-081-104/+113
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce an Address type to bundle a pointer value with an alignment. Introduce APIs on CGBuilderTy to work with Address values. Change core APIs on CGF/CGM to traffic in Address where appropriate. Require alignments to be non-zero. Update a ton of code to compute and propagate alignment information. As part of this, I've promoted CGBuiltin's EmitPointerWithAlignment helper function to CGF and made use of it in a number of places in the expression emitter. The end result is that we should now be significantly more correct when performing operations on objects that are locally known to be under-aligned. Since alignment is not reliably tracked in the type system, there are inherent limits to this, but at least we are no longer confused by standard operations like derived-to-base conversions and array-to-pointer decay. I've also fixed a large number of bugs where we were applying the complete-object alignment to a pointer instead of the non-virtual alignment, although most of these were hidden by the very conservative approach we took with member alignment. Also, because IRGen now reliably asserts on zero alignments, we should no longer be subject to an absurd but frustrating recurring bug where an incomplete type would report a zero alignment and then we'd naively do a alignmentAtOffset on it and emit code using an alignment equal to the largest power-of-two factor of the offset. We should also now be emitting much more aggressive alignment attributes in the presence of over-alignment. In particular, field access now uses alignmentAtOffset instead of min. Several times in this patch, I had to change the existing code-generation pattern in order to more effectively use the Address APIs. For the most part, this seems to be a strict improvement, like doing pointer arithmetic with GEPs instead of ptrtoint. That said, I've tried very hard to not change semantics, but it is likely that I've failed in a few places, for which I apologize. ABIArgInfo now always carries the assumed alignment of indirect and indirect byval arguments. In order to cut down on what was already a dauntingly large patch, I changed the code to never set align attributes in the IR on non-byval indirect arguments. That is, we still generate code which assumes that indirect arguments have the given alignment, but we don't express this information to the backend except where it's semantically required (i.e. on byvals). This is likely a minor regression for those targets that did provide this information, but it'll be trivial to add it back in a later patch. I partially punted on applying this work to CGBuiltin. Please do not add more uses of the CreateDefaultAligned{Load,Store} APIs; they will be going away eventually. llvm-svn: 246985
* [OPENMP] Fix for http://llvm.org/PR24674: assertion failed and and abort trapAlexey Bataev2015-09-041-0/+6
| | | | | | Fix processing of shared variables with reference types in OpenMP constructs. Previously, if the variable was not marked in one of the private clauses, the reference to this variable was emitted incorrectly and caused an assertion later. llvm-svn: 246846
* [OPENMP 4.1] Codegen for extended format of 'if' clause.Alexey Bataev2015-09-031-4/+12
| | | | | | Fixed codegen for extended format of 'if' clauses with special 'directive-name-modifier' + ast-print tests for extended format of 'if' clause. llvm-svn: 246748
* [OpenMP] Make the filetered clause iterator a real iterator and type safe.Benjamin Kramer2015-08-301-64/+42
| | | | | | | | | This replaces the filtered generic iterator with a type-specfic one based on dyn_cast instead of comparing the kind enum. This allows us to use range-based for loops and eliminates casts. No functionality change intended. llvm-svn: 246384
* [OPENMP 4.1] Add codegen for 'simdlen' clause.Alexey Bataev2015-08-211-4/+14
| | | | | | | | | Add emission of metadata for simd loops in presence of 'simdlen' clause. If 'simdlen' clause is provided without 'safelen' clause, the vectorizer width for the loop is set to value of 'simdlen' clause + all read/write ops in loop are marked with '!llvm.mem.parallel_loop_access' metadata. If 'simdlen' clause is provided along with 'safelen' clause, the vectorizer width for the loop is set to value of 'simdlen' clause + all read/write ops in loop are not marked with '!llvm.mem.parallel_loop_access' metadata. If 'safelen' clause is provided without 'simdlen' clause, the vectorizer width for the loop is set to value of 'safelen' clause + all read/write ops in loop are not marked with '!llvm.mem.parallel_loop_access' metadata. llvm-svn: 245697
* [OPENMP 4.1] Initial support for 'simdlen' clause.Alexey Bataev2015-08-211-0/+1
| | | | | | Add parsing/sema analysis for 'simdlen' clause in simd directives. Also add check that if both 'safelen' and 'simdlen' clauses are specified, the value of 'simdlen' parameter is less than the value of 'safelen' parameter. llvm-svn: 245692
* [OPENMP 4.1] Allow variables with reference types in private clauses.Alexey Bataev2015-08-181-6/+9
| | | | | | OpenMP 4.1 allows to use variables with reference types in all private clauses (private, firstprivate, lastprivate, linear etc.). Patch allows to use such variables and fixes codegen for linear variables with reference types. llvm-svn: 245268
* [OPENMP] Fix for http://llvm.org/PR24371: Assert failure compiling blender 2.75.Alexey Bataev2015-08-141-19/+1
| | | | | | | blender uses statements expression in condition of the loop under control of the '#pragma omp parallel for'. This condition is used several times in different expressions required for codegen of the loop directive. If there are some variables defined in statement expression, it fires an assert during codegen because of redefinition of the same variables. We have to rebuild several expression to be sure that all variables are unique. llvm-svn: 245041
* This patch fixes the assert in emitting captured code in the target data ↵Michael Wong2015-08-111-1/+3
| | | | | | | | | construct. This is on behalf of Kelvin Li. http://reviews.llvm.org/D11475 llvm-svn: 244569
* Propagate SourceLocations through to get a Loc on float_cast_overflowFilipe Cabecinhas2015-08-111-20/+25
| | | | | | | | | | | | | | | Summary: float_cast_overflow is the only UBSan check without a source location attached. This patch propagates SourceLocations where necessary to get them to the EmitCheck() call. Reviewers: rsmith, ABataev, rjmccall Subscribers: cfe-commits Differential Revision: http://reviews.llvm.org/D11757 llvm-svn: 244568
* This patch commits OpenMP 4 target device clausesMichael Wong2015-08-071-0/+1
| | | | | | | This is committed on behalf of Kelvin Li http://reviews.llvm.org/D11469?id=31227 llvm-svn: 244325
* [OPENMP 4.1] Allow references in init expression for loop-based constructs.Alexey Bataev2015-08-061-8/+18
| | | | | | OpenMP 4.1 allows to use variables with reference types in private clauses and, therefore, in init expressions of the cannonical loop forms. llvm-svn: 244209
* [OpenMP] Add capture for threadprivate variables used in copyin clauseSamuel Antao2015-07-271-4/+16
| | | | | | if TLS is enabled in OpenMP code generation. llvm-svn: 243277
* Commit for http://reviews.llvm.org/D10765Michael Wong2015-07-211-0/+9
| | | | | | | for OpenMP 4 target data directive parsing and sema. This commit is on behalf of Kelvin Li. llvm-svn: 242785
* Make the variable names match the name of the metadata they control.Tyler Nowicki2015-07-141-2/+2
| | | | | | Rename Vectorizer to Vectorize and VectorizeUnroll to InterleaveCount. llvm-svn: 242241
* [OPENMP 4.0] Codegen for 'omp cancel' directive.Alexey Bataev2015-07-061-1/+2
| | | | | | | | | | Add the next codegen for 'omp cancel' directive: if (__kmpc_cancel()) { __kmpc_cancel_barrier(); <exit construct>; } llvm-svn: 241429
OpenPOWER on IntegriCloud