summaryrefslogtreecommitdiffstats
path: root/clang/test/OpenMP
Commit message (Collapse)AuthorAgeFilesLines
...
* [OPENMP][NVPTX]Allow to use shared memory for theAlexey Bataev2018-11-097-42/+43
| | | | | | | | | | target|teams|distribute variables. If the total size of the variables, declared in target|teams|distribute regions, is less than the maximal size of shared memory available, the buffer is allocated in the shared memory. llvm-svn: 346507
* [OPENMP]Make lambda mapping follow reqs for PTR_AND_OBJ mapping.Alexey Bataev2018-11-081-0/+15
| | | | | | | | The base pointer for the lambda mapping must point to the lambda capture placement and pointer must point to the captured variable itself. Patch fixes this problem. llvm-svn: 346408
* [OPENMP]Fix handling of the globals during compilation for the device.Alexey Bataev2018-11-073-5/+42
| | | | | | | | Fixed lookup for the target regions in unused virtual functions + fixed processing of the global variables not marked as declare target but emitted during debug info emission. llvm-svn: 346343
* [OPENMP][NVPTX]Use __kmpc_data_sharing_coalesced_push_stack function.Alexey Bataev2018-11-022-5/+5
| | | | | | | | Coalesced memory access requires use of the new function `__kmpc_data_sharing_coalesced_push_stack` instead of the `__kmpc_data_sharing_push_stack`. llvm-svn: 345991
* [OPENMP]Change the mapping type for lambda captures.Alexey Bataev2018-11-021-4/+4
| | | | | | The previously used combination `PTR_AND_OBJ | PRIVATE` could be used for mapping of some data in Fortran. Changed it to `PTR_AND_OBJ | LITERAL`. llvm-svn: 345982
* [OPENMP][NVPTX]Improve emission of the globalized variables forAlexey Bataev2018-11-027-14/+99
| | | | | | | | | | | | | | | | | | | target/teams/distribute regions. Target/teams/distribute regions exist for all the time the kernel is executed. Thus, if the variable is declared in their context and then escape it, we can allocate global memory statically instead of allocating it dynamically. Patch captures all the globalized variables in target/teams/distribute contexts, merges them into the records, one per each target region. Those records are then joined into the union, one per compilation unit (to save the global memory). Those units are organized into 2 x dimensional arrays, where the first dimension is the number of blocks per SM and the second one is the number of SMs. Runtime functions manage this global memory space between the executing teams. llvm-svn: 345978
* Add support for 'atomic_default_mem_order' clause on 'requires' directive. ↵Patrick Lyster2018-11-024-1/+53
| | | | | | Also renamed test files relating to 'requires'. Differntial review: https://reviews.llvm.org/D53513 llvm-svn: 345967
* [OPENMP] Support for mapping of the lambdas in target regions.Alexey Bataev2018-10-301-0/+132
| | | | | | | | | | Added support for mapping of lambdas in the target regions. It scans all the captures by reference in the lambda, implicitly maps those variables in the target region and then later reinstate the addresses of references in lambda to the correct addresses of the captured|privatized variables. llvm-svn: 345609
* [OPENMP]Fix PR39372: Does not complain about loop bound variable notAlexey Bataev2018-10-2950-348/+354
| | | | | | | | | | | being shared. According to the standard, the variables with unspecified data-sharing attributes in presence of `default(none)` clause must be reported to users. Compiler did not generate error reports for the variables used in other OpenMP regions. Patch fixes this. llvm-svn: 345533
* [OpenMP] Fix condition.Gheorghe-Teodor Bercea2018-10-293-7/+7
| | | | | | | | | | | | | | Summary: Iteration variable must be strictly less than the number of iterations. This fixes a bug introduced by previous patch D53448. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D53827 llvm-svn: 345527
* [OpenMP][NVPTX] Use single loops when generating code for distribute ↵Gheorghe-Teodor Bercea2018-10-293-177/+341
| | | | | | | | | | | | | | | | parallel for Summary: This patch adds a new code generation path for bound sharing directives containing distribute parallel for. The new code generation scheme applies to chunked schedules on distribute and parallel for directives. The scheme simplifies the code that is being generated by eliminating the need for an outer for loop over chunks for both distribute and parallel for directives. In the case of distribute it applies to any sized chunk while in the parallel for case it only applies when chunk size is 1. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D53448 llvm-svn: 345509
* [OpenMP][NVPTX] Enable default scheduling for parallel for in non-SPMD cases.Gheorghe-Teodor Bercea2018-10-291-2/+24
| | | | | | | | | | | | | | Summary: This patch enables the choosing of the default schedule for parallel for loops even in non-SPMD cases. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D53443 llvm-svn: 345507
* [OPENMP] Do not capture private loop counters.Alexey Bataev2018-10-296-9/+15
| | | | | | | | If the loop counter is not declared in the context of the loop and it is private, such loop counters should not be captured in the outlined regions. llvm-svn: 345505
* [NFC][OpenMP] Add new test for parallel for code generation.Gheorghe-Teodor Bercea2018-10-261-0/+101
| | | | | | | | | | | | | | | | Summary: This is a simple test of the parallel for code generation. It will be used to showcase the change introduced by patch D53443. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D53772 llvm-svn: 345417
* [OPENMP]Fix PR39422: variables are not firstprivatized in task context.Alexey Bataev2018-10-252-1/+2
| | | | | | | | | | | According to the OpenMP standard, In a task generating construct, if no default clause is present, a variable for which the data-sharing attribute is not determined by the rules above is firstprivatized. Compiler tries to implement this, but if the variable is not directly used in the task context, this variable may not be firstprivatized. Patch fixes this problem. llvm-svn: 345277
* Do not always request an implicit taskgroup region inside the kmpc_taskloop ↵Alexey Bataev2018-10-2410-34/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | function Summary: For the following code: ``` int i; #pragma omp taskloop for (i = 0; i < 100; ++i) {} #pragma omp taskloop nogroup for (i = 0; i < 100; ++i) {} ``` Clang emits the following LLVM IR: ``` ... call void @__kmpc_taskgroup(%struct.ident_t* @0, i32 %0) %2 = call i8* @__kmpc_omp_task_alloc(%struct.ident_t* @0, i32 %0, i32 1, i64 80, i64 8, i32 (i32, i8*)* bitcast (i32 (i32, %struct.kmp_task_t_with_privates*)* @.omp_task_entry. to i32 (i32, i8*)*)) ... call void @__kmpc_taskloop(%struct.ident_t* @0, i32 %0, i8* %2, i32 1, i64* %8, i64* %9, i64 %13, i32 0, i32 0, i64 0, i8* null) call void @__kmpc_end_taskgroup(%struct.ident_t* @0, i32 %0) ... %15 = call i8* @__kmpc_omp_task_alloc(%struct.ident_t* @0, i32 %0, i32 1, i64 80, i64 8, i32 (i32, i8*)* bitcast (i32 (i32, %struct.kmp_task_t_with_privates.1*)* @.omp_task_entry..2 to i32 (i32, i8*)*)) ... call void @__kmpc_taskloop(%struct.ident_t* @0, i32 %0, i8* %15, i32 1, i64* %21, i64* %22, i64 %26, i32 0, i32 0, i64 0, i8* null) ``` The first set of instructions corresponds to the first taskloop construct. It is important to note that the implicit taskgroup region associated with the taskloop construct has been materialized in our IR: the `__kmpc_taskloop` occurs inside a taskgroup region. Note also that this taskgroup region does not exist in our second taskloop because we are using the `nogroup` clause. The issue here is the 4th argument of the kmpc_taskloop call, starting from the end, is always a zero. Checking the LLVM OpenMP RT implementation, we see that this argument corresponds to the nogroup parameter: ``` void __kmpc_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val, kmp_uint64 *lb, kmp_uint64 *ub, kmp_int64 st, int nogroup, int sched, kmp_uint64 grainsize, void *task_dup); ``` So basically we always tell to the RT to do another taskgroup region. For the first taskloop, this means that we create two taskgroup regions. For the second example, it means that despite the fact we had a nogroup clause we are going to have a taskgroup region, so we unnecessary wait until all descendant tasks have been executed. Reviewers: ABataev Reviewed By: ABataev Subscribers: rogfer01, cfe-commits Differential Revision: https://reviews.llvm.org/D53636 llvm-svn: 345180
* [OPENMP]Fix PR39366: do not try to private field if it is not captured.Alexey Bataev2018-10-241-0/+13
| | | | | | | | | The compiler is crashing if we trying to post-capture the fields implicitly captured inside of the task constructs. Seems, this kind of processing is not supported and such fields should not be firstprivatized. llvm-svn: 345177
* Revert "[CodeGenCXX] Treat 'this' as noalias in constructors"Sean Fertile2018-10-1551-311/+311
| | | | | | | This reverts commit https://reviews.llvm.org/rL344150 which causes MachineOutliner related failures on the ppc64le multistage buildbot. llvm-svn: 344526
* [OPENMP][NVPTX]Reduce memory usage in target region.Alexey Bataev2018-10-123-34/+13
| | | | | | | Additional reduction of the global memory usage in the target regions without parallel regions. llvm-svn: 344413
* [OPENMP][NVPTX]Reduce memory usage in orphaned functions.Alexey Bataev2018-10-121-3/+9
| | | | | | | | | | | | if the function has globalized variables and called in context of target/teams/distribute regions, it does not need to globalize 32 copies of the same variables for memory coalescing, it is enough to have just one copy, because there is parallel region. Patch does this by adding call for `__kmpc_parallel_level` function and checking its return value. If the code sees that the parallel level is 0, then only one variable is allocated, not 32. llvm-svn: 344356
* [OPENMP][NVPTX]Reduce memory use for globalized vars inAlexey Bataev2018-10-111-5/+2
| | | | | | | | | | | target/teams/distribute regions. Previously introduced globalization scheme that uses memory coalescing scheme may increase memory usage fr the variables that are devlared in target/teams/distribute contexts. We don't need 32 copies of such variables, just 1. Patch reduces memory use in this case. llvm-svn: 344273
* Add support for 'dynamic_allocators' clause on 'requires' directive. ↵Patrick Lyster2018-10-112-1/+9
| | | | | | Differential Revision: https://reviews.llvm.org/D53079 llvm-svn: 344249
* [CodeGenCXX] Treat 'this' as noalias in constructorsAnton Bikineev2018-10-1051-311/+311
| | | | | | | | | This is currently a clang extension and a resolution of the defect report in the C++ Standard. Differential Revision: https://reviews.llvm.org/D46441 llvm-svn: 344150
* [OPENMP][NVPTX] Support memory coalescing for globalized variables.Alexey Bataev2018-10-097-32/+66
| | | | | | | | | Added support for memory coalescing for better performance for globalized variables. From now on all the globalized variables are represented as arrays of 32 elements and each thread accesses these elements using `tid & 31` as index. llvm-svn: 344049
* [OPENMP][NVPTX] Fix emission of __kmpc_global_thread_num() for non-SPMDAlexey Bataev2018-10-051-2/+2
| | | | | | | | | mode. __kmpc_global_thread_num() should be called before initialization of the runtime. llvm-svn: 343857
* [OPENMP] Fix emission of the __kmpc_global_thread_num.Alexey Bataev2018-10-057-10/+10
| | | | | | | | | Fixed emission of the __kmpc_global_thread_num() so that it is not messed up with alloca instructions anymore. Plus, fixes emission of the __kmpc_global_thread_num() functions in the target outlined regions so that they are not called before runtime is initialized. llvm-svn: 343856
* [OPENMP] Add reverse_offload clause to requires directivePatrick Lyster2018-10-032-1/+8
| | | | llvm-svn: 343711
* [OpenMP][NVPTX] Simplify codegen for orphaned parallel, NFCI.Jonas Hahnfeld2018-10-021-8/+0
| | | | | | | | | | | Worker threads fork off to the compiler generated worker function directly after entering the kernel function. Hence, there is no need to check whether the current thread is the master if we are outside of a parallel region (neither SPMD nor parallel_level > 0). Differential Revision: https://reviews.llvm.org/D52732 llvm-svn: 343618
* [OPENMP][NVPTX] Handle `requires datasharing` flag correctly withAlexey Bataev2018-10-013-3/+3
| | | | | | | | lightweight runtime. The datasharing flag must be set to `1` when executing SPMD-mode compatible directive with reduction|lastprivate clauses. llvm-svn: 343492
* Add support for unified_shared_memory clause on requires directivePatrick Lyster2018-10-012-1/+10
| | | | llvm-svn: 343472
* [OPENMP]Fix PR39084: Check datasharing attributes of reduction variables only.Alexey Bataev2018-09-282-8/+8
| | | | | | | | According to OpenMP, the reduction item must be shared in parent region. But the item can be an array section or array subscript. In this case, we should not check for the datasharing of the base declaration. llvm-svn: 343356
* [OpenMP] Make default parallel for schedule in NVPTX target regions in SPMD ↵Gheorghe-Teodor Bercea2018-09-272-6/+6
| | | | | | | | | | | | | | | | mode achieve coalescing Summary: Set default schedule for parallel for loops to schedule(static, 1) when using SPMD mode on the NVPTX device offloading toolchain to ensure coalescing. Reviewers: ABataev, Hahnfeld, caomhin Reviewed By: ABataev Subscribers: jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D52629 llvm-svn: 343260
* [OpenMP] Make default distribute schedule for NVPTX target regions in SPMD ↵Gheorghe-Teodor Bercea2018-09-272-8/+8
| | | | | | | | | | | | | | | | mode achieve coalescing Summary: For the OpenMP NVPTX toolchain choose a default distribute schedule that ensures coalescing on the GPU when in SPMD mode. This significantly increases the performance of offloaded target code and reduces the number of registers used on the GPU side. Reviewers: ABataev, caomhin, Hahnfeld Reviewed By: ABataev, Hahnfeld Subscribers: Hahnfeld, jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D52434 llvm-svn: 343253
* [OPENMP] Add support for OMP5 requires directive + unified_address clauseKelvin Li2018-09-262-0/+52
| | | | | | | | | Add support for OMP5.0 requires directive and unified_address clause. Patches to follow will include support for additional clauses. Differential Revision: https://reviews.llvm.org/D52359 llvm-svn: 343063
* [OPENMP] Disable emission of the class with vptr if they are not used inAlexey Bataev2018-09-211-0/+10
| | | | | | | | | target constructs. Prevent compilation of the classes with the virtual tables when compiling for the device. llvm-svn: 342741
* [OPENMP][NVPTX] Enable support for lastprivates in SPMD constructs.Alexey Bataev2018-09-214-32/+30
| | | | | | | Previously we could not use lastprivates in SPMD constructs, patch allows supporting lastprivates in SPMD with uninitialized runtime. llvm-svn: 342738
* [OPENMP] Add support for mapping memory pointed by member pointer.Alexey Bataev2018-09-202-2/+157
| | | | | | Added support for map(s, s.ptr[0:1]) kind of mapping. llvm-svn: 342648
* [OPENMP] Fix PR38903: Crash on instantiation of the non-dependentAlexey Bataev2018-09-131-3/+19
| | | | | | | | | | | declare reduction. If the declare reduction construct with the non-dependent type is defined in the template construct, the compiler might crash on the template instantition. Reworked the whole instantiation scheme for the declare reduction constructs to fix this problem correctly. llvm-svn: 342151
* [OPENMP] Fix PR38902: support ADL for declare reduction constructs.Alexey Bataev2018-09-121-0/+14
| | | | | | | Added support for argument-dependent lookup when trying to find the required declare reduction decl. llvm-svn: 342062
* [OPENMP] Simplified checks for declarations in declare target regions.Alexey Bataev2018-09-111-2/+2
| | | | | | | | Sema analysis should not mark functions as an implicit declare target, it may break codegen. Simplified semantic analysis and removed extra code for implicit declare target functions. llvm-svn: 341939
* [OpenMP] Add support for nested 'declare target' directivesKelvin Li2018-09-103-8/+51
| | | | | | | | | | | Add the capability to nest multiple declare target directives - including header files within a declare target region. Differential Revision: https://reviews.llvm.org/D51378 Patch by Patrick Lyster llvm-svn: 341766
* Revert "[OPENMP][NVPTX] Disable runtime-type info for CUDA devices."Alexey Bataev2018-09-071-68/+0
| | | | | | Still need the RTTI for NVPTX target to pass sema checks. llvm-svn: 341668
* [OPENMP] Fix PR38823: Do not emit vtable if it is not used in targetAlexey Bataev2018-09-061-2/+19
| | | | | | | | | | | context. If the explicit template instantiation definition defined outside of the target context, its vtable should not be marked as used. This is true for other situations where the compiler want to emit vtables unconditionally. llvm-svn: 341570
* [OPENMP][NVPTX] Disable runtime-type info for CUDA devices.Alexey Bataev2018-09-051-0/+68
| | | | | | RTTI is not supported by the NVPTX target. llvm-svn: 341483
* [OPENMP] Fix PR38710: static functions are not emitted as implicitlyAlexey Bataev2018-08-301-1/+1
| | | | | | | | | 'declare target'. All the functions, referenced in implicit|explicit target regions must be emitted during code emission for the device. llvm-svn: 341093
* [OPENMP][NVPTX] Add options -f[no-]openmp-cuda-force-full-runtime.Alexey Bataev2018-08-301-0/+328
| | | | | | | Added options -f[no-]openmp-cuda-force-full-runtime to [not] force use of the full runtime for OpenMP offloading to CUDA devices. llvm-svn: 341073
* [OPENMP] Do not create offloading entry for declare target variablesAlexey Bataev2018-08-291-0/+4
| | | | | | | | | declarations. We should not create offloading entries for declare target var declarations as it causes compiler crash. llvm-svn: 340968
* [OPENMP][NVPTX] Add support for lightweight runtime.Alexey Bataev2018-08-299-53/+399
| | | | | | | | If the target construct can be executed in SPMD mode + it is a loop based directive with static scheduling, we can use lightweight runtime support. llvm-svn: 340953
* [OPENMP] Create non-const ident_t objects.Mike Rice2018-08-2943-55/+55
| | | | | | | | | | | Currently ident_t objects are created const when debug info is not enabled, but the libittnotify libray in the OpenMP runtime writes to the reserved_2 field (See __kmp_itt_region_forking in openmp/runtime/src/kmp_itt.inl). Now create ident_t objects non-const. Differential Revision: https://reviews.llvm.org/D51331 llvm-svn: 340934
* [OPENMP] Fix crash on the emission of the weak function declaration.Alexey Bataev2018-08-201-0/+14
| | | | | | | | | If the function is actually a weak reference, it should not be marked as deferred definition as this is only a declaration. Patch adds checks for the definitions if they must be emitted. Otherwise, only declaration is emitted. llvm-svn: 340191
OpenPOWER on IntegriCloud