summaryrefslogtreecommitdiffstats
path: root/clang/test/OpenMP/taskloop_simd_reduction_codegen.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [OPENMP]Emit artificial threprivate vars as threadlocal, if possible.Alexey Bataev2019-12-311-2/+2
| | | | It may improve performance for declare reduction constructs.
* IR: print value numbers for unnamed function argumentsTim Northover2019-08-031-11/+11
| | | | | | | | | | For consistency with normal instructions and clarity when reading IR, it's best to print the %0, %1, ... names of function arguments in definitions. Also modifies the parser to accept IR in that form for obvious reasons. llvm-svn: 367755
* [OPENMP][NVPTX]Mark more functions as always_inline for betterAlexey Bataev2019-05-211-3/+8
| | | | | | | | | | | performance. Internally generated functions must be marked as always_inlines in most cases. Patch marks some extra reduction function + outlined parallel functions as always_inline for better performance, but only if the optimization is requested. llvm-svn: 361269
* Do not always request an implicit taskgroup region inside the kmpc_taskloop ↵Alexey Bataev2018-10-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | function Summary: For the following code: ``` int i; #pragma omp taskloop for (i = 0; i < 100; ++i) {} #pragma omp taskloop nogroup for (i = 0; i < 100; ++i) {} ``` Clang emits the following LLVM IR: ``` ... call void @__kmpc_taskgroup(%struct.ident_t* @0, i32 %0) %2 = call i8* @__kmpc_omp_task_alloc(%struct.ident_t* @0, i32 %0, i32 1, i64 80, i64 8, i32 (i32, i8*)* bitcast (i32 (i32, %struct.kmp_task_t_with_privates*)* @.omp_task_entry. to i32 (i32, i8*)*)) ... call void @__kmpc_taskloop(%struct.ident_t* @0, i32 %0, i8* %2, i32 1, i64* %8, i64* %9, i64 %13, i32 0, i32 0, i64 0, i8* null) call void @__kmpc_end_taskgroup(%struct.ident_t* @0, i32 %0) ... %15 = call i8* @__kmpc_omp_task_alloc(%struct.ident_t* @0, i32 %0, i32 1, i64 80, i64 8, i32 (i32, i8*)* bitcast (i32 (i32, %struct.kmp_task_t_with_privates.1*)* @.omp_task_entry..2 to i32 (i32, i8*)*)) ... call void @__kmpc_taskloop(%struct.ident_t* @0, i32 %0, i8* %15, i32 1, i64* %21, i64* %22, i64 %26, i32 0, i32 0, i64 0, i8* null) ``` The first set of instructions corresponds to the first taskloop construct. It is important to note that the implicit taskgroup region associated with the taskloop construct has been materialized in our IR: the `__kmpc_taskloop` occurs inside a taskgroup region. Note also that this taskgroup region does not exist in our second taskloop because we are using the `nogroup` clause. The issue here is the 4th argument of the kmpc_taskloop call, starting from the end, is always a zero. Checking the LLVM OpenMP RT implementation, we see that this argument corresponds to the nogroup parameter: ``` void __kmpc_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val, kmp_uint64 *lb, kmp_uint64 *ub, kmp_int64 st, int nogroup, int sched, kmp_uint64 grainsize, void *task_dup); ``` So basically we always tell to the RT to do another taskgroup region. For the first taskloop, this means that we create two taskgroup regions. For the second example, it means that despite the fact we had a nogroup clause we are going to have a taskgroup region, so we unnecessary wait until all descendant tasks have been executed. Reviewers: ABataev Reviewed By: ABataev Subscribers: rogfer01, cfe-commits Differential Revision: https://reviews.llvm.org/D53636 llvm-svn: 345180
* [OPENMP] Fix emission of the __kmpc_global_thread_num.Alexey Bataev2018-10-051-1/+1
| | | | | | | | | Fixed emission of the __kmpc_global_thread_num() so that it is not messed up with alloca instructions anymore. Plus, fixes emission of the __kmpc_global_thread_num() functions in the target outlined regions so that they are not called before runtime is initialized. llvm-svn: 343856
* [OPENMP] General code improvements.Alexey Bataev2018-04-161-5/+5
| | | | llvm-svn: 330140
* Change memcpy/memove/memset to have dest and source alignment attributes ↵Daniel Neilson2018-01-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | (Step 1). Summary: Upstream LLVM is changing the the prototypes of the @llvm.memcpy/memmove/memset intrinsics. This change updates the Clang tests for this change. The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument which is required to be a constant integer. It represents the alignment of the dest (and source), and so must be the minimum of the actual alignment of the two. This change removes the alignment argument in favour of placing the alignment attribute on the source and destination pointers of the memory intrinsic call. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false) will now read call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false) At this time the source and destination alignments must be the same (Step 1). Step 2 of the change, to be landed shortly, will relax that contraint and allow the source and destination to have different alignments. llvm-svn: 322964
* [OPENMP] Fix debug info for outlined functions in NVPTX + add more tests.Alexey Bataev2018-01-081-18/+27
| | | | | | | Fixed name of emitted outlined functions in NVPTX target + extra tests for the debug info. llvm-svn: 322022
* [OPENMP] Support for -fopenmp-simd option with compilation of simd loopsAlexey Bataev2017-12-291-0/+3
| | | | | | | | | only. Added support for -fopenmp-simd option that allows compilation of simd-based constructs without emission of OpenMP runtime calls. llvm-svn: 321560
* Switch to gnu++14 as the default dialect.Tim Northover2017-12-091-1/+1
| | | | | | This is C++14 with conforming GNU extensions. llvm-svn: 320250
* [OPENMP] Fix PR35486: crash when collapsing loops with dependent iteration ↵Alexey Bataev2017-12-041-0/+1
| | | | | | | | | | spaces. Though it is incorrect from point of view of OpenMP standard to have dependent iteration space in OpenMP loops, compiler should not crash. Patch fixes this problem. llvm-svn: 319700
* [OPENMP] Pacify windows buildbots, NFC.Alexey Bataev2017-07-181-37/+1
| | | | llvm-svn: 308243
* [OPENMP] Fix reduction combiner testAlexey Bataev2017-07-171-1/+1
| | | | llvm-svn: 308183
* [OPENMP] Further fixes of the reduction codegen testsAlexey Bataev2017-07-171-70/+70
| | | | llvm-svn: 308182
* [OPENMP] Further test fixes.Alexey Bataev2017-07-171-58/+58
| | | | llvm-svn: 308178
* [OPENMP] Rework tests to pacify buildbots.Alexey Bataev2017-07-171-3/+1
| | | | llvm-svn: 308176
* [OPENMP] Codegen for reduction clauses in 'taskloop' directives.Alexey Bataev2017-07-171-0/+235
Adds codegen for taskloop-based directives. llvm-svn: 308174
OpenPOWER on IntegriCloud