path: root/clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
Commit message (Author, Age; Files, Lines)
* [OPENMP][NVPTX] Allow to use shared memory for the target|teams|distribute variables. (Alexey Bataev, 2018-11-09; 1 file, -53/+97)
  If the total size of the variables declared in target|teams|distribute regions is less than the maximal size of shared memory available, the buffer is allocated in the shared memory.
  llvm-svn: 346507
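The size check described in this commit can be sketched as follows. This is only an illustration: `BufferSpace`, `chooseBufferSpace` and the byte limit are hypothetical names and values, not taken from Clang or the device runtime.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical names; not part of Clang or the OpenMP device runtime.
enum class BufferSpace { Shared, Global };

// Pick the address space for the buffer holding all globalized
// target|teams|distribute variables of one kernel.
BufferSpace chooseBufferSpace(const std::vector<std::size_t> &varSizes,
                              std::size_t maxSharedBytes) {
  std::size_t total = 0;
  for (std::size_t size : varSizes)
    total += size;
  // The commit message says "less than", so an exact fit stays global here.
  return total < maxSharedBytes ? BufferSpace::Shared : BufferSpace::Global;
}
```

Alignment and padding between variables are ignored in this sketch.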
* [OPENMP][NVPTX] Use __kmpc_data_sharing_coalesced_push_stack function. (Alexey Bataev, 2018-11-02; 1 file, -14/+15)
  Coalesced memory access requires use of the new function `__kmpc_data_sharing_coalesced_push_stack` instead of `__kmpc_data_sharing_push_stack`.
  llvm-svn: 345991
* [OPENMP][NVPTX] Improve emission of the globalized variables for target/teams/distribute regions. (Alexey Bataev, 2018-11-02; 1 file, -8/+257)
  Target/teams/distribute regions exist for the whole time the kernel is executed. Thus, if a variable is declared in their context and then escapes it, we can allocate global memory statically instead of allocating it dynamically. The patch captures all the globalized variables in target/teams/distribute contexts and merges them into records, one per target region. Those records are then joined into a union, one per compilation unit (to save global memory). Those units are organized into two-dimensional arrays, where the first dimension is the number of blocks per SM and the second one is the number of SMs. Runtime functions manage this global memory space between the executing teams.
  llvm-svn: 345978
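A rough sketch of the layout this commit describes, under stated assumptions: the record types, their fields, and the two dimensions are invented for illustration, and the real values depend on the device and the runtime.

```cpp
#include <cstddef>

// Hypothetical records: one per target region, holding that region's
// globalized variables. Field names are made up for illustration.
struct RegionARecord { int counter; double partial[4]; };
struct RegionBRecord { char flag; };

// One union per compilation unit: the records overlap in memory, which is
// what saves space compared with storing them side by side.
union UnitRecord {
  RegionARecord a;
  RegionBRecord b;
};

// Assumed dimensions, chosen only for this sketch.
constexpr std::size_t BlocksPerSM = 16;
constexpr std::size_t NumSMs = 8;

// Statically allocated storage, indexed [block-per-SM slot][SM].
static UnitRecord GlobalRecords[BlocksPerSM][NumSMs];

constexpr std::size_t unionBytes() { return sizeof(UnitRecord); }
constexpr std::size_t separateBytes() {
  return sizeof(RegionARecord) + sizeof(RegionBRecord);
}
constexpr std::size_t totalBytes() { return sizeof(GlobalRecords); }
```

How the runtime assigns a team to a slot of the array is not shown, since the commit message does not describe it.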
* [OPENMP] Support for mapping of the lambdas in target regions. (Alexey Bataev, 2018-10-30; 1 file, -0/+53)
  Added support for mapping of lambdas in the target regions. It scans all the by-reference captures in the lambda, implicitly maps those variables in the target region, and later reinstates the addresses of the references in the lambda to the correct addresses of the captured|privatized variables.
  llvm-svn: 345609
* [OpenMP][NVPTX] Use single loops when generating code for distribute parallel for (Gheorghe-Teodor Bercea, 2018-10-29; 1 file, -3/+6)
  Summary: This patch adds a new code generation path for bound sharing directives containing distribute parallel for. The new code generation scheme applies to chunked schedules on distribute and parallel for directives. The scheme simplifies the code that is being generated by eliminating the need for an outer for loop over chunks for both distribute and parallel for directives. In the case of distribute it applies to any chunk size, while in the parallel for case it only applies when the chunk size is 1.
  Reviewers: ABataev, caomhin
  Reviewed By: ABataev
  Subscribers: jholewinski, guansong, cfe-commits
  Differential Revision: https://reviews.llvm.org/D53448
  llvm-svn: 345509
* [OpenMP][NVPTX] Enable default scheduling for parallel for in non-SPMD cases. (Gheorghe-Teodor Bercea, 2018-10-29; 1 file, -5/+6)
  Summary: This patch enables the choosing of the default schedule for parallel for loops even in non-SPMD cases.
  Reviewers: ABataev, caomhin
  Reviewed By: ABataev
  Subscribers: jholewinski, guansong, cfe-commits
  Differential Revision: https://reviews.llvm.org/D53443
  llvm-svn: 345507
* [OPENMP][NVPTX] Increment iterator only when it is used, NFC. (Alexey Bataev, 2018-10-16; 1 file, -1/+2)
  llvm-svn: 344574
* [OPENMP][NVPTX] Reduce memory usage in target region. (Alexey Bataev, 2018-10-12; 1 file, -12/+17)
  Additional reduction of the global memory usage in the target regions without parallel regions.
  llvm-svn: 344413
* [OPENMP][NVPTX] Reduce memory usage in orphaned functions. (Alexey Bataev, 2018-10-12; 1 file, -8/+71)
  If the function has globalized variables and is called in the context of target/teams/distribute regions, it does not need to globalize 32 copies of the same variables for memory coalescing; one copy is enough, because there is no parallel region. The patch does this by adding a call to the `__kmpc_parallel_level` function and checking its return value. If the code sees that the parallel level is 0, then only one variable is allocated, not 32.
  llvm-svn: 344356
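The decision described here can be sketched in plain C++. The `parallelLevel` parameter stands in for the value returned by `__kmpc_parallel_level`; the helper names are hypothetical, and the real check is emitted as device code rather than written like this.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t WarpSize = 32;

// Number of copies to allocate for one globalized variable.
// `parallelLevel` is a stand-in for the result of __kmpc_parallel_level.
std::size_t globalizedCopies(int parallelLevel) {
  assert(parallelLevel >= 0 && "parallel level cannot be negative");
  // Level 0 means no enclosing parallel region: one copy is enough.
  return parallelLevel == 0 ? 1 : WarpSize;
}

// Bytes needed for a variable of `varSize` bytes at the given level.
std::size_t globalizedBytes(std::size_t varSize, int parallelLevel) {
  return varSize * globalizedCopies(parallelLevel);
}
```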
* [OPENMP][NVPTX] Reduce memory use for globalized vars in target/teams/distribute regions. (Alexey Bataev, 2018-10-11; 1 file, -7/+15)
  The previously introduced globalization scheme that uses memory coalescing may increase memory usage for the variables that are declared in target/teams/distribute contexts. We don't need 32 copies of such variables, just 1. The patch reduces memory use in this case.
  llvm-svn: 344273
* [OPENMP][NVPTX] Support memory coalescing for globalized variables. (Alexey Bataev, 2018-10-09; 1 file, -37/+95)
  Added support for memory coalescing for better performance for globalized variables. From now on all the globalized variables are represented as arrays of 32 elements and each thread accesses these elements using `tid & 31` as index.
  llvm-svn: 344049
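A minimal host-side sketch of the indexing scheme named in this commit. `GlobalizedFloat`, `laneIndex` and `slotFor` are hypothetical names used only for illustration; the actual layout is generated by the compiler.

```cpp
#include <array>

constexpr unsigned WarpSize = 32;

// A globalized `float` becomes an array with one slot per lane of a warp.
struct GlobalizedFloat {
  std::array<float, WarpSize> slots{};
};

// `tid & 31` is the lane id; for unsigned tid it equals tid % 32.
constexpr unsigned laneIndex(unsigned tid) { return tid & (WarpSize - 1); }

// Each thread reads and writes only its own slot, so the 32 threads of a
// warp touch 32 adjacent elements.
float &slotFor(GlobalizedFloat &var, unsigned tid) {
  return var.slots[laneIndex(tid)];
}
```

Because adjacent lanes map to adjacent elements, a warp-wide access to the variable falls into one contiguous range, which is the property coalescing relies on.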
* [OPENMP][NVPTX] Fix emission of __kmpc_global_thread_num() for non-SPMD mode. (Alexey Bataev, 2018-10-05; 1 file, -4/+7)
  __kmpc_global_thread_num() should be called before initialization of the runtime.
  llvm-svn: 343857
* [OPENMP] Fix emission of the __kmpc_global_thread_num. (Alexey Bataev, 2018-10-05; 1 file, -0/+3)
  Fixed emission of the __kmpc_global_thread_num() so that it is not messed up with alloca instructions anymore. Plus, fixes emission of the __kmpc_global_thread_num() functions in the target outlined regions so that they are not called before runtime is initialized.
  llvm-svn: 343856
* [OpenMP][NVPTX] Simplify codegen for orphaned parallel, NFCI. (Jonas Hahnfeld, 2018-10-02; 1 file, -25/+7)
  Worker threads fork off to the compiler generated worker function directly after entering the kernel function. Hence, there is no need to check whether the current thread is the master if we are outside of a parallel region (neither SPMD nor parallel_level > 0).
  Differential Revision: https://reviews.llvm.org/D52732
  llvm-svn: 343618
* [OPENMP][NVPTX] Handle `requires datasharing` flag correctly with lightweight runtime. (Alexey Bataev, 2018-10-01; 1 file, -1/+27)
  The datasharing flag must be set to `1` when executing SPMD-mode compatible directive with reduction|lastprivate clauses.
  llvm-svn: 343492
* [OPENMP] Simplify code, NFC. (Alexey Bataev, 2018-10-01; 1 file, -2/+0)
  llvm-svn: 343483
* [OpenMP] Make default parallel for schedule in NVPTX target regions in SPMD mode achieve coalescing (Gheorghe-Teodor Bercea, 2018-09-27; 1 file, -0/+11)
  Summary: Set default schedule for parallel for loops to schedule(static, 1) when using SPMD mode on the NVPTX device offloading toolchain to ensure coalescing.
  Reviewers: ABataev, Hahnfeld, caomhin
  Reviewed By: ABataev
  Subscribers: jholewinski, guansong, cfe-commits
  Differential Revision: https://reviews.llvm.org/D52629
  llvm-svn: 343260
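To see why schedule(static, 1) helps coalescing, the following sketch lists the iterations a thread receives under a static schedule with a given chunk size. `iterationsFor` is a hypothetical helper that models the OpenMP round-robin chunk assignment; it is not code from the patch.

```cpp
#include <vector>

// Iterations of [0, tripCount) executed by `thread` under
// schedule(static, chunk) with `numThreads` threads: chunks are handed out
// to the threads in round-robin order.
std::vector<int> iterationsFor(int thread, int numThreads, int chunk,
                               int tripCount) {
  std::vector<int> result;
  for (int start = thread * chunk; start < tripCount;
       start += numThreads * chunk)
    for (int i = start; i < start + chunk && i < tripCount; ++i)
      result.push_back(i);
  return result;
}
```

With a chunk size of 1, threads 0, 1, 2, 3 execute iterations 0, 1, 2, 3 in the same step, so an access such as `a[i]` touches adjacent elements across neighbouring threads. With larger chunks, neighbouring threads are `chunk` elements apart in each step.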
* [OpenMP] Make default distribute schedule for NVPTX target regions in SPMD mode achieve coalescing (Gheorghe-Teodor Bercea, 2018-09-27; 1 file, -0/+12)
  Summary: For the OpenMP NVPTX toolchain choose a default distribute schedule that ensures coalescing on the GPU when in SPMD mode. This significantly increases the performance of offloaded target code and reduces the number of registers used on the GPU side.
  Reviewers: ABataev, caomhin, Hahnfeld
  Reviewed By: ABataev, Hahnfeld
  Subscribers: Hahnfeld, jholewinski, guansong, cfe-commits
  Differential Revision: https://reviews.llvm.org/D52434
  llvm-svn: 343253
* [OPENMP] Add support for OMP5 requires directive + unified_address clause (Kelvin Li, 2018-09-26; 1 file, -0/+4)
  Add support for OMP5.0 requires directive and unified_address clause. Patches to follow will include support for additional clauses.
  Differential Revision: https://reviews.llvm.org/D52359
  llvm-svn: 343063
* [OPENMP][NVPTX] Enable support for lastprivates in SPMD constructs. (Alexey Bataev, 2018-09-21; 1 file, -69/+127)
  Previously we could not use lastprivates in SPMD constructs; the patch allows supporting lastprivates in SPMD with uninitialized runtime.
  llvm-svn: 342738
* [OPENMP] Fix PR38710: static functions are not emitted as implicitly 'declare target'. (Alexey Bataev, 2018-08-30; 1 file, -6/+10)
  All the functions referenced in implicit|explicit target regions must be emitted during code emission for the device.
  llvm-svn: 341093
* [OPENMP][NVPTX] Add options -f[no-]openmp-cuda-force-full-runtime. (Alexey Bataev, 2018-08-30; 1 file, -1/+2)
  Added options -f[no-]openmp-cuda-force-full-runtime to [not] force use of the full runtime for OpenMP offloading to CUDA devices.
  llvm-svn: 341073
* [OPENMP][NVPTX] Add support for lightweight runtime. (Alexey Bataev, 2018-08-29; 1 file, -49/+320)
  If the target construct can be executed in SPMD mode and it is a loop-based directive with static scheduling, we can use lightweight runtime support.
  llvm-svn: 340953
* [OPENMP] Fix processing of declare target construct. (Alexey Bataev, 2018-08-14; 1 file, -12/+2)
  The attribute is marked as inheritable since OpenMP 5.0 supports it + additional fixes to support new functionality.
  llvm-svn: 339704
* Port getLocStart -> getBeginLoc (Stephen Kelly, 2018-08-09; 1 file, -8/+8)
  Reviewers: teemperor!
  Subscribers: jholewinski, whisperity, jfb, cfe-commits
  Differential Revision: https://reviews.llvm.org/D50350
  llvm-svn: 339385
* [OPENMP] ThreadId in serialized parallel regions is 0. (Alexey Bataev, 2018-07-25; 1 file, -7/+14)
  The first argument for the parallel outlined functions, called as serialized parallel regions, should be a pointer to the global thread id, which is always 0.
  llvm-svn: 337957
* [OPENMP, NVPTX] Globalize only captured variables. (Alexey Bataev, 2018-07-16; 1 file, -1/+1)
  Sometimes we can try to globalize non-variable declarations, which may lead to a compiler crash.
  llvm-svn: 337191
* [OpenMP] Initialize data sharing stack for SPMD case (Gheorghe-Teodor Bercea, 2018-07-13; 1 file, -5/+15)
  Summary: In the SPMD case, we need to initialize the data sharing and globalization infrastructure. This covers the case when an SPMD region calls a function in a different compilation unit.
  Reviewers: ABataev, carlo.bertolli, caomhin
  Reviewed By: ABataev
  Subscribers: Hahnfeld, jholewinski, guansong, cfe-commits
  Differential Revision: https://reviews.llvm.org/D49188
  llvm-svn: 337015
* [OPENMP, NVPTX] Do not globalize local variables in parallel regions. (Alexey Bataev, 2018-07-09; 1 file, -10/+3)
  In generic data-sharing mode we are allowed to not globalize local variables that escape their declaration context iff they are declared inside of the parallel region. We can do this because L2 parallel regions are executed sequentially and, thus, we do not need to put shared local variables in the global memory.
  llvm-svn: 336567
* [OPENMP, NVPTX] Reduce the number of the globalized variables. (Alexey Bataev, 2018-06-26; 1 file, -9/+43)
  The patch tries to make a better analysis of the variables that should be globalized. From now on, instead of all parallel directives, it will check only distribute parallel .. directives, and check only firstprivate/lastprivate variables for whether they must be globalized.
  llvm-svn: 335632
* [OPENMP, NVPTX] Fix reduction of the big data types/structures. (Alexey Bataev, 2018-06-22; 1 file, -21/+115)
  If the shuffle is required for the reduced structures/big data types, the current code may cause a compiler crash because of the loading of the aggregate values. The patch fixes this problem.
  llvm-svn: 335377
* [OPENMP, NVPTX] Fix globalization of the variables passed to orphaned parallel region. (Alexey Bataev, 2018-06-21; 1 file, -43/+55)
  If the current construct requires sharing of the local variable in the inner parallel region, this variable must be globalized to avoid a runtime crash.
  llvm-svn: 335285
* [OPENMP, NVPTX] Emit simple reduction if requested. (Alexey Bataev, 2018-06-18; 1 file, -0/+6)
  If simple reduction is requested, use the simple reduction instead of the runtime function calls.
  llvm-svn: 334962
* [OPENMP, NVPTX] Fixed codegen for orphaned parallel region. (Alexey Bataev, 2018-05-25; 1 file, -25/+19)
  If an orphaned parallel region is found, the following code must be emitted:
  ```
  if(__kmpc_is_spmd_exec_mode() || __kmpc_parallel_level(loc, gtid))
    Serialized execution.
  else if (IsMasterThread())
    Prepare and signal worker.
  else
    Outlined function call.
  ```
  llvm-svn: 333301
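The three-way dispatch in this commit message can be modelled on the host as below. The function and enum names are hypothetical; the parameters stand in for the results of `__kmpc_is_spmd_exec_mode()`, `__kmpc_parallel_level(loc, gtid)` and `IsMasterThread()`, which in the real compiler are emitted as calls in device code.

```cpp
// Which of the three branches an orphaned parallel region takes.
enum class Path { Serialized, SignalWorkers, OutlinedCall };

// isSPMD        ~ __kmpc_is_spmd_exec_mode()
// parallelLevel ~ __kmpc_parallel_level(loc, gtid)
// isMaster      ~ IsMasterThread()
Path orphanedParallelPath(bool isSPMD, int parallelLevel, bool isMaster) {
  // SPMD mode or an already active parallel region: run sequentially.
  if (isSPMD || parallelLevel != 0)
    return Path::Serialized;
  // Master thread of a generic-mode kernel: prepare and signal the workers.
  if (isMaster)
    return Path::SignalWorkers;
  // Any other thread calls the outlined function directly.
  return Path::OutlinedCall;
}
```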
* [OPENMP, NVPTX] Add check for SPMD mode in orphaned parallel directives. (Alexey Bataev, 2018-05-16; 1 file, -6/+34)
  If the orphaned directive is executed in SPMD mode, we need to emit the check for the SPMD mode and run the orphaned parallel directive in sequential mode.
  llvm-svn: 332467
* [OPENMP, NVPTX] Do not globalize variables with reference/pointer types. (Alexey Bataev, 2018-05-15; 1 file, -20/+19)
  In generic data-sharing mode we do not need to globalize variables/parameters of reference/pointer types. They are already placed in the global memory.
  llvm-svn: 332380
* [OPENMP, NVPTX] Do not use SPMD mode for target simd and target teams distribute simd directives. (Alexey Bataev, 2018-05-11; 1 file, -19/+13)
  Directives `target simd` and `target teams distribute simd` must be executed in non-SPMD mode.
  llvm-svn: 332129
* [OPENMP, NVPTX] Initial support for L2 parallelism in SPMD mode. (Alexey Bataev, 2018-05-10; 1 file, -59/+164)
  Added initial support for L2 parallelism in SPMD mode. Note, though, that the orphaned parallel directives are not currently supported in SPMD mode.
  llvm-svn: 332016
* Remove \brief commands from doxygen comments. (Adrian Prantl, 2018-05-09; 1 file, -20/+20)
  This is similar to the LLVM change https://reviews.llvm.org/D46290. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers in our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all.
  Patch produced by
    for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done
    for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
  Differential Revision: https://reviews.llvm.org/D46320
  llvm-svn: 331834
* [OPENMP, NVPTX] Codegen for critical construct. (Alexey Bataev, 2018-05-07; 1 file, -0/+60)
  Added correct codegen for the critical construct on NVPTX devices.
  llvm-svn: 331652
* [OPENMP, NVPTX] Added support for L2 parallelism. (Alexey Bataev, 2018-05-07; 1 file, -97/+346)
  Added initial codegen for level 2, 3, etc. parallelism. Currently, all second-level, third-level, etc. parallel regions run sequentially.
  llvm-svn: 331642
* [OPENMP] Add support for reductions on simd directives in target regions. (Alexey Bataev, 2018-05-02; 1 file, -11/+47)
  Added codegen for `simd reduction()` constructs in target directives.
  llvm-svn: 331393
* [OPENMP] Emit names of the globals depending on target. (Alexey Bataev, 2018-05-02; 1 file, -1/+2)
  Some symbols are not allowed to be used as names on some targets. The patch tries to unify the emission of the names of LLVM globals so they can be used on different targets.
  llvm-svn: 331358
* [OPENMP] Do not cast captured by value variables with pointer types in NVPTX target. (Alexey Bataev, 2018-04-23; 1 file, -1/+2)
  When generating the wrapper function for the offloading region, we need to call the outlined function and cast the arguments correctly to follow the ABI. Usually, variables captured by value are cast to `uintptr_t` type. But this should not be performed for the variables with pointer type.
  llvm-svn: 330620
* [OPENMP] General code improvements. (Alexey Bataev, 2018-04-16; 1 file, -132/+136)
  llvm-svn: 330154
* [OPENMP] Additional attributes for the pointer parameters. (Alexey Bataev, 2018-04-10; 1 file, -0/+6)
  Added attributes for better optimization of the OpenMP code.
  llvm-svn: 329751
* [OPENMP, NVPTX] Fix codegen for the teams reduction. (Alexey Bataev, 2018-04-06; 1 file, -25/+19)
  Added NUW flags for all the add|mul|sub operations + replaced sdiv by udiv, as we operate on unsigned values only (addresses converted to integers).
  llvm-svn: 329411
* [OPENMP] Added emission of offloading data sections for declare target variables. (Alexey Bataev, 2018-03-30; 1 file, -1/+16)
  Added emission of the offloading data sections for the variables within declare target regions + fixes emission of the declare target variables marked as declare target not within the declare target region.
  llvm-svn: 328888
* [OpenMP][Clang] Add call to global data sharing stack initialization on the workers side (Gheorghe-Teodor Bercea, 2018-03-22; 1 file, -0/+5)
  Summary: The workers also need to initialize the global stack. The call to the initialization function needs to happen after the kernel_init() function is called by the master. This ensures that the per-team data structures of the runtime have been initialized.
  Reviewers: ABataev, grokos, carlo.bertolli, caomhin
  Reviewed By: ABataev
  Subscribers: jholewinski, guansong, cfe-commits
  Differential Revision: https://reviews.llvm.org/D44749
  llvm-svn: 328219
* [OPENMP, NVPTX] Codegen for target distribute parallel combined constructs in generic mode. (Alexey Bataev, 2018-03-20; 1 file, -8/+46)
  Fixed codegen for distribute parallel combined constructs. We have to pass the shared lower and upper bounds from the distribute region and read them in the inner parallel region. The patch is for generic mode.
  llvm-svn: 327990