summaryrefslogtreecommitdiffstats
path: root/clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [OPENMP]Use different addresses for zeroed thread_id/bound_id.Alexey Bataev2019-10-161-0/+1
| | | | | | | | When the parallel region is called directly in the sequential region, the zeroed tid/bound id are used. But they must point to the different memory locations as the parameters are marked as noalias. llvm-svn: 375017
* IR: print value numbers for unnamed function argumentsTim Northover2019-08-031-24/+24
| | | | | | | | | | For consistency with normal instructions and clarity when reading IR, it's best to print the %0, %1, ... names of function arguments in definitions. Also modifies the parser to accept IR in that form for obvious reasons. llvm-svn: 367755
* [OPENMP][NVPTX]Mark more functions as always_inline for betterAlexey Bataev2019-05-211-6/+10
| | | | | | | | | | | performance. Internally generated functions must be marked as always_inlines in most cases. Patch marks some extra reduction function + outlined parallel functions as always_inline for better performance, but only if the optimization is requested. llvm-svn: 361269
* [OPENMP][NVPTX]Fix PR40893: Size doesn't match forAlexey Bataev2019-03-131-1/+1
| | | | | | | | | | | '_openmp_teams_reductions_buffer_$_. nvlink does not handle weak linkage correctly, same symbols with the different sizes are reported as erroneous though the largest size must be chosen instead. Patch fixes this problem by using Internal linkage instead of the Common. llvm-svn: 356072
* [OPENMP][NVPTX]Use faster teams reduction algorithm.Alexey Bataev2019-02-201-26/+940
| | | | | | | A faster way to reduce the values in teams reductions was found, the codegen is updated to use this faster algorithm and new runtime functions. llvm-svn: 354479
* [OPENMP][NVPTX]Reduce number of barriers in reductions.Alexey Bataev2019-01-071-1/+0
| | | | | | | After the fix for the syncthreads we don't need to generate extra barriers for the parallel reductions. llvm-svn: 350530
* [OPENMP][NVPTX]Use new functions from the runtime library.Alexey Bataev2019-01-041-3/+3
| | | | | | Updated codegen to use the new functions from the runtime library. llvm-svn: 350415
* [OPENMP][NVPTX]Emit shared memory buffer for reduction as 128 bytesAlexey Bataev2018-12-181-1/+1
| | | | | | | | | | | buffer. Seems to me, nvlink has a bug with the proper support of the weakly linked symbols. It does not allow to define several shared memory buffer with the different sizes even with the weak linkage. Instead we always use 128 bytes buffer to prevent nvlink from the error message emission. llvm-svn: 349540
* [OPENMP][NVPTX]Improved interwarp copy function.Alexey Bataev2018-12-141-8/+6
| | | | | | | | | Inlined runtime with the current implementation of the interwarp copy function leads to the undefined behavior because of the not quite correct implementation of the barriers. Start using generic __kmpc_barier function instead of the custom made barriers. llvm-svn: 349192
* [OpenMP] Add a new version of the SPMD deinit kernel functionGheorghe-Teodor Bercea2018-11-291-2/+2
| | | | | | | | | | | | | | Summary: This patch adds a new runtime for the SPMD deinit kernel function which replaces the previous function. The new function takes as argument the flag which signals whether the runtime is required or not. This enables the compiler to optimize out the part of the deinit function which are not needed. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D54970 llvm-svn: 347915
* [OPENMP][NVPTX]Basic support for reductions across the teams.Alexey Bataev2018-11-271-948/+7
| | | | | | Added basic codegen support for the reductions across the teams. llvm-svn: 347715
* [OPENMP][NVPTX]Emit correct reduction code for teams/parallelAlexey Bataev2018-11-161-56/+354
| | | | | | | | | | | reductions. Fixed previously committed code for the reduction support in teams/parallel constructs taking into account new design of the NVPTX support in the compiler. Teams reduction are not fully functional yet, it is going to be fixed in the following patches. llvm-svn: 347081
* [OPENMP][NVPTX] Support memory coalescing for globalized variables.Alexey Bataev2018-10-091-8/+8
| | | | | | | | | Added support for memory coalescing for better performance for globalized variables. From now on all the globalized variables are represented as arrays of 32 elements and each thread accesses these elements using `tid & 31` as index. llvm-svn: 344049
* [OPENMP, NVPTX] Fix reduction of the big data types/structures.Alexey Bataev2018-06-221-32/+32
| | | | | | | | If the shuffle is required for the reduced structures/big data type, current code may cause compiler crash because of the loading of the aggregate values. Patch fixes this problem. llvm-svn: 335377
* [OPENMP] General code improvements.Alexey Bataev2018-04-161-3/+3
| | | | llvm-svn: 330154
* [OPENMP, NVPTX] Fix codegen for the teams reduction.Alexey Bataev2018-04-061-44/+44
| | | | | | | Added NUW flags for all the add|mul|sub operations + replaced sdiv by udiv as we operate on unsigned values only (addresses, converted to integers) llvm-svn: 329411
* [OPENMP] Fix casting in NVPTX support library.Alexey Bataev2018-01-041-31/+12
| | | | | | | | If the reduction required shuffle in the NVPTX codegen, we may need to cast the reduced value to the integer type. This casting was implemented incorrectly and may cause compiler crash. Patch fixes this problem. llvm-svn: 321818
* [OpenMP] Adjust arguments of nvptx runtime functionsJonas Hahnfeld2017-11-221-3/+3
| | | | | | | | | | | | | In the future the compiler will analyze whether the OpenMP runtime needs to be (fully) initialized and avoid that overhead if possible. The functions already take an argument to transfer that information to the runtime, so pass in the default value 1. (This is needed for binary compatibility with libomptarget-nvptx currently being upstreamed.) Differential Revision: https://reviews.llvm.org/D40354 llvm-svn: 318836
* [OpenMP] Teams reduction on the NVPTX device.Arpith Chacko Jacob2017-02-161-0/+1143
This patch implements codegen for the reduction clause on any teams construct for elementary data types. It builds on parallel reductions on the GPU. Subsequently, the team master writes to a unique location in a global memory scratchpad. The last team to do so loads and reduces this array to calculate the final result. This patch emits two helper functions that are used by the OpenMP runtime on the GPU to perform reductions across teams. Patch by Tian Jin in collaboration with Arpith Jacob Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29879 llvm-svn: 295335
OpenPOWER on IntegriCloud