path: root/clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
Commit message (Author, Age, Files, Lines)
...
* [OpenMP 5.0] Parsing/sema support for "omp declare mapper" directive. (Michael Kruse, 2019-02-01, 1 file, -0/+4)
| This patch implements parsing and sema for the "omp declare mapper" directive. A user-defined mapper, i.e., a declare mapper directive, is a new feature in OpenMP 5.0. It extends the existing map clauses to simplify copying complex data structures between host and device (i.e., deep copy). An example is shown below:
|
|     struct S { int len; int *d; };
|     #pragma omp declare mapper(struct S s) map(s, s.d[0:s.len]) // Memory region that d points to is also mapped using this mapper.
|
| Contributed-by: Lingda Li <lildmh@gmail.com>
| Differential Revision: https://reviews.llvm.org/D56326
| llvm-svn: 352906
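To make the "deep copy" idea in the entry above concrete, here is a minimal host-only C++ sketch of what a mapper such as `map(s, s.d[0:s.len])` conceptually requests: copy the struct itself and also the `s.len` elements that `d` points to. The names `MappedS` and `deepCopy`, and the use of `std::vector` as stand-in device storage, are assumptions made for illustration. This is not how the OpenMP runtime implements mapping.

```cpp
#include <cstddef>
#include <vector>

struct S {
  int len;
  int *d;
};

// Hypothetical stand-in for the device-side image of an S: the struct plus
// storage for the region that d points to.
struct MappedS {
  S s;
  std::vector<int> storage;
};

// Copies the struct and the host.len elements behind host.d, mirroring
// map(s, s.d[0:s.len]). The returned object should be moved, not copied,
// so that s.d keeps pointing into its own storage.
inline MappedS deepCopy(const S &host) {
  MappedS mapped;
  mapped.storage.assign(host.d, host.d + host.len);
  mapped.s.len = host.len;
  mapped.s.d = mapped.storage.data();
  return mapped;
}
```

A shallow copy would duplicate only `len` and the pointer value `d`. The sketch shows the extra step a mapper adds: the pointed-to region travels with the struct.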
* [OPENMP][NVPTX] Emit service debug variable for NVPTX. (Alexey Bataev, 2019-01-28, 1 file, -0/+14)
| For an empty module, the ptxas tool may emit an error message about empty debug info sections. This patch fixes this bug.
| llvm-svn: 352421
* Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. (Chandler Carruth, 2019-01-19, 1 file, -4/+3)
| We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
|
| Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
| llvm-svn: 351636
* [OPENMP] Add call to __kmpc_push_target_tripcount() function. (Alexey Bataev, 2019-01-07, 1 file, -2/+2)
| Each time we create a target region with an inner teams distribute region, we can better estimate the number of teams required to execute the target region. The function __kmpc_push_target_tripcount() is used for this purpose; it accepts the device_id and the number of iterations performed by the associated loop.
| llvm-svn: 350571
* [OPENMP][NVPTX] Reduce number of barriers in reductions. (Alexey Bataev, 2019-01-07, 1 file, -7/+0)
| After the fix for the syncthreads we don't need to generate extra barriers for the parallel reductions.
| llvm-svn: 350530
* [OPENMP][NVPTX] Use new functions from the runtime library. (Alexey Bataev, 2019-01-04, 1 file, -30/+43)
| Updated codegen to use the new functions from the runtime library.
| llvm-svn: 350415
* [OPENMP][NVPTX] Use __kmpc_barrier_simple_spmd(nullptr, 0) instead of nvvm_barrier0. (Alexey Bataev, 2019-01-03, 1 file, -12/+29)
| Use runtime functions instead of direct calls to the nvvm intrinsics. This prevents some dangerous LLVM optimizations that break the code for the NVPTX target.
| llvm-svn: 350328
* [OPENMP][NVPTX] Emit shared memory buffer for reduction as 128 bytes buffer. (Alexey Bataev, 2018-12-18, 1 file, -0/+16)
| It seems that nvlink has a bug in its support for weakly linked symbols: it does not allow defining several shared memory buffers with different sizes, even with weak linkage. Instead, we always use a 128-byte buffer to keep nvlink from emitting an error message.
| llvm-svn: 349540
* [OPENMP][NVPTX] Added extra sync point to the inter-warp copy function. (Alexey Bataev, 2018-12-18, 1 file, -0/+5)
| The parallel reduction operation requires an extra synchronization point in the inter-warp copy function to avoid divergence.
| llvm-svn: 349525
* [OPENMP][NVPTX] Improved interwarp copy function. (Alexey Bataev, 2018-12-14, 1 file, -33/+12)
| With an inlined runtime, the current implementation of the interwarp copy function leads to undefined behavior because its barriers are not implemented quite correctly. Start using the generic __kmpc_barrier function instead of the custom-made barriers.
| llvm-svn: 349192
* Misc typos fixes in ./lib folder (Raphael Isemann, 2018-12-10, 1 file, -1/+1)
| Summary: Found via `codespell -q 3 -I ../clang-whitelist.txt -L uint,importd,crasher,gonna,cant,ue,ons,orign,ned`
| Reviewers: teemperor
| Reviewed By: teemperor
| Subscribers: teemperor, jholewinski, jvesely, nhaehnle, whisperity, jfb, cfe-commits
| Differential Revision: https://reviews.llvm.org/D55475
| llvm-svn: 348755
* [OPENMP][NVPTX] Fix globalization of the mapped array sections. (Alexey Bataev, 2018-12-06, 1 file, -3/+5)
| If an array section is based on a pointer, is mapped in a target region, and is then used in an inner parallel region, it must also be globalized, because the pointer itself is passed by value, not by reference.
| llvm-svn: 348492
* [OPENMP][NVPTX] Fixed emission of the critical region. (Alexey Bataev, 2018-12-04, 1 file, -2/+4)
| Critical regions are constructs that, generally speaking, are not supported by the NVPTX target. Instead, we use a special technique to handle them. Currently they are supported only within a loop, and all the threads in the loop must execute the same critical region. Inside these special regions, the regions must still be emitted as critical to avoid possible data races between the teams, and synchronization must use the __kmpc_barrier functions.
| llvm-svn: 348272
* [OPENMP][NVPTX] Mark __kmpc_barrier functions as convergent. (Alexey Bataev, 2018-12-04, 1 file, -0/+25)
| The __kmpc_barrier runtime functions must be marked as convergent to prevent some dangerous optimizations. Also, for the NVPTX target, all barriers must be emitted as simple barriers.
| llvm-svn: 348271
* [OPENMP][NVPTX] Call __kmpc_global_thread_num in worker after initialization. (Alexey Bataev, 2018-11-29, 1 file, -0/+4)
| The function __kmpc_global_thread_num should be called only after initialization, not earlier.
| llvm-svn: 347919
* [OpenMP] Add a new version of the SPMD deinit kernel function (Gheorghe-Teodor Bercea, 2018-11-29, 1 file, -7/+11)
| Summary: This patch adds a new version of the runtime's SPMD deinit kernel function, which replaces the previous function. The new function takes as an argument a flag that signals whether the runtime is required. This enables the compiler to optimize out the parts of the deinit function that are not needed.
| Reviewers: ABataev, caomhin
| Reviewed By: ABataev
| Subscribers: jholewinski, guansong, cfe-commits
| Differential Revision: https://reviews.llvm.org/D54970
| llvm-svn: 347915
* [OPENMP][NVPTX] Basic support for reductions across the teams. (Alexey Bataev, 2018-11-27, 1 file, -372/+108)
| Added basic codegen support for the reductions across the teams.
| llvm-svn: 347715
* [OPENMP][NVPTX] Emit default locations with the correct Exec|Runtime modes. (Alexey Bataev, 2018-11-26, 1 file, -15/+42)
| If the region is inside a target|teams|distribute region, we can emit the locations with the correct info for execution mode and runtime mode. This patch adds that ability to the NVPTX codegen to help the optimizer produce better code.
| llvm-svn: 347583
* [OPENMP][NVPTX] Emit default locations as constant with undefined mode. (Alexey Bataev, 2018-11-21, 1 file, -0/+20)
| For the NVPTX target, default locations should be emitted as constants, and additional info must be emitted in the reserved_2 field of the ident_t structure. The 1st bit controls the execution mode and the 2nd bit controls use of the lightweight runtime. The combination of the bits for Non-SPMD mode + lightweight runtime represents a special undefined mode, used outside of the target regions for orphaned directives or functions. This should allow additional optimization inside the target regions.
| llvm-svn: 347425
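The bit layout described in the entry above can be sketched in C++ as follows. The flag and function names are hypothetical, and the polarity (a set 1st bit meaning SPMD, a set 2nd bit meaning the lightweight runtime is used) is an assumption chosen so that "Non-SPMD + lightweight runtime" is a distinct combination. The log itself does not spell out the exact values.

```cpp
#include <cstdint>

// Hypothetical names for the two mode bits kept in ident_t's reserved_2 field.
enum ModeFlags : std::uint32_t {
  kSpmdMode = 0x1,           // 1st bit: set for SPMD, clear for Non-SPMD (assumed)
  kLightweightRuntime = 0x2  // 2nd bit: set when the lightweight runtime is used (assumed)
};

// Non-SPMD (1st bit clear) combined with the lightweight runtime (2nd bit set):
// the "undefined" mode used for orphaned directives or functions.
constexpr std::uint32_t kUndefinedMode = kLightweightRuntime;

// Packs the two mode decisions into the low bits of the field.
constexpr std::uint32_t encodeMode(bool spmd, bool lightweight) {
  return (spmd ? kSpmdMode : 0u) | (lightweight ? kLightweightRuntime : 0u);
}

// True when the low two bits hold the undefined-mode combination.
constexpr bool isUndefinedMode(std::uint32_t reserved2) {
  return (reserved2 & (kSpmdMode | kLightweightRuntime)) == kUndefinedMode;
}
```

Under these assumptions, three of the four bit combinations describe a known mode and the fourth is left over to mean "not known at this point", which is presumably why it can be reserved for code emitted outside target regions.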
* [OpenMP] Check target architecture supports unified shared memory for requires directive. (Patrick Lyster, 2018-11-19, 1 file, -49/+107)
| Differential Review: https://reviews.llvm.org/D54493
| llvm-svn: 347214
* Fix unused variable warning. (David L. Jones, 2018-11-17, 1 file, -0/+2)
| llvm-svn: 347133
* [OPENMP][NVPTX] Emit correct reduction code for teams/parallel reductions. (Alexey Bataev, 2018-11-16, 1 file, -165/+242)
| Fixed the previously committed code for reduction support in teams/parallel constructs, taking into account the new design of the NVPTX support in the compiler. Teams reductions are not fully functional yet; this will be fixed in the following patches.
| llvm-svn: 347081
* [OPENMP][NVPTX] Extend number of constructs executed in SPMD mode. (Alexey Bataev, 2018-11-09, 1 file, -45/+65)
| If the statements between target|teams|distribute directives do not require execution in the master thread (constant expressions, null statements, simple declarations, etc.), such a construct can be executed in SPMD mode.
| llvm-svn: 346551
* [OPENMP][NVPTX] Allow to use shared memory for the target|teams|distribute variables. (Alexey Bataev, 2018-11-09, 1 file, -53/+97)
| If the total size of the variables declared in target|teams|distribute regions is less than the maximal size of shared memory available, the buffer is allocated in shared memory.
| llvm-svn: 346507
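The size check described in the entry above amounts to a single comparison. A minimal C++ sketch, assuming hypothetical names (`BufferSpace`, `chooseBufferSpace`) and a caller-supplied shared memory limit; the actual limit and the way clang collects the variable sizes are not given in this log:

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

enum class BufferSpace { Shared, Global };

// Picks shared memory only when the summed size of the variables is strictly
// less than the available shared memory, as the entry states.
inline BufferSpace chooseBufferSpace(const std::vector<std::size_t> &varSizes,
                                     std::size_t maxSharedBytes) {
  const std::size_t total =
      std::accumulate(varSizes.begin(), varSizes.end(), std::size_t{0});
  return total < maxSharedBytes ? BufferSpace::Shared : BufferSpace::Global;
}
```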
* [OPENMP][NVPTX] Use __kmpc_data_sharing_coalesced_push_stack function. (Alexey Bataev, 2018-11-02, 1 file, -14/+15)
| Coalesced memory access requires use of the new function `__kmpc_data_sharing_coalesced_push_stack` instead of `__kmpc_data_sharing_push_stack`.
| llvm-svn: 345991
* [OPENMP][NVPTX] Improve emission of the globalized variables for target/teams/distribute regions. (Alexey Bataev, 2018-11-02, 1 file, -8/+257)
| Target/teams/distribute regions exist for the whole time the kernel is executed. Thus, if a variable is declared in their context and then escapes it, we can allocate global memory statically instead of allocating it dynamically. The patch captures all the globalized variables in target/teams/distribute contexts and merges them into records, one per target region. Those records are then joined into a union, one per compilation unit (to save global memory). Those units are organized into two-dimensional arrays, where the first dimension is the number of blocks per SM and the second one is the number of SMs. Runtime functions manage this global memory space between the executing teams.
| llvm-svn: 345978
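The layout described in the entry above (one record per target region, one union per compilation unit, a statically allocated two-dimensional array of those unions) can be pictured with the C++ sketch below. All type names, field names, and both array extents are hypothetical placeholders; real sizes depend on the device, and `slotFor` only stands in for the runtime functions that hand out this memory.

```cpp
#include <cstddef>

// Hypothetical globalized variables captured from two target regions,
// merged into one record per region.
struct RegionARecord {
  int counter;
  double partialSum;
};
struct RegionBRecord {
  float tile[4];
};

// One union per compilation unit: the regions reuse the same global memory.
union UnitRecord {
  RegionARecord regionA;
  RegionBRecord regionB;
};

// Placeholder extents (assumptions, not values from the patch).
constexpr std::size_t kBlocksPerSM = 16;
constexpr std::size_t kNumSMs = 8;

// Statically allocated storage: first dimension is blocks per SM,
// second dimension is the number of SMs.
static UnitRecord gGlobalizedStorage[kBlocksPerSM][kNumSMs];

// Stand-in for the runtime's bookkeeping: returns the slot for one team.
inline UnitRecord &slotFor(std::size_t blockInSM, std::size_t sm) {
  return gGlobalizedStorage[blockInSM][sm];
}
```

Because the regions share a union, the per-team cost is the size of the largest record and not the sum of all records, which is the memory saving the entry mentions.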
* [OPENMP] Support for mapping of the lambdas in target regions. (Alexey Bataev, 2018-10-30, 1 file, -0/+53)
| Added support for mapping of lambdas in target regions. It scans all the by-reference captures in the lambda, implicitly maps those variables in the target region, and later resets the reference addresses in the lambda to the correct addresses of the captured|privatized variables.
| llvm-svn: 345609
* [OpenMP][NVPTX] Use single loops when generating code for distribute parallel for (Gheorghe-Teodor Bercea, 2018-10-29, 1 file, -3/+6)
| Summary: This patch adds a new code generation path for bound sharing directives containing distribute parallel for. The new code generation scheme applies to chunked schedules on distribute and parallel for directives. The scheme simplifies the generated code by eliminating the need for an outer for loop over chunks for both distribute and parallel for directives. In the case of distribute it applies to any sized chunk, while in the parallel for case it only applies when the chunk size is 1.
| Reviewers: ABataev, caomhin
| Reviewed By: ABataev
| Subscribers: jholewinski, guansong, cfe-commits
| Differential Revision: https://reviews.llvm.org/D53448
| llvm-svn: 345509
* [OpenMP][NVPTX] Enable default scheduling for parallel for in non-SPMD cases. (Gheorghe-Teodor Bercea, 2018-10-29, 1 file, -5/+6)
| Summary: This patch enables choosing the default schedule for parallel for loops even in non-SPMD cases.
| Reviewers: ABataev, caomhin
| Reviewed By: ABataev
| Subscribers: jholewinski, guansong, cfe-commits
| Differential Revision: https://reviews.llvm.org/D53443
| llvm-svn: 345507
* [OPENMP][NVPTX] Increment iterator only when it is used, NFC. (Alexey Bataev, 2018-10-16, 1 file, -1/+2)
| llvm-svn: 344574
* [OPENMP][NVPTX] Reduce memory usage in target region. (Alexey Bataev, 2018-10-12, 1 file, -12/+17)
| Additional reduction of the global memory usage in the target regions without parallel regions.
| llvm-svn: 344413
* [OPENMP][NVPTX] Reduce memory usage in orphaned functions. (Alexey Bataev, 2018-10-12, 1 file, -8/+71)
| If a function has globalized variables and is called in the context of target/teams/distribute regions, it does not need to globalize 32 copies of the same variables for memory coalescing; one copy is enough, because no parallel region is active. The patch does this by adding a call to the `__kmpc_parallel_level` function and checking its return value. If the code sees that the parallel level is 0, only one variable is allocated, not 32.
| llvm-svn: 344356
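The decision described in the entry above can be sketched as a small C++ helper. The function names are hypothetical, and `parallelLevel` merely stands in for whatever `__kmpc_parallel_level` returns at run time; the sketch does not call the real runtime.

```cpp
#include <cstddef>

// Number of per-thread copies used by the coalescing scheme.
constexpr std::size_t kCoalescedCopies = 32;

// parallelLevel models the value returned by __kmpc_parallel_level:
// level 0 means no parallel region is active, so one copy suffices.
constexpr std::size_t copiesToAllocate(int parallelLevel) {
  return parallelLevel == 0 ? 1 : kCoalescedCopies;
}

// Bytes needed for one globalized variable of the given size.
constexpr std::size_t bytesToAllocate(std::size_t varSize, int parallelLevel) {
  return varSize * copiesToAllocate(parallelLevel);
}
```

For a 4-byte variable this is 4 bytes at level 0 versus 128 bytes inside a parallel region, which is the saving the entry describes.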
* [OPENMP][NVPTX] Reduce memory use for globalized vars in target/teams/distribute regions. (Alexey Bataev, 2018-10-11, 1 file, -7/+15)
| The previously introduced globalization scheme, which uses memory coalescing, may increase memory usage for variables that are declared in target/teams/distribute contexts. We don't need 32 copies of such variables, just 1. The patch reduces memory use in this case.
| llvm-svn: 344273
* [OPENMP][NVPTX] Support memory coalescing for globalized variables. (Alexey Bataev, 2018-10-09, 1 file, -37/+95)
| Added support for memory coalescing for better performance for globalized variables. From now on all the globalized variables are represented as arrays of 32 elements and each thread accesses these elements using `tid & 31` as index.
| llvm-svn: 344049
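The indexing rule in the entry above, `tid & 31` into an array of 32 elements, can be illustrated with the following C++ sketch. `GlobalizedVar` and `slotIndex` are hypothetical names; this models only the index arithmetic on the host, not the code clang emits for the device.

```cpp
#include <array>
#include <cstddef>

constexpr unsigned kSlots = 32;

// Maps a thread id to its slot: equivalent to tid % 32 because 32 is a
// power of two.
constexpr unsigned slotIndex(unsigned tid) { return tid & (kSlots - 1); }

// A globalized variable represented as an array of 32 elements.
template <typename T> struct GlobalizedVar {
  std::array<T, kSlots> slots{};

  // Each thread reads and writes only the slot selected by its thread id.
  T &forThread(unsigned tid) { return slots[slotIndex(tid)]; }
};
```

Consecutive thread ids land in consecutive array elements, so the 32 threads of a warp touch one contiguous run of memory; threads 5 and 37 map to the same slot because they occupy the same lane in different warps.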
* [OPENMP][NVPTX] Fix emission of __kmpc_global_thread_num() for non-SPMD mode. (Alexey Bataev, 2018-10-05, 1 file, -4/+7)
| __kmpc_global_thread_num() should be called before initialization of the runtime.
| llvm-svn: 343857
* [OPENMP] Fix emission of the __kmpc_global_thread_num. (Alexey Bataev, 2018-10-05, 1 file, -0/+3)
| Fixed emission of __kmpc_global_thread_num() so that it is no longer mixed in with alloca instructions. Also fixes emission of the __kmpc_global_thread_num() calls in the target outlined regions so that they are not made before the runtime is initialized.
| llvm-svn: 343856
* [OpenMP][NVPTX] Simplify codegen for orphaned parallel, NFCI. (Jonas Hahnfeld, 2018-10-02, 1 file, -25/+7)
| Worker threads fork off to the compiler generated worker function directly after entering the kernel function. Hence, there is no need to check whether the current thread is the master if we are outside of a parallel region (neither SPMD nor parallel_level > 0).
| Differential Revision: https://reviews.llvm.org/D52732
| llvm-svn: 343618
* [OPENMP][NVPTX] Handle `requires datasharing` flag correctly with lightweight runtime. (Alexey Bataev, 2018-10-01, 1 file, -1/+27)
| The datasharing flag must be set to `1` when executing an SPMD-mode compatible directive with reduction|lastprivate clauses.
| llvm-svn: 343492
* [OPENMP] Simplify code, NFC. (Alexey Bataev, 2018-10-01, 1 file, -2/+0)
| llvm-svn: 343483
* [OpenMP] Make default parallel for schedule in NVPTX target regions in SPMD mode achieve coalescing (Gheorghe-Teodor Bercea, 2018-09-27, 1 file, -0/+11)
| Summary: Set default schedule for parallel for loops to schedule(static, 1) when using SPMD mode on the NVPTX device offloading toolchain to ensure coalescing.
| Reviewers: ABataev, Hahnfeld, caomhin
| Reviewed By: ABataev
| Subscribers: jholewinski, guansong, cfe-commits
| Differential Revision: https://reviews.llvm.org/D52629
| llvm-svn: 343260
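To see why `schedule(static, 1)` helps coalescing, the C++ sketch below lists which loop iterations a given thread executes under that schedule: chunks of one iteration are dealt out round-robin in thread order. The function name `iterationsForThread` is hypothetical, and the sketch models only the iteration-to-thread assignment on the host, not the generated device code.

```cpp
#include <cstddef>
#include <vector>

// Iterations executed by thread `tid` out of `numThreads` under
// schedule(static, 1): tid, tid + numThreads, tid + 2 * numThreads, ...
inline std::vector<std::size_t> iterationsForThread(std::size_t tid,
                                                    std::size_t numThreads,
                                                    std::size_t tripCount) {
  std::vector<std::size_t> iters;
  if (numThreads == 0)
    return iters; // no threads, no work
  for (std::size_t i = tid; i < tripCount; i += numThreads)
    iters.push_back(i);
  return iters;
}
```

In any given step, threads 0, 1, 2, ... work on iterations that differ by one, so a loop body indexing an array by the iteration variable makes neighbouring threads touch neighbouring elements.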
* [OpenMP] Make default distribute schedule for NVPTX target regions in SPMD mode achieve coalescing (Gheorghe-Teodor Bercea, 2018-09-27, 1 file, -0/+12)
| Summary: For the OpenMP NVPTX toolchain choose a default distribute schedule that ensures coalescing on the GPU when in SPMD mode. This significantly increases the performance of offloaded target code and reduces the number of registers used on the GPU side.
| Reviewers: ABataev, caomhin, Hahnfeld
| Reviewed By: ABataev, Hahnfeld
| Subscribers: Hahnfeld, jholewinski, guansong, cfe-commits
| Differential Revision: https://reviews.llvm.org/D52434
| llvm-svn: 343253
* [OPENMP] Add support for OMP5 requires directive + unified_address clause (Kelvin Li, 2018-09-26, 1 file, -0/+4)
| Add support for OMP5.0 requires directive and unified_address clause. Patches to follow will include support for additional clauses.
| Differential Revision: https://reviews.llvm.org/D52359
| llvm-svn: 343063
* [OPENMP][NVPTX] Enable support for lastprivates in SPMD constructs. (Alexey Bataev, 2018-09-21, 1 file, -69/+127)
| Previously we could not use lastprivates in SPMD constructs; this patch allows supporting lastprivates in SPMD with an uninitialized runtime.
| llvm-svn: 342738
* [OPENMP] Fix PR38710: static functions are not emitted as implicitly 'declare target'. (Alexey Bataev, 2018-08-30, 1 file, -6/+10)
| All the functions referenced in implicit|explicit target regions must be emitted during code emission for the device.
| llvm-svn: 341093
* [OPENMP][NVPTX] Add options -f[no-]openmp-cuda-force-full-runtime. (Alexey Bataev, 2018-08-30, 1 file, -1/+2)
| Added options -f[no-]openmp-cuda-force-full-runtime to [not] force use of the full runtime for OpenMP offloading to CUDA devices.
| llvm-svn: 341073
* [OPENMP][NVPTX] Add support for lightweight runtime. (Alexey Bataev, 2018-08-29, 1 file, -49/+320)
| If the target construct can be executed in SPMD mode and it is a loop-based directive with static scheduling, we can use lightweight runtime support.
| llvm-svn: 340953
* [OPENMP] Fix processing of declare target construct. (Alexey Bataev, 2018-08-14, 1 file, -12/+2)
| The attribute is marked as inheritable, since OpenMP 5.0 supports it, plus additional fixes to support the new functionality.
| llvm-svn: 339704
* Port getLocStart -> getBeginLoc (Stephen Kelly, 2018-08-09, 1 file, -8/+8)
| Reviewers: teemperor!
| Subscribers: jholewinski, whisperity, jfb, cfe-commits
| Differential Revision: https://reviews.llvm.org/D50350
| llvm-svn: 339385
* [OPENMP] ThreadId in serialized parallel regions is 0. (Alexey Bataev, 2018-07-25, 1 file, -7/+14)
| The first argument of the parallel outlined functions, when they are called as serialized parallel regions, should be a pointer to the global thread id, which is always 0.
| llvm-svn: 337957
* [OPENMP, NVPTX] Globalize only captured variables. (Alexey Bataev, 2018-07-16, 1 file, -1/+1)
| Sometimes we may try to globalize non-variable declarations, which can lead to a compiler crash.
| llvm-svn: 337191