summaryrefslogtreecommitdiffstats
path: root/clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [OPENMP][NVPTX]Fix critical region codegen.Alexey Bataev2019-08-261-2/+24
| | | | | | | | | | | | | | | | | Summary: Previously critical regions were emitted with the barrier making it a worksharing construct though it is not. Also, it leads to incorrect behavior in Cuda9+. Patch fixes this problem. Reviewers: ABataev, jdoerfert Subscribers: jholewinski, guansong, cfe-commits, grokos Tags: #clang Differential Revision: https://reviews.llvm.org/D66673 llvm-svn: 369946
* [Clang] Migrate llvm::make_unique to std::make_uniqueJonas Devlieghere2019-08-141-2/+2
| | | | | | | | | | Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo. Differential revision: https://reviews.llvm.org/D66259 llvm-svn: 368942
* [OPENMP][NVPTX]Mark barrier functions calls as convergent.Alexey Bataev2019-07-181-2/+5
| | | | | | | Added convergent attribute to the barrier functions calls for correct optimizations. llvm-svn: 366437
* [HIP] Add GPU arch gfx1010, gfx1011, and gfx1012Yaxun Liu2019-07-111-0/+6
| | | | | | Differential Revision: https://reviews.llvm.org/D64364 llvm-svn: 365799
* [AMDGPU] gfx908 clang targetStanislav Mekhanoshin2019-07-091-0/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D64430 llvm-svn: 365528
* [OpenMP] Add support for registering requires directives with the runtimeGheorghe-Teodor Bercea2019-05-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | | Summary: This patch adds support for the registration of the requires directives with the runtime. Each requires directive clause will enable a particular flag to be set. The set of flags is passed to the runtime to be checked for compatibility with other such flags coming from other object files. The registration function is called whenever OpenMP is present even if a requires directive is not present. This helps detect cases in which requires directives are used inconsistently. Reviewers: ABataev, AlexEichenberger, caomhin Reviewed By: ABataev, AlexEichenberger Subscribers: jholewinski, guansong, jfb, jdoerfert, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D60568 llvm-svn: 361298
* [OPENMP][NVPTX]Mark more functions as always_inline for betterAlexey Bataev2019-05-211-3/+16
| | | | | | | | | | | performance. Internally generated functions must be marked as always_inlines in most cases. Patch marks some extra reduction function + outlined parallel functions as always_inline for better performance, but only if the optimization is requested. llvm-svn: 361269
* Use llvm::stable_sortFangrui Song2019-04-241-6/+5
| | | | llvm-svn: 359098
* [OPENMP][NVPTX] target [teams distribute] simd maybe run withoutAlexey Bataev2019-04-191-1/+8
| | | | | | | | | runtime. target [teams distribute] simd costructs do not require full runtime for the correct execution, we can run them without full runtime. llvm-svn: 358766
* [OPENMP][NVPTX]Run combined constructs with if clause in SPMD mode.Alexey Bataev2019-04-171-26/+5
| | | | | | | | All target-parallel-based constructs can be run in SPMD mode from now on. Even if num_threads clauses or if clauses are used, such constructs can be executed in SPMD mode. llvm-svn: 358595
* [OPENMP][NVPTX]Run combined constructs with if clause in SPMD mode.Alexey Bataev2019-04-161-6/+8
| | | | | | | | | Combined constructs with parallel and if clauses without modifiers may be executed in SPMD mode since if the condition is true for the target region, it is also true for parallel region and the threads must be run in parallel. llvm-svn: 358503
* [OPENMP][NVPTX]Run parallel regions with num_threads clauses in SPMDAlexey Bataev2019-04-151-10/+7
| | | | | | | | | | mode. After the previous patch with the more correct handling of the number of threads in parallel regions, the parallel regions with num_threads clauses can be executed in SPMD mode. llvm-svn: 358445
* [OPENMP]Improve detection of number of teams, threads in targetAlexey Bataev2019-04-101-71/+23
| | | | | | | | | | regions. Added more complex analysis for number of teams and number of threads in the target regions, also merged related common code between CGOpenMPRuntime and CGOpenMPRuntimeNVPTX classes. llvm-svn: 358126
* [OPENMP][NVPTX]Fixed processing of memory management directives.Alexey Bataev2019-04-081-14/+42
| | | | | | | | | | | Added special processing of the memory management directives/clauses for NVPTX target. For private locals, omp_default_mem_alloc and omp_thread_mem_alloc result in allocation in local memory. omp_const_mem_alloc allocates const memory, omp_teams_mem_alloc allocates shared memory, and omp_cgroup_mem_alloc and omp_large_cap_mem_alloc allocate global memory. llvm-svn: 357923
* [OPENMP]Fix a warning about unused variable, NFC.Alexey Bataev2019-03-211-0/+1
| | | | llvm-svn: 356715
* [OPENMP] Simplify codegen for allocate directive on local variables.Alexey Bataev2019-03-211-0/+24
| | | | | | | Simplified codegen for the allocate directive for local variables, initial implementation of the codegen for NVPTX target. llvm-svn: 356710
* [OPENMP]Codegen support for allocate directive on global variables.Alexey Bataev2019-03-211-0/+28
| | | | | | | | | For the global variables the allocate directive must specify only the predefined allocator. This allocator must be translated into the correct form of the address space for the targets that support different address spaces. llvm-svn: 356702
* [OPENMP]Remove unused parameter, NFC.Alexey Bataev2019-03-191-3/+3
| | | | | | | Parameter CodeGenModule &CGM is not required for CGOpenMPRuntime member functions, since class holds the reference to the CGM. llvm-svn: 356480
* [OPENMP] Codegen for local variables with the allocate pragma.Alexey Bataev2019-03-191-0/+3
| | | | | | | | Added initial codegen for the local variables with the #pragma omp allocate directive. Instead of allocating the variables on the stack, __kmpc_alloc|__kmpc_free functions are used for memory (de-)allocation. llvm-svn: 356472
* [OPENMP][NVPTX]Fix PR40893: Size doesn't match forAlexey Bataev2019-03-131-2/+8
| | | | | | | | | | | '_openmp_teams_reductions_buffer_$_. nvlink does not handle weak linkage correctly, same symbols with the different sizes are reported as erroneous though the largest size must be chosen instead. Patch fixes this problem by using Internal linkage instead of the Common. llvm-svn: 356072
* [OPENMP]Remove debug service variable.Alexey Bataev2019-03-081-14/+0
| | | | | | Removed not required service variable for the debug info. llvm-svn: 355729
* [OPENMP 5.0]Add initial support for 'allocate' directive.Alexey Bataev2019-03-071-1/+5
| | | | | | | Added parsing/sema analysis/serialization/deserialization support for 'allocate' directive. llvm-svn: 355614
* [OPENMP]Target region: emit const firstprivates as globals with constantAlexey Bataev2019-03-051-0/+8
| | | | | | | | | | | | memory. If the variable with the constant non-scalar type is firstprivatized in the target region, the local copy is created with the data copying. Instead, we allocate the copy in the constant memory and avoid extra copying in the outlined target regions. This global copy is used in the target regions without loss of the performance. llvm-svn: 355418
* [OPENMP][NVPTX]Use faster teams reduction algorithm.Alexey Bataev2019-02-201-114/+607
| | | | | | | A faster way to reduce the values in teams reductions was found, the codegen is updated to use this faster algorithm and new runtime functions. llvm-svn: 354479
* [opaque pointer types] Cleanup CGBuilder's Create*GEP.James Y Knight2019-02-091-38/+20
| | | | | | | | | | | | | | The various EltSize, Offset, DataLayout, and StructLayout arguments are all computable from the Address's element type and the DataLayout which the CGBuilder already has access to. After having previously asserted that the computed values are the same as those passed in, now remove the redundant arguments from CGBuilder's Create*GEP functions. Differential Revision: https://reviews.llvm.org/D57767 llvm-svn: 353629
* [opaque pointer types] Pass function types for runtime function calls.James Y Knight2019-02-051-27/+28
| | | | | | | | | | | | | Emit{Nounwind,}RuntimeCall{,OrInvoke} have been modified to take a FunctionCallee as an argument, and CreateRuntimeFunction has been modified to return a FunctionCallee. All callers have been updated. Additionally, CreateBuiltinFunction is removed, as it was redundant with CreateRuntimeFunction after some previous changes. Differential Revision: https://reviews.llvm.org/D57668 llvm-svn: 353184
* [OpenMP 5.0] Parsing/sema support for "omp declare mapper" directive.Michael Kruse2019-02-011-0/+4
| | | | | | | | | | | | | | | | | This patch implements parsing and sema for "omp declare mapper" directive. User defined mapper, i.e., declare mapper directive, is a new feature in OpenMP 5.0. It is introduced to extend existing map clauses for the purpose of simplifying the copy of complex data structures between host and device (i.e., deep copy). An example is shown below: struct S { int len; int *d; }; #pragma omp declare mapper(struct S s) map(s, s.d[0:s.len]) // Memory region that d points to is also mapped using this mapper. Contributed-by: Lingda Li <lildmh@gmail.com> Differential Revision: https://reviews.llvm.org/D56326 llvm-svn: 352906
* [OPENMP][NVPTX]Emit service debug variable for NVPTX.Alexey Bataev2019-01-281-0/+14
| | | | | | | In case of the empty module, the ptxas tool may emit error message about empty debug info sections. This patch fixes this bug. llvm-svn: 352421
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* [OPENMP]Add call to __kmpc_push_target_tripcount() function.Alexey Bataev2019-01-071-2/+2
| | | | | | | | | | Each we create the target regions with the teams distribute inner region, we can better estimate number of the teams required to execute the target region. Function __kmpc_push_target_tripcount() is used for purpose, which accepts device_id and the number of the iterations, performed by the associated loop. llvm-svn: 350571
* [OPENMP][NVPTX]Reduce number of barriers in reductions.Alexey Bataev2019-01-071-7/+0
| | | | | | | After the fix for the syncthreads we don't need to generate extra barriers for the parallel reductions. llvm-svn: 350530
* [OPENMP][NVPTX]Use new functions from the runtime library.Alexey Bataev2019-01-041-30/+43
| | | | | | Updated codegen to use the new functions from the runtime library. llvm-svn: 350415
* [OPENMP][NVPTX]Use __kmpc_barrier_simple_spmd(nullptr, 0) instead ofAlexey Bataev2019-01-031-12/+29
| | | | | | | | | | nvvm_barrier0. Use runtime functions instead of the direct call to the nvvm intrinsics. It allows to prevent some dangerous LLVM optimizations, that breaks the code for the NVPTX target. llvm-svn: 350328
* [OPENMP][NVPTX]Emit shared memory buffer for reduction as 128 bytesAlexey Bataev2018-12-181-0/+16
| | | | | | | | | | | buffer. Seems to me, nvlink has a bug with the proper support of the weakly linked symbols. It does not allow to define several shared memory buffer with the different sizes even with the weak linkage. Instead we always use 128 bytes buffer to prevent nvlink from the error message emission. llvm-svn: 349540
* [OPENMP][NVPTX]Added extra sync point to the inter-warp copy function.Alexey Bataev2018-12-181-0/+5
| | | | | | | The parallel reduction operation requires an extra synchronization point in the inter-warp copy function to avoid divergence. llvm-svn: 349525
* [OPENMP][NVPTX]Improved interwarp copy function.Alexey Bataev2018-12-141-33/+12
| | | | | | | | | Inlined runtime with the current implementation of the interwarp copy function leads to the undefined behavior because of the not quite correct implementation of the barriers. Start using generic __kmpc_barier function instead of the custom made barriers. llvm-svn: 349192
* Misc typos fixes in ./lib folderRaphael Isemann2018-12-101-1/+1
| | | | | | | | | | | | | | Summary: Found via `codespell -q 3 -I ../clang-whitelist.txt -L uint,importd,crasher,gonna,cant,ue,ons,orign,ned` Reviewers: teemperor Reviewed By: teemperor Subscribers: teemperor, jholewinski, jvesely, nhaehnle, whisperity, jfb, cfe-commits Differential Revision: https://reviews.llvm.org/D55475 llvm-svn: 348755
* [OPENMP][NVPTX] Fix globalization of the mapped array sections.Alexey Bataev2018-12-061-3/+5
| | | | | | | | | If the array section is based on pointer and this sections is mapped in target region + then it is used in the inner parallel region, it also must be globalized as the pointer itself is passed by value, not by reference. llvm-svn: 348492
* [OPENMP][NVPTX]Fixed emission of the critical region.Alexey Bataev2018-12-041-2/+4
| | | | | | | | | | | | | Critical regions in NVPTX are the constructs, which, generally speaking, are not supported by the NVPTX target. Instead we're using special technique to handle the critical regions. Currently they are supported only within the loop and all the threads in the loop must execute the same critical region. Inside of this special regions the regions still must be emitted as critical, to avoid possible data races between the teams + synchronization must use __kmpc_barrier functions. llvm-svn: 348272
* [OPENMP][NVPTX]Mark __kmpc_barrier functions as convergent.Alexey Bataev2018-12-041-0/+25
| | | | | | | | __kmpc_barrier runtime functions must be marked as convergent to prevent some dangerous optimizations. Also, for NVPTX target all barriers must be emitted as simple barriers. llvm-svn: 348271
* [OPENMP][NVPTX]Call get __kmpc_global_thread_num in worker afterAlexey Bataev2018-11-291-0/+4
| | | | | | | | | initialization. Function __kmpc_global_thread_num should be called only after initialization, not earlier. llvm-svn: 347919
* [OpenMP] Add a new version of the SPMD deinit kernel functionGheorghe-Teodor Bercea2018-11-291-7/+11
| | | | | | | | | | | | | | Summary: This patch adds a new runtime for the SPMD deinit kernel function which replaces the previous function. The new function takes as argument the flag which signals whether the runtime is required or not. This enables the compiler to optimize out the part of the deinit function which are not needed. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D54970 llvm-svn: 347915
* [OPENMP][NVPTX]Basic support for reductions across the teams.Alexey Bataev2018-11-271-372/+108
| | | | | | Added basic codegen support for the reductions across the teams. llvm-svn: 347715
* [OPENMP][NVPTX]Emit default locations with the correct Exec|RuntimeAlexey Bataev2018-11-261-15/+42
| | | | | | | | | | | modes. If the region is inside target|teams|distribute region, we can emit the locations with the correct info for execution mode and runtime mode. Patch adds this ability to the NVPTX codegen to help the optimizer to produce better code. llvm-svn: 347583
* [OPENMP][NVPTX]Emit default locations as constant with undefined mode.Alexey Bataev2018-11-211-0/+20
| | | | | | | | | | | | For the NVPTX target default locations should be emitted as constants + additional info must be emitted in the reserved_2 field of the ident_t structure. The 1st bit controls the execution mode and the 2nd bit controls use of the lightweight runtime. The combination of the bits for Non-SPMD mode + lightweight runtime represents special undefined mode, used outside of the target regions for orphaned directives or functions. Should allow and additional optimization inside of the target regions. llvm-svn: 347425
* [OpenMP] Check target architecture supports unified shared memory for ↵Patrick Lyster2018-11-191-49/+107
| | | | | | requires directive. Differential Review: https://reviews.llvm.org/D54493 llvm-svn: 347214
* Fix unused variable warning.David L. Jones2018-11-171-0/+2
| | | | llvm-svn: 347133
* [OPENMP][NVPTX]Emit correct reduction code for teams/parallelAlexey Bataev2018-11-161-165/+242
| | | | | | | | | | | reductions. Fixed previously committed code for the reduction support in teams/parallel constructs taking into account new design of the NVPTX support in the compiler. Teams reduction are not fully functional yet, it is going to be fixed in the following patches. llvm-svn: 347081
* [OPENMP][NVPTX]Extend number of constructs executed in SPMD mode.Alexey Bataev2018-11-091-45/+65
| | | | | | | | | If the statements between target|teams|distribute directives does not require execution in master thread, like constant expressions, null statements, simple declarations, etc., such construct can be xecuted in SPMD mode. llvm-svn: 346551
* [OPENMP][NVPTX]Allow to use shared memory for theAlexey Bataev2018-11-091-53/+97
| | | | | | | | | | target|teams|distribute variables. If the total size of the variables, declared in target|teams|distribute regions, is less than the maximal size of shared memory available, the buffer is allocated in the shared memory. llvm-svn: 346507
OpenPOWER on IntegriCloud