summaryrefslogtreecommitdiffstats
path: root/openmp/libomptarget
Commit message (Collapse)AuthorAgeFilesLines
...
* [libomptarget][NVPTX] Drop dead code and data structures, NFCI.Jonas Hahnfeld2018-09-048-189/+2
| | | | | | | | | | | | * cg and HasCancel in WorkDescr were never read and can be removed. * This eliminates the last use of priv in ThreadPrivateContext. * CounterGroup is unused afterwards. * Remove duplicate external declares in omptarget-nvptx.cu that are already in the header omptarget-nvptx.h. Differential Revision: https://reviews.llvm.org/D51622 llvm-svn: 341370
* [libomptarget][NVPTX] Fix __kmpc_spmd_kernel_deinitJonas Hahnfeld2018-09-031-1/+1
| | | | | | | | If the runtime is uninitialized the master thread must Enqueue the state object, and ALL threads must return immediately. Found post-commit of https://reviews.llvm.org/D51222. llvm-svn: 341328
* [OPENMP][NVPTX] Replace assert() by ASSERT0() macro, NFC.Alexey Bataev2018-08-299-64/+71
| | | | | | Required to fix the buildbots. llvm-svn: 340956
* [OPENMP][NVPTX] Lightweight runtime support for SPMD mode.Alexey Bataev2018-08-2911-45/+263
| | | | | | | | | | | | | | | | Summary: Implemented simple and lightweight runtime support for SPMD mode-based constructs. It adds support for L2 sequential parallelism wihtout full runtime support. Also, patch fixes some use cases for uninitialized|lightweight runtime. Reviewers: grokos, kkwli0, Hahnfeld, gtbercea Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D51222 llvm-svn: 340944
* [OpenMP][libomptarget] rework of fatal error reportingAlexandre Eichenberger2018-08-273-23/+21
| | | | | | | | | | | | | | | | Summary: Removed the function that used a lock and varargs Used the same mechanism as for debug messages Reviewers: ABataev, gtbercea, grokos, Hahnfeld Reviewed By: gtbercea, Hahnfeld Subscribers: mikerice, ABataev, RaviNarayanaswamy, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D51226 llvm-svn: 340767
* [OpenMP][libomptarget] Bringing up to spec with respect to ↵Alexandre Eichenberger2018-08-234-66/+174
| | | | | | | | | | | | | | | | | | OMP_TARGET_OFFLOAD env var Summary: Right now, only the OMP_TARGET_OFFLOAD=DISABLED was implemented. Added support for the other MANDATORY and DEFAULT values. Reviewers: gtbercea, ABataev, grokos, caomhin, Hahnfeld Reviewed By: Hahnfeld Subscribers: protze.joachim, gtbercea, AlexEichenberger, RaviNarayanaswamy, Hahnfeld, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D50522 llvm-svn: 340542
* [OPNEMP, NVPTX] Fixed sychronization construct + code cleanup.Alexey Bataev2018-07-234-54/+25
| | | | | | | | | | | | | | | | | Summary: 1. Fixed internal problem in `__kmpc_barrier` function: SPMD mode synchronization function should be called only in L1 parallel level. 2. Removed some extra code for synchronization inside of the code, used `__kmpc_barrier` instead. 3. Some code cleanup. Reviewers: gtbercea, grokos Subscribers: openmp-commits Differential Revision: https://reviews.llvm.org/D49564 llvm-svn: 337691
* [OpenMP][libomptarget] New map interface: remove translation code and ensure ↵George Rokos2018-07-193-330/+145
| | | | | | | | | | | | | | | proper alignment of struct members This patch removes the translation code since this functionality is now implemented in the compiler. target_data_begin and target_data_end are also patched to handle some special cases that used to be handled by the obsolete translation function, namely ensure proper alignment of struct members when we have partially mapped structs. Mapping a struct from a higher address (i.e. not from its beginning) can result in distortion of the alignment for some of its member fields. Padding restores the original (proper) alignment. Differential revision: https://reviews.llvm.org/D44186 llvm-svn: 337455
* [libomptarget] Also support several images for elfJoachim Protze2018-07-181-4/+5
| | | | | | | | | | | | | | | | | | | | In revision r336569 (D49036) libomptarget support for multiple nvidia images has been fixed in case a target region resides inside one or multiple libraries and in the compiled application. But the issues is still present for elf images. This fix will also support multiple images for elf. Patch by Jannis Klinkenberg Reviewers: protze.joachim, ABataev, grokos Reviewed By: protze.joachim, ABataev, grokos Subscribers: openmp-commits Differential Revision: https://reviews.llvm.org/D49418 llvm-svn: 337355
* [cmake] Fix libomptarget/test/CMakeLists.txtAzharuddin Mohammed2018-07-151-2/+2
| | | | | | | | | | | | | | | | | Summary: Should be variable name instead of variable reference. If the variable is somehow unset, it messes up the if condition expression and causes a CMake error. Reviewers: jlpeyton, AndreyChurbanov, Hahnfeld Reviewed By: Hahnfeld Subscribers: mgorny, llvm-commits, openmp-commits Differential Revision: https://reviews.llvm.org/D47221 llvm-svn: 337133
* [OpenMP][libomptarget] Fix data sharing and globalization infrastructure to ↵Gheorghe-Teodor Bercea2018-07-133-105/+72
| | | | | | | | | | | | | | | | work in SPMD mode Summary: This patch fixes the data sharing infrastructure to work for the SPMD and non-SPMD cases. Reviewers: ABataev, grokos, carlo.bertolli, caomhin Reviewed By: ABataev, grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D49204 llvm-svn: 337013
* [OPENMP, NVPTX] Fix loop boundaries calculation for dynamic loops.Alexey Bataev2018-07-124-16/+50
| | | | | | | | | | | | | | | | | | | Summary: Patch fixes the next problems. 1. Removes unused functions from omptarget_nvptx_ThreadPrivateContext class + simplified data members. 2. Fixed calculation of loop boundaries for dynamic loops with static scheduling. 3. Introduced saving/restoring of the dynamic loop boundaries to support several nested parallel dynamic loops. Reviewers: grokos Subscribers: guansong, kkwli0, openmp-commits Differential Revision: https://reviews.llvm.org/D49241 llvm-svn: 336915
* [OPENMP, NVPTX] Support several images in the executable.Alexey Bataev2018-07-091-5/+6
| | | | | | | | | | | | | | | | Summary: Currently Cuda plugin supports loading of the single image, though we may have the executable with the several images, if it has target regions inside of the dynamically loaded library. Patch allows to load multiple images. Reviewers: grokos Subscribers: guansong, openmp-commits, kkwli0 Differential Revision: https://reviews.llvm.org/D49036 llvm-svn: 336569
* [OPENMP, NVPTX] Sync threads before start ordered loops.Alexey Bataev2018-06-291-1/+6
| | | | | | | | | | | | Summary: Threads must be synchronized before starting ordered construct. Reviewers: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D48732 llvm-svn: 335987
* [OPENMP, NVPTX] Fixes for NVPTX RTLAlexey Bataev2018-06-253-32/+36
| | | | | | | | | | | | | | | | | | Summary: Patch fixes several problems in the implementation of NVPTX RTL. 1. Detection of the last iteration for loops with static scheduling, no chunks. 2. Fixes reductions for the serialized parallel constructs. 3. Fixes handling of the barriers. Reviewers: grokos Reviewed By: grokos Subscribers: Hahnfeld, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D48480 llvm-svn: 335469
* [OpenMP] [CUDA] Expose teamid to the debug pathGuansong Zhang2018-06-191-1/+1
| | | | | | | | | | | | | | | | Summary: Small bug fix for debug build. A previous fix causing trouble for debug build. Reviewers: grokos Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D48286 llvm-svn: 335046
* [libomptarget-nvptx] loop: Determine if runtime uninitializedJonas Hahnfeld2018-05-251-38/+42
| | | | | | | | | | | | | | | | | | | | The generic entry points for static loop scheduling previously hardcoded that the runtime was initialized. This can be wrong if the compiler analyzes that the runtime is not needed and calls the init functions accordingly. This didn't affect clang-ykt because they have entry points for different combinations of SPMD x Runtime not needed. I didn't do measurements yet but with inlining we might get away with always calling the generic interface and letting compiler and runtime figure out the rest. In any case, a correct runtime is always better than having functions that may only be called if previous calls passed in a specific set of arguments! Differential Revision: https://reviews.llvm.org/D47131 llvm-svn: 333285
* [CMake] Unify install path for librariesJonas Hahnfeld2018-05-254-6/+6
| | | | | | | | | | Introduce OPENMP_INSTALL_LIBDIR and use in all install() commands. This also fixes installation of libomptarget-nvptx that previously didn't honor {OPENMP,LLVM}_LIBDIR_SUFFIX. Differential Revision: https://reviews.llvm.org/D47130 llvm-svn: 333284
* [CUDA]Fix dynamic|guided scheduling.George Rokos2018-05-241-57/+50
| | | | | | | | | | | | | The existing implementation of the dynamic scheduling breaks the contract introduced by the original openmp runtime and, thus, is incorrect. Patch fixes it and introduces correct dynamic scheduling model. Thanks to Alexey Bataev for submitting this patch. Differential Revision: https://reviews.llvm.org/D47333 llvm-svn: 333225
* [libomptarget-nvptx-bc] Pass found CUDA installationsJonas Hahnfeld2018-05-161-1/+1
| | | | | | | | | | | | | We already know where the CUDA SDK is, so there is no point in letting Clang search for it again and possibly finding no or a different installation. --cuda-path is supported since the beginning of CUDA support in Clang, so making this required doesn't impose additional restrictions. Differential Revision: https://reviews.llvm.org/D46930 llvm-svn: 332495
* [libomptarget-nvptx] Test bitcode compiler flags and enable by defaultJonas Hahnfeld2018-05-162-103/+180
| | | | | | | | | | | | | | Move all logic related to selecting the bitcode compiler and linker into a new file and dynamically test required compiler flags. This also adds -fcuda-rdc for Clang trunk as previously attempted in D44992 which fixes the build. As a result this change also enables building the library by default if all prerequisites are met. Differential Revision: https://reviews.llvm.org/D46901 llvm-svn: 332494
* [OpenMP][libomptarget] Add function for checking SPMD modeGheorghe-Teodor Bercea2018-05-152-0/+8
| | | | | | | | | | | | | | Summary: Add function to the NVPTX libomptarget library that will return true if the current target region is being executed in SPMD mode. Reviewers: ABataev, grokos, carlo.bertolli, caomhin Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D46840 llvm-svn: 332360
* [OpenMP] Use LIBOMPTARGET_DEVICE_RTL_DEBUG env var to control debug messages ↵Guansong Zhang2018-05-044-2/+69
| | | | | | | | | | | | | | | | | | | | | | | on the device side Summary: Enable the device side debug messages at compile time, use env var to control at runtime. To achieve this, an environment data block is passed to the device lib when it is loaded. By default, the message is off, to enable it, a user need to set LIBOMPDEVICE_DEBUG=1. Reviewers: grokos Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D46210 llvm-svn: 331550
* [OpenMP] Remove compilation warning when using clang to compile bc files.Guansong Zhang2018-04-265-14/+16
| | | | | | | | | | | | | | | | Summary: Minor printf format correction. NVCC ignore those. Clang will give warning on these if debug is enabled. Reviewers: grokos Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D45528 llvm-svn: 330944
* [OpenMP] Make bc file compilation sensitive to LIBOMPTARGET_NVPTX_DEBUG flagGuansong Zhang2018-04-201-1/+7
| | | | | | | | | | | | | | Summary: The LIBOMPTARGET_NVPTX_DEBUG flag is inconsistent between using nvcc to generate .a file and clang to generate .bc file. Sync the two setting so we can get debug messages from the bc file path as well. Reviewers: grokos Subscribers: Hahnfeld, openmp-commits, mgorny Tags: #openmp Differential Revision: https://reviews.llvm.org/D45530 llvm-svn: 330477
* [OpenMP] Remove extra warning when we buildGuansong Zhang2018-04-101-1/+1
| | | | | | | | | | | | | | | | | | | Summary: This one line change is to remove this warning message "warning: integer conversion resulted in a change of sign" Reviewers: grokos Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D45415 llvm-svn: 329713
* Revert "[OpenMP] enable bc file compilation using the latest clang"Guansong Zhang2018-04-091-1/+0
| | | | | | This reverts commit 6849e31c36d712d97433bca9af39b7a09c8c1207. llvm-svn: 329576
* [OpenMP] enable bc file compilation using the latest clangGuansong Zhang2018-04-031-0/+1
| | | | | | | | | | | | | | | | Summary: adding cuda-rdc flag to allow extern global data Reviewers: grokos Reviewed By: grokos Subscribers: gregrodgers, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D44992 llvm-svn: 329072
* [OpenMP][libomptarget] Initialize global memory stack only once.Gheorghe-Teodor Bercea2018-03-212-3/+15
| | | | | | | | | | | | | | Summary: The global stack initialization function may be called multiple times. The initialization of the shared memory slots should only happen when the function is called for the first time for a given warp master thread. Reviewers: grokos, carlo.bertolli, ABataev, caomhin Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D44754 llvm-svn: 328148
* [OpenMP][libomptarget] Fix master warp checkGheorghe-Teodor Bercea2018-03-213-13/+24
| | | | | | | | | | | | | | Summary: The check for the master warp must take into consideration the actual number of warps: the master warp is equal to the last active warp not necessarily WARPSIZE - 1. Reviewers: grokos, carlo.bertolli, ABataev, caomhin Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D44537 llvm-svn: 328146
* [OpenMP][libomptarget] Enable globalization for workersGheorghe-Teodor Bercea2018-03-211-33/+35
| | | | | | | | | | | | | | | | Summary: This patch allows worker to have a global memory stack managed by the runtime. This patch is needed for completeness and consistency with the globalization policy: if a worker-side variable escapes the current context it then needs to be globalized. Until now, only the master thread was allowed to have such a stack. These global values can now potentially be shared amongst workers if the semantics of the OpenMP program require it. Reviewers: ABataev, grokos, carlo.bertolli, caomhin Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D44487 llvm-svn: 328144
* Bugfix, extern declarations for libomp functions are `extern "C"` declarationsGeorge Rokos2018-03-171-0/+6
| | | | llvm-svn: 327763
* Moved extern declarations to private header file, they are only used from ↵George Rokos2018-03-162-4/+4
| | | | | | within libomptarget, they don't need to be in omptarget.h. llvm-svn: 327740
* [OpenMP][libomptarget] Enable usage of shared memory slotsGheorghe-Teodor Bercea2018-03-151-15/+1
| | | | | | | | | | | | | | | | | Summary: Allow the runtime to use the existing shared memory statically allocated slots. When a variable is globalized, the underlying memory can be either shared or global memory (both have block-wide visibility). In this case, we allow that the storage to use a limited amount of shared memory that has been statically allocated already. Only if shared memory doesn't prove to be enough do we then invoke malloc() to create a new global memory slot. Reviewers: ABataev, carlo.bertolli, grokos, caomhin Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D44486 llvm-svn: 327639
* [OpenMP][libomptarget] Enable multiple frames per global memory slotGheorghe-Teodor Bercea2018-03-153-47/+121
| | | | | | | | | | | | | | Summary: To save on calls to malloc, this patch enables the re-use of pre-allocated global memory slots. Reviewers: ABataev, grokos, carlo.bertolli, caomhin Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D44470 llvm-svn: 327637
* [libomptarget][nvptx] Bug fix: Correctly identify the warp master active thread.George Rokos2018-03-141-1/+2
| | | | llvm-svn: 327556
* [OpenMP][libomptarget] Add global memory data sharing support for ↵Gheorghe-Teodor Bercea2018-03-135-0/+222
| | | | | | | | | | | | | | | | | | | | | | master-worker sharing. Summary: This patch adds support for the sharing of variables from the master thread of a team to the worker threads of the team. The runtime uses a stack structure implemented as a doubly-linked list of slots with each slot having the exact same size as the size requested. This implementation leverages existing data structures. The runtime functions are added as separate functions to avoid interfering with the current interface. Limitations to be addressed in future patches: - This current patch only employs global memory. In a future patch we will enable to usage for shared memory as an optimization. - Allow the allocation of several requested sizes in the same slot. Reviewers: ABataev, grokos, caomhin, carlo.bertolli Reviewed By: grokos Subscribers: Hahnfeld, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D44260 llvm-svn: 327440
* [OpenMP][libomptarget] Fix union.Gheorghe-Teodor Bercea2018-03-082-45/+41
| | | | | | | | | | | | | | Summary: To make the two parts of the union have the same size, the size of vect needs to be increased by 16 bits. Reviewers: grokos, carlo.bertolli, caomhin, ABataev Reviewed By: grokos, ABataev Subscribers: fedor.sergeev, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D44254 llvm-svn: 327040
* [OpenMP] Remove implicit data sharing using device shared memory from ↵Gheorghe-Teodor Bercea2018-03-076-65/+1
| | | | | | | | | | | | | | | | | | | | libomptarget Summary: This patch reverts the changes to libomptarget that were coupled with the changes to Clang code gen for data sharing using shared memory. A similar patch exists for Clang: D43625 Shared memory is meant to be used as an optimization on top of a more general scheme. So far we didn't have a global memory implementation ready so shared memory was a solution which applied to the current level of OpenMP complexity supported by trunk on GPU devices (due to the missing NVPTX backend patch this functionality has never been exercised). Now that we have a global memory solution this patch is "in the way" and needs to be removed (for now). This patch (or an equivalent version of it) will be put out for review once the global memory scheme is in place. Reviewers: ABataev, grokos, carlo.bertolli, caomhin Reviewed By: grokos Subscribers: Hahnfeld, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D43626 llvm-svn: 326950
* [OpenMP][libomptarget] Enable the compilation of multiple bc libraries for ↵Gheorghe-Teodor Bercea2018-02-121-39/+49
| | | | | | | | | | | | | | | | | | | runtime inlining Summary: Different NVIDIA GPUs support different compute capabilities. To enable the inlining of runtime functions and the best performance on different generations of NVIDIA GPUs, a bc library for each compute capability needs to be compiled. The same compiler build will then be usable in conjunction with multiple generations of NVIDIA GPUs. To differentiate between versions of the same bc lib, the output file name will contain the compute capability ID. Depends on D14254 Reviewers: Hahnfeld, hfinkel, carlo.bertolli, caomhin, ABataev, grokos Reviewed By: Hahnfeld, grokos Subscribers: guansong, mgorny, openmp-commits Differential Revision: https://reviews.llvm.org/D41724 llvm-svn: 324904
* [libomptarget] Fix detection of CUDA stubs libraryJonas Hahnfeld2018-02-121-1/+10
| | | | | | | CUDA_LIBRARIES contains additional linker arguments since CMake 3.3 which breakes the current way of finding the stubs library. llvm-svn: 324879
* [OpenMP][libomptarget] Add data sharing support in libomptargetGheorghe-Teodor Bercea2018-02-076-1/+65
| | | | | | | | | | | | | | Summary: This patch extends the libomptarget functionality in patch D14254 with support for the data sharing scheme for supporting implicitly shared variables. The runtime therefore maintains a list of references to shared variables. Reviewers: carlo.bertolli, ABataev, Hahnfeld, grokos, caomhin, hfinkel Reviewed By: Hahnfeld, grokos Subscribers: guansong, llvm-commits, openmp-commits Differential Revision: https://reviews.llvm.org/D41485 llvm-svn: 324495
* [OpenMP-RT] Fix debug string for NVPTX runtime libraryCarlo Bertolli2018-02-011-1/+1
| | | | | | | | https://reviews.llvm.org/D42757 The method ThreadsInTeam is used to determine the number of threads to be used in a parallel region under SPMD mode (see line 127 of supporti.h in libomptarget/deviceRTLs/nvptx/src/). This patch fixes the corresponding debug print upon initialization of the kernel in SPMD mode. llvm-svn: 323978
* [libomptarget] Check for library with CUDA Driver APIJonas Hahnfeld2018-01-302-40/+67
| | | | | | | | | | | | | That's what we really need to link the CUDA plugin against, not the CUDA runtime API in CUDA_LIBRARIES! While the latter comes with the CUDA SDK, the Driver API is installed with the kernel driver and there is at most one per system. As fallback we can use the stubs library distributed with the CUDA SDK for linking. Differential Revision: https://reviews.llvm.org/D42643 llvm-svn: 323787
* [libomptarget] Only use CUDA Driver APIJonas Hahnfeld2018-01-301-11/+9
| | | | | | | | | | Use equivalents for the last calls to the Runtime API. Remove stray assert in case of an error found during review, we should only return OFFLOAD_FAIL. Differential Revision: https://reviews.llvm.org/D42686 llvm-svn: 323786
* [OpenMP] Initial implementation of OpenMP offloading library - libomptarget ↵George Rokos2018-01-2926-0/+5862
| | | | | | | | | | | | device RTLs. This patch implements the device runtime library whose interface is used in the code generation for OpenMP offloading devices. Currently there is a single device RTL written in CUDA meant to CUDA enabled GPUs. The interface is a variation of the kmpc interface that includes some extra calls to do thread and storage management that only make sense for a GPU target. Differential revision: https://reviews.llvm.org/D14254 llvm-svn: 323649
* Sprinkle a few <cstdlib> includes, for libomptarget sources usingDimitry Andric2018-01-183-0/+3
| | | | | | malloc, free, alloca and getenv. NFCI. llvm-svn: 322869
* Add missing headers for Debug buildsJonas Hahnfeld2018-01-182-0/+2
| | | | llvm-svn: 322830
* Unify build documentation and convert to reStructuredTextJonas Hahnfeld2017-12-272-149/+6
| | | | | | | | | | | We now have several options that apply for both libraries and they shouldn't be documented in multiple files. When already merging the two Build_With_CMake.txt documents, convert them to reStructuredText which is used for all of LLVM's documentation. Differential Revision: https://reviews.llvm.org/D40920 llvm-svn: 321481
* [libomptarget] Split implementation of interface functionsJonas Hahnfeld2017-12-064-484/+520
| | | | | | | | | | | | This last of four patches adds a new file for the interface functions that Clang uses during code generation. The only change except simply moving the current code is renaming the function CheckDeviceAndCtors() and using the correct type for 64bit device ids. Differential Revision: https://reviews.llvm.org/D40801 llvm-svn: 319972
OpenPOWER on IntegriCloud