path: root/openmp
Commit message | Author | Age | Files | Lines
...
* Move blocktime_str variable right before its first use | Jonathan Peyton | 2018-03-26 | 1 | -3/+3
    llvm-svn: 328575
* Add summarizeStats.py to tools directory | Jonathan Peyton | 2018-03-26 | 1 | -0/+323
    The summarizeStats.py script processes raw data provided by the instrumented (stats-gathering) OpenMP* runtime library. It provides:
    1) A radar chart which plots counters as frequency (per GigaTick) of use within the program. The frequencies are plotted as log10; however, values less than one are kept as they are and shown in red. This was done to help visualize the differences better.
    2) Pie charts separating total time into compute and non-compute. The compute and non-compute times have their own pie charts showing the constructs that contributed to them. The percentages listed are with respect to the total time.
    3) A '.csv' file with the percentage of time spent within the different constructs.
    The script can be used as:
        $ python $PATH_TO_SCRIPT/summarizeStats.py instrumented1.csv instrumented2.csv
    Patch by Taru Doodi
    Differential Revision: https://reviews.llvm.org/D41838
    llvm-svn: 328568
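The radar-chart scaling described in item 1 can be sketched as follows. This is only an illustration of the stated rule, not code from summarizeStats.py; the function name `radar_value` and the color label `"default"` are hypothetical.

```python
import math


def radar_value(freq_per_gigatick):
    """Return (plotted_value, color) for one counter frequency.

    Assumed rule, taken from the commit message: frequencies of at least
    one are plotted as log10(freq); smaller values are plotted unchanged
    and flagged red.
    """
    if freq_per_gigatick < 1.0:
        # log10 would be negative here, so the raw value is kept instead.
        return freq_per_gigatick, "red"
    return math.log10(freq_per_gigatick), "default"
```

Keeping sub-one values unscaled avoids negative radii on the chart, which is presumably why the script marks them in a different color.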
* Fixed __kmpc_get_target_offload() to call library initialization. | Andrey Churbanov | 2018-03-22 | 1 | -1/+6
    Differential Revision: https://reviews.llvm.org/D44793
    llvm-svn: 328228
* [OpenMP][libomptarget] Initialize global memory stack only once. | Gheorghe-Teodor Bercea | 2018-03-21 | 2 | -3/+15
    Summary:
    The global stack initialization function may be called multiple times. The initialization of the shared memory slots should only happen when the function is called for the first time for a given warp master thread.
    Reviewers: grokos, carlo.bertolli, ABataev, caomhin
    Reviewed By: grokos
    Subscribers: guansong, openmp-commits
    Differential Revision: https://reviews.llvm.org/D44754
    llvm-svn: 328148
* [OpenMP][libomptarget] Fix master warp check | Gheorghe-Teodor Bercea | 2018-03-21 | 3 | -13/+24
    Summary:
    The check for the master warp must take into consideration the actual number of warps: the master warp is the last active warp, not necessarily WARPSIZE - 1.
    Reviewers: grokos, carlo.bertolli, ABataev, caomhin
    Reviewed By: grokos
    Subscribers: guansong, openmp-commits
    Differential Revision: https://reviews.llvm.org/D44537
    llvm-svn: 328146
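A small Python model of the arithmetic behind this fix, assuming a one-dimensional thread block and a warp size of 32. The function names are hypothetical and the real check lives in the CUDA device runtime.

```python
WARPSIZE = 32  # threads per warp on NVIDIA GPUs


def last_active_warp(num_threads):
    """Index of the last warp that holds at least one thread of the block."""
    return (num_threads - 1) // WARPSIZE


def is_master_warp(thread_id, num_threads):
    """True if thread_id belongs to the last active warp of the block."""
    return thread_id // WARPSIZE == last_active_warp(num_threads)
```

With 128 threads the last active warp has index 3, so comparing against WARPSIZE - 1 (31) would never match. The two values coincide only for a block of 1024 threads.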
* [OpenMP][libomptarget] Enable globalization for workers | Gheorghe-Teodor Bercea | 2018-03-21 | 1 | -33/+35
    Summary:
    This patch allows workers to have a global memory stack managed by the runtime. This patch is needed for completeness and consistency with the globalization policy: if a worker-side variable escapes the current context, it needs to be globalized. Until now, only the master thread was allowed to have such a stack. These global values can now potentially be shared amongst workers if the semantics of the OpenMP program require it.
    Reviewers: ABataev, grokos, carlo.bertolli, caomhin
    Reviewed By: grokos
    Subscribers: guansong, openmp-commits
    Differential Revision: https://reviews.llvm.org/D44487
    llvm-svn: 328144
* Read OMP_TARGET_OFFLOAD and provide API to access ICV | Jonathan Peyton | 2018-03-20 | 5 | -0/+64
    Added settings code to read OMP_TARGET_OFFLOAD environment variable. Added target-offload-var ICV as __kmp_target_offload, set via OMP_TARGET_OFFLOAD, if available, otherwise defaulting to DEFAULT. Valid values for the ICV are specified as enum values {0,1,2} for disabled, default, and mandatory. An internal API access function __kmpc_get_target_offload is provided.
    Patch by Terry Wilmarth
    Differential Revision: https://reviews.llvm.org/D44577
    llvm-svn: 328046
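The mapping from the environment variable to the ICV value can be modeled in a few lines of Python. This is a sketch under assumptions: the constant and function names are hypothetical, and both the case-insensitive matching and the fallback to DEFAULT for unrecognized strings are guesses rather than behavior confirmed by the commit.

```python
import os

# Enum values as stated in the commit message: disabled, default, mandatory.
TGT_DISABLED, TGT_DEFAULT, TGT_MANDATORY = 0, 1, 2

_NAMES = {
    "DISABLED": TGT_DISABLED,
    "DEFAULT": TGT_DEFAULT,
    "MANDATORY": TGT_MANDATORY,
}


def read_target_offload(environ=os.environ):
    """Return the target-offload ICV value derived from OMP_TARGET_OFFLOAD."""
    raw = environ.get("OMP_TARGET_OFFLOAD")
    if raw is None:
        return TGT_DEFAULT  # variable not set: default to DEFAULT
    # Assumption: match case-insensitively, fall back to DEFAULT otherwise.
    return _NAMES.get(raw.strip().upper(), TGT_DEFAULT)
```

Passing the environment as a parameter keeps the sketch testable without touching the process environment.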
* Fix for Fix for https://bugs.llvm.org/show_bug.cgi?id=36705. | Andrey Churbanov | 2018-03-19 | 1 | -0/+1
    Differential Revision: https://reviews.llvm.org/D44637
    llvm-svn: 327875
* Bugfix, extern declarations for libomp functions are `extern "C"` declarations | George Rokos | 2018-03-17 | 1 | -0/+6
    llvm-svn: 327763
* Moved extern declarations to private header file, they are only used from within libomptarget, they don't need to be in omptarget.h. | George Rokos | 2018-03-16 | 2 | -4/+4
    llvm-svn: 327740
* [OpenMP][libomptarget] Enable usage of shared memory slots | Gheorghe-Teodor Bercea | 2018-03-15 | 1 | -15/+1
    Summary:
    Allow the runtime to use the existing statically allocated shared memory slots. When a variable is globalized, the underlying memory can be either shared or global memory (both have block-wide visibility). In this case, we allow the storage to use a limited amount of shared memory that has already been statically allocated. Only if shared memory proves insufficient do we invoke malloc() to create a new global memory slot.
    Reviewers: ABataev, carlo.bertolli, grokos, caomhin
    Reviewed By: grokos
    Subscribers: guansong, openmp-commits
    Differential Revision: https://reviews.llvm.org/D44486
    llvm-svn: 327639
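The shared-first, global-fallback policy can be illustrated with a toy Python model. The class and attribute names are hypothetical, sizes are plain byte counts, and appending to a list stands in for the malloc() call; the actual runtime is CUDA code and is not reproduced here.

```python
class SlotAllocator:
    """Toy model: serve requests from shared memory, else fall back to global."""

    def __init__(self, shared_capacity):
        # Bytes of statically allocated shared memory still available.
        self.shared_free = shared_capacity
        # Sizes of the slots that had to be created in global memory.
        self.global_slots = []

    def allocate(self, size):
        """Return "shared" or "global" depending on where the request landed."""
        if size <= self.shared_free:
            self.shared_free -= size
            return "shared"
        self.global_slots.append(size)  # stands in for malloc()
        return "global"
```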
* [OpenMP][libomptarget] Enable multiple frames per global memory slot | Gheorghe-Teodor Bercea | 2018-03-15 | 3 | -47/+121
    Summary:
    To save on calls to malloc, this patch enables the re-use of pre-allocated global memory slots.
    Reviewers: ABataev, grokos, carlo.bertolli, caomhin
    Reviewed By: grokos
    Subscribers: guansong, openmp-commits
    Differential Revision: https://reviews.llvm.org/D44470
    llvm-svn: 327637
* [libomptarget][nvptx] Bug fix: Correctly identify the warp master active thread. | George Rokos | 2018-03-14 | 1 | -1/+2
    llvm-svn: 327556
* [OpenMP][libomptarget] Add global memory data sharing support for master-worker sharing. | Gheorghe-Teodor Bercea | 2018-03-13 | 5 | -0/+222
    Summary:
    This patch adds support for the sharing of variables from the master thread of a team to the worker threads of the team. The runtime uses a stack structure implemented as a doubly-linked list of slots, with each slot having exactly the size requested. This implementation leverages existing data structures. The runtime functions are added as separate functions to avoid interfering with the current interface.
    Limitations to be addressed in future patches:
    - This patch only employs global memory. In a future patch we will enable the use of shared memory as an optimization.
    - Allow the allocation of several requested sizes in the same slot.
    Reviewers: ABataev, grokos, caomhin, carlo.bertolli
    Reviewed By: grokos
    Subscribers: Hahnfeld, guansong, openmp-commits
    Differential Revision: https://reviews.llvm.org/D44260
    llvm-svn: 327440
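The data structure named in the summary, a stack built from a doubly-linked list of slots where each slot has exactly the requested size, can be sketched in Python. `Slot` and `GlobalStack` are hypothetical names, and a `bytearray` stands in for device global memory.

```python
class Slot:
    """One stack slot, sized exactly to the request that created it."""

    def __init__(self, size, prev=None):
        self.size = size
        self.data = bytearray(size)  # stands in for global memory
        self.prev = prev
        self.next = None


class GlobalStack:
    """Stack of slots linked in both directions; top is the newest slot."""

    def __init__(self):
        self.top = None

    def push(self, size):
        slot = Slot(size, prev=self.top)
        if self.top is not None:
            self.top.next = slot
        self.top = slot
        return slot

    def pop(self):
        slot = self.top
        if slot is None:
            raise IndexError("pop from empty stack")
        self.top = slot.prev
        if self.top is not None:
            self.top.next = None
        slot.prev = None
        return slot
```

One slot per request keeps the bookkeeping simple. Packing several requests into one slot is listed in the commit message as a later improvement.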
* fix a typo on the website | Sylvestre Ledru | 2018-03-11 | 1 | -1/+1
    llvm-svn: 327237
* [OpenMP][libomptarget] Fix union. | Gheorghe-Teodor Bercea | 2018-03-08 | 2 | -45/+41
    Summary:
    To make the two parts of the union have the same size, the size of vect needs to be increased by 16 bits.
    Reviewers: grokos, carlo.bertolli, caomhin, ABataev
    Reviewed By: grokos, ABataev
    Subscribers: fedor.sergeev, guansong, openmp-commits
    Differential Revision: https://reviews.llvm.org/D44254
    llvm-svn: 327040
* [OpenMP] Remove implicit data sharing using device shared memory from libomptarget | Gheorghe-Teodor Bercea | 2018-03-07 | 6 | -65/+1
    Summary:
    This patch reverts the changes to libomptarget that were coupled with the changes to Clang code gen for data sharing using shared memory. A similar patch exists for Clang: D43625
    Shared memory is meant to be used as an optimization on top of a more general scheme. So far we didn't have a global memory implementation ready, so shared memory was a solution which applied to the current level of OpenMP complexity supported by trunk on GPU devices (due to the missing NVPTX backend patch this functionality has never been exercised). Now that we have a global memory solution this patch is "in the way" and needs to be removed (for now). This patch (or an equivalent version of it) will be put out for review once the global memory scheme is in place.
    Reviewers: ABataev, grokos, carlo.bertolli, caomhin
    Reviewed By: grokos
    Subscribers: Hahnfeld, guansong, openmp-commits
    Differential Revision: https://reviews.llvm.org/D43626
    llvm-svn: 326950
* Improve OpenMP threadprivate implementation. | Andrey Churbanov | 2018-03-05 | 3 | -111/+182
    Patch by Terry Wilmarth
    Differential Revision: https://reviews.llvm.org/D41914
    llvm-svn: 326733
* Fixed build of the OpenMP stubs library. | Andrey Churbanov | 2018-03-05 | 1 | -1/+1
    Differential Revision: https://reviews.llvm.org/D44019
    llvm-svn: 326728
* [OMPT] Fix interoperability test with GCC | Jonas Hahnfeld | 2018-03-01 | 1 | -2/+14
    We have to ensure that the runtime is initialized _before_ waiting for the two started threads to guarantee that the master threads post their ompt_event_thread_begin before the worker threads. This is not guaranteed in the parallel region, where one worker thread could start before the other master thread has invoked the callback.
    The problem did not happen with Clang because the generated code calls __kmpc_global_thread_num() and caches its result for functions that contain OpenMP pragmas.
    Differential Revision: https://reviews.llvm.org/D43882
    llvm-svn: 326435
* [OMPT] Fix task-type test with GCC | Joachim Protze | 2018-03-01 | 1 | -0/+3
    This is similar to D43882. The runtime needs to be initialized before calling print_ids(0).
    http://lab.llvm.org:8011/builders/openmp-gcc-x86_64-linux-debian/builds/60
    Differential Revision: https://reviews.llvm.org/D43897
    llvm-svn: 326428
* [OMPT] Fix ompt_get_task_info() and add tests for it | Joachim Protze | 2018-02-28 | 3 | -88/+182
    The thread_num parameter of ompt_get_task_info() was not being used previously, but needs to be set.
    The print_task_type() function (from the task-types.c testcase) was merged into the print_ids() function (in callback.h). Testing of ompt_get_task_info() was added to the task-types.c testcase. It was not tested extensively previously.
    Differential Revision: https://reviews.llvm.org/D42472
    llvm-svn: 326338
* [OMPT] Fix inconsistent testcases | Joachim Protze | 2018-02-28 | 2 | -31/+31
    The main change of this patch is to insert {{.*}} in current_address=[[RETURN_ADDRESS_END]]. This is needed to match any of the alternatively printed addresses.
    Additionally, clang-format is applied to the two tests.
    Differential Revision: https://reviews.llvm.org/D43115
    llvm-svn: 326312
* [OMPT] Fix parallel_data in implicit barrier-end | Jonas Hahnfeld | 2018-02-23 | 4 | -98/+135
    This is required to be NULL for implicit barriers at the end of a parallel region. Noticed in review of D43191.
    Differential Revision: https://reviews.llvm.org/D43308
    llvm-svn: 325922
* [OMPT] Fix test tasks/serialized.c with optimization | Jonas Hahnfeld | 2018-02-23 | 2 | -54/+114
    The compiler inlines the user code in the task. Check for that case at runtime by comparing the frame addresses and print the expected exit address.
    Also showcase how I think the OMPT tests could be reformatted to match LLVM's code style. In my opinion it would be great to apply that kind of change to all tests that need to be touched for whatever reason...
    Differential Revision: https://reviews.llvm.org/D43191
    llvm-svn: 325921
* [OMPT] Omissionin in OMPT Formatting | Joachim Protze | 2018-02-17 | 5 | -7/+9
    Applying clang-format to the /runtime/src/ folder
    Differential Revision: https://reviews.llvm.org/D42169
    llvm-svn: 325424
* [OMPT] Add interoperability testcase | Joachim Protze | 2018-02-17 | 1 | -0/+99
    Test whether OMPT-callbacks for two threads that initiate a parallel region are correct.
    Differential Revision: https://reviews.llvm.org/D41942
    llvm-svn: 325423
* [OMPT] Update api_calls testcase | Joachim Protze | 2018-02-17 | 2 | -34/+50
    Only use ompt_ functions when testing OMPT in api_calls testcase. Add size parameter to print_list. Fix small bug in implementation of ompt_get_partition_place_nums(): return correct length.
    Differential Revision: https://reviews.llvm.org/D42162
    llvm-svn: 325422
* [CMake] Add -fno-experimental-isel for testing | Jonas Hahnfeld | 2018-02-15 | 2 | -2/+23
    GlobalISel doesn't yet implement blockaddress and falls back to SelectionDAG. This results in an additional branch instruction to the next basic block, which breaks the OMPT tests. Disable GlobalISel for now when compiling the tests because fixing them is not easily possible. See http://llvm.org/PR36313 for full discussion history.
    Differential Revision: https://reviews.llvm.org/D43195
    llvm-svn: 325218
* [OMPT][test] Correct warning about added wrapper functions | Jonas Hahnfeld | 2018-02-14 | 1 | -2/+4
    This affects all outlined functions, not just tasks! Only show the warning when using Clang 5.0 or later.
    Differential Revision: https://reviews.llvm.org/D43190
    llvm-svn: 325131
* [OpenMP][libomptarget] Enable the compilation of multiple bc libraries for runtime inlining | Gheorghe-Teodor Bercea | 2018-02-12 | 2 | -43/+53
    Summary:
    Different NVIDIA GPUs support different compute capabilities. To enable the inlining of runtime functions and the best performance on different generations of NVIDIA GPUs, a bc library for each compute capability needs to be compiled. The same compiler build will then be usable in conjunction with multiple generations of NVIDIA GPUs. To differentiate between versions of the same bc lib, the output file name will contain the compute capability ID.
    Depends on D14254
    Reviewers: Hahnfeld, hfinkel, carlo.bertolli, caomhin, ABataev, grokos
    Reviewed By: Hahnfeld, grokos
    Subscribers: guansong, mgorny, openmp-commits
    Differential Revision: https://reviews.llvm.org/D41724
    llvm-svn: 324904
* [libomptarget] Fix detection of CUDA stubs library | Jonas Hahnfeld | 2018-02-12 | 1 | -1/+10
    CUDA_LIBRARIES contains additional linker arguments since CMake 3.3, which breaks the current way of finding the stubs library.
    llvm-svn: 324879
* [OMPT] Add tool_available_search testcase | Joachim Protze | 2018-02-08 | 1 | -0/+104
    Tests the search for tools as defined in the spec. The OMP_TOOL_LIBRARIES environment variable contains paths to the following files (in that order):
    - a nonexistent file
    - a shared library that does not have an ompt_start_tool function
    - a shared library that has an ompt_start_tool implementation returning NULL
    - a shared library that has an ompt_start_tool implementation returning a pointer to a valid instance of ompt_start_tool_result_t
    The expected result is that the last tool becomes active and can print in the thread-begin callback.
    Differential Revision: https://reviews.llvm.org/D42166
    llvm-svn: 324588
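The search order exercised by this testcase can be modeled in Python. This is a simplified sketch, not the runtime's loader: `find_tool` is a hypothetical name, a missing file is represented by `None`, and a shared library is represented by a dict of its exported symbols.

```python
def find_tool(candidates):
    """Return (path, result) for the first usable tool, or None.

    candidates: list of (path, library) pairs in OMP_TOOL_LIBRARIES order.
    library is None for a file that does not exist, otherwise a dict
    mapping exported symbol names to callables.
    """
    for path, library in candidates:
        if library is None:
            continue  # nonexistent file
        start_tool = library.get("ompt_start_tool")
        if start_tool is None:
            continue  # library without an ompt_start_tool function
        result = start_tool()
        if result is not None:
            return path, result  # first tool that returns a valid result
        # ompt_start_tool returned NULL: keep searching
    return None
```

Fed the four files from the testcase in order, the model skips the first three and activates the fourth.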
* [OMPT] Add tool_not_available testcase | Joachim Protze | 2018-02-08 | 2 | -0/+69
    Add a testcase that checks whether the runtime can handle an ompt_start_tool method that returns NULL, indicating that no tool shall be loaded.
    All tool_available testcases need a separate folder to avoid file conflicts for the generated tools.
    Differential Revision: https://reviews.llvm.org/D41904
    llvm-svn: 324587
* [OpenMP][libomptarget] Add data sharing support in libomptarget | Gheorghe-Teodor Bercea | 2018-02-07 | 6 | -1/+65
    Summary:
    This patch extends the libomptarget functionality in patch D14254 with support for the data sharing scheme for supporting implicitly shared variables. The runtime therefore maintains a list of references to shared variables.
    Reviewers: carlo.bertolli, ABataev, Hahnfeld, grokos, caomhin, hfinkel
    Reviewed By: Hahnfeld, grokos
    Subscribers: guansong, llvm-commits, openmp-commits
    Differential Revision: https://reviews.llvm.org/D41485
    llvm-svn: 324495
* [OMPT] Fix tool initialization returning 0 | Joachim Protze | 2018-02-06 | 1 | -0/+6
    If tool initialization returns 0, OMPT should not be active. The current implementation provided some callback invocations in this case.
    Differential Revision: https://reviews.llvm.org/D42709
    llvm-svn: 324320
* [OpenMP-RT] Fix debug string for NVPTX runtime library | Carlo Bertolli | 2018-02-01 | 1 | -1/+1
    https://reviews.llvm.org/D42757
    The method ThreadsInTeam is used to determine the number of threads to be used in a parallel region under SPMD mode (see line 127 of supporti.h in libomptarget/deviceRTLs/nvptx/src/). This patch fixes the corresponding debug print upon initialization of the kernel in SPMD mode.
    llvm-svn: 323978
* [libomptarget] Check for library with CUDA Driver API | Jonas Hahnfeld | 2018-01-30 | 2 | -40/+67
    That's what we really need to link the CUDA plugin against, not the CUDA runtime API in CUDA_LIBRARIES! While the latter comes with the CUDA SDK, the Driver API is installed with the kernel driver and there is at most one per system.
    As a fallback we can use the stubs library distributed with the CUDA SDK for linking.
    Differential Revision: https://reviews.llvm.org/D42643
    llvm-svn: 323787
* [libomptarget] Only use CUDA Driver API | Jonas Hahnfeld | 2018-01-30 | 1 | -11/+9
    Use equivalents for the last calls to the Runtime API. Remove a stray assert in an error case, found during review; we should only return OFFLOAD_FAIL.
    Differential Revision: https://reviews.llvm.org/D42686
    llvm-svn: 323786
* [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs. | George Rokos | 2018-01-29 | 27 | -1/+5897
    This patch implements the device runtime library whose interface is used in the code generation for OpenMP offloading devices. Currently there is a single device RTL, written in CUDA and meant for CUDA-enabled GPUs. The interface is a variation of the kmpc interface that includes some extra calls to do thread and storage management that only make sense for a GPU target.
    Differential revision: https://reviews.llvm.org/D14254
    llvm-svn: 323649
* [OMPT] Use fuzzy return addresses in lock testcases | Jonas Hahnfeld | 2018-01-26 | 4 | -71/+71
    Use fuzzy return addresses in lock testcases so that these testcases can also be run using the Intel Compiler.
    Patch by Simon Convent!
    Differential Revision: https://reviews.llvm.org/D41896
    llvm-svn: 323529
* Fix name of 'macOS' and add asteriks to brands, NFC. | Jonas Hahnfeld | 2018-01-23 | 1 | -1/+1
    llvm-svn: 323180
* Sprinkle a few <cstdlib> includes, for libomptarget sources using | Dimitry Andric | 2018-01-18 | 3 | -0/+3
    malloc, free, alloca and getenv. NFCI.
    llvm-svn: 322869
* Add missing headers for Debug builds | Jonas Hahnfeld | 2018-01-18 | 2 | -0/+2
    llvm-svn: 322830
* Partial revert of [OMPT] Rename ompt_mutex_impl_t to kmp_mutex_impl | Joachim Protze | 2018-01-17 | 1 | -3/+3
    The previous commit did not revert all replaced ompt_mutex_impl_unknown.
    llvm-svn: 322631
* [OMPT] Add Workaround for Intel Compiler Bug | Joachim Protze | 2018-01-17 | 2 | -1/+2
    Add Workaround for Intel Compiler Bug with Case#: 03138964
    A critical region within a nested task causes a segfault in icc 14-18:

        #include <stdio.h>

        int main()
        {
        #pragma omp parallel num_threads(2)
        #pragma omp master
        #pragma omp task
        #pragma omp task
        #pragma omp critical
          printf("test\n");
        }

    When the critical region is in a separate function, the segfault does not occur. So we add noinline to make sure that the function call stays there.
    Differential Revision: https://reviews.llvm.org/D41182
    llvm-svn: 322622
* [OMPT] Rename ompt_mutex_impl_t to kmp_mutex_impl | Joachim Protze | 2018-01-17 | 4 | -36/+36
    The definition is not part of the spec and thus should not have the prefix "ompt_" but rather a prefix that indicates that this is implementation specific.
    Differential Revision: https://reviews.llvm.org/D41166
    llvm-svn: 322621
* [OMPT] Return appropiate values for ompt runtime entry points for non-OpenMP threads | Joachim Protze | 2018-01-17 | 3 | -8/+75
    When the current thread is not an (initialized) OpenMP thread, the runtime entry points return values that correspond to "not available" or similar.
    Differential Revision: https://reviews.llvm.org/D41167
    llvm-svn: 322620
* Fixed libomp static build broken by the commit rL322202. | Andrey Churbanov | 2018-01-11 | 1 | -0/+2
    Patch by simone <simone@cs.utah.edu>.
    Differential Revision: https://reviews.llvm.org/D41945
    llvm-svn: 322282
* Force HWLOC topology method for NUMA-specific topology | Jonathan Peyton | 2018-01-10 | 1 | -0/+9
    If the user requested affinity with granularity=tile, we need to either use HWLOC or ignore the request. The change lets the user omit KMP_TOPOLOGY_METHOD=hwloc; the runtime then chooses it automatically.
    Patch by Andrey Churbanov
    Differential Revision: https://reviews.llvm.org/D40905
    llvm-svn: 322205