summaryrefslogtreecommitdiffstats
path: root/openmp/runtime/src/kmp_settings.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [OpenMP] Implement OpenMP 5.0 affinity format functionalityJonathan Peyton2018-12-131-2/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the affinity format functionality introduced in OpenMP 5.0. This patch adds: Two new environment variables: OMP_DISPLAY_AFFINITY=TRUE|FALSE OMP_AFFINITY_FORMAT=<string> and Four new API: 1) omp_set_affinity_format() 2) omp_get_affinity_format() 3) omp_display_affinity() 4) omp_capture_affinity() The affinity format functionality has two ICV's associated with it: affinity-display-var (bool) and affinity-format-var (string). The affinity-display-var enables/disables the functionality through the envirable OMP_DISPLAY_AFFINITY. The affinity-format-var is a formatted string with the special field types beginning with a '%' character similar to printf For example, the affinity-format-var could be: "OMP: host:%H pid:%P OStid:%i num_threads:%N thread_num:%n affinity:{%A}" The affinity-format-var is displayed by every thread implicitly at the beginning of a parallel region when any thread's affinity has changed (including a brand new thread being spawned), or explicitly using the omp_display_affinity() API. The omp_capture_affinity() function can capture the affinity-format-var in a char buffer. And omp_set|get_affinity_format() allow the user to set|get the affinity-format-var explicitly at runtime. omp_capture_affinity() and omp_get_affinity_format() both return the number of characters needed to hold the entire string it tried to make (not including NULL character). If not enough buffer space is available, both these functions truncate their output. Differential Revision: https://reviews.llvm.org/D55148 llvm-svn: 349089
* Support clang compiling under windows-gnu and windows-msvcAndrey Churbanov2018-12-101-1/+1
| | | | | | | | Patch by Peiyuan Song <squallatf@gmail.com> Differential Revision: https://reviews.llvm.org/D53422 llvm-svn: 348756
* [OpenMP] Minor cleanup of debug codeJonathan Peyton2018-11-281-2/+2
| | | | | | | | | | | * Fix calculation of string length. * Remove NULL-check of pointer which has been dereferenced. Patch by Andrey Churbanov Differential Revision: https://reviews.llvm.org/D54948 llvm-svn: 347801
* [OpenMP][OMPT] A few improvementsJonathan Peyton2018-09-261-0/+18
| | | | | | | | | | | | | | This change includes miscellaneous improvements as follows: 1) Added ompt_get_proc_id() implementation for Windows 2) Added parser and print tool for omp-tool-var, just in case it needs to be printed (OMP_DISPLAY_ENV) 3) omp_control_tool is exported on Windows Patch by Hansang Bae Differential Revision: https://reviews.llvm.org/D50538 llvm-svn: 343137
* [OpenMP] Initial implementation of OMP 5.0 Memory Management routinesJonathan Peyton2018-09-071-1/+148
| | | | | | | | | | | | | | | | | | | | | | | Implemented omp_alloc, omp_free, omp_{set,get}_default_allocator entries, and OMP_ALLOCATOR environment variable. Added support for HBW memory on Linux if libmemkind.so library is accessible (dynamic library only, no support for static libraries). Only used stable API (hbwmalloc) of the memkind library though we may consider using experimental API in future. The ICV def-allocator-var is implemented per implicit task similar to place-partition-var. In the absence of a requested allocator, the uses the default allocator. Predefined allocators (the only ones currently available) are made similar for C and Fortran, - pointers (long integers) with values 1 to 8. Patch by Andrey Churbanov Differential Revision: https://reviews.llvm.org/D51232 llvm-svn: 341687
* [OpenMP] Cleanup codeJonathan Peyton2018-08-091-2/+7
| | | | | | | | | | | | | | | | This patch cleans up unused functions, variables, sign compare issues, and addresses some -Warning flags which are now enabled including -Wcast-qual. Not all the warning flags in LibompHandleFlags.cmake are enabled, but some are with this patch. Some __kmp_gtid_from_* macros in kmp.h are switched to static inline functions which allows us to remove the awkward definition of KMP_DEBUG_ASSERT() and KMP_ASSERT() macros which used the comma operator. This had to be done for the innumerable -Wunused-value warnings related to KMP_DEBUG_ASSERT() Differential Revision: https://reviews.llvm.org/D49105 llvm-svn: 339393
* [OpenMP] Fix build errors when building with KMP_DEBUG_ADAPTIVE_LOCKS=1Jonathan Peyton2018-07-301-2/+2
| | | | | | | | | | | | | This change fixes build errors when building a runtime with adaptive lock stats enabled. Most of the errors were due to the recent changes in the runtime, but it seems that we have not tried to build this debug runtime on Windows for a long time. Patch by Hansang Bae Differential Revision: https://reviews.llvm.org/D49823 llvm-svn: 338277
* [OpenMP] Introduce hierarchical schedulingJonathan Peyton2018-07-091-53/+154
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces the logic implementing hierarchical scheduling. First and foremost, hierarchical scheduling is off by default To enable, use -DLIBOMP_USE_HIER_SCHED=On during CMake's configure stage. This work is based off if the IWOMP paper: "Workstealing and Nested Parallelism in SMP Systems" Hierarchical scheduling is the layering of OpenMP schedules for different layers of the memory hierarchy. One can have multiple layers between the threads and the global iterations space. The threads will go up the hierarchy to grab iterations, using possibly a different schedule & chunk for each layer. [ Global iteration space (0-999) ] (use static) [ L1 | L1 | L1 | L1 ] (use dynamic,1) [ T0 T1 | T2 T3 | T4 T5 | T6 T7 ] In the example shown above, there are 8 threads and 4 L1 caches begin targeted. If the topology indicates that there are two threads per core, then two consecutive threads will share the data of one L1 cache unit. This example would have the iteration space (0-999) split statically across the four L1 caches (so the first L1 would get (0-249), the second would get (250-499), etc). Then the threads will use a dynamic,1 schedule to grab iterations from the L1 cache units. There are currently four supported layers: L1, L2, L3, NUMA OMP_SCHEDULE can now read a hierarchical schedule with this syntax: OMP_SCHEDULE='EXPERIMENTAL LAYER,SCHED[,CHUNK][:LAYER,SCHED[,CHUNK]...]:SCHED,CHUNK And OMP_SCHEDULE can still read the normal SCHED,CHUNK syntax from before I've kept most of the hierarchical scheduling logic inside kmp_dispatch_hier.h to try to keep it separate from the rest of the code. Differential Revision: https://reviews.llvm.org/D47962 llvm-svn: 336571
* [OpenMP] Fix affinity API for KMP_AFFINITY=none|compact|scatterJonathan Peyton2018-04-181-0/+2
| | | | | | | | | | | | | | | | | | | | | Currently, the affinity API reports garbage for the initial place list and any thread's place lists when using KMP_AFFINITY=none|compact|scatter. This patch does two things: for KMP_AFFINITY=none, Creates a one entry table for the places, this way, the initial place list is just a single place with all the proc ids in it. We also set the initial place of any thread to 0 instead of KMP_PLACE_ALL so that the thread reports that single place (place 0) instead of garbage (-1) when using the affinity API. When non-OMP_PROC_BIND affinity is used (including KMP_AFFINITY=compact|scatter), a thread's place list is populated correctly. We assume that each thread is assigned to a single place. This is implemented in two of the affinity API functions Differential Revision: https://reviews.llvm.org/D45527 llvm-svn: 330283
* Move blocktime_str variable right before its first useJonathan Peyton2018-03-261-3/+3
| | | | llvm-svn: 328575
* Read OMP_TARGET_OFFLOAD and provide API to access ICVJonathan Peyton2018-03-201-0/+44
| | | | | | | | | | | | | | Added settings code to read OMP_TARGET_OFFLOAD environment variable. Added target-offload-var ICV as __kmp_target_offload, set via OMP_TARGET_OFFLOAD, if available, otherwise defaulting to DEFAULT. Valid values for the ICV are specified as enum values {0,1,2} for disabled, default, and mandatory. An internal API access function __kmpc_get_target_offload is provided. Patch by Terry Wilmarth Differential Revision: https://reviews.llvm.org/D44577 llvm-svn: 328046
* Force HWLOC topology method for NUMA-specific topologyJonathan Peyton2018-01-101-0/+9
| | | | | | | | | | | | If user requested affinity with granularity=tile we need to either use HWLOC or ignore the request. The change allows user to not specify KMP_TOPOLOGY_METHOD=hwloc and choose it automatically instead. Patch by Andrey Churbanov Differential Revision: https://reviews.llvm.org/D40905 llvm-svn: 322205
* Fix trademarks found by scannerJonathan Peyton2018-01-041-6/+6
| | | | llvm-svn: 321827
* Extension of HWLOC topology discovery with NUMA nodes and tilesAndrey Churbanov2017-11-301-7/+30
| | | | | | | | Patch by Olga Malysheva Differential Revision: https://reviews.llvm.org/D40309 llvm-svn: 319422
* Add const to some variables to avoid const_castsJonas Hahnfeld2017-11-091-4/+4
| | | | | | | | | In these places the const attribute seems correct and doesn't need any other change, so let's do it. Differential Revision: https://reviews.llvm.org/D39756 llvm-svn: 317798
* Remove const from variables with dynamic memoryJonas Hahnfeld2017-11-091-2/+4
| | | | | | | | | | | | | | | | | | | Allocated memory is typically not 'const' if it needs to be freed. This patch removes around 50 wrong const attributes, modifies the corresponding functions and finally gets rid of some const_casts. These have especially been strange for __kmp_str_fname_free() that added a 'const' to call __kmp_str_free() which removed it again. Two minor cleanups that I performed in this process: * __kmp_tool_libraries now lives in kmp_settings.cpp as it is used nowhere else. * __kmp_msg_empty was removed as it was never used and Clang now complained that it was assigned a string literal that is 'const char *'. Differential Revision: https://reviews.llvm.org/D39755 llvm-svn: 317797
* Update implementation of OMPT to the specification OpenMP 5.0 Preview 1 (TR4).Joachim Protze2017-11-011-3/+29
| | | | | | | | | | | | | | The code is tested to work with latest clang, GNU and Intel compiler. The implementation is optimized for low overhead when no tool is attached shifting the cost to execution with tool attached. This patch does not implement OMPT for libomptarget. Patch by Simon Convent and Joachim Protze Differential Revision: https://reviews.llvm.org/D38185 llvm-svn: 317085
* Apply formatting changesJonathan Peyton2017-10-201-5/+2
| | | | | | | | | | .clang-format's comments are removed and a (hopefully) final set of formatting changes are applied. Differential Revision: https://reviews.llvm.org/D38837 Differential Revision: https://reviews.llvm.org/D38920 llvm-svn: 316227
* KMP_HW_SUBSET vs KMP_PLACE_THREADS rival envirables fixJonathan Peyton2017-10-061-5/+22
| | | | | | | | | | | | | | If both KMP_HW_SUBSET and KMP_PLACE_THREADS are set and KMP_PLACE_THREADS gets parsed first, then the current environment variable parser rejects both and neither get used. This patch uses the rivals mechanism that is used for other environment variable groups (e.g., KMP_STACKSIZE, GOMP_STACKSIZE, OMP_STACKSIZE). If both are set, then it tells the user that it is ignoring KMP_PLACE_THREADS in favor of KMP_HW_SUBSET. The message about deprecating KMP_PLACE_THREADS when it is set is still printed regardless. Differential Revision: https://reviews.llvm.org/D38292 llvm-svn: 315091
* Remove unnecessary semicolonsJonathan Peyton2017-09-271-85/+85
| | | | | | | | Removes semicolons after if {} blocks, function definitions, etc. I was able to apply the large OMPT patch cleanly on top of this one with no conflicts. llvm-svn: 314340
* Allow printing of KMP_TOPOLOGY_METHOD when KMP_SETTINGS=trueJonathan Peyton2017-09-261-2/+0
| | | | llvm-svn: 314243
* Add new envirable KMP_TEAMS_THREAD_LIMITJonathan Peyton2017-08-021-0/+14
| | | | | | | | | | | | | | | | | | | | This change adds a new environment variable, KMP_TEAMS_THREAD_LIMIT, which is used to set a new global variable, __kmp_teams_max_nth, which is checked when determining the size and quantity of teams that will be created in the teams construct. Specifically, it is a limit on the total number of threads in a given teams construct. It differentiates the limits for the teams construct from the limits for regular parallel regions (KMP_DEVICE_THREAD_LIMIT/__kmp_max_nth and OMP_THREAD_LIMIT/__kmp_cg_max_nth). When each individual team is formed, it is still subject to those limits. After the clauses to the teams construct are parsed and calculated, we check to make sure we are within this limit, and if not, reduce num_threads per team and/or number of teams, accordingly. The default value is set to the number of available processors on the system. Patch by Terry Wilmarth Differential Revision: https://reviews.llvm.org/D36009 llvm-svn: 309874
* Fix implementation of OMP_THREAD_LIMITJonathan Peyton2017-07-271-10/+19
| | | | | | | | | | | | | | | | | | This change fixes the implementation of OMP_THREAD_LIMIT. The implementation of this previously was not restricted to a contention group (but it should be, according to the spec), and this is fixed here. A field is added to root thread to store a counter of the threads in the contention group. An extra check is added when reserving threads for a parallel region that checks this variable and compares to threadlimit-var, which is implemented as a new global variable, kmp_cg_max_nth. Associated settings changes were also made, and clean up of comments that referred to OMP_THREAD_LIMIT, but should refer to the new KMP_DEVICE_THREAD_LIMIT (added in an earlier patch). Patch by Terry Wilmarth Differential Revision: https://reviews.llvm.org/D35912 llvm-svn: 309319
* Introduce KMP_DEVICE_THREAD_LIMITJonathan Peyton2017-07-261-26/+25
| | | | | | | | | | | | | | | | | | | | | | | | This change drops in KMP_DEVICE_THREAD_LIMIT to replace KMP_MAX_THREADS. It's possible there will eventually be a OMP_DEVICE_THREAD_LIMIT, and we need something to distinguish from OMP_THREAD_LIMIT, which is currently implemented incorrectly (the fix for that will be added soon in a separate patch). KMP_ALL_THREADS is deprecated here, but we can keep the "all" option on KMP_DEVICE_THREAD_LIMIT to support that functionality. KMP_DEVICE_THREAD_LIMIT now has priority over its deprecated rival KMP_ALL_THREADS. I also cleaned up some comments that incorrectly referred to non-existent kmp_max_threads variable instead of kmp_max_nth. I've left the name of where this setting eventually ends up as __kmp_max_nth, for now. This change does not change much in the way of functionality. It does NOT change OMP_THREAD_LIMIT. It's just cleaning up and setting up for that. Patch by Terry Wilmarth Differential Revision: https://reviews.llvm.org/D35860 llvm-svn: 309168
* Cleanup: __kmp_env_* variablesJonathan Peyton2017-07-251-9/+7
| | | | | | | | | | Removed unused __kmp_env_* variables. Also clangified other people's code. Patch by Terry Wilmarth Differential Revision: https://reviews.llvm.org/D35808 llvm-svn: 309000
* Add recursive task scheduling strategy to taskloop implementationJonathan Peyton2017-07-181-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Taskloop implementation is extended by using recursive task scheduling. Envirable KMP_TASKLOOP_MIN_TASKS added as a manual threshold for the user to switch from recursive to linear tasks scheduling. Details: * The calculations for the loop parameters are moved from __kmp_taskloop_linear upper level * Initial calculation is done in the __kmpc_taskloop, further range splitting is done in the __kmp_taskloop_recur. * Added threshold to switch from recursive to linear tasks scheduling; * One half of split range is scheduled as an internal task which just moves sub-range parameters to the stealing thread that continues recursive scheduling (if number of tasks still enough), the other half is processed recursively; * Internal task duplication routine fixed to assign parent task, that was not needed when all tasks were scheduled by same thread, but is needed now. Patch by Andrey Churbanov Differential Revision: https://reviews.llvm.org/D35273 llvm-svn: 308338
* Removed "duplicates" from verbose affinity outputJonathan Peyton2017-07-171-5/+0
| | | | | | | | The internal details of this setting are not meant to be user visible and only create confusion. Differential Revision: https://reviews.llvm.org/D35269 llvm-svn: 308189
* OpenMP RTL cleanup: eliminated warnings with -Wcast-qual, patch 2.Andrey Churbanov2017-07-171-2/+2
| | | | | | | | | Changes are: got all atomics to accept volatile pointers that allowed to simplify many type conversions. Windows specific code fixed correspondingly. Differential Revision: https://reviews.llvm.org/D35417 llvm-svn: 308164
* OpenMP RTL cleanup: eliminated warnings with -Wcast-qual.Andrey Churbanov2017-07-031-68/+74
| | | | | | | | | | | Changes are: replaced C-style casts with cons_cast and reinterpret_cast; type of several counters changed to signed; type of parameters of 32-bit and 64-bit AND and OR intrinsics changes to unsigned; changed files formatted using clang-format version 3.8.1. Differential Revision: https://reviews.llvm.org/D34759 llvm-svn: 307020
* Replace platform macro with KMP_MIC_SUPPORTEDJonathan Peyton2017-06-131-5/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D34119 llvm-svn: 305307
* Fix for KMP_AFFINITY=disabled and KMP_TOPOLOGY_METHOD=hwlocJonathan Peyton2017-05-311-0/+10
| | | | | | | | | | | | With these settings, the create_hwloc_map() method was being called causing an assert(). After some consideration, it was determined that disabling affinity explicitly should just disable hwloc as well. i.e., KMP_AFFINITY overrides KMP_TOPOLOGY_METHOD. This lets the user know that the Hwloc mechanism is being ignored when KMP_AFFINITY=disabled. Differential Revision: https://reviews.llvm.org/D33208 llvm-svn: 304344
* Address default pinning OpenMP process with multiple processor groupsJonathan Peyton2017-05-311-2/+30
| | | | | | | | | | | | | | | | | | | This change checks if the initial affinity mask is equal to exactly one Windows processor group's affinity mask. If it is, then the code does not respect the initial affinity mask and uses the entire machine instead. The reasoning behind this is that, by default, Windows assigns exactly one processor group as the initial affinity mask even when there are multiple Windows processor groups available. User's typically want to use the whole machine, so we ignore this special case and use the entire machine. If the initial affinity mask is a proper subset of one group, or spans multiple groups, then the initial affinity mask is respected since we can assume that the operating system did not assign this initial affinity mask. This change only affects machines with multiple processor groups Differential Revision: https://reviews.llvm.org/D33210 llvm-svn: 304343
* Clang-format and whitespace cleanup of source codeJonathan Peyton2017-05-121-4353/+4157
| | | | | | | | | | | | | This patch contains the clang-format and cleanup of the entire code base. Some of clang-formats changes made the code look worse in places. A best effort was made to resolve the bulk of these problems, but many remain. Most of the problems were mangling line-breaks and tabbing of comments. Patch by Terry Wilmarth Differential Revision: https://reviews.llvm.org/D32659 llvm-svn: 302929
* [OpenMP] Add missing parenthesis which triggers a compile errorGeorge Rokos2017-04-251-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D32490 llvm-svn: 301318
* KMP_HW_SUBSET extended with NUMA support when HWLOC enabledAndrey Churbanov2017-04-131-276/+168
| | | | | | Differential Revision: https://reviews.llvm.org/D31600 llvm-svn: 300220
* Fix assertion failure when 'proclist' is used without 'explicit' in KMP_AFFINITYJonathan Peyton2017-03-101-2/+5
| | | | | | | | | | | | This change fixes an assertion failure the in case KMP_AFFINITY is set with 'proclist' specified but without 'explicit' e.g., KMP_AFFINITY=verbose,proclist=[0-31] Patch by Olga Malysheva Differential Revision: https://reviews.llvm.org/D30404 llvm-svn: 297480
* Enable yield cycle on LinuxJonathan Peyton2017-02-151-4/+0
| | | | | | | | | | | | This change allows the runtime to turn __kmp_yield() on/off repeatedly on Linux. This feature was removed when disabling monitor thread, but there are applications that perform better with this feature on. Patch by Hansang Bae Differential Revision: https://reviews.llvm.org/D29227 llvm-svn: 295203
* Follow up to r289732: Update comments in source files to reference .cpp filesJonathan Peyton2016-12-141-1/+1
| | | | | | Patch by Hansang Bae llvm-svn: 289739
* Change source files from .c to .cppJonathan Peyton2016-12-141-0/+5631
Patch by Hansang Bae Differential Revision: https://reviews.llvm.org/D26688 llvm-svn: 289732
OpenPOWER on IntegriCloud