path: root/openmp/runtime/src/kmp_settings.c
Commit message [Author, Age, Files, Lines]
* Change source files from .c to .cpp [Jonathan Peyton, 2016-12-14, 1 file, -5631/+0]
| | | | | | | | Patch by Hansang Bae Differential Revision: https://reviews.llvm.org/D26688 llvm-svn: 289732
* Cleanup: memory leaks on warnings printing fixed; some memory freeing cleaned; poor indents and one typo fixed. [Andrey Churbanov, 2016-11-28, 1 file, -4/+4]
| | | | | | | | | | Patch by Victor Campos. Differential Revision: https://reviews.llvm.org/D26786 llvm-svn: 288054
* Introduce dynamic affinity dispatch capabilities [Jonathan Peyton, 2016-11-14, 1 file, -34/+3]
| | | | | | | | | | | | | | | | | | | | | | | | | This set of changes enables the affinity interface (Either the preexisting native operating system or HWLOC) to be dynamically set at runtime initialization. The point of this change is that we were seeing performance degradations when using HWLOC. This allows the user to use the old affinity mechanisms which on large machines (>64 cores) makes a large difference in initialization time. These changes mostly move affinity code under a small class hierarchy: KMPAffinity class Mask {} KMPNativeAffinity : public KMPAffinity class Mask : public KMPAffinity::Mask KMPHwlocAffinity class Mask : public KMPAffinity::Mask Since all interface functions (for both affinity and the mask implementation) are virtual, the implementation can be chosen at runtime initialization. Differential Revision: https://reviews.llvm.org/D26356 llvm-svn: 286890
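The class hierarchy described in this commit message can be pictured with a minimal C++ sketch. This is an illustration only, not the libomp code: the member functions (`set`, `is_set`, `allocate_mask`, `name`) and the selector `pick_affinity_api` are hypothetical, the "hwloc" variant below does not call hwloc at all, and it is assumed that `KMPHwlocAffinity` derives from `KMPAffinity` like the native variant does.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the classes named in the commit
// message; the real interfaces have many more members.
class KMPAffinity {
public:
  class Mask {
  public:
    virtual ~Mask() = default;
    virtual void set(int cpu) = 0;
    virtual bool is_set(int cpu) const = 0;
  };
  virtual ~KMPAffinity() = default;
  virtual std::unique_ptr<Mask> allocate_mask() const = 0;
  virtual std::string name() const = 0;
};

// "Native" variant: a fixed 64-bit mask (illustrative size limit).
class KMPNativeAffinity : public KMPAffinity {
public:
  class Mask : public KMPAffinity::Mask {
    unsigned long long bits_ = 0;
  public:
    void set(int cpu) override {
      assert(cpu >= 0 && cpu < 64);
      bits_ |= 1ULL << cpu;
    }
    bool is_set(int cpu) const override {
      return cpu >= 0 && cpu < 64 && ((bits_ >> cpu) & 1ULL) != 0;
    }
  };
  std::unique_ptr<KMPAffinity::Mask> allocate_mask() const override {
    return std::unique_ptr<KMPAffinity::Mask>(new Mask());
  }
  std::string name() const override { return "native"; }
};

// "Hwloc" variant: a growable bitmap standing in for an hwloc bitmap.
class KMPHwlocAffinity : public KMPAffinity {
public:
  class Mask : public KMPAffinity::Mask {
    std::vector<bool> bits_;
  public:
    void set(int cpu) override {
      assert(cpu >= 0);
      if (static_cast<std::size_t>(cpu) >= bits_.size())
        bits_.resize(static_cast<std::size_t>(cpu) + 1, false);
      bits_[static_cast<std::size_t>(cpu)] = true;
    }
    bool is_set(int cpu) const override {
      return cpu >= 0 && static_cast<std::size_t>(cpu) < bits_.size() &&
             bits_[static_cast<std::size_t>(cpu)];
    }
  };
  std::unique_ptr<KMPAffinity::Mask> allocate_mask() const override {
    return std::unique_ptr<KMPAffinity::Mask>(new Mask());
  }
  std::string name() const override { return "hwloc"; }
};

// Chosen once, at "runtime initialization"; callers only see the base class.
inline std::unique_ptr<KMPAffinity> pick_affinity_api(bool use_hwloc) {
  if (use_hwloc)
    return std::unique_ptr<KMPAffinity>(new KMPHwlocAffinity());
  return std::unique_ptr<KMPAffinity>(new KMPNativeAffinity());
}
```

Because every call goes through a virtual function, code that uses `KMPAffinity` and `KMPAffinity::Mask` does not need to be recompiled to switch mechanisms, which is the point the commit message makes.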
* Code cleanup for the runtime without monitor thread [Jonathan Peyton, 2016-10-07, 1 file, -1/+3]
| | | | | | | | | | This change removes/disables unnecessary code when monitor thread is not used. Patch by Hansang Bae Differential Revision: https://reviews.llvm.org/D25102 llvm-svn: 283577
* Disable monitor thread creation by default. [Jonathan Peyton, 2016-09-27, 1 file, -0/+8]
| | | | | | | | | | | | | This change set disables creation of the monitor thread by default. The global counter maintained by the monitor thread was replaced by logic that uses system time directly, and cyclic yielding on Linux target was also removed since there was no clear benefit of using it. Turning on KMP_USE_MONITOR variable (=1) enables creation of monitor thread again if it is really necessary for some reasons. Differential Revision: https://reviews.llvm.org/D24739 llvm-svn: 282507
* [OPENMP] Implementation of omp_get_default_device and omp_set_default_device [George Rokos, 2016-09-09, 1 file, -0/+16]
| | | | | | | | | Implementation of missing OpenMP 4.0 API functions omp_get_default_device and omp_set_default_device. Also, added support for the environment variable OMP_DEFAULT_DEVICE. Differential Revision: https://reviews.llvm.org/D23587 llvm-svn: 281065
* http://reviews.llvm.org/D22134: Implementation of OpenMP 4.5 nonmonotonic schedule modifier [Andrey Churbanov, 2016-07-11, 1 file, -3/+2]
| | | | | | llvm-svn: 275052
* Deprecate KMP_PLACE_THREADS and rename as KMP_HW_SUBSET [Jonathan Peyton, 2016-06-16, 1 file, -21/+42]
| | | | | | | | | | | | | Deprecate KMP_PLACE_THREADS and rename it to KMP_HW_SUBSET due to confusion about its purpose and function among users. KMP_HW_SUBSET is an environment variable which allows users to easily pick a subset of the hardware topology to use. e.g., KMP_HW_SUBSET=30c,2t means use 30 cores, 2 threads per core. Patch by Andrey Churbanov Differential Revision: http://reviews.llvm.org/D21340 llvm-svn: 272937
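To make the `KMP_HW_SUBSET=30c,2t` example concrete, here is a small C++ sketch of how such a value could be split into a core count and a threads-per-core count. It is not the parser in kmp_settings.c: the `HwSubset` struct and `parse_hw_subset` function are hypothetical, and only the `c` and `t` designators from the example are handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type; 0 means "not specified".
struct HwSubset {
  int cores = 0;
  int threads_per_core = 0;
  bool ok = true;
};

// Parses values shaped like "30c,2t": a number followed by a unit letter,
// items separated by commas. Anything else marks the result as not ok.
inline HwSubset parse_hw_subset(const std::string &value) {
  HwSubset result;
  std::size_t i = 0;
  while (i < value.size()) {
    if (value[i] == ',' || value[i] == ' ') { ++i; continue; }
    if (!std::isdigit(static_cast<unsigned char>(value[i]))) {
      result.ok = false;
      return result;
    }
    int n = 0;
    while (i < value.size() && std::isdigit(static_cast<unsigned char>(value[i])))
      n = n * 10 + (value[i++] - '0');
    if (i >= value.size()) { result.ok = false; return result; }
    const char unit =
        static_cast<char>(std::tolower(static_cast<unsigned char>(value[i++])));
    if (unit == 'c') result.cores = n;
    else if (unit == 't') result.threads_per_core = n;
    else { result.ok = false; return result; }
  }
  return result;
}

// Usage example: the value quoted in the commit message.
inline void example_usage() {
  const HwSubset s = parse_hw_subset("30c,2t");
  assert(s.ok && s.cores == 30 && s.threads_per_core == 2);
}
```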
* Renaming change: 41 -> 45 and 4.1 -> 4.5 [Jonathan Peyton, 2016-06-14, 1 file, -3/+3]
| | | | | | | | OpenMP 4.1 is now OpenMP 4.5. Any mention of 41 or 4.1 is replaced with 45 or 4.5. Also, if the CMake option LIBOMP_OMP_VERSION is 41, CMake warns that 41 is deprecated and to use 45 instead. llvm-svn: 272687
* Hwloc refactoring patch [Jonathan Peyton, 2016-06-13, 1 file, -25/+13]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These changes remove the hwloc_topology_ignore_type function which doesn't exist in the hwloc 2.0 API. In the existing code, the topology extracted from hwloc has the cache levels stripped out and then assumes the final stripped topology follows the typical three-level topology: packages -> cores -> HW threads. But the code is doing unclean manipulations to determine at what level those resources are located and also assumes too much about what hwloc is detecting (there could be intermediate levels in between socket and core for instance). This new way of extracting the topology doesn't strip out any hardware objects that hwloc detects. It does not assume the three level topology, and instead searches for the relevant three levels within the topology for each bit of information using hwloc interface functions. i.e., the three level topology subset that our affinity code is interested in is extracted from the hwloc topology tree directly. For example, the new __kmp_hwloc_get_nobjs_under_obj function gives the user the number of cores under a socket reliably without worrying if there are unexpected objects between the socket object and core object in the hwloc topology structure. Also, now that all topology information is kept, there are also possibilities of using the caches/numa nodes to determine more sophisticated affinity settings in the future. There is also some cleanup code added for the destruction of the __kmp_hwloc_topology object. Differential Revision: http://reviews.llvm.org/D21195 llvm-svn: 272565
* Offer API for setting number of loop dispatch buffers [Jonathan Peyton, 2016-05-31, 1 file, -0/+18]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The problem is the lack of dispatch buffers when thousands of loops with nowait, about 10 iterations each, are executed by hundreds of threads. We only have built-in 7 dispatch buffers, but there is a need in dozens or hundreds of buffers. The problem can be fixed by setting KMP_MAX_DISP_BUF to bigger value. In order to give users same possibility I changed build-time control into run-time one, adding API just in case. This change adds an environment variable KMP_DISP_NUM_BUFFERS and a new API function kmp_set_disp_num_buffers(int num_buffers). The KMP_DISP_NUM_BUFFERS envirable works only before serial initialization, because during the serial initialization we already allocate buffers for the hot team, so it is too late to change the number of buffers later (or we need to reallocate buffers for all teams which sounds too complicated). The kmp_set_defaults() routine does not work for this envirable, because it calls serial initialization before reading the parameter string. So a new routine, kmp_set_disp_num_buffers(), is created so that it can set our internal global variable before the library initialization. If both the envirable and API used the envirable wins. Differential Revision: http://reviews.llvm.org/D20697 llvm-svn: 271318
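The ordering rules in this message (the API call only works before serial initialization, and the environment variable wins if both are used) can be modelled in a few lines of C++. The sketch below is a toy model under stated assumptions: `DispatchConfig`, `set_disp_num_buffers` and `serial_initialize` are hypothetical stand-ins, the environment value is passed in as a string rather than read with `getenv`, and the upper bound of 4096 is arbitrary.

```cpp
#include <cassert>
#include <cstdlib>

// Toy model of the global setting; 7 is the built-in count cited above.
struct DispatchConfig {
  int num_buffers = 7;
  bool serial_initialized = false;
};

// Stand-in for the new API call: it only has an effect before serial
// initialization, because buffers are allocated at that point.
inline void set_disp_num_buffers(DispatchConfig &cfg, int num_buffers) {
  if (cfg.serial_initialized || num_buffers < 1) return;
  cfg.num_buffers = num_buffers;
}

// Stand-in for serial initialization. env_value models the contents of
// KMP_DISP_NUM_BUFFERS (nullptr when unset). Reading it here, after any
// API call, is what makes the environment variable win.
inline void serial_initialize(DispatchConfig &cfg, const char *env_value) {
  if (cfg.serial_initialized) return;
  if (env_value != nullptr) {
    char *end = nullptr;
    const long parsed = std::strtol(env_value, &end, 10);
    // The 4096 limit is an arbitrary bound chosen for this sketch.
    if (end != env_value && *end == '\0' && parsed >= 1 && parsed <= 4096)
      cfg.num_buffers = static_cast<int>(parsed);
  }
  cfg.serial_initialized = true;
}

// Usage example: API sets 32, environment says 64, environment wins.
inline void example_usage() {
  DispatchConfig cfg;
  set_disp_num_buffers(cfg, 32);
  serial_initialize(cfg, "64");
  assert(cfg.num_buffers == 64);
}
```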
* Clean all the mess around KMP_USE_FUTEX and kmp_lock.h [Paul Osmialowski, 2016-05-16, 1 file, -2/+3]
| | | | | | | | | | | | | | | | | | The KMP_USE_FUTEX preprocessor definition from kmp_lock.h is used inconsistently throughout the LLVM libomp code: * some .c files that use this define do not include kmp_lock.h, so the guarded parts of the code are never compiled * some places use architecture-dependent preprocessor expressions which effectively disable use of futex on the AArch64 architecture; all these places should use '#if KMP_USE_FUTEX' instead to avoid any further confusion * some places use KMP_HAS_FUTEX, which is not defined anywhere; KMP_USE_FUTEX should be used instead Differential Revision: http://reviews.llvm.org/D19629 llvm-svn: 269642
* NFC fix indent (relates to my previous commit) [Paul Osmialowski, 2016-05-13, 1 file, -3/+3]
| | | | llvm-svn: 269443
* New hwloc API compatibility [Paul Osmialowski, 2016-05-12, 1 file, -0/+13]
| | | | | | Differential Revision: http://reviews.llvm.org/D19628 llvm-svn: 269284
* Exponential back off logic for test-and-set lock [Jonathan Peyton, 2016-04-14, 1 file, -0/+97]
| | | | | | | | | | | | | | | | | | | | | | | | | This change adds back off logic in the test and set lock for better contended lock performance. It uses a simple truncated binary exponential back off function. The default back off parameters are tuned for x86. The main back off logic has a two loop structure where each is controlled by a user-level parameter: max_backoff - limits the outer loop number of iterations. This parameter should be a power of 2. min_ticks - the inner spin wait loop number of "ticks" which is system dependent and should be tuned for your system if you so choose. The "ticks" on x86 correspond to the time stamp counter, but on other architectures ticks is a timestamp derived from gettimeofday(). The user can modify these via the environment variable: KMP_SPIN_BACKOFF_PARAMS=max_backoff[,min_ticks] Currently, since the default user lock is a queuing lock, one would have to also specify KMP_LOCK_KIND=tas to use the test-and-set locks. Differential Revision: http://reviews.llvm.org/D19020 llvm-svn: 266329
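The two-loop back off structure described above can be sketched in C++ as follows. This is a simplified illustration, not the libomp lock: `TasLock`, `BackoffParams`, `next_backoff` and `spin_ticks` are hypothetical names, the default parameter values are placeholders, a "tick" is assumed to be one nanosecond of `std::chrono::steady_clock` rather than a time stamp counter reading, and the step update is a plain doubling capped at `max_backoff`.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical parameter block; the values are placeholders, not tuned defaults.
struct BackoffParams {
  std::uint32_t max_backoff = 4096; // cap on outer-loop iterations (power of 2)
  std::uint32_t min_ticks = 100;    // length of one inner spin wait, in "ticks"
};

// Truncated binary exponential growth: double the step, but never exceed the cap.
inline std::uint32_t next_backoff(std::uint32_t step, std::uint32_t max_backoff) {
  const std::uint64_t doubled = static_cast<std::uint64_t>(step) * 2;
  return static_cast<std::uint32_t>(std::min<std::uint64_t>(doubled, max_backoff));
}

// Inner loop: busy-wait for the given number of ticks (assumed: nanoseconds).
inline void spin_ticks(std::uint32_t ticks) {
  const auto deadline = std::chrono::steady_clock::now() +
                        std::chrono::nanoseconds(static_cast<std::int64_t>(ticks));
  while (std::chrono::steady_clock::now() < deadline) {
  }
}

class TasLock {
  std::atomic<bool> locked_{false};

public:
  // Test-and-set: succeeds only if the flag was previously clear.
  bool try_lock() { return !locked_.exchange(true, std::memory_order_acquire); }

  void lock(const BackoffParams &params = BackoffParams()) {
    assert(params.max_backoff >= 1);
    std::uint32_t step = 1;
    while (!try_lock()) {
      // Outer loop: `step` iterations, each one an inner spin of min_ticks.
      for (std::uint32_t i = 0; i < step; ++i)
        spin_ticks(params.min_ticks);
      step = next_backoff(step, params.max_backoff);
    }
  }

  void unlock() { locked_.store(false, std::memory_order_release); }
};
```

The capped doubling keeps contended threads from hammering the lock word while bounding the longest wait at `max_backoff * min_ticks` ticks per retry.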
* OMP_WAIT_POLICY changes [Jonathan Peyton, 2016-04-04, 1 file, -2/+15]
| | | | | | | | | | | | | This change has OMP_WAIT_POLICY=active to mean that threads will busy-wait in spin loops and virtually never go to sleep. OMP_WAIT_POLICY=passive now means that threads will immediately go to sleep inside a spin loop. KMP_BLOCKTIME was the previous mechanism to specify this behavior via KMP_BLOCKTIME=0 or KMP_BLOCKTIME=infinite, but the standard OpenMP environment variable should also be able to specify this behavior. Differential Revision: http://reviews.llvm.org/D18577 llvm-svn: 265339
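One way to picture the relationship between the two settings is a mapping from the wait policy to a block time. The C++ sketch below is an assumption about how such a mapping could look, not the code in kmp_settings.c: `blocktime_from_wait_policy` and the `kBlocktimeInfinite` sentinel are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <climits>
#include <string>

// Hypothetical sentinel standing in for KMP_BLOCKTIME=infinite.
constexpr int kBlocktimeInfinite = INT_MAX;

// Maps an OMP_WAIT_POLICY value to a block time in milliseconds.
// Returns false (leaving blocktime_ms untouched) for unrecognized values.
inline bool blocktime_from_wait_policy(std::string policy, int &blocktime_ms) {
  for (char &c : policy)
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  if (policy == "active") {
    blocktime_ms = kBlocktimeInfinite; // spin, virtually never sleep
    return true;
  }
  if (policy == "passive") {
    blocktime_ms = 0; // go to sleep immediately
    return true;
  }
  return false;
}

// Usage example for the two policies described in the commit message.
inline void example_usage() {
  int blocktime = -1;
  assert(blocktime_from_wait_policy("passive", blocktime) && blocktime == 0);
  assert(blocktime_from_wait_policy("ACTIVE", blocktime) &&
         blocktime == kBlocktimeInfinite);
}
```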
* Add initial support for OpenMP 4.5 task priority feature [Jonathan Peyton, 2016-02-25, 1 file, -1/+19]
| | | | | | | | | | | The maximum task priority value is read from envirable: OMP_MAX_TASK_PRIORITY. But as of now, nothing is done with it. We just handle the environment variable and add the new api: omp_get_max_task_priority() which returns that value or zero if it is not set. Differential Revision: http://reviews.llvm.org/D17411 llvm-svn: 261908
* Removing extra empty lines [Jonathan Peyton, 2016-01-27, 1 file, -1/+0]
| | | | llvm-svn: 258984
* Fix compilations with msvc's /Zc:strictStrings [Ismail Donmez, 2016-01-26, 1 file, -2/+2]
| | | | llvm-svn: 258797
* Hinted lock (OpenMP 4.5 feature) Updates/Fixes Part 2 [Jonathan Peyton, 2015-12-11, 1 file, -8/+31]
| | | | | | | | | | | | | | | | | | | * Added a new user TSX lock implementation, RTM. This implementation is a light-weight version of the adaptive lock implementation, omitting the back-off logic for deciding when to speculate (or not). The fall-back lock is still the queuing lock. * Changed indirect lock table management. The data for indirect lock management was encapsulated in the "kmp_indirect_lock_table_t" type. Also, the lock table dimension was changed to 2D (was linear), and each entry is a kmp_indirect_lock_t object now (was a pointer to an object). * Some clean up in the critical section code * Removed the limits of the tuning parameters read from KMP_ADAPTIVE_LOCK_PROPS * KMP_USE_DYNAMIC_LOCK=1 also turns on these two switches: KMP_USE_TSX, KMP_USE_ADAPTIVE_LOCKS Differential Revision: http://reviews.llvm.org/D15204 llvm-svn: 255375
* Replace DYNA_* names with KMP_* names [Jonathan Peyton, 2015-12-03, 1 file, -8/+8]
| | | | llvm-svn: 254637
* Adding Hwloc library option for affinity mechanism [Jonathan Peyton, 2015-11-30, 1 file, -0/+37]
| | | | | | | | | | | | | | | | | | | These changes allow libhwloc to be used as the topology discovery/affinity mechanism for libomp. It is supported on Unices. The code additions: * Canonicalize KMP_CPU_* interface macros so bitmask operations are implementation independent and work with both hwloc bitmaps and libomp bitmaps. So there are new KMP_CPU_ALLOC_* and KMP_CPU_ITERATE() macros and the like. These are all in kmp.h and appropriately placed. * Hwloc topology discovery code in kmp_affinity.cpp. This uses the hwloc interface to create a libomp address2os object which the rest of libomp knows how to handle already. * To build, use -DLIBOMP_USE_HWLOC=on and -DLIBOMP_HWLOC_INSTALL_DIR=/path/to/install/dir [default /usr/local]. If CMake can't find the library or hwloc.h, then it will tell you and exit. Differential Revision: http://reviews.llvm.org/D13991 llvm-svn: 254320
* Remove some empty lines. [Jonathan Peyton, 2015-11-04, 1 file, -6/+0]
| | | | llvm-svn: 252084
* Removed '@' from delimiters, added it as offset designator. [Jonathan Peyton, 2015-10-20, 1 file, -23/+75]
| | | | | | | | | | | | Moved '@' from delimiters to offset designators for the KMP_PLACE_THREADS environment variable. Only one of: postfix "o" or prefix @, should be used in the value of KMP_PLACE_THREADS. For example, '2s@2,4c@2,1t'. This is also the format of KMP_SETTINGS=1 output now (removed "o" from there). e.g., 2s,2o,4c,2o,1t. Differential Revision: http://reviews.llvm.org/D13701 llvm-svn: 250846
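The '@' offset form quoted above, `2s@2,4c@2,1t`, can be read as a list of (count, kind, offset) items. The following C++ sketch parses only that form and is not the runtime's parser: `PlaceItem` and `parse_place_threads` are hypothetical, the postfix "o" form is not handled, and for simplicity an offset is accepted after any designator.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical item: e.g. "4c@2" is {count 4, kind 'c', offset 2}.
struct PlaceItem {
  int count;
  char kind;  // 's' sockets, 'c' cores, 't' threads
  int offset; // 0 when no '@' offset is given
};

// Parses comma-separated items such as "2s@2,4c@2,1t".
// Returns false on malformed input.
inline bool parse_place_threads(const std::string &v, std::vector<PlaceItem> &out) {
  std::size_t i = 0;
  // Reads a non-negative decimal integer at position i, advancing i.
  auto read_int = [&](int &n) -> bool {
    if (i >= v.size() || !std::isdigit(static_cast<unsigned char>(v[i])))
      return false;
    n = 0;
    while (i < v.size() && std::isdigit(static_cast<unsigned char>(v[i])))
      n = n * 10 + (v[i++] - '0');
    return true;
  };
  out.clear();
  while (i < v.size()) {
    PlaceItem item{0, 0, 0};
    if (!read_int(item.count) || i >= v.size()) return false;
    item.kind = v[i++];
    if (item.kind != 's' && item.kind != 'c' && item.kind != 't') return false;
    if (i < v.size() && v[i] == '@') {
      ++i;
      if (!read_int(item.offset)) return false;
    }
    out.push_back(item);
    if (i < v.size()) {
      if (v[i] != ',') return false;
      ++i;
    }
  }
  return !out.empty();
}

// Usage example: the value quoted in the commit message.
inline void example_usage() {
  std::vector<PlaceItem> items;
  assert(parse_place_threads("2s@2,4c@2,1t", items));
  assert(items.size() == 3);
  assert(items[1].count == 4 && items[1].kind == 'c' && items[1].offset == 2);
}
```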
* Added sockets to the syntax of KMP_PLACE_THREADS environment variable. [Jonathan Peyton, 2015-10-08, 1 file, -76/+169]
| | | | | | | | | | | | | | | | | | | Added (optional) sockets to the syntax of the KMP_PLACE_THREADS environment variable. Some limitations: * The number of sockets and then optional offset should be specified first (before other parameters). * The letter designation is mandatory for sockets and then for other parameters. * If number of cores is specified first, then the number of sockets is defaulted to all sockets on the machine; also, the old syntax is partially supported if sockets are skipped. * If number of threads per core is specified first, then the number of sockets and cores per socket are defaulted to all sockets and all cores per socket respectively. * The number of cores per socket cannot be specified before sockets or after threads per core. * The number of threads per core can be specified before or after core-offset (old syntax required it to be before core-offset); * Parameters delimiter can be: empty, comma, lower-case x; * Spaces are allowed around numbers, around letters, around delimiter. Approximate shorthand specification: KMP_PLACE_THREADS="[num_sockets(S|s)[[delim]offset(O|o)][delim]][num_cores_per_socket(C|c)[[delim]offset(O|o)][delim]][num_threads_per_core(T|t)]" Differential Revision: http://reviews.llvm.org/D13175 llvm-svn: 249708
* Fix the OpenMP 3.0 build [Jonathan Peyton, 2015-09-21, 1 file, -3/+4]
| | | | | | | | | | | This change adds guards to the code in places where they are missing to enable the OpenMP 3.0 build. Patch by Diego Caballero and Johnny Peyton Mailing List: http://lists.llvm.org/pipermail/openmp-dev/2015-September/000935.html llvm-svn: 248178
* Remove unused variable warnings by deletion. [Jonathan Peyton, 2015-06-08, 1 file, -3/+0]
| | | | | | | | | | As an ongoing effort to sanitize the openmp code, these changes delete variables that aren't used at all. http://lists.cs.uiuc.edu/pipermail/openmp-dev/2015-June/000701.html Patch by Jack Howarth llvm-svn: 239334
* Remove unused variable warnings by adding proper macro guards. [Jonathan Peyton, 2015-06-08, 1 file, -1/+6]
| | | | | | | | | As an ongoing effort to sanitize the openmp code, these changes remove unused variables by adding proper macros around both variables and functions. Patch by Jack Howarth llvm-svn: 239330
* Removed unused functions. [Jonathan Peyton, 2015-06-08, 1 file, -122/+0]
| | | | | | | | | | | As an ongoing effort to sanitize the openmp code, these changes remove unused functions. The unused functions are: __kmp_fini_allocator_thread(), __kmp_env_isDefined(), __kmp_strip_quotes(), __kmp_convert_to_seconds(), and __kmp_convert_to_nanoseconds(). Patch by Jack Howarth llvm-svn: 239323
* Replace some unsafe API calls with safe alternatives on Windows, prepare code for similar actions on other platforms - wrap unsafe API calls into macros. [Andrey Churbanov, 2015-04-02, 1 file, -13/+13]
| | | | | | llvm-svn: 233915
* proc_bind_disabled enum value removed, its usage replaced with proc_bind_false [Andrey Churbanov, 2015-03-10, 1 file, -6/+2]
| | | | llvm-svn: 231776
* cleanup: usages of mask size wrapped into macros [Andrey Churbanov, 2015-03-10, 1 file, -1/+1]
| | | | llvm-svn: 231775
* Two warning messages fixed. [Andrey Churbanov, 2015-02-20, 1 file, -1/+1]
| | | | llvm-svn: 230035
* Detect Intel MIC architecture and set some defaults at run time instead of build time. [Andrey Churbanov, 2015-02-20, 1 file, -31/+45]
| | | | | | llvm-svn: 230033
* Added new user-guided lock api, currently disabled. Use KMP_USE_DYNAMIC_LOCK=1 to enable it. [Andrey Churbanov, 2015-02-20, 1 file, -1/+21]
| | | | | | llvm-svn: 230030
* enable environment variable KMP_PLACE_THREADS also for non-MIC architectures [Andrey Churbanov, 2015-01-29, 1 file, -4/+0]
| | | | llvm-svn: 227467
* fix that sets proc-bind-var to proc_bind_false if affinity is not supported [Andrey Churbanov, 2015-01-29, 1 file, -0/+6]
| | | | llvm-svn: 227454
* Comments only: removing the Revision and Date svn variables from the top of all the source files. [Andrey Churbanov, 2015-01-27, 1 file, -2/+0]
| | | | | | llvm-svn: 227207
* minor formatting change [Andrey Churbanov, 2015-01-27, 1 file, -2/+1]
| | | | llvm-svn: 227205
* Fixes error where proc-bind-var is not set when there is a parsing error of GOMP_AFFINITY environment variable. [Andrey Churbanov, 2015-01-27, 1 file, -0/+3]
| | | | | | llvm-svn: 227202
* Replaces KMP_OS_WINDOWS && KMP_ARCH_X86_64 or any combination of those two options with the feature macro KMP_GROUP_AFFINITY. [Andrey Churbanov, 2015-01-27, 1 file, -12/+12]
| | | | | | llvm-svn: 227199
* This patch enables the use of KMP_AFFINITY=balanced on non-MIC architectures. [Andrey Churbanov, 2015-01-13, 1 file, -13/+15]
| | | | | | The restriction for using balanced affinity on non-MIC architectures is that it only works for one-package machines. llvm-svn: 225794
* I apologise in advance for the size of this check-in. [Jim Cownie, 2014-10-07, 1 file, -154/+270]
    At Intel we do understand that this is not friendly, and are working to change our internal code-development to make it easier to make development features available more frequently and in finer (more functional) chunks. Unfortunately we haven't got that in place yet, and unpicking this into multiple separate check-ins would be non-trivial, so please bear with me on this one. We should be better in the future.

    Apologies over, what do we have here?

    GCC 4.9 compatibility
    ---------------------
    * We have implemented the new entrypoints used by code compiled by GCC 4.9 to implement the same functionality in gcc 4.8. Therefore code compiled with gcc 4.9 that used to work will continue to do so. However, there are some other new entrypoints (associated with task cancellation) which are not implemented. Therefore user code compiled by gcc 4.9 that uses these new features will not link against the LLVM runtime. (It remains unclear how to handle those entrypoints, since the GCC interface has potentially unpleasant performance implications for join barriers even when cancellation is not used)

    --- new parallel entry points ---
    new entry points that aren't OpenMP 4.0 related
    These are implemented fully:
        GOMP_parallel_loop_dynamic()
        GOMP_parallel_loop_guided()
        GOMP_parallel_loop_runtime()
        GOMP_parallel_loop_static()
        GOMP_parallel_sections()
        GOMP_parallel()

    --- cancellation entry points ---
    Currently, these only give a runtime error if OMP_CANCELLATION is true because our plain barriers don't check for cancellation while waiting
        GOMP_barrier_cancel()
        GOMP_cancel()
        GOMP_cancellation_point()
        GOMP_loop_end_cancel()
        GOMP_sections_end_cancel()

    --- taskgroup entry points ---
    These are implemented fully.
        GOMP_taskgroup_start()
        GOMP_taskgroup_end()

    --- target entry points ---
    These are empty (as they are in libgomp)
        GOMP_target()
        GOMP_target_data()
        GOMP_target_end_data()
        GOMP_target_update()
        GOMP_teams()

    Improvements in Barriers and Fork/Join
    --------------------------------------
    * Barrier and fork/join code is now in its own file (which makes it easier to understand and modify).
    * Wait/release code is now templated and in its own file; suspend/resume code is also templated
    * There's a new, hierarchical, barrier, which exploits the cache-hierarchy of the Intel(r) Xeon Phi(tm) coprocessor to improve fork/join and barrier performance.

    ***BEWARE*** the new source files have *not* been added to the legacy CMake build system. If you want to use that, fixes will be required.

    Statistics Collection Code
    --------------------------
    * New code has been added to collect application statistics (if this is enabled at library compile time; by default it is not). The statistics code itself is generally useful, but the lightweight timing code uses the x86 rdtsc instruction, so it will require changes for other architectures. The intent of this code is not for users to tune their codes but rather 1) For timing code-paths inside the runtime 2) For gathering general properties of OpenMP codes to focus attention on which OpenMP features are most used.

    Nested Hot Teams
    ----------------
    * The runtime now maintains more state to reduce the overhead of creating and destroying inner parallel teams. This improves the performance of code that repeatedly uses nested parallelism with the same resource allocation. Set the new KMP_HOT_TEAMS_MAX_LEVEL envirable to a depth to enable this (and, of course, OMP_NESTED=true to enable nested parallelism at all).

    Improved Intel(r) VTune(tm) Amplifier support
    ---------------------------------------------
    * The runtime provides additional information to VTune via the itt_notify interface to allow it to display better OpenMP specific analyses of load-imbalance.

    Support for OpenMP Composite Statements
    ---------------------------------------
    * Implement new entrypoints required by some of the OpenMP 4.1 composite statements.

    Improved ifdefs
    ---------------
    * More separation of concepts ("Does this platform do X?") from platforms ("Are we compiling for platform Y?"), which should simplify future porting.

    ScaleMP* contribution
    ---------------------
    Stack padding to improve the performance in their environment where cross-node coherency is managed at the page level.

    Redesign of wait and release code
    ---------------------------------
    The code is simplified and performance improved.

    Bug Fixes
    ---------
    * Fixes for Windows multiple processor groups.
    * Fix Fortran module build on Linux: offload attribute added.
    * Fix entry names for distribute-parallel-loop construct to be consistent with the compiler codegen.
    * Fix an inconsistent error message for KMP_PLACE_THREADS environment variable.

    llvm-svn: 219214
* Commit PowerPC64 support from Carlo Bertolli at IBM. [Jim Cownie, 2014-08-07, 1 file, -1/+2]
| | | | llvm-svn: 215093
* Make affinity support conditional on KMP_AFFINITY_SUPPORTED [Alp Toker, 2014-03-02, 1 file, -22/+12]
| | | | | | | | | The feature was previously guarded with KMP_OS_LINUX || KMP_OS_WINDOWS but can now be enabled/disabled independently to simplify porting. Completes the work started in r202478. llvm-svn: 202613
* Add support for FreeBSD [Alp Toker, 2014-02-28, 1 file, -16/+12]
| | | | | | | | | | | | | | | Port the OpenMP runtime to FreeBSD along with associated build system changes. Also begin to generalize affinity capabilities so they aren't tied explicitly to Windows and Linux. The port builds with stock clang and gmake and has no additional runtime dependencies. All but a handful of the validation suite tests are now passing on FreeBSD 10 x86_64. llvm-svn: 202478
* Restore string match behavior following changes in r202018 [Alp Toker, 2014-02-25, 1 file, -2/+2]
| | | | llvm-svn: 202197
* Fix typos [Alp Toker, 2014-02-24, 1 file, -4/+4]
| | | | llvm-svn: 202018
* For your Christmas hacking pleasure. [Jim Cownie, 2013-12-23, 1 file, -6/+14]
| | | | | | | | | | | | | | | | | This release aligns with Intel(r) Composer XE 2013 SP1 Product Update 2 New features * The library can now be built with clang (though with some limitations since clang does not support 128 bit floats) * Support for VTune analysis of load imbalance * Code contribution from Steven Noonan to build the runtime for ARM* architecture processors * First implementation of runtime API for OpenMP cancellation Bug Fixes * Fixed hang on Windows (only) when using KMP_BLOCKTIME=0 llvm-svn: 197914
* First attempt to import OpenMP runtime [Jim Cownie, 2013-09-27, 1 file, -0/+5244]
llvm-svn: 191506