path: root/openmp/runtime/src/kmp_affinity.cpp
Commit message | Author | Age | Files | Lines
...
* Fix buffer problem with printing long Hwloc affinity mask | Jonathan Peyton | 2016-04-25 | 1 | -1/+1
  This change has the hwloc_bitmap_list_snprintf() function use the entire buffer to print the mask. There is no need to shorten the buffer length by 7. It only needs to be shortened by one byte.
  llvm-svn: 267470
* New API for restoring current thread's affinity to init affinity of application | Jonathan Peyton | 2016-01-12 | 1 | -0/+38
  This new API, int kmp_set_thread_affinity_mask_initial(), is available for use by other parallel runtime libraries inside a possibly OpenMP-registered thread. This entry point restores the current thread's affinity mask to the affinity mask of the application when it first began. If -1 is returned, it can be assumed that either the thread hasn't called affinity initialization or that the thread isn't registered with the OpenMP library. If 0 is returned, the call was successful. Any return value greater than zero indicates that an error occurred when setting affinity.
  Differential Revision: http://reviews.llvm.org/D15867
  llvm-svn: 257489
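A caller could map the three documented return ranges onto named outcomes roughly as follows. This is only a sketch: `RestoreStatus`, `classify_restore_result`, and `describe` are hypothetical names that do not exist in libomp, and the real entry point is deliberately not called so the example stays self-contained.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of the documented return values of
// kmp_set_thread_affinity_mask_initial(); these names are not part of libomp.
enum class RestoreStatus { NotInitializedOrUnregistered, Success, SetAffinityError };

RestoreStatus classify_restore_result(int rc) {
  if (rc == -1) return RestoreStatus::NotInitializedOrUnregistered;
  if (rc == 0) return RestoreStatus::Success;
  // The commit message documents only -1, 0, and values greater than zero.
  assert(rc > 0);
  return RestoreStatus::SetAffinityError;
}

std::string describe(RestoreStatus status) {
  switch (status) {
  case RestoreStatus::NotInitializedOrUnregistered:
    return "affinity not initialized, or thread not registered with OpenMP";
  case RestoreStatus::Success:
    return "initial affinity mask restored";
  case RestoreStatus::SetAffinityError:
    return "error while setting affinity";
  }
  return "unknown";
}
```

In real use, the integer passed to `classify_restore_result` would be the value returned by the entry point itself.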
* Adding Hwloc library option for affinity mechanism | Jonathan Peyton | 2015-11-30 | 1 | -122/+517
  These changes allow libhwloc to be used as the topology discovery/affinity mechanism for libomp. It is supported on Unices. The code additions:
    * Canonicalize KMP_CPU_* interface macros so bitmask operations are implementation independent and work with both hwloc bitmaps and libomp bitmaps. So there are new KMP_CPU_ALLOC_* and KMP_CPU_ITERATE() macros and the like. These are all in kmp.h and appropriately placed.
    * Hwloc topology discovery code in kmp_affinity.cpp. This uses the hwloc interface to create a libomp address2os object which the rest of libomp knows how to handle already.
    * To build, use -DLIBOMP_USE_HWLOC=on and -DLIBOMP_HWLOC_INSTALL_DIR=/path/to/install/dir [default /usr/local]. If CMake can't find the library or hwloc.h, then it will tell you and exit.
  Differential Revision: http://reviews.llvm.org/D13991
  llvm-svn: 254320
* Improvements to machine_hierarchy code for re-sizing | Jonathan Peyton | 2015-11-09 | 1 | -3/+4
  These changes include:
    1) Machine hierarchy now uses the base_num_threads field to indicate the maximum number of threads the current hierarchy can handle without a resize.
    2) In __kmp_get_hierarchy, we need to get depth after any potential resize is done.
    3) Cleanup of hierarchy resize code to support 1 above.
  Differential Revision: http://reviews.llvm.org/D14455
  llvm-svn: 252475
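Points 1 and 2 can be pictured with a small model. The sketch below is a simplified stand-in and is not libomp's hierarchy_info: `SketchHierarchy`, `skip_per_level`, `capacity`, and `depth_for` are hypothetical names, and doubling per level is an assumption made only for illustration. It also uses already-allocated levels before growing storage, which is the behaviour an older fix further down this log describes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a machine hierarchy.
struct SketchHierarchy {
  std::vector<std::uint32_t> skip_per_level; // threads spanned by one node per level
  std::uint32_t depth;                       // levels currently in use
  std::uint32_t base_num_threads;            // max threads handled without a resize

  explicit SketchHierarchy(std::uint32_t max_levels)
      : skip_per_level(max_levels, 1), depth(1), base_num_threads(1) {
    assert(max_levels >= 1);
  }

  std::uint32_t capacity() const { return skip_per_level[depth - 1]; }

  void resize(std::uint32_t nproc) {
    if (nproc <= base_num_threads) return; // current hierarchy is large enough
    // First use levels that are already allocated ...
    while (capacity() < nproc && depth < skip_per_level.size()) {
      skip_per_level[depth] = 2 * skip_per_level[depth - 1];
      ++depth;
    }
    // ... and grow the storage only if that was not enough.
    while (capacity() < nproc) {
      skip_per_level.push_back(2 * skip_per_level[depth - 1]);
      ++depth;
    }
    base_num_threads = capacity();
  }
};

// Reads depth only after any potential resize has happened.
std::uint32_t depth_for(SketchHierarchy &h, std::uint32_t nproc) {
  h.resize(nproc);
  return h.depth;
}
```

Reading `depth` before calling `resize` would return a stale value whenever `nproc` exceeds `base_num_threads`, which is the ordering issue point 2 refers to.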
* Fix OMP_PLACES negation operator parsing (!place) | Jonathan Peyton | 2015-10-19 | 1 | -1/+1
  Just moved the *scan++ line up before the recursive call. Otherwise, infinite recursion occurs and leads to a segmentation fault.
  llvm-svn: 250729
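The failure mode is easy to reproduce in a toy recursive-descent parser. The sketch below assumes a deliberately tiny grammar (`place := '!' place | digits`); `ParsedPlace` and `parse_place` are hypothetical names and not the libomp parser.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical, simplified grammar:  place := '!' place | digits
struct ParsedPlace {
  int id;
  bool excluded;
};

ParsedPlace parse_place(const char *&scan) {
  if (*scan == '!') {
    // Consume '!' BEFORE recursing. If the pointer were advanced only after
    // the recursive call, that call would see the same '!' again and recurse
    // until the stack overflows.
    ++scan;
    ParsedPlace inner = parse_place(scan);
    inner.excluded = !inner.excluded;
    return inner;
  }
  assert(std::isdigit(static_cast<unsigned char>(*scan)));
  int id = 0;
  while (std::isdigit(static_cast<unsigned char>(*scan))) {
    id = id * 10 + (*scan - '0');
    ++scan;
  }
  return ParsedPlace{id, false};
}
```

The general rule is that every recursive call in such a parser must be preceded by progress on the input.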
* Added sockets to the syntax of KMP_PLACE_THREADS environment variable. | Jonathan Peyton | 2015-10-08 | 1 | -22/+37
  Added (optional) sockets to the syntax of the KMP_PLACE_THREADS environment variable. Some limitations:
    * The number of sockets and then optional offset should be specified first (before other parameters).
    * The letter designation is mandatory for sockets and then for other parameters.
    * If number of cores is specified first, then the number of sockets is defaulted to all sockets on the machine; also, the old syntax is partially supported if sockets are skipped.
    * If number of threads per core is specified first, then the number of sockets and cores per socket are defaulted to all sockets and all cores per socket respectively.
    * The number of cores per socket cannot be specified before sockets or after threads per core.
    * The number of threads per core can be specified before or after core-offset (old syntax required it to be before core-offset).
    * Parameters delimiter can be: empty, comma, lower-case x.
    * Spaces are allowed around numbers, around letters, around delimiter.
  Approximate shorthand specification:
    KMP_PLACE_THREADS="[num_sockets(S|s)[[delim]offset(O|o)][delim]][num_cores_per_socket(C|c)[[delim]offset(O|o)][delim]][num_threads_per_core(T|t)]"
  Differential Revision: http://reviews.llvm.org/D13175
  llvm-svn: 249708
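To make the shorthand concrete, here is a minimal parser sketch that follows only the approximate shorthand specification above. It is not the libomp parser: `PlaceThreads` and `parse_place_threads` are hypothetical names, the old letter-free syntax is rejected, and the extra ordering in which threads per core precede the core offset is not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type; 0 means "not specified, use the default".
struct PlaceThreads {
  int sockets = 0, socket_offset = 0;
  int cores = 0, core_offset = 0;
  int threads = 0;
};

static void skip_spaces(const std::string &s, std::size_t &i) {
  while (i < s.size() && s[i] == ' ') ++i;
}

// Reads "<number><letter>" with optional spaces; leaves i untouched on failure.
static bool read_item(const std::string &s, std::size_t &i, int &value, char &letter) {
  std::size_t j = i;
  skip_spaces(s, j);
  if (j >= s.size() || !std::isdigit(static_cast<unsigned char>(s[j]))) return false;
  int v = 0;
  while (j < s.size() && std::isdigit(static_cast<unsigned char>(s[j])))
    v = v * 10 + (s[j++] - '0');
  skip_spaces(s, j);
  if (j >= s.size()) return false;
  letter = static_cast<char>(std::tolower(static_cast<unsigned char>(s[j++])));
  value = v;
  i = j;
  return true;
}

// Accepts the delimiter forms listed above: empty, comma, or lower-case x.
static void skip_delim(const std::string &s, std::size_t &i) {
  skip_spaces(s, i);
  if (i < s.size() && (s[i] == ',' || s[i] == 'x')) ++i;
}

bool parse_place_threads(const std::string &s, PlaceThreads &out) {
  std::size_t i = 0;
  int stage = 0; // 0: sockets may follow, 1: cores, 2: threads, 3: nothing
  char last = 0; // level that an offset item would apply to
  int value = 0;
  char letter = 0;
  while (read_item(s, i, value, letter)) {
    if (letter == 's' && stage == 0) {
      out.sockets = value; stage = 1; last = 's';
    } else if (letter == 'c' && stage <= 1) {
      out.cores = value; stage = 2; last = 'c';
    } else if (letter == 't' && stage <= 2) {
      out.threads = value; stage = 3; last = 't';
    } else if (letter == 'o' && last == 's') {
      out.socket_offset = value; last = 0;
    } else if (letter == 'o' && last == 'c') {
      out.core_offset = value; last = 0;
    } else {
      return false; // unknown letter or item out of order
    }
    skip_delim(s, i);
  }
  skip_spaces(s, i);
  return i == s.size(); // anything left over is a syntax error
}
```

For instance, `"2s,4c,2t"` yields two sockets, four cores per socket, and two threads per core, while `"2t,4c"` is rejected because cores may not follow threads.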
* Fix memory corruption in Windows debug library | Jonathan Peyton | 2015-09-25 | 1 | -5/+5
  This patch adjusts the buffer size when reducing the buffer used for printing. This solves the memory corruption in Windows debug library, and potential memory corruption in other builds.
  llvm-svn: 248588
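The underlying rule is that whenever the write pointer advances into a buffer, the size passed to the next formatted write has to shrink by the same amount. The sketch below illustrates that with standard `std::snprintf`; `print_ids` is a hypothetical helper and not the libomp mask printer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Appends comma-separated ids to buf, keeping the remaining size in step with
// the cursor. Returns the number of characters written, excluding the NUL.
std::size_t print_ids(char *buf, std::size_t buf_len, const int *ids, std::size_t n) {
  assert(buf != nullptr && buf_len > 0);
  char *scan = buf;
  std::size_t remaining = buf_len;
  buf[0] = '\0';
  for (std::size_t i = 0; i < n; ++i) {
    int written = std::snprintf(scan, remaining, i == 0 ? "%d" : ",%d", ids[i]);
    if (written < 0) break; // encoding error
    if (static_cast<std::size_t>(written) >= remaining) {
      // Truncated: snprintf stored remaining - 1 characters plus the NUL.
      scan += remaining - 1;
      break;
    }
    scan += written;
    remaining -= static_cast<std::size_t>(written); // shrink size with the pointer
  }
  return static_cast<std::size_t>(scan - buf);
}
```

Passing the original `buf_len` after `scan` has moved would let a later write run past the end of the buffer, which is the kind of corruption a debug heap tends to detect.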
* Fix depth field bug and resize() function in hierarchical barrier | Jonathan Peyton | 2015-09-10 | 1 | -6/+3
  This is a follow up to the hierarchy cleanup patch. Added some clarifying comments to hierarchy_info. Fixed a bug with the depth field not being updated cleanly during a resize. Fixed resize to first check capacity as determined by maxLevels before actually doing the full resize.
  Differential Revision: http://reviews.llvm.org/D12562
  llvm-svn: 247333
* Cleanup of affinity hierarchy code. | Jonathan Peyton | 2015-09-10 | 1 | -456/+28
  Some of this is improvement to code suggested by Hal Finkel. Four changes here:
    1. Cleanup of hierarchy code to handle all hierarchy cases whether affinity is available or not
    2. Separated this and other classes and common functions out to a header file
    3. Added a destructor-like fini function for the hierarchy (and call in __kmp_cleanup)
    4. Remove some redundant code that is hopefully no longer needed
  Differential Revision: http://reviews.llvm.org/D12449
  llvm-svn: 247326
* Fix machine topology pruning. | Jonathan Peyton | 2015-08-25 | 1 | -17/+22
  This patch fixes a bug when eliminating layers in the machine topology (namely cores, and threads). Before this patch, if a user specifies using only one thread per socket, then affinity is not set properly due to bad topology pruning.
  Differential Revision: http://reviews.llvm.org/D11158
  llvm-svn: 245966
* Allow machine hierarchy expansion | Jonathan Peyton | 2015-06-22 | 1 | -10/+78
  This fix allows the machine hierarchy to be expanded in case it needs to handle more threads. It adds a resize function to accomplish this.
  Differential Revision: http://reviews.llvm.org/D9900
  llvm-svn: 240292
* Re-enable Visual Studio Builds. | Jonathan Peyton | 2015-06-22 | 1 | -3/+3
  I tried to compile with Visual Studio using CMake and found these two sections of code causing problems for Visual Studio. The first one removes the use of variable length arrays by instead using KMP_ALLOCA(). The second part eliminates a redundant cpuid assembly call by using the already existing __kmp_x86_cpuid() call instead.
  llvm-svn: 240290
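The first of the two fixes can be illustrated with a stack allocation that replaces a variable length array. This is a sketch under assumptions: `SKETCH_ALLOCA` is a hypothetical macro written in the spirit of KMP_ALLOCA(), whose actual definition is not shown in this log, and `sum_of_squares` exists only to exercise it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical wrapper; the real KMP_ALLOCA() definition is not shown here.
#if defined(_MSC_VER)
#include <malloc.h>
#define SKETCH_ALLOCA(bytes) _alloca(bytes)
#else
#define SKETCH_ALLOCA(bytes) __builtin_alloca(bytes)
#endif

// Sums the squares of 0..n-1 using a scratch buffer sized at run time.
// A variable length array (`int scratch[n];`) is a compiler extension in C++
// and is rejected by Visual Studio, so the buffer comes from the stack
// allocator instead.
long sum_of_squares(std::size_t n) {
  assert(n <= 1024 && "keep stack allocations small");
  if (n == 0) return 0;
  int *scratch = static_cast<int *>(SKETCH_ALLOCA(n * sizeof(int)));
  for (std::size_t i = 0; i < n; ++i) scratch[i] = static_cast<int>(i * i);
  long total = 0;
  for (std::size_t i = 0; i < n; ++i) total += scratch[i];
  return total;
}
```

As with a variable length array, the memory is released when the function returns, so the pointer must not escape the call.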
* Apply name change to src/* files. | Jonathan Peyton | 2015-06-01 | 1 | -1/+1
  These changes are mostly in comments, but there are a few that aren't. Change libiomp5 => libomp everywhere. One internal function name is changed in kmp_gsupport.c, and in kmp_i18n.c, the static char[] variable 'name' is changed to "libomp".
  llvm-svn: 238712
* Fix comment about balanced affinity | Jonathan Peyton | 2015-05-27 | 1 | -1/+1
  A while back, Hal mentioned fixing a comment concerning balanced affinity.
  http://lists.cs.uiuc.edu/pipermail/openmp-dev/2014-December/000358.html
  I forgot about fixing it until now, but now is better than never.
  llvm-svn: 238378
* The generation of the hierarchy used by hierarchical barrier improved in how the generation reacts to affinity set to none, or disabled, or no affinity available, or oversubscription. | Andrey Churbanov | 2015-04-13 | 1 | -43/+78
  Some cleanup actions based on review comments to follow: need to use meaningful names instead of numeric constants, e.g. use enumerators.
  llvm-svn: 234775
* Replace some unsafe API calls with safe alternatives on Windows, prepare code for similar actions on other platforms - wrap unsafe API calls into macros. | Andrey Churbanov | 2015-04-02 | 1 | -19/+19
  llvm-svn: 233915
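Wrapping a call in a macro lets each platform choose its own bounded implementation behind one name. The sketch below uses hypothetical names (`SKETCH_STRCPY_S`, `sketch_bounded_copy`); the wrappers libomp actually defines are not shown in this log. On Visual Studio it maps to the CRT's `strcpy_s`, and elsewhere to a bounded copy that always terminates the destination.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#if defined(_MSC_VER)
// strcpy_s(dest, dest_size, src) is the bounds-checked CRT variant.
#define SKETCH_STRCPY_S(dst, size, src) strcpy_s((dst), (size), (src))
#else
// Elsewhere, fall back to a bounded copy that always terminates the string.
inline void sketch_bounded_copy(char *dst, std::size_t size, const char *src) {
  assert(dst != nullptr && src != nullptr && size > 0);
  std::strncpy(dst, src, size - 1);
  dst[size - 1] = '\0';
}
#define SKETCH_STRCPY_S(dst, size, src) sketch_bounded_copy((dst), (size), (src))
#endif
```

Note that the two branches differ when the source does not fit: the fallback truncates silently, while `strcpy_s` treats that as a constraint violation, so call sites should size the destination to fit.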
* Eliminated the write to depth field of the machine_hierarchy data structure in __kmp_get_hierarchy(), thus fixing a race condition. Each thread now uses a local variable. | Andrey Churbanov | 2015-04-02 | 1 | -9/+7
  llvm-svn: 233914
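The pattern behind this fix is to compute a per-caller value into a local instead of storing it in a structure that every thread reads. The sketch below is an illustration only: `SharedHierarchy` and `levels_needed` are hypothetical, and the way the depth is derived here is an assumption, not the logic of __kmp_get_hierarchy().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical shared structure standing in for a process-wide hierarchy.
struct SharedHierarchy {
  std::uint32_t depth;            // written only at initialization
  std::uint32_t num_per_level[4]; // fan-out at each level, leaf level first
};

// Returns how many levels a team of nproc threads needs.
// The result lives in a local variable, and the shared object is taken by
// const reference, so concurrent callers cannot race on a write to h.depth.
std::uint32_t levels_needed(const SharedHierarchy &h, std::uint32_t nproc) {
  assert(h.depth >= 1 && h.depth <= 4);
  std::uint32_t local_depth = 1;
  std::uint64_t covered = h.num_per_level[0];
  while (local_depth < h.depth && covered < nproc) {
    covered *= h.num_per_level[local_depth];
    ++local_depth;
  }
  return local_depth;
}
```

Taking the shared object by const reference also lets the compiler reject any accidental write from this code path.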
* issuing of incorrect warning fixed | Andrey Churbanov | 2015-03-10 | 1 | -4/+4
  llvm-svn: 231779
* cleanup: usages of mask size wrapped into macros | Andrey Churbanov | 2015-03-10 | 1 | -2/+2
  llvm-svn: 231775
* changed unsigned types to signed - prompted by Hal Finkel's comments on one of the earlier patches | Andrey Churbanov | 2015-03-10 | 1 | -3/+3
  llvm-svn: 231773
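The log does not say which review comments motivated this change, so the sketch below shows only one common reason for preferring a signed index: a backward loop that tests `i >= 0`. The function `last_index_below` is hypothetical and unrelated to the libomp sources.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Returns the index of the last element smaller than limit, or -1 if none.
// With an unsigned index, `i >= 0` is always true and `--i` wraps around at
// zero, so this backward loop would run off the front of the array. A signed
// index stops cleanly at -1.
int last_index_below(const std::vector<int> &values, int limit) {
  assert(values.size() <= static_cast<std::size_t>(INT_MAX));
  for (int i = static_cast<int>(values.size()) - 1; i >= 0; --i) {
    if (values[static_cast<std::size_t>(i)] < limit) return i;
  }
  return -1;
}
```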
* minor change: comment improved | Andrey Churbanov | 2015-03-05 | 1 | -1/+1
  llvm-svn: 231381
* Fixed memory corruption problem. | Andrey Churbanov | 2015-02-10 | 1 | -0/+4
  llvm-svn: 228736
* enable environment variable KMP_PLACE_THREADS also for non-MIC architectures | Andrey Churbanov | 2015-01-29 | 1 | -10/+3
  llvm-svn: 227467
* fixing typo in error message | Andrey Churbanov | 2015-01-29 | 1 | -1/+1
  llvm-svn: 227451
* Comments only: removing the Revision and Date svn variables from the top of all the source files. | Andrey Churbanov | 2015-01-27 | 1 | -2/+0
  llvm-svn: 227207
* Enables a cpuid leaf 4 check for non-MIC x86 architectures. | Andrey Churbanov | 2015-01-27 | 1 | -21/+14
  llvm-svn: 227204
* Removes some unused variables (__kmp_ht_*) and changes __kmp_ncores and __kmp_nThreadsPerCore to static globals within kmp_affinity.cpp. | Andrey Churbanov | 2015-01-27 | 1 | -17/+9
  llvm-svn: 227201
* Replaces KMP_OS_WINDOWS && KMP_ARCH_X86_64 or any combination of those two options with the feature macro KMP_GROUP_AFFINITY. | Andrey Churbanov | 2015-01-27 | 1 | -12/+12
  llvm-svn: 227199
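The idea is to state the platform facts once, derive a feature macro from them, and test only the feature at call sites. The sketch below uses hypothetical `SKETCH_*` names and compiler-predefined macros as an assumption; the actual definitions of KMP_OS_WINDOWS, KMP_ARCH_X86_64, and KMP_GROUP_AFFINITY are not shown in this log.

```cpp
#include <cassert>

// Platform facts (hypothetical names in the style of KMP_OS_* / KMP_ARCH_*).
#if defined(_WIN32)
#define SKETCH_OS_WINDOWS 1
#else
#define SKETCH_OS_WINDOWS 0
#endif

#if defined(_M_X64) || defined(__x86_64__)
#define SKETCH_ARCH_X86_64 1
#else
#define SKETCH_ARCH_X86_64 0
#endif

// One feature macro answers "does this build support processor groups?".
#define SKETCH_GROUP_AFFINITY (SKETCH_OS_WINDOWS && SKETCH_ARCH_X86_64)

// The feature can only be on where the platform facts allow it.
static_assert(!SKETCH_GROUP_AFFINITY || SKETCH_OS_WINDOWS,
              "group affinity implies a Windows build in this sketch");

// Call sites test the feature, not the platform combination.
bool group_affinity_enabled() {
#if SKETCH_GROUP_AFFINITY
  return true;
#else
  return false;
#endif
}
```

If another platform later gains the same capability, only the definition of the feature macro changes and the call sites stay untouched.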
* This patch enables the use of KMP_AFFINITY=balanced on non-MIC architectures. | Andrey Churbanov | 2015-01-13 | 1 | -10/+2
  The restriction for using balanced affinity on non-MIC architectures is that it only works for one-package machines.
  llvm-svn: 225794
* I apologise in advance for the size of this check-in. | Jim Cownie | 2014-10-07 | 1 | -137/+268
  At Intel we do understand that this is not friendly, and are working to change our internal code-development to make it easier to make development features available more frequently and in finer (more functional) chunks. Unfortunately we haven't got that in place yet, and unpicking this into multiple separate check-ins would be non-trivial, so please bear with me on this one. We should be better in the future.

  Apologies over, what do we have here?

  GCC 4.9 compatibility
  --------------------
  * We have implemented the new entrypoints used by code compiled by GCC 4.9 to implement the same functionality in gcc 4.8. Therefore code compiled with gcc 4.9 that used to work will continue to do so. However, there are some other new entrypoints (associated with task cancellation) which are not implemented. Therefore user code compiled by gcc 4.9 that uses these new features will not link against the LLVM runtime. (It remains unclear how to handle those entrypoints, since the GCC interface has potentially unpleasant performance implications for join barriers even when cancellation is not used)

  --- new parallel entry points ---
  new entry points that aren't OpenMP 4.0 related
  These are implemented fully :-
      GOMP_parallel_loop_dynamic()
      GOMP_parallel_loop_guided()
      GOMP_parallel_loop_runtime()
      GOMP_parallel_loop_static()
      GOMP_parallel_sections()
      GOMP_parallel()

  --- cancellation entry points ---
  Currently, these only give a runtime error if OMP_CANCELLATION is true because our plain barriers don't check for cancellation while waiting
      GOMP_barrier_cancel()
      GOMP_cancel()
      GOMP_cancellation_point()
      GOMP_loop_end_cancel()
      GOMP_sections_end_cancel()

  --- taskgroup entry points ---
  These are implemented fully.
      GOMP_taskgroup_start()
      GOMP_taskgroup_end()

  --- target entry points ---
  These are empty (as they are in libgomp)
      GOMP_target()
      GOMP_target_data()
      GOMP_target_end_data()
      GOMP_target_update()
      GOMP_teams()

  Improvements in Barriers and Fork/Join
  --------------------------------------
  * Barrier and fork/join code is now in its own file (which makes it easier to understand and modify).
  * Wait/release code is now templated and in its own file; suspend/resume code is also templated
  * There's a new, hierarchical, barrier, which exploits the cache-hierarchy of the Intel(r) Xeon Phi(tm) coprocessor to improve fork/join and barrier performance.

  ***BEWARE*** the new source files have *not* been added to the legacy CMake build system. If you want to use that, fixes will be required.

  Statistics Collection Code
  --------------------------
  * New code has been added to collect application statistics (if this is enabled at library compile time; by default it is not). The statistics code itself is generally useful; the lightweight timing code uses the X86 rdtsc instruction, so it will require changes for other architectures. The intent of this code is not for users to tune their codes but rather
      1) For timing code-paths inside the runtime
      2) For gathering general properties of OpenMP codes to focus attention on which OpenMP features are most used.

  Nested Hot Teams
  ----------------
  * The runtime now maintains more state to reduce the overhead of creating and destroying inner parallel teams. This improves the performance of code that repeatedly uses nested parallelism with the same resource allocation. Set the new KMP_HOT_TEAMS_MAX_LEVEL environment variable to a depth to enable this (and, of course, OMP_NESTED=true to enable nested parallelism at all).

  Improved Intel(r) VTune(tm) Amplifier support
  ---------------------------------------------
  * The runtime provides additional information to VTune via the itt_notify interface to allow it to display better OpenMP specific analyses of load-imbalance.

  Support for OpenMP Composite Statements
  ---------------------------------------
  * Implement new entrypoints required by some of the OpenMP 4.1 composite statements.

  Improved ifdefs
  ---------------
  * More separation of concepts ("Does this platform do X?") from platforms ("Are we compiling for platform Y?"), which should simplify future porting.

  ScaleMP* contribution
  ---------------------
  Stack padding to improve the performance in their environment where cross-node coherency is managed at the page level.

  Redesign of wait and release code
  ---------------------------------
  The code is simplified and performance improved.

  Bug Fixes
  ---------
  * Fixes for Windows multiple processor groups.
  * Fix Fortran module build on Linux: offload attribute added.
  * Fix entry names for distribute-parallel-loop construct to be consistent with the compiler codegen.
  * Fix an inconsistent error message for KMP_PLACE_THREADS environment variable.

  llvm-svn: 219214
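The statistics item mentions lightweight timing built on rdtsc that needs changes for other architectures. A minimal sketch of that shape follows. It is not the runtime's statistics code: every `Sketch*` and `sketch_*` name is hypothetical, and falling back to `std::chrono::steady_clock` on non-x86 targets is an assumption made so the example builds everywhere.

```cpp
#include <cassert>
#include <cstdint>
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <x86intrin.h>
#define SKETCH_HAVE_RDTSC 1
#else
#include <chrono>
#define SKETCH_HAVE_RDTSC 0
#endif

// Reads a cheap tick counter: rdtsc on x86, a steady clock elsewhere.
inline std::uint64_t sketch_ticks() {
#if SKETCH_HAVE_RDTSC
  return __rdtsc();
#else
  return static_cast<std::uint64_t>(
      std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical per-code-path statistic: call count and accumulated ticks.
struct SketchStat {
  std::uint64_t calls = 0;
  std::uint64_t total_ticks = 0;
};

// RAII timer: records the elapsed ticks of one pass through a code path.
class SketchScopedTimer {
public:
  explicit SketchScopedTimer(SketchStat &stat) : stat_(stat), start_(sketch_ticks()) {}
  ~SketchScopedTimer() {
    stat_.total_ticks += sketch_ticks() - start_;
    ++stat_.calls;
  }
  SketchScopedTimer(const SketchScopedTimer &) = delete;
  SketchScopedTimer &operator=(const SketchScopedTimer &) = delete;

private:
  SketchStat &stat_;
  std::uint64_t start_;
};

// Example of an instrumented code path.
std::uint64_t timed_sum(SketchStat &stat, std::uint64_t n) {
  SketchScopedTimer timer(stat);
  std::uint64_t sum = 0;
  for (std::uint64_t i = 1; i <= n; ++i) sum += i;
  return sum;
}
```

Tick counts from rdtsc are not directly comparable across machines, so a sketch like this is suited to relative comparisons between code paths rather than to absolute timings.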
* Add support for FreeBSD | Alp Toker | 2014-02-28 | 1 | -7/+2
  Port the OpenMP runtime to FreeBSD along with associated build system changes. Also begin to generalize affinity capabilities so they aren't tied explicitly to Windows and Linux.
  The port builds with stock clang and gmake and has no additional runtime dependencies. All but a handful of the validation suite tests are now passing on FreeBSD 10 x86_64.
  llvm-svn: 202478
* Fix typos | Alp Toker | 2014-02-24 | 1 | -5/+5
  llvm-svn: 202018
* For your Christmas hacking pleasure. | Jim Cownie | 2013-12-23 | 1 | -7/+18
  This release aligns with Intel(r) Composer XE 2013 SP1 Product Update 2
  New features
    * The library can now be built with clang (though with some limitations since clang does not support 128 bit floats)
    * Support for VTune analysis of load imbalance
    * Code contribution from Steven Noonan to build the runtime for ARM* architecture processors
    * First implementation of runtime API for OpenMP cancellation
  Bug Fixes
    * Fixed hang on Windows (only) when using KMP_BLOCKTIME=0
  llvm-svn: 197914
* First attempt to import OpenMP runtime | Jim Cownie | 2013-09-27 | 1 | -0/+4540
  llvm-svn: 191506