| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: jdoerfert, Jim
Reviewed By: Jim
Subscribers: Jim, mgorny, guansong, jfb, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D72285
|
|
|
|
|
|
| |
Submitted by: kiszk
Differential Revision: https://reviews.llvm.org/D72171
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Nest-var, OMP_NESTED, omp_set_nested()., and omp_get_nested() have been
deprecated in the 5.0 spec. Initial nesting info is now derived from
OMP_MAX_ACTIVE_LEVELS, OMP_NUM_THREADS, and OMP_PROC_BIND.
This patch deprecates the internal ICV that corresponds to nest-var, and
replaces it with the max-active-levels-var ICV to determine nesting. The
change still allows for use of OMP_NESTED (according to 5.0 changes),
omp_get_nested, and omp_set_nested, which have had deprecation messages
added to them. The change allows certain settings of OMP_NUM_THREADS,
OMP_PROC_BIND, and OMP_MAX_ACTIVE_LEVELS to turn on nesting, but
OMP_NESTED=0 will still force nesting to be off.
The runtime now prints informative messages about deprecation of
OMP_NESTED, omp_set_nested(), and omp_get_nested(), when those
environment variables or routines are used. It also prints deprecated
message in output for KMP_SETTINGS and OMP_DISPLAY_ENV for OMP_NESTED.
This patch also fixes OMP_DISPLAY_ENV output for OMP_TARGET_OFFLOAD.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D58408
llvm-svn: 355138
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to reflect the new license. These used slightly different spellings that
defeated my regular expressions.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351648
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the affinity format functionality introduced in OpenMP 5.0.
This patch adds: Two new environment variables:
OMP_DISPLAY_AFFINITY=TRUE|FALSE
OMP_AFFINITY_FORMAT=<string>
and Four new API:
1) omp_set_affinity_format()
2) omp_get_affinity_format()
3) omp_display_affinity()
4) omp_capture_affinity()
The affinity format functionality has two ICV's associated with it:
affinity-display-var (bool) and affinity-format-var (string).
The affinity-display-var enables/disables the functionality through the
envirable OMP_DISPLAY_AFFINITY. The affinity-format-var is a formatted
string with the special field types beginning with a '%' character
similar to printf
For example, the affinity-format-var could be:
"OMP: host:%H pid:%P OStid:%i num_threads:%N thread_num:%n affinity:{%A}"
The affinity-format-var is displayed by every thread implicitly at the beginning
of a parallel region when any thread's affinity has changed (including a brand
new thread being spawned), or explicitly using the omp_display_affinity() API.
The omp_capture_affinity() function can capture the affinity-format-var in a
char buffer. And omp_set|get_affinity_format() allow the user to set|get the
affinity-format-var explicitly at runtime. omp_capture_affinity() and
omp_get_affinity_format() both return the number of characters needed to hold
the entire string it tried to make (not including NULL character). If not
enough buffer space is available,
both these functions truncate their output.
Differential Revision: https://reviews.llvm.org/D55148
llvm-svn: 349089
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implemented omp_alloc, omp_free, omp_{set,get}_default_allocator entries,
and OMP_ALLOCATOR environment variable.
Added support for HBW memory on Linux if libmemkind.so library is accessible
(dynamic library only, no support for static libraries).
Only used stable API (hbwmalloc) of the memkind library
though we may consider using experimental API in future.
The ICV def-allocator-var is implemented per implicit task similar to
place-partition-var. In the absence of a requested allocator, the uses the
default allocator.
Predefined allocators (the only ones currently available) are made similar
for C and Fortran, - pointers (long integers) with values 1 to 8.
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D51232
llvm-svn: 341687
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces the logic implementing hierarchical scheduling.
First and foremost, hierarchical scheduling is off by default
To enable, use -DLIBOMP_USE_HIER_SCHED=On during CMake's configure stage.
This work is based off if the IWOMP paper:
"Workstealing and Nested Parallelism in SMP Systems"
Hierarchical scheduling is the layering of OpenMP schedules for different layers
of the memory hierarchy. One can have multiple layers between the threads and
the global iterations space. The threads will go up the hierarchy to grab
iterations, using possibly a different schedule & chunk for each layer.
[ Global iteration space (0-999) ]
(use static)
[ L1 | L1 | L1 | L1 ]
(use dynamic,1)
[ T0 T1 | T2 T3 | T4 T5 | T6 T7 ]
In the example shown above, there are 8 threads and 4 L1 caches begin targeted.
If the topology indicates that there are two threads per core, then two
consecutive threads will share the data of one L1 cache unit. This example
would have the iteration space (0-999) split statically across the four L1
caches (so the first L1 would get (0-249), the second would get (250-499), etc).
Then the threads will use a dynamic,1 schedule to grab iterations from the L1
cache units. There are currently four supported layers: L1, L2, L3, NUMA
OMP_SCHEDULE can now read a hierarchical schedule with this syntax:
OMP_SCHEDULE='EXPERIMENTAL LAYER,SCHED[,CHUNK][:LAYER,SCHED[,CHUNK]...]:SCHED,CHUNK
And OMP_SCHEDULE can still read the normal SCHED,CHUNK syntax from before
I've kept most of the hierarchical scheduling logic inside kmp_dispatch_hier.h
to try to keep it separate from the rest of the code.
Differential Revision: https://reviews.llvm.org/D47962
llvm-svn: 336571
|
|
|
|
| |
llvm-svn: 321827
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The format string for hints only prints the second argument (string) and drops
the first argument (hint id). Depending on how you read the POSIX text for
printf, this could be valid. But for practical reason, i.e., unpacking the
va_list passed to printf based on the formating information, it makes sense
to fix the implementation and not pass the id for hint.
Failing testcases were:
misc_bugs/teams-reduction.c
ompt/parallel/not_enough_threads.c
Differential Revision: https://reviews.llvm.org/D41504
llvm-svn: 321361
|
|
|
|
|
|
|
|
| |
Patch by Olga Malysheva
Differential Revision: https://reviews.llvm.org/D40309
llvm-svn: 319422
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added two warnings:
1) Before building the topology map check if tiles are requested but the
topo method is not hwloc;
2) After building the topology map check if tiles are requested but not
detected by the library.
Patch by Olga Malysheva
Differential Revision: https://reviews.llvm.org/D40340
llvm-svn: 319374
|
|
|
|
|
|
|
|
|
|
|
|
| |
For up-to-date compilers, this assertion is reasonable, but it breaks
compatibility with the typical compiler installed on most systems.
This patch changes the default value to what we had when there was no
compiler support. A warning about the outdated compiler is printed during
runtime, when this point is reached.
Differential Revision: https://reviews.llvm.org/D39890
llvm-svn: 317928
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change adds a new environment variable, KMP_TEAMS_THREAD_LIMIT, which is
used to set a new global variable, __kmp_teams_max_nth, which is checked when
determining the size and quantity of teams that will be created in the teams
construct. Specifically, it is a limit on the total number of threads in a given
teams construct. It differentiates the limits for the teams construct from the
limits for regular parallel regions (KMP_DEVICE_THREAD_LIMIT/__kmp_max_nth and
OMP_THREAD_LIMIT/__kmp_cg_max_nth). When each individual team is formed, it is
still subject to those limits. After the clauses to the teams construct are
parsed and calculated, we check to make sure we are within this limit, and if
not, reduce num_threads per team and/or number of teams, accordingly. The
default value is set to the number of available processors on the system.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D36009
llvm-svn: 309874
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change fixes the implementation of OMP_THREAD_LIMIT. The implementation of
this previously was not restricted to a contention group (but it should be,
according to the spec), and this is fixed here. A field is added to root thread
to store a counter of the threads in the contention group. An extra check is
added when reserving threads for a parallel region that checks this variable and
compares to threadlimit-var, which is implemented as a new global variable,
kmp_cg_max_nth. Associated settings changes were also made, and clean up of
comments that referred to OMP_THREAD_LIMIT, but should refer to the new
KMP_DEVICE_THREAD_LIMIT (added in an earlier patch).
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D35912
llvm-svn: 309319
|
|
|
|
|
|
|
|
|
|
|
| |
Address user message bug where the messages were sending users to Intel's
website instead of the LLVM OpenMP runtime websites.
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=32892
Differential Revision: https://reviews.llvm.org/D35018
llvm-svn: 307206
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D31600
llvm-svn: 300220
|
|
|
|
|
|
|
|
| |
Patch by Vishakha Agrawal
Differential Revision: https://reviews.llvm.org/D28873
llvm-svn: 293315
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In case atomic reduction method is not available (the compiler can't generate
it) the assertion failure occurred if KMP_FORCE_REDUCTION=atomic was specified.
This change replaces the assertion with a warning and sets the reduction method
to the default one - 'critical'.
Patch by Olga Malysheva
Differential Revision: https://reviews.llvm.org/D23990
llvm-svn: 280519
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Deprecate KMP_PLACE_THREADS and rename it to KMP_HW_SUBSET due to confusion
about its purpose and function among users. KMP_HW_SUBSET is an environment
variable which allows users to easily pick a subset of the hardware topology to
use. e.g., KMP_HW_SUBSET=30c,2t means use 30 cores, 2 threads per core.
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21340
llvm-svn: 272937
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The problem is the lack of dispatch buffers when thousands of loops with nowait,
about 10 iterations each, are executed by hundreds of threads. We only have
built-in 7 dispatch buffers, but there is a need in dozens or hundreds of
buffers.
The problem can be fixed by setting KMP_MAX_DISP_BUF to bigger value. In order
to give users same possibility I changed build-time control into run-time one,
adding API just in case.
This change adds an environment variable KMP_DISP_NUM_BUFFERS and a new API
function kmp_set_disp_num_buffers(int num_buffers).
The KMP_DISP_NUM_BUFFERS envirable works only before serial initialization,
because during the serial initialization we already allocate buffers for the hot
team, so it is too late to change the number of buffers later (or we need to
reallocate buffers for all teams which sounds too complicated). The
kmp_set_defaults() routine does not work for this envirable, because it calls
serial initialization before reading the parameter string. So a new routine,
kmp_set_disp_num_buffers(), is created so that it can set our internal global
variable before the library initialization. If both the envirable and API used
the envirable wins.
Differential Revision: http://reviews.llvm.org/D20697
llvm-svn: 271318
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These changes allow libhwloc to be used as the topology discovery/affinity
mechanism for libomp. It is supported on Unices. The code additions:
* Canonicalize KMP_CPU_* interface macros so bitmask operations are
implementation independent and work with both hwloc bitmaps and libomp
bitmaps. So there are new KMP_CPU_ALLOC_* and KMP_CPU_ITERATE() macros and
the like. These are all in kmp.h and appropriately placed.
* Hwloc topology discovery code in kmp_affinity.cpp. This uses the hwloc
interface to create a libomp address2os object which the rest of libomp knows
how to handle already.
* To build, use -DLIBOMP_USE_HWLOC=on and
-DLIBOMP_HWLOC_INSTALL_DIR=/path/to/install/dir [default /usr/local]. If CMake
can't find the library or hwloc.h, then it will tell you and exit.
Differential Revision: http://reviews.llvm.org/D13991
llvm-svn: 254320
|
|
|
|
|
|
|
|
|
|
|
|
| |
Moved '@' from delimiters to offset designators for the KMP_PLACE_THREADS
environment variable. Only one of: postfix "o" or prefix @, should be used
in the value of KMP_PLACE_THREADS. For example, '2s@2,4c@2,1t'. This is also
the format of KMP_SETTINGS=1 output now (removed "o" from there).
e.g., 2s,2o,4c,2o,1t.
Differential Revision: http://reviews.llvm.org/D13701
llvm-svn: 250846
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added (optional) sockets to the syntax of the KMP_PLACE_THREADS environment variable.
Some limitations:
* The number of sockets and then optional offset should be specified first (before other parameters).
* The letter designation is mandatory for sockets and then for other parameters.
* If number of cores is specified first, then the number of sockets is defaulted to all sockets on the machine; also, the old syntax is partially supported if sockets are skipped.
* If number of threads per core is specified first, then the number of sockets and cores per socket are defaulted to all sockets and all cores per socket respectively.
* The number of cores per socket cannot be specified before sockets or after threads per core.
* The number of threads per core can be specified before or after core-offset (old syntax required it to be before core-offset);
* Parameters delimiter can be: empty, comma, lower-case x;
* Spaces are allowed around numbers, around letters, around delimiter.
Approximate shorthand specification:
KMP_PLACE_THREADS="[num_sockets(S|s)[[delim]offset(O|o)][delim]][num_cores_per_socket(C|c)[[delim]offset(O|o)][delim]][num_threads_per_core(T|t)]"
Differential Revision: http://reviews.llvm.org/D13175
llvm-svn: 249708
|
|
|
|
|
|
|
|
|
| |
Some old references to RML and IOMP which aren't used anywhere are deleted.
http://lists.cs.uiuc.edu/pipermail/openmp-dev/2015-June/000664.html
Patch by Jack Howarth and Jonathan Peyton
llvm-svn: 238878
|
|
|
|
| |
llvm-svn: 231778
|
|
|
|
| |
llvm-svn: 230035
|
|
|
|
|
|
| |
all the source files.
llvm-svn: 227207
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
understand that this is not friendly, and are working to change our
internal code-development to make it easier to make development
features available more frequently and in finer (more functional)
chunks. Unfortunately we haven't got that in place yet, and unpicking
this into multiple separate check-ins would be non-trivial, so please
bear with me on this one. We should be better in the future.
Apologies over, what do we have here?
GGC 4.9 compatibility
--------------------
* We have implemented the new entrypoints used by code compiled by GCC
4.9 to implement the same functionality in gcc 4.8. Therefore code
compiled with gcc 4.9 that used to work will continue to do so.
However, there are some other new entrypoints (associated with task
cancellation) which are not implemented. Therefore user code compiled
by gcc 4.9 that uses these new features will not link against the LLVM
runtime. (It remains unclear how to handle those entrypoints, since
the GCC interface has potentially unpleasant performance implications
for join barriers even when cancellation is not used)
--- new parallel entry points ---
new entry points that aren't OpenMP 4.0 related
These are implemented fully :-
GOMP_parallel_loop_dynamic()
GOMP_parallel_loop_guided()
GOMP_parallel_loop_runtime()
GOMP_parallel_loop_static()
GOMP_parallel_sections()
GOMP_parallel()
--- cancellation entry points ---
Currently, these only give a runtime error if OMP_CANCELLATION is true
because our plain barriers don't check for cancellation while waiting
GOMP_barrier_cancel()
GOMP_cancel()
GOMP_cancellation_point()
GOMP_loop_end_cancel()
GOMP_sections_end_cancel()
--- taskgroup entry points ---
These are implemented fully.
GOMP_taskgroup_start()
GOMP_taskgroup_end()
--- target entry points ---
These are empty (as they are in libgomp)
GOMP_target()
GOMP_target_data()
GOMP_target_end_data()
GOMP_target_update()
GOMP_teams()
Improvements in Barriers and Fork/Join
--------------------------------------
* Barrier and fork/join code is now in its own file (which makes it
easier to understand and modify).
* Wait/release code is now templated and in its own file; suspend/resume code is also templated
* There's a new, hierarchical, barrier, which exploits the
cache-hierarchy of the Intel(r) Xeon Phi(tm) coprocessor to improve
fork/join and barrier performance.
***BEWARE*** the new source files have *not* been added to the legacy
Cmake build system. If you want to use that fixes wil be required.
Statistics Collection Code
--------------------------
* New code has been added to collect application statistics (if this
is enabled at library compile time; by default it is not). The
statistics code itself is generally useful, the lightweight timing
code uses the X86 rdtsc instruction, so will require changes for other
architectures.
The intent of this code is not for users to tune their codes but
rather
1) For timing code-paths inside the runtime
2) For gathering general properties of OpenMP codes to focus attention
on which OpenMP features are most used.
Nested Hot Teams
----------------
* The runtime now maintains more state to reduce the overhead of
creating and destroying inner parallel teams. This improves the
performance of code that repeatedly uses nested parallelism with the
same resource allocation. Set the new KMP_HOT_TEAMS_MAX_LEVEL
envirable to a depth to enable this (and, of course, OMP_NESTED=true
to enable nested parallelism at all).
Improved Intel(r) VTune(Tm) Amplifier support
---------------------------------------------
* The runtime provides additional information to Vtune via the
itt_notify interface to allow it to display better OpenMP specific
analyses of load-imbalance.
Support for OpenMP Composite Statements
---------------------------------------
* Implement new entrypoints required by some of the OpenMP 4.1
composite statements.
Improved ifdefs
---------------
* More separation of concepts ("Does this platform do X?") from
platforms ("Are we compiling for platform Y?"), which should simplify
future porting.
ScaleMP* contribution
---------------------
Stack padding to improve the performance in their environment where
cross-node coherency is managed at the page level.
Redesign of wait and release code
---------------------------------
The code is simplified and performance improved.
Bug Fixes
---------
*Fixes for Windows multiple processor groups.
*Fix Fortran module build on Linux: offload attribute added.
*Fix entry names for distribute-parallel-loop construct to be consistent with the compiler codegen.
*Fix an inconsistent error message for KMP_PLACE_THREADS environment variable.
llvm-svn: 219214
|
|
|
|
| |
llvm-svn: 202018
|
|
llvm-svn: 191506
|