| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: jdoerfert, Jim
Reviewed By: Jim
Subscribers: Jim, mgorny, guansong, jfb, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D72285
|
|
|
|
|
|
| |
Submitted by: kiszk
Differential Revision: https://reviews.llvm.org/D72171
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The termination function duplicated the functionality of the
__attribute((destructor))-annotated function __kmp_internal_end_fini,
and we have no indication that this doesn't work.
The function might cause issues with link-time optimization turned on:
until very recently, none of the usual linkers was reporting functions
named in -Wl,-fini as used to the LTO plugin, so it might be dropped.
If the function is dropped, -Wl,-fini=__kmp_internal_end_fini doesn't
do what we want: with ld.bfd and lld it drops the FINI attribute from
.dynamic and with gold we get FINI = 0x0, which leads to a crash on
cleanup. This can be reproduced by building with
-DLLVM_ENABLE_PROJECTS="clang;openmp" \
-DLLVM_ENABLE_LTO=Thin \
-DLLVM_USE_LINKER=gold
The issue in lld has been fixed in f95273f75aa, but gold remains without
fix so far.
Fixes PR43927.
Reviewers: JonChesterfield, jdoerfert, AndreyChurbanov
Reviewed By: AndreyChurbanov
Differential Revision: https://reviews.llvm.org/D69927
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: chandlerc, jlpeyton, jdoerfert, dim
Reviewed-By: dim
Differential Revision: https://reviews.llvm.org/D68580
llvm-svn: 374118
|
|
|
|
|
|
|
|
| |
Patch by viroulep (Philippe Virouleau)
Differential Revision: https://reviews.llvm.org/D67447
llvm-svn: 372879
|
|
|
|
|
|
|
|
|
|
| |
Remove all older OMP spec versioning from the runtime and build system.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D64534
llvm-svn: 365963
|
|
|
|
| |
llvm-svn: 365642
|
|
|
|
|
|
|
|
| |
Patch by viroulep (Philippe Virouleau)
Differential Revision: https://reviews.llvm.org/D63196
llvm-svn: 364934
|
|
|
|
|
|
|
|
| |
Patch by Alex Duran
Differential Revision: https://reviews.llvm.org/D62485
llvm-svn: 363799
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Add the target task allocation function to the interface.
Reviewers: ABataev, AlexEichenberger, caomhin, jlpeyton, AndreyChurbanov, RaviNarayanaswamy, hbae
Reviewed By: AlexEichenberger, hbae
Subscribers: hbae, RaviNarayanaswamy, cfe-commits, Hahnfeld, guansong, jdoerfert, openmp-commits
Tags: #openmp, #clang
Differential Revision: https://reviews.llvm.org/D63010
llvm-svn: 363449
|
|
|
|
|
|
|
|
| |
Currently implemented only for non-Windows 64-bit platforms.
Differential Revision: https://reviews.llvm.org/D62488
llvm-svn: 362618
|
|
|
|
|
|
|
|
|
|
| |
Made type of depth of hwloc object to correapond with
change from unsigned in hwloc 1,x to int in hwloc 2.x.
This eliminates the warning on signed-unsigned comparison.
Differential Revision: https://reviews.llvm.org/D62332
llvm-svn: 362401
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added synchronization for possible concurrent initialization of mutexes
by multiple threads. The need of synchronization caused by commit r357927
which added the use of mutexes at threads movement to/from common pool
(earlier the mutexes were used only at suspend/resume).
Patch by Johnny Peyton.
Differential Revision: https://reviews.llvm.org/D61995
llvm-svn: 360919
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
To be able to successfully build OpenMP on FreeBSD/i386, which still
uses i486 as its default processor, I had to provide wrappers for the
`__kmp_load_mxcsr` and `__kmp_store_mxcsr` functions.
If the compiler signals that SSE is not available, loading and storing
mxcsr does not make sense anway, so in that case the inline functions
are empty. This gives the minimum amount of code churn.
See also https://svnweb.freebsd.org/changeset/base/345283
Reviewers: emaste, jlpeyton, Hahnfeld
Reviewed By: jlpeyton
Subscribers: hfinkel, krytarowski, jdoerfert, openmp-commits, llvm-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D60916
llvm-svn: 360062
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implemented task modifier in two versions - one without taking into account
omp_orig variable (the omp_orig still can be processed by compiler without help
of the library, but each reduction object will need separate initializer with
global access to omp_orig), another with omp_orig variable included into
interface (single initializer can be used for multiple reduction objects of
the same type). Second version can be used when the omp_orig is not globally
accessible, or to optimize code in case of multiple reduction objects
of the same type.
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D60976
llvm-svn: 359710
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds:
* New omp_sched_monotonic flag to omp_sched_t which is handled within the runtime
* Parsing of monotonic/nonmonotonic in OMP_SCHEDULE
* Tests for the monotonic flag and envirable parsing
* Logic to force monotonic when hierarchical scheduling is used
Differential Revision: https://reviews.llvm.org/D60979
llvm-svn: 359601
|
|
|
|
|
|
|
|
|
|
| |
This change replaces some of the assembly functions in z_Linux_asm.S
for inline asm in kmp.h. This allows better interaction with compiler
tools and sanitizers.
Differential Revision: https://reviews.llvm.org/D60423
llvm-svn: 358438
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Replace HBWMALLOC API with more general MEMKIND API, new functions
and variables added.
* Have libmemkind.so loaded when accessible.
* Redirect memspaces to default one except for high bandwidth which
is processed separately.
* Ignore some allocator traits e.g., sync_hint, access, pinned, while
others are processed normally e.g., alignment, pool_size, fallback,
fb_data, partition.
* Add tests for memory management
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D59783
llvm-svn: 357929
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch cleans up the bookkeeping code for the load balancing dynamic mode.
When a thread is moved to or from the thread pool, the th_active_in_pool flag
and the __kmp_thread_pool_active_nth global counter are both updated. This
removes the need for the corrective code in the main wait loop. Another global
counter, __kmp_thread_pool_nth, was removed completely, as it was only used for
debugging, but was not under KMP_DEBUG.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D59508
llvm-svn: 357927
|
|
|
|
|
|
|
|
|
|
| |
Remove very old, unused, and deprecated taskq code.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D58989
llvm-svn: 356288
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Nest-var, OMP_NESTED, omp_set_nested()., and omp_get_nested() have been
deprecated in the 5.0 spec. Initial nesting info is now derived from
OMP_MAX_ACTIVE_LEVELS, OMP_NUM_THREADS, and OMP_PROC_BIND.
This patch deprecates the internal ICV that corresponds to nest-var, and
replaces it with the max-active-levels-var ICV to determine nesting. The
change still allows for use of OMP_NESTED (according to 5.0 changes),
omp_get_nested, and omp_set_nested, which have had deprecation messages
added to them. The change allows certain settings of OMP_NUM_THREADS,
OMP_PROC_BIND, and OMP_MAX_ACTIVE_LEVELS to turn on nesting, but
OMP_NESTED=0 will still force nesting to be off.
The runtime now prints informative messages about deprecation of
OMP_NESTED, omp_set_nested(), and omp_get_nested(), when those
environment variables or routines are used. It also prints deprecated
message in output for KMP_SETTINGS and OMP_DISPLAY_ENV for OMP_NESTED.
This patch also fixes OMP_DISPLAY_ENV output for OMP_TARGET_OFFLOAD.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D58408
llvm-svn: 355138
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch cleans up the yielding code and makes it optional. An
environment variable, KMP_USE_YIELD, was added. Yielding is still
on by default (KMP_USE_YIELD=1), but can be turned off completely
(KMP_USE_YIELD=0), or turned on only when oversubscription is detected
(KMP_USE_YIELD=2). Note that oversubscription cannot always be detected
by the runtime (for example, when the runtime is initialized and the
process forks, oversubscription cannot be detected currently over
multiple instances of the runtime).
Because yielding can be controlled by user now, the library mode
settings (from KMP_LIBRARY) for throughput and turnaround have been
adjusted by altering blocktime, unless that was also explicitly set.
In the original code, there were a number of places where a double yield
might have been done under oversubscription. This version checks
oversubscription and if that's not going to yield, then it does
the spin check.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D58148
llvm-svn: 355120
|
|
|
|
|
|
|
|
|
|
|
| |
Remove fatal error messages from the cancellation API for GOMP
Add __kmp_barrier_gomp_cancel() to implement cancellation of parallel regions.
This new function uses the linear barrier algorithm with a cancellable
nonsleepable wait loop.
Differential Revision: https://reviews.llvm.org/D57969
llvm-svn: 354367
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The thread-limit-var and omp_get_thread_limit API was not perfectly handled for
teams construct. Now, when modified by thread_limit clause, omp_get_thread_limit
reports the correct value. In addition, the value is restored when leaving the
teams construct to what it was in the encountering context.
This is done partly by creating the notion of a Contention Group root (CG root)
that keeps track of the thread at the root of each separate CG, the
thread-limit-var associated with the CG, and associated counter of active
threads within the contention group.
thread-limits are passed from master to worker threads via an entry in the ICV
data structure. When a "contention group switch" occurs, a new CG root record is
made and passed from master to worker. A thread could potentially have several
CG root records if it encounters multiple nested teams constructs (but at the
moment the spec doesn't allow for nested teams, so the most one could have
currently is 2). The master of the teams masters gets the thread-limit clause
value stored to its local ICV structure, and the other teams masters copy it
from the master. The thread-limit is set from that ICV copy and restored to the
ICV copy when entering and leaving the teams construct.
This change also fixes a bug when the top-level teams construct team gets
reused, and OMP_DYNAMIC was true, which can cause the expected size of this team
to be smaller than what was actually allocated. The fix updates the size of the
team after its threads were reserved.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D56804
llvm-svn: 353747
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to reflect the new license. These used slightly different spellings that
defeated my regular expressions.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351648
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add omp_pause_resource and omp_pause_resource_all API and enum, plus stub for
internal implementation. Implemented callable helper function to do local pause,
and added basic functionality for hard and soft pause.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D55078
llvm-svn: 351372
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Two things:
1. Those two variables had the wrong sigdness, which was resulting in "sign mismatch in comparison" warning.
2. The whole `kmp_debugger.cpp` wasn't being built, or rather, it was being built as-if `USE_DEBUGGER` was off,
thus, nothing provided the definition of `__kmp_omp_debug_struct_info`, `__kmp_debugging`.
Makes sense, because `USE_DEBUGGER` is set in `kmp_config.h`, which is not included explicitly.
It is included by `kmp.h`, but that one is only included inside of the `#if USE_DEBUGGER` block..
I *think* this is the only source file with this issue,
everything else seem to `#include` either `kmp.h` or `kmp_config.h`.
The alternative solution would be to add `add_compile_options(-include kmp_config.h)` in CMake.
I did verify that `__kmp_omp_debug_struct_info` becomes available with this patch.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=38612 | PR38612 ]].
Reviewers: AndreyChurbanov, jlpeyton, Hahnfeld
Reviewed By: jlpeyton
Subscribers: guansong, jfb, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D55783
llvm-svn: 351019
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add omp_get_device_num() function for 5.0 which returns the number of the
device the current thread is running on. Currently, we are leaving it to the
compiler to handle this properly if it is called inside target.
Also, did some cleanup and updating of duplicate device API functions (in both
libomp and libomptarget) to make them into weak functions that check for the
symbol from libomptarget, and will call the version in libomptarget if it is
present. If any additional device API functions are implemented also in
libomptarget in the future, we should add the dlsym calls to the host functions.
Also, if the omp_target_* functions are to be implemented for the host (this has
been requested), they should attempt to call the libomptarget versions as well.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D55578
llvm-svn: 350352
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the affinity format functionality introduced in OpenMP 5.0.
This patch adds: Two new environment variables:
OMP_DISPLAY_AFFINITY=TRUE|FALSE
OMP_AFFINITY_FORMAT=<string>
and Four new API:
1) omp_set_affinity_format()
2) omp_get_affinity_format()
3) omp_display_affinity()
4) omp_capture_affinity()
The affinity format functionality has two ICV's associated with it:
affinity-display-var (bool) and affinity-format-var (string).
The affinity-display-var enables/disables the functionality through the
envirable OMP_DISPLAY_AFFINITY. The affinity-format-var is a formatted
string with the special field types beginning with a '%' character
similar to printf
For example, the affinity-format-var could be:
"OMP: host:%H pid:%P OStid:%i num_threads:%N thread_num:%n affinity:{%A}"
The affinity-format-var is displayed by every thread implicitly at the beginning
of a parallel region when any thread's affinity has changed (including a brand
new thread being spawned), or explicitly using the omp_display_affinity() API.
The omp_capture_affinity() function can capture the affinity-format-var in a
char buffer. And omp_set|get_affinity_format() allow the user to set|get the
affinity-format-var explicitly at runtime. omp_capture_affinity() and
omp_get_affinity_format() both return the number of characters needed to hold
the entire string it tried to make (not including NULL character). If not
enough buffer space is available,
both these functions truncate their output.
Differential Revision: https://reviews.llvm.org/D55148
llvm-svn: 349089
|
|
|
|
|
|
|
|
| |
Patch by Peiyuan Song <squallatf@gmail.com>
Differential Revision: https://reviews.llvm.org/D53422
llvm-svn: 348756
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This patch permits OpenMP to build and work (with both gcc and clang) on OpenBSD. It mostly follows what was done for FreeBSD and NetBSD, except OpenBSD does not have pthread_getattr_np support, so it follows OS X in that one instance.
Reviewers: #openmp, krytarowski
Reviewed By: krytarowski
Subscribers: guansong, jfb, emaste, mgorny, krytarowski, #openmp
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D34280
llvm-svn: 348726
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Additions mostly follow FreeBSD and NetBSD and are not intrusive.
There is similar patch for OpenBSD: https://reviews.llvm.org/D34280
The -lm was being omitted due to -Wl,--as-needed in cmake rule, similar patch is in freebsd-ports/devel/llvm-devel port.
Simple OpenMP programs compile and work as expected:
$ clang-devel ~/omp_hello.c -fopenmp -I/usr/local/llvm-devel/include
$ LD_LIBRARY_PATH=/usr/local/llvm-devel/lib OMP_NUM_THREADS=100 ./a.out
The assertion in LLVMgold.so when -fopenmp was used together with -flto in 20170524 snapshot is no longer triggered on current svn-trunk and works fine as in llvm-4.0 with our local patches.
Reviewers: #openmp, krytarowski
Reviewed By: krytarowski
Subscribers: dexonsmith, jfb, krytarowski, guansong, gregrodgers, emaste, mgorny, mehdi_amini
Differential Revision: https://reviews.llvm.org/D35129
llvm-svn: 348725
|
|
|
|
|
|
|
| |
There is a conflict between libomptarget and libomp concerning some of the
standard OpenMP device API which needs further intestigation.
llvm-svn: 347932
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds __kmpc_omp_reg_task_with_affinity to register affinity
information for tasks. For now, the affinity information is not used,
and the function always succeeds. This also adds the kmp_task_affinity_info_t
structure to store the task affinity information.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D55026
llvm-svn: 347907
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add omp_get_device_num() function for 5.0 which returns the number of the device
the current thread is running on. Also, did some cleanup and updating of device
API functions to make them into weak functions that should be replaced with
libomptarget functions when libomptarget is present.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D54342
llvm-svn: 347799
|
|
|
|
|
|
|
|
| |
Patch by samuel.thibault@ens-lyon.org
Differential Revision: https://reviews.llvm.org/D54079
llvm-svn: 346310
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D53380
llvm-svn: 346307
|
|
|
|
|
|
|
|
|
| |
This patch puts the __kmpc_critical_with_hint function in dllexports
and also replaces some OMP_45_ENABLED to OMP_50_ENABLED
Differential Revision: https://reviews.llvm.org/D52380
llvm-svn: 343143
|
|
|
|
|
|
|
|
|
|
|
| |
Balanced affinity only updated the thread's affinity with the operating system.
This change also has the thread's private mask reflect that change as well so
that any API that probes the thread's affinity mask will report the correct
mask value.
Differential Revision: https://reviews.llvm.org/D52379
llvm-svn: 343142
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change improves the performance of 376.kdtree by giving the compiler an
opportunity to do inlining and other optimizations for the call path,
__kmpc_omp_task_complete_if0()->__kmp_task_finish(), which is one of the hot
paths in the program; some functions in kmp_taskdeps.cpp were moved to the new
header file, kmp_taskdeps.h to achieve this.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D51889
llvm-svn: 343138
|
|
|
|
|
|
|
|
|
|
|
| |
Add atomic hint flags to the enum.
The hint parameter type was changed to uint32_t in __kmpc_critical_with_hint()
Patch by Olga Malysheva
Differential Revision: https://reviews.llvm.org/D51235
llvm-svn: 341694
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ident flags reserved for atomic hints.
This patch adds omp_sync_hint_t to omp.h and omp_sync_hint_kind to omp_lib.h.
For better maintainability the list of macros for ident flags was replaced with
a enum. The new KMP_IDENT_ATOMIC_HINT_MASK was added to the enum to
support possible future atomic hints.
Also fix omp_lib.h.var to be under 72 chars again after 5.0 OpenMP Memory commit
Patch by Olga Malysheva
Differential Revision: https://reviews.llvm.org/D51233
llvm-svn: 341693
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implemented omp_alloc, omp_free, omp_{set,get}_default_allocator entries,
and OMP_ALLOCATOR environment variable.
Added support for HBW memory on Linux if libmemkind.so library is accessible
(dynamic library only, no support for static libraries).
Only used stable API (hbwmalloc) of the memkind library
though we may consider using experimental API in future.
The ICV def-allocator-var is implemented per implicit task similar to
place-partition-var. In the absence of a requested allocator, the uses the
default allocator.
Predefined allocators (the only ones currently available) are made similar
for C and Fortran, - pointers (long integers) with values 1 to 8.
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D51232
llvm-svn: 341687
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch cleans up unused functions, variables, sign compare issues, and
addresses some -Warning flags which are now enabled including -Wcast-qual.
Not all the warning flags in LibompHandleFlags.cmake are enabled, but some
are with this patch.
Some __kmp_gtid_from_* macros in kmp.h are switched to static inline functions
which allows us to remove the awkward definition of KMP_DEBUG_ASSERT() and
KMP_ASSERT() macros which used the comma operator. This had to be done for the
innumerable -Wunused-value warnings related to KMP_DEBUG_ASSERT()
Differential Revision: https://reviews.llvm.org/D49105
llvm-svn: 339393
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change introduces GOMP doacross compatibility. There are 12 new interface
functions 6 for long type and 6 for unsigned long long type:
GOMP_doacross_post, GOMP_doacross_wait, GOMP_loop_doacross_[schedule]_start
where schedule can be static, dynamic, guided, or runtime.
These functions just translate the parameters if necessary and send them
to the corresponding kmp function.
E.g., GOMP_doacross_post() -> __kmpc_doacross_post()
For the GOMP_doacross_post function, there is template specialization to
account for when long is a four byte vs an eight byte type. If it is a
four byte type, then a temporary array has to be created to convert the
four byte integers into eight byte integers and then sending that into
__kmpc_doacross_post(). Because GOMP_doacross_wait uses varargs, it
always needs a temporary array and does not need template specialization.
Differential Revision: https://reviews.llvm.org/D49857
llvm-svn: 338280
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change fixes build errors when building a runtime with adaptive lock stats
enabled. Most of the errors were due to the recent changes in the runtime, but
it seems that we have not tried to build this debug runtime on Windows for a
long time.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D49823
llvm-svn: 338277
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function was not enabled by default and not exported when manually
tweaking the build flags. Additionally it was hard to use since there
is no corresponding __kmp_ft_page_free().
The code itself is questionable because the returned memory address
is padded by an extra pointer which stores the unpadded start of the
allocated region (this would need to be freed).
Differential Revision: https://reviews.llvm.org/D49802
llvm-svn: 338052
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change fixes possibly invalid access to the internal data structure during
library shutdown. In a heavily oversubscribed situation, the library shutdown
sequence can reach the point where resources are deallocated while there still
exist threads in their final spinning loop. The added loop in
__kmp_internal_end() checks if there are such busy-waiting threads and blocks
the shutdown sequence if that is the case. Two versions of kmp_wait_template()
are now used to minimize performance impact.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D49452
llvm-svn: 337486
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces the logic implementing hierarchical scheduling.
First and foremost, hierarchical scheduling is off by default
To enable, use -DLIBOMP_USE_HIER_SCHED=On during CMake's configure stage.
This work is based off if the IWOMP paper:
"Workstealing and Nested Parallelism in SMP Systems"
Hierarchical scheduling is the layering of OpenMP schedules for different layers
of the memory hierarchy. One can have multiple layers between the threads and
the global iterations space. The threads will go up the hierarchy to grab
iterations, using possibly a different schedule & chunk for each layer.
[ Global iteration space (0-999) ]
(use static)
[ L1 | L1 | L1 | L1 ]
(use dynamic,1)
[ T0 T1 | T2 T3 | T4 T5 | T6 T7 ]
In the example shown above, there are 8 threads and 4 L1 caches begin targeted.
If the topology indicates that there are two threads per core, then two
consecutive threads will share the data of one L1 cache unit. This example
would have the iteration space (0-999) split statically across the four L1
caches (so the first L1 would get (0-249), the second would get (250-499), etc).
Then the threads will use a dynamic,1 schedule to grab iterations from the L1
cache units. There are currently four supported layers: L1, L2, L3, NUMA
OMP_SCHEDULE can now read a hierarchical schedule with this syntax:
OMP_SCHEDULE='EXPERIMENTAL LAYER,SCHED[,CHUNK][:LAYER,SCHED[,CHUNK]...]:SCHED,CHUNK
And OMP_SCHEDULE can still read the normal SCHED,CHUNK syntax from before
I've kept most of the hierarchical scheduling logic inside kmp_dispatch_hier.h
to try to keep it separate from the rest of the code.
Differential Revision: https://reviews.llvm.org/D47962
llvm-svn: 336571
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch reorganizes the loop scheduling code in order to allow hierarchical
scheduling to use it more effectively. In particular, the goal of this patch
is to separate the algorithmic parts of the scheduling from the thread
logistics code.
Moves declarations & structures to kmp_dispatch.h for easier access in
other files. Extracts the algorithmic part of __kmp_dispatch_init() and
__kmp_dispatch_next() into __kmp_dispatch_init_algorithm() and
__kmp_dispatch_next_algorithm(). The thread bookkeeping logic is still kept in
__kmp_dispatch_init() and __kmp_dispatch_next(). This is done because the
hierarchical scheduler needs to access the scheduling logic without the
bookkeeping logic. To prepare for new pointer in dispatch_private_info_t, a
new flags variable is created which stores the ordered and nomerge flags instead
of them being in two separate variables. This will keep the
dispatch_private_info_t structure the same size.
Differential Revision: https://reviews.llvm.org/D47961
llvm-svn: 336568
|