| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When linking with libhwloc, the ORDERED EPCC test slows down on big
machines (> 48 cores). Performance analysis showed that a cache thrash
was occurring and this padding helps alleviate the problem.
Also, inside the main spin-wait loop in kmp_wait_release.h, we can eliminate
the references to the global shared variables by instead creating a local
variable, oversubscribed and instead checking that.
Differential Revision: http://reviews.llvm.org/D22093
llvm-svn: 274894
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch allows a user to enable Hwloc on windows. There are three main
changes in here:
1.kmp.h - Move definitions/declarations out of KMP_OS_WINDOWS guard (our windows
implementation of affinity) because they need to be defined when
KMP_USE_HWLOC is on as well.
2.teach __kmp_set_system_affinity, __kmp_get_system_affinity,
__kmp_get_proc_group, and __kmp_affinity_bind_thread how to use hwloc.
3.teach CMake how to include hwloc when building Windows
Another minor change in here is to make sure that anything under KMP_USE_HWLOC
is also guarded by KMP_AFFINITY_SUPPORTED as well. This is to prevent Mac
builds from requiring anything from Hwloc.
Differential Revision: http://reviews.llvm.org/D21441
llvm-svn: 272951
|
|
|
|
|
|
|
|
|
|
|
|
| |
Cleanup - unused code removal.
TODO: consider to remove (replace with flag class methods)
also kmp_wait_64 and kmp_release_64 routines.
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21332
llvm-svn: 272697
|
|
|
|
|
|
|
|
| |
OpenMP 4.1 is now OpenMP 4.5. Any mention of 41 or 4.1 is replaced with
45 or 4.5. Also, if the CMake option LIBOMP_OMP_VERSION is 41, CMake warns that
41 is deprecated and to use 45 instead.
llvm-svn: 272687
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove static specifier from var fullMask and remove kmp_get_fullMask() routine.
When iterating through procs in a mask, always check if proc is in fullMask
(this check was missing in a few places).
Patch by Brian Bliss.
Differential Revision: http://reviews.llvm.org/D21300
llvm-svn: 272589
|
|
|
|
|
|
|
|
|
|
|
|
| |
The bitmask complement operation doesn't consider the max proc id which means
something like !{0} will be translated to {1,2,3,4,...,600,601,...,1023} on a
Linux system even though there aren't 600 processors on said system. This
change has the complement bitmask and-ed with the fullmask so that it will only
contain valid processors.
Differential Revision: http://reviews.llvm.org/D21245
llvm-svn: 272561
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch replaces use of compiler builtin atomics with
C++11 atomics for ticket locks implementation. Ticket locks
are used in critical places of the runtime, e.g. in the tasking
mechanism.
The main reason this change was introduced is the problem
with work stealing function on ARM architecture which suffered
from nasty race condition. It turned out that the root cause of
the problem lies in the way ticket locks are implemented. Changing
compiler builtins into C++11 atomics solves the problem.
Two assertions were added into kmp_tasking.c which are useful
for detecting early symptoms of something wrong going on with
work stealing, which were among the possible outcomes of the
race condition.
Differential Revision: http://reviews.llvm.org/D19878
llvm-svn: 271324
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements the new kmp_sch_static_balanced_chunked schedule kind that
the compiler will generate when it encounters schedule(simd: static). It just
adds the new constant and the new switch case __kmp_for_static_init.
Patch by Alex Duran.
Differential Revision: http://reviews.llvm.org/D20699
llvm-svn: 271320
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When an asynchronous offload task is completed, COI calls the runtime to queue
a "destructor task". When the task deques are full, a dead-lock situation
arises where the OpenMP threads are inside but cannot progress because the COI
thread is stuck inside the runtime trying to find a slot in a deque.
This patch implements the solution where the task deques doubled in size when
a task is being queued from a COI thread.
Differential Revision: http://reviews.llvm.org/D20733
llvm-svn: 271319
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The problem is the lack of dispatch buffers when thousands of loops with nowait,
about 10 iterations each, are executed by hundreds of threads. We only have
built-in 7 dispatch buffers, but there is a need in dozens or hundreds of
buffers.
The problem can be fixed by setting KMP_MAX_DISP_BUF to bigger value. In order
to give users same possibility I changed build-time control into run-time one,
adding API just in case.
This change adds an environment variable KMP_DISP_NUM_BUFFERS and a new API
function kmp_set_disp_num_buffers(int num_buffers).
The KMP_DISP_NUM_BUFFERS envirable works only before serial initialization,
because during the serial initialization we already allocate buffers for the hot
team, so it is too late to change the number of buffers later (or we need to
reallocate buffers for all teams which sounds too complicated). The
kmp_set_defaults() routine does not work for this envirable, because it calls
serial initialization before reading the parameter string. So a new routine,
kmp_set_disp_num_buffers(), is created so that it can set our internal global
variable before the library initialization. If both the envirable and API used
the envirable wins.
Differential Revision: http://reviews.llvm.org/D20697
llvm-svn: 271318
|
|
|
|
|
|
|
|
|
|
|
| |
Most of this is modifications to check for differences before updating data
fields in team struct. There is also some rearrangement of the team struct.
Patch by Diego Caballero
Differential Revision: http://reviews.llvm.org/D20487
llvm-svn: 270468
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
KMP_USE_FUTEX preprocessor definition defined in kmp_lock.h is used
inconsequently throughout LLVM libomp code.
* some .c files that use this define do not include kmp_lock.h file,
in effect guarded part of code are never compiled
* some places in code use architecture-depending preprocessor
logic expressions which effectively disable use of Futex for
AArch64 architecture, all these places should use
'#if KMP_USE_FUTEX' instead to avoid any further confusions
* some places use KMP_HAS_FUTEX which is nowhere defined,
KMP_USE_FUTEX should be used instead
Differential Revision: http://reviews.llvm.org/D19629
llvm-svn: 269642
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change adds a new entry point,
kmp_aligned_malloc(size_t size, size_t alignment), an entry point corresponding
to kmp_malloc() but with the capability to return aligned memory as well.
Other allocator routines have been adjusted so that kmp_free() can be used for
freeing memory blocks allocated by any kmp_*alloc() routine, including the new
kmp_aligned_malloc() routine.
Differential Revision: http://reviews.llvm.org/D19814
llvm-svn: 269365
|
|
|
|
| |
llvm-svn: 266760
|
|
|
|
|
|
|
|
|
|
|
| |
Introduced a counter of parts of an untied task submitted for execution. The
counter controls whether all parts of the task are already finished. The
compiler should generate re-submission of partially executed untied task by
itself before exiting of each task part except for the lexical last part.
Differential Revision: http://reviews.llvm.org/D19026
llvm-svn: 266675
|
|
|
|
|
|
|
|
|
|
|
|
| |
The problem is that the definition of kmp_cpuinfo_t contains:
char name [3*sizeof (kmp_cpuid_t)]; // CPUID(0x80000002,0x80000003,0x80000004)
and kmp_cpuid_t is only defined when compiling for x86.
Differential Revision: http://reviews.llvm.org/D18245
llvm-svn: 264535
|
|
|
|
|
|
|
|
|
|
| |
This change adds a header to the printout of the statistics which includes the
time, machine name, and processor info if available. This change also includes
some cosmetic changes like using enum casting for timer and counter iteration.
Differential Revision: http://reviews.llvm.org/D18153
llvm-svn: 263580
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the standard: The taskloop construct specifies that the iterations of one
or more associated loops will be executed in parallel using OpenMP tasks. The
iterations are distributed across tasks created by the construct and scheduled
to be executed.
This initial implementation uses a simple linear tasks distribution algorithm.
Later we can add other algorithms to speedup generation of huge number of tasks
(i.e., tree-like tasks generation should be faster).
This needs to be put into the OpenMP runtime library in order for the
compiler team to develop the compiler side of the implementation.
Differential Revision: http://reviews.llvm.org/D17404
llvm-svn: 262535
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the standard: A doacross loop nest is a loop nest that has cross-iteration
dependence. An iteration is dependent on one or more lexicographically earlier
iterations. The ordered clause parameter on a loop directive identifies the
loop(s) associated with the doacross loop nest.
The init/fini routines allocate/free doacross buffer(s) for each loop for each
thread. The wait routine waits for a flag designated by the dependence vector.
The post routine sets the flag designated by current iteration vector. We use
a similar technique of shared buffer indices that covers up to 7 nowait loops
executed simultaneously by different threads (number 7 has no real meaning,
just heuristic value). Also, the size of structures are kept intact via
reducing dummy arrays.
This needs to be put into the OpenMP runtime library in order for the compiler
team to develop the compiler side of the implementation.
Differential Revision: http://reviews.llvm.org/D17399
llvm-svn: 262532
|
|
|
|
|
|
|
|
|
|
|
| |
The maximum task priority value is read from envirable: OMP_MAX_TASK_PRIORITY.
But as of now, nothing is done with it. We just handle the environment variable
and add the new api: omp_get_max_task_priority() which returns that value or
zero if it is not set.
Differential Revision: http://reviews.llvm.org/D17411
llvm-svn: 261908
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The monotonic/non-monotonic flags are sent to the runtime via the sched_type by
setting the 30th (non-monotonic) or 29th (monotonic) bit in the sched_type.
Macros are added to probe if monotonic or non-monotonic is specified
(SCHEDULE_HAS_[NON]MONOTONIC & SCHEDULE_HAS_NO_MODIFIERS)
and also to to get the base sched_type (SCHEDULE_WITHOUT_MODIFIERS)
Currently, nothing is done with the modifiers.
Also, this patch adds some comments on the use of the enumerations in at least
one place where it is subtle.
Differential Revision: http://reviews.llvm.org/D17406
llvm-svn: 261906
|
|
|
|
| |
llvm-svn: 261249
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a target task finishes and it tries to access the th_task_team from the
threads in the team where it was created, th_task_team can be NULL or point to
a different place when that thread started a nested region that is still
running. Finding the exact task_team that the threads were using is difficult
as it would require to unwind the task_state_memo_stack. So a new field was added
in the taskdata structure to point to the active task_team when the task was
created.
llvm-svn: 260615
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In: http://lists.llvm.org/pipermail/openmp-dev/2015-August/000858.html, a
performance issue was found with libomp's task dependencies. The task
dependencies hash table has an issue with collisions. The current table size is
a power of two. This combined with the current hash function causes a large
number of collisions to occurr. Also, the current size (64) is too small for
larger applications so the table size is increased.
This patch creates a two level hash table approach for task dependencies. The
implicit task is considered the "master" or "top-level" task which has a large
static sized hash table (997), and nested tasks will have smaller hash
tables (97). Prime numbers were chosen to help reduce collisions.
Differential Revision: http://reviews.llvm.org/D16640
llvm-svn: 259113
|
|
|
|
| |
llvm-svn: 258984
|
|
|
|
| |
llvm-svn: 256790
|
|
|
|
|
|
| |
The barrier states type doesn't need to be explicitly set.
llvm-svn: 256778
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change set includes all changes to make the code conform to the OMP 4.5 specification:
* Removed hint / hinted_init definitions from include/40 files
* Hint values are powers of 2 to enable composition (4.5 spec)
* Hinted lock initialization functions were renamed (4.5 spec)
kmp_init_lock_hinted -> omp_init_lock_with_hint
kmp_init_nest_lock_hinted -> omp_init_nest_lock_with_hint
* __kmpc_critical_section_with_hint was added to support a critical section with
a hint (4.5 spec)
* __kmp_map_hint_to_lock was added to convert a hint (possibly a composite) to
an internal lock type
* kmpc_init_lock_with_hint and kmpc_init_nest_lock_with_hint were added as
internal entries for the hinted lock initializers. The preivous internal
functions (__kmp_init*) were moved to kmp_csupport.c and reused in multiple
places
* Added the two init functions to dllexports
* KMP_USE_DYNAMIC_LOCK is turned on if OMP_41_ENABLED is turned on
Differential Revision: http://reviews.llvm.org/D15205
llvm-svn: 255376
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These changes allow libhwloc to be used as the topology discovery/affinity
mechanism for libomp. It is supported on Unices. The code additions:
* Canonicalize KMP_CPU_* interface macros so bitmask operations are
implementation independent and work with both hwloc bitmaps and libomp
bitmaps. So there are new KMP_CPU_ALLOC_* and KMP_CPU_ITERATE() macros and
the like. These are all in kmp.h and appropriately placed.
* Hwloc topology discovery code in kmp_affinity.cpp. This uses the hwloc
interface to create a libomp address2os object which the rest of libomp knows
how to handle already.
* To build, use -DLIBOMP_USE_HWLOC=on and
-DLIBOMP_HWLOC_INSTALL_DIR=/path/to/install/dir [default /usr/local]. If CMake
can't find the library or hwloc.h, then it will tell you and exit.
Differential Revision: http://reviews.llvm.org/D13991
llvm-svn: 254320
|
|
|
|
| |
llvm-svn: 252084
|
|
|
|
|
|
|
|
|
|
| |
This is a refactoring of the task_team code that more elegantly handles the two
task_team case. Two task_teams per team are kept in use for the lifetime of the
team. Thus no reference counting is needed.
Differential Revision: http://reviews.llvm.org/D13993
llvm-svn: 252082
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
warnings similar to the following:
runtime/src/kmp_global.c:117:35: warning: implicit conversion from
'unsigned long' to 'int' changes value from 18446744073709551615 to -1
[-Wconstant-conversion]
int __kmp_sys_max_nth = KMP_MAX_NTH;
~~~~~~~~~~~~~~~~~ ^~~~~~~~~~~
runtime/src/kmp.h:849:34: note: expanded from macro 'KMP_MAX_NTH'
# define KMP_MAX_NTH PTHREAD_THREADS_MAX
^~~~~~~~~~~~~~~~~~~
Clamp KMP_MAX_NTH to INT_MAX to avoid these warnings. Also use INT_MAX
whenever PTHREAD_THREADS_MAX is not defined at all.
Differential Revision: http://reviews.llvm.org/D13827
llvm-svn: 250708
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because __kmp_task_init_ompt is called for every initial task in each thread
and always generated task ids, this was a big performance issue on bigger
systems even without any tool attached. After changing the initialization
interface to ompt_tool, we can now rely on already knowing whether a tool is
attached and OMPT is enabled at this point.
Patch by Jonas Hahnfeld
Differential Revision: http://reviews.llvm.org/D13494
llvm-svn: 249855
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added (optional) sockets to the syntax of the KMP_PLACE_THREADS environment variable.
Some limitations:
* The number of sockets and then optional offset should be specified first (before other parameters).
* The letter designation is mandatory for sockets and then for other parameters.
* If number of cores is specified first, then the number of sockets is defaulted to all sockets on the machine; also, the old syntax is partially supported if sockets are skipped.
* If number of threads per core is specified first, then the number of sockets and cores per socket are defaulted to all sockets and all cores per socket respectively.
* The number of cores per socket cannot be specified before sockets or after threads per core.
* The number of threads per core can be specified before or after core-offset (old syntax required it to be before core-offset);
* Parameters delimiter can be: empty, comma, lower-case x;
* Spaces are allowed around numbers, around letters, around delimiter.
Approximate shorthand specification:
KMP_PLACE_THREADS="[num_sockets(S|s)[[delim]offset(O|o)][delim]][num_cores_per_socket(C|c)[[delim]offset(O|o)][delim]][num_threads_per_core(T|t)]"
Differential Revision: http://reviews.llvm.org/D13175
llvm-svn: 249708
|
|
|
|
| |
llvm-svn: 248204
|
|
|
|
|
|
| |
simplify conditional.
llvm-svn: 248199
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some of this is improvement to code suggested by Hal Finkel. Four changes here:
1.Cleanup of hierarchy code to handle all hierarchy cases whether affinity is available or not
2.Separated this and other classes and common functions out to a header file
3.Added a destructor-like fini function for the hierarchy (and call in __kmp_cleanup)
4.Remove some redundant code that is hopefully no longer needed
Differential Revision: http://reviews.llvm.org/D12449
llvm-svn: 247326
|
|
|
|
|
|
|
|
|
|
| |
The fix is to make b_arrived flag 64 bit in both structures - kmp_balign_team_t
and kmp_balign_t. Otherwise when flag in kmp_balign_team_t wrapped over
UINT_MAX the library hangs.
Differential Revision: http://reviews.llvm.org/D12563
llvm-svn: 247320
|
|
|
|
|
|
|
|
|
| |
Conditionally include the fork_context parameter to __kmp_join_call()
only if OMPT_SUPPORT=1
Differential Revision: http://reviews.llvm.org/D12495
llvm-svn: 246460
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, the libomp CMake build system uses a Perl script to configure files
(tools/expand-vars.pl). This patch replaces the use of the Perl script by using
CMake's configure_file() function. The major changes include:
1. *.var has every $KMP_* variable changed to @LIBOMP_*@
2. kmp_config.h.cmake is a new file which contains all the feature macros and
#cmakedefine lines
3. Most of the -D lines have been moved from LibompDefinitions.cmake but some
OS specific MACROs (e.g., _GNU_SOURCE) remain.
4. All expand-vars.pl related logic is removed from the CMake files.
One important note about this change is that it breaks the old Perl+Makefile
build system because it can't create kmp_config.h properly.
Differential Review: http://reviews.llvm.org/D12211
llvm-svn: 246314
|
|
|
|
|
|
|
| |
This macro and the small amount of code along with it are unused and
can be removed. The macro is never defined in any build script or source file.
llvm-svn: 244899
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch makes it possible for a performance tool that uses call stack
unwinding to map implementation-level call stacks from master and worker
threads into a unified global view. There are several components to this patch.
include/*/ompt.h.var
Add a new enumeration type that indicates whether the code for a master task
for a parallel region is invoked by the user program or the runtime system
Change the signature for OMPT parallel begin/end callbacks to indicate whether
the master task will be invoked by the program or the runtime system. This
enables a performance tool using call stack unwinding to handle these two
cases differently. For this case, a profiler that uses call stack unwinding
needs to know that the call path prefix for the master task may differ from
those available within the begin/end callbacks if the program invokes the
master.
kmp.h
Change the signature for __kmp_join_call to take an additional parameter
indicating the fork_context type. This is needed to supply the OMPT parallel
end callback with information about whether the compiler or the runtime
invoked the master task for a parallel region.
kmp_csupport.c
Ensure that the OMPT task frame field reenter_runtime_frame is properly set
and cleared before and after calls to fork and join threads for a parallel
region.
Adjust the code for the new signature for __kmp_join_call.
Adjust the OMPT parallel begin callback invocations to carry the extra
parameter indicating whether the program or the runtime invokes the master
task for a parallel region.
kmp_gsupport.c
Apply all of the analogous changes described for kmp_csupport.c for the GOMP
interface
Add OMPT support for the GOMP combined parallel region + loop API to
maintain the OMPT task frame field reenter_runtime_frame.
kmp_runtime.c:
Use the new information passed by __kmp_join_call to adjust the OMPT
parallel end callback invocations to carry the extra parameter indicating
whether the program or the runtime invokes the master task for a parallel
region.
ompt_internal.h:
Use the flavor of the parallel region API (GNU or Intel) to determine who
invokes the master task.
Differential Revision: http://reviews.llvm.org/D11259
llvm-svn: 242817
|
|
|
|
|
|
|
|
|
|
|
|
| |
These changes enable external debuggers to conveniently interface with
the LLVM OpenMP Library. Structures are added which describe the important
internal structures of the OpenMP Library e.g., teams, threads, etc.
This feature is turned on by default (CMake variable LIBOMP_USE_DEBUGGER)
and can be turned off with -DLIBOMP_USE_DEBUGGER=off.
Differential Revision: http://reviews.llvm.org/D10038
llvm-svn: 241832
|
|
|
|
|
|
|
|
|
| |
This change changes kmp_bstate.old_tid to sign integer instead of unsigned integer.
It also defines two new macros KMP_NSEC_PER_SEC and KMP_USEC_PER_SEC which lets us take
control of the sign (we want them to be longs). Also, in kmp_wait_release.h, the byteref()
function's return type is changed from char to unsigned char.
llvm-svn: 239057
|
|
|
|
|
|
|
|
|
| |
Some old references to RML and IOMP which aren't used anywhere are deleted.
http://lists.cs.uiuc.edu/pipermail/openmp-dev/2015-June/000664.html
Patch by Jack Howarth and Jonathan Peyton
llvm-svn: 238878
|
|
|
|
|
|
|
| |
Getting rid of more iomp references.
http://lists.cs.uiuc.edu/pipermail/openmp-dev/2015-June/000659.html
llvm-svn: 238847
|
|
|
|
| |
llvm-svn: 236753
|
|
|
|
|
|
| |
All of them are surrounded by #if OMPT_SUPPORT and can be disabled (which is the default).
llvm-svn: 236122
|
|
|
|
|
|
| |
build infrastructure
llvm-svn: 236117
|
|
|
|
| |
llvm-svn: 231778
|