| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) Remove unnecessary data from list node structure
2) Remove timerPair in favor of pushing/popping explicitTimers.
This way, nested timers will work properly.
3) Fix #pragma omp critical timers
4) Add histogram capability
5) Add KMP_STATS_FILE formatting capability
6) Have time partitioned into serial & parallel by introducing
partitionedTimers::exchange(). This also counts the number of serial regions
in the executable.
7) Fix up the timers around OMP loops so that scheduling overhead and work are
both counted correctly.
8) Fix up the iterations statistics so they count the number of iterations the
thread receives at each loop scheduling event
9) Change timers so there is only one RDTSC read per event change
10) Fix up the outdated comments for the timers
Differential Revision: https://reviews.llvm.org/D49699
llvm-svn: 338276
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix the order of callbacks related to the taskloop construct.
Add the iteration_count to work callbacks (according to the spec).
Use kmpc_omp_task() instead of kmp_omp_task() to include OMPT callbacks.
Add a testcase.
Patch by Simon Convent
Reviewed by: protze.joachim, hbae
Subscribers: openmp-commits
Differential Revision: https://reviews.llvm.org/D47709
llvm-svn: 338146
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ompt/tasks/task_types.c testcase did not test untied tasks properly. Now,
frame addresses are tested and two scheduling points are added at which the
task can switch to another thread. Due to scheduling effects, the frame address
could be NULL.
This needed a restructure of the way OMPT callbacks are called.
__ompt_task_finish() now as an extra parameter, whether a task is completed.
Its invocation has been moved into __kmp_task_finish(). Thus, the order of the
writes to the frame addresses is not subject to scheduling effects anymore.
Patch by Simon Convent
Reviewed by: protze.joachim, hbae
Subscribers: openmp-commits
Differential Revision: https://reviews.llvm.org/D49181
llvm-svn: 338145
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function was not enabled by default and not exported when manually
tweaking the build flags. Additionally it was hard to use since there
is no corresponding __kmp_ft_page_free().
The code itself is questionable because the returned memory address
is padded by an extra pointer which stores the unpadded start of the
allocated region (this would need to be freed).
Differential Revision: https://reviews.llvm.org/D49802
llvm-svn: 338052
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change fixes possibly invalid access to the internal data structure during
library shutdown. In a heavily oversubscribed situation, the library shutdown
sequence can reach the point where resources are deallocated while there still
exist threads in their final spinning loop. The added loop in
__kmp_internal_end() checks if there are such busy-waiting threads and blocks
the shutdown sequence if that is the case. Two versions of kmp_wait_template()
are now used to minimize performance impact.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D49452
llvm-svn: 337486
|
|
|
|
|
|
| |
336563 eliminated CCAST() macros caused build failures
llvm-svn: 336586
|
|
|
|
| |
llvm-svn: 336575
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces the logic implementing hierarchical scheduling.
First and foremost, hierarchical scheduling is off by default
To enable, use -DLIBOMP_USE_HIER_SCHED=On during CMake's configure stage.
This work is based off if the IWOMP paper:
"Workstealing and Nested Parallelism in SMP Systems"
Hierarchical scheduling is the layering of OpenMP schedules for different layers
of the memory hierarchy. One can have multiple layers between the threads and
the global iterations space. The threads will go up the hierarchy to grab
iterations, using possibly a different schedule & chunk for each layer.
[ Global iteration space (0-999) ]
(use static)
[ L1 | L1 | L1 | L1 ]
(use dynamic,1)
[ T0 T1 | T2 T3 | T4 T5 | T6 T7 ]
In the example shown above, there are 8 threads and 4 L1 caches begin targeted.
If the topology indicates that there are two threads per core, then two
consecutive threads will share the data of one L1 cache unit. This example
would have the iteration space (0-999) split statically across the four L1
caches (so the first L1 would get (0-249), the second would get (250-499), etc).
Then the threads will use a dynamic,1 schedule to grab iterations from the L1
cache units. There are currently four supported layers: L1, L2, L3, NUMA
OMP_SCHEDULE can now read a hierarchical schedule with this syntax:
OMP_SCHEDULE='EXPERIMENTAL LAYER,SCHED[,CHUNK][:LAYER,SCHED[,CHUNK]...]:SCHED,CHUNK
And OMP_SCHEDULE can still read the normal SCHED,CHUNK syntax from before
I've kept most of the hierarchical scheduling logic inside kmp_dispatch_hier.h
to try to keep it separate from the rest of the code.
Differential Revision: https://reviews.llvm.org/D47962
llvm-svn: 336571
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch reorganizes the loop scheduling code in order to allow hierarchical
scheduling to use it more effectively. In particular, the goal of this patch
is to separate the algorithmic parts of the scheduling from the thread
logistics code.
Moves declarations & structures to kmp_dispatch.h for easier access in
other files. Extracts the algorithmic part of __kmp_dispatch_init() and
__kmp_dispatch_next() into __kmp_dispatch_init_algorithm() and
__kmp_dispatch_next_algorithm(). The thread bookkeeping logic is still kept in
__kmp_dispatch_init() and __kmp_dispatch_next(). This is done because the
hierarchical scheduler needs to access the scheduling logic without the
bookkeeping logic. To prepare for new pointer in dispatch_private_info_t, a
new flags variable is created which stores the ordered and nomerge flags instead
of them being in two separate variables. This will keep the
dispatch_private_info_t structure the same size.
Differential Revision: https://reviews.llvm.org/D47961
llvm-svn: 336568
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These are preliminary changes that attempt to use C++11 Atomics in the runtime.
We are expecting better portability with this change across architectures/OSes.
Here is the summary of the changes.
Most variables that need synchronization operation were converted to generic
atomic variables (std::atomic<T>). Variables that are updated with combined CAS
are packed into a single atomic variable, and partial read/write is done
through unpacking/packing
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D47903
llvm-svn: 336563
|
|
|
|
|
|
|
|
|
| |
The current implementation always provides the thread-num for the current
parallel region. This patch fixes the behavior for ancestor levels >0.
Differential Revision: https://reviews.llvm.org/D46533
llvm-svn: 336085
|
|
|
|
| |
llvm-svn: 335138
|
|
|
|
| |
llvm-svn: 334335
|
|
|
|
|
|
|
|
| |
Rename ompt_wait_id to omp_wait_id, as defined in the spec.
Differential Revision: https://reviews.llvm.org/D46530
llvm-svn: 333368
|
|
|
|
|
|
|
|
| |
Rename ompt_frame_t to omp_frame_t, as defined in the spec.
Differential Revision: https://reviews.llvm.org/D43568
llvm-svn: 333367
|
|
|
|
|
|
|
|
|
|
| |
Introduce OPENMP_INSTALL_LIBDIR and use in all install() commands.
This also fixes installation of libomptarget-nvptx that previously
didn't honor {OPENMP,LLVM}_LIBDIR_SUFFIX.
Differential Revision: https://reviews.llvm.org/D47130
llvm-svn: 333284
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
implicit_task_end callbacks in nested parallel regions did not always give the
correct thread_num, since the inner parallel region may have already been
finalized.
Now, the thread_num is stored at the beginning of the implicit task and
retrieved at the end, whenever necessary.
A testcase was added as well.
Differential Revision: https://reviews.llvm.org/D46260
llvm-svn: 331632
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This line
(https://github.com/llvm-mirror/openmp/blob/0ed912c7a798f5c4f65f8bb6b492e07fab7f4cea/runtime/src/kmp_gsupport.cpp#L1459)
added in D45327 (rL330282) causes a compilation failure.
Reviewers: jlpeyton
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D45786
llvm-svn: 330299
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, the affinity API reports garbage for the initial place list and any
thread's place lists when using KMP_AFFINITY=none|compact|scatter.
This patch does two things:
for KMP_AFFINITY=none, Creates a one entry table for the places, this way, the
initial place list is just a single place with all the proc ids in it. We also
set the initial place of any thread to 0 instead of KMP_PLACE_ALL so that the
thread reports that single place (place 0) instead of garbage (-1) when using
the affinity API.
When non-OMP_PROC_BIND affinity is used
(including KMP_AFFINITY=compact|scatter), a thread's place list is populated
correctly. We assume that each thread is assigned to a single place. This is
implemented in two of the affinity API functions
Differential Revision: https://reviews.llvm.org/D45527
llvm-svn: 330283
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces GOMP_taskloop to our API. It adds GOMP_4.5 to our
version symbols. Being a wrapper around __kmpc_taskloop, the function
creates a task with the loop bounds properly nested in the shareds so that
the GOMP task thunk will work properly. Also, the firstprivate copy constructors
are properly handled using the __kmp_gomp_task_dup() auxiliary function.
Currently, only linear spawning of tasks is supported
for the GOMP_taskloop interface.
Differential Revision: https://reviews.llvm.org/D45327
llvm-svn: 330282
|
|
|
|
| |
llvm-svn: 329928
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change removes the unnecessary lock operation on __kmp_initz_lock inside
the __kmp_atfork_child() function for Linux; the lock variable is initialized
in the same function later.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D44949
llvm-svn: 328900
|
|
|
|
| |
llvm-svn: 328575
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D44793
llvm-svn: 328228
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added settings code to read OMP_TARGET_OFFLOAD environment variable. Added
target-offload-var ICV as __kmp_target_offload, set via OMP_TARGET_OFFLOAD,
if available, otherwise defaulting to DEFAULT. Valid values for the ICV are
specified as enum values {0,1,2} for disabled, default, and mandatory. An
internal API access function __kmpc_get_target_offload is provided.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D44577
llvm-svn: 328046
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D44637
llvm-svn: 327875
|
|
|
|
|
|
|
|
| |
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D41914
llvm-svn: 326733
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D44019
llvm-svn: 326728
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The thread_num parameter of ompt_get_task_info() was not being used previously,
but need to be set.
The print_task_type() function (form the task-types.c testcase) was merged into
the print_ids() function (in callback.h). Testing of ompt_get_task_info() was
added to the task-types.c testcase. It was not tested extensively previously.
Differential Revision: https://reviews.llvm.org/D42472
llvm-svn: 326338
|
|
|
|
|
|
|
|
|
| |
This is required to be NULL for implicit barriers at the end of a
parallel region. Noticed in review of D43191.
Differential Revision: https://reviews.llvm.org/D43308
llvm-svn: 325922
|
|
|
|
|
|
|
|
| |
Applying clang-format to the /runtime/src/ folder
Differential Revision: https://reviews.llvm.org/D42169
llvm-svn: 325424
|
|
|
|
|
|
|
|
|
|
| |
Only use ompt_ functions when testing OMPT in api_calls testcase.
Add size parameter to print_list.
Fix small bug in implementation of ompt_get_partition_place_nums(): return correct length.
Differential Revision: https://reviews.llvm.org/D42162
llvm-svn: 325422
|
|
|
|
|
|
|
|
|
| |
If tool initialization returns 0, OMPT should not be active. The current
implementation provided some callback invocations in this case.
Differential Revision: https://reviews.llvm.org/D42709
llvm-svn: 324320
|
|
|
|
|
|
| |
The previous commit did not revert all replaced ompt_mutex_impl_unknown.
llvm-svn: 322631
|
|
|
|
|
|
|
|
|
|
| |
The defintion is not part of the spec and thus should not have the prefix
"ompt_" but rather a prefix that indicates that this is implementation
specific.
Differential Revision: https://reviews.llvm.org/D41166
llvm-svn: 322621
|
|
|
|
|
|
|
|
|
|
|
| |
threads
When the current thread is not an (initialized) OpenMP thread, the runtime
entry points return values that correspond to "not available" or similar
Differential Revision: https://reviews.llvm.org/D41167
llvm-svn: 322620
|
|
|
|
|
|
|
|
| |
Patch by simone <simone@cs.utah.edu>.
Differential Revision: https://reviews.llvm.org/D41945
llvm-svn: 322282
|
|
|
|
|
|
|
|
|
|
|
|
| |
If user requested affinity with granularity=tile we need to either use HWLOC
or ignore the request. The change allows user to not specify
KMP_TOPOLOGY_METHOD=hwloc and choose it automatically instead.
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D40905
llvm-svn: 322205
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change simplifies __kmp_expand_threads to take a single argument.
Previously, it allowed two arguments and had logic to decide on different
potential expansion sizes. However, no calls to __kmp_expand_threads in the
runtime make use of this extra logic. Thus the extra argument and logic is
removed here.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D41836
llvm-svn: 322204
|
|
|
|
|
|
|
|
| |
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D41831
llvm-svn: 322203
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change improves stability of the runtime when the application forks child
processes. Acquiring/releasing __kmp_initz_lock and __kmp_forkjoin_lock in the
atfork handlers insures that the actual fork does not occur while those two
locks are held, and __kmp_itt_reset() reverts the itt's global state to the
initial state which also initializes the mutex stored in the global state.
Some missing initialization code was also inserted in the child's atfork handler.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D41462
llvm-svn: 322202
|
|
|
|
|
|
|
|
| |
Changes to task_data in barrier-begin were not visible at barrier-end
Differential Revision: https://reviews.llvm.org/D41176
llvm-svn: 322178
|
|
|
|
|
|
|
|
| |
incorrectly on 32-bit machines.
Differential Revision: https://reviews.llvm.org/D41854
llvm-svn: 322068
|
|
|
|
|
|
|
|
|
|
| |
This field is defined as kmp_int32, so we should use neither
pointers to kmp_int64 nor 64 bit atomic instructions.
(Found while testing on a Raspberry Pi, 32 bit ARM)
Differential Revision: https://reviews.llvm.org/D41656
llvm-svn: 321964
|
|
|
|
| |
llvm-svn: 321831
|
|
|
|
| |
llvm-svn: 321827
|
|
|
|
|
|
|
|
|
| |
As for normal task creation, the task frame addresses need to be stored
for the encountering task.
Differential Revision: https://reviews.llvm.org/D41165
llvm-svn: 321421
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The format string for hints only prints the second argument (string) and drops
the first argument (hint id). Depending on how you read the POSIX text for
printf, this could be valid. But for practical reason, i.e., unpacking the
va_list passed to printf based on the formating information, it makes sense
to fix the implementation and not pass the id for hint.
Failing testcases were:
misc_bugs/teams-reduction.c
ompt/parallel/not_enough_threads.c
Differential Revision: https://reviews.llvm.org/D41504
llvm-svn: 321361
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function is defined in OpenMP-TR6 section 4.1.5.1.6
The functions was not implemented yet.
Since ompt-functions can only be called after the runtime was initialized and
has loaded a tool, it can assume the runtime to be initialized. In contrast
to omp_get_num_procs which needs to check whether the runtime is initialized.
Differential Revision: https://reviews.llvm.org/D40949
llvm-svn: 321269
|
|
|
|
|
|
|
|
|
|
|
| |
This revision fixes failing testcases with parallel for loops and the gomp
interface. The return address needs to be stored at entry to runtime.
The storage is cleared on usage, so we need to update the storage before
calling again internal functions, that will trigger event callbacks.
Differential Revision: https://reviews.llvm.org/D41181
llvm-svn: 321265
|