path: root/openmp/runtime/src/kmp_runtime.c
Commit message | Author | Age | Files | Lines
...
* Remove outdated comment | Jonathan Peyton | 2015-11-12 | 1 | -3/+0
    llvm-svn: 252953
* Remove incorrect debug assert. | Jonathan Peyton | 2015-11-04 | 1 | -1/+0
    In __kmp_free_team(), the team's number of processors can be == 1.

    llvm-svn: 252086
* Remove some empty lines. | Jonathan Peyton | 2015-11-04 | 1 | -9/+0
    llvm-svn: 252084
* Refactor of task_team code. | Jonathan Peyton | 2015-11-04 | 1 | -126/+55
    This is a refactoring of the task_team code that more elegantly handles the two task_team case. Two task_teams per team are kept in use for the lifetime of the team. Thus no reference counting is needed.

    Differential Revision: http://reviews.llvm.org/D13993
    llvm-svn: 252082
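The two-slot arrangement described in this commit can be pictured with a minimal sketch. Every type and function name below is hypothetical and not taken from the runtime; the sketch only assumes that each thread carries a 0/1 index that it flips after a barrier, so a team alternates between two task teams that both live as long as the team itself and therefore need no reference counts.

```c
#include <assert.h>

/* Hypothetical stand-ins for the runtime's structures (illustration only). */
typedef struct {
    int pending_tasks;
} sketch_task_team_t;

typedef struct {
    sketch_task_team_t task_team[2]; /* both slots live as long as the team */
} sketch_team_t;

typedef struct {
    int task_state; /* 0 or 1: which slot this thread currently uses */
} sketch_thread_t;

/* Pick the slot the thread is currently working with. */
static sketch_task_team_t *sketch_current_task_team(sketch_team_t *team,
                                                    const sketch_thread_t *th) {
    return &team->task_team[th->task_state];
}

/* After a barrier the thread switches to the other slot. */
static void sketch_toggle_after_barrier(sketch_thread_t *th) {
    th->task_state = 1 - th->task_state;
}
```

Because neither slot is ever freed while the team exists, a thread that is still draining the old slot cannot observe a dangling pointer, which is the property that reference counting would otherwise have to provide.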
* Removed zeroing th.th_task_state for master thread at start of nested parallel. | Jonathan Peyton | 2015-10-20 | 1 | -9/+7
    The th.th_task_state for the master thread at the start of a nested parallel should not be zeroed in __kmp_allocate_team() because it is later put in the stack of states in __kmp_fork_call() for further re-use after exiting the nested region. It is zeroed after being put in the stack.

    Differential Revision: http://reviews.llvm.org/D13702
    llvm-svn: 250847
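The ordering this commit insists on (save the state first, zero it afterwards) can be sketched as follows. The structure, the fixed stack depth, and the function names are assumptions made for illustration; they are not the runtime's own definitions.

```c
#include <assert.h>

#define SKETCH_STACK_DEPTH 8 /* arbitrary depth chosen for the sketch */

/* Hypothetical model of the master thread's task state and its memo stack. */
typedef struct {
    int task_state;
    int memo_stack[SKETCH_STACK_DEPTH];
    int memo_top;
} sketch_master_t;

/* Entering a nested region: push the current state, then zero it.
   Returns 0 on success, -1 if the stack is full. */
static int sketch_enter_nested(sketch_master_t *m) {
    if (m->memo_top >= SKETCH_STACK_DEPTH)
        return -1;
    m->memo_stack[m->memo_top++] = m->task_state; /* save first */
    m->task_state = 0;                            /* zero only after saving */
    return 0;
}

/* Leaving the nested region: restore the saved state.
   Returns 0 on success, -1 if nothing was saved. */
static int sketch_exit_nested(sketch_master_t *m) {
    if (m->memo_top == 0)
        return -1;
    m->task_state = m->memo_stack[--m->memo_top];
    return 0;
}
```

If the state were zeroed before the push, the stack would record 0 and the outer region would resume with the wrong state after the nested region ends.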
* Clean-up cancellation state flag between parallel regions | Jonathan Peyton | 2015-10-19 | 1 | -0/+4
    Without this fix, cancellation requests in one parallel region cause cancellation of the second region even though the second one was not intended to be cancelled.

    llvm-svn: 250727
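A minimal sketch of the problem and the fix, assuming a team structure that is reused from one parallel region to the next. The enum, struct, and function names are hypothetical; only the idea of clearing the flag between regions comes from the commit message.

```c
#include <assert.h>

/* Hypothetical cancellation request values. */
typedef enum {
    SKETCH_CANCEL_NONE = 0,
    SKETCH_CANCEL_PARALLEL = 1
} sketch_cancel_t;

/* Hypothetical team object that survives across parallel regions. */
typedef struct {
    sketch_cancel_t cancel_request;
} sketch_cancel_team_t;

/* Called when a region starts: clear any request left by a previous region. */
static void sketch_begin_region(sketch_cancel_team_t *team) {
    team->cancel_request = SKETCH_CANCEL_NONE;
}

static void sketch_request_cancel(sketch_cancel_team_t *team) {
    team->cancel_request = SKETCH_CANCEL_PARALLEL;
}

static int sketch_is_cancelled(const sketch_cancel_team_t *team) {
    return team->cancel_request != SKETCH_CANCEL_NONE;
}
```

Without the reset in `sketch_begin_region`, a request raised in the first region would still be visible when the reused team starts the second one.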
* Formatting/Whitespace/Comment changes associated with wait/release improvements. | Jonathan Peyton | 2015-10-08 | 1 | -6/+4
    llvm-svn: 249725
* Debug trace and assert statement changes for wait/release improvements. | Jonathan Peyton | 2015-10-08 | 1 | -1/+2
    These changes improve/update the trace messages and debug asserts related to the previous wait/release checkin.

    llvm-svn: 249717
* OpenMP Wait/release improvements. | Jonathan Peyton | 2015-10-08 | 1 | -5/+6
    These changes improve the wait/release mechanism for threads spinning in barriers that are handling tasks while spinning by providing feedback to the barriers about any task stealing that occurs.

    Differential Revision: http://reviews.llvm.org/D13353
    llvm-svn: 249711
* Add basic NetBSD support. | Joerg Sonnenberger | 2015-09-21 | 1 | -2/+2
    llvm-svn: 248204
* [OMPT] Simplify control variable logic for OMPT | Jonathan Peyton | 2015-09-21 | 1 | -46/+35
    Prior to this change, OMPT had a status flag ompt_status, which could take several values. This was due to an earlier OMPT design that had several levels of enablement (ready, disabled, tracking state, tracking callbacks). The current OMPT design has OMPT support either on or off.

    This revision replaces ompt_status with a boolean flag ompt_enabled, which simplifies the runtime logic for OMPT.

    Patch by John Mellor-Crummey
    Differential Revision: http://reviews.llvm.org/D12999
    llvm-svn: 248189
* [OMPT] Overhaul OMPT initialization interface | Jonathan Peyton | 2015-09-21 | 1 | -10/+9
    The OMPT specification has changed. This revision brings the LLVM OpenMP implementation up to date.

    Technical overview of changes:

    Previously, a public weak symbol ompt_initialize was called after the OpenMP runtime is initialized. The new interface calls a global weak symbol ompt_tool prior to initialization. If a tool is present, ompt_tool returns a pointer to a function that matches the signature for ompt_initialize. After OpenMP is initialized the function pointer is called to initialize a tool.

    Knowing that OMPT will be enabled before initialization allows OMPT support to be initialized as part of initialization instead of back patching initialization of OMPT support after the fact.

    Post OpenMP initialization support has been generalized and moves from ompt-specific.c into ompt-general.c, since the OMPT initialization logic is no longer implementation specific.

    Patch by John Mellor-Crummey
    Differential Revision: http://reviews.llvm.org/D12998
    llvm-svn: 248187
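The handshake described in this commit can be sketched roughly as follows. This is an illustration under several assumptions, not the runtime's code: the `sketch_` names are invented, the initializer signature is simplified, the weak default definition that returns NULL is one possible way to express "no tool present", and `__attribute__((weak))` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified signature for the tool's initializer. */
typedef void (*sketch_initialize_t)(int runtime_version);

/* Weak default (GCC/Clang extension): reports that no tool is present.
   A tool linked into the program would override it with a strong definition
   that returns its own initializer. */
__attribute__((weak)) sketch_initialize_t sketch_ompt_tool(void) {
    return NULL;
}

static int sketch_ompt_enabled = 0;
static sketch_initialize_t sketch_tool_init = NULL;

/* Before runtime initialization: ask whether a tool wants to attach. */
static void sketch_pre_init(void) {
    sketch_tool_init = sketch_ompt_tool();
    sketch_ompt_enabled = (sketch_tool_init != NULL);
}

/* After runtime initialization: let the tool initialize itself. */
static void sketch_post_init(void) {
    if (sketch_ompt_enabled)
        sketch_tool_init(1); /* 1 is a placeholder version number */
}
```

Because the enabled flag is known before the rest of initialization runs, the runtime can set up its tool support in the normal initialization path rather than patching it in afterwards.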
* Fix the OpenMP 3.0 build | Jonathan Peyton | 2015-09-21 | 1 | -0/+2
    This change adds guards to the code in places where they are missing to enable the OpenMP 3.0 build.

    Patch by Diego Caballero and Johnny Peyton
    Mailing List: http://lists.llvm.org/pipermail/openmp-dev/2015-September/000935.html
    llvm-svn: 248178
* [OMPT] Fix assertion that arises when waiting for proxy tasks on runtime shutdown | Jonathan Peyton | 2015-09-10 | 1 | -1/+6
    This only triggered when built in debug mode with OMPT enabled: __kmp_wait_template expected the state of the current thread to be either ompt_state_idle or ompt_state_wait_barrier{,_implicit,_explicit}.

    Patch by Jonas Hahnfeld
    Differential Revision: http://reviews.llvm.org/D12754
    llvm-svn: 247339
* Cleanup of affinity hierarchy code. | Jonathan Peyton | 2015-09-10 | 1 | -0/+1
    Some of this is improvement to code suggested by Hal Finkel. Four changes here:
    1. Cleanup of hierarchy code to handle all hierarchy cases whether affinity is available or not
    2. Separated this and other classes and common functions out to a header file
    3. Added a destructor-like fini function for the hierarchy (and call in __kmp_cleanup)
    4. Remove some redundant code that is hopefully no longer needed

    Differential Revision: http://reviews.llvm.org/D12449
    llvm-svn: 247326
* Fix hanging barriers if number of parallel regions exceeds UINT_MAX | Jonathan Peyton | 2015-09-10 | 1 | -2/+2
    The fix is to make b_arrived flag 64 bit in both structures - kmp_balign_team_t and kmp_balign_t. Otherwise when flag in kmp_balign_team_t wrapped over UINT_MAX the library hangs.

    Differential Revision: http://reviews.llvm.org/D12563
    llvm-svn: 247320
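The arithmetic behind the hang can be shown with a small sketch. It assumes, for illustration only, that one counter is 32 bits wide and the other 64 bits wide and that both are bumped by one; the function name is hypothetical and the runtime's real bump amount and comparison logic are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Bump a 32-bit and a 64-bit copy of the same counter by one and report
   whether they still hold the same value afterwards. */
static int sketch_counters_agree_after_bump(uint64_t start) {
    uint32_t narrow = (uint32_t)start; /* 32-bit flag */
    uint64_t wide = start;             /* 64-bit flag */

    narrow += 1; /* unsigned wrap-around: UINT32_MAX + 1 becomes 0 */
    wide += 1;   /* keeps counting past UINT32_MAX */

    return (uint64_t)narrow == wide;
}
```

For small values the two counters stay in step; starting at `UINT32_MAX` the narrow one wraps to 0 while the wide one reaches 4294967296, so a thread waiting for the two to match would wait forever. Making both fields 64 bits wide removes the mismatch.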
* Remove duplicate of num_threads assignment. | Jonathan Peyton | 2015-09-02 | 1 | -1/+0
    The th.th_team_nproc is assigned in __kmp_allocate_thread() just 3 lines above, so there is no need to assign the same value again.

    llvm-svn: 246703
* Remove fork_context argument from __kmp_join_call() when OMPT is off | Jonathan Peyton | 2015-08-31 | 1 | -2/+9
    Conditionally include the fork_context parameter to __kmp_join_call() only if OMPT_SUPPORT=1

    Differential Revision: http://reviews.llvm.org/D12495
    llvm-svn: 246460
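Compiling a parameter in or out with a build-time switch can look like the sketch below. The macro `SKETCH_OMPT_SUPPORT` and both functions are hypothetical stand-ins; the real `__kmp_join_call()` has a different signature that is not reproduced here.

```c
#include <assert.h>

/* Hypothetical build-time switch; defaults to "off" in this sketch. */
#ifndef SKETCH_OMPT_SUPPORT
#define SKETCH_OMPT_SUPPORT 0
#endif

/* The extra parameter exists only when the switch is on.
   Returns the number of arguments it was compiled to accept. */
static int sketch_join_call(int gtid
#if SKETCH_OMPT_SUPPORT
                            , int fork_context
#endif
                            ) {
    (void)gtid;
#if SKETCH_OMPT_SUPPORT
    (void)fork_context;
    return 2;
#else
    return 1;
#endif
}

/* Every call site needs the matching guard. */
static int sketch_caller(int gtid) {
#if SKETCH_OMPT_SUPPORT
    return sketch_join_call(gtid, 0);
#else
    return sketch_join_call(gtid);
#endif
}
```

The cost of this approach is that each call site carries its own guard, which is consistent with the commit adding more lines than it removes.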
* D11990: Lock-free start of serialized parallel regions | Andrey Churbanov | 2015-08-18 | 1 | -25/+30
    llvm-svn: 245286
* D11988: Force serial reduction when team size is 1 | Andrey Churbanov | 2015-08-17 | 1 | -4/+3
    llvm-svn: 245209
* D11157: Fixed missed threads re-binding in case team size reduced via omp_set_num_threads | Andrey Churbanov | 2015-08-17 | 1 | -1/+2
    llvm-svn: 245206
* Tidy statistics collection | Jonathan Peyton | 2015-08-11 | 1 | -16/+23
    This removes some statistics counters and timers which were not used, adds new counters and timers for some language features that were not monitored previously and separates the counters and timers into those which are of interest for investigating user code and those which are only of interest to the developer of the runtime itself.

    The runtime developer statistics are now only collected if the additional #define KMP_DEVELOPER_STATS is set.

    Additional user statistics which are now collected include:
    * Count of nested parallelism (omp parallel inside a parallel region)
    * Count of omp distribute occurrences
    * Count of omp teams occurrences
    * Counts of task related statistics (taskyield, task execution, task cancellation, task steal)
    * Values passed to omp_set_num_threads
    * Time spent in omp single and omp master

    None of this affects code compiled without stats gathering enabled, which is the normal library build mode.

    This also fixes the CMake build by linking to the standard c++ library when building the stats library as it is a requirement. The normal library does not have this requirement and its link phase is left alone.

    Differential Revision: http://reviews.llvm.org/D11759
    llvm-svn: 244677
* Patch out a fatal assertion in OpenMP runtime until preconditions are met | Jonathan Peyton | 2015-07-23 | 1 | -0/+4
    Compiling a simple testcase with g++ and linking it to the LLVM OpenMP runtime compiled in debug mode trips an assertion that produces a fatal error. When the assertion is skipped, the program runs successfully to completion and produces the same answer as the sequential code. Intel will restore the assertion with a patch that fixes the issues that cause it to trip.

    Patch by John Mellor-Crummey
    Differential Revision: http://reviews.llvm.org/D11269
    llvm-svn: 243032
* Fix OMPT support for task frames, parallel regions, and parallel regions + loops | Jonathan Peyton | 2015-07-21 | 1 | -11/+17
    This patch makes it possible for a performance tool that uses call stack unwinding to map implementation-level call stacks from master and worker threads into a unified global view. There are several components to this patch.

    include/*/ompt.h.var
      Add a new enumeration type that indicates whether the code for a master task for a parallel region is invoked by the user program or the runtime system.

      Change the signature for OMPT parallel begin/end callbacks to indicate whether the master task will be invoked by the program or the runtime system. This enables a performance tool using call stack unwinding to handle these two cases differently. For this case, a profiler that uses call stack unwinding needs to know that the call path prefix for the master task may differ from those available within the begin/end callbacks if the program invokes the master.

    kmp.h
      Change the signature for __kmp_join_call to take an additional parameter indicating the fork_context type. This is needed to supply the OMPT parallel end callback with information about whether the compiler or the runtime invoked the master task for a parallel region.

    kmp_csupport.c
      Ensure that the OMPT task frame field reenter_runtime_frame is properly set and cleared before and after calls to fork and join threads for a parallel region.

      Adjust the code for the new signature for __kmp_join_call.

      Adjust the OMPT parallel begin callback invocations to carry the extra parameter indicating whether the program or the runtime invokes the master task for a parallel region.

    kmp_gsupport.c
      Apply all of the analogous changes described for kmp_csupport.c for the GOMP interface.

      Add OMPT support for the GOMP combined parallel region + loop API to maintain the OMPT task frame field reenter_runtime_frame.

    kmp_runtime.c
      Use the new information passed by __kmp_join_call to adjust the OMPT parallel end callback invocations to carry the extra parameter indicating whether the program or the runtime invokes the master task for a parallel region.

    ompt_internal.h
      Use the flavor of the parallel region API (GNU or Intel) to determine who invokes the master task.

    Differential Revision: http://reviews.llvm.org/D11259
    llvm-svn: 242817
* Enable debugger support | Jonathan Peyton | 2015-07-09 | 1 | -0/+41
    These changes enable external debuggers to conveniently interface with the LLVM OpenMP Library. Structures are added which describe the important internal structures of the OpenMP Library e.g., teams, threads, etc. This feature is turned on by default (CMake variable LIBOMP_USE_DEBUGGER) and can be turned off with -DLIBOMP_USE_DEBUGGER=off.

    Differential Revision: http://reviews.llvm.org/D10038
    llvm-svn: 241832
* Fix OMPT state maintenance for barriers and missing init of implicit task id. | Jonathan Peyton | 2015-06-29 | 1 | -0/+4
    Fix OMPT support for barriers so that state changes occur even if OMPT_TRACE turned off. These state changes are needed by performance tools that use callbacks for either ompt_event_wait_barrier_begin or ompt_event_wait_barrier_end. Change ifdef flag to OMPT_BLAME for callbacks ompt_event_wait_barrier_begin or ompt_event_wait_barrier_end rather than OMPT_TRACE -- they were misclassified.

    Without this patch, when the runtime is compiled with LIBOMP_OMPT_SUPPORT=true, LIBOMP_OMPT_BLAME=true, and LIBOMP_OMPT_TRACE=false, and a callback is registered for either ompt_event_wait_barrier_begin or ompt_event_wait_barrier_end, then an assertion will trip.

    Fix the scoping of one OMPT_TRACE ifdef, which should not have surrounded an update of an OMPT state.

    Add a missing initialization of an OMPT task id for an implicit task.

    Patch by John Mellor-Crummey
    Differential Revision: http://reviews.llvm.org/D10759
    llvm-svn: 240970
* Re-enable Visual Studio Builds. | Jonathan Peyton | 2015-06-22 | 1 | -4/+1
    I tried to compile with Visual Studio using CMake and found these two sections of code causing problems for Visual Studio. The first one removes the use of variable length arrays by instead using KMP_ALLOCA(). The second part eliminates a redundant cpuid assembly call by using the already existing __kmp_x86_cpuid() call instead.

    llvm-svn: 240290
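The first change, swapping a variable length array for a stack allocation, might look like the sketch below. The macro `SKETCH_ALLOCA` and the summing function are hypothetical and merely mimic what a wrapper such as KMP_ALLOCA() could expand to; the header choice assumes MSVC on Windows and a glibc-style `<alloca.h>` elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical portability wrapper, loosely modelled on the idea of KMP_ALLOCA(). */
#if defined(_MSC_VER)
#include <malloc.h>
#define SKETCH_ALLOCA _alloca
#else
#include <alloca.h>
#define SKETCH_ALLOCA alloca
#endif

/* Sum 1..n using a temporary buffer. A VLA (int buf[n];) would be rejected
   by Visual Studio, so the buffer comes from the stack allocator instead.
   The caller is assumed to pass a small, non-negative n. */
static int sketch_sum_first(int n) {
    int *buf = (int *)SKETCH_ALLOCA((size_t)n * sizeof(int));
    int i, sum = 0;

    for (i = 0; i < n; ++i)
        buf[i] = i + 1;
    for (i = 0; i < n; ++i)
        sum += buf[i];
    return sum; /* the buffer is released automatically on return */
}
```

Like a VLA, the allocation lives on the stack and disappears when the function returns, so no explicit free is needed, but large sizes can still overflow the stack.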
* Remove unused variable warnings by deletion. | Jonathan Peyton | 2015-06-08 | 1 | -9/+0
    As an ongoing effort to sanitize the openmp code, these changes delete variables that aren't used at all.

    http://lists.cs.uiuc.edu/pipermail/openmp-dev/2015-June/000701.html
    Patch by Jack Howarth
    llvm-svn: 239334
* Remove unused variable warnings by moving variables. | Jonathan Peyton | 2015-06-08 | 1 | -3/+4
    As an ongoing effort to sanitize the openmp code, these changes move variables under already existing macro guards.

    Patch by Jack Howarth
    llvm-svn: 239331
* Remove unused variable warnings by adding proper macro guards. | Jonathan Peyton | 2015-06-08 | 1 | -0/+6
    As an ongoing effort to sanitize the openmp code, these changes remove unused variables by adding proper macros around both variables and functions.

    Patch by Jack Howarth
    llvm-svn: 239330
* Removed unused functions. | Jonathan Peyton | 2015-06-08 | 1 | -1/+0
    As an ongoing effort to sanitize the openmp code, these changes remove unused functions. The unused functions are: __kmp_fini_allocator_thread(), __kmp_env_isDefined(), __kmp_strip_quotes(), __kmp_convert_to_seconds(), and __kmp_convert_to_nanoseconds().

    Patch by Jack Howarth
    llvm-svn: 239323
* Get rid of some dead code. | Jonathan Peyton | 2015-06-02 | 1 | -2/+0
    Some old references to RML and IOMP which aren't used anywhere are deleted.

    http://lists.cs.uiuc.edu/pipermail/openmp-dev/2015-June/000664.html
    Patch by Jack Howarth and Jonathan Peyton
    llvm-svn: 238878
* Apply name change to src/* files. | Jonathan Peyton | 2015-06-01 | 1 | -1/+1
    These changes are mostly in comments, but there are a few that aren't. Change libiomp5 => libomp everywhere. One internal function name is changed in kmp_gsupport.c, and in kmp_i18n.c, the static char[] variable 'name' is changed to "libomp".

    llvm-svn: 238712
* Change macro GUIDEDLL_EXPORTS to KMP_DYNAMIC_LIB | Jonathan Peyton | 2015-05-26 | 1 | -6/+6
    A while back, Hal suggested updating the GUIDEDLL_EXPORTS macro guard to a more descriptive name. It represents a dynamic library build so KMP_DYNAMIC_LIB is a more suitable name.

    Differential Revision: http://reviews.llvm.org/D9899
    llvm-svn: 238221
* One line fix for possible out-of-bounds issue in kmp_runtime.c | Jonathan Peyton | 2015-05-26 | 1 | -1/+1
    The variable j is now checked so there is no possible out-of-bounds issue when accessing __kmp_nested_nth.nth[] array.

    llvm-svn: 238216
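The kind of guard this commit describes can be sketched as below. The structure loosely mirrors the idea of a per-nesting-level thread-count table, but its layout, the function name, and the fallback behaviour are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical table of requested thread counts, one entry per nesting level. */
typedef struct {
    const int *nth; /* entries 0 .. used-1 are valid */
    int used;       /* number of valid entries */
} sketch_nested_nth_t;

/* Return the entry for the given level, or the fallback when the index
   would fall outside the valid part of the array. */
static int sketch_nth_for_level(const sketch_nested_nth_t *table, int level,
                                int fallback) {
    if (level >= 0 && level < table->used)
        return table->nth[level];
    return fallback;
}
```

The check costs one comparison per lookup and turns an out-of-bounds read into a well-defined fallback value.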
* Fix task team synchronization | Jonathan Peyton | 2015-05-21 | 1 | -0/+1
    The fix simply syncs up the new threads to have the same task_state and task_team as the old threads. The master thread is skipped, because it shouldn't at this point have the team's task_team value yet -- it should still have parent_team's task_team. It gets pointed at the new team's task_team later, after __kmp_allocate_team returns, and the master has stored a memo of its old task_state.

    llvm-svn: 237916
* D9306 omp 4.1 async offload support (partial): code changes | Andrey Churbanov | 2015-05-07 | 1 | -0/+10
    llvm-svn: 236753
* D9302.partial2: cleanup of ittnotify checks, that eliminates redundant notifications in case of nested regions. | Andrey Churbanov | 2015-05-06 | 1 | -64/+58
    llvm-svn: 236631
* These are the actual changes in the runtime to issue OMPT-related functions. All of them are surrounded by #if OMPT_SUPPORT and can be disabled (which is the default). | Andrey Churbanov | 2015-04-29 | 1 | -5/+476
    llvm-svn: 236122
* Replace some unsafe API calls with safe alternatives on Windows, prepare code for similar actions on other platforms - wrap unsafe API calls into macros. | Andrey Churbanov | 2015-04-02 | 1 | -9/+9
    llvm-svn: 233915
* Cleanup provided by Carlo Bertolli | Andrey Churbanov | 2015-03-03 | 1 | -1/+2
    llvm-svn: 231078
* Detect Intel MIC architecture and set some defaults at run time instead of build time. | Andrey Churbanov | 2015-02-20 | 1 | -15/+47
    llvm-svn: 230033
* Added new user-guided lock api, currently disabled. Use KMP_USE_DYNAMIC_LOCK=1 to enable it. | Andrey Churbanov | 2015-02-20 | 1 | -0/+8
    llvm-svn: 230030
* The usage of tt_state flag is replaced by an array of two task_team pointers. | Andrey Churbanov | 2015-02-10 | 1 | -100/+171
    llvm-svn: 228718
* Comments only: removing the Revision and Date svn variables from the top of all the source files. | Andrey Churbanov | 2015-01-27 | 1 | -2/+0
    llvm-svn: 227207
* Fixed implementation of the teams construct in case it contains parallel regions with different number of threads. | Andrey Churbanov | 2015-01-27 | 1 | -0/+5
    llvm-svn: 227198
* few fixes for ittnotify interface (used by Intel(R) VTune Amplifier) | Andrey Churbanov | 2015-01-16 | 1 | -15/+20
    llvm-svn: 226283
* This patch enables the use of KMP_AFFINITY=balanced on non-MIC Architectures. The restriction for using balanced affinity on non-MIC architectures is it only works for one-package machines. | Andrey Churbanov | 2015-01-13 | 1 | -10/+0
    llvm-svn: 225794
* aarch64 port sent by C. Bergstrom | Andrey Churbanov | 2015-01-13 | 1 | -9/+9
    llvm-svn: 225792
* I apologise in advance for the size of this check-in. At Intel we do | Jim Cownie | 2014-10-07 | 1 | -2669/+1156
    understand that this is not friendly, and are working to change our internal code-development to make it easier to make development features available more frequently and in finer (more functional) chunks. Unfortunately we haven't got that in place yet, and unpicking this into multiple separate check-ins would be non-trivial, so please bear with me on this one. We should be better in the future.

    Apologies over, what do we have here?

    GCC 4.9 compatibility
    ---------------------
    * We have implemented the new entrypoints used by code compiled by GCC 4.9 to implement the same functionality in gcc 4.8. Therefore code compiled with gcc 4.9 that used to work will continue to do so. However, there are some other new entrypoints (associated with task cancellation) which are not implemented. Therefore user code compiled by gcc 4.9 that uses these new features will not link against the LLVM runtime. (It remains unclear how to handle those entrypoints, since the GCC interface has potentially unpleasant performance implications for join barriers even when cancellation is not used)

    --- new parallel entry points ---
    new entry points that aren't OpenMP 4.0 related
    These are implemented fully :-
        GOMP_parallel_loop_dynamic()
        GOMP_parallel_loop_guided()
        GOMP_parallel_loop_runtime()
        GOMP_parallel_loop_static()
        GOMP_parallel_sections()
        GOMP_parallel()

    --- cancellation entry points ---
    Currently, these only give a runtime error if OMP_CANCELLATION is true because our plain barriers don't check for cancellation while waiting
        GOMP_barrier_cancel()
        GOMP_cancel()
        GOMP_cancellation_point()
        GOMP_loop_end_cancel()
        GOMP_sections_end_cancel()

    --- taskgroup entry points ---
    These are implemented fully.
        GOMP_taskgroup_start()
        GOMP_taskgroup_end()

    --- target entry points ---
    These are empty (as they are in libgomp)
        GOMP_target()
        GOMP_target_data()
        GOMP_target_end_data()
        GOMP_target_update()
        GOMP_teams()

    Improvements in Barriers and Fork/Join
    --------------------------------------
    * Barrier and fork/join code is now in its own file (which makes it easier to understand and modify).
    * Wait/release code is now templated and in its own file; suspend/resume code is also templated
    * There's a new, hierarchical, barrier, which exploits the cache-hierarchy of the Intel(r) Xeon Phi(tm) coprocessor to improve fork/join and barrier performance.

    ***BEWARE*** the new source files have *not* been added to the legacy CMake build system. If you want to use that, fixes will be required.

    Statistics Collection Code
    --------------------------
    * New code has been added to collect application statistics (if this is enabled at library compile time; by default it is not). The statistics code itself is generally useful, the lightweight timing code uses the X86 rdtsc instruction, so will require changes for other architectures. The intent of this code is not for users to tune their codes but rather
      1) For timing code-paths inside the runtime
      2) For gathering general properties of OpenMP codes to focus attention on which OpenMP features are most used.

    Nested Hot Teams
    ----------------
    * The runtime now maintains more state to reduce the overhead of creating and destroying inner parallel teams. This improves the performance of code that repeatedly uses nested parallelism with the same resource allocation. Set the new KMP_HOT_TEAMS_MAX_LEVEL envirable to a depth to enable this (and, of course, OMP_NESTED=true to enable nested parallelism at all).

    Improved Intel(r) VTune(tm) Amplifier support
    ---------------------------------------------
    * The runtime provides additional information to VTune via the itt_notify interface to allow it to display better OpenMP specific analyses of load-imbalance.

    Support for OpenMP Composite Statements
    ---------------------------------------
    * Implement new entrypoints required by some of the OpenMP 4.1 composite statements.

    Improved ifdefs
    ---------------
    * More separation of concepts ("Does this platform do X?") from platforms ("Are we compiling for platform Y?"), which should simplify future porting.

    ScaleMP* contribution
    ---------------------
    Stack padding to improve the performance in their environment where cross-node coherency is managed at the page level.

    Redesign of wait and release code
    ---------------------------------
    The code is simplified and performance improved.

    Bug Fixes
    ---------
    * Fixes for Windows multiple processor groups.
    * Fix Fortran module build on Linux: offload attribute added.
    * Fix entry names for distribute-parallel-loop construct to be consistent with the compiler codegen.
    * Fix an inconsistent error message for KMP_PLACE_THREADS environment variable.

    llvm-svn: 219214