path: root/openmp/runtime/src/kmp_csupport.c
Commit message | Author | Age | Files | Lines
* Change source files from .c to .cpp | Jonathan Peyton | 2016-12-14 | 1 | -3337/+0
  Patch by Hansang Bae
  Differential Revision: https://reviews.llvm.org/D26688
  llvm-svn: 289732
* Support of mips & mips64 for openmprtl | Sylvestre Ledru | 2016-12-08 | 1 | -1/+1
  Summary: Implemented by Dejan Latinovic. See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=790735 for more information.
  Reviewers: AndreyChurbanov, jlpeyton
  Subscribers: openmp-commits, mgorny
  Differential Revision: https://reviews.llvm.org/D26576
  llvm-svn: 289032
* [OMPT] fix task frame information for gomp interface | Jonas Hahnfeld | 2016-09-14 | 1 | -15/+3
  Previous differentials D23305-D23310 changed task frame information management only for the kmp interface, but not for the whole gomp interface. This broke some testcases when building with gcc. This patch fixes the broken task frame information for the gomp interface.
  Patch by Joachim Protze!
  Differential Revision: https://reviews.llvm.org/D24502
  llvm-svn: 281468
* [OMPT] save exit address to lwt if available | Jonas Hahnfeld | 2016-09-14 | 1 | -7/+14
  If the current team is a serialized team (lwt), the frame information should be written to this data structure. Previously, nested serialized teams would overwrite the same task information.
  Patch by Joachim Protze!
  Differential Revision: https://reviews.llvm.org/D23310
  llvm-svn: 281467
* [OMPT] Align implementation of reenter frame address to latest (frozen) version of OMPT spec | Jonas Hahnfeld | 2016-09-14 | 1 | -3/+16
  The latest OMPT spec changed the semantics of a task's reenter frame to be the application frame that will be entered when the runtime frame drops. Before, it was the last frame in the runtime. This doesn't work for some gcc execution paths, or even for some clang-generated code, since there is no runtime frame between the executed task and the encountering task.
  The test case compares exit and reenter addresses against addresses captured in application code.
  Patch by Joachim Protze!
  Differential Revision: https://reviews.llvm.org/D23305
  llvm-svn: 281464
* Fix bug in futex fast path inside kmp_csupport.c | Jonathan Peyton | 2016-06-22 | 1 | -1/+1
  llvm-svn: 273439
* Apply the KMP_USE_FUTEX feature macro everywhere | Jonathan Peyton | 2016-06-22 | 1 | -14/+14
  llvm-svn: 273438
* Renaming change: 41 -> 45 and 4.1 -> 4.5 | Jonathan Peyton | 2016-06-14 | 1 | -2/+2
  OpenMP 4.1 is now OpenMP 4.5. Any mention of 41 or 4.1 is replaced with 45 or 4.5. Also, if the CMake option LIBOMP_OMP_VERSION is 41, CMake warns that 41 is deprecated and to use 45 instead.
  llvm-svn: 272687
* Offer API for setting number of loop dispatch buffers | Jonathan Peyton | 2016-05-31 | 1 | -4/+13
  The problem is the lack of dispatch buffers when thousands of loops with nowait, about 10 iterations each, are executed by hundreds of threads. We only have 7 built-in dispatch buffers, but there is a need for dozens or hundreds of buffers.
  The problem can be fixed by setting KMP_MAX_DISP_BUF to a bigger value. In order to give users the same possibility I changed the build-time control into a run-time one, adding an API just in case.
  This change adds an environment variable KMP_DISP_NUM_BUFFERS and a new API function kmp_set_disp_num_buffers(int num_buffers).
  The KMP_DISP_NUM_BUFFERS envirable works only before serial initialization, because during the serial initialization we already allocate buffers for the hot team, so it is too late to change the number of buffers later (or we need to reallocate buffers for all teams, which sounds too complicated).
  The kmp_set_defaults() routine does not work for this envirable, because it calls serial initialization before reading the parameter string. So a new routine, kmp_set_disp_num_buffers(), is created so that it can set our internal global variable before the library initialization. If both the envirable and the API are used, the envirable wins.
  Differential Revision: http://reviews.llvm.org/D20697
  llvm-svn: 271318
* Make LIBOMP_USE_ITT_NOTIFY a setting that can be enabled or disabled | Jonathan Peyton | 2016-05-26 | 1 | -1/+1
  On Blue Gene/Q, having LIBOMP_USE_ITT_NOTIFY support compiled into a statically-linked binary causes a failure at runtime because dlopen fails. This patch changes LIBOMP_USE_ITT_NOTIFY to a cacheable configuration setting that can be disabled.
  Patch by John Mellor-Crummey
  Differential Revision: http://reviews.llvm.org/D20517
  llvm-svn: 270884
* Remove trailing whitespace in src/ directory | Jonathan Peyton | 2016-05-20 | 1 | -2/+2
  This patch doesn't affect D19878's context. So D19878 still cleanly applies.
  llvm-svn: 270252
* Clean all the mess around KMP_USE_FUTEX and kmp_lock.h | Paul Osmialowski | 2016-05-16 | 1 | -0/+1
  The KMP_USE_FUTEX preprocessor definition defined in kmp_lock.h is used inconsistently throughout the LLVM libomp code.
  * some .c files that use this define do not include the kmp_lock.h file; in effect, the guarded parts of the code are never compiled
  * some places in the code use architecture-dependent preprocessor logic expressions which effectively disable the use of futex for the AArch64 architecture; all these places should use '#if KMP_USE_FUTEX' instead to avoid any further confusion
  * some places use KMP_HAS_FUTEX, which is defined nowhere; KMP_USE_FUTEX should be used instead
  Differential Revision: http://reviews.llvm.org/D19629
  llvm-svn: 269642
* [STATS] Use partitioned timer scheme | Jonathan Peyton | 2016-05-05 | 1 | -12/+7
  This change replaces the current timers with ones that partition time properly. The current timers are nested, so that if a new timer, B, starts when the current timer, A, is already timing, A's time will include B's. To eliminate this problem, the partitioned timers are designed to stop the current timer (A), let the new timer run (B), and when the new timer is finished, restart the previously running timer (A). With this partitioning of time, a thread's timers all sum up to the OMP_worker_thread_life time and can now easily show the percentage of time a thread is spending in different parts of the runtime or user code.
  There is also a new state variable associated with each thread which tells where it is executing a task. This corresponds with the timers: OMP_task_*, e.g., if time is spent in OMP_task_taskwait, then that thread executed tasks inside a #pragma omp taskwait construct.
  The changes are mostly changing the macros to use the new PARTITIONED_* macros, the new partitionedTimers class and its methods, and new state logic.
  Differential Revision: http://reviews.llvm.org/D19229
  llvm-svn: 268640
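  The stop/run/restart behaviour described in this commit message can be sketched with a small timer stack. This is only an illustration under stated assumptions: the names (part_timers, pt_push, pt_pop, pt_flush) are hypothetical and are not the runtime's partitionedTimers class, and time is passed in explicitly as a tick count rather than read from a hardware counter.

```c
#include <assert.h>

/* Hypothetical sketch of a partitioned timer stack; not the libomp code. */
enum { PT_MAX_DEPTH = 16, PT_NUM_TIMERS = 4 };

typedef struct {
    int stack[PT_MAX_DEPTH];            /* timer ids, innermost on top */
    int depth;                          /* number of timers on the stack */
    unsigned long start;                /* tick at which the top timer (re)started */
    unsigned long total[PT_NUM_TIMERS]; /* exclusive ticks charged to each timer id */
} part_timers;

/* Start timing with a root timer (e.g. the thread's lifetime timer). */
static void pt_init(part_timers *t, int root_id, unsigned long now) {
    for (int i = 0; i < PT_NUM_TIMERS; ++i) t->total[i] = 0;
    t->stack[0] = root_id;
    t->depth = 1;
    t->start = now;
}

/* Charge the elapsed ticks to whichever timer is currently running. */
static void pt_flush(part_timers *t, unsigned long now) {
    t->total[t->stack[t->depth - 1]] += now - t->start;
    t->start = now;
}

/* Stop the current timer (A) and let a new timer (B) run. */
static void pt_push(part_timers *t, int id, unsigned long now) {
    assert(t->depth > 0 && t->depth < PT_MAX_DEPTH);
    assert(id >= 0 && id < PT_NUM_TIMERS);
    pt_flush(t, now);
    t->stack[t->depth++] = id;
}

/* Stop the new timer (B) and restart the previously running one (A). */
static void pt_pop(part_timers *t, unsigned long now) {
    assert(t->depth > 1);
    pt_flush(t, now);
    t->depth--;
}
```

  Because every tick is charged to exactly one timer, the per-timer totals add up to the lifetime of the root timer, which is the property the commit message relies on for reporting percentages.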
* [ITTNOTIFY] Remove serialized parallel regions from frame notification | Jonathan Peyton | 2016-04-19 | 1 | -30/+0
  llvm-svn: 266760
* Exponential back off logic for test-and-set lock | Jonathan Peyton | 2016-04-14 | 1 | -0/+2
  This change adds back off logic in the test and set lock for better contended lock performance. It uses a simple truncated binary exponential back off function. The default back off parameters are tuned for x86.
  The main back off logic has a two-loop structure in which each loop is controlled by a user-level parameter:
  max_backoff - limits the outer loop number of iterations. This parameter should be a power of 2.
  min_ticks - the inner spin wait loop number of "ticks", which is system dependent and should be tuned for your system if you so choose. The "ticks" on x86 correspond to the time stamp counter, but on other architectures ticks is a timestamp derived from gettimeofday().
  The user can modify these via the environment variable: KMP_SPIN_BACKOFF_PARAMS=max_backoff[,min_ticks]
  Currently, since the default user lock is a queuing lock, one would have to also specify KMP_LOCK_KIND=tas to use the test-and-set locks.
  Differential Revision: http://reviews.llvm.org/D19020
  llvm-svn: 266329
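  A minimal sketch of a test-and-set lock with truncated binary exponential back off follows. It is an assumption-laden illustration, not the runtime's implementation: the type and function names are hypothetical, the way max_backoff and min_ticks interact is one plausible reading of the description above, and a plain counting loop stands in for the time stamp counter.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock type; the field names mirror the two tuning parameters. */
typedef struct {
    atomic_flag held;
    unsigned max_backoff; /* power of 2; caps the outer loop iteration count */
    unsigned min_ticks;   /* length of one inner spin-wait unit */
} tas_lock;

static void tas_init(tas_lock *l, unsigned max_backoff, unsigned min_ticks) {
    /* max_backoff should be a non-zero power of 2 */
    assert(max_backoff != 0 && (max_backoff & (max_backoff - 1)) == 0);
    atomic_flag_clear(&l->held);
    l->max_backoff = max_backoff;
    l->min_ticks = min_ticks;
}

/* Double the back off step, truncated at max_backoff. */
static unsigned next_backoff(unsigned step, unsigned max_backoff) {
    unsigned doubled = step * 2;
    return doubled > max_backoff ? max_backoff : doubled;
}

/* Stand-in for a tick-based wait; a real runtime would read a clock. */
static void spin_wait(unsigned ticks) {
    for (volatile unsigned i = 0; i < ticks; ++i) { }
}

/* Returns 1 if the lock was acquired, 0 if it was already held. */
static int tas_try_acquire(tas_lock *l) {
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void tas_acquire(tas_lock *l) {
    unsigned step = 1;
    while (!tas_try_acquire(l)) {
        /* outer loop: 'step' waits, each of min_ticks length */
        for (unsigned i = 0; i < step; ++i) spin_wait(l->min_ticks);
        step = next_backoff(step, l->max_backoff);
    }
}

static void tas_release(tas_lock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

  Backing off between attempts reduces the number of atomic operations hammering the same cache line while the lock is contended, which is the stated motivation for the change.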
* Fixing the non-x86 build by removing dependence on kmp_cpuid_t | Hal Finkel | 2016-03-27 | 1 | -2/+9
  The problem is that the definition of kmp_cpuinfo_t contains:
    char name [3*sizeof (kmp_cpuid_t)]; // CPUID(0x80000002,0x80000003,0x80000004)
  and kmp_cpuid_t is only defined when compiling for x86.
  Differential Revision: http://reviews.llvm.org/D18245
  llvm-svn: 264535
* [OMPT] Fix parallel_id and task_id in loop_end with schedule static | Jonas Hahnfeld | 2016-03-24 | 1 | -6/+3
  For serialized parallel regions, wrong ids were reported. Now the same code is used as in kmp_dispatch.cpp which emits the correct ids.
  Differential Revision: http://reviews.llvm.org/D18348
  llvm-svn: 264266
* Fix Visual Studio builds | Jonathan Peyton | 2016-03-23 | 1 | -1/+3
  Have Visual Studio use MemoryBarrier() instead of _mm_mfence() and remove __declspec align attribute from function parameters in kmp_atomic.h
  llvm-svn: 264166
* [STATS] Add OMP_critical and OMP_critical_wait timers | Jonathan Peyton | 2016-03-21 | 1 | -1/+3
  OMP_critical - time spent in critical section
  OMP_critical_wait - time spent waiting to enter a critical section
  llvm-svn: 263967
* [STATS] fix master and single timers | Jonathan Peyton | 2016-03-03 | 1 | -3/+5
  Only the thread which executes the single/master section will update its statistics.
  llvm-svn: 262656
* Add new OpenMP 4.5 doacross loop nest feature | Jonathan Peyton | 2016-03-02 | 1 | -0/+289
  From the standard: A doacross loop nest is a loop nest that has cross-iteration dependence. An iteration is dependent on one or more lexicographically earlier iterations. The ordered clause parameter on a loop directive identifies the loop(s) associated with the doacross loop nest.
  The init/fini routines allocate/free doacross buffer(s) for each loop for each thread. The wait routine waits for a flag designated by the dependence vector. The post routine sets the flag designated by the current iteration vector. We use a similar technique of shared buffer indices that covers up to 7 nowait loops executed simultaneously by different threads (the number 7 has no real meaning, it is just a heuristic value). Also, the size of the structures is kept intact by reducing dummy arrays.
  This needs to be put into the OpenMP runtime library in order for the compiler team to develop the compiler side of the implementation.
  Differential Revision: http://reviews.llvm.org/D17399
  llvm-svn: 262532
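  For orientation, this is what a doacross loop looks like at the source level in OpenMP 4.5. The example is a sketch written for this log, not code from the patch; the function name is hypothetical. A compiler with OpenMP 4.5 support would be expected to lower the sink and source directives to calls into the wait and post routines described above; without OpenMP enabled, the pragmas are ignored and the loop simply runs serially with the same result.

```c
#include <assert.h>

/* In-place running sum: iteration i depends on iteration i - 1. */
static void prefix_sum_doacross(long *a, int n) {
    assert(a != 0 && n >= 0);
    #pragma omp parallel for ordered(1)
    for (int i = 1; i < n; ++i) {
        /* wait until iteration i - 1 has posted its completion */
        #pragma omp ordered depend(sink: i - 1)
        a[i] += a[i - 1];
        /* post: signal that iteration i is complete */
        #pragma omp ordered depend(source)
    }
}
```

  The ordered(1) clause marks a single loop as the doacross loop nest, so each dependence vector has one component.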
* Fix build error: OMPT_SUPPORT=true was not tested after hinted lock changes | Jonathan Peyton | 2015-12-23 | 1 | -1/+8
  Recent changes to support dynamic locks didn't consider the code compiled when OMPT_SUPPORT=true. As a result, the OMPT support was broken by recent changes to nested locks to support dynamic locks. For OMPT to work with dynamic locks, they need to provide a return code indicating whether a nested lock acquisition was the first or not.
  This patch moves the OMPT support for nested locks into the #else case when DYNAMIC locks were not used. New support is needed for dynamic locks. This patch fixes the build and leaves a placeholder where the missing OMPT callbacks can be added by either the author of the OMPT support for locks or the author of the dynamic locking support.
  Patch by John Mellor-Crummey
  Differential Revision: http://reviews.llvm.org/D15656
  llvm-svn: 256314
* Hinted lock (OpenMP 4.5 feature) Updates/Fixes Part 3 | Jonathan Peyton | 2015-12-11 | 1 | -76/+195
  This change set includes all changes to make the code conform to the OMP 4.5 specification:
  * Removed hint / hinted_init definitions from include/40 files
  * Hint values are powers of 2 to enable composition (4.5 spec)
  * Hinted lock initialization functions were renamed (4.5 spec)
    kmp_init_lock_hinted -> omp_init_lock_with_hint
    kmp_init_nest_lock_hinted -> omp_init_nest_lock_with_hint
  * __kmpc_critical_section_with_hint was added to support a critical section with a hint (4.5 spec)
  * __kmp_map_hint_to_lock was added to convert a hint (possibly a composite) to an internal lock type
  * kmpc_init_lock_with_hint and kmpc_init_nest_lock_with_hint were added as internal entries for the hinted lock initializers. The previous internal functions (__kmp_init*) were moved to kmp_csupport.c and reused in multiple places
  * Added the two init functions to dllexports
  * KMP_USE_DYNAMIC_LOCK is turned on if OMP_41_ENABLED is turned on
  Differential Revision: http://reviews.llvm.org/D15205
  llvm-svn: 255376
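  Because the hint values are powers of 2, several hints can be combined with bitwise OR and tested individually. The sketch below shows the idea only: the HINT_* constants are local copies of the OpenMP 4.5 omp_lock_hint_t values, while the lock_kind enum and the mapping policy in map_hint_to_lock are hypothetical and are not the runtime's __kmp_map_hint_to_lock.

```c
#include <assert.h>

/* Local copies of the OpenMP 4.5 lock hint values (each a distinct bit). */
enum {
    HINT_NONE           = 0,
    HINT_UNCONTENDED    = 1,
    HINT_CONTENDED      = 2,
    HINT_NONSPECULATIVE = 4,
    HINT_SPECULATIVE    = 8
};

/* Hypothetical internal lock kinds, for illustration only. */
typedef enum { LOCK_DEFAULT, LOCK_TAS, LOCK_QUEUING, LOCK_SPECULATIVE } lock_kind;

/* Map a (possibly composite) hint to a lock kind; the policy is an assumption. */
static lock_kind map_hint_to_lock(unsigned hint) {
    assert((hint & ~15u) == 0); /* this sketch only knows the four bits above */

    /* Contradictory combinations fall back to the default lock. */
    if ((hint & HINT_CONTENDED) && (hint & HINT_UNCONTENDED)) return LOCK_DEFAULT;
    if ((hint & HINT_SPECULATIVE) && (hint & HINT_NONSPECULATIVE)) return LOCK_DEFAULT;

    if (hint & HINT_SPECULATIVE) return LOCK_SPECULATIVE;
    if (hint & HINT_CONTENDED)   return LOCK_QUEUING;
    if (hint & HINT_UNCONTENDED) return LOCK_TAS;
    return LOCK_DEFAULT;
}
```

  With one bit per hint, a caller can pass a composite such as contended plus nonspeculative in a single integer argument, which would not be possible with consecutive enumeration values.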
* Hinted lock (OpenMP 4.5 feature) Updates/Fixes Part 2 | Jonathan Peyton | 2015-12-11 | 1 | -56/+64
  * Added a new user TSX lock implementation, RTM. This implementation is a light-weight version of the adaptive lock implementation, omitting the back-off logic for deciding when to speculate (or not). The fall-back lock is still the queuing lock.
  * Changed indirect lock table management. The data for indirect lock management was encapsulated in the "kmp_indirect_lock_table_t" type. Also, the lock table dimension was changed to 2D (was linear), and each entry is a kmp_indirect_lock_t object now (was a pointer to an object).
  * Some clean up in the critical section code
  * Removed the limits of the tuning parameters read from KMP_ADAPTIVE_LOCK_PROPS
  * KMP_USE_DYNAMIC_LOCK=1 also turns on these two switches: KMP_USE_TSX, KMP_USE_ADAPTIVE_LOCKS
  Differential Revision: http://reviews.llvm.org/D15204
  llvm-svn: 255375
* Hinted lock (OpenMP 4.5 feature) Updates/Fixes | Jonathan Peyton | 2015-12-11 | 1 | -3/+3
  There are going to be two more patches which bring this feature up to date and in line with OpenMP 4.5.
  * Renamed jump tables for the lock functions (and some clean up).
  * Renamed some macros to be in KMP_ namespace.
  * Return type of unset functions changed from void to int.
  * Enabled use of _xbegin() et al. intrinsics for accessing TSX instructions.
  Differential Revision: http://reviews.llvm.org/D15199
  llvm-svn: 255373
* Replace DYNA_* names with KMP_* names | Jonathan Peyton | 2015-12-03 | 1 | -83/+83
  llvm-svn: 254637
* [OMPT] Add OMPT events for API locking | Jonathan Peyton | 2015-10-16 | 1 | -1/+49
  This fix implements the following OMPT events for the API locking routines:
  * ompt_event_acquired_lock
  * ompt_event_acquired_nest_lock_first
  * ompt_event_acquired_nest_lock_next
  * ompt_event_init_lock
  * ompt_event_init_nest_lock
  * ompt_event_destroy_lock
  * ompt_event_destroy_nest_lock
  For the acquired events the depth of the locks is required, so a return value was added similar to the return values we already have for the release lock routines.
  Patch by Tim Cramer
  Differential Revision: http://reviews.llvm.org/D13689
  llvm-svn: 250526
* [OMPT] Reduce overhead of OMPT | Jonathan Peyton | 2015-10-09 | 1 | -4/+4
  * Avoid computing state needed only by OMPT unless the ompt_enabled flag is set.
  * Properly handle a corner case in OMPT where team == NULL.
  Patch by John Mellor-Crummey
  Differential Revision: http://reviews.llvm.org/D13502
  llvm-svn: 249857
* Added sockets to the syntax of KMP_PLACE_THREADS environment variable. | Jonathan Peyton | 2015-10-08 | 1 | -2/+4
  Added (optional) sockets to the syntax of the KMP_PLACE_THREADS environment variable. Some limitations:
  * The number of sockets and then the optional offset should be specified first (before other parameters).
  * The letter designation is mandatory for sockets and then for other parameters.
  * If the number of cores is specified first, then the number of sockets is defaulted to all sockets on the machine; also, the old syntax is partially supported if sockets are skipped.
  * If the number of threads per core is specified first, then the number of sockets and cores per socket are defaulted to all sockets and all cores per socket respectively.
  * The number of cores per socket cannot be specified before sockets or after threads per core.
  * The number of threads per core can be specified before or after core-offset (the old syntax required it to be before core-offset).
  * The parameter delimiter can be: empty, comma, lower-case x.
  * Spaces are allowed around numbers, around letters, around the delimiter.
  Approximate shorthand specification:
  KMP_PLACE_THREADS="[num_sockets(S|s)[[delim]offset(O|o)][delim]][num_cores_per_socket(C|c)[[delim]offset(O|o)][delim]][num_threads_per_core(T|t)]"
  Differential Revision: http://reviews.llvm.org/D13175
  llvm-svn: 249708
* [OMPT] Simplify control variable logic for OMPT | Jonathan Peyton | 2015-09-21 | 1 | -19/+17
  Prior to this change, OMPT had a status flag ompt_status, which could take several values. This was due to an earlier OMPT design that had several levels of enablement (ready, disabled, tracking state, tracking callbacks). The current OMPT design has OMPT support either on or off. This revision replaces ompt_status with a boolean flag ompt_enabled, which simplifies the runtime logic for OMPT.
  Patch by John Mellor-Crummey
  Differential Revision: http://reviews.llvm.org/D12999
  llvm-svn: 248189
* Remove fork_context argument from __kmp_join_call() when OMPT is off | Jonathan Peyton | 2015-08-31 | 1 | -2/+10
  Conditionally include the fork_context parameter to __kmp_join_call() only if OMPT_SUPPORT=1
  Differential Revision: http://reviews.llvm.org/D12495
  llvm-svn: 246460
* Tidy statistics collection | Jonathan Peyton | 2015-08-11 | 1 | -5/+28
  This removes some statistics counters and timers which were not used, adds new counters and timers for some language features that were not monitored previously, and separates the counters and timers into those which are of interest for investigating user code and those which are only of interest to the developer of the runtime itself. The runtime developer statistics are now only collected if the additional #define KMP_DEVELOPER_STATS is set.
  Additional user statistics which are now collected include:
  * Count of nested parallelism (omp parallel inside a parallel region)
  * Count of omp distribute occurrences
  * Count of omp teams occurrences
  * Counts of task related statistics (taskyield, task execution, task cancellation, task steal)
  * Values passed to omp_set_num_threads
  * Time spent in omp single and omp master
  None of this affects code compiled without stats gathering enabled, which is the normal library build mode.
  This also fixes the CMake build by linking to the standard C++ library when building the stats library, as it is a requirement. The normal library does not have this requirement and its link phase is left alone.
  Differential Revision: http://reviews.llvm.org/D11759
  llvm-svn: 244677
* D11156: Fix comments by eliminating possible trademark conflicts | Andrey Churbanov | 2015-08-05 | 1 | -4/+4
  llvm-svn: 244034
* Fix OMPT support for task frames, parallel regions, and parallel regions + loops | Jonathan Peyton | 2015-07-21 | 1 | -5/+24
  This patch makes it possible for a performance tool that uses call stack unwinding to map implementation-level call stacks from master and worker threads into a unified global view. There are several components to this patch.
  include/*/ompt.h.var: Add a new enumeration type that indicates whether the code for a master task for a parallel region is invoked by the user program or the runtime system. Change the signature for OMPT parallel begin/end callbacks to indicate whether the master task will be invoked by the program or the runtime system. This enables a performance tool using call stack unwinding to handle these two cases differently. For this case, a profiler that uses call stack unwinding needs to know that the call path prefix for the master task may differ from those available within the begin/end callbacks if the program invokes the master.
  kmp.h: Change the signature for __kmp_join_call to take an additional parameter indicating the fork_context type. This is needed to supply the OMPT parallel end callback with information about whether the compiler or the runtime invoked the master task for a parallel region.
  kmp_csupport.c: Ensure that the OMPT task frame field reenter_runtime_frame is properly set and cleared before and after calls to fork and join threads for a parallel region. Adjust the code for the new signature for __kmp_join_call. Adjust the OMPT parallel begin callback invocations to carry the extra parameter indicating whether the program or the runtime invokes the master task for a parallel region.
  kmp_gsupport.c: Apply all of the analogous changes described for kmp_csupport.c for the GOMP interface. Add OMPT support for the GOMP combined parallel region + loop API to maintain the OMPT task frame field reenter_runtime_frame.
  kmp_runtime.c: Use the new information passed by __kmp_join_call to adjust the OMPT parallel end callback invocations to carry the extra parameter indicating whether the program or the runtime invokes the master task for a parallel region.
  ompt_internal.h: Use the flavor of the parallel region API (GNU or Intel) to determine who invokes the master task.
  Differential Revision: http://reviews.llvm.org/D11259
  llvm-svn: 242817
* Fix some bugs in OMPT support | Jonathan Peyton | 2015-07-13 | 1 | -2/+3
  1.) in kmp_csupport.c, move computation of parameters only needed for OMPT tracing inside a conditional to reduce overhead if not receiving ompt_event_master_begin callbacks.
  2.) in kmp_gsupport.c, remove spurious reset of OMPT reenter_runtime_frame (which is set in its caller, GOMP_parallel_start); correct placement of #if OMP_TRACE so that state is maintained even if tracing support is not included.
  3.) in z_Linux_util.c, add architecture independent support for OMPT by setting and resetting OMPT's exit_frame_ptr before and after invoking a microtask.
  4.) On the Intel MIC, the loader refuses to retain static symbols in the libomp.so shared library, even though tools need them. The loader could not be bullied into doing so. To accommodate this, I changed the visibility of OMPT placeholder functions to public. This required additions in exports.so.txt and adding extern "C" scoping in ompt-general.c so that the public placeholder symbols won't be mangled.
  Patch by John Mellor-Crummey
  Differential Revision: http://reviews.llvm.org/D11062
  llvm-svn: 242052
* Remove unused variable warnings by deletion. | Jonathan Peyton | 2015-06-08 | 1 | -1/+0
  As an ongoing effort to sanitize the openmp code, these changes delete variables that aren't used at all.
  http://lists.cs.uiuc.edu/pipermail/openmp-dev/2015-June/000701.html
  Patch by Jack Howarth
  llvm-svn: 239334
* Remove unused variable warnings by fooling compiler. | Jonathan Peyton | 2015-06-08 | 1 | -1/+2
  Some variables are convenient to keep around even if they aren't really used in a release build. This is often seen in DEBUG guarded code where the variable is only used in a DEBUG build.
  Patch by Jack Howarth
  llvm-svn: 239326
* Suppress uninitialized-variable-is-used warning in kmp_csupport.c | Jonathan Peyton | 2015-06-03 | 1 | -2/+2
  The following change is needed to suppress the "variable 'retval' is used uninitialized whenever 'if' condition is false" warnings in runtime/src/kmp_csupport.c. This change just initializes 'retval' to 0.
  http://lists.cs.uiuc.edu/pipermail/openmp-dev/2015-June/000667.html
  Patch by Jack Howarth
  llvm-svn: 238954
* Fix doxygen comments | Jonathan Peyton | 2015-05-22 | 1 | -2/+3
  These fixes make doxygen happy.
  llvm-svn: 238061
* D9306 omp 4.1 async offload support (partial): code changes | Andrey Churbanov | 2015-05-07 | 1 | -0/+8
  llvm-svn: 236753
* D9302.partial2: cleanup of ittnotify checks, that eliminates redundant notifications in case of nested regions. | Andrey Churbanov | 2015-05-06 | 1 | -17/+20
  llvm-svn: 236631
* These are the actual changes in the runtime to issue OMPT-related functions. All of them are surrounded by #if OMPT_SUPPORT and can be disabled (which is the default). | Andrey Churbanov | 2015-04-29 | 1 | -3/+172
  llvm-svn: 236122
* Replace some unsafe API calls with safe alternatives on Windows, prepare code for similar actions on other platforms - wrap unsafe API calls into macros. | Andrey Churbanov | 2015-04-02 | 1 | -2/+2
  llvm-svn: 233915
* Removed unused varargs from __kmpc_flush function. | Andrey Churbanov | 2015-02-20 | 1 | -6/+2
  llvm-svn: 230032
* Added new user-guided lock api, currently disabled. Use KMP_USE_DYNAMIC_LOCK=1 to enable it. | Andrey Churbanov | 2015-02-20 | 1 | -3/+483
  llvm-svn: 230030
* The usage of tt_state flag is replaced by an array of two task_team pointers. | Andrey Churbanov | 2015-02-10 | 1 | -11/+5
  llvm-svn: 228718
* enable environment variable KMP_PLACE_THREADS also for non-MIC architectures | Andrey Churbanov | 2015-01-29 | 1 | -2/+0
  llvm-svn: 227467
* Comments only: removing the Revision and Date svn variables from the top of all the source files. | Andrey Churbanov | 2015-01-27 | 1 | -2/+0
  llvm-svn: 227207
* aarch64 port sent by C. Bergstrom | Andrey Churbanov | 2015-01-13 | 1 | -20/+20
  llvm-svn: 225792
* I apologise in advance for the size of this check-in. At Intel we do | Jim Cownie | 2014-10-07 | 1 | -381/+173
  understand that this is not friendly, and are working to change our internal code-development to make it easier to make development features available more frequently and in finer (more functional) chunks. Unfortunately we haven't got that in place yet, and unpicking this into multiple separate check-ins would be non-trivial, so please bear with me on this one. We should be better in the future.

  Apologies over, what do we have here?

  GCC 4.9 compatibility
  ---------------------
  * We have implemented the new entrypoints used by code compiled by GCC 4.9 to implement the same functionality in gcc 4.8. Therefore code compiled with gcc 4.9 that used to work will continue to do so. However, there are some other new entrypoints (associated with task cancellation) which are not implemented. Therefore user code compiled by gcc 4.9 that uses these new features will not link against the LLVM runtime. (It remains unclear how to handle those entrypoints, since the GCC interface has potentially unpleasant performance implications for join barriers even when cancellation is not used.)

  --- new parallel entry points ---
  New entry points that aren't OpenMP 4.0 related. These are implemented fully:
    GOMP_parallel_loop_dynamic()
    GOMP_parallel_loop_guided()
    GOMP_parallel_loop_runtime()
    GOMP_parallel_loop_static()
    GOMP_parallel_sections()
    GOMP_parallel()

  --- cancellation entry points ---
  Currently, these only give a runtime error if OMP_CANCELLATION is true, because our plain barriers don't check for cancellation while waiting:
    GOMP_barrier_cancel()
    GOMP_cancel()
    GOMP_cancellation_point()
    GOMP_loop_end_cancel()
    GOMP_sections_end_cancel()

  --- taskgroup entry points ---
  These are implemented fully:
    GOMP_taskgroup_start()
    GOMP_taskgroup_end()

  --- target entry points ---
  These are empty (as they are in libgomp):
    GOMP_target()
    GOMP_target_data()
    GOMP_target_end_data()
    GOMP_target_update()
    GOMP_teams()

  Improvements in Barriers and Fork/Join
  --------------------------------------
  * Barrier and fork/join code is now in its own file (which makes it easier to understand and modify).
  * Wait/release code is now templated and in its own file; suspend/resume code is also templated.
  * There's a new, hierarchical, barrier, which exploits the cache-hierarchy of the Intel(r) Xeon Phi(tm) coprocessor to improve fork/join and barrier performance.

  ***BEWARE*** the new source files have *not* been added to the legacy CMake build system. If you want to use that, fixes will be required.

  Statistics Collection Code
  --------------------------
  * New code has been added to collect application statistics (if this is enabled at library compile time; by default it is not). The statistics code itself is generally useful. The lightweight timing code uses the X86 rdtsc instruction, so it will require changes for other architectures. The intent of this code is not for users to tune their codes but rather
    1) For timing code-paths inside the runtime
    2) For gathering general properties of OpenMP codes to focus attention on which OpenMP features are most used.

  Nested Hot Teams
  ----------------
  * The runtime now maintains more state to reduce the overhead of creating and destroying inner parallel teams. This improves the performance of code that repeatedly uses nested parallelism with the same resource allocation. Set the new KMP_HOT_TEAMS_MAX_LEVEL envirable to a depth to enable this (and, of course, OMP_NESTED=true to enable nested parallelism at all).

  Improved Intel(r) VTune(tm) Amplifier support
  ---------------------------------------------
  * The runtime provides additional information to VTune via the itt_notify interface to allow it to display better OpenMP specific analyses of load-imbalance.

  Support for OpenMP Composite Statements
  ---------------------------------------
  * Implement new entrypoints required by some of the OpenMP 4.1 composite statements.

  Improved ifdefs
  ---------------
  * More separation of concepts ("Does this platform do X?") from platforms ("Are we compiling for platform Y?"), which should simplify future porting.

  ScaleMP* contribution
  ---------------------
  Stack padding to improve the performance in their environment where cross-node coherency is managed at the page level.

  Redesign of wait and release code
  ---------------------------------
  The code is simplified and performance improved.

  Bug Fixes
  ---------
  * Fixes for Windows multiple processor groups.
  * Fix Fortran module build on Linux: offload attribute added.
  * Fix entry names for distribute-parallel-loop construct to be consistent with the compiler codegen.
  * Fix an inconsistent error message for KMP_PLACE_THREADS environment variable.

  llvm-svn: 219214