path: root/openmp/runtime/src
Commit message | Author | Age | Files | Lines
...
* [STATS] Add stats gathering for taskloop construct | Jonathan Peyton | 2016-06-13 | 2 | -0/+5
    llvm-svn: 272560
* Fix spelling in comment | Jonathan Peyton | 2016-06-09 | 1 | -1/+1
    llvm-svn: 272291
* Refactor __kmp_execute_tasks_template function | Jonathan Peyton | 2016-06-09 | 1 | -228/+105
    Refactored __kmp_execute_tasks_template to shorten it and remove code redundancy. The original code was highly redundant, with large sections of repeated code that had to be kept consistent, and goto statements that made the control flow difficult to follow. This refactoring removes all gotos and redundancy.
    Patch by Terry Wilmarth
    Differential Revision: http://reviews.llvm.org/D20879
    llvm-svn: 272286
* kmp_lock.h: Fix VS2013 build after r271324 | Hans Wennborg | 2016-06-09 | 1 | -0/+16
    MSVC doesn't allow std::atomic<>s in a union since they don't have a trivial copy constructor. Replacing them with e.g. std::atomic_int works, but that breaks the GCC build on Linux, because calls to e.g. std::atomic_load_explicit then fail, as they expect a real std::atomic<> pointer. Fixing this with an #ifdef to unbreak the build for now.
    llvm-svn: 272271
* Fine tuning of TC* macros - small followup | Paul Osmialowski | 2016-06-01 | 1 | -1/+1
    As I replaced the no-op TCR_4 with actual code, the compiler complained while building the debug build. This patch moves the 'cast to int' to the correct place.
    Extension to Differential Revision: http://reviews.llvm.org/D19880
    llvm-svn: 271377
* Use C++11 atomics for ticket locks implementation | Paul Osmialowski | 2016-05-31 | 6 | -82/+139
    This patch replaces the use of compiler builtin atomics with C++11 atomics in the ticket lock implementation. Ticket locks are used in critical places of the runtime, e.g. in the tasking mechanism. The main reason for this change is a problem with the work-stealing function on the ARM architecture, which suffered from a nasty race condition. It turned out that the root cause lies in the way ticket locks are implemented. Changing the compiler builtins into C++11 atomics solves the problem.
    Two assertions were added to kmp_tasking.c that are useful for detecting early symptoms of something going wrong with work stealing, which were among the possible outcomes of the race condition.
    Differential Revision: http://reviews.llvm.org/D19878
    llvm-svn: 271324
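For readers unfamiliar with ticket locks, the following is a minimal sketch of one built on C++11 atomics. It is an illustration only, not the runtime's implementation: the class name TicketLock, its members, and the helper contended_count are hypothetical, and the memory orders shown are one reasonable choice rather than the ones the patch uses.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical illustration type; not the runtime's ticket lock structure.
class TicketLock {
public:
    void lock() {
        // Take the next ticket. Relaxed is enough here because ordering is
        // established by the acquire load on now_serving_ below.
        const unsigned my_ticket =
            next_ticket_.fetch_add(1, std::memory_order_relaxed);
        // Spin until our ticket is the one being served.
        while (now_serving_.load(std::memory_order_acquire) != my_ticket) {
            std::this_thread::yield();
        }
    }

    void unlock() {
        // Only the lock holder writes now_serving_, so reading it relaxed
        // is safe; the release store publishes the critical section.
        const unsigned next =
            now_serving_.load(std::memory_order_relaxed) + 1;
        now_serving_.store(next, std::memory_order_release);
    }

private:
    std::atomic<unsigned> next_ticket_{0};
    std::atomic<unsigned> now_serving_{0};
};

// Runs `threads` workers that each increment a shared counter `iters`
// times under the lock, and returns the final counter value.
long contended_count(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    TicketLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto &th : pool) th.join();
    return counter;
}
```

If the acquire/release pairing is correct, contended_count(4, 10000) returns 40000, since no increment can be lost while the lock is held.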
* Addition of OpenMP 4.5 feature: schedule(simd:static) | Jonathan Peyton | 2016-05-31 | 2 | -1/+28
    This patch implements the new kmp_sch_static_balanced_chunked schedule kind that the compiler will generate when it encounters schedule(simd: static). It just adds the new constant and the new switch case in __kmp_for_static_init.
    Patch by Alex Duran.
    Differential Revision: http://reviews.llvm.org/D20699
    llvm-svn: 271320
* Avoid deadlock with COI | Jonathan Peyton | 2016-05-31 | 4 | -28/+82
    When an asynchronous offload task is completed, COI calls the runtime to queue a "destructor task". When the task deques are full, a deadlock arises where the OpenMP threads are inside but cannot progress because the COI thread is stuck inside the runtime trying to find a slot in a deque. This patch implements the solution where the task deques are doubled in size when a task is being queued from a COI thread.
    Differential Revision: http://reviews.llvm.org/D20733
    llvm-svn: 271319
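The idea of doubling a full deque instead of waiting for a free slot can be sketched with a small ring buffer. This is an assumption-laden illustration: GrowableDeque, push_back, pop_front, and the may_grow flag (standing in for "the caller is a COI thread") are hypothetical names, tasks are plain ints, and the locking a real task deque needs is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ring-buffer deque; not the runtime's task deque type.
// Synchronization is deliberately left out of this sketch.
class GrowableDeque {
public:
    explicit GrowableDeque(std::size_t capacity) : buf_(capacity) {
        assert(capacity > 0);
    }

    // Appends a task. When the deque is full, either doubles the storage
    // (may_grow == true) or reports failure (may_grow == false).
    bool push_back(int task, bool may_grow) {
        if (count_ == buf_.size()) {
            if (!may_grow) return false;
            grow();
        }
        buf_[(head_ + count_) % buf_.size()] = task;
        ++count_;
        return true;
    }

    // Removes the oldest task into `task`; returns false when empty.
    bool pop_front(int &task) {
        if (count_ == 0) return false;
        task = buf_[head_];
        head_ = (head_ + 1) % buf_.size();
        --count_;
        return true;
    }

    std::size_t capacity() const { return buf_.size(); }

private:
    // Copies the live entries, in order, into storage twice as large.
    void grow() {
        std::vector<int> bigger(buf_.size() * 2);
        for (std::size_t i = 0; i < count_; ++i) {
            bigger[i] = buf_[(head_ + i) % buf_.size()];
        }
        buf_.swap(bigger);
        head_ = 0;
    }

    std::vector<int> buf_;
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};
```

A caller that must never block, such as the COI thread in the commit message, would pass may_grow as true, while ordinary callers keep the bounded behavior.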
* Offer API for setting number of loop dispatch buffers | Jonathan Peyton | 2016-05-31 | 14 | -14/+79
    The problem is the lack of dispatch buffers when thousands of loops with nowait, about 10 iterations each, are executed by hundreds of threads. We only have 7 built-in dispatch buffers, but there is a need for dozens or hundreds of buffers. The problem can be fixed by setting KMP_MAX_DISP_BUF to a bigger value. In order to give users the same possibility, I changed the build-time control into a run-time one, adding an API just in case.

    This change adds an environment variable KMP_DISP_NUM_BUFFERS and a new API function kmp_set_disp_num_buffers(int num_buffers). The KMP_DISP_NUM_BUFFERS environment variable works only before serial initialization, because during serial initialization we already allocate buffers for the hot team, so it is too late to change the number of buffers later (or we would need to reallocate buffers for all teams, which sounds too complicated). The kmp_set_defaults() routine does not work for this environment variable, because it calls serial initialization before reading the parameter string. So a new routine, kmp_set_disp_num_buffers(), is created so that it can set our internal global variable before the library initialization. If both the environment variable and the API are used, the environment variable wins.
    Differential Revision: http://reviews.llvm.org/D20697
    llvm-svn: 271318
* Fix storing the frame pointer for OMP-T during ppc64 microtask dispatch | Hal Finkel | 2016-05-27 | 1 | -0/+1
    Thanks to John Mellor-Crummey for reporting the omission.
    llvm-svn: 271035
* Add missing OpenMP 4.5 device entries to stubs library. | Jonathan Peyton | 2016-05-27 | 3 | -10/+119
    llvm-svn: 271006
* Fix for OMP_PROC_BIND=spread strategy | Jonathan Peyton | 2016-05-26 | 1 | -4/+14
    The OMP_PROC_BIND=spread strategy fails to assign the master thread the correct place partition after the first parallel region. Other threads in the hot team will remember their place_partition, but the master's place partition is restored to what it was before entering the parallel region. So when the hot team is used for subsequent parallel regions, the master has lost this info. This fix calls __kmp_partition_places to update only the master thread's place partition in the spread case when there are no other changes to the hot team.
    Patch by Terry Wilmarth
    Differential Revision: http://reviews.llvm.org/D20539
    llvm-svn: 270890
* Make LIBOMP_USE_ITT_NOTIFY a setting that can be enabled or disabled | Jonathan Peyton | 2016-05-26 | 3 | -3/+5
    On Blue Gene/Q, having LIBOMP_USE_ITT_NOTIFY support compiled into a statically-linked binary causes a failure at runtime because dlopen fails. This patch changes LIBOMP_USE_ITT_NOTIFY to a cacheable configuration setting that can be disabled.
    Patch by John Mellor-Crummey
    Differential Revision: http://reviews.llvm.org/D20517
    llvm-svn: 270884
* Add an assembly __kmp_invoke_microtask for ppc64[le] | Hal Finkel | 2016-05-26 | 2 | -1/+221
    Clang no longer restricts itself to generating microtasks with a small number of arguments, and so an assembly implementation is required to prevent hitting the parameter limit present in the C implementation. This adds an implementation for ppc64[le].
    llvm-svn: 270821
* D20525: Use more general function for getting gtid which may be faster than specific one. | Andrey Churbanov | 2016-05-25 | 1 | -1/+1
    llvm-svn: 270694
* Fork performance improvements | Jonathan Peyton | 2016-05-23 | 2 | -31/+47
    Most of these changes check for differences before updating data fields in the team struct. There is also some rearrangement of the team struct.
    Patch by Diego Caballero
    Differential Revision: http://reviews.llvm.org/D20487
    llvm-svn: 270468
* Allow unit testing on Windows | Jonathan Peyton | 2016-05-23 | 1 | -1/+5
    These changes allow testing on Windows using clang.exe. There are two main changes:
    1. Only link to -lm when it actually exists on the system
    2. Create basic versions of pthread_create() and pthread_join() for Windows. They are not POSIX compliant by any stretch but will allow any existing and future tests to use pthread_create() and pthread_join() for testing interactions of libomp with OS threads.
    Differential Revision: http://reviews.llvm.org/D20391
    llvm-svn: 270464
* Changed parameter names in Fortran modules to correspond with OpenMP 4.5 specification | Jonathan Peyton | 2016-05-23 | 2 | -76/+76
    llvm-svn: 270447
* Remove trailing whitespace in src/ directory | Jonathan Peyton | 2016-05-20 | 31 | -163/+163
    This patch doesn't affect D19878's context. So D19878 still cleanly applies.
    llvm-svn: 270252
* Clean all the mess around KMP_USE_FUTEX and kmp_lock.h | Paul Osmialowski | 2016-05-16 | 5 | -13/+16
    The KMP_USE_FUTEX preprocessor definition defined in kmp_lock.h is used inconsistently throughout the LLVM libomp code.
    * some .c files that use this define do not include kmp_lock.h, so the guarded parts of the code are never compiled
    * some places in the code use architecture-dependent preprocessor logic expressions which effectively disable use of futex for the AArch64 architecture; all these places should use '#if KMP_USE_FUTEX' instead to avoid any further confusion
    * some places use KMP_HAS_FUTEX, which is defined nowhere; KMP_USE_FUTEX should be used instead
    Differential Revision: http://reviews.llvm.org/D19629
    llvm-svn: 269642
* NFC fix indent (relates to my previous commit) | Paul Osmialowski | 2016-05-13 | 1 | -3/+3
    llvm-svn: 269443
* Solve 'Too many args to microtask' problem | Paul Osmialowski | 2016-05-13 | 2 | -3/+144
    This patch solves the 'Too many args to microtask' problem which occurs while executing the lulesh2.0.3 benchmark on AArch64. To solve this I had to write an AArch64 assembly version of the __kmp_invoke_microtask() function, similar to the x86 and x86_64 implementations.
    Differential Revision: http://reviews.llvm.org/D19879
    llvm-svn: 269399
* Adding new kmp_aligned_malloc() entry point | Jonathan Peyton | 2016-05-12 | 18 | -6/+165
    This change adds a new entry point, kmp_aligned_malloc(size_t size, size_t alignment), an entry point corresponding to kmp_malloc() but with the capability to return aligned memory as well. Other allocator routines have been adjusted so that kmp_free() can be used for freeing memory blocks allocated by any kmp_*alloc() routine, including the new kmp_aligned_malloc() routine.
    Differential Revision: http://reviews.llvm.org/D19814
    llvm-svn: 269365
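One common way to let a single free routine release both aligned and unaligned blocks is to store the address returned by the underlying allocator just before the pointer handed to the user. The sketch below shows that technique under stated assumptions: sketch_aligned_malloc, sketch_malloc, and sketch_free are hypothetical names built on std::malloc, and nothing here is claimed to match how the kmp_* routines are actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: returns `size` bytes aligned to `alignment`
// (a nonzero power of two), or nullptr on allocation failure.
void *sketch_aligned_malloc(std::size_t size, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // Keep the stored pointer itself properly aligned.
    if (alignment < alignof(void *)) alignment = alignof(void *);
    // Room for the payload, worst-case padding, and one stored pointer.
    void *raw = std::malloc(size + alignment + sizeof(void *));
    if (raw == nullptr) return nullptr;
    const std::uintptr_t start =
        reinterpret_cast<std::uintptr_t>(raw) + sizeof(void *);
    const std::uintptr_t aligned =
        (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
    void **user = reinterpret_cast<void **>(aligned);
    user[-1] = raw;  // remember what must be handed back to std::free
    return user;
}

// Hypothetical unaligned variant that shares the same block layout.
void *sketch_malloc(std::size_t size) {
    return sketch_aligned_malloc(size, alignof(std::max_align_t));
}

// Frees a block from either routine by reading the stored raw pointer.
void sketch_free(void *ptr) {
    if (ptr != nullptr) std::free(static_cast<void **>(ptr)[-1]);
}
```

Because both allocation paths write the same one-pointer header, sketch_free does not need to know which routine produced the block.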
* Fix team reuse with foreign threads | Jonathan Peyton | 2016-05-12 | 1 | -0/+2
    After hot teams were enabled by default, the library started using levels kept in the team structure. The levels are broken when a foreign thread exits and puts its team into the pool, which is then re-used by another foreign thread. The broken behavior observed is that when printing the levels for each new team, one gets 1, 2, 1, 2, 1, 2, etc. This makes the library believe that every other team is nested, which is incorrect. What is wanted is for the levels to be 1, 1, 1, etc.
    Differential Revision: http://reviews.llvm.org/D19980
    llvm-svn: 269363
* New hwloc API compatibility | Paul Osmialowski | 2016-05-12 | 1 | -0/+13
    Differential Revision: http://reviews.llvm.org/D19628
    llvm-svn: 269284
* Restore NULL flag check in __kmp_null_resume_wrapper | Hal Finkel | 2016-05-12 | 1 | -0/+2
    This reverts a presumably-unintentional change in:
    r268640 - [STATS] Use partitioned timer scheme
    and fixes segfaults in an x86_64 debug build of the runtime library.
    llvm-svn: 269259
* Fine tuning of TC* macros | Paul Osmialowski | 2016-05-07 | 3 | -3/+7
    This patch introduces the following:
    * TCI_* and TCD_* macros for incrementation and decrementation
    * Fix for invalid use of TCR_8 in one expression
    Differential Revision: http://reviews.llvm.org/D19880
    llvm-svn: 268826
* [STATS] Use partitioned timer scheme | Jonathan Peyton | 2016-05-05 | 11 | -68/+369
    This change replaces the current timers with ones that partition time properly. The current timers are nested, so that if a new timer, B, starts when the current timer, A, is already timing, A's time will include B's. To eliminate this problem, the partitioned timers are designed to stop the current timer (A), let the new timer run (B), and when the new timer is finished, restart the previously running timer (A). With this partitioning of time, a thread's timers all sum up to the OMP_worker_thread_life time and can now easily show the percentage of time a thread is spending in different parts of the runtime or user code.

    There is also a new state variable associated with each thread which tells where it is executing a task. This corresponds with the timers: OMP_task_*, e.g., if time is spent in OMP_task_taskwait, then that thread executed tasks inside a #pragma omp taskwait construct.

    The changes are mostly changing the MACROs to use the new PARTITIONED_* macros, the new partitionedTimers class and its methods, and new state logic.
    Differential Revision: http://reviews.llvm.org/D19229
    llvm-svn: 268640
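The stop/run/restart behavior described above can be sketched as a stack of timers where only the top one accumulates time. This is a hedged illustration, not the runtime's partitionedTimers class: the PartitionedTimers name, its push/pop/total methods, and the use of caller-supplied tick values (chosen to keep the example deterministic) are all assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch of partitioned timing; timestamps are passed in
// explicitly instead of being read from a clock.
class PartitionedTimers {
public:
    // Starts timer `name` at `now`, pausing whichever timer was running.
    void push(const std::string &name, std::uint64_t now) {
        if (!stack_.empty()) charge_top(now);
        stack_.push_back(name);
        last_start_ = now;
    }

    // Stops the running timer at `now` and resumes the one beneath it.
    void pop(std::uint64_t now) {
        assert(!stack_.empty());
        charge_top(now);
        stack_.pop_back();
        last_start_ = now;
    }

    // Accumulated time for `name`, or 0 if it never ran.
    std::uint64_t total(const std::string &name) const {
        const auto it = totals_.find(name);
        return it == totals_.end() ? 0 : it->second;
    }

private:
    // Credits the interval since the last switch to the top-of-stack timer.
    void charge_top(std::uint64_t now) {
        totals_[stack_.back()] += now - last_start_;
    }

    std::vector<std::string> stack_;
    std::map<std::string, std::uint64_t> totals_;
    std::uint64_t last_start_ = 0;
};
```

With A started at tick 0, B pushed at 10 and popped at 25, and A popped at 30, A is credited 15 ticks and B 15 ticks. The two totals sum to the 30-tick lifetime, whereas nested timers would have charged A the full 30.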
* NFC remove unneeded spaces (test commit) | Paul Osmialowski | 2016-05-03 | 1 | -1/+1
    llvm-svn: 268462
* Remove architecture dependent Hwloc DEBUG section | Jonathan Peyton | 2016-04-25 | 1 | -30/+0
    This debug section's functionality can be replicated using the environment variable KMP_TOPOLOGY_METHOD with different values and KMP_AFFINITY=verbose
    llvm-svn: 267472
* Fix buffer problem with printing long Hwloc affinity mask | Jonathan Peyton | 2016-04-25 | 1 | -1/+1
    This change has the hwloc_bitmap_list_snprintf() function use the entire buffer to print the mask. There is no need to shorten the buffer length by 7. It only needs to be shortened by one byte.
    llvm-svn: 267470
* [ITTNOTIFY] Remove serialized parallel regions from frame notification | Jonathan Peyton | 2016-04-19 | 5 | -65/+10
    llvm-svn: 266760
* Fix trip count calculation for parallel loops in runtime | Jonathan Peyton | 2016-04-18 | 2 | -26/+42
    The trip count calculation was incorrect for loops with large bounds. For example, for(int i=-2,000,000,000; i < 2,000,000,000; i+=50000000), the trip count calculation had overflow (trying to calculate 2,000,000,000 + 2,000,000,000 with signed integers) and wasn't giving the right value. This patch fixes this error in the runtime by using unsigned integers instead. There is still a bug in the clang compiler component because it warns that there is overflow in the test case file when there isn't. This error isn't there for the Intel Compiler. So for now, the test case is designated as XFAIL.
    Differential Revision: http://reviews.llvm.org/D19078
    llvm-svn: 266677
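To see why unsigned arithmetic avoids the overflow, here is a small sketch of a trip count computed for a loop of the form for (i = lb; i < ub; i += st) with a positive step. The function trip_count is a hypothetical helper written for this note, not the runtime's code, and it assumes 32-bit loop variables as in the example from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of iterations of
//   for (std::int32_t i = lb; i < ub; i += st)   with st > 0.
std::uint32_t trip_count(std::int32_t lb, std::int32_t ub, std::int32_t st) {
    assert(st > 0);
    if (lb >= ub) return 0;
    // Subtracting as signed values could overflow (2e9 - (-2e9) does not
    // fit in 32 bits). Converting first and subtracting as unsigned wraps
    // modulo 2^32, which yields the true distance because lb < ub.
    const std::uint32_t dist =
        static_cast<std::uint32_t>(ub) - static_cast<std::uint32_t>(lb);
    // Ceiling division written so that it cannot overflow either.
    return (dist - 1) / static_cast<std::uint32_t>(st) + 1;
}
```

For the loop quoted above, the distance is 4,000,000,000, which fits in an unsigned 32-bit value, and dividing by the step of 50,000,000 gives 80 iterations.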
* Runtime support for untied tasks | Jonathan Peyton | 2016-04-18 | 2 | -2/+37
    Introduced a counter of the parts of an untied task submitted for execution. The counter is used to determine whether all parts of the task have finished. The compiler should itself generate the re-submission of a partially executed untied task before exiting each task part, except for the lexically last part.
    Differential Revision: http://reviews.llvm.org/D19026
    llvm-svn: 266675
* Fix for pthread_setspecific (TLS and shutdown) problem | Jonathan Peyton | 2016-04-18 | 3 | -14/+20
    Some codes that use TLS fail intermittently because one thread tries to write TLS values after the TLS key has been destroyed by another thread. This happens when one thread executes library shutdown (and destroys TLS keys), while another thread starts to execute the TLS key destructor routine. Before this change, the kmp_init_runtime flag was checked before calling pthread_* TLS functions, but this flag is set to FALSE later than the destruction of the TLS keys, which leads to failure. The fix is to check kmp_init_gtid instead, as this flag is unset *before* the destruction of TLS keys.
    Differential Revision: http://reviews.llvm.org/D19022
    llvm-svn: 266674
* [STATS] Remove timePair class and unused functions | Jonathan Peyton | 2016-04-18 | 2 | -47/+0
    llvm-svn: 266634
* [STATS] print Total_* stats on their own line | Jonathan Peyton | 2016-04-18 | 1 | -1/+4
    llvm-svn: 266633
* [ITTNOTIFY] Correct barrier imbalance time in case of tasks | Jonathan Peyton | 2016-04-14 | 2 | -0/+23
    ittnotify fix for barrier imbalance time in case tasks exist. In the current implementation, task execution time is included in the aggregated time on a barrier. This fix calculates the task execution time and corrects the arrive time by subtracting the task execution time.

    Since __kmp_invoke_task() can be called in places other than a barrier, the field th.th_bar_arrive_time is used to check if the function was called at the barrier (th.th_bar_arrive_time != 0). So for this check, th_bar_arrive_time is set to zero right after the value is used on the barrier.
    Differential Revision: http://reviews.llvm.org/D19030
    llvm-svn: 266332
* Exponential back off logic for test-and-set lock | Jonathan Peyton | 2016-04-14 | 5 | -3/+164
    This change adds back off logic in the test and set lock for better contended lock performance. It uses a simple truncated binary exponential back off function. The default back off parameters are tuned for x86.

    The main back off logic has a two loop structure where each is controlled by a user-level parameter:
    max_backoff - limits the outer loop number of iterations. This parameter should be a power of 2.
    min_ticks - the inner spin wait loop number of "ticks" which is system dependent and should be tuned for your system if you so choose. The "ticks" on x86 correspond to the time stamp counter, but on other architectures ticks is a timestamp derived from gettimeofday().

    The user can modify these via the environment variable:
    KMP_SPIN_BACKOFF_PARAMS=max_backoff[,min_ticks]
    Currently, since the default user lock is a queuing lock, one would have to also specify KMP_LOCK_KIND=tas to use the test-and-set locks.
    Differential Revision: http://reviews.llvm.org/D19020
    llvm-svn: 266329
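The two-loop structure described above can be sketched as a test-and-set lock that waits a doubling number of fixed-length intervals between attempts. Everything named here is an assumption made for illustration: BackoffParams, BackoffTasLock, and backoff_contended_count are hypothetical, the default values are placeholders rather than the runtime's tuned defaults, and a "tick" is approximated with std::chrono::steady_clock nanoseconds instead of the time stamp counter.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical parameters mirroring the two knobs named in the message.
struct BackoffParams {
    std::uint32_t max_backoff = 1024;  // cap on the step count (power of 2)
    std::uint32_t min_ticks = 100;     // one wait interval, in nanoseconds here
};

class BackoffTasLock {
public:
    explicit BackoffTasLock(BackoffParams p = {}) : params_(p) {
        assert(params_.max_backoff > 0);
    }

    void lock() {
        std::uint32_t step = 1;
        // Outer loop: retry the test-and-set until it succeeds.
        while (flag_.exchange(true, std::memory_order_acquire)) {
            // Inner loop: wait `step` intervals of min_ticks each.
            for (std::uint32_t i = 0; i < step; ++i) {
                wait_ticks(params_.min_ticks);
            }
            // Binary exponential growth, truncated at max_backoff.
            if (step < params_.max_backoff) step *= 2;
        }
    }

    void unlock() { flag_.store(false, std::memory_order_release); }

private:
    // Waits roughly `ticks` nanoseconds, yielding so that an
    // oversubscribed machine can still run the lock holder.
    static void wait_ticks(std::uint32_t ticks) {
        const auto end = std::chrono::steady_clock::now() +
                         std::chrono::nanoseconds(ticks);
        while (std::chrono::steady_clock::now() < end) {
            std::this_thread::yield();
        }
    }

    BackoffParams params_;
    std::atomic<bool> flag_{false};
};

// Increments a shared counter from several threads under the lock.
long backoff_contended_count(int threads, int iters) {
    BackoffTasLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto &th : pool) th.join();
    return counter;
}
```

The truncation keeps the worst-case wait bounded at max_backoff intervals, so a thread that has lost many races in a row still retries at a predictable rate.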
* Add declarations of OpenMP 4.5 target/offload routines to headers | Jonathan Peyton | 2016-04-12 | 3 | -0/+21
    All these routines are implemented in the offload library.
    llvm-svn: 266120
* [STATS] Remove trailing whitespace in stats source files | Jonathan Peyton | 2016-04-05 | 4 | -40/+40
    llvm-svn: 265437
* OMP_WAIT_POLICY changes | Jonathan Peyton | 2016-04-04 | 1 | -2/+15
    With this change, OMP_WAIT_POLICY=active means that threads will busy-wait in spin loops and virtually never go to sleep. OMP_WAIT_POLICY=passive now means that threads will immediately go to sleep inside a spin loop. KMP_BLOCKTIME was the previous mechanism to specify this behavior via KMP_BLOCKTIME=0 or KMP_BLOCKTIME=infinite, but the standard OpenMP environment variable should also be able to specify this behavior.
    Differential Revision: http://reviews.llvm.org/D18577
    llvm-svn: 265339
* Fix bug when KMP_USE_ADAPTIVE_LOCKS is 0 | Jonathan Peyton | 2016-03-30 | 1 | -1/+1
    The #endif was one line too low. If KMP_USE_ADAPTIVE_LOCKS is 0, then queuing locks would incorrectly use the drdpa lock mechanism. This is a fix for https://llvm.org/bugs/show_bug.cgi?id=26649
    llvm-svn: 264934
* Fix comment in kmp_wait_release.h | Jonathan Peyton | 2016-03-29 | 1 | -8/+6
    Removed reference to "ref ct" in a comment, as ref_ct no longer exists. Also moved the comment to where the task_team is about to be tested for NULL.
    llvm-svn: 264786
* Fix incorrect indentation in kmp_alloc.c | Jonathan Peyton | 2016-03-29 | 1 | -73/+61
    llvm-svn: 264777
* Remove dead KMP_USE_POOLED_ALLOC code | Jonathan Peyton | 2016-03-29 | 1 | -78/+6
    llvm-svn: 264776
* Fixing the non-x86 build by removing dependence on kmp_cpuid_t | Hal Finkel | 2016-03-27 | 3 | -3/+15
    The problem is that the definition of kmp_cpuinfo_t contains:
        char name [3*sizeof (kmp_cpuid_t)]; // CPUID(0x80000002,0x80000003,0x80000004)
    and kmp_cpuid_t is only defined when compiling for x86.
    Differential Revision: http://reviews.llvm.org/D18245
    llvm-svn: 264535
* [OMPT] Fix parallel_id and task_id in loop_end with schedule static | Jonas Hahnfeld | 2016-03-24 | 1 | -6/+3
    For serialized parallel regions, wrong ids were reported. Now the same code is used as in kmp_dispatch.cpp which emits the correct ids.
    Differential Revision: http://reviews.llvm.org/D18348
    llvm-svn: 264266
* [OMPT] Fix duplicate implicit_task_end events for master thread with GCC | Jonas Hahnfeld | 2016-03-24 | 1 | -10/+13
    For non-serialized parallel regions the master thread issued two callbacks: the first one in kmp_gsupport.c and the second in __kmp_join_call. Therefore only trigger the callback in kmp_gsupport.c for serialized parallel regions.
    Differential Revision: http://reviews.llvm.org/D16716
    llvm-svn: 264264
* Fix Visual Studio builds | Jonathan Peyton | 2016-03-23 | 2 | -1/+10
    Have Visual Studio use MemoryBarrier() instead of _mm_mfence() and remove __declspec align attribute from function parameters in kmp_atomic.h
    llvm-svn: 264166