path: root/openmp
Commit message | Author | Age | Files | Lines
...
* Fix team reuse with foreign threads | Jonathan Peyton | 2016-05-12 | 2 | -0/+84
  After hot teams were enabled by default, the library started using levels kept in the team structure. The levels are broken when a foreign thread exits and puts its team into the pool, which is then reused by another foreign thread. The broken behavior shows up when printing the levels for each new team: one gets 1, 2, 1, 2, 1, 2, etc. This makes the library believe that every other team is nested, which is incorrect. The levels should be 1, 1, 1, etc.
  Differential Revision: http://reviews.llvm.org/D19980
  llvm-svn: 269363
* New hwloc API compatibility | Paul Osmialowski | 2016-05-12 | 1 | -0/+13
  Differential Revision: http://reviews.llvm.org/D19628
  llvm-svn: 269284
* Restore NULL flag check in __kmp_null_resume_wrapper | Hal Finkel | 2016-05-12 | 1 | -0/+2
  This reverts a presumably unintentional change in r268640 ("[STATS] Use partitioned timer scheme") and fixes segfaults in an x86_64 debug build of the runtime library.
  llvm-svn: 269259
* Fine tuning of TC* macros | Paul Osmialowski | 2016-05-07 | 3 | -3/+7
  This patch introduces the following:
  * TCI_* and TCD_* macros for increment and decrement
  * Fix for invalid use of TCR_8 in one expression
  Differential Revision: http://reviews.llvm.org/D19880
  llvm-svn: 268826
* [STATS] Use partitioned timer scheme | Jonathan Peyton | 2016-05-05 | 11 | -68/+369
  This change replaces the current timers with ones that partition time properly. The current timers are nested, so that if a new timer, B, starts when the current timer, A, is already timing, A's time will include B's. To eliminate this problem, the partitioned timers are designed to stop the current timer (A), let the new timer (B) run, and when the new timer is finished, restart the previously running timer (A). With this partitioning of time, a thread's timers all sum up to the OMP_worker_thread_life time and can now easily show the percentage of time a thread is spending in different parts of the runtime or user code.
  There is also a new state variable associated with each thread which records where it is executing a task. This corresponds with the timers OMP_task_*: e.g., if time is spent in OMP_task_taskwait, then that thread executed tasks inside a #pragma omp taskwait construct.
  The changes mostly convert the macros to use the new PARTITIONED_* macros, add the new partitionedTimers class and its methods, and add the new state logic.
  Differential Revision: http://reviews.llvm.org/D19229
  llvm-svn: 268640
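  A minimal sketch of the partitioning idea in C, not the libomp partitionedTimers class: each thread keeps a stack of timer ids; starting a timer charges the elapsed ticks to the running one and pushes the new id, and stopping a timer charges it and resumes the one underneath. The timer ids, the fixed stack depth, and passing the clock value in explicitly (to keep the sketch deterministic) are all assumptions.

```c
#include <stdint.h>

/* Hypothetical timer ids; libomp defines its own OMP_* timers. */
enum timer_id { T_USER_CODE, T_BARRIER, T_TASKWAIT, T_NUM_TIMERS };

#define PT_MAX_DEPTH 16 /* no overflow checking in this sketch */

typedef struct {
    uint64_t total[T_NUM_TIMERS]; /* ticks charged to each timer */
    int stack[PT_MAX_DEPTH];      /* running timer on top, suspended ones below */
    int depth;                    /* number of timers on the stack */
    uint64_t started_at;          /* when the top timer last started or resumed */
} partitioned_timers;

/* Start the outermost timer for a thread. */
static void pt_init(partitioned_timers *pt, int id, uint64_t now) {
    for (int i = 0; i < T_NUM_TIMERS; ++i)
        pt->total[i] = 0;
    pt->stack[0] = id;
    pt->depth = 1;
    pt->started_at = now;
}

/* Suspend the running timer, charging it the elapsed ticks, and start id. */
static void pt_push(partitioned_timers *pt, int id, uint64_t now) {
    pt->total[pt->stack[pt->depth - 1]] += now - pt->started_at;
    pt->stack[pt->depth++] = id;
    pt->started_at = now;
}

/* Stop the running timer and resume the one it interrupted (if any). */
static void pt_pop(partitioned_timers *pt, uint64_t now) {
    pt->total[pt->stack[--pt->depth]] += now - pt->started_at;
    pt->started_at = now;
}
```

  Because every tick is charged to exactly one timer, the per-timer totals add up to the thread's lifetime, which is the property the commit relies on.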
* NFC remove unneeded spaces (test commit) | Paul Osmialowski | 2016-05-03 | 1 | -1/+1
  llvm-svn: 268462
* Remove architecture dependent Hwloc DEBUG section | Jonathan Peyton | 2016-04-25 | 1 | -30/+0
  This debug section's functionality can be replicated using the environment variable KMP_TOPOLOGY_METHOD with different values and KMP_AFFINITY=verbose.
  llvm-svn: 267472
* Fix buffer problem with printing long Hwloc affinity mask | Jonathan Peyton | 2016-04-25 | 1 | -1/+1
  This change has the hwloc_bitmap_list_snprintf() function use the entire buffer to print the mask. There is no need to shorten the buffer length by 7; it only needs to be shortened by one byte.
  llvm-svn: 267470
* ARM Limited license agreement from the copyright/patent holder | Jonathan Peyton | 2016-04-25 | 1 | -0/+50
  I have prepared some patches for the LLVM OpenMP runtime, mostly addressing ARMv8 support. Before I upstream them, I must address legal issues that arose around my planned contribution. I was advised that before I send any substantial commit, I need to make sure that the LICENSE.txt file in the project's repository contains a statement submitted by ARM, similar to the one provided by Intel (see "a license agreement from the copyright/patent holders"). This is the same situation as with the top-level LLVM project: ARM has provided the same statement in the http://llvm.org/svn/llvm-project/llvm/trunk/lib/Target/ARM/LICENSE.TXT file.
  Patch by Paul Osmialowski
  Differential Revision: http://reviews.llvm.org/D19319
  llvm-svn: 267446
* [ITTNOTIFY] Remove serialized parallel regions from frame notification | Jonathan Peyton | 2016-04-19 | 5 | -65/+10
  llvm-svn: 266760
* Fix trip count calculation for parallel loops in runtime | Jonathan Peyton | 2016-04-18 | 3 | -26/+109
  The trip count calculation was incorrect for loops with large bounds. For example, in the loop for(int i=-2000000000; i < 2000000000; i+=50000000), the trip count calculation overflowed (it tried to compute 2,000,000,000 + 2,000,000,000 with signed integers) and did not give the right value. This patch fixes the error in the runtime by using unsigned integers instead.
  There is still a bug in the clang compiler component: it warns that there is overflow in the test case file when there isn't. This error does not occur with the Intel Compiler, so for now the test case is designated as XFAIL.
  Differential Revision: http://reviews.llvm.org/D19078
  llvm-svn: 266677
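  A sketch of the unsigned-difference approach in C, assuming 32-bit signed bounds, an inclusive upper bound (the "<" loop above corresponds to ub = 1999999999), and a nonzero increment. The function name is hypothetical; libomp's loop-dispatch code is organized differently and handles more integer widths.

```c
#include <stdint.h>

/* Trip count of: for (i = lb; i <= ub; i += incr), with incr != 0.
 * The difference is taken in uint32_t, where wraparound is well defined, so any
 * distance up to 2^32 - 1 is exact; the signed expression ub - lb would overflow.
 * The result is widened to 64 bits because INT32_MIN..INT32_MAX has 2^32 trips. */
static uint64_t trip_count_i32(int32_t lb, int32_t ub, int32_t incr) {
    if (incr > 0) {
        if (lb > ub)
            return 0; /* loop body never runs */
        return (uint64_t)(((uint32_t)ub - (uint32_t)lb) / (uint32_t)incr) + 1;
    }
    if (lb < ub)
        return 0;
    /* 0u - (uint32_t)incr is |incr|, even for INT32_MIN */
    return (uint64_t)(((uint32_t)lb - (uint32_t)ub) / (0u - (uint32_t)incr)) + 1;
}
```

  For the example loop, the unsigned distance is 3999999999, and 3999999999 / 50000000 + 1 gives 80 iterations.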
* Runtime support for untied tasks | Jonathan Peyton | 2016-04-18 | 2 | -2/+37
  Introduced a counter of the parts of an untied task submitted for execution. The counter determines whether all parts of the task have finished. The compiler should itself generate re-submission of a partially executed untied task before exiting each task part, except for the lexically last part.
  Differential Revision: http://reviews.llvm.org/D19026
  llvm-svn: 266675
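  A minimal sketch of such a part counter using C11 atomics. The untied_task struct and function names are hypothetical, not libomp's task data structure. Because each part re-submits its continuation before it exits, the counter can only reach zero when the lexically last part finishes.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical task record; libomp's real task structure differs. */
typedef struct {
    atomic_int parts_outstanding; /* parts submitted but not yet finished */
    bool completed;               /* set once by whichever part finishes last */
} untied_task;

static void task_init(untied_task *t) {
    atomic_init(&t->parts_outstanding, 1); /* the first part is submitted on creation */
    t->completed = false;
}

/* Called when compiler-generated code re-submits the rest of the task. */
static void task_submit_part(untied_task *t) {
    atomic_fetch_add(&t->parts_outstanding, 1);
}

/* Called at the end of every part; returns true if this part completed the task. */
static bool task_finish_part(untied_task *t) {
    if (atomic_fetch_sub(&t->parts_outstanding, 1) == 1) {
        t->completed = true; /* e.g. release dependences, notify taskwait */
        return true;
    }
    return false;
}
```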
* Fix for pthread_setspecific (TLS and shutdown) problem | Jonathan Peyton | 2016-04-18 | 3 | -14/+20
  Some codes that use TLS fail intermittently because one thread tries to write TLS values after the TLS key has been destroyed by another thread. This happens when one thread executes library shutdown (and destroys the TLS keys) while another thread starts to execute the TLS key destructor routine. Before this change, the kmp_init_runtime flag was checked before calling pthread_* TLS functions, but this flag is set to FALSE only after the TLS keys are destroyed, which leads to the failure. The fix is to check kmp_init_gtid instead, as this flag is unset *before* the TLS keys are destroyed.
  Differential Revision: http://reviews.llvm.org/D19022
  llvm-svn: 266674
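  A sketch of the ordering the fix relies on, using the POSIX thread-specific data calls. The init_gtid flag only plays the role of kmp_init_gtid, and the helper names are hypothetical. The sketch shows the ordering only: a check-then-act window remains for a thread that passes the check just before shutdown clears the flag, so it is not a complete race-free design.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_key_t gtid_key;
/* Stands in for kmp_init_gtid: true only while gtid_key is valid. */
static atomic_bool init_gtid = false;

static int tls_startup(void) {
    if (pthread_key_create(&gtid_key, NULL) != 0)
        return -1;
    atomic_store(&init_gtid, true); /* publish only after the key exists */
    return 0;
}

/* Returns true if the value was stored, false once shutdown has begun. */
static bool tls_set(void *value) {
    if (!atomic_load(&init_gtid))
        return false;
    return pthread_setspecific(gtid_key, value) == 0;
}

static void tls_shutdown(void) {
    atomic_store(&init_gtid, false); /* clear the guard first ... */
    pthread_key_delete(gtid_key);    /* ... then destroy the key */
}
```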
* [STATS] Remove timePair class and unused functions | Jonathan Peyton | 2016-04-18 | 2 | -47/+0
  llvm-svn: 266634
* [STATS] print Total_* stats on their own line | Jonathan Peyton | 2016-04-18 | 1 | -1/+4
  llvm-svn: 266633
* [ITTNOTIFY] Correct barrier imbalance time in case of tasks | Jonathan Peyton | 2016-04-14 | 2 | -0/+23
  ittnotify fix for barrier imbalance time when tasks exist. In the current implementation, task execution time is included in the aggregated time on a barrier. This fix calculates the task execution time and corrects the arrive time by subtracting the task execution time. Since __kmp_invoke_task() can also be called outside a barrier, the field th.th_bar_arrive_time is used to check whether the function was called at the barrier (th.th_bar_arrive_time != 0). For this check to work, th_bar_arrive_time is set to zero right after its value is used at the barrier.
  Differential Revision: http://reviews.llvm.org/D19030
  llvm-svn: 266332
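  A minimal sketch of the correction, with hypothetical names and explicit tick values. Here the correction is modeled by shifting the recorded arrive time later by the task's duration, which removes the task time from the imbalance (release time minus arrive time). As in the commit, 0 means "not at a barrier", so a real arrive time of 0 cannot be represented.

```c
#include <stdint.h>

/* Hypothetical per-thread record; bar_arrive_time plays the role of
 * th.th_bar_arrive_time, and 0 means "not currently at a barrier". */
typedef struct {
    uint64_t bar_arrive_time;
} thread_info;

static void barrier_arrive(thread_info *th, uint64_t now) {
    th->bar_arrive_time = now;
}

/* Account for a task that ran from start to end. At a barrier, the arrive time
 * is pushed later by the task's duration, so task work drops out of the imbalance. */
static void task_executed(thread_info *th, uint64_t start, uint64_t end) {
    if (th->bar_arrive_time != 0)
        th->bar_arrive_time += end - start;
}

/* Report imbalance (time spent waiting, excluding tasks) and clear the arrive
 * time right away so tasks run later, outside a barrier, are not charged. */
static uint64_t barrier_release(thread_info *th, uint64_t now) {
    uint64_t imbalance = now - th->bar_arrive_time;
    th->bar_arrive_time = 0;
    return imbalance;
}
```

  For example, arriving at tick 100, running a task from 110 to 140, and being released at 150 reports 20 ticks of imbalance rather than 50.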
* Exponential back off logic for test-and-set lock | Jonathan Peyton | 2016-04-14 | 6 | -3/+165
  This change adds back off logic to the test-and-set lock for better contended-lock performance. It uses a simple truncated binary exponential back off function. The default back off parameters are tuned for x86.
  The main back off logic has a two-loop structure where each loop is controlled by a user-level parameter:
  * max_backoff - limits the number of outer loop iterations. This parameter should be a power of 2.
  * min_ticks - the number of "ticks" in the inner spin-wait loop, which is system dependent and can be tuned for your system if you so choose. The "ticks" on x86 correspond to the time stamp counter, but on other architectures ticks are a timestamp derived from gettimeofday().
  The user can modify these via the environment variable:
    KMP_SPIN_BACKOFF_PARAMS=max_backoff[,min_ticks]
  Currently, since the default user lock is a queuing lock, one would also have to specify KMP_LOCK_KIND=tas to use the test-and-set locks.
  Differential Revision: http://reviews.llvm.org/D19020
  llvm-svn: 266329
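  A sketch of a test-and-set lock with truncated binary exponential back off, using C11 atomics. The doubling-by-mask step is one way to truncate the back off, and it shows why max_backoff being a power of 2 is convenient: max_backoff - 1 is then an all-ones mask. All names are hypothetical, and min_ticks is counted here as plain loop iterations rather than time stamp counter ticks.

```c
#include <stdatomic.h>

/* Hypothetical back off state; field names mirror KMP_SPIN_BACKOFF_PARAMS. */
typedef struct {
    unsigned backoff;     /* current number of outer-loop iterations */
    unsigned max_backoff; /* must be a power of 2 */
    unsigned min_ticks;   /* inner-loop length (loop iterations in this sketch) */
} spin_backoff;

/* 1, 3, 7, ... up to max_backoff - 1: double, set the low bit, mask to truncate. */
static unsigned next_backoff(unsigned b, unsigned max_backoff) {
    return ((b << 1) | 1u) & (max_backoff - 1u);
}

static void backoff_wait(spin_backoff *s) {
    for (unsigned i = 0; i < s->backoff; ++i)
        for (volatile unsigned t = 0; t < s->min_ticks; ++t)
            ; /* stand-in for a pause instruction plus a tick check */
    s->backoff = next_backoff(s->backoff, s->max_backoff);
}

typedef struct { atomic_int poll; } tas_lock; /* 0 = free, 1 = held */

static void tas_acquire(tas_lock *l, unsigned max_backoff, unsigned min_ticks) {
    spin_backoff s = { 1, max_backoff, min_ticks };
    /* Read first so waiters do not hammer the cache line with exchanges. */
    while (atomic_load_explicit(&l->poll, memory_order_relaxed) != 0 ||
           atomic_exchange_explicit(&l->poll, 1, memory_order_acquire) != 0)
        backoff_wait(&s);
}

static void tas_release(tas_lock *l) {
    atomic_store_explicit(&l->poll, 0, memory_order_release);
}
```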
* Add declarations of OpenMP 4.5 target/offload routines to headers | Jonathan Peyton | 2016-04-12 | 3 | -0/+21
  All these routines are implemented in the offload library.
  llvm-svn: 266120
* [STATS] Remove trailing whitespace in stats source files | Jonathan Peyton | 2016-04-05 | 4 | -40/+40
  llvm-svn: 265437
* OMP_WAIT_POLICY changes | Jonathan Peyton | 2016-04-04 | 2 | -2/+55
  This change makes OMP_WAIT_POLICY=active mean that threads will busy-wait in spin loops and virtually never go to sleep. OMP_WAIT_POLICY=passive now means that threads will immediately go to sleep inside a spin loop. KMP_BLOCKTIME was the previous mechanism for specifying this behavior, via KMP_BLOCKTIME=0 or KMP_BLOCKTIME=infinite, but the standard OpenMP environment variable should also be able to specify it.
  Differential Revision: http://reviews.llvm.org/D18577
  llvm-svn: 265339
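  A sketch of the mapping described above, treating OMP_WAIT_POLICY as a shorthand for the two extreme blocktime settings. The INT_MAX encoding of "infinite", the case-insensitive match, and the function names are assumptions, not libomp's settings code. A caller would pass getenv("OMP_WAIT_POLICY") and the current blocktime.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical encoding: blocktime in ms, INT_MAX meaning "infinite". */
#define BLOCKTIME_INFINITE INT_MAX

/* Case-insensitive string equality. */
static int ieq(const char *a, const char *b) {
    for (; *a && *b; ++a, ++b)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
    return *a == *b;
}

/* Map OMP_WAIT_POLICY onto the equivalent blocktime; a missing or unknown
 * value keeps the current setting. */
static int blocktime_from_wait_policy(const char *policy, int current) {
    if (policy == NULL)
        return current;
    if (ieq(policy, "active"))
        return BLOCKTIME_INFINITE; /* like KMP_BLOCKTIME=infinite: keep spinning */
    if (ieq(policy, "passive"))
        return 0;                  /* like KMP_BLOCKTIME=0: sleep immediately */
    return current;
}
```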
* Fix bug when KMP_USE_ADAPTIVE_LOCKS is 0 | Jonathan Peyton | 2016-03-30 | 1 | -1/+1
  The #endif was one line too low. If KMP_USE_ADAPTIVE_LOCKS is 0, queuing locks would incorrectly use the drdpa lock mechanism. This is a fix for https://llvm.org/bugs/show_bug.cgi?id=26649
  llvm-svn: 264934
* Fix comment in kmp_wait_release.h | Jonathan Peyton | 2016-03-29 | 1 | -8/+6
  Removed the reference to "ref ct" in a comment, as ref_ct no longer exists. Also moved the comment to where the task_team is about to be tested for NULL.
  llvm-svn: 264786
* Fix incorrect indentation in kmp_alloc.c | Jonathan Peyton | 2016-03-29 | 1 | -73/+61
  llvm-svn: 264777
* Remove dead KMP_USE_POOLED_ALLOC code | Jonathan Peyton | 2016-03-29 | 1 | -78/+6
  llvm-svn: 264776
* [STATS] Missing check for MIC in config-ix.cmake | Jonathan Peyton | 2016-03-28 | 1 | -1/+1
  llvm-svn: 264616
* Fixing the non-x86 build by removing dependence on kmp_cpuid_t | Hal Finkel | 2016-03-27 | 3 | -3/+15
  The problem is that the definition of kmp_cpuinfo_t contains:
    char name [3*sizeof (kmp_cpuid_t)]; // CPUID(0x80000002,0x80000003,0x80000004)
  and kmp_cpuid_t is only defined when compiling for x86.
  Differential Revision: http://reviews.llvm.org/D18245
  llvm-svn: 264535
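  One way to size the buffer without naming an x86-only type, sketched with hypothetical names (cpuid_regs, cpuinfo, CPU_BRAND_LEN), not the patch itself. The brand string comes from three CPUID leaves that each return four 32-bit registers, so the size can be spelled out directly, with a compile-time check against the register struct only where that struct exists.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
/* Hypothetical stand-in for kmp_cpuid_t: only defined where CPUID exists. */
typedef struct { uint32_t eax, ebx, ecx, edx; } cpuid_regs;
#endif

/* 3 leaves (0x80000002..0x80000004) x 4 registers x 4 bytes = 48 bytes. */
#define CPU_BRAND_LEN (3 * 4 * sizeof(uint32_t))

typedef struct {
    int family, model, stepping;
    char name[CPU_BRAND_LEN]; /* compiles on every architecture */
} cpuinfo;

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
_Static_assert(CPU_BRAND_LEN == 3 * sizeof(cpuid_regs), "brand string size mismatch");
#endif
```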
* [OMPT] Fix parallel_id and task_id in loop_end with schedule static | Jonas Hahnfeld | 2016-03-24 | 15 | -6/+141
  For serialized parallel regions, wrong ids were reported. Now the same code is used as in kmp_dispatch.cpp, which emits the correct ids.
  Differential Revision: http://reviews.llvm.org/D18348
  llvm-svn: 264266
* [OMPT] Test ids reported by ompt_get_{parallel,task}_id | Jonas Hahnfeld | 2016-03-24 | 5 | -3/+124
  llvm-svn: 264265
* [OMPT] Fix duplicate implicit_task_end events for master thread with GCC | Jonas Hahnfeld | 2016-03-24 | 4 | -10/+41
  For non-serialized parallel regions, the master thread issued two callbacks: the first in kmp_gsupport.c and the second in __kmp_join_call. Therefore, the callback in kmp_gsupport.c is now triggered only for serialized parallel regions.
  Differential Revision: http://reviews.llvm.org/D16716
  llvm-svn: 264264
* Fix Visual Studio builds | Jonathan Peyton | 2016-03-23 | 2 | -1/+10
  Have Visual Studio use MemoryBarrier() instead of _mm_mfence(), and remove the __declspec align attribute from function parameters in kmp_atomic.h.
  llvm-svn: 264166
* [OMPT] Make tests require OMPT_BLAME | Jonas Hahnfeld | 2016-03-22 | 7 | -8/+6
  ompt_event_barrier_{begin,end} are optional blame events. Overall, it doesn't make sense to test partially built OMPT support.
  llvm-svn: 264031
* [OMPT] Create infrastructure and add first tests for OMPT | Jonas Hahnfeld | 2016-03-22 | 8 | -0/+404
  Some basic checks next to the implementation should further lower the possibility of introducing regressions. (Note that this would have caught the ordering issue fixed in rL258866 and pointed to rL263940.)
  The tests are implementation dependent in one respect: they assume that thread ids are assigned in ascending order. This is not defined by the standard but is currently ensured in libomp. We will have to think about another way of ordering the threads should this ever change...
  Note that this isn't aiming at replacing the implementation-independent test suite at https://github.com/OpenMPToolsInterface/ompt-test-suite!
  Differential Revision: http://reviews.llvm.org/D16715
  llvm-svn: 264027
* [STATS] Add OMP_critical and OMP_critical_wait timers | Jonathan Peyton | 2016-03-21 | 2 | -1/+6
  OMP_critical - time spent in a critical section
  OMP_critical_wait - time spent waiting to enter a critical section
  llvm-svn: 263967
* [STATS] separate noTotal bit flag from onlyInMaster and noUnits | Jonathan Peyton | 2016-03-21 | 1 | -22/+22
  This change logically separates the stats_flags_e::noTotal bit flag from the stats_flags_e::onlyInMaster and stats_flags_e::noUnits bit flags. If no TOTAL_foo output is wanted for a particular statistic, the flag must be explicitly included in that statistic's flags.
  Differential Revision: http://reviews.llvm.org/D18198
  llvm-svn: 263954
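  A small C sketch of what "logically separate" bits means here: each flag occupies its own bit, so setting onlyInMaster or noUnits no longer implies noTotal. The flag names come from the commit; the bit values and the print_total helper are hypothetical.

```c
/* Hypothetical bit values; only the flag names come from the commit. */
enum stats_flags {
    noUnits      = 1 << 0, /* statistic has no time units */
    onlyInMaster = 1 << 1, /* only the master thread records it */
    noTotal      = 1 << 2, /* suppress the Total_* aggregate line */
};

/* A Total_* line is printed unless noTotal is set explicitly. */
static int print_total(unsigned flags) {
    return (flags & noTotal) == 0;
}
```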
* [OMPT] Fix wrong parent_task_id in serialized parallel_begin with GCC | Jonas Hahnfeld | 2016-03-21 | 1 | -10/+15
  Without this patch, a simple '#pragma omp parallel num_threads(1)' leads to:
    ompt_event_parallel_begin: parent_task_id=3, [...], parallel_id=2, [...]
    ompt_event_parallel_end: parallel_id=2, task_id=4, [...]
  Differential Revision: http://reviews.llvm.org/D16714
  llvm-svn: 263940
* Update www/index.html to reflect current status of OpenMP project | Jonathan Peyton | 2016-03-18 | 1 | -34/+40
  llvm-svn: 263788
* [CMake] Fix Windows build problem for CMake versions < 3.3 | Jonathan Peyton | 2016-03-16 | 1 | -4/+5
  Building libomp using CMake versions < 3.3 caused a link-time error. These errors occurred because, when assembling z_Windows_NT-586_asm.asm, the definitions OMPT_SUPPORT and _M_AMD64|_M_IA32 weren't defined on the command line. To fix the problem, the COMPILE_FLAGS property of the assembly file is appended to instead of the COMPILE_DEFINITIONS property being set. For whatever reason, older CMake versions do not pick up the COMPILE_DEFINITIONS property for assembly files.
  llvm-svn: 263651
* Fix spelling error in comment | Jonathan Peyton | 2016-03-15 | 1 | -1/+1
  llvm-svn: 263586
* [STATS] Print "Unknown" for frequency if it wasn't able to be parsed | Jonathan Peyton | 2016-03-15 | 2 | -2/+5
  llvm-svn: 263583
* [STATS] Fix comments in kmp_stats.h | Jonathan Peyton | 2016-03-15 | 1 | -22/+17
  llvm-svn: 263582
* [STATS] Add header information to stats print out | Jonathan Peyton | 2016-03-15 | 4 | -61/+69
  This change adds a header to the statistics printout which includes the time, machine name, and processor info if available. This change also includes some cosmetic changes, such as using enum casting for timer and counter iteration.
  Differential Revision: http://reviews.llvm.org/D18153
  llvm-svn: 263580
* Initialize two variables in kmp_tasking. | Samuel Antao | 2016-03-12 | 1 | -1/+1
  Summary:
  Two uninitialized local variables are causing clang to produce warnings:
```
./src/projects/openmp/runtime/src/kmp_tasking.c:3019:5: error: variable 'num_tasks' is used uninitialized whenever switch default is taken [-Werror,-Wsometimes-uninitialized]
    default:
    ^~~~~~~
./src/projects/openmp/runtime/src/kmp_tasking.c:3027:21: note: uninitialized use occurs here
    for( i = 0; i < num_tasks; ++i ) {
                    ^~~~~~~~~
./src/projects/openmp/runtime/src/kmp_tasking.c:2968:28: note: initialize the variable 'num_tasks' to silence this warning
    kmp_uint64 i, num_tasks, extras;
                           ^
                            = 0
./src/projects/openmp/runtime/src/kmp_tasking.c:3019:5: error: variable 'extras' is used uninitialized whenever switch default is taken [-Werror,-Wsometimes-uninitialized]
    default:
    ^~~~~~~
./src/projects/openmp/runtime/src/kmp_tasking.c:3022:52: note: uninitialized use occurs here
    KMP_DEBUG_ASSERT(tc == num_tasks * grainsize + extras);
                                                   ^~~~~~
./src/projects/openmp/runtime/src/kmp_debug.h:62:60: note: expanded from macro 'KMP_DEBUG_ASSERT'
#define KMP_DEBUG_ASSERT( cond ) KMP_ASSERT( cond )
                                             ^
./src/projects/openmp/runtime/src/kmp_debug.h:60:51: note: expanded from macro 'KMP_ASSERT'
#define KMP_ASSERT( cond ) ( (cond) ? 0 : __kmp_debug_assert( #cond, __FILE__, __LINE__ ) )
                              ^
./src/projects/openmp/runtime/src/kmp_tasking.c:2968:36: note: initialize the variable 'extras' to silence this warning
    kmp_uint64 i, num_tasks, extras;
                                   ^
                                    = 0
2 errors generated.
```
  This patch initializes these two variables.
  Reviewers: tlwilmar, jlpeyton
  Subscribers: tlwilmar, openmp-commits
  Differential Revision: http://reviews.llvm.org/D17909
  llvm-svn: 263316
* [STATS] change TASK_execution name to OMP_task | Jonathan Peyton | 2016-03-11 | 2 | -3/+3
  llvm-svn: 263291
* [STATS] Add a total statistics count | Jonathan Peyton | 2016-03-11 | 2 | -75/+59
  This change removes synthesized stats and instead has all timers print out a total, which is the aggregate statistic across threads. This is displayed as "Total_foo" at the end of the program. The stats_flags_e::synthesized flag is removed, and the printStats() function is split into two separate functions: printTimerStats(), which can display the aggregate total, and printCounterStats().
  Differential Revision: http://reviews.llvm.org/D17869
  llvm-svn: 263290
* [STATS] fix output formatting when sample count is 0 | Jonathan Peyton | 2016-03-03 | 1 | -8/+19
  Force 0.0 to be displayed for all statistics whose sample count is 0.
  llvm-svn: 262658
* [STATS] fix master and single timers | Jonathan Peyton | 2016-03-03 | 1 | -3/+5
  Only the thread which executes the single/master section will update its statistics.
  llvm-svn: 262656
* Add new OpenMP 4.5 taskloop construct feature | Jonathan Peyton | 2016-03-02 | 4 | -6/+391
  From the standard: The taskloop construct specifies that the iterations of one or more associated loops will be executed in parallel using OpenMP tasks. The iterations are distributed across tasks created by the construct and scheduled to be executed.
  This initial implementation uses a simple linear task distribution algorithm. Later we can add other algorithms to speed up generation of a huge number of tasks (e.g., tree-like task generation should be faster).
  This needs to be put into the OpenMP runtime library in order for the compiler team to develop the compiler side of the implementation.
  Differential Revision: http://reviews.llvm.org/D17404
  llvm-svn: 262535
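  A sketch of a linear split in C, assuming unit stride and inclusive bounds: compute num_tasks, grainsize, and extras so that tc == num_tasks * grainsize + extras, then let the first extras tasks take one extra iteration each, generating the tasks one after another. The clause selector values and function names are hypothetical. Initializing every local before the switch also avoids the -Wsometimes-uninitialized warnings quoted in the kmp_tasking commit above.

```c
#include <stdint.h>

/* Hypothetical clause selector; any other value takes the default branch. */
enum { TL_GRAINSIZE = 1, TL_NUM_TASKS = 2 };

/* Split tc > 0 iterations; returns the number of tasks. Every local starts at 0,
 * so the default branch cannot leave anything uninitialized. */
static uint64_t taskloop_split(uint64_t tc, int sched, uint64_t val,
                               uint64_t *grainsize_out, uint64_t *extras_out) {
    uint64_t num_tasks = 0, grainsize = 0, extras = 0;
    switch (sched) {
    case TL_GRAINSIZE: /* val = requested iterations per task */
        if (val == 0 || val > tc)
            val = tc;
        num_tasks = tc / val;
        grainsize = tc / num_tasks;
        extras = tc % num_tasks;
        break;
    case TL_NUM_TASKS: /* val = requested number of tasks */
        if (val == 0 || val > tc)
            val = tc;
        num_tasks = val;
        grainsize = tc / num_tasks;
        extras = tc % num_tasks;
        break;
    default: /* unknown schedule: a single task runs every iteration */
        num_tasks = 1;
        grainsize = tc;
        extras = 0;
        break;
    }
    *grainsize_out = grainsize;
    *extras_out = extras;
    return num_tasks;
}

/* Linear generation: tasks are produced one after another; task i covers
 * [task_lb[i], task_ub[i]] and the first `extras` tasks get one more iteration. */
static void taskloop_linear(uint64_t lb, uint64_t num_tasks, uint64_t grainsize,
                            uint64_t extras, uint64_t *task_lb, uint64_t *task_ub) {
    for (uint64_t i = 0; i < num_tasks; ++i) {
        uint64_t chunk = grainsize + (i < extras ? 1 : 0);
        task_lb[i] = lb;
        task_ub[i] = lb + chunk - 1;
        lb += chunk;
    }
}
```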
* Forgot to add test files for doacross and task priority. | Jonathan Peyton | 2016-03-02 | 2 | -0/+78
  llvm-svn: 262533
* Add new OpenMP 4.5 doacross loop nest feature | Jonathan Peyton | 2016-03-02 | 5 | -8/+341
  From the standard: A doacross loop nest is a loop nest that has cross-iteration dependence. An iteration is dependent on one or more lexicographically earlier iterations. The ordered clause parameter on a loop directive identifies the loop(s) associated with the doacross loop nest.
  The init/fini routines allocate/free doacross buffer(s) for each loop for each thread. The wait routine waits for a flag designated by the dependence vector. The post routine sets the flag designated by the current iteration vector. We use a similar technique of shared buffer indices that covers up to 7 nowait loops executed simultaneously by different threads (the number 7 has no real meaning; it is just a heuristic value). Also, the size of the structures is kept intact by reducing dummy arrays.
  This needs to be put into the OpenMP runtime library in order for the compiler team to develop the compiler side of the implementation.
  Differential Revision: http://reviews.llvm.org/D17399
  llvm-svn: 262532
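  A sketch of the post/wait flag idea for a single-level doacross loop, using one bit per iteration in an array of C11 atomics. The fixed capacity, the names, and the restriction to one loop dimension are assumptions; a multi-dimensional nest would first linearize its iteration vector, and the real runtime also manages per-loop shared buffers as described above.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DOACROSS_MAX_ITERS 1024 /* hypothetical fixed capacity; no bounds checks */

typedef struct {
    int64_t lb;                                      /* first iteration number */
    _Atomic uint32_t flags[DOACROSS_MAX_ITERS / 32]; /* one bit per iteration */
} doacross_buf;

static void doacross_init(doacross_buf *d, int64_t lb) {
    d->lb = lb;
    for (int i = 0; i < DOACROSS_MAX_ITERS / 32; ++i)
        atomic_init(&d->flags[i], 0);
}

/* True once iteration iter (>= lb) has posted. */
static bool doacross_posted(doacross_buf *d, int64_t iter) {
    uint64_t k = (uint64_t)(iter - d->lb);
    uint32_t bits = atomic_load_explicit(&d->flags[k / 32], memory_order_acquire);
    return (bits >> (k % 32)) & 1u;
}

/* post: the current iteration has produced what later iterations depend on. */
static void doacross_post(doacross_buf *d, int64_t iter) {
    uint64_t k = (uint64_t)(iter - d->lb);
    atomic_fetch_or_explicit(&d->flags[k / 32], (uint32_t)1 << (k % 32),
                             memory_order_release);
}

/* wait: spin until the iteration named by the dependence vector has posted.
 * Sink iterations before the loop start impose no dependence. */
static void doacross_wait(doacross_buf *d, int64_t iter) {
    if (iter < d->lb)
        return;
    while (!doacross_posted(d, iter))
        ; /* a real runtime would pause or yield here */
}
```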
* Add new OpenMP 4.5 affinity API | Jonathan Peyton | 2016-02-25 | 8 | -0/+274
  This change introduces the new OpenMP 4.5 affinity API surrounding OpenMP Places. There are six new entry points:
  Typically called in a serial region:
  * omp_get_num_places - returns the number of places available to the execution environment in the place list.
  * omp_get_place_num_procs - returns the number of processors available to the execution environment in the specified place.
  * omp_get_place_proc_ids - returns the numerical identifiers of the processors available to the execution environment in the specified place.
  Typically called inside a parallel region:
  * omp_get_place_num - returns the place number of the place to which the encountering thread is bound.
  * omp_get_partition_num_places - returns the number of places in the place partition of the innermost implicit task.
  * omp_get_partition_place_nums - returns the list of place numbers corresponding to the places in the place-var ICV of the innermost implicit task.
  Differential Revision: http://reviews.llvm.org/D17417
  llvm-svn: 261915
OpenPOWER on IntegriCloud