summaryrefslogtreecommitdiffstats
path: root/openmp/runtime/src/kmp_stats.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [OpenMP] NFC: Fix trivial typos in commentsKazuaki Ishizaki2020-01-071-1/+1
| | | | | | | | | | | | Reviewers: jdoerfert, Jim Reviewed By: Jim Subscribers: Jim, mgorny, guansong, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D72285
* [OpenMP][Stats] Fix stats gathering for distribute and team clauseJonathan Peyton2019-04-031-1/+0
| | | | | | | | | | | The distribute clause needs an explicit push of a timer. The teams clause needs a timer added and also, similarly to parallel, exchanged with the serial timer when encountered so that serial regions are counted properly. Differential Revision: https://reviews.llvm.org/D59801 llvm-svn: 357621
* [OpenMP][stats] Update stats gathering macrosJonathan Peyton2019-03-081-3/+5
| | | | llvm-svn: 355739
* Update more file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | | to reflect the new license. These used slightly different spellings that defeated my regular expressions. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351648
* [OpenMP][Stats] Cleanup stats gathering codeJonathan Peyton2018-07-301-69/+278
| | | | | | | | | | | | | | | | | | | | | | 1) Remove unnecessary data from list node structure 2) Remove timerPair in favor of pushing/popping explicitTimers. This way, nested timers will work properly. 3) Fix #pragma omp critical timers 4) Add histogram capability 5) Add KMP_STATS_FILE formatting capability 6) Have time partitioned into serial & parallel by introducing partitionedTimers::exchange(). This also counts the number of serial regions in the executable. 7) Fix up the timers around OMP loops so that scheduling overhead and work are both counted correctly. 8) Fix up the iterations statistics so they count the number of iterations the thread receives at each loop scheduling event 9) Change timers so there is only one RDTSC read per event change 10) Fix up the outdated comments for the timers Differential Revision: https://reviews.llvm.org/D49699 llvm-svn: 338276
* Apply formatting changesJonathan Peyton2017-10-201-2/+0
| | | | | | | | | | .clang-format's comments are removed and a (hopefully) final set of formatting changes are applied. Differential Revision: https://reviews.llvm.org/D38837 Differential Revision: https://reviews.llvm.org/D38920 llvm-svn: 316227
* Clang-format and whitespace cleanup of source codeJonathan Peyton2017-05-121-546/+538
| | | | | | | | | | | | | This patch contains the clang-format and cleanup of the entire code base. Some of clang-formats changes made the code look worse in places. A best effort was made to resolve the bulk of these problems, but many remain. Most of the problems were mangling line-breaks and tabbing of comments. Patch by Terry Wilmarth Differential Revision: https://reviews.llvm.org/D32659 llvm-svn: 302929
* Update stats-gathering codeJonathan Peyton2016-11-141-12/+26
| | | | | | | | | | | | | Have developer timers use partitioning scheme which also required that some redundant developer timers be removed in favor of the already existing normal timers. Move per thread stats initialization to just after global thread id assignment which is as early as possible. Also put all global stats initialization code in __kmp_stats_init() and all global stats destruction code in __kmp_stats_fini(). Differential Revision: https://reviews.llvm.org/D26361 llvm-svn: 286892
* [STATS] Adding process id to output filenameJonathan Peyton2016-06-211-3/+19
| | | | | | | | | This change appends the process id to the KMP_STATS_FILE (if specified) which enables MPI processes to output their stats to separate files. Differential Revision: http://reviews.llvm.org/D21386 llvm-svn: 273273
* [STATS] Use partitioned timer schemeJonathan Peyton2016-05-051-6/+73
| | | | | | | | | | | | | | | | | | | | | | | | This change removes the current timers with ones that partition time properly. The current timers are nested, so that if a new timer, B, starts when the current timer, A, is already timing, A's time will include B's. To eliminate this problem, the partitioned timers are designed to stop the current timer (A), let the new timer run (B), and when the new timer is finished, restart the previously running timer (A). With this partitioning of time, a threads' timers all sum up to the OMP_worker_thread_life time and can now easily show the percentage of time a thread is spending in different parts of the runtime or user code. There is also a new state variable associated with each thread which tells where it is executing a task. This corresponds with the timers: OMP_task_*, e.g., if time is spent in OMP_task_taskwait, then that thread executed tasks inside a #pragma omp taskwait construct. The changes are mostly changing the MACROs to use the new PARITIONED_* macros, the new partitionedTimers class and its methods, and new state logic. Differential Revision: http://reviews.llvm.org/D19229 llvm-svn: 268640
* [STATS] print Total_* stats on their own lineJonathan Peyton2016-04-181-1/+4
| | | | llvm-svn: 266633
* [STATS] Remove trailing whitespace in stats source filesJonathan Peyton2016-04-051-24/+24
| | | | llvm-svn: 265437
* [STATS] Print "Unknown" for frequency if it wasn't able to be parsedJonathan Peyton2016-03-151-1/+4
| | | | llvm-svn: 263583
* [STATS] Add header information to stats print outJonathan Peyton2016-03-151-31/+50
| | | | | | | | | | This change adds a header to the printout of the statistics which includes the time, machine name, and processor info if available. This change also includes some cosmetic changes like using enum casting for timer and counter iteration. Differential Revision: http://reviews.llvm.org/D18153 llvm-svn: 263580
* [STATS] Add a total statistics countJonathan Peyton2016-03-111-50/+31
| | | | | | | | | | | | This change removes synthesized stats and instead has all timers print out a total which is the aggregate statistics across threads. This is displayed as "Total_foo" at the end of program. The stats_flags_e::synthesized flag is removed and the printStats() function is split into two separate functions: printTimerStats() which can display the aggregate total and printCounterStats(). Differential Revision: http://reviews.llvm.org/D17869 llvm-svn: 263290
* [STATS] fix output formatting when sample count is 0Jonathan Peyton2016-03-031-8/+19
| | | | | | Force 0.0 to be displayed for all statistics which have sample count equal to 0 llvm-svn: 262658
* Fix stats build problem.Jonathan Peyton2015-09-241-4/+0
| | | | | | | | This change removes the KMP_STATS_ENABLED macro inside kmp_stats.cpp since it is only compiled anyways when LIBOMP_STATS=on. Also, include kmp_config.h in kmp_stats.h to ensure KMP_STATS_ENABLED is defined. llvm-svn: 248494
* Tidy statistics collectionJonathan Peyton2015-08-111-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This removes some statistics counters and timers which were not used, adds new counters and timers for some language features that were not monitored previously and separates the counters and timers into those which are of interest for investigating user code and those which are only of interest to the developer of the runtime itself. The runtime developer statistics are now ony collected if the additional #define KMP_DEVELOPER_STATS is set. Additional user statistics which are now collected include: * Count of nested parallelism (omp parallel inside a parallel region) * Count of omp distribute occurrences * Count of omp teams occurrences * Counts of task related statistics (taskyield, task execution, task cancellation, task steal) * Values passed to omp_set_numtheads * Time spent in omp single and omp master None of this affects code compiled without stats gathering enabled, which is the normal library build mode. This also fixes the CMake build by linking to the standard c++ library when building the stats library as it is a requirement. The normal library does not have this requirement and its link phase is left alone. Differential Revision: http://reviews.llvm.org/D11759 llvm-svn: 244677
* I apologise in advance for the size of this check-in. At Intel we doJim Cownie2014-10-071-0/+615
understand that this is not friendly, and are working to change our internal code-development to make it easier to make development features available more frequently and in finer (more functional) chunks. Unfortunately we haven't got that in place yet, and unpicking this into multiple separate check-ins would be non-trivial, so please bear with me on this one. We should be better in the future. Apologies over, what do we have here? GGC 4.9 compatibility -------------------- * We have implemented the new entrypoints used by code compiled by GCC 4.9 to implement the same functionality in gcc 4.8. Therefore code compiled with gcc 4.9 that used to work will continue to do so. However, there are some other new entrypoints (associated with task cancellation) which are not implemented. Therefore user code compiled by gcc 4.9 that uses these new features will not link against the LLVM runtime. (It remains unclear how to handle those entrypoints, since the GCC interface has potentially unpleasant performance implications for join barriers even when cancellation is not used) --- new parallel entry points --- new entry points that aren't OpenMP 4.0 related These are implemented fully :- GOMP_parallel_loop_dynamic() GOMP_parallel_loop_guided() GOMP_parallel_loop_runtime() GOMP_parallel_loop_static() GOMP_parallel_sections() GOMP_parallel() --- cancellation entry points --- Currently, these only give a runtime error if OMP_CANCELLATION is true because our plain barriers don't check for cancellation while waiting GOMP_barrier_cancel() GOMP_cancel() GOMP_cancellation_point() GOMP_loop_end_cancel() GOMP_sections_end_cancel() --- taskgroup entry points --- These are implemented fully. GOMP_taskgroup_start() GOMP_taskgroup_end() --- target entry points --- These are empty (as they are in libgomp) GOMP_target() GOMP_target_data() GOMP_target_end_data() GOMP_target_update() GOMP_teams() Improvements in Barriers and Fork/Join -------------------------------------- * Barrier and fork/join code is now in its own file (which makes it easier to understand and modify). * Wait/release code is now templated and in its own file; suspend/resume code is also templated * There's a new, hierarchical, barrier, which exploits the cache-hierarchy of the Intel(r) Xeon Phi(tm) coprocessor to improve fork/join and barrier performance. ***BEWARE*** the new source files have *not* been added to the legacy Cmake build system. If you want to use that fixes wil be required. Statistics Collection Code -------------------------- * New code has been added to collect application statistics (if this is enabled at library compile time; by default it is not). The statistics code itself is generally useful, the lightweight timing code uses the X86 rdtsc instruction, so will require changes for other architectures. The intent of this code is not for users to tune their codes but rather 1) For timing code-paths inside the runtime 2) For gathering general properties of OpenMP codes to focus attention on which OpenMP features are most used. Nested Hot Teams ---------------- * The runtime now maintains more state to reduce the overhead of creating and destroying inner parallel teams. This improves the performance of code that repeatedly uses nested parallelism with the same resource allocation. Set the new KMP_HOT_TEAMS_MAX_LEVEL envirable to a depth to enable this (and, of course, OMP_NESTED=true to enable nested parallelism at all). Improved Intel(r) VTune(Tm) Amplifier support --------------------------------------------- * The runtime provides additional information to Vtune via the itt_notify interface to allow it to display better OpenMP specific analyses of load-imbalance. Support for OpenMP Composite Statements --------------------------------------- * Implement new entrypoints required by some of the OpenMP 4.1 composite statements. Improved ifdefs --------------- * More separation of concepts ("Does this platform do X?") from platforms ("Are we compiling for platform Y?"), which should simplify future porting. ScaleMP* contribution --------------------- Stack padding to improve the performance in their environment where cross-node coherency is managed at the page level. Redesign of wait and release code --------------------------------- The code is simplified and performance improved. Bug Fixes --------- *Fixes for Windows multiple processor groups. *Fix Fortran module build on Linux: offload attribute added. *Fix entry names for distribute-parallel-loop construct to be consistent with the compiler codegen. *Fix an inconsistent error message for KMP_PLACE_THREADS environment variable. llvm-svn: 219214
OpenPOWER on IntegriCloud