path: root/openmp/runtime/src
Commit message | Author | Age | Files | Lines
...
* Replace a bad instance of __kmp_free() with KMP_CPU_FREE_ARRAY() macro.Jonathan Peyton2016-09-021-1/+1
| | | | llvm-svn: 280530
* Use 'critical' reduction method when 'atomic' is not available but requested.Jonathan Peyton2016-09-022-8/+16
| | | | | | | | | | | | | In case atomic reduction method is not available (the compiler can't generate it) the assertion failure occurred if KMP_FORCE_REDUCTION=atomic was specified. This change replaces the assertion with a warning and sets the reduction method to the default one - 'critical'. Patch by Olga Malysheva Differential Revision: https://reviews.llvm.org/D23990 llvm-svn: 280519
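The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the runtime's code: `ReductionMethod` and `resolve_forced_reduction` are hypothetical names, and the warning text is invented for the example.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical set of reduction methods; the names are illustrative only.
enum class ReductionMethod { Critical, Atomic, Tree };

// If the user forces 'atomic' but the compiler did not generate an atomic
// reduction, warn and fall back to 'critical' instead of asserting.
inline ReductionMethod resolve_forced_reduction(ReductionMethod forced,
                                                bool atomic_available) {
    if (forced == ReductionMethod::Atomic && !atomic_available) {
        std::fprintf(stderr,
                     "warning: atomic reduction unavailable, using critical\n");
        return ReductionMethod::Critical;
    }
    return forced;
}
```

A forced method other than atomic passes through unchanged, so only the unavailable case is affected.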
* cleanup: fixed names of dummy arguments of Fortran interfaces declarations, ↵Andrey Churbanov2016-08-174-138/+158
| | | | | | no functional changes done llvm-svn: 278951
* Fixes for hierarchical barrier (possible hang if team size changed).Andrey Churbanov2016-08-111-0/+14
| | | | | | Differential Revision: http://reviews.llvm.org/D23175 llvm-svn: 278332
* kmp_gsupport: Fix library initialization with taskgroupJonas Hahnfeld2016-08-081-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D23259 llvm-svn: 278003
* Do not block on explicit task depending on proxy taskJonas Hahnfeld2016-08-082-10/+16
| Consider the following code:
|
|     int dep;
|     #pragma omp target nowait depend(out: dep)
|     {
|         sleep(1);
|     }
|     #pragma omp task depend(in: dep)
|     {
|         printf("Task with dependency\n");
|     }
|     printf("Doing some work...\n");
|
| In its current state the runtime will block on the second task and not continue execution.
|
| Differential Revision: https://reviews.llvm.org/D23116
| llvm-svn: 277992
* __kmp_free_task: Fix for serial explicit tasks producing proxy tasksJonas Hahnfeld2016-08-081-14/+10
| Consider the following code which may be executed by a serial team:
|
|     int dep;
|     #pragma omp target nowait depend(out: dep)
|     {
|         sleep(1);
|     }
|     #pragma omp task depend(in: dep)
|     {
|         #pragma omp target nowait
|         {
|             sleep(1);
|         }
|     }
|
| Here the explicit task may not be freed until the nested proxy task has finished. The current code hasn't considered this and called __kmp_free_task anyway which triggered an assert because of remaining incomplete children:
|
|     KMP_DEBUG_ASSERT( TCR_4(taskdata->td_incomplete_child_tasks) == 0 );
|
| Differential Revision: https://reviews.llvm.org/D23115
| llvm-svn: 277991
* Fixed x2APIC discovery for 256-processor architectures.Andrey Churbanov2016-08-051-3/+3
| | | | | | | | Mask for value read from ebx register returned by CPUID expanded to 0xFFFF. Differential Revision: https://reviews.llvm.org/D23203 llvm-svn: 277825
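A small sketch of why the mask width matters. Bits 15:0 of EBX from CPUID leaf 0x0B hold the logical processor count at the queried level; the helper names are hypothetical, and the 8-bit mask is shown only for comparison (the log does not state the old mask value).

```cpp
#include <cassert>
#include <cstdint>

// Bits 15:0 of EBX from CPUID leaf 0x0B hold the number of logical
// processors at the queried topology level.
constexpr std::uint32_t kLevelCountMask = 0xFFFFu;

// Hypothetical helper: extract the count from a raw EBX value.
inline std::uint32_t procs_at_level(std::uint32_t ebx) {
    return ebx & kLevelCountMask;
}

// For comparison only: an 8-bit mask (an assumption, not taken from the
// patch) wraps to 0 once the count reaches 256.
inline std::uint32_t procs_at_level_8bit(std::uint32_t ebx) {
    return ebx & 0xFFu;
}
```

With a raw value of 256, the 16-bit mask keeps the count while the 8-bit one truncates it to zero.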
* kmp_taskdeps.cpp: Fix debugging outputJonas Hahnfeld2016-08-041-3/+5
| | | | | | | | node->dn.task is only filled after the dependencies are already processed. This currently leads to unhelpful output from KA_TRACE or even a crash if one enables KMP_SUPPORT_GRAPH_OUTPUT. llvm-svn: 277717
* Disable KMP_CANCEL_THREADS on AndroidPirama Arumuga Nainar2016-08-031-0/+6
| | | | | | | | | | | | Summary: Android does not have pthread_cancel. Disable KMP_CANCEL_THREADS if __ANDROID__ is defined. Subscribers: tberghammer, srhines, openmp-commits, danalbert Differential Revision: https://reviews.llvm.org/D23029 llvm-svn: 277618
* Make balanced affinity work on AArch64.Paul Osmialowski2016-07-291-57/+141
| | | | | | | | | This patch enables balanced affinity on machines that do not have hardware threads and have cores clustered into packages. In fact, the balancing algorithm could be generalized for any arrangement with at least two levels of hierarchy (depth > 1). Differential Revision: https://reviews.llvm.org/D22365 llvm-svn: 277212
* Replace enum types in variadic functions by built-in types.Samuel Antao2016-07-223-5/+17
| Summary:
| When compiling the runtime library with clang we get warnings like:
|
| ```
| error: passing an object that undergoes default argument promotion to 'va_start' has undefined behavior [-Werror,-Wvarargs]
|     va_start( args, id );
|                     ^
| note: parameter of type 'kmp_i18n_id_t' (aka 'kmp_i18n_id') is declared here
|     kmp_i18n_id_t id,
| ```
|
| My understanding is that the va_start macro only gets the promoted type so it won't know what was the exact type of the argument, which can potentially not work for some targets given that the implementation of the calling convention could not be done properly.
|
| This patch fixes that by using a built-in type in the function signature.
|
| Reviewers: tlwilmar, jlpeyton, AndreyChurbanov
| Subscribers: arpith-jacob, carlo.bertolli, caomhin, openmp-commits
| Differential Revision: https://reviews.llvm.org/D22427
| llvm-svn: 276428
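The pattern the message describes might look like the sketch below: the last named parameter is a plain `int`, and the enum is recovered with a cast inside the function. `msg_id_t` and `sum_args` are hypothetical stand-ins, not the runtime's declarations, and the sentinel-terminated argument list is only there to make the example testable.

```cpp
#include <cassert>
#include <cstdarg>

// Hypothetical message-id enum standing in for kmp_i18n_id_t.
enum msg_id_t { MSG_NONE = 0, MSG_WARN = 1, MSG_ERROR = 2 };

// The last named parameter is an int, a type that default argument
// promotion leaves unchanged, so passing it to va_start is well defined.
// Variadic arguments are non-negative ints terminated by a negative value.
inline int sum_args(int id, ...) {
    msg_id_t msg = static_cast<msg_id_t>(id);  // recover the enum by cast
    va_list args;
    va_start(args, id);
    int total = 0;
    for (int v = va_arg(args, int); v >= 0; v = va_arg(args, int)) {
        total += v;
    }
    va_end(args);
    return msg == MSG_NONE ? 0 : total;
}
```

Callers can still pass the enumerator directly, since an unscoped enum converts implicitly to `int`.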
* http://reviews.llvm.org/D22134: Implementation of OpenMP 4.5 nonmonotonic ↵Andrey Churbanov2016-07-113-79/+176
| | | | | | schedule modifier llvm-svn: 275052
* Improving EPCC performance when linking with hwlocJonathan Peyton2016-07-083-2/+16
| | | | | | | | | | | | When linking with libhwloc, the ORDERED EPCC test slows down on big machines (> 48 cores). Performance analysis showed that a cache thrash was occurring and this padding helps alleviate the problem. Also, inside the main spin-wait loop in kmp_wait_release.h, we can eliminate the references to the global shared variables by creating a local variable, oversubscribed, and checking that instead. Differential Revision: http://reviews.llvm.org/D22093 llvm-svn: 274894
* D22138: Added more Intel compiler versions as allowed build compilersAndrey Churbanov2016-07-081-0/+4
| | | | llvm-svn: 274854
* D22137: Memory leak fixed by adding missed cleanup of single level array of ↵Andrey Churbanov2016-07-081-2/+4
| | | | | | hot teams info llvm-svn: 274851
* D22136: Memory leaks fixed by adding missed __kmp_free() callsAndrey Churbanov2016-07-081-0/+2
| | | | llvm-svn: 274850
* __kmp_partition_places: Update assertion for new parameter update_master_onlyJonas Hahnfeld2016-07-041-2/+2
| | | | | | | | | | | | If update_master_only is set the place list is not completely traversed and therefore this assertion failed. Make it only trigger if update_master_only is false. (was introduced by D20539) Differential Revision: http://reviews.llvm.org/D21925 llvm-svn: 274482
* Fix checks on schedule structJonathan Peyton2016-07-011-19/+13
| | | | | | | | | | | | This change fixes an error in comparing the existing schedule on the team to the new schedule, in the chunk field. Also added additional checks and used KMP_CHECK_UPDATE where appropriate. Patch by Terry Wilmarth. Differential Revision: http://reviews.llvm.org/D21897 llvm-svn: 274371
* Improve performance of #pragma omp singleJonathan Peyton2016-07-011-2/+4
| | | | | | | | | | | | EPCC Performance of single is considerably worse than plain barrier. Adding a read-only check to the code before the atomic compare-and-store helps considerably. Patch by Terry Wilmarth. Differential Revision: http://reviews.llvm.org/D21893 llvm-svn: 274369
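A minimal sketch of the read-before-CAS idea, using `std::atomic` in place of the runtime's own primitives. `try_claim_single` and the counter scheme are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical scheme: the thread that advances `counter` from `expected`
// to `expected + 1` wins this instance of the `single` construct.
inline bool try_claim_single(std::atomic<int>& counter, int expected) {
    // Cheap read-only check first. If another thread has already claimed
    // this instance, return without attempting the compare-and-swap, so
    // the cache line is not requested in exclusive state.
    if (counter.load(std::memory_order_relaxed) != expected) {
        return false;
    }
    // Only now attempt the atomic compare-and-store.
    return counter.compare_exchange_strong(expected, expected + 1,
                                           std::memory_order_acq_rel,
                                           std::memory_order_relaxed);
}
```

Losing threads, which are the majority in a large team, take only the load path; that is the likely source of the improvement the message reports.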
* Fix bugs in TAS and futex lockJonathan Peyton2016-06-281-3/+3
| * Incorrect lock value written in __kmp_test_futex_lock
| * Incorrect lock value check in tas/futex lock with USE_LOCK_PROFILE on
|
| Patch by Hansang Bae
| llvm-svn: 274053
* Revert r273898's UNICODE quick fix in favor of CMake's remove_definitions()Jonathan Peyton2016-06-282-6/+6
| | | | | | | | | UNICODE and _UNICODE definitions were added in the LLVM CMake build system. While on Unices, the UNICODE/_UNICODE macros don't cause problems, on Windows only ittnotify_static.c should be compiled using -DUNICODE. We are still looking at a proper fix, but this change sets the build back to exactly what it was doing before. Also, a comment and TODO were added in the src/CMakeLists.txt file to help explain. llvm-svn: 274052
* Fix the Windows build after r273599Hans Wennborg2016-06-272-1/+6
| | | | | | | | | | | | That patch made all LLVM projects build with -DUNICODE. However, this doesn't work for the OpenMP runtime. But just overriding the flag with -UUNICODE breaks compiling ittnotify_static.c, which for some reason needs to be compiled with -DUNICODE. Note that compiling ittnotify.h with -DUNICODE does not work though. This seems like a mess. This commit fixes it for now, but it would be great if someone who works on the OpenMP runtime could fix it properly. llvm-svn: 273898
* Fix bug in futex fast path inside kmp_csupport.cJonathan Peyton2016-06-221-1/+1
| | | | llvm-svn: 273439
* Apply the KMP_USE_FUTEX feature macro everywhereJonathan Peyton2016-06-223-23/+24
| | | | llvm-svn: 273438
* Add debug trace messages for taskloopJonathan Peyton2016-06-211-0/+5
| | | | llvm-svn: 273299
* Bug fix for hang when tasks used in nested parallelJonathan Peyton2016-06-211-3/+3
| | | | | | | | | | | | Bug fix for hang when omp task and nested parallelism used together. Still some problem remains with task state saving/restoring, but user's case works fine now. All tasking unit tests passed as well. Patch by Andrey Churbanov Differential Revision: http://reviews.llvm.org/D21558 llvm-svn: 273297
* Performance improvement: accessing thread struct as opposed to team structJonathan Peyton2016-06-211-12/+12
| | | | | | | | | | | Replaced readings of nproc from team structure with ones from thread structure to improve performance. Patch by Andrey Churbanov. Differential Revision: http://reviews.llvm.org/D21559 llvm-svn: 273293
* Addition of debugger comments and whitespaceJonathan Peyton2016-06-213-9/+12
| | | | | | | | | | | | The removal of legacy code to support long-deprecated debugger support library resulted in some whitespace changes. Comments from that legacy code were made public as they may be useful for other debuggers. Patch by Olga Malysheva. Differential Revision: http://reviews.llvm.org/D21391 llvm-svn: 273282
* Improvements to process affinity mask settingJonathan Peyton2016-06-211-51/+102
| A couple improvements:
| 1) Add ability to limit fullMask size when KMP_HW_SUBSET limits resources.
| 2) Make KMP_HW_SUBSET work for affinity_none, and only limit fullMask in this case.
|
| Patch by Andrey Churbanov.
| Differential Revision: http://reviews.llvm.org/D21528
| llvm-svn: 273278
* Bug fix for segfault in stubs libraryJonathan Peyton2016-06-211-3/+7
| | | | | | | | | | | | There was a segfault in the stubs library in posix_memalign because of a bad parameter. The fix is to send address of the pointer as a parameter. Also added check of result of posix_memalign. Patch by Andrey Churbanov. Differential Revision: http://reviews.llvm.org/D21529 llvm-svn: 273276
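The calling convention at issue can be sketched as below. `aligned_alloc_checked` is a hypothetical wrapper, not the stubs library's function; only `posix_memalign` itself is a real POSIX call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Hypothetical wrapper: returns an aligned block, or nullptr on failure.
inline void* aligned_alloc_checked(std::size_t alignment, std::size_t size) {
    void* ptr = nullptr;
    // posix_memalign stores the result through its first argument, so it
    // must receive the address of the pointer (&ptr), not the pointer.
    int rc = posix_memalign(&ptr, alignment, size);
    // The return value is 0 on success and an error number otherwise.
    if (rc != 0) {
        return nullptr;
    }
    return ptr;
}
```

The alignment must be a power of two and a multiple of `sizeof(void*)`; anything else makes the call fail, which is why the result is checked.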
* [STATS] Adding process id to output filenameJonathan Peyton2016-06-212-4/+20
| | | | | | | | | This change appends the process id to the KMP_STATS_FILE (if specified) which enables MPI processes to output their stats to separate files. Differential Revision: http://reviews.llvm.org/D21386 llvm-svn: 273273
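One way to derive a per-process file name is sketched here. The `-<pid>` suffix layout and both function names are assumptions; the log only says the process id is appended.

```cpp
#include <cassert>
#include <string>
#include <unistd.h>  // getpid (POSIX)

// Hypothetical naming scheme: "<base>-<pid>". The exact layout used by the
// runtime is not given in the log.
inline std::string stats_file_for(const std::string& base, long pid) {
    return base + "-" + std::to_string(pid);
}

// Convenience overload using the calling process's id.
inline std::string stats_file_for_this_process(const std::string& base) {
    return stats_file_for(base, static_cast<long>(getpid()));
}
```

Because each MPI rank is a separate process with its own pid, ranks sharing one KMP_STATS_FILE setting would no longer overwrite each other's output.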
* Fix typos in Fortran headersJonathan Peyton2016-06-212-6/+6
| | | | | | | | Fix typos in Fortran headers to match spec. Patch by Andrey Churbanov. Differential Revision: http://reviews.llvm.org/D21531 llvm-svn: 273272
* Change hwloc discovery algorithm to print topology only for accessible resourcesJonathan Peyton2016-06-161-17/+29
| | | | | | | | | | | | | Change hwloc discovery algorithm to print topology for only accessible resources, and report uniformity correspondingly, similar to what other topology discovery algorithms do. Fixes minor inconsistency in total topology reported and resources used for threads binding in case hwloc used. Patch by Andrey Churbanov. Differential Revision: http://reviews.llvm.org/D21389 llvm-svn: 272952
* Teach OpenMP Library to use Hwloc on WindowsJonathan Peyton2016-06-164-48/+102
| This patch allows a user to enable Hwloc on Windows. There are three main changes in here:
| 1. kmp.h - Move definitions/declarations out of KMP_OS_WINDOWS guard (our Windows implementation of affinity) because they need to be defined when KMP_USE_HWLOC is on as well.
| 2. Teach __kmp_set_system_affinity, __kmp_get_system_affinity, __kmp_get_proc_group, and __kmp_affinity_bind_thread how to use hwloc.
| 3. Teach CMake how to include hwloc when building on Windows.
|
| Another minor change in here is to make sure that anything under KMP_USE_HWLOC is also guarded by KMP_AFFINITY_SUPPORTED. This is to prevent Mac builds from requiring anything from Hwloc.
|
| Differential Revision: http://reviews.llvm.org/D21441
| llvm-svn: 272951
* Fix for crash in task dependenciesJonathan Peyton2016-06-161-1/+1
| | | | | | | | | | | With a single thread, using __kmpc_omp_wait_deps segfaults in the OpenMP runtime. Offloading with depend also encounters this problem when we generate kmpc_omp_wait_deps instead of kmpc_omp_task_with_deps. Patch by Alex Duran Differential Revision: http://reviews.llvm.org/D21384 llvm-svn: 272949
* Fixed missing memory cleanup in __kmp_affinity_create_hwloc_map()Jonathan Peyton2016-06-161-0/+2
| | | | | | | | | Cleanup: added missing memory cleanup in a couple of corner cases, fixing a possible memory leak. Patch by Andrey Churbanov Differential Revision: http://reviews.llvm.org/D21355 llvm-svn: 272946
* Reduce perf impact of redundant ittnotify callsJonathan Peyton2016-06-163-8/+18
| | | | | | | | | | | | Improved performance of ittnotify calls by request from ittnotify owner: calls to __itt_string_handle_create made unique (it was called multiple times). Patch by Andrey Churbanov Differential Revision: http://reviews.llvm.org/D21353 llvm-svn: 272945
* Deprecate KMP_PLACE_THREADS and rename as KMP_HW_SUBSETJonathan Peyton2016-06-163-33/+55
| | | | | | | | | | | | | Deprecate KMP_PLACE_THREADS and rename it to KMP_HW_SUBSET due to confusion about its purpose and function among users. KMP_HW_SUBSET is an environment variable which allows users to easily pick a subset of the hardware topology to use. e.g., KMP_HW_SUBSET=30c,2t means use 30 cores, 2 threads per core. Patch by Andrey Churbanov Differential Revision: http://reviews.llvm.org/D21340 llvm-svn: 272937
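A simplified parser for values of the `30c,2t` form shown above. It is a sketch under stated assumptions: only the `c` (cores) and `t` (threads per core) units from the example are handled, and `HwSubset` and `parse_hw_subset` are hypothetical names; the runtime's real syntax may accept more.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical result type for the two units used in the example above.
struct HwSubset {
    int cores = 0;
    int threads_per_core = 0;
    bool ok = true;
};

// Parse comma-separated "<count><unit>" items, e.g. "30c,2t".
inline HwSubset parse_hw_subset(const std::string& value) {
    HwSubset out;
    std::size_t pos = 0;
    while (pos < value.size()) {
        std::size_t end = value.find(',', pos);
        if (end == std::string::npos) end = value.size();
        const std::string item = value.substr(pos, end - pos);
        char* unit = nullptr;
        const long n = std::strtol(item.c_str(), &unit, 10);
        // Reject items with no digits, a non-positive count, or anything
        // other than exactly one unit character after the number.
        if (unit == item.c_str() || n <= 0 || unit[0] == '\0' || unit[1] != '\0') {
            out.ok = false;
            return out;
        }
        if (unit[0] == 'c') {
            out.cores = static_cast<int>(n);
        } else if (unit[0] == 't') {
            out.threads_per_core = static_cast<int>(n);
        } else {
            out.ok = false;
            return out;
        }
        pos = end + 1;
    }
    return out;
}
```

For `30c,2t` this yields 30 cores and 2 threads per core, matching the reading given in the commit message.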
* Bug fix: crash if teams executed on hostJonathan Peyton2016-06-161-0/+1
| | | | | | | | | | | | Added an argv array check/allocation for a parallel region directly nested inside the teams construct. The upcoming Fortran codegen passes parameters directly into kmpc_fork_call that are missing from kmpc_fork_teams; earlier codegen passed to parallel a subset of the parameters passed to teams, so no check/allocation was needed. Patch by Andrey Churbanov Differential Revision: http://reviews.llvm.org/D21336 llvm-svn: 272935
* Fix large overhead with itt notifications on region/barrier name composingJonathan Peyton2016-06-141-5/+19
| | | | | | | | | | | | | | | Currently, there is a big overhead in reporting of loop metadata through ittnotify. The pair of functions: __kmp_str_loc_init/__kmp_str_loc_free are replaced with strchr/atoi calls. Thus, a lot of time consuming actions are skipped - many memory allocations/deallocations, heavy string duplication, etc. The loop metadata only needs line and column info from the source string, so no allocations and string splitting actually needed. Patch by Andrey Churbanov Differential Revision: http://reviews.llvm.org/D21309 llvm-svn: 272698
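The allocation-free approach can be sketched like this. The field layout `;file;func;line;col;;` is an assumption about the source string, and `LineCol` and `parse_line_col` are hypothetical names; only `strchr` and `atoi` come from the message.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical result type: just the two numbers the metadata needs.
struct LineCol {
    int line = 0;
    int col = 0;
};

// Walk the string in place: skip three ';' separators to reach the line
// field, then one more to reach the column field. Nothing is copied or
// allocated. Malformed input yields zeros.
inline LineCol parse_line_col(const char* src) {
    assert(src != nullptr);
    LineCol out;
    const char* p = src;
    for (int i = 0; i < 3; ++i) {
        p = std::strchr(p, ';');
        if (p == nullptr) return out;
        ++p;  // step past the separator
    }
    out.line = std::atoi(p);  // atoi stops at the next ';'
    p = std::strchr(p, ';');
    if (p == nullptr) return out;
    out.col = std::atoi(p + 1);
    return out;
}
```

Compared with splitting the string into duplicated fields, this touches each character at most once and needs no cleanup call.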
* Remove unused wait/release code.Jonathan Peyton2016-06-144-44/+0
| | | | | | | | | | | | Cleanup - unused code removal. TODO: consider to remove (replace with flag class methods) also kmp_wait_64 and kmp_release_64 routines. Patch by Andrey Churbanov Differential Revision: http://reviews.llvm.org/D21332 llvm-svn: 272697
* Whitespace cleanup of dllexportsJonathan Peyton2016-06-141-2/+2
| | | | | | Differential Revision: http://reviews.llvm.org/D21331 llvm-svn: 272691
* Renaming change: 41 -> 45 and 4.1 -> 4.5Jonathan Peyton2016-06-1421-81/+81
| | | | | | | | OpenMP 4.1 is now OpenMP 4.5. Any mention of 41 or 4.1 is replaced with 45 or 4.5. Also, if the CMake option LIBOMP_OMP_VERSION is 41, CMake warns that 41 is deprecated and to use 45 instead. llvm-svn: 272687
* Bug fix for Bugzilla bug 26602: Remove function bodies with KMP_ASSERT(0)Jonathan Peyton2016-06-131-4/+4
| | | | | | | | | | | | | | Fix for Bugzilla https://llvm.org/bugs/show_bug.cgi?id=26602. Removed function bodies that consisted only of a KMP_ASSERT(0) statement. A possible runtime crash thus becomes a compile-time error, which is preferable (the error is detected earlier). TODO: consider C++11 static_assert as an alternative, which could improve the diagnostics. Patch by Andrey Churbanov Differential Revision: http://reviews.llvm.org/D21304 llvm-svn: 272590
* Affinity mask processing improvementsJonathan Peyton2016-06-134-57/+56
| | | | | | | | | | | | Remove static specifier from var fullMask and remove kmp_get_fullMask() routine. When iterating through procs in a mask, always check if proc is in fullMask (this check was missing in a few places). Patch by Brian Bliss. Differential Revision: http://reviews.llvm.org/D21300 llvm-svn: 272589
* Exclude untied tasks from task stealing constraintJonathan Peyton2016-06-131-2/+2
| | | | | | | | | | If either current_task or new_task is untied then skip task scheduling constraint checks, because untied tasks are not affected by the task scheduling constraints. Differential Revision: http://reviews.llvm.org/D21196 llvm-svn: 272570
* Fix crash when libomp loaded/unloaded multiple timesJonathan Peyton2016-06-131-38/+23
| The problem scenario is the following: a dynamic library, libfoo.so, depends on libomp.so (it creates a parallel region and calls some omp functions). An application has a loop where it dynamically loads libfoo.so, calls a function from it, and unloads libfoo.so. After several loop iterations the application crashes with a message about lack of resources:
|
|     OMP: Error #34: System unable to allocate necessary resources for OMP thread:
|
| The problem is that pthread_kill() was not followed by pthread_join() in the case of a terminated thread. This patch fixes this problem for both worker and monitor threads.
|
| Differential Revision: http://reviews.llvm.org/D21200
| llvm-svn: 272567
* Hwloc refactoring patchJonathan Peyton2016-06-132-131/+135
| These changes remove the hwloc_topology_ignore_type function which doesn't exist in the hwloc 2.0 API.
|
| In the existing code, the topology extracted from hwloc has the cache levels stripped out and then assumes the final stripped topology follows the typical three-level topology: packages -> cores -> HW threads. But the code is doing unclean manipulations to determine at what level those resources are located and also assumes too much about what hwloc is detecting (there could be intermediate levels in between socket and core for instance).
|
| This new way of extracting the topology doesn't strip out any hardware objects that hwloc detects. It does not assume the three level topology, and instead searches for the relevant three levels within the topology for each bit of information using hwloc interface functions. i.e., the three level topology subset that our affinity code is interested in is extracted from the hwloc topology tree directly. For example, the new __kmp_hwloc_get_nobjs_under_obj function gives the user the number of cores under a socket reliably without worrying if there are unexpected objects between the socket object and core object in the hwloc topology structure.
|
| Also, now that all topology information is kept, there are also possibilities of using the caches/numa nodes to determine more sophisticated affinity settings in the future. There is also some cleanup code added for the destruction of the __kmp_hwloc_topology object.
|
| Differential Revision: http://reviews.llvm.org/D21195
| llvm-svn: 272565
* Fix bitmask complement operationJonathan Peyton2016-06-131-3/+25
| | | | | | | | | | | | The bitmask complement operation doesn't consider the max proc id which means something like !{0} will be translated to {1,2,3,4,...,600,601,...,1023} on a Linux system even though there aren't 600 processors on said system. This change has the complement bitmask and-ed with the fullmask so that it will only contain valid processors. Differential Revision: http://reviews.llvm.org/D21245 llvm-svn: 272561
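The fix amounts to clamping the complement to the processors that exist. A minimal sketch, with `std::bitset` standing in for the runtime's affinity mask type and all names (`kMaxProcs`, `ProcMask`, `make_full_mask`, `complement_within`) hypothetical:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical upper bound on processor ids (1024, as in the example above).
constexpr std::size_t kMaxProcs = 1024;
using ProcMask = std::bitset<kMaxProcs>;

// Build a mask with bits [0, nprocs) set: the processors that really exist.
inline ProcMask make_full_mask(std::size_t nprocs) {
    assert(nprocs <= kMaxProcs);
    ProcMask full;
    for (std::size_t i = 0; i < nprocs; ++i) {
        full.set(i);
    }
    return full;
}

// Complement `mask`, then AND with `full` so only valid processors remain.
inline ProcMask complement_within(const ProcMask& mask, const ProcMask& full) {
    return ~mask & full;
}
```

On a 4-processor system, complementing {0} this way gives {1,2,3} rather than every id up to 1023.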