path: root/openmp
Commit message | Author | Age | Files | Lines
...
* Change hwloc discovery algorithm to print topology only for accessible resources | Jonathan Peyton | 2016-06-16 | 1 | -17/+29
  Change the hwloc discovery algorithm to print the topology for accessible resources only, and to report uniformity correspondingly, as the other topology discovery algorithms do. Fixes a minor inconsistency between the total topology reported and the resources used for thread binding when hwloc is used.
  Patch by Andrey Churbanov.
  Differential Revision: http://reviews.llvm.org/D21389
  llvm-svn: 272952
* Teach OpenMP Library to use Hwloc on Windows | Jonathan Peyton | 2016-06-16 | 5 | -63/+116
  This patch allows a user to enable hwloc on Windows. There are three main changes:
  1. kmp.h: move definitions/declarations out of the KMP_OS_WINDOWS guard (our Windows implementation of affinity) because they also need to be defined when KMP_USE_HWLOC is on.
  2. Teach __kmp_set_system_affinity, __kmp_get_system_affinity, __kmp_get_proc_group, and __kmp_affinity_bind_thread how to use hwloc.
  3. Teach CMake how to include hwloc when building on Windows.
  Another minor change makes sure that anything under KMP_USE_HWLOC is also guarded by KMP_AFFINITY_SUPPORTED. This prevents Mac builds from requiring anything from hwloc.
  Differential Revision: http://reviews.llvm.org/D21441
  llvm-svn: 272951
* Fix for crash in task dependencies | Jonathan Peyton | 2016-06-16 | 1 | -1/+1
  With a single thread, using __kmpc_omp_wait_deps segfaults in the OpenMP runtime. Offloading with depend also encounters this problem when we generate kmpc_omp_wait_deps instead of kmpc_omp_task_with_deps.
  Patch by Alex Duran
  Differential Revision: http://reviews.llvm.org/D21384
  llvm-svn: 272949
* Fixed missing memory cleanup in __kmp_affinity_create_hwloc_map() | Jonathan Peyton | 2016-06-16 | 1 | -0/+2
  Cleanup: fixed missing memory cleanup in a couple of corner cases. Fixes a possible memory leak in those corner cases.
  Patch by Andrey Churbanov
  Differential Revision: http://reviews.llvm.org/D21355
  llvm-svn: 272946
* Reduce perf impact of redundant ittnotify calls | Jonathan Peyton | 2016-06-16 | 3 | -8/+18
  Improved performance of ittnotify calls at the request of the ittnotify owner: calls to __itt_string_handle_create are no longer repeated (it was previously called multiple times).
  Patch by Andrey Churbanov
  Differential Revision: http://reviews.llvm.org/D21353
  llvm-svn: 272945
* Deprecate KMP_PLACE_THREADS and rename as KMP_HW_SUBSET | Jonathan Peyton | 2016-06-16 | 3 | -33/+55
  Deprecate KMP_PLACE_THREADS and rename it to KMP_HW_SUBSET because users were confused about its purpose and function. KMP_HW_SUBSET is an environment variable which allows users to easily pick a subset of the hardware topology to use. For example, KMP_HW_SUBSET=30c,2t means use 30 cores, 2 threads per core.
  Patch by Andrey Churbanov
  Differential Revision: http://reviews.llvm.org/D21340
  llvm-svn: 272937
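To make the `30c,2t` syntax above concrete, here is a minimal sketch of a parser for that form. `HwSubset` and `parse_hw_subset` are hypothetical names invented for this illustration; they are not libomp functions, and the real variable may accept forms this sketch rejects.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical result type for this sketch; not part of libomp.
struct HwSubset {
  int cores = 0;            // value given with the 'c' suffix
  int threads_per_core = 0; // value given with the 't' suffix
  bool ok = false;          // true only if the whole string parsed
};

// Parses a comma-separated list of "<number><unit>" items, where the unit
// is 'c' (cores) or 't' (threads per core), e.g. "30c,2t".
HwSubset parse_hw_subset(const std::string &value) {
  HwSubset result;
  const char *p = value.c_str();
  while (*p != '\0') {
    char *end = nullptr;
    long n = std::strtol(p, &end, 10);
    if (end == p || n <= 0) return result; // no digits, or non-positive count
    if (*end == 'c') {
      result.cores = static_cast<int>(n);
    } else if (*end == 't') {
      result.threads_per_core = static_cast<int>(n);
    } else {
      return result; // unknown unit suffix
    }
    p = end + 1;
    if (*p == ',') {
      ++p; // move on to the next item
    } else if (*p != '\0') {
      return result; // unexpected character after the unit
    }
  }
  result.ok = result.cores > 0 || result.threads_per_core > 0;
  return result;
}
```

Calling `parse_hw_subset("30c,2t")` yields 30 cores and 2 threads per core with `ok` set; an unknown suffix such as `"30x"` leaves `ok` false.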
* Bug fix: crash if teams executed on host | Jonathan Peyton | 2016-06-16 | 1 | -0/+1
  Added an argv array check/allocation for a parallel region directly nested inside the teams construct, because the upcoming Fortran codegen passes parameters directly into kmpc_fork_call that are missing from the parameters of kmpc_fork_teams (earlier codegen passed to parallel a subset of the parameters passed to teams, and thus no check/allocation was needed).
  Patch by Andrey Churbanov
  Differential Revision: http://reviews.llvm.org/D21336
  llvm-svn: 272935
* Fix large overhead with itt notifications on region/barrier name composing | Jonathan Peyton | 2016-06-14 | 1 | -5/+19
  Currently, there is a big overhead in reporting loop metadata through ittnotify. The pair of functions __kmp_str_loc_init/__kmp_str_loc_free is replaced with strchr/atoi calls. This skips many time-consuming actions: numerous memory allocations/deallocations, heavy string duplication, etc. The loop metadata only needs the line and column info from the source string, so no allocations or string splitting are actually needed.
  Patch by Andrey Churbanov
  Differential Revision: http://reviews.llvm.org/D21309
  llvm-svn: 272698
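The strchr/atoi approach described above can be sketched roughly as follows. The field layout `;file;func;line;col;;` is an assumption made for this example, and `parse_line_col` is a hypothetical helper, not the function the patch touches.

```cpp
#include <cstdlib>
#include <cstring>

// Extracts two numeric fields from a source-location string without
// allocating or copying. Assumed layout (an assumption for this sketch):
// ";file;func;line;col;;". The caller must pass non-null pointers.
// Returns false if the string has fewer fields than expected.
bool parse_line_col(const char *psource, int *line, int *col) {
  const char *p = psource;
  // Skip past three semicolons: the leading one, the one after the file
  // name, and the one after the function name.
  for (int i = 0; i < 3; ++i) {
    p = std::strchr(p, ';');
    if (p == nullptr) return false;
    ++p;
  }
  *line = std::atoi(p); // atoi stops at the next ';'
  p = std::strchr(p, ';');
  if (p == nullptr) return false;
  *col = std::atoi(p + 1);
  return true;
}
```

For the input `";foo.c;bar;12;34;;"` this returns true with `line` set to 12 and `col` set to 34, and it never touches the heap.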
* Remove unused wait/release code. | Jonathan Peyton | 2016-06-14 | 4 | -44/+0
  Cleanup: unused code removal. TODO: consider also removing the kmp_wait_64 and kmp_release_64 routines (replacing them with flag class methods).
  Patch by Andrey Churbanov
  Differential Revision: http://reviews.llvm.org/D21332
  llvm-svn: 272697
* Whitespace cleanup of dllexports | Jonathan Peyton | 2016-06-14 | 1 | -2/+2
  Differential Revision: http://reviews.llvm.org/D21331
  llvm-svn: 272691
* Renaming change: 41 -> 45 and 4.1 -> 4.5 | Jonathan Peyton | 2016-06-14 | 25 | -89/+93
  OpenMP 4.1 is now OpenMP 4.5. Any mention of 41 or 4.1 is replaced with 45 or 4.5. Also, if the CMake option LIBOMP_OMP_VERSION is 41, CMake warns that 41 is deprecated and that 45 should be used instead.
  llvm-svn: 272687
* Bug fix for Bugzilla bug 26602: Remove function bodies with KMP_ASSERT(0) | Jonathan Peyton | 2016-06-13 | 1 | -4/+4
  Fix for Bugzilla https://llvm.org/bugs/show_bug.cgi?id=26602. Removed function bodies that consisted only of a KMP_ASSERT(0) statement. A possible runtime crash is thus converted into a compile-time error, which is preferable (earlier error detection). TODO: consider C++11 static_assert as an alternative that could improve the diagnostics.
  Patch by Andrey Churbanov
  Differential Revision: http://reviews.llvm.org/D21304
  llvm-svn: 272590
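The static_assert alternative mentioned in the TODO could look something like the sketch below. `flag_width_bits` and `always_false` are hypothetical names for this illustration and do not correspond to any routine in the patch; the point is only that an unsupported instantiation fails at compile time with a readable message instead of asserting at run time.

```cpp
#include <cstdint>

// Helper that is false for every T but still depends on T, so the
// static_assert below fires only when the primary template is instantiated.
template <typename T> struct always_false {
  static constexpr bool value = false;
};

// Primary template: any type without a specialization is rejected at
// compile time.
template <typename T> int flag_width_bits() {
  static_assert(always_false<T>::value,
                "flag_width_bits: unsupported flag type");
  return 0;
}

// Supported types get explicit specializations.
template <> inline int flag_width_bits<std::uint32_t>() { return 32; }
template <> inline int flag_width_bits<std::uint64_t>() { return 64; }
```

With this in place, `flag_width_bits<std::uint32_t>()` compiles and returns 32, while `flag_width_bits<char>()` stops the build with the message given in the static_assert.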
* Affinity mask processing improvements | Jonathan Peyton | 2016-06-13 | 4 | -57/+56
  Remove the static specifier from the variable fullMask and remove the kmp_get_fullMask() routine. When iterating through procs in a mask, always check whether the proc is in fullMask (this check was missing in a few places).
  Patch by Brian Bliss.
  Differential Revision: http://reviews.llvm.org/D21300
  llvm-svn: 272589
* Exclude untied tasks from task stealing constraint | Jonathan Peyton | 2016-06-13 | 1 | -2/+2
  If either current_task or new_task is untied, skip the task scheduling constraint checks, because untied tasks are not affected by the task scheduling constraints.
  Differential Revision: http://reviews.llvm.org/D21196
  llvm-svn: 272570
* Fix crash when libomp loaded/unloaded multiple times | Jonathan Peyton | 2016-06-13 | 1 | -38/+23
  The problem scenario is the following: a dynamic library, libfoo.so, depends on libomp.so (it creates a parallel region and calls some omp functions). An application has a loop in which it dynamically loads libfoo.so, calls a function from it, and unloads libfoo.so. After several loop iterations the application crashes with a message about lack of resources:
    OMP: Error #34: System unable to allocate necessary resources for OMP thread:
  The problem is that pthread_kill() was not followed by pthread_join() in the case of a terminated thread. This patch fixes the problem for both worker and monitor threads.
  Differential Revision: http://reviews.llvm.org/D21200
  llvm-svn: 272567
* Hwloc refactoring patch | Jonathan Peyton | 2016-06-13 | 2 | -131/+135
  These changes remove the hwloc_topology_ignore_type function, which doesn't exist in the hwloc 2.0 API. In the existing code, the topology extracted from hwloc has the cache levels stripped out, and the code then assumes the final stripped topology follows the typical three-level topology: packages -> cores -> HW threads. But the code performs unclean manipulations to determine at what level those resources are located and also assumes too much about what hwloc is detecting (there could be intermediate levels in between socket and core, for instance).
  This new way of extracting the topology doesn't strip out any hardware objects that hwloc detects. It does not assume the three-level topology, and instead searches for the relevant three levels within the topology for each bit of information using hwloc interface functions. That is, the three-level topology subset that our affinity code is interested in is extracted from the hwloc topology tree directly. For example, the new __kmp_hwloc_get_nobjs_under_obj function reliably gives the user the number of cores under a socket without worrying whether there are unexpected objects between the socket object and the core object in the hwloc topology structure.
  Also, now that all topology information is kept, caches and NUMA nodes could be used to determine more sophisticated affinity settings in the future. There is also some cleanup code added for the destruction of the __kmp_hwloc_topology object.
  Differential Revision: http://reviews.llvm.org/D21195
  llvm-svn: 272565
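The idea of counting cores under a socket regardless of intermediate levels can be illustrated with a toy tree. This sketch deliberately does not use the hwloc API: `ObjType`, `TopoObj` and `count_objs_under` are hypothetical stand-ins, and the real __kmp_hwloc_get_nobjs_under_obj may be implemented quite differently.

```cpp
#include <vector>

// Hypothetical, simplified topology tree for this sketch (not hwloc types).
enum class ObjType { Package, Cache, Core, Thread };

struct TopoObj {
  ObjType type;
  std::vector<TopoObj> children;
};

// Counts descendants of `root` whose type is `wanted`, at any depth.
// Intermediate levels (for example caches between a package and its cores)
// are simply walked through, so their presence does not change the count.
int count_objs_under(const TopoObj &root, ObjType wanted) {
  int n = 0;
  for (const TopoObj &child : root.children) {
    if (child.type == wanted) {
      ++n; // matched: objects of a type are assumed not to nest
    } else {
      n += count_objs_under(child, wanted);
    }
  }
  return n;
}
```

A package whose two cores sit below a shared cache object reports two cores, exactly as it would if the cache level were absent.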
* Fix bitmask complement operation | Jonathan Peyton | 2016-06-13 | 1 | -3/+25
  The bitmask complement operation doesn't consider the max proc id, which means something like !{0} will be translated to {1,2,3,4,...,600,601,...,1023} on a Linux system even though there aren't 600 processors on that system. With this change, the complemented bitmask is and-ed with the fullmask so that it contains only valid processors.
  Differential Revision: http://reviews.llvm.org/D21245
  llvm-svn: 272561
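The fix described above amounts to a complement followed by an intersection. A minimal sketch, assuming a fixed 1024-bit mask (`kMaxProcs`, `ProcMask` and `complement_within` are names chosen for this example, not libomp identifiers):

```cpp
#include <bitset>
#include <cstddef>

// Mask width assumed for this sketch.
constexpr std::size_t kMaxProcs = 1024;

using ProcMask = std::bitset<kMaxProcs>;

// Complements `mask`, then intersects with `full_mask` so that only
// processors that actually exist on the system remain set.
ProcMask complement_within(const ProcMask &mask, const ProcMask &full_mask) {
  return ~mask & full_mask;
}
```

On a four-processor system (bits 0 to 3 set in the full mask), complementing {0} gives {1,2,3} rather than every bit up to 1023.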
* [STATS] Add stats gathering for taskloop construct | Jonathan Peyton | 2016-06-13 | 2 | -0/+5
  llvm-svn: 272560
* Fix spelling in comment | Jonathan Peyton | 2016-06-09 | 1 | -1/+1
  llvm-svn: 272291
* Revert accidental commit to lit.cfg | Jonathan Peyton | 2016-06-09 | 1 | -3/+0
  llvm-svn: 272287
* Refactor __kmp_execute_tasks_template function | Jonathan Peyton | 2016-06-09 | 2 | -228/+108
  Refactored __kmp_execute_tasks_template to shorten it and remove code redundancy. The original code for __kmp_execute_tasks_template was very redundant, with large sections of repeated code that needed to be kept consistent and goto statements that made the control flow difficult to discern. This refactoring removes all gotos and redundancy.
  Patch by Terry Wilmarth
  Differential Revision: http://reviews.llvm.org/D20879
  llvm-svn: 272286
* kmp_lock.h: Fix VS2013 build after r271324 | Hans Wennborg | 2016-06-09 | 1 | -0/+16
  MSVC doesn't allow std::atomic<>s in a union since they don't have a trivial copy constructor. Replacing them with e.g. std::atomic_int works, but that breaks the GCC build on Linux, because calls to e.g. std::atomic_load_explicit then fail, as they expect a real std::atomic<> pointer. Fixing this with an #ifdef to unbreak the build for now.
  llvm-svn: 272271
* Fine tuning of TC* macros - small followup | Paul Osmialowski | 2016-06-01 | 1 | -1/+1
  As I replaced the no-op TCR_4 with actual code, the compiler complained when building the debug build. This patch moves the 'cast to int' to the correct place.
  Extension to Differential Revision: http://reviews.llvm.org/D19880
  llvm-svn: 271377
* Use C++11 atomics for ticket locks implementation | Paul Osmialowski | 2016-05-31 | 6 | -82/+139
  This patch replaces the use of compiler builtin atomics with C++11 atomics in the ticket lock implementation. Ticket locks are used in critical places of the runtime, e.g. in the tasking mechanism. The main reason for this change is a problem with the work stealing function on the ARM architecture, which suffered from a nasty race condition. It turned out that the root cause of the problem lies in the way ticket locks are implemented. Changing compiler builtins into C++11 atomics solves the problem.
  Two assertions were added to kmp_tasking.c. They are useful for detecting early symptoms of something going wrong with work stealing, which were among the possible outcomes of the race condition.
  Differential Revision: http://reviews.llvm.org/D19878
  llvm-svn: 271324
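For readers unfamiliar with the technique, a textbook ticket lock written with C++11 atomics looks roughly like the sketch below. This is a generic illustration with explicit acquire/release ordering, not the kmp_lock code from the patch; the class and member names are invented for the example.

```cpp
#include <atomic>

// Generic ticket lock sketch: each locker takes a ticket and waits until
// that ticket is being served, which gives first-come, first-served order.
class TicketLock {
public:
  void lock() {
    // Taking a ticket needs atomicity only; ordering comes from the
    // acquire load in the wait loop.
    const unsigned my_ticket =
        next_ticket_.fetch_add(1, std::memory_order_relaxed);
    while (now_serving_.load(std::memory_order_acquire) != my_ticket) {
      // Spin until it is our turn.
    }
  }

  void unlock() {
    // Release ordering publishes the critical section's writes to the
    // next thread that acquires the lock.
    now_serving_.fetch_add(1, std::memory_order_release);
  }

  // True while some ticket has been taken but not yet released.
  bool is_locked() const {
    return next_ticket_.load(std::memory_order_relaxed) !=
           now_serving_.load(std::memory_order_relaxed);
  }

private:
  std::atomic<unsigned> next_ticket_{0};
  std::atomic<unsigned> now_serving_{0};
};
```

The acquire/release pairing is what a weakly ordered architecture needs in order to keep critical-section accesses from drifting outside the lock, which is the kind of guarantee that is easy to get wrong with hand-rolled builtins.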
* Addition of OpenMP 4.5 feature: schedule(simd:static) | Jonathan Peyton | 2016-05-31 | 2 | -1/+28
  This patch implements the new kmp_sch_static_balanced_chunked schedule kind that the compiler will generate when it encounters schedule(simd: static). It just adds the new constant and the new switch case in __kmp_for_static_init.
  Patch by Alex Duran.
  Differential Revision: http://reviews.llvm.org/D20699
  llvm-svn: 271320
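The commit message does not spell out how a "balanced chunked" schedule divides iterations. One plausible reading, assumed here purely for illustration, is that each thread gets an equal share rounded up to a multiple of the SIMD width, with the final share clipped to the loop bound. `Range` and `balanced_chunked` are hypothetical names, and the power-of-two requirement is an assumption of this sketch.

```cpp
#include <cassert>

// Inclusive iteration range; empty when lower > upper.
struct Range {
  long lower;
  long upper;
};

// Splits iterations 0..trip_count-1 among `nth` threads so that every
// thread's share starts on a multiple of `simd_width`.
// Assumption of this sketch: simd_width is a power of two.
Range balanced_chunked(long trip_count, int nth, int tid, long simd_width) {
  assert(nth > 0 && simd_width > 0 &&
         (simd_width & (simd_width - 1)) == 0);
  // Balanced share per thread, rounded up so all iterations are covered.
  const long span = (trip_count + nth - 1) / nth;
  // Round the share up to the next multiple of the SIMD width.
  const long chunk = (span + simd_width - 1) & ~(simd_width - 1);
  Range r;
  r.lower = chunk * tid;
  r.upper = r.lower + chunk - 1;
  if (r.upper > trip_count - 1) r.upper = trip_count - 1; // clip last share
  return r;
}
```

With 100 iterations, 4 threads and a SIMD width of 8, the shares are 0..31, 32..63, 64..95 and 96..99, so only the last thread handles a partial vector.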
* Avoid deadlock with COI | Jonathan Peyton | 2016-05-31 | 4 | -28/+82
  When an asynchronous offload task is completed, COI calls the runtime to queue a "destructor task". When the task deques are full, a deadlock arises: the OpenMP threads are inside the runtime but cannot progress because the COI thread is stuck inside the runtime trying to find a slot in a deque. This patch implements a solution in which the task deques are doubled in size when a task is being queued from a COI thread.
  Differential Revision: http://reviews.llvm.org/D20733
  llvm-svn: 271319
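The "double the deque instead of waiting for a slot" idea can be sketched with a small ring buffer. `GrowableDeque` and its `allow_grow` flag are hypothetical and single-threaded for clarity; the runtime's task deques are concurrent structures and the patch's actual mechanics are not shown in the commit message.

```cpp
#include <cstddef>
#include <vector>

// Single-threaded ring buffer sketch that can double its capacity.
class GrowableDeque {
public:
  explicit GrowableDeque(std::size_t capacity)
      : buf_(capacity == 0 ? 1 : capacity), head_(0), count_(0) {}

  // Ordinary producers get `false` when the deque is full. A producer that
  // must never wait (the COI thread in the scenario above) passes
  // allow_grow = true, and the storage is doubled instead.
  bool push_back(int task, bool allow_grow) {
    if (count_ == buf_.size()) {
      if (!allow_grow) return false;
      grow();
    }
    buf_[(head_ + count_) % buf_.size()] = task;
    ++count_;
    return true;
  }

  // Removes the oldest task; returns false when the deque is empty.
  bool pop_front(int *task) {
    if (count_ == 0) return false;
    *task = buf_[head_];
    head_ = (head_ + 1) % buf_.size();
    --count_;
    return true;
  }

  std::size_t capacity() const { return buf_.size(); }

private:
  // Copies the live elements, oldest first, into storage twice as large.
  void grow() {
    std::vector<int> bigger(buf_.size() * 2);
    for (std::size_t i = 0; i < count_; ++i)
      bigger[i] = buf_[(head_ + i) % buf_.size()];
    buf_.swap(bigger);
    head_ = 0;
  }

  std::vector<int> buf_;
  std::size_t head_;
  std::size_t count_;
};
```

Growing only for the special producer keeps the common path unchanged while guaranteeing that the thread which must not block always finds a slot.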
* Offer API for setting number of loop dispatch buffers | Jonathan Peyton | 2016-05-31 | 16 | -14/+246
  The problem is the lack of dispatch buffers when thousands of loops with nowait, about 10 iterations each, are executed by hundreds of threads. Only 7 dispatch buffers are built in, but dozens or hundreds of buffers are needed. The problem can be fixed by setting KMP_MAX_DISP_BUF to a bigger value. To give users the same possibility, I changed the build-time control into a run-time one, adding an API just in case.
  This change adds an environment variable, KMP_DISP_NUM_BUFFERS, and a new API function, kmp_set_disp_num_buffers(int num_buffers). The KMP_DISP_NUM_BUFFERS environment variable works only before serial initialization, because during serial initialization we already allocate buffers for the hot team, so it is too late to change the number of buffers later (or we would need to reallocate buffers for all teams, which sounds too complicated). The kmp_set_defaults() routine does not work for this environment variable, because it calls serial initialization before reading the parameter string. So a new routine, kmp_set_disp_num_buffers(), is created so that it can set our internal global variable before the library initialization. If both the environment variable and the API are used, the environment variable wins.
  Differential Revision: http://reviews.llvm.org/D20697
  llvm-svn: 271318
* Fix storing the frame pointer for OMP-T during ppc64 microtask dispatch | Hal Finkel | 2016-05-27 | 1 | -0/+1
  Thanks to John Mellor-Crummey for reporting the omission.
  llvm-svn: 271035
* Add missing OpenMP 4.5 device entries to stubs library. | Jonathan Peyton | 2016-05-27 | 3 | -10/+119
  llvm-svn: 271006
* Fix for OMP_PROC_BIND=spread strategy | Jonathan Peyton | 2016-05-26 | 1 | -4/+14
  The OMP_PROC_BIND=spread strategy fails to assign the master thread the correct place partition after the first parallel region. Other threads in the hot team will remember their place_partition, but the master's place partition is restored to what it was before entering the parallel region. So when the hot team is used for subsequent parallel regions, the master has lost this info. This fix calls __kmp_partition_places to update only the master thread's place partition in the spread case when there are no other changes to the hot team.
  Patch by Terry Wilmarth
  Differential Revision: http://reviews.llvm.org/D20539
  llvm-svn: 270890
* Make LIBOMP_USE_ITT_NOTIFY a setting that can be enabled or disabled | Jonathan Peyton | 2016-05-26 | 4 | -5/+9
  On Blue Gene/Q, having LIBOMP_USE_ITT_NOTIFY support compiled into a statically-linked binary causes a failure at runtime because dlopen fails. This patch changes LIBOMP_USE_ITT_NOTIFY to a cacheable configuration setting that can be disabled.
  Patch by John Mellor-Crummey
  Differential Revision: http://reviews.llvm.org/D20517
  llvm-svn: 270884
* Add a test case for microtask dispatch with many arguments | Hal Finkel | 2016-05-26 | 1 | -0/+38
  This is a cleaned-up version of the test case posted in the D19879 review.
  llvm-svn: 270867
* Add an assembly __kmp_invoke_microtask for ppc64[le] | Hal Finkel | 2016-05-26 | 2 | -1/+221
  Clang no longer restricts itself to generating microtasks with a small number of arguments, and so an assembly implementation is required to prevent hitting the parameter limit present in the C implementation. This adds an implementation for ppc64[le].
  llvm-svn: 270821
* D20525: Use more general function for getting gtid which may be faster than specific one. | Andrey Churbanov | 2016-05-25 | 1 | -1/+1
  llvm-svn: 270694
* Fork performance improvements | Jonathan Peyton | 2016-05-23 | 3 | -31/+51
  Most of this consists of modifications that check for differences before updating data fields in the team struct. There is also some rearrangement of the team struct.
  Patch by Diego Caballero
  Differential Revision: http://reviews.llvm.org/D20487
  llvm-svn: 270468
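"Check for differences before updating" is a small pattern that can be sketched as below. `check_update` is a hypothetical helper written for this example. The presumed benefit, which the commit message does not state explicitly, is that skipping redundant stores avoids needlessly invalidating cache lines that other threads are reading.

```cpp
// Stores `value` into `field` only when it differs from the current
// contents. Returns true if a store actually happened.
template <typename T> bool check_update(T &field, const T &value) {
  if (field == value) return false; // unchanged: skip the write entirely
  field = value;
  return true;
}
```

For a field holding 3, `check_update(field, 3)` performs no store and returns false, while `check_update(field, 4)` writes the new value and returns true.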
* Allow unit testing on Windows | Jonathan Peyton | 2016-05-23 | 6 | -3/+74
  These changes allow testing on Windows using clang.exe. There are two main changes:
  1. Only link to -lm when it actually exists on the system.
  2. Create basic versions of pthread_create() and pthread_join() for Windows. They are not POSIX compliant by any stretch but will allow any existing and future tests to use pthread_create() and pthread_join() for testing interactions of libomp with OS threads.
  Differential Revision: http://reviews.llvm.org/D20391
  llvm-svn: 270464
* Changed parameter names in Fortran modules to correspond with OpenMP 4.5 specification | Jonathan Peyton | 2016-05-23 | 2 | -76/+76
  llvm-svn: 270447
* Remove trailing whitespace in src/ directory | Jonathan Peyton | 2016-05-20 | 31 | -163/+163
  This patch doesn't affect D19878's context. So D19878 still cleanly applies.
  llvm-svn: 270252
* Remove unnecessary unistd.h header from tests. | Jonathan Peyton | 2016-05-18 | 4 | -4/+0
  llvm-svn: 269987
* Remove trailing whitespace in files in doc/ directory | Jonathan Peyton | 2016-05-17 | 2 | -4/+4
  llvm-svn: 269842
* Remove trailing whitespace from tests | Jonathan Peyton | 2016-05-17 | 54 | -181/+181
  llvm-svn: 269841
* Remove trailing whitespace in files in tools/ directory | Jonathan Peyton | 2016-05-17 | 1 | -4/+4
  llvm-svn: 269837
* Remove trailing whitespace in CMake files | Jonathan Peyton | 2016-05-17 | 2 | -7/+7
  llvm-svn: 269836
* Remove trailing whitespace in READMEs, CREDITS.txt and index.html | Jonathan Peyton | 2016-05-17 | 5 | -31/+31
  llvm-svn: 269835
* Update copyright year in LICENSE.txt | Jonathan Peyton | 2016-05-17 | 1 | -2/+2
  llvm-svn: 269826
* [OpenMP Testing] Have lit.py be a valid lit executable | Jonathan Peyton | 2016-05-17 | 1 | -1/+1
  Users can use either llvm-lit (generated during llvm build) or lit.py which exists in llvm/utils/lit.
  llvm-svn: 269774
* Clean all the mess around KMP_USE_FUTEX and kmp_lock.h | Paul Osmialowski | 2016-05-16 | 5 | -13/+16
  The KMP_USE_FUTEX preprocessor definition defined in kmp_lock.h is used inconsistently throughout the LLVM libomp code.
  * Some .c files that use this define do not include the kmp_lock.h file; in effect, the guarded parts of the code are never compiled.
  * Some places in the code use architecture-dependent preprocessor logic expressions which effectively disable the use of futex on the AArch64 architecture; all these places should use '#if KMP_USE_FUTEX' instead to avoid any further confusion.
  * Some places use KMP_HAS_FUTEX, which is defined nowhere; KMP_USE_FUTEX should be used instead.
  Differential Revision: http://reviews.llvm.org/D19629
  llvm-svn: 269642
* NFC fix indent (relates to my previous commit) | Paul Osmialowski | 2016-05-13 | 1 | -3/+3
  llvm-svn: 269443
* Solve 'Too many args to microtask' problem | Paul Osmialowski | 2016-05-13 | 2 | -3/+144
  This patch solves the 'Too many args to microtask' problem which occurs while executing the lulesh2.0.3 benchmark on AArch64. To solve this I had to write an AArch64 assembly version of the __kmp_invoke_microtask() function, similar to the x86 and x86_64 implementations.
  Differential Revision: http://reviews.llvm.org/D19879
  llvm-svn: 269399
* Adding new kmp_aligned_malloc() entry point | Jonathan Peyton | 2016-05-12 | 19 | -6/+227
  This change adds a new entry point, kmp_aligned_malloc(size_t size, size_t alignment), an entry point corresponding to kmp_malloc() but with the capability to return aligned memory as well. Other allocator routines have been adjusted so that kmp_free() can be used for freeing memory blocks allocated by any kmp_*alloc() routine, including the new kmp_aligned_malloc() routine.
  Differential Revision: http://reviews.llvm.org/D19814
  llvm-svn: 269365
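One common way to let a single free routine handle aligned and unaligned blocks alike is to stash the original pointer just before the block handed to the caller. The sketch below shows that general technique under stated assumptions; `aligned_malloc_sketch` and `free_sketch` are hypothetical names, and nothing here is taken from the libomp implementation, which may work differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Returns `size` bytes aligned to `alignment` (a power of two), or nullptr
// on failure. Overflow of the size computation is not checked in this sketch.
void *aligned_malloc_sketch(std::size_t size, std::size_t alignment) {
  if (alignment == 0 || (alignment & (alignment - 1)) != 0) return nullptr;
  // Keep the stashed pointer itself properly aligned.
  if (alignment < sizeof(void *)) alignment = sizeof(void *);
  // Room for the payload, worst-case padding, and the stashed pointer.
  void *raw = std::malloc(size + alignment + sizeof(void *));
  if (raw == nullptr) return nullptr;
  const std::uintptr_t start =
      reinterpret_cast<std::uintptr_t>(raw) + sizeof(void *);
  const std::uintptr_t aligned =
      (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
  // Remember where the real allocation begins, just below the user block.
  reinterpret_cast<void **>(aligned)[-1] = raw;
  return reinterpret_cast<void *>(aligned);
}

// Frees a block obtained from aligned_malloc_sketch by recovering the
// original pointer stored in front of it.
void free_sketch(void *ptr) {
  if (ptr != nullptr) std::free(static_cast<void **>(ptr)[-1]);
}
```

If every allocation routine in a family stores the same kind of header, one free function can release any of them, which is the property the commit message describes for kmp_free().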