bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[OpenMP] NFC: Fix trivial typos in comments	Kazuaki Ishizaki	2020-01-07	1	-1/+1
	Reviewers: jdoerfert, Jim Reviewed By: Jim Subscribers: Jim, mgorny, guansong, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D72285
*	[OpenMP] NFC: Fix trivial typos in comments	Kelvin Li	2020-01-03	1	-1/+1
	Submitted by: kiszk Differential Revision: https://reviews.llvm.org/D72171
*	[OpenMP 5.0] Deprecate nest-var and associated features	Jonathan Peyton	2019-02-28	1	-2/+1
	Nest-var, OMP_NESTED, omp_set_nested(), and omp_get_nested() have been deprecated in the 5.0 spec. Initial nesting info is now derived from OMP_MAX_ACTIVE_LEVELS, OMP_NUM_THREADS, and OMP_PROC_BIND. This patch deprecates the internal ICV that corresponds to nest-var, and replaces it with the max-active-levels-var ICV to determine nesting. The change still allows for use of OMP_NESTED (according to 5.0 changes), omp_get_nested(), and omp_set_nested(), which have had deprecation messages added to them. The change allows certain settings of OMP_NUM_THREADS, OMP_PROC_BIND, and OMP_MAX_ACTIVE_LEVELS to turn on nesting, but OMP_NESTED=0 will still force nesting to be off. The runtime now prints informative messages about the deprecation of OMP_NESTED, omp_set_nested(), and omp_get_nested() when that environment variable or those routines are used. It also prints a deprecation message for OMP_NESTED in the output of KMP_SETTINGS and OMP_DISPLAY_ENV. This patch also fixes the OMP_DISPLAY_ENV output for OMP_TARGET_OFFLOAD. Patch by Terry Wilmarth Differential Revision: https://reviews.llvm.org/D58408 llvm-svn: 355138
*	Update more file headers across all of the LLVM projects in the monorepo	Chandler Carruth	2019-01-19	1	-4/+3
	to reflect the new license. These used slightly different spellings that defeated my regular expressions. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351648
*	[OpenMP] Implement OpenMP 5.0 affinity format functionality	Jonathan Peyton	2018-12-13	1	-0/+1
	This patch adds the affinity format functionality introduced in OpenMP 5.0.
	This patch adds two new environment variables:
	  OMP_DISPLAY_AFFINITY=TRUE|FALSE
	  OMP_AFFINITY_FORMAT=<string>
	and four new API functions:
	  1) omp_set_affinity_format()
	  2) omp_get_affinity_format()
	  3) omp_display_affinity()
	  4) omp_capture_affinity()
	The affinity format functionality has two ICVs associated with it: affinity-display-var (bool) and affinity-format-var (string). The affinity-display-var enables/disables the functionality through the envirable OMP_DISPLAY_AFFINITY. The affinity-format-var is a format string whose special field types begin with a '%' character, similar to printf. For example, the affinity-format-var could be:
	  "OMP: host:%H pid:%P OStid:%i num_threads:%N thread_num:%n affinity:{%A}"
	The affinity-format-var is displayed by every thread implicitly at the beginning of a parallel region when any thread's affinity has changed (including a brand new thread being spawned), or explicitly using the omp_display_affinity() API. The omp_capture_affinity() function can capture the affinity-format-var in a char buffer, and omp_set|get_affinity_format() allow the user to set|get the affinity-format-var explicitly at runtime. omp_capture_affinity() and omp_get_affinity_format() both return the number of characters needed to hold the entire string they tried to make (not including the terminating null character). If not enough buffer space is available, both functions truncate their output.
	Differential Revision: https://reviews.llvm.org/D55148 llvm-svn: 349089
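	The field expansion and the "return the needed length, truncate the output" behaviour described above can be modelled in a few lines. This is a minimal Python sketch for illustration only, not the runtime's implementation: the function names and the field values are hypothetical, only single-letter fields are handled, and it assumes a C-style buffer that reserves one slot for the terminator.

```python
def expand_affinity_format(fmt, fields):
    """Expand single-letter %X fields in fmt; unknown fields are left untouched."""
    out = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%" and i + 1 < len(fmt):
            key = fmt[i + 1]
            out.append(str(fields.get(key, "%" + key)))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def capture(fmt, fields, size):
    """Model of a capture call with a buffer of `size` characters.

    Returns (needed, stored): `needed` is the length of the full string,
    `stored` is what fits once one slot is reserved for the terminator
    (an assumption of this sketch).
    """
    full = expand_affinity_format(fmt, fields)
    stored = full[: max(size - 1, 0)]
    return len(full), stored


# Hypothetical values for one thread; real values come from the runtime.
fields = {"H": "node01", "P": 1234, "i": 1240, "N": 4, "n": 0, "A": "0-3"}
fmt = "OMP: host:%H pid:%P OStid:%i num_threads:%N thread_num:%n affinity:{%A}"

full = expand_affinity_format(fmt, fields)
needed, stored = capture(fmt, fields, 10)
print(full)
print(needed, repr(stored))
```

	A caller that sees `needed` greater than or equal to `size` knows the output was cut short and can retry with a larger buffer.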
*	[OpenMP] Initial implementation of OMP 5.0 Memory Management routines	Jonathan Peyton	2018-09-07	1	-0/+1
	Implemented omp_alloc, omp_free, omp_{set,get}_default_allocator entries, and the OMP_ALLOCATOR environment variable. Added support for HBW memory on Linux if the libmemkind.so library is accessible (dynamic library only, no support for static libraries). Only the stable API (hbwmalloc) of the memkind library is used, though we may consider using the experimental API in the future. The ICV def-allocator-var is implemented per implicit task, similar to place-partition-var. In the absence of a requested allocator, the runtime uses the default allocator. Predefined allocators (the only ones currently available) are made similar for C and Fortran: pointers (long integers) with values 1 to 8. Patch by Andrey Churbanov Differential Revision: https://reviews.llvm.org/D51232 llvm-svn: 341687
*	[OpenMP] Introduce hierarchical scheduling	Jonathan Peyton	2018-07-09	1	-0/+1
	This patch introduces the logic implementing hierarchical scheduling. First and foremost, hierarchical scheduling is off by default. To enable it, use -DLIBOMP_USE_HIER_SCHED=On during CMake's configure stage. This work is based on the IWOMP paper "Workstealing and Nested Parallelism in SMP Systems".
	Hierarchical scheduling is the layering of OpenMP schedules for different layers of the memory hierarchy. One can have multiple layers between the threads and the global iteration space. The threads will go up the hierarchy to grab iterations, possibly using a different schedule and chunk for each layer.
	  [ Global iteration space (0-999) ]
	  (use static)
	  [ L1 | L1 | L1 | L1 ]
	  (use dynamic,1)
	  [ T0 T1 | T2 T3 | T4 T5 | T6 T7 ]
	In the example shown above, there are 8 threads and 4 L1 caches being targeted. If the topology indicates that there are two threads per core, then two consecutive threads will share the data of one L1 cache unit. This example would have the iteration space (0-999) split statically across the four L1 caches (so the first L1 would get (0-249), the second would get (250-499), etc). Then the threads will use a dynamic,1 schedule to grab iterations from the L1 cache units.
	There are currently four supported layers: L1, L2, L3, NUMA.
	OMP_SCHEDULE can now read a hierarchical schedule with this syntax:
	  OMP_SCHEDULE='EXPERIMENTAL LAYER,SCHED[,CHUNK][:LAYER,SCHED[,CHUNK]...]:SCHED,CHUNK'
	OMP_SCHEDULE can still read the normal SCHED,CHUNK syntax from before.
	I've kept most of the hierarchical scheduling logic inside kmp_dispatch_hier.h to try to keep it separate from the rest of the code.
	Differential Revision: https://reviews.llvm.org/D47962 llvm-svn: 336571
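	The arithmetic of the L1 example can be checked with a small model. The following Python sketch only illustrates the two layers described above (a static split across L1 units, then dynamic,1 inside each unit); it is not the code in kmp_dispatch_hier.h, and the names `static_split`, `l1_unit_of`, and `L1Unit` are hypothetical.

```python
def static_split(total_iterations, num_units):
    """Split [0, total_iterations) into contiguous blocks, one per unit.

    Returns a list of (first, last) pairs with inclusive bounds. Any
    remainder is spread over the first units (an assumption of this sketch).
    """
    base, extra = divmod(total_iterations, num_units)
    blocks = []
    start = 0
    for unit in range(num_units):
        size = base + (1 if unit < extra else 0)
        blocks.append((start, start + size - 1))
        start += size
    return blocks


def l1_unit_of(thread_id, threads_per_core=2):
    """Two consecutive threads share one L1 unit, as in the example."""
    return thread_id // threads_per_core


class L1Unit:
    """Holds the block assigned to one L1 unit and hands it out in chunks."""

    def __init__(self, first, last):
        self.next_iter = first
        self.last = last

    def grab(self, chunk=1):
        """Return the next (first, last) chunk, or None when the block is used up."""
        if self.next_iter > self.last:
            return None
        lo = self.next_iter
        hi = min(lo + chunk - 1, self.last)
        self.next_iter = hi + 1
        return (lo, hi)


blocks = static_split(1000, 4)
print(blocks)  # [(0, 249), (250, 499), (500, 749), (750, 999)]

# Threads T2 and T3 both map to the second L1 unit and take turns grabbing
# single iterations from its block (dynamic,1).
unit = L1Unit(*blocks[l1_unit_of(2)])
first_grab = unit.grab()
second_grab = unit.grab()
```

	A real runtime would need the `grab` step to be atomic across threads; the sketch is single-threaded and leaves that out.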
*	Fix trademarks found by scanner	Jonathan Peyton	2018-01-04	1	-1/+1
	llvm-svn: 321827
*	Remove unused positional argument for printf	Joachim Protze	2017-12-22	1	-1/+1
	The format string for hints only prints the second argument (string) and drops the first argument (hint id). Depending on how you read the POSIX text for printf, this could be valid. But for practical reasons, i.e., unpacking the va_list passed to printf based on the formatting information, it makes sense to fix the implementation and not pass the id for the hint. Failing testcases were: misc_bugs/teams-reduction.c ompt/parallel/not_enough_threads.c Differential Revision: https://reviews.llvm.org/D41504 llvm-svn: 321361
*	Extension of HWLOC topology discovery with NUMA nodes and tiles	Andrey Churbanov	2017-11-30	1	-0/+4
	Patch by Olga Malysheva Differential Revision: https://reviews.llvm.org/D40309 llvm-svn: 319422
*	Warning is emitted when tiles are requested but cannot be used	Jonathan Peyton	2017-11-29	1	-1/+3
	Added two warnings: 1) Before building the topology map, check if tiles are requested but the topo method is not hwloc; 2) After building the topology map, check if tiles are requested but not detected by the library. Patch by Olga Malysheva Differential Revision: https://reviews.llvm.org/D40340 llvm-svn: 319374
*	[OMPT] Fix assertion for OpenMP code generated with outdated compilers	Joachim Protze	2017-11-10	1	-0/+2
	For up-to-date compilers, this assertion is reasonable, but it breaks compatibility with the typical compiler installed on most systems. This patch changes the default value to what we had when there was no compiler support. A warning about the outdated compiler is printed during runtime, when this point is reached. Differential Revision: https://reviews.llvm.org/D39890 llvm-svn: 317928
*	Add new envirable KMP_TEAMS_THREAD_LIMIT	Jonathan Peyton	2017-08-02	1	-2/+2
	This change adds a new environment variable, KMP_TEAMS_THREAD_LIMIT, which is used to set a new global variable, __kmp_teams_max_nth, which is checked when determining the size and quantity of teams that will be created in the teams construct. Specifically, it is a limit on the total number of threads in a given teams construct. It differentiates the limits for the teams construct from the limits for regular parallel regions (KMP_DEVICE_THREAD_LIMIT/__kmp_max_nth and OMP_THREAD_LIMIT/__kmp_cg_max_nth). When each individual team is formed, it is still subject to those limits. After the clauses to the teams construct are parsed and calculated, we check to make sure we are within this limit, and if not, reduce num_threads per team and/or number of teams, accordingly. The default value is set to the number of available processors on the system. Patch by Terry Wilmarth Differential Revision: https://reviews.llvm.org/D36009 llvm-svn: 309874
*	Fix implementation of OMP_THREAD_LIMIT	Jonathan Peyton	2017-07-27	1	-1/+1
	This change fixes the implementation of OMP_THREAD_LIMIT. Previously the limit was not restricted to a contention group (but it should be, according to the spec), and this is fixed here. A field is added to the root thread to store a counter of the threads in the contention group. An extra check is added when reserving threads for a parallel region that compares this counter to threadlimit-var, which is implemented as a new global variable, __kmp_cg_max_nth. Associated settings changes were also made, and comments that referred to OMP_THREAD_LIMIT but should refer to the new KMP_DEVICE_THREAD_LIMIT (added in an earlier patch) were cleaned up. Patch by Terry Wilmarth Differential Revision: https://reviews.llvm.org/D35912 llvm-svn: 309319
*	Fix wrong website in messages	Jonathan Peyton	2017-07-05	1	-2/+2
	Address user message bug where the messages were sending users to Intel's website instead of the LLVM OpenMP runtime websites. Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=32892 Differential Revision: https://reviews.llvm.org/D35018 llvm-svn: 307206
*	KMP_HW_SUBSET extended with NUMA support when HWLOC enabled	Andrey Churbanov	2017-04-13	1	-3/+7
	Differential Revision: https://reviews.llvm.org/D31600 llvm-svn: 300220
*	Printing OS thread id, when KMP_AFFINITY is set.	Jonathan Peyton	2017-01-27	1	-1/+1
	Patch by Vishakha Agrawal Differential Revision: https://reviews.llvm.org/D28873 llvm-svn: 293315
*	Use 'critical' reduction method when 'atomic' is not available but requested.	Jonathan Peyton	2016-09-02	1	-1/+2
	If the atomic reduction method is not available (the compiler can't generate it), an assertion failure occurred when KMP_FORCE_REDUCTION=atomic was specified. This change replaces the assertion with a warning and sets the reduction method to the default one, 'critical'. Patch by Olga Malysheva Differential Revision: https://reviews.llvm.org/D23990 llvm-svn: 280519
*	Deprecate KMP_PLACE_THREADS and rename as KMP_HW_SUBSET	Jonathan Peyton	2016-06-16	1	-8/+9
	Deprecate KMP_PLACE_THREADS and rename it to KMP_HW_SUBSET due to confusion about its purpose and function among users. KMP_HW_SUBSET is an environment variable which allows users to easily pick a subset of the hardware topology to use, e.g., KMP_HW_SUBSET=30c,2t means use 30 cores, 2 threads per core. Patch by Andrey Churbanov Differential Revision: http://reviews.llvm.org/D21340 llvm-svn: 272937
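	To make the 30c,2t example concrete, here is a minimal Python sketch that reads a value of that shape. It is an illustration under stated assumptions, not the runtime's parser: `parse_hw_subset` is a hypothetical name, and only the comma-separated "count plus letter" form from the example is handled (the letters c and t are taken from the message above; offsets and other layers are out of scope).

```python
import re

# One item is a decimal count followed by a single letter, e.g. "30c".
_ITEM = re.compile(r"\s*(\d+)\s*([a-zA-Z])\s*$")


def parse_hw_subset(value):
    """Parse a string such as '30c,2t' into {'c': 30, 't': 2}."""
    result = {}
    for item in value.split(","):
        match = _ITEM.match(item)
        if match is None:
            raise ValueError("cannot parse item: %r" % item)
        count = int(match.group(1))
        letter = match.group(2).lower()
        result[letter] = count
    return result


subset = parse_hw_subset("30c,2t")
# 30 cores with 2 threads per core gives 60 hardware threads in total.
hw_threads = subset["c"] * subset["t"]
print(subset, hw_threads)
```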
*	Offer API for setting number of loop dispatch buffers	Jonathan Peyton	2016-05-31	1	-0/+1
	The problem is the lack of dispatch buffers when thousands of loops with nowait, about 10 iterations each, are executed by hundreds of threads. We only have 7 built-in dispatch buffers, but there is a need for dozens or hundreds of buffers. The problem can be fixed by setting KMP_MAX_DISP_BUF to a bigger value. In order to give users the same possibility, I changed the build-time control into a run-time one, adding an API just in case. This change adds an environment variable KMP_DISP_NUM_BUFFERS and a new API function kmp_set_disp_num_buffers(int num_buffers). The KMP_DISP_NUM_BUFFERS envirable works only before serial initialization, because during the serial initialization we already allocate buffers for the hot team, so it is too late to change the number of buffers later (or we would need to reallocate buffers for all teams, which sounds too complicated). The kmp_set_defaults() routine does not work for this envirable, because it calls serial initialization before reading the parameter string. So a new routine, kmp_set_disp_num_buffers(), is created so that it can set our internal global variable before the library initialization. If both the envirable and the API are used, the envirable wins. Differential Revision: http://reviews.llvm.org/D20697 llvm-svn: 271318
*	Adding Hwloc library option for affinity mechanism	Jonathan Peyton	2015-11-30	1	-0/+3
	These changes allow libhwloc to be used as the topology discovery/affinity mechanism for libomp. It is supported on Unices. The code additions:
	* Canonicalize KMP_CPU_* interface macros so bitmask operations are implementation independent and work with both hwloc bitmaps and libomp bitmaps. So there are new KMP_CPU_ALLOC_* and KMP_CPU_ITERATE() macros and the like. These are all in kmp.h and appropriately placed.
	* Hwloc topology discovery code in kmp_affinity.cpp. This uses the hwloc interface to create a libomp address2os object which the rest of libomp knows how to handle already.
	* To build, use -DLIBOMP_USE_HWLOC=on and -DLIBOMP_HWLOC_INSTALL_DIR=/path/to/install/dir [default /usr/local]. If CMake can't find the library or hwloc.h, then it will tell you and exit.
	Differential Revision: http://reviews.llvm.org/D13991 llvm-svn: 254320
*	Removed '@' from delimiters, added it as offset designator.	Jonathan Peyton	2015-10-20	1	-1/+2
	Moved '@' from delimiters to offset designators for the KMP_PLACE_THREADS environment variable. Only one of: postfix "o" or prefix @, should be used in the value of KMP_PLACE_THREADS. For example, '2s@2,4c@2,1t'. This is also the format of KMP_SETTINGS=1 output now (removed "o" from there). e.g., 2s,2o,4c,2o,1t. Differential Revision: http://reviews.llvm.org/D13701 llvm-svn: 250846
*	Added sockets to the syntax of KMP_PLACE_THREADS environment variable.	Jonathan Peyton	2015-10-08	1	-1/+3
	Added (optional) sockets to the syntax of the KMP_PLACE_THREADS environment variable. Some limitations:
	* The number of sockets and then optional offset should be specified first (before other parameters).
	* The letter designation is mandatory for sockets and then for other parameters.
	* If number of cores is specified first, then the number of sockets is defaulted to all sockets on the machine; also, the old syntax is partially supported if sockets are skipped.
	* If number of threads per core is specified first, then the number of sockets and cores per socket are defaulted to all sockets and all cores per socket respectively.
	* The number of cores per socket cannot be specified before sockets or after threads per core.
	* The number of threads per core can be specified before or after core-offset (old syntax required it to be before core-offset).
	* Parameters delimiter can be: empty, comma, lower-case x.
	* Spaces are allowed around numbers, around letters, around delimiter.
	Approximate shorthand specification:
	  KMP_PLACE_THREADS="[num_sockets(S|s)[[delim]offset(O|o)][delim]][num_cores_per_socket(C|c)[[delim]offset(O|o)][delim]][num_threads_per_core(T|t)]"
	Differential Revision: http://reviews.llvm.org/D13175 llvm-svn: 249708
*	Get rid of some dead code.	Jonathan Peyton	2015-06-02	1	-8/+7
	Some old references to RML and IOMP which aren't used anywhere are deleted. http://lists.cs.uiuc.edu/pipermail/openmp-dev/2015-June/000664.html Patch by Jack Howarth and Jonathan Peyton llvm-svn: 238878
*	cleanup: removed unused function __kmp_change_thread_affinity_mask	Andrey Churbanov	2015-03-10	1	-1/+1
	llvm-svn: 231778
*	Two warning messages fixed.	Andrey Churbanov	2015-02-20	1	-4/+4
	llvm-svn: 230035
*	Comments only: removing the Revision and Date svn variables from the top of all the source files.	Andrey Churbanov	2015-01-27	1	-2/+0
	llvm-svn: 227207
*	I apologise in advance for the size of this check-in. At Intel we do	Jim Cownie	2014-10-07	1	-7/+15
	understand that this is not friendly, and are working to change our internal code-development to make it easier to make development features available more frequently and in finer (more functional) chunks. Unfortunately we haven't got that in place yet, and unpicking this into multiple separate check-ins would be non-trivial, so please bear with me on this one. We should be better in the future.
	Apologies over, what do we have here?
	GCC 4.9 compatibility
	---------------------
	* We have implemented the new entrypoints used by code compiled by GCC 4.9 to implement the same functionality in gcc 4.8. Therefore code compiled with gcc 4.9 that used to work will continue to do so. However, there are some other new entrypoints (associated with task cancellation) which are not implemented. Therefore user code compiled by gcc 4.9 that uses these new features will not link against the LLVM runtime. (It remains unclear how to handle those entrypoints, since the GCC interface has potentially unpleasant performance implications for join barriers even when cancellation is not used.)
	--- new parallel entry points ---
	new entry points that aren't OpenMP 4.0 related. These are implemented fully:
	  GOMP_parallel_loop_dynamic()
	  GOMP_parallel_loop_guided()
	  GOMP_parallel_loop_runtime()
	  GOMP_parallel_loop_static()
	  GOMP_parallel_sections()
	  GOMP_parallel()
	--- cancellation entry points ---
	Currently, these only give a runtime error if OMP_CANCELLATION is true, because our plain barriers don't check for cancellation while waiting:
	  GOMP_barrier_cancel()
	  GOMP_cancel()
	  GOMP_cancellation_point()
	  GOMP_loop_end_cancel()
	  GOMP_sections_end_cancel()
	--- taskgroup entry points ---
	These are implemented fully:
	  GOMP_taskgroup_start()
	  GOMP_taskgroup_end()
	--- target entry points ---
	These are empty (as they are in libgomp):
	  GOMP_target()
	  GOMP_target_data()
	  GOMP_target_end_data()
	  GOMP_target_update()
	  GOMP_teams()
	Improvements in Barriers and Fork/Join
	--------------------------------------
	* Barrier and fork/join code is now in its own file (which makes it easier to understand and modify).
	* Wait/release code is now templated and in its own file; suspend/resume code is also templated.
	* There's a new, hierarchical, barrier, which exploits the cache-hierarchy of the Intel(r) Xeon Phi(tm) coprocessor to improve fork/join and barrier performance.
	*BEWARE* the new source files have not been added to the legacy CMake build system. If you want to use that, fixes will be required.
	Statistics Collection Code
	--------------------------
	* New code has been added to collect application statistics (if this is enabled at library compile time; by default it is not). The statistics code itself is generally useful; the lightweight timing code uses the X86 rdtsc instruction, so it will require changes for other architectures. The intent of this code is not for users to tune their codes but rather 1) for timing code-paths inside the runtime, and 2) for gathering general properties of OpenMP codes to focus attention on which OpenMP features are most used.
	Nested Hot Teams
	----------------
	* The runtime now maintains more state to reduce the overhead of creating and destroying inner parallel teams. This improves the performance of code that repeatedly uses nested parallelism with the same resource allocation. Set the new KMP_HOT_TEAMS_MAX_LEVEL envirable to a depth to enable this (and, of course, OMP_NESTED=true to enable nested parallelism at all).
	Improved Intel(r) VTune(tm) Amplifier support
	---------------------------------------------
	* The runtime provides additional information to VTune via the itt_notify interface to allow it to display better OpenMP specific analyses of load-imbalance.
	Support for OpenMP Composite Statements
	---------------------------------------
	* Implement new entrypoints required by some of the OpenMP 4.1 composite statements.
	Improved ifdefs
	---------------
	* More separation of concepts ("Does this platform do X?") from platforms ("Are we compiling for platform Y?"), which should simplify future porting.
	ScaleMP* contribution
	---------------------
	Stack padding to improve the performance in their environment where cross-node coherency is managed at the page level.
	Redesign of wait and release code
	---------------------------------
	The code is simplified and performance improved.
	Bug Fixes
	---------
	Fixes for Windows multiple processor groups.
	Fix Fortran module build on Linux: offload attribute added.
	Fix entry names for distribute-parallel-loop construct to be consistent with the compiler codegen.
	Fix an inconsistent error message for KMP_PLACE_THREADS environment variable.
	llvm-svn: 219214
*	Fix typos	Alp Toker	2014-02-24	1	-1/+1
	llvm-svn: 202018
*	First attempt to import OpenMP runtime	Jim Cownie	2013-09-27	1	-0/+464
	llvm-svn: 191506