bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[CodeGen] Generate llvm.loop.parallel_accesses instead of ↵	Michael Kruse	2018-12-20	11	-221/+228
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	llvm.mem.parallel_loop_access metadata. Instead of generating llvm.mem.parallel_loop_access metadata, generate llvm.access.group on instructions and llvm.loop.parallel_accesses on loops. There is one access group per generated loop. This is clang part of D52116/r349725. Differential Revision: https://reviews.llvm.org/D52117 llvm-svn: 349823
*	[OPENMP]Mark the loop as started when initialized.	Alexey Bataev	2018-12-19	1	-1/+12
\| \| \| \| \| \| \| \| \|	Need to mark the loop as started when the initialization statement is found. This is required to prevent possibly incorrect detection of the loop iteration variable during template instantiation and to fix a compiler crash during codegen. llvm-svn: 349657
*	[OpenMP] Fix data sharing analysis in nested clause	Joel E. Denny	2018-12-19	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Without this patch, clang doesn't complain that X needs explicit data sharing attributes in the following:
```
#pragma omp target teams default(none)
{
  #pragma omp parallel num_threads(X)
    ;
}
```
However, clang does produce that complaint after the braces are removed. With this patch, clang complains in both cases. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D55861 llvm-svn: 349635
*	[OPENMP] parsing and sema support for 'close' map-type-modifier	Kelvin Li	2018-12-18	12	-24/+170
\| \| \| \| \| \| \| \| \| \| \| \|	A map clause with the close map-type-modifier is a hint to prefer that the variables are mapped using a copy into faster memory. Patch by Ahsan Saghir (saghir) Differential Revision: https://reviews.llvm.org/D55719 llvm-svn: 349551
*	[OPENMP][NVPTX]Emit shared memory buffer for reduction as 128 bytes	Alexey Bataev	2018-12-18	10	-11/+11
\| \| \| \| \| \| \| \| \| \| \|	buffer. It seems that nvlink has a bug in its support for weakly linked symbols: it does not allow defining several shared memory buffers with different sizes, even with weak linkage. Instead, we always use a 128-byte buffer to prevent nvlink from emitting an error message. llvm-svn: 349540
*	[OPENMP][NVPTX]Added extra sync point to the inter-warp copy function.	Alexey Bataev	2018-12-18	1	-0/+4
\| \| \| \| \| \| \|	The parallel reduction operation requires an extra synchronization point in the inter-warp copy function to avoid divergence. llvm-svn: 349525
*	[OPENMP][NVPTX]Improved interwarp copy function.	Alexey Bataev	2018-12-14	2	-28/+21
\| \| \| \| \| \| \| \| \|	With the runtime inlined, the current implementation of the interwarp copy function leads to undefined behavior because its barriers are not implemented quite correctly. Start using the generic __kmpc_barrier function instead of the custom-made barriers. llvm-svn: 349192
*	[OPENMP][NVPTX] Fix globalization of the mapped array sections.	Alexey Bataev	2018-12-06	1	-19/+52
\| \| \| \| \| \| \| \| \|	If an array section is based on a pointer, is mapped in a target region, and is then used in an inner parallel region, it must also be globalized, because the pointer itself is passed by value, not by reference. llvm-svn: 348492
*	[OPENMP][NVPTX]Fixed emission of the critical region.	Alexey Bataev	2018-12-04	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	Critical regions are constructs that, generally speaking, are not supported by the NVPTX target. Instead, we use a special technique to handle them. Currently they are supported only within a loop, and all the threads in the loop must execute the same critical region. Inside these special regions, the regions must still be emitted as critical to avoid possible data races between the teams, and synchronization must use the __kmpc_barrier functions. llvm-svn: 348272
*	[OPENMP][NVPTX]Mark __kmpc_barrier functions as convergent.	Alexey Bataev	2018-12-04	1	-2/+8
\| \| \| \| \| \| \| \|	__kmpc_barrier runtime functions must be marked as convergent to prevent some dangerous optimizations. Also, for NVPTX target all barriers must be emitted as simple barriers. llvm-svn: 348271
*	Move AST tests into their own test directory; NFC.	Aaron Ballman	2018-11-30	1	-84/+0
\| \| \| \| \| \|	This moves everything primarily testing the functionality of -ast-dump and -ast-print into their own directory, rather than leaving the tests spread around the testing directory. llvm-svn: 348017
*	[OPENMP][NVPTX]Call get __kmpc_global_thread_num in worker after	Alexey Bataev	2018-11-29	1	-1/+1
\| \| \| \| \| \| \| \| \|	initialization. Function __kmpc_global_thread_num should be called only after initialization, not earlier. llvm-svn: 347919
*	[OpenMP] Add a new version of the SPMD deinit kernel function	Gheorghe-Teodor Bercea	2018-11-29	8	-20/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch adds a new runtime for the SPMD deinit kernel function which replaces the previous function. The new function takes as an argument a flag that signals whether the runtime is required or not. This enables the compiler to optimize out the parts of the deinit function that are not needed. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D54970 llvm-svn: 347915
*	[OPENMP]Fix emission of the target regions in virtual functions.	Alexey Bataev	2018-11-28	1	-0/+24
\| \| \| \| \| \| \| \|	Fixed emission of the target regions found in the virtual functions. Previously we may end up with the situation when those regions could be skipped. llvm-svn: 347793
*	[OPENMP][NVPTX]Basic support for reductions across the teams.	Alexey Bataev	2018-11-27	1	-948/+7
\| \| \| \| \| \|	Added basic codegen support for the reductions across the teams. llvm-svn: 347715
*	[OPENMP][NVPTX]Emit default locations with the correct Exec\|Runtime	Alexey Bataev	2018-11-26	3	-29/+173
\| \| \| \| \| \| \| \| \| \| \|	modes. If the region is inside target\|teams\|distribute region, we can emit the locations with the correct info for execution mode and runtime mode. Patch adds this ability to the NVPTX codegen to help the optimizer to produce better code. llvm-svn: 347583
*	[OPENMP][NVPTX]Emit default locations as constant with undefined mode.	Alexey Bataev	2018-11-21	2	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \|	For the NVPTX target, default locations should be emitted as constants, and additional info must be emitted in the reserved_2 field of the ident_t structure. The 1st bit controls the execution mode and the 2nd bit controls use of the lightweight runtime. The combination of the bits for Non-SPMD mode + lightweight runtime represents a special undefined mode, used outside of the target regions for orphaned directives or functions. This should allow additional optimization inside the target regions. llvm-svn: 347425
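The two mode bits described in this message can be sketched as a small C++ bitmask. The names `kSpmdModeBit`, `kLightweightRuntimeBit`, `isUndefinedMode`, and `makeModeFlags`, as well as the assumption that each bit is set when its feature is on, are illustrative only and are not taken from the patch.

```cpp
#include <cstdint>

// Hypothetical names and bit polarity, chosen only for illustration.
constexpr std::uint32_t kSpmdModeBit = 0x1;           // assumed: set in SPMD mode
constexpr std::uint32_t kLightweightRuntimeBit = 0x2; // assumed: set for the lightweight runtime

// "Undefined" mode as the message describes it: Non-SPMD mode combined with
// the lightweight runtime.
constexpr std::uint32_t kUndefinedMode = kLightweightRuntimeBit;

// Packs an execution mode and a runtime choice into the two low bits.
constexpr std::uint32_t makeModeFlags(bool spmd, bool lightweightRuntime) {
  return (spmd ? kSpmdModeBit : 0u) |
         (lightweightRuntime ? kLightweightRuntimeBit : 0u);
}

// Returns true when the two mode bits encode the undefined mode.
constexpr bool isUndefinedMode(std::uint32_t reserved2) {
  return (reserved2 & (kSpmdModeBit | kLightweightRuntimeBit)) == kUndefinedMode;
}
```

Under these assumptions, `makeModeFlags(false, true)` is the only combination for which `isUndefinedMode` returns true.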
*	[OPENMP]Fix handling of the LCVs in loop-based directives.	Alexey Bataev	2018-11-21	1	-0/+90
\| \| \| \| \| \| \| \| \|	Loop-control variables with the default data-sharing attributes should not be captured in the OpenMP region as they are private by default. Also, default attributes should be emitted for such variables in the inner OpenMP regions for the correct data sharing during codegen. llvm-svn: 347409
*	[OPENMP] Support relational-op != (not-equal) as one of the canonical	Kelvin Li	2018-11-21	22	-20/+123
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	forms of random access iterator In OpenMP 4.5, only 4 relational operators are supported: <, <=, >, and >=. This work is to enable support for relational operator != (not-equal) as one of the canonical forms. Patch by Anh Tuyen Tran Differential Revision: https://reviews.llvm.org/D54441 llvm-svn: 347405
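A minimal sketch of the loop shape this change concerns: a worksharing loop whose termination test uses `!=` rather than one of the four relational operators. The function name `sumWithNotEqual` is hypothetical, and an integer index is used here for brevity instead of a random access iterator.

```cpp
#include <vector>

// Sums the elements of `values` with a loop whose termination test uses `!=`.
// When OpenMP is not enabled the pragma is ignored, so the function returns
// the same result either way.
int sumWithNotEqual(const std::vector<int> &values) {
  const int n = static_cast<int>(values.size());
  int total = 0;
#pragma omp parallel for reduction(+ : total)
  for (int i = 0; i != n; ++i) {
    total += values[i];
  }
  return total;
}
```

Compiling with OpenMP enabled requires a compiler that accepts `!=` in the canonical loop form.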
*	[OpenMP] Update CHECK-DAG usage in target_parallel_codegen.cpp	Joel E. Denny	2018-11-20	1	-37/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adjusts a test not to depend on deprecated FileCheck behavior that permits overlapping matches within a block of CHECK-DAG directives. Thus, this patch also removes uses of FileCheck's -allow-deprecated-dag-overlap command-line option. There were two issues in this test: 1. There were sets of patterns for store instructions in which a pattern X could match a superset of a pattern Y. While X appeared before Y, Y's intended match appeared before X's intended match. The result was that X matched Y's intended match. Under the old overlapping behavior, Y also matched Y's intended match. Under the new non-overlapping behavior, Y had nothing left to match. This patch fixes this by gathering these sets in one place and putting the most specific patterns (Y) before the more general patterns (X). 2. The CHECK-DAG patterns involving the variables CBPADDR3 and CBPADDR4 were the same, but there was only one match in the text, so CBPADDR4 patterns had nothing to match under the new non-overlapping behavior. Moreover, a preceding related series of directives had variables (SADDR0, BPADDR0, etc.) numbered only 0 through 4, but this series had variables numbered 0 through 5. Assuming CBPADDR4's directives were not intended, this patch removes them. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D54765 llvm-svn: 347351
*	[OpenMP] Update CHECK-DAG usage in for_codegen.cpp	Joel E. Denny	2018-11-20	1	-14/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adjusts a test not to depend on deprecated FileCheck behavior that permits overlapping matches within a block of CHECK-DAG directives. Thus, this patch also removes uses of FileCheck's -allow-deprecated-dag-overlap command-line option. Specifically, the FileCheck variables DBG_LOC_START, DBG_LOC_END, and DBG_LOC_CANCEL were all set to the same value. As a result, three TERM_DEBUG-DAG patterns, one for each variable, all matched the same text under the old overlapping behavior. Under the new non-overlapping behavior, that's not permitted. This patch's solution is to replace these variables with one variable and replace these patterns with one pattern. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D54764 llvm-svn: 347350
*	[OpenMP] Check target architecture supports unified shared memory for ↵	Patrick Lyster	2018-11-19	1	-0/+25
\| \| \| \| \| \|	requires directive. Differential Review: https://reviews.llvm.org/D54493 llvm-svn: 347214
*	[OPENMP]Fix PR39694: do not capture `this` in non-`this` region.	Alexey Bataev	2018-11-16	1	-9/+18
\| \| \| \| \| \| \| \| \|	If lambda is used inside of the OpenMP region and captures `this`, we should recapture it in the OpenMP region also. But we should do this only if the OpenMP region is used in the context of the same class, just like the lambda. llvm-svn: 347096
*	[OPENMP][NVPTX]Emit correct reduction code for teams/parallel	Alexey Bataev	2018-11-16	9	-119/+414
\| \| \| \| \| \| \| \| \| \| \|	reductions. Fixed previously committed code for the reduction support in teams/parallel constructs, taking into account the new design of the NVPTX support in the compiler. Teams reductions are not fully functional yet; this will be fixed in the following patches. llvm-svn: 347081
*	[OPENMP][NVPTX]Extend number of constructs executed in SPMD mode.	Alexey Bataev	2018-11-09	7	-12/+23
\| \| \| \| \| \| \| \| \|	If the statements between target\|teams\|distribute directives do not require execution in the master thread (constant expressions, null statements, simple declarations, etc.), such a construct can be executed in SPMD mode. llvm-svn: 346551
*	[OPENMP][NVPTX]Allow to use shared memory for the	Alexey Bataev	2018-11-09	7	-42/+43
\| \| \| \| \| \| \| \| \| \|	target\|teams\|distribute variables. If the total size of the variables, declared in target\|teams\|distribute regions, is less than the maximal size of shared memory available, the buffer is allocated in the shared memory. llvm-svn: 346507
*	[OPENMP]Make lambda mapping follow reqs for PTR_AND_OBJ mapping.	Alexey Bataev	2018-11-08	1	-0/+15
\| \| \| \| \| \| \| \|	The base pointer for the lambda mapping must point to the lambda capture placement and pointer must point to the captured variable itself. Patch fixes this problem. llvm-svn: 346408
*	[OPENMP]Fix handling of the globals during compilation for the device.	Alexey Bataev	2018-11-07	3	-5/+42
\| \| \| \| \| \| \| \|	Fixed lookup for the target regions in unused virtual functions + fixed processing of the global variables not marked as declare target but emitted during debug info emission. llvm-svn: 346343
*	[OPENMP][NVPTX]Use __kmpc_data_sharing_coalesced_push_stack function.	Alexey Bataev	2018-11-02	2	-5/+5
\| \| \| \| \| \| \| \|	Coalesced memory access requires use of the new function `__kmpc_data_sharing_coalesced_push_stack` instead of the `__kmpc_data_sharing_push_stack`. llvm-svn: 345991
*	[OPENMP]Change the mapping type for lambda captures.	Alexey Bataev	2018-11-02	1	-4/+4
\| \| \| \| \| \|	The previously used combination `PTR_AND_OBJ \| PRIVATE` could be used for mapping of some data in Fortran. Changed it to `PTR_AND_OBJ \| LITERAL`. llvm-svn: 345982
*	[OPENMP][NVPTX]Improve emission of the globalized variables for	Alexey Bataev	2018-11-02	7	-14/+99
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	target/teams/distribute regions. Target/teams/distribute regions exist for the whole time the kernel is executed. Thus, if a variable is declared in their context and then escapes it, we can allocate global memory statically instead of allocating it dynamically. The patch captures all the globalized variables in target/teams/distribute contexts and merges them into records, one per target region. Those records are then joined into a union, one per compilation unit (to save global memory). Those units are organized into two-dimensional arrays, where the first dimension is the number of blocks per SM and the second one is the number of SMs. Runtime functions manage this global memory space between the executing teams. llvm-svn: 345978
*	Add support for 'atomic_default_mem_order' clause on 'requires' directive. ↵	Patrick Lyster	2018-11-02	4	-1/+53
\| \| \| \| \| \|	Also renamed test files relating to 'requires'. Differential review: https://reviews.llvm.org/D53513 llvm-svn: 345967
*	[OPENMP] Support for mapping of the lambdas in target regions.	Alexey Bataev	2018-10-30	1	-0/+132
\| \| \| \| \| \| \| \| \| \|	Added support for mapping of lambdas in the target regions. It scans all the captures by reference in the lambda, implicitly maps those variables in the target region, and later resets the addresses of the references in the lambda to the correct addresses of the captured\|privatized variables. llvm-svn: 345609
*	[OPENMP]Fix PR39372: Does not complain about loop bound variable not	Alexey Bataev	2018-10-29	50	-348/+354
\| \| \| \| \| \| \| \| \| \| \|	being shared. According to the standard, the variables with unspecified data-sharing attributes in presence of `default(none)` clause must be reported to users. Compiler did not generate error reports for the variables used in other OpenMP regions. Patch fixes this. llvm-svn: 345533
*	[OpenMP] Fix condition.	Gheorghe-Teodor Bercea	2018-10-29	3	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Iteration variable must be strictly less than the number of iterations. This fixes a bug introduced by previous patch D53448. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D53827 llvm-svn: 345527
*	[OpenMP][NVPTX] Use single loops when generating code for distribute ↵	Gheorghe-Teodor Bercea	2018-10-29	3	-177/+341
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	parallel for Summary: This patch adds a new code generation path for bound sharing directives containing distribute parallel for. The new code generation scheme applies to chunked schedules on distribute and parallel for directives. The scheme simplifies the code that is being generated by eliminating the need for an outer for loop over chunks for both distribute and parallel for directives. In the case of distribute it applies to any sized chunk while in the parallel for case it only applies when chunk size is 1. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D53448 llvm-svn: 345509
*	[OpenMP][NVPTX] Enable default scheduling for parallel for in non-SPMD cases.	Gheorghe-Teodor Bercea	2018-10-29	1	-2/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch enables the choosing of the default schedule for parallel for loops even in non-SPMD cases. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D53443 llvm-svn: 345507
*	[OPENMP] Do not capture private loop counters.	Alexey Bataev	2018-10-29	6	-9/+15
\| \| \| \| \| \| \| \|	If the loop counter is not declared in the context of the loop and it is private, such loop counters should not be captured in the outlined regions. llvm-svn: 345505
*	[NFC][OpenMP] Add new test for parallel for code generation.	Gheorghe-Teodor Bercea	2018-10-26	1	-0/+101
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is a simple test of the parallel for code generation. It will be used to showcase the change introduced by patch D53443. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D53772 llvm-svn: 345417
*	[OPENMP]Fix PR39422: variables are not firstprivatized in task context.	Alexey Bataev	2018-10-25	2	-1/+2
\| \| \| \| \| \| \| \| \| \| \|	According to the OpenMP standard, In a task generating construct, if no default clause is present, a variable for which the data-sharing attribute is not determined by the rules above is firstprivatized. Compiler tries to implement this, but if the variable is not directly used in the task context, this variable may not be firstprivatized. Patch fixes this problem. llvm-svn: 345277
*	Do not always request an implicit taskgroup region inside the kmpc_taskloop ↵	Alexey Bataev	2018-10-24	10	-34/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	function Summary: For the following code:
```
int i;
#pragma omp taskloop
for (i = 0; i < 100; ++i) {}
#pragma omp taskloop nogroup
for (i = 0; i < 100; ++i) {}
```
Clang emits the following LLVM IR:
```
...
call void @__kmpc_taskgroup(%struct.ident_t* @0, i32 %0)
%2 = call i8* @__kmpc_omp_task_alloc(%struct.ident_t* @0, i32 %0, i32 1, i64 80, i64 8, i32 (i32, i8*)* bitcast (i32 (i32, %struct.kmp_task_t_with_privates*)* @.omp_task_entry. to i32 (i32, i8*)*))
...
call void @__kmpc_taskloop(%struct.ident_t* @0, i32 %0, i8* %2, i32 1, i64* %8, i64* %9, i64 %13, i32 0, i32 0, i64 0, i8* null)
call void @__kmpc_end_taskgroup(%struct.ident_t* @0, i32 %0)
...
%15 = call i8* @__kmpc_omp_task_alloc(%struct.ident_t* @0, i32 %0, i32 1, i64 80, i64 8, i32 (i32, i8*)* bitcast (i32 (i32, %struct.kmp_task_t_with_privates.1*)* @.omp_task_entry..2 to i32 (i32, i8*)*))
...
call void @__kmpc_taskloop(%struct.ident_t* @0, i32 %0, i8* %15, i32 1, i64* %21, i64* %22, i64 %26, i32 0, i32 0, i64 0, i8* null)
```
The first set of instructions corresponds to the first taskloop construct. It is important to note that the implicit taskgroup region associated with the taskloop construct has been materialized in our IR: the `__kmpc_taskloop` call occurs inside a taskgroup region. Note also that this taskgroup region does not exist in our second taskloop because we are using the `nogroup` clause. The issue here is that the 4th argument from the end of the `__kmpc_taskloop` call is always zero. Checking the LLVM OpenMP RT implementation, we see that this argument corresponds to the nogroup parameter:
```
void __kmpc_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val, kmp_uint64 *lb, kmp_uint64 *ub, kmp_int64 st, int nogroup, int sched, kmp_uint64 grainsize, void *task_dup);
```
So basically we always tell the RT to create another taskgroup region. For the first taskloop, this means that we create two taskgroup regions. For the second taskloop, it means that despite the nogroup clause we still get a taskgroup region, so we unnecessarily wait until all descendant tasks have been executed. Reviewers: ABataev Reviewed By: ABataev Subscribers: rogfer01, cfe-commits Differential Revision: https://reviews.llvm.org/D53636 llvm-svn: 345180
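The double-taskgroup problem described in this message can be modeled with a few lines of C++. This is only a counting model under the assumptions stated in the comments; `taskgroupRegions` is a hypothetical helper, not part of clang or the OpenMP runtime.

```cpp
// Counts the taskgroup regions surrounding one taskloop under a simple model:
// - the compiler emits an explicit __kmpc_taskgroup/__kmpc_end_taskgroup pair
//   unless the source uses the `nogroup` clause;
// - the runtime opens another taskgroup when the `nogroup` argument passed to
//   __kmpc_taskloop is zero.
int taskgroupRegions(bool hasNogroupClause, int runtimeNogroupArg) {
  const int emittedByCompiler = hasNogroupClause ? 0 : 1;
  const int openedByRuntime = (runtimeNogroupArg == 0) ? 1 : 0;
  return emittedByCompiler + openedByRuntime;
}
```

With the argument fixed at zero, the model gives two regions for a plain taskloop and one for a `nogroup` taskloop, which matches the behavior the message reports. A nonzero argument gives one and zero respectively.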
*	[OPENMP]Fix PR39366: do not try to private field if it is not captured.	Alexey Bataev	2018-10-24	1	-0/+13
\| \| \| \| \| \| \| \| \|	The compiler crashes if we try to post-capture the fields implicitly captured inside of the task constructs. It seems this kind of processing is not supported, and such fields should not be firstprivatized. llvm-svn: 345177
*	Revert "[CodeGenCXX] Treat 'this' as noalias in constructors"	Sean Fertile	2018-10-15	51	-311/+311
\| \| \| \| \| \| \|	This reverts commit https://reviews.llvm.org/rL344150 which causes MachineOutliner related failures on the ppc64le multistage buildbot. llvm-svn: 344526
*	[OPENMP][NVPTX]Reduce memory usage in target region.	Alexey Bataev	2018-10-12	3	-34/+13
\| \| \| \| \| \| \|	Additional reduction of the global memory usage in the target regions without parallel regions. llvm-svn: 344413
*	[OPENMP][NVPTX]Reduce memory usage in orphaned functions.	Alexey Bataev	2018-10-12	1	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \|	If a function has globalized variables and is called in the context of target/teams/distribute regions, it does not need to globalize 32 copies of the same variables for memory coalescing; one copy is enough, because there is no parallel region. The patch does this by adding a call to the `__kmpc_parallel_level` function and checking its return value. If the code sees that the parallel level is 0, then only one variable is allocated, not 32. llvm-svn: 344356
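The decision described in this message can be sketched as follows. The parallel level is modeled as a plain integer parameter rather than a call into the runtime, and `globalizedCopies` and `kCoalescedCopies` are hypothetical names.

```cpp
#include <cstddef>

// Number of copies used for memory coalescing, per the message.
constexpr std::size_t kCoalescedCopies = 32;

// Number of copies to allocate for a globalized variable, given the parallel
// level reported by the runtime (modeled here as an int argument).
constexpr std::size_t globalizedCopies(int parallelLevel) {
  return parallelLevel == 0 ? std::size_t{1} : kCoalescedCopies;
}
```

At parallel level 0 the sketch allocates a single copy, and at any other level it allocates all 32.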
*	[OPENMP][NVPTX]Reduce memory use for globalized vars in	Alexey Bataev	2018-10-11	1	-5/+2
\| \| \| \| \| \| \| \| \| \| \|	target/teams/distribute regions. The previously introduced globalization scheme, which uses memory coalescing, may increase memory usage for the variables that are declared in target/teams/distribute contexts. We don't need 32 copies of such variables, just 1. Patch reduces memory use in this case. llvm-svn: 344273
*	Add support for 'dynamic_allocators' clause on 'requires' directive. ↵	Patrick Lyster	2018-10-11	2	-1/+9
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D53079 llvm-svn: 344249
*	[CodeGenCXX] Treat 'this' as noalias in constructors	Anton Bikineev	2018-10-10	51	-311/+311
\| \| \| \| \| \| \| \| \|	This is currently a clang extension and a resolution of the defect report in the C++ Standard. Differential Revision: https://reviews.llvm.org/D46441 llvm-svn: 344150
*	[OPENMP][NVPTX] Support memory coalescing for globalized variables.	Alexey Bataev	2018-10-09	7	-32/+66
\| \| \| \| \| \| \| \| \|	Added support for memory coalescing for better performance for globalized variables. From now on all the globalized variables are represented as arrays of 32 elements and each thread accesses these elements using `tid & 31` as index. llvm-svn: 344049
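The indexing scheme in this message can be illustrated on the host with a plain array of 32 slots. `GlobalizedSlots` and `slotIndex` are hypothetical names; the real codegen emits device code, so this is only a sketch of the `tid & 31` addressing.

```cpp
#include <array>

// Index of the slot a thread uses: the low five bits of its thread id.
constexpr unsigned slotIndex(unsigned tid) { return tid & 31u; }

// Hypothetical stand-in for one globalized variable: 32 elements, one per
// possible value of `tid & 31`.
template <typename T>
struct GlobalizedSlots {
  std::array<T, 32> slots{};

  // Returns the element that thread `tid` reads and writes.
  T &slotFor(unsigned tid) { return slots[slotIndex(tid)]; }
};
```

Threads whose ids differ by a multiple of 32 map to the same slot, so thread 33 and thread 1 share element 1 in this model.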
*	[OPENMP][NVPTX] Fix emission of __kmpc_global_thread_num() for non-SPMD	Alexey Bataev	2018-10-05	1	-2/+2
\| \| \| \| \| \| \| \| \|	mode. __kmpc_global_thread_num() should be called before initialization of the runtime. llvm-svn: 343857