[OPENMP][NVPTX] Improve emission of the globalized variables for target/teams/distribute regions.

Target/teams/distribute regions exist for the whole time the kernel is executing. Thus, if a variable is declared in their context and then escapes it, we can allocate global memory for it statically instead of dynamically. The patch captures all the globalized variables in target/teams/distribute contexts and merges them into records, one per target region. Those records are then joined into a union, one per compilation unit (to save global memory). These per-unit unions are organized into a two-dimensional array, where the first dimension is the number of blocks per SM and the second one is the number of SMs. Runtime functions manage this global memory space between the executing teams.

llvm-svn: 345978
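The layout described above can be pictured with a rough host-side C++ sketch. It is only an illustration of the idea, not the code Clang emits or the runtime's implementation: the record types, the array bounds `kBlocksPerSM` and `kNumSMs`, and the helper `team_slot` are all hypothetical names and values chosen for the example.

```cpp
#include <cstddef>

// Hypothetical records of globalized locals, one per target region.
// RegionARecord mirrors the 4-byte record checked in the test below.
struct RegionARecord { int a; };
struct RegionBRecord { double x; int counts[2]; };

// One union per compilation unit: the per-region records overlap in
// storage, so the union is only as large as the largest record.
union UnitRecord {
  RegionARecord region_a;
  RegionBRecord region_b;
};

// Assumed bounds, for illustration only; the real values are chosen by
// the compiler and are matched with {{[0-9]+}} in the CHECK lines.
constexpr std::size_t kBlocksPerSM = 16;
constexpr std::size_t kNumSMs = 64;

// Statically allocated storage: first dimension is blocks per SM,
// second dimension is the number of SMs.
static UnitRecord global_records[kBlocksPerSM][kNumSMs];

// Hypothetical stand-in for the runtime's slot selection: each
// executing team gets its own element of the array.
UnitRecord *team_slot(std::size_t block_in_sm, std::size_t sm) {
  return &global_records[block_in_sm][sm];
}
```

Because the storage is a zero-initialized global, no dynamic allocation is needed when a region starts; in the generated code the slot is obtained through `__kmpc_get_team_static_memory` and released with `__kmpc_restore_team_static_memory`, as the updated CHECK lines in the diff show.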
author: Alexey Bataev <a.bataev@hotmail.com> 2018-11-02 14:54:07 +0000
committer: Alexey Bataev <a.bataev@hotmail.com> 2018-11-02 14:54:07 +0000
commit: e40901806fec4ea87e00cd408f8ca31c5868de0c (patch)
tree: c95f4e8bcc44e5c9343a630516a41b9f4317e9a0 /clang/test/OpenMP/nvptx_parallel_codegen.cpp
parent: c8325b4b597aceb07edf5b3caefdebc59a263683 (diff)
download: bcm5719-llvm-e40901806fec4ea87e00cd408f8ca31c5868de0c.tar.gz
bcm5719-llvm-e40901806fec4ea87e00cd408f8ca31c5868de0c.zip
1 files changed, 12 insertions, 1 deletions
diff --git a/clang/test/OpenMP/nvptx_parallel_codegen.cpp b/clang/test/OpenMP/nvptx_parallel_codegen.cpp
index d1a3104407d..5f0f09af870 100644
--- a/clang/test/OpenMP/nvptx_parallel_codegen.cpp
+++ b/clang/test/OpenMP/nvptx_parallel_codegen.cpp
@@ -71,6 +71,12 @@ int bar(int n){
   return a;
 }
 
+// CHECK: [[MEM_TY:%.+]] = type { [4 x i8] }
+// CHECK-DAG: [[GLOBAL_RD:@.+]] = weak global [{{[0-9]+}} x [{{[0-9]+}} x [[MEM_TY]]]] zeroinitializer
+// CHECK-DAG: [[GLOBAL_RD_PTR:@.+]] = weak unnamed_addr constant i8* getelementptr inbounds ([{{[0-9]+}} x [{{[0-9]+}} x [[MEM_TY]]]], [{{[0-9]+}} x [{{[0-9]+}} x [[MEM_TY]]]]* [[GLOBAL_RD]], i{{[0-9]+}} 0, i{{[0-9]+}} 0, i{{[0-9]+}} 0, i{{[0-9]+}} 0, i{{[0-9]+}} 0)
+// CHECK-DAG: [[KERNEL_PTR:@.+]] = internal addrspace(3) global i8* null
+// CHECK-DAG: [[KERNEL_SIZE:@.+]] = internal unnamed_addr constant i{{64|32}} 4
+
 // CHECK-NOT: define {{.*}}void {{@__omp_offloading_.+template.+l17}}_worker()
 
 // CHECK-LABEL: define {{.*}}void {{@__omp_offloading_.+template.+l26}}_worker()
@@ -318,12 +324,17 @@ int bar(int n){
 // CHECK-32: [[A_ADDR:%.+]] = alloca i32,
 // CHECK-64: [[A_ADDR:%.+]] = alloca i64,
 // CHECK-64: [[CONV:%.+]] = bitcast i64* [[A_ADDR]] to i32*
-// CHECK: [[STACK:%.+]] = call i8* @__kmpc_data_sharing_push_stack(i{{64|32}} 4, i16 0)
+// CHECK: [[GLOBAL_RD:%.+]] = load i8*, i8** [[GLOBAL_RD_PTR]],
+// CHECK: [[SIZE:%.+]] = load i{{64|32}}, i{{64|32}}* [[KERNEL_SIZE]],
+// CHECK: call void @__kmpc_get_team_static_memory(i8* [[GLOBAL_RD]], i{{64|32}} [[SIZE]], i16 0, i8** addrspacecast (i8* addrspace(3)* [[KERNEL_PTR]] to i8**))
+// CHECK: [[KERNEL_RD:%.+]] = load i8*, i8* addrspace(3)* [[KERNEL_PTR]],
+// CHECK: [[STACK:%.+]] = getelementptr inbounds i8, i8* [[KERNEL_RD]], i{{64|32}} 0
 // CHECK: [[BC:%.+]] = bitcast i8* [[STACK]] to %struct._globalized_locals_ty*
 // CHECK-32: [[A:%.+]] = load i32, i32* [[A_ADDR]],
 // CHECK-64: [[A:%.+]] = load i32, i32* [[CONV]],
 // CHECK: [[GLOBAL_A_ADDR:%.+]] = getelementptr inbounds %struct._globalized_locals_ty, %struct._globalized_locals_ty* [[BC]], i{{[0-9]+}} 0, i{{[0-9]+}} 0
 // CHECK: store i32 [[A]], i32* [[GLOBAL_A_ADDR]],
+// CHECK: call void @__kmpc_restore_team_static_memory(i16 0)
 
 // CHECK-LABEL: define internal void @{{.+}}(i32* noalias %{{.+}}, i32* noalias %{{.+}}, i32* dereferenceable{{.*}})
 // CHECK:  [[CC:%.+]] = alloca i32,