author: Jeroen Ketema <j.ketema@xs4all.nl> 2017-10-09 18:36:48 +0000
committer: Jeroen Ketema <j.ketema@xs4all.nl> 2017-10-09 18:36:48 +0000
commit: 4f5a3d5d6f15d8624cd78b53246d633ca54619c1
tree: b1155f1aa6716bd38014a0005aaeb3c37bc6ff25
parent: 1c0c35d034bc323b1aabd05ed75779cd17d7c75b
Make ptx barrier work irrespective of the cl_mem_fence_flags
This generates a "bar.sync 0" instruction, which not only causes the threads to wait but also acts as a memory fence, as required by OpenCL. The fence does not differentiate between local and global memory.

Unfortunately, there is no similar instruction that does not include a memory fence. Hence, we cannot optimize the case where neither CLK_LOCAL_MEM_FENCE nor CLK_GLOBAL_MEM_FENCE is passed.

llvm-svn: 315228
-rw-r--r-- libclc/ptx-nvidiacl/lib/synchronization/barrier.cl | 4
1 files changed, 1 insertions, 3 deletions
diff --git a/libclc/ptx-nvidiacl/lib/synchronization/barrier.cl b/libclc/ptx-nvidiacl/lib/synchronization/barrier.cl
index 88e149396e2..930e36a2853 100644
--- a/libclc/ptx-nvidiacl/lib/synchronization/barrier.cl
+++ b/libclc/ptx-nvidiacl/lib/synchronization/barrier.cl
@@ -1,8 +1,6 @@
#include <clc/clc.h>
_CLC_DEF void barrier(cl_mem_fence_flags flags) {
- if (flags & CLK_LOCAL_MEM_FENCE) {
- __syncthreads();
- }
+ __syncthreads();
}
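
To make the behavioral change concrete, the following is a minimal host-side model in plain C, not the libclc source. It assumes stand-in definitions for cl_mem_fence_flags and the two fence flags (the numeric values are illustrative), and it replaces __syncthreads() with a hypothetical counter, fake_syncthreads(), so the two variants can be compared without a GPU. The helper syncs_issued() is likewise invented for this sketch.

```c
/* Stand-ins for the OpenCL type and flags. The values are assumptions
   made for this host-side model and are not taken from the commit. */
typedef unsigned int cl_mem_fence_flags;
#define CLK_LOCAL_MEM_FENCE  1u
#define CLK_GLOBAL_MEM_FENCE 2u

/* Hypothetical replacement for __syncthreads(): it only counts how many
   times the barrier instruction would have been issued. */
int sync_count = 0;
void fake_syncthreads(void) { sync_count++; }

/* Model of the code removed by the diff: the barrier is issued only when
   CLK_LOCAL_MEM_FENCE is set. */
void barrier_before(cl_mem_fence_flags flags) {
  if (flags & CLK_LOCAL_MEM_FENCE) {
    fake_syncthreads();
  }
}

/* Model of the code added by the diff: the barrier is issued
   unconditionally, whatever the flags are. */
void barrier_after(cl_mem_fence_flags flags) {
  (void)flags; /* flags no longer influence the generated code */
  fake_syncthreads();
}

/* Hypothetical helper: returns how many synchronizations a single call to
   the given barrier variant issues for the given flags. */
int syncs_issued(void (*barrier_fn)(cl_mem_fence_flags),
                 cl_mem_fence_flags flags) {
  sync_count = 0;
  barrier_fn(flags);
  return sync_count;
}
```

In this model, syncs_issued(barrier_before, CLK_GLOBAL_MEM_FENCE) returns 0, meaning the old code would not have made the work-items wait at all, whereas syncs_issued(barrier_after, CLK_GLOBAL_MEM_FENCE) returns 1. The same holds when no flag is passed, which is the case the commit message says cannot be optimized further.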