| author | Jeroen Ketema <j.ketema@xs4all.nl> | 2017-10-09 18:36:48 +0000 |
|---|---|---|
| committer | Jeroen Ketema <j.ketema@xs4all.nl> | 2017-10-09 18:36:48 +0000 |
| commit | 4f5a3d5d6f15d8624cd78b53246d633ca54619c1 (patch) | |
| tree | b1155f1aa6716bd38014a0005aaeb3c37bc6ff25 | |
| parent | 1c0c35d034bc323b1aabd05ed75779cd17d7c75b (diff) | |
Make PTX barrier work irrespective of the cl_mem_fence_flags
This generates a "bar.sync 0" instruction, which not only causes the
threads to wait, but also acts as a memory fence, as required by
OpenCL. The fence does not differentiate between local and global
memory. Unfortunately, there is no similar instruction which does
not include a memory fence. Hence, we cannot optimize the case
where neither CLK_LOCAL_MEM_FENCE nor CLK_GLOBAL_MEM_FENCE is
passed.
llvm-svn: 315228
| -rw-r--r-- | libclc/ptx-nvidiacl/lib/synchronization/barrier.cl | 4 |
1 files changed, 1 insertions, 3 deletions
diff --git a/libclc/ptx-nvidiacl/lib/synchronization/barrier.cl b/libclc/ptx-nvidiacl/lib/synchronization/barrier.cl
index 88e149396e2..930e36a2853 100644
--- a/libclc/ptx-nvidiacl/lib/synchronization/barrier.cl
+++ b/libclc/ptx-nvidiacl/lib/synchronization/barrier.cl
@@ -1,8 +1,6 @@
 #include <clc/clc.h>
 
 _CLC_DEF void barrier(cl_mem_fence_flags flags) {
-  if (flags & CLK_LOCAL_MEM_FENCE) {
-    __syncthreads();
-  }
+  __syncthreads();
 }
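
To illustrate the case this change affects, here is a minimal OpenCL C sketch of a kernel that passes only CLK_GLOBAL_MEM_FENCE to barrier(). The kernel name `rotate_left` and the buffers `buf` and `out` are hypothetical and not part of libclc; the sketch assumes a one-dimensional NDRange with no global offset whose global size is a multiple of the work-group size. With the old implementation, this barrier() call would not have emitted `__syncthreads()` on the PTX target, since only CLK_LOCAL_MEM_FENCE was checked.

```c
/* Hypothetical kernel (illustration only, not part of libclc).
 * Each work-item writes one element to global memory, then reads the
 * element written by its right-hand neighbour in the same work-group. */
__kernel void rotate_left(__global int *buf, __global int *out) {
  size_t lid  = get_local_id(0);          /* index within the work-group */
  size_t lsz  = get_local_size(0);        /* work-group size */
  size_t base = get_group_id(0) * lsz;    /* first element of this group */

  buf[base + lid] = (int)lid;

  /* All work-items of the group must reach this point, and the global
   * writes above must be visible to the group, before anyone reads. */
  barrier(CLK_GLOBAL_MEM_FENCE);

  out[base + lid] = buf[base + (lid + 1) % lsz];
}
```

Under these assumptions, the read of `buf` is only well-defined if the barrier both synchronizes the work-group and fences global memory, which is what the unconditional `__syncthreads()` provides.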

