path: root/mlir/test/Dialect/GPU
author:    Christian Sigg <csigg@google.com>  2019-09-26 00:17:13 -0700
committer: A. Unique TensorFlower <gardener@tensorflow.org>  2019-09-26 00:17:50 -0700
commit:    116dac00baa6870aec2a2b469b2d6f95c2fbb316 (patch)
tree:      1ff87872c7a0db12f4fed9ff715327b05921792d /mlir/test/Dialect/GPU
parent:    94298cea933991b29dcb7f340725bc25e78cebcf (diff)
Add AllReduceOp to GPU dialect with lowering to NVVM.
The reduction operation is currently fixed to "add", and the scope is fixed to "workgroup". The implementation is currently limited to workgroup sizes that are a multiple of 32 (the warp size) and no larger than 1024.

PiperOrigin-RevId: 271290265
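The size restriction (multiples of 32, at most 1024) follows from building the reduction out of warp-level shuffles. As an illustrative sketch only, not the actual NVVM lowering in this commit, the classic xor-butterfly all-reduce within one 32-lane warp can be modeled in plain Python; after log2(32) = 5 steps, every lane holds the sum of all 32 inputs:

```python
WARP_SIZE = 32  # NVIDIA warp size, the granularity the lowering assumes

def warp_all_reduce_add(lanes):
    """Model a butterfly (xor-shuffle) "add" all-reduce over one warp.

    `lanes` holds one value per lane. Each step pairs lane i with lane
    (i ^ offset), halving the offset until every lane has the full sum,
    mirroring what shuffle-based GPU lowerings compute in registers.
    """
    assert len(lanes) == WARP_SIZE
    vals = list(lanes)
    offset = WARP_SIZE // 2
    while offset >= 1:
        # Each lane adds the value held by its butterfly partner.
        vals = [vals[i] + vals[i ^ offset] for i in range(WARP_SIZE)]
        offset //= 2
    return vals
```

For a workgroup larger than one warp, a real lowering would combine the per-warp results in a second stage (hence the multiple-of-32, max-1024 limit); this sketch covers only the single-warp step.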
Diffstat (limited to 'mlir/test/Dialect/GPU')
-rw-r--r--  mlir/test/Dialect/GPU/ops.mlir  3
1 file changed, 3 insertions(+), 0 deletions(-)
diff --git a/mlir/test/Dialect/GPU/ops.mlir b/mlir/test/Dialect/GPU/ops.mlir
index b78da9543fc..7c8f682dcc8 100644
--- a/mlir/test/Dialect/GPU/ops.mlir
+++ b/mlir/test/Dialect/GPU/ops.mlir
@@ -76,6 +76,9 @@ func @kernel_1(%arg0 : f32, %arg1 : memref<?xf32, 1>)
%gDimY = "gpu.grid_dim"() {dimension = "y"} : () -> (index)
%gDimZ = "gpu.grid_dim"() {dimension = "z"} : () -> (index)
+ %one = constant 1.0 : f32
+ %sum = "gpu.all_reduce"(%one) : (f32) -> (f32)
+
"some_op"(%bIdX, %tIdX) : (index, index) -> ()
%42 = load %arg1[%bIdX] : memref<?xf32, 1>
return