[OpenMP] Parallel reduction on the NVPTX device.

This patch implements codegen for the reduction clause on any parallel construct for elementary data types. An efficient implementation requires hierarchical reduction within a warp and a threadblock. It is complicated by the fact that variables declared on the stack of a CUDA thread cannot be shared with other threads.

The patch creates a struct to hold reduction variables and a number of helper functions. The OpenMP runtime on the GPU implements reduction algorithms that use these helper functions to perform reductions within a team. Variables are shared between CUDA threads using shuffle intrinsics.

An implementation of reductions on the NVPTX device differs substantially from that on CPUs. However, this patch is written so that there are minimal changes to the rest of OpenMP codegen.

The implemented design allows the compiler and runtime to be decoupled, i.e., the runtime does not need to know of the reduction operation(s), the type of the reduction variable(s), or the number of reductions. The design also allows reuse of host codegen, with appropriate specialization for the NVPTX device.

While the patch does introduce a number of abstractions, the expected use case calls for inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these abstractions are unwound and performance of OpenMP reductions is comparable to CUDA-canonical code.

Patch by Tian Jin in collaboration with Arpith Jacob

Reviewers: ABataev

Differential Revision: https://reviews.llvm.org/D29758

llvm-svn: 295319
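The decoupling described in the commit message can be illustrated with a small host-side model. This is only a sketch under stated assumptions, not the code this patch emits and not the runtime's API. All names (`ReduceList`, `ReduceFn`, `warpReduce`, `reduceSumAndMax`, `reduceOneWarp`) are hypothetical, a warp is modelled as 32 sequential lanes, and plain memory reads stand in for the shuffle intrinsics that move data between real CUDA threads. The "runtime" function sees only an opaque list of pointers and a helper function pointer. The "compiler-generated" helper is the only place that knows the types, the operators, and the number of reductions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of one warp; 32 lanes is an assumption of this sketch.
constexpr std::size_t kWarpSize = 32;

// Runtime's view: an opaque list of pointers to reduction variables.
using ReduceList = void **;
// Helper supplied by the compiler: fold the RHS list into the LHS list.
using ReduceFn = void (*)(ReduceList lhs, ReduceList rhs);

// "Runtime" side: tree reduction over the lanes of one simulated warp.
// It never inspects element types, operators, or the list length.
// After the call, lane 0's variables hold the reduced values.
void warpReduce(std::array<ReduceList, kWarpSize> &lanes, ReduceFn reduce) {
  assert(reduce != nullptr);
  for (std::size_t offset = kWarpSize / 2; offset > 0; offset /= 2) {
    // Lane i combines with lane i + offset. On a GPU the remote values
    // would arrive through a shuffle; here we read them directly.
    for (std::size_t lane = 0; lane < offset; ++lane)
      reduce(lanes[lane], lanes[lane + offset]);
  }
}

// "Compiler" side: helper for a hypothetical clause pair
// reduction(+ : sum) reduction(max : hi).
void reduceSumAndMax(ReduceList lhs, ReduceList rhs) {
  int *lsum = static_cast<int *>(lhs[0]);
  const int *rsum = static_cast<const int *>(rhs[0]);
  *lsum += *rsum;

  double *lhi = static_cast<double *>(lhs[1]);
  const double *rhi = static_cast<const double *>(rhs[1]);
  if (*rhi > *lhi)
    *lhi = *rhi;
}

struct Result {
  int sum;
  double hi;
};

// Driver: each lane owns private copies, then the warp reduces them.
Result reduceOneWarp() {
  std::array<int, kWarpSize> sums{};
  std::array<double, kWarpSize> his{};
  std::array<std::array<void *, 2>, kWarpSize> lists{};
  std::array<ReduceList, kWarpSize> lanes{};

  for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
    sums[lane] = static_cast<int>(lane) + 1;      // private partial sum
    his[lane] = static_cast<double>(lane) * 0.5;  // private partial max
    lists[lane][0] = &sums[lane];
    lists[lane][1] = &his[lane];
    lanes[lane] = lists[lane].data();
  }

  warpReduce(lanes, reduceSumAndMax);
  return Result{sums[0], his[0]};
}
```

Calling `reduceOneWarp()` leaves the sum of 1 through 32 and the largest per-lane value in lane 0's variables. Adding or changing a reduction only touches the helper, while `warpReduce` stays the same, which is the property the commit message attributes to the real design.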
author: Arpith Chacko Jacob <acjacob@us.ibm.com> 2017-02-16 14:03:36 +0000
committer: Arpith Chacko Jacob <acjacob@us.ibm.com> 2017-02-16 14:03:36 +0000
commit: 8e170fc8573f2091b1443484bcee5a76083cf710 (patch)
tree: bd4cddcbd7fa78266f5d8a49fe84110151c4c789 /clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.h
parent: 3e81c2675e8d4abdfbfe179866dcf03cd5e51398 (diff)
1 files changed, 26 insertions, 7 deletions
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.h b/clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.h
index d1ff0e8a24a..ae25e94759e 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.h
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.h
@@ -67,12 +67,6 @@ private:
   /// \brief Signal termination of Spmd mode execution.
   void emitSpmdEntryFooter(CodeGenFunction &CGF, EntryFunctionState &EST);
 
-  /// \brief Returns specified OpenMP runtime function for the current OpenMP
-  /// implementation.  Specialized for the NVPTX device.
-  /// \param Function OpenMP runtime function.
-  /// \return Specified function.
-  llvm::Constant *createNVPTXRuntimeFunction(unsigned Function);
-
   //
   // Base class overrides.
   //
@@ -248,7 +242,32 @@ public:
                         ArrayRef<llvm::Value *> CapturedVars,
                         const Expr *IfCond) override;
 
-public:
+  /// Emit a code for reduction clause.
+  ///
+  /// \param Privates List of private copies for original reduction arguments.
+  /// \param LHSExprs List of LHS in \a ReductionOps reduction operations.
+  /// \param RHSExprs List of RHS in \a ReductionOps reduction operations.
+  /// \param ReductionOps List of reduction operations in form 'LHS binop RHS'
+  /// or 'operator binop(LHS, RHS)'.
+  /// \param Options List of options for reduction codegen:
+  ///     WithNowait true if parent directive has also nowait clause, false
+  ///     otherwise.
+  ///     SimpleReduction Emit reduction operation only. Used for omp simd
+  ///     directive on the host.
+  ///     ReductionKind The kind of reduction to perform.
+  virtual void emitReduction(CodeGenFunction &CGF, SourceLocation Loc,
+                             ArrayRef<const Expr *> Privates,
+                             ArrayRef<const Expr *> LHSExprs,
+                             ArrayRef<const Expr *> RHSExprs,
+                             ArrayRef<const Expr *> ReductionOps,
+                             ReductionOptionsTy Options) override;
+
+  /// Returns specified OpenMP runtime function for the current OpenMP
+  /// implementation.  Specialized for the NVPTX device.
+  /// \param Function OpenMP runtime function.
+  /// \return Specified function.
+  llvm::Constant *createNVPTXRuntimeFunction(unsigned Function);
+
   /// Target codegen is specialized based on two programming models: the
   /// 'generic' fork-join model of OpenMP, and a more GPU efficient 'spmd'
   /// model for constructs like 'target parallel' that support it.