[OpenCL] Add '-cl-uniform-work-group-size' compile option

Summary: OpenCL 2.0 specification defines '-cl-uniform-work-group-size' option, which requires that the global work-size be a multiple of the work-group size specified to clEnqueueNDRangeKernel and allows optimizations that are made possible by this restriction. The patch introduces the support of this option. To keep information about whether an OpenCL kernel has uniform work group size or not, clang generates 'uniform-work-group-size' function attribute for every kernel: - "uniform-work-group-size"="true" for OpenCL 1.2 and lower, - "uniform-work-group-size"="true" for OpenCL 2.0 and higher if '-cl-uniform-work-group-size' option was specified, - "uniform-work-group-size"="false" for OpenCL 2.0 and higher if no '-cl-uniform-work-group-size' options was specified. If the function is not an OpenCL kernel, 'uniform-work-group-size' attribute isn't generated. Patch by: krisb Reviewers: yaxunl, Anastasia, b-sumner Reviewed By: yaxunl, Anastasia Subscribers: nhaehnle, yaxunl, Anastasia, cfe-commits Differential Revision: https://reviews.llvm.org/D43570 llvm-svn: 325771
author: Alexey Sotkin <alexey.sotkin@intel.com> 2018-02-22 11:54:14 +0000
committer: Alexey Sotkin <alexey.sotkin@intel.com> 2018-02-22 11:54:14 +0000
commit: 20f65928e1cd6f5d7d972bfa6a70439ea2da9241 (patch)
tree: 40c80e73351c9d8cc806f0a0be4bcc5cbb6480fc /clang/lib
parent: 16596471d936af603845a3c688d25ad56322fc54 (diff)
download: bcm5719-llvm-20f65928e1cd6f5d7d972bfa6a70439ea2da9241.tar.gz
bcm5719-llvm-20f65928e1cd6f5d7d972bfa6a70439ea2da9241.zip
3 files changed, 18 insertions, 0 deletions
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index d62b080e31c..959df69f22e 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -1870,6 +1870,21 @@ void CodeGenModule::ConstructAttributeList(
     }
   }
 
+  if (TargetDecl && TargetDecl->hasAttr<OpenCLKernelAttr>()) {
+    if (getLangOpts().OpenCLVersion <= 120) {
+      // OpenCL v1.2 Work groups are always uniform
+      FuncAttrs.addAttribute("uniform-work-group-size", "true");
+    } else {
+      // OpenCL v2.0 Work groups may be whether uniform or not.
+      // '-cl-uniform-work-group-size' compile option gets a hint
+      // to the compiler that the global work-size be a multiple of
+      // the work-group size specified to clEnqueueNDRangeKernel
+      // (i.e. work groups are uniform).
+      FuncAttrs.addAttribute("uniform-work-group-size",
+                             llvm::toStringRef(CodeGenOpts.UniformWGSize));
+    }
+  }
+
   if (!AttrOnCallSite) {
     bool DisableTailCalls =
         CodeGenOpts.DisableTailCalls ||
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 64bea9e56ab..cc5cf0f042f 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -2379,6 +2379,7 @@ static void RenderOpenCLOptions(const ArgList &Args, ArgStringList &CmdArgs) {
       options::OPT_cl_no_signed_zeros,
       options::OPT_cl_denorms_are_zero,
       options::OPT_cl_fp32_correctly_rounded_divide_sqrt,
+      options::OPT_cl_uniform_work_group_size
   };
 
   if (Arg *A = Args.getLastArg(options::OPT_cl_std_EQ)) {
diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index 3bdc116b963..5be02c96826 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -659,6 +659,8 @@ static bool ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args, InputKind IK,
   Opts.FlushDenorm = Args.hasArg(OPT_cl_denorms_are_zero);
   Opts.CorrectlyRoundedDivSqrt =
       Args.hasArg(OPT_cl_fp32_correctly_rounded_divide_sqrt);
+  Opts.UniformWGSize =
+      Args.hasArg(OPT_cl_uniform_work_group_size);
   Opts.Reciprocals = Args.getAllArgValues(OPT_mrecip_EQ);
   Opts.ReciprocalMath = Args.hasArg(OPT_freciprocal_math);
   Opts.NoTrappingMath = Args.hasArg(OPT_fno_trapping_math);
author	Alexey Sotkin <alexey.sotkin@intel.com>	2018-02-22 11:54:14 +0000
committer	Alexey Sotkin <alexey.sotkin@intel.com>	2018-02-22 11:54:14 +0000
commit	20f65928e1cd6f5d7d972bfa6a70439ea2da9241 (patch)
tree	40c80e73351c9d8cc806f0a0be4bcc5cbb6480fc /clang/lib
parent	16596471d936af603845a3c688d25ad56322fc54 (diff)
download	bcm5719-llvm-20f65928e1cd6f5d7d972bfa6a70439ea2da9241.tar.gz bcm5719-llvm-20f65928e1cd6f5d7d972bfa6a70439ea2da9241.zip