author Jingyue Wu <jingyue@google.com> 2016-03-30 05:05:40 +0000
committer Jingyue Wu <jingyue@google.com> 2016-03-30 05:05:40 +0000
commit f190ed4355c246d1835bb22e3bc83ed24c77c83c
tree f9973553e441a1215c85f622e3ea0091c605c73b /llvm
parent 0a1c6c2ee5cf889dcbd4f8f1657955bb9fc8a7c7
[docs] Add gpucc publication and tutorial.
llvm-svn: 264839
Diffstat (limited to 'llvm')
-rw-r--r-- llvm/docs/CompileCudaWithLLVM.rst 20
1 files changed, 16 insertions, 4 deletions
diff --git a/llvm/docs/CompileCudaWithLLVM.rst b/llvm/docs/CompileCudaWithLLVM.rst
index 8e7ed0de42f..5ed3f14005a 100644
--- a/llvm/docs/CompileCudaWithLLVM.rst
+++ b/llvm/docs/CompileCudaWithLLVM.rst
@@ -53,7 +53,7 @@ How to Compile CUDA C/C++ with LLVM
===================================
We assume you have installed the CUDA driver and runtime. Consult the `NVIDIA
-CUDA installation Guide
+CUDA installation guide
<https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html>`_ if
you have not.
@@ -167,10 +167,9 @@ customizable target-independent optimization pipeline.
straight-line scalar optimizations <https://goo.gl/4Rb9As>`_.
* **Inferring memory spaces**. `This optimization
- <http://www.llvm.org/docs/doxygen/html/NVPTXFavorNonGenericAddrSpaces_8cpp_source.html>`_
+ <https://github.com/llvm-mirror/llvm/blob/master/lib/Target/NVPTX/NVPTXInferAddressSpaces.cpp>`_
infers the memory space of an address so that the backend can emit faster
- special loads and stores from it. Details can be found in the `design
- document for memory space inference <https://goo.gl/5wH2Ct>`_.
+ special loads and stores from it.
* **Aggressive loop unrooling and function inlining**. Loop unrolling and
function inlining need to be more aggressive for GPUs than for CPUs because
@@ -201,6 +200,19 @@ customizable target-independent optimization pipeline.
divides in our benchmarks have a divisor and dividend which fit in 32-bits at
runtime. This optimization provides a fast path for this common case.
+Publication
+===========
+
+| `gpucc: An Open-Source GPGPU Compiler <http://dl.acm.org/citation.cfm?id=2854041>`_
+| Jingyue Wu, Artem Belevich, Eli Bendersky, Mark Heffernan, Chris Leary, Jacques Pienaar, Bjarke Roune, Rob Springer, Xuetian Weng, Robert Hundt
+| *Proceedings of the 2016 International Symposium on Code Generation and Optimization (CGO 2016)*
+| `Slides for the CGO talk <http://wujingyue.com/docs/gpucc-talk.pdf>`_
+
+Tutorial
+========
+
+`CGO 2016 gpucc tutorial <http://wujingyue.com/docs/gpucc-tutorial.pdf>`_
+
Obtaining Help
==============