Requirements:

- automake, autoconf, libtool
  (not needed when compiling a release)
- pkg-config (http://www.freedesktop.org/wiki/Software/pkg-config)
  (not needed when compiling a release using the included isl and pet)
- gmp (http://gmplib.org/)
- libyaml (http://pyyaml.org/wiki/LibYAML)
  (only needed if you want to compile the pet executable)
- LLVM/clang libraries, 2.9 or higher (http://clang.llvm.org/get_started.html)
  Unless you have some other reason for wanting to use the svn version,
  it is best to install the latest release (3.6).
  For more details, see pet/README.

If you are installing on Ubuntu, then you can install the following packages:

automake autoconf libtool pkg-config libgmp3-dev libyaml-dev libclang-dev llvm

Note that you need at least version 3.2 of libclang-dev (Ubuntu Raring).
Older versions of this package did not include the required libraries.
If you are using an older version of Ubuntu, then you need to compile and
install LLVM/clang from source.
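
For example, on such a system the whole list can be installed in one go
(exact package names may vary between Ubuntu releases):

    sudo apt-get install automake autoconf libtool pkg-config \
        libgmp3-dev libyaml-dev libclang-dev llvm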


Preparing:

Grab the latest release and extract it or get the source from
the git repository as follows. This process requires autoconf,
automake, libtool and pkg-config.

    git clone git://repo.or.cz/ppcg.git
    cd ppcg
    git submodule init
    git submodule update
    ./autogen.sh


Compilation:

    ./configure
    make
    make check

If you have installed any of the required libraries in a non-standard
location, then you may need to use the --with-gmp-prefix,
--with-libyaml-prefix and/or --with-clang-prefix options
when calling "./configure".
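
For example, if gmp was installed under a non-standard prefix
(the path below is purely illustrative), one might run:

    ./configure --with-gmp-prefix=$HOME/opt/gmp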


Using PPCG to generate CUDA or OpenCL code

To convert a fragment of a C program to CUDA, insert a line containing

    #pragma scop

before the fragment and add a line containing

    #pragma endscop

after the fragment. To generate CUDA code run

    ppcg --target=cuda file.c

where file.c is the file containing the fragment. The generated
code is stored in file_host.cu and file_kernel.cu.
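
As an illustration, the following input file marks a simple loop nest
for conversion (a minimal sketch; the function and its arrays are
hypothetical):

    /* add.c: element-wise matrix addition */
    void add(int n, float A[n][n], float B[n][n], float C[n][n])
    {
        int i, j;
    #pragma scop
        for (i = 0; i < n; ++i)
            for (j = 0; j < n; ++j)
                C[i][j] = A[i][j] + B[i][j];
    #pragma endscop
    }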

To generate OpenCL code run

    ppcg --target=opencl file.c

where file.c is the file containing the fragment. The generated code
is stored in file_host.c and file_kernel.cl.


Specifying tile, grid and block sizes

The iteration space tile size, grid size and block size can
be specified using the --sizes option. The argument is a union map
in isl notation mapping kernels identified by their sequence number
in a "kernel" space to singleton sets in the "tile", "grid" and "block"
spaces. The sizes are specified outermost to innermost.

The dimension of the "tile" space indicates the (maximal) number of loop
dimensions to tile. The elements of the single integer tuple
specify the tile sizes in each dimension.

The dimension of the "grid" space indicates the (maximal) number of block
dimensions in the grid. The elements of the single integer tuple
specify the number of blocks in each dimension.

The dimension of the "block" space indicates the (maximal) number of thread
dimensions in the block. The elements of the single integer tuple
specify the number of threads in each dimension.

For example,

    { kernel[0] -> tile[64,64]; kernel[i] -> block[16] : i != 4 }

specifies that in kernel 0, two loops should be tiled with a tile
size of 64 in both dimensions and that all kernels except kernel 4
should be run using a block of 16 threads.
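
On the command line, the union map must be quoted so that the shell
passes it to PPCG as a single argument, e.g.:

    ppcg --target=cuda \
        --sizes="{ kernel[0] -> tile[64,64]; kernel[i] -> block[16] : i != 4 }" \
        file.c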

Since PPCG performs some scheduling, it can be difficult to predict
what exactly will end up in a kernel. If you want to specify
tile, grid or block sizes, you may want to run PPCG first with the defaults,
examine the kernels and then run PPCG again with the desired sizes.
Instead of examining the kernels, you can also specify the option
--dump-sizes on the first run to obtain the effectively used default sizes.
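
The resulting two-step workflow could look as follows (the adjusted
sizes in the second run are illustrative):

    ppcg --target=cuda --dump-sizes file.c
    ppcg --target=cuda --sizes="{ kernel[0] -> tile[32,32] }" file.c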


Compiling the generated CUDA code with nvcc

To get optimal performance from nvcc, it is important to choose --arch
according to your target GPU. Specifically, use the flag "--arch sm_20"
for Fermi, "--arch sm_30" for GK10x Kepler and "--arch sm_35" for
GK110 Kepler. We discourage the use of older cards as we have seen
correctness issues with compilation for older architectures.
Note that in the absence of any --arch flag, nvcc defaults to
"--arch sm_13". This will not only be slower, but can also cause
correctness issues.
If you want to obtain results that are identical to those obtained
by the original code, then you may need to disable some optimizations
by passing the "--fmad=false" option.
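
Assuming the code was generated from file.c as described above and
targets a GK110 card, a possible invocation is (the output name is
arbitrary):

    nvcc --arch sm_35 file_host.cu file_kernel.cu -o file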


Compiling the generated OpenCL code with gcc

To compile the host code you need to link against the file
ocl_utilities.c which contains utility functions used by the generated
OpenCL host code. To compile the host code with gcc, run

    gcc -std=c99 file_host.c ocl_utilities.c -lOpenCL

Note that we have experienced the generated OpenCL code freezing
on some inputs (e.g., the PolyBench symm benchmark) when using
at least some versions of the Nvidia OpenCL library, while the
corresponding CUDA code runs fine.
We have experienced no such freezes when using AMD, ARM or Intel
OpenCL libraries.

By default, the compiled executable will need the _kernel.cl file at
run time. Alternatively, the option --opencl-embed-kernel-code may be
given to place the kernel code in a string literal. The kernel code is
then compiled into the host binary, such that the _kernel.cl file is no
longer needed at run time. Any kernel include files, in particular
those supplied using --opencl-include-file, will still be required at
run time.
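
For example, to produce a binary that does not need file_kernel.cl
at run time:

    ppcg --target=opencl --opencl-embed-kernel-code file.c
    gcc -std=c99 file_host.c ocl_utilities.c -lOpenCL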


Function calls

Function calls inside the analyzed fragment are reproduced
in the CUDA or OpenCL code, but for now it is left to the user
to make sure that the functions that are being called are
available from the generated kernels.

In the case of OpenCL code, the --opencl-include-file option
may be used to specify one or more files to be #include'd
from the generated code. These files may then contain
the definitions of the functions being called from the
program fragment. If the pathnames of the included files
are relative to the current directory, then you may need
to additionally specify --opencl-compiler-options=-I.
to make sure that the files can be found by the OpenCL compiler.
The included files may contain definitions of types used by the
generated kernels. By default, PPCG generates definitions for
types as needed, but these definitions may collide with those in
the included files, as PPCG does not consider the contents of the
included files. The --no-opencl-print-kernel-types option will
prevent PPCG from generating type definitions.
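
As a sketch, if the functions called from the fragment are defined in
a file functions.cl in the current directory (the file name is
hypothetical), PPCG could be invoked as:

    ppcg --target=opencl --opencl-include-file=functions.cl \
        --opencl-compiler-options=-I. file.c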


Processing PolyBench

When processing a PolyBench/C 3.2 benchmark, you should always specify
-DPOLYBENCH_USE_C99_PROTO on the ppcg command line. Otherwise, the source
files are inconsistent, having fixed-size arrays but parametrically
bounded loops iterating over them.
However, you should not specify this define when compiling
the PPCG-generated code using nvcc since CUDA does not support VLAs.
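
A possible invocation for one of the benchmarks (the paths follow the
PolyBench/C 3.2 source layout and are illustrative):

    ppcg --target=cuda -DPOLYBENCH_USE_C99_PROTO \
        -I utilities linear-algebra/kernels/gemm/gemm.c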


CUDA and function overloading

While CUDA supports function overloading based on the argument types,
no such function overloading exists in the input language C. Since PPCG
simply prints out the same function name as in the original code, this
may result in a different function being called based on the types
of the arguments. For example, if the original code contains a call
to the function sqrt() with a float argument, then the argument will
be promoted to a double and the sqrt() function will be called.
In the transformed (CUDA) code, however, overloading will cause the
function sqrtf() to be called. Until this issue has been resolved in PPCG,
we recommend that users either explicitly call the function sqrtf() or
explicitly cast the argument to double in the input code.
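
In other words, prefer one of the following two forms in the input
code (a minimal sketch):

    #include <math.h>

    float f(float x)
    {
        return sqrtf(x);         /* explicitly call the float version */
    }

    double g(float x)
    {
        return sqrt((double) x); /* or explicitly select the double version */
    }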


Contact

For bug reports, feature requests and questions,
contact http://groups.google.com/group/isl-development


Citing PPCG

If you use PPCG for your research, you are invited to cite
the following paper.

@article{Verdoolaege2013PPCG,
    author = {Verdoolaege, Sven and Juega, Juan Carlos and Cohen, Albert and
              G{\'o}mez, Jos{\'e} Ignacio and Tenllado, Christian and
              Catthoor, Francky},
    title = {Polyhedral parallel code generation for CUDA},
    journal = {ACM Trans. Archit. Code Optim.},
    issue_date = {January 2013},
    volume = {9},
    number = {4},
    month = jan,
    year = {2013},
    issn = {1544-3566},
    pages = {54:1--54:23},
    doi = {10.1145/2400682.2400713},
    acmid = {2400713},
    publisher = {ACM},
    address = {New York, NY, USA},
}