Requirements:

- automake, autoconf, libtool
	(not needed when compiling a release)
- pkg-config (http://www.freedesktop.org/wiki/Software/pkg-config)
	(not needed when compiling a release using the included isl and pet)
- gmp (http://gmplib.org/)
- libyaml (http://pyyaml.org/wiki/LibYAML)
	(only needed if you want to compile the pet executable)
- LLVM/clang libraries, 2.9 or higher (http://clang.llvm.org/get_started.html)
	Unless you have some other reason for wanting to use the svn version,
	it is best to install the latest release (3.6).
	For more details, see pet/README.

If you are installing on Ubuntu, then you can install the following packages:

automake autoconf libtool pkg-config libgmp3-dev libyaml-dev libclang-dev llvm

Note that you need at least version 3.2 of libclang-dev (Ubuntu raring).
Older versions of this package did not include the required libraries.
If you are using an older version of Ubuntu, then you need to compile and
install LLVM/clang from source.


Preparing:

Grab the latest release and extract it or get the source from
the git repository as follows. This process requires autoconf,
automake, libtool and pkg-config.

	git clone git://repo.or.cz/ppcg.git
	cd ppcg
	git submodule init
	git submodule update
	./autogen.sh


Compilation:

	./configure
	make
	make check

If you have installed any of the required libraries in a non-standard
location, then you may need to use the --with-gmp-prefix,
--with-libyaml-prefix and/or --with-clang-prefix options
when calling "./configure".


Using PPCG to generate CUDA or OpenCL code

To convert a fragment of a C program to CUDA or OpenCL, insert a line
containing

	#pragma scop

before the fragment and add a line containing

	#pragma endscop

after the fragment.
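For instance, a marked-up input file might look as follows. This is
only an illustrative sketch; the function and array names are made up
and do not come from the PPCG distribution.

	/* file.c: a made-up fragment marked for PPCG */
	void scale(int n, float A[n])
	{
		int i;

	#pragma scop
		for (i = 0; i < n; ++i)
			A[i] = 2.0f * A[i];
	#pragma endscop
	}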
To generate CUDA code run

	ppcg --target=cuda file.c

where file.c is the file containing the fragment. The generated
code is stored in file_host.cu and file_kernel.cu.

To generate OpenCL code run

	ppcg --target=opencl file.c

where file.c is the file containing the fragment. The generated code
is stored in file_host.c and file_kernel.cl.


Specifying tile, grid and block sizes

The iteration space tile size, grid size and block size can
be specified using the --sizes option. The argument is a union map
in isl notation mapping kernels identified by their sequence number
in a "kernel" space to singleton sets in the "tile", "grid" and "block"
spaces. The sizes are specified outermost to innermost.

The dimension of the "tile" space indicates the (maximal) number of loop
dimensions to tile. The elements of the single integer tuple
specify the tile sizes in each dimension.

The dimension of the "grid" space indicates the (maximal) number of block
dimensions in the grid. The elements of the single integer tuple
specify the number of blocks in each dimension.

The dimension of the "block" space indicates the (maximal) number of thread
dimensions in a block. The elements of the single integer tuple
specify the number of threads in each dimension.

For example,

	{ kernel[0] -> tile[64,64]; kernel[i] -> block[16] : i != 4 }

specifies that in kernel 0, two loops should be tiled with a tile
size of 64 in both dimensions and that all kernels except kernel 4
should be run using a block of 16 threads.

Since PPCG performs some scheduling, it can be difficult to predict
what exactly will end up in a kernel. If you want to specify
tile, grid or block sizes, you may want to run PPCG first with the defaults,
examine the kernels and then run PPCG again with the desired sizes.
Instead of examining the kernels, you can also specify the option
--dump-sizes on the first run to obtain the effectively used default sizes.


Compiling the generated CUDA code with nvcc

To get optimal performance from nvcc, it is important to choose --arch
according to your target GPU. Specifically, use the flag "--arch sm_20"
for Fermi, "--arch sm_30" for GK10x Kepler and "--arch sm_35" for
GK110 Kepler. We discourage the use of older cards as we have seen
correctness issues with compilation for older architectures.
Note that in the absence of any --arch flag, nvcc defaults to
"--arch sm_13". This will not only be slower, but can also cause
correctness issues.
If you want to obtain results that are identical to those obtained
by the original code, then you may need to disable some optimizations
by passing the "--fmad=false" option.


Compiling the generated OpenCL code with gcc

To compile the host code you need to link against the file
ocl_utilities.c, which contains utility functions used by the generated
OpenCL host code. To compile the host code with gcc, run

	gcc -std=c99 file_host.c ocl_utilities.c -lOpenCL

Note that we have experienced the generated OpenCL code freezing
on some inputs (e.g., the PolyBench symm benchmark) when using
at least some versions of the Nvidia OpenCL library, while the
corresponding CUDA code runs fine.
We have experienced no such freezes when using AMD, ARM or Intel
OpenCL libraries.

By default, the compiled executable will need the _kernel.cl file at
run time. Alternatively, the option --opencl-embed-kernel-code may be
given to place the kernel code in a string literal. The kernel code is
then compiled into the host binary, such that the _kernel.cl file is no
longer needed at run time. Any kernel include files, in particular
those supplied using --opencl-include-file, will still be required at
run time.


Function calls

Function calls inside the analyzed fragment are reproduced
in the CUDA or OpenCL code, but for now it is left to the user
to make sure that the functions that are being called are
available from the generated kernels.

In the case of OpenCL code, the --opencl-include-file option
may be used to specify one or more files to be #include'd
from the generated code. These files may then contain
the definitions of the functions being called from the
program fragment. If the pathnames of the included files
are relative to the current directory, then you may need
to additionally specify --opencl-compiler-options=-I.
to make sure that the files can be found by the OpenCL compiler.
The included files may contain definitions of types used by the
generated kernels. By default, PPCG generates definitions for
types as needed, but these definitions may collide with those in
the included files, as PPCG does not consider the contents of the
included files. The --no-opencl-print-kernel-types option will
prevent PPCG from generating type definitions.
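For example, assuming the fragment calls a function defined in a header
util.h in the current directory (the header name here is purely
illustrative), the OpenCL code could be generated with

	ppcg --target=opencl --opencl-include-file=util.h \
		--opencl-compiler-options=-I. file.c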
Processing PolyBench

When processing a PolyBench/C 3.2 benchmark, you should always specify
-DPOLYBENCH_USE_C99_PROTO on the ppcg command line. Otherwise, the source
files are inconsistent, having fixed-size arrays but parametrically
bounded loops iterating over them.
However, you should not specify this define when compiling
the PPCG-generated code using nvcc, since CUDA does not support VLAs.


CUDA and function overloading

While CUDA supports function overloading based on the argument types,
no such function overloading exists in the input language C. Since PPCG
simply prints out the same function name as in the original code, this
may result in a different function being called based on the types
of the arguments. For example, if the original code contains a call
to the function sqrt() with a float argument, then the argument will
be promoted to a double and the sqrt() function will be called.
In the transformed (CUDA) code, however, overloading will cause the
function sqrtf() to be called. Until this issue has been resolved in PPCG,
we recommend that users either explicitly call the function sqrtf() or
explicitly cast the argument to double in the input code.


Contact

For bug reports, feature requests and questions,
contact http://groups.google.com/group/isl-development


Citing PPCG

If you use PPCG for your research, you are invited to cite
the following paper.

@article{Verdoolaege2013PPCG,
	author = {Verdoolaege, Sven and Juega, Juan Carlos and Cohen, Albert and
		G\'{o}mez, Jos\'{e} Ignacio and Tenllado, Christian and
		Catthoor, Francky},
	title = {Polyhedral parallel code generation for CUDA},
	journal = {ACM Trans. Archit. Code Optim.},
	issue_date = {January 2013},
	volume = {9},
	number = {4},
	month = jan,
	year = {2013},
	issn = {1544-3566},
	pages = {54:1--54:23},
	doi = {10.1145/2400682.2400713},
	acmid = {2400713},
	publisher = {ACM},
	address = {New York, NY, USA},
}