58 files changed, 4022 insertions, 0 deletions
diff --git a/polly/www/content.css b/polly/www/content.css new file mode 100644 index 00000000000..29b2c9886d9 --- /dev/null +++ b/polly/www/content.css @@ -0,0 +1,33 @@ +html { margin: 0px; } body { margin: 8px; } + +html, body { + padding:0px; + font-size:small; font-family:"Lucida Grande", "Lucida Sans Unicode", Arial, Verdana, Helvetica, sans-serif; background-color: #fff; color: #222; + line-height:1.5; +} + +h1, h2, h3, tt { color: #000 } + +h1 { padding-top:0px; margin-top:0px;} +h2 { color:#333333; padding-top:0.5em; } +h3 { padding-top: 0.5em; margin-bottom: -0.25em; color:#2d58b7} +li { padding-bottom: 0.5em; } +ul { padding-left:1.5em; } + +PRE.code {padding-left: 0.5em; background-color: #eeeeee} +PRE {padding-left: 0.5em} + +/* Slides */ +IMG.img_slide { + display: block; + margin-left: auto; + margin-right: auto +} + +.itemTitle { color:#2d58b7 } + +span.error { color:red } +span.caret { color:green; font-weight:bold } + +/* Tables */ +tr { vertical-align:top } diff --git a/polly/www/contributors.html b/polly/www/contributors.html new file mode 100644 index 00000000000..9c92ab27f8d --- /dev/null +++ b/polly/www/contributors.html @@ -0,0 +1,55 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" + "http://www.w3.org/TR/html4/strict.dtd"> +<html> +<head> + <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /> + <title>Polly - Contributors</title> + <link type="text/css" rel="stylesheet" href="menu.css" /> + <link type="text/css" rel="stylesheet" href="content.css" /> +</head> +<body> + +<!--#include virtual="menu.html.incl"--> + +<div id="content"> + +<h1>Polly: Contributors</h1> + +Polly is developed by a team of students supported by different universities. + +<h2>People</h2> +<h3>Raghesh Aloor</h3> +<p>Raghesh works on OpenMP code generation. He is funded as Google Summer of Code +Student 2011.</p> + +<h3>Tobias Grosser</h3> +<p>Tobias is one of the two Co-founders of Polly. 
He designed the overall +architecture and contributed to almost every part of Polly. He did his work +during his diploma studies at University of Passau. Furthermore, he spent 6 +months at Ohio State University where he was founded by the U.S. National +Science Foundation through awards 0811781 and 0926688.</p> + +<p>Website: <a href="http://www.grosser.es">www.grosser.es</a></p> + + +<h3>Andreas Simbuerger</h3> +<p> +Andreas works on the profiling infrastructure during his PhD at University of +Passau. +</p> +<p>Website: <a href="http://www.infosun.fim.uni-passau.de/cl/staff/simbuerger/"> +http://www.infosun.fim.uni-passau.de/cl/staff/simbuerger/</a></p> +<h3>Hongbin Zheng</h3> +<p>Hongbin Zheng is one of the two Co-founders of Polly. He was funded as a +Google Summer of Code Student 2010 and implemented parts of the Polly frontends +as well as the automake/cmake infrastructure.</p> + +<h2> Universities</h2> + +<p>Polly is supported by the following Universities.</p> +<img src="images/iit-madras.png" style="padding:1em" /> +<img src="images/uni-passau.png" style="padding: 1em; padding-bottom:2em;"/> +<img src="images/osu.png" style="padding:1em"/> +<img src="images/sys-uni.png" style="padding:1em"/> +</body> +</html> diff --git a/polly/www/examples.html b/polly/www/examples.html new file mode 100644 index 00000000000..706198b9a8a --- /dev/null +++ b/polly/www/examples.html @@ -0,0 +1,317 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" + "http://www.w3.org/TR/html4/strict.dtd"> +<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ --> +<html> +<head> + <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> + <title>Polly - Examples</title> + <link type="text/css" rel="stylesheet" href="menu.css"> + <link type="text/css" rel="stylesheet" href="content.css"> +</head> +<body> +<!--#include virtual="menu.html.incl"--> +<div id="content"> + <!--*********************************************************************--> + 
<h1>Polly: Examples</h1> + <!--*********************************************************************--> +<!--=====================================================================--> +<h2>Optimize Matrix Multiplication Manually</h2> +<!--=====================================================================--> + +<p>Polly does not yet focus on end user, but on research and the development of +new optimizations. Hence for the users of Polly it is often necessary to +understand how Polly works internally. To get an overview of the different steps +taken during polyhedral compilation, we give a step by step example on how to +use the different Polly passes. For this we optimize a simple matrix +multiplication kernel. In case you look for a more automated way of executing +Polly, check out the pollycc tool in utils/pollycc.</p> + +The files used and created in this example are available <a +href="experiments/matmul">here</a>. + +<ol> +<li><h4>Create LLVM-IR from the C code</h4> + +Polly works on LLVM-IR. Hence it is necessary to translate the source files into +LLVM-IR. If more than on file should be optimized the files can be combined into +a single file with llvm-link. + +<pre class="code">clang -S -emit-llvm matmul.c -o matmul.s</pre> +</li> + + +<li><h4>Load Polly automatically when calling the 'opt' tool</h4> + +Polly is not built into opt or bugpoint, but it is a shared library that needs +to be loaded into these tools explicitally. The Polly library is called +LVMPolly.so. For a cmake build it is available in the build/lib/ directory, +autoconf creates the same file in +build/tools/polly/{Release+Asserts|Asserts|Debug}/lib. For convenience we create +an alias that automatically loads Polly if 'opt' is called. +<pre class="code"> +export PATH_TO_POLLY_LIB="~/polly/build/lib/" +alias opt="opt -load ${PATH_TO_POLLY_LIB}/LLVMPolly.so"</pre> +</li> + +<li><h4>Prepare the LLVM-IR for Polly</h4> + +Polly is only able to work with code that matches a canonical form. 
To translate +the LLVM-IR into this form we use a set of canonicalication passes. For this +example only three passes are necessary. To get good coverage on a larger set +of input files a larger set is needed. pollycc contains a set of passes that has +shown to be beneficial. +<pre class="code">opt -S -mem2reg -loop-simplify -indvars matmul.s > matmul.preopt.ll</pre></li> + +<li><h4>Show the SCoPs detected by Polly (optional)</h4> + +To understand if Polly was able to detect some SCoPs, we print the +structure of the detected SCoPs. In our example two SCoPs were detected. One in +'init_array' the other in 'main'. + +<pre class="code">opt -basicaa -polly-cloog -analyze -q matmul.preopt.ll</pre> + +<pre> +init_array(): +for (c2=0;c2<=1023;c2++) { + for (c4=0;c4<=1023;c4++) { + Stmt_5(c2,c4); + } +} + +main(): +for (c2=0;c2<=1023;c2++) { + for (c4=0;c4<=1023;c4++) { + Stmt_4(c2,c4); + for (c6=0;c6<=1023;c6++) { + Stmt_6(c2,c4,c6); + } + } +} +</pre> +</li> +<li><h4>Highlight the detected SCoPs in the CFGs of the program (requires graphviz/dotty)</h4> + +Polly can use graphviz to graphically show a CFG in which the detected SCoPs are +highlighted. It can also create '.dot' files that can be translated by +the 'dot' utility into various graphic formats. 
+ +<pre class="code">opt -basicaa -view-scops -disable-output matmul.preopt.ll +opt -basicaa -view-scops-only -disable-output matmul.preopt.ll</pre> +The output for the different functions<br /> +view-scops: +<a href="experiments/matmul/scops.main.dot.png">main</a>, +<a href="experiments/matmul/scops.init_array.dot.png">init_array</a>, +<a href="experiments/matmul/scops.print_array.dot.png">print_array</a><br /> +view-scops-only: +<a href="experiments/matmul/scopsonly.main.dot.png">main</a>, +<a href="experiments/matmul/scopsonly.init_array.dot.png">init_array</a>, +<a href="experiments/matmul/scopsonly.print_array.dot.png">print_array</a> +</li> + +<li><h4>View the polyhedral representation of the SCoPs</h4> +<pre class="code">opt -basicaa -polly-scops -analyze matmul.preopt.ll</pre> +<pre> +[...] +Printing analysis 'Polly - Create polyhedral description of Scops' for region: '%1 => %17' in function 'init_array': + Context: + { [] } + Statements { + Stmt_5 + Domain := + { Stmt_5[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 }; + Scattering := + { Stmt_5[i0, i1] -> scattering[0, i0, 0, i1, 0] }; + WriteAccess := + { Stmt_5[i0, i1] -> MemRef_A[1037i0 + i1] }; + WriteAccess := + { Stmt_5[i0, i1] -> MemRef_B[1047i0 + i1] }; + FinalRead + Domain := + { FinalRead[0] }; + Scattering := + { FinalRead[i0] -> scattering[200000000, o1, o2, o3, o4] }; + ReadAccess := + { FinalRead[i0] -> MemRef_A[o0] }; + ReadAccess := + { FinalRead[i0] -> MemRef_B[o0] }; + } +Printing analysis 'Polly - Create polyhedral description of Scops' for region: '%0 => <Function Return>' in function 'init_array': +[...] 
+Printing analysis 'Polly - Create polyhedral description of Scops' for region: '%1 => %17' in function 'main': + Context: + { [] } + Statements { + Stmt_4 + Domain := + { Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 }; + Scattering := + { Stmt_4[i0, i1] -> scattering[0, i0, 0, i1, 0, 0, 0] }; + WriteAccess := + { Stmt_4[i0, i1] -> MemRef_C[1067i0 + i1] }; + Stmt_6 + Domain := + { Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023 }; + Scattering := + { Stmt_6[i0, i1, i2] -> scattering[0, i0, 0, i1, 1, i2, 0] }; + ReadAccess := + { Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] }; + ReadAccess := + { Stmt_6[i0, i1, i2] -> MemRef_A[1037i0 + i2] }; + ReadAccess := + { Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1047i2] }; + WriteAccess := + { Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] }; + FinalRead + Domain := + { FinalRead[0] }; + Scattering := + { FinalRead[i0] -> scattering[200000000, o1, o2, o3, o4, o5, o6] }; + ReadAccess := + { FinalRead[i0] -> MemRef_C[o0] }; + ReadAccess := + { FinalRead[i0] -> MemRef_A[o0] }; + ReadAccess := + { FinalRead[i0] -> MemRef_B[o0] }; + } +Printing analysis 'Polly - Create polyhedral description of Scops' for region: '%0 => <Function Return>' in function 'main': +Invalid Scop! 
+</pre> +</li> + +<li><h4>Show the dependences for the SCoPs</h4> +<pre class="code">opt -basicaa -polly-dependences -analyze matmul.preopt.ll</pre> +<pre>Printing analysis 'Polly - Calculate dependences for SCoP' for region: 'for.cond => for.end28' in function 'init_array': + Must dependences: + { } + May dependences: + { } + Must no source: + { } + May no source: + { } +Printing analysis 'Polly - Calculate dependences for SCoP' for region: 'for.cond => for.end48' in function 'main': + Must dependences: + { Stmt_4[i0, i1] -> Stmt_6[i0, i1, 0] : + i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023; + Stmt_6[i0, i1, i2] -> Stmt_6[i0, i1, 1 + i2] : + i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1022; + Stmt_6[i0, i1, 1023] -> FinalRead[0] : + i1 <= 1091540 - 1067i0 and i1 >= -1067i0 and i1 >= 0 and i1 <= 1023; + Stmt_6[1023, i1, 1023] -> FinalRead[0] : + i1 >= 0 and i1 <= 1023 + } + May dependences: + { } + Must no source: + { Stmt_6[i0, i1, i2] -> MemRef_A[1037i0 + i2] : + i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023; + Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1047i2] : + i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023; + FinalRead[0] -> MemRef_A[o0]; + FinalRead[0] -> MemRef_B[o0] + FinalRead[0] -> MemRef_C[o0] : + o0 >= 1092565 or (exists (e0 = [(o0)/1067]: o0 <= 1091540 and o0 >= 0 + and 1067e0 <= -1024 + o0 and 1067e0 >= -1066 + o0)) or o0 <= -1; + } + May no source: + { } +</pre></li> + +<li><h4>Export jscop files</h4> + +Polly can export the polyhedral representation in so called jscop files. Jscop +files contain the polyhedral representation stored in a JSON file. +<pre class="code">opt -basicaa -polly-export-jscop matmul.preopt.ll</pre> +<pre>Writing SCoP 'for.cond => for.end28' in function 'init_array' to './init_array___%for.cond---%for.end28.jscop'. +Writing SCoP 'for.cond => for.end48' in function 'main' to './main___%for.cond---%for.end48.jscop'. 
+</pre></li> + +<li><h4>Import the changed jscop files and print the updated SCoP structure +(optional)</h4> +<p>Polly can import jscop files, where the schedules of the statements were +changed. With the help of these updated files we can import transformations into +Polly. It is possible to import different jscop files by providing the postfix +of the jscop file that is imported.</p> +<p> The optimized jscop files for this example are hand written. The schedule +used was inspired by looking at the optimizations PoCC performs. If PoCC is +installed Polly can often calculate such schedules fully automatically.</p> + + +<pre class="code">opt -basicaa -polly-import-jscop -polly-print -disable-output matmul.preopt.ll -polly-import-jscop-postfix=.opt</pre> +<pre>Cannot open file: ./init_array___%for.cond---%for.end28.jscop.opt +Skipping import. +In function: 'init_array' SCoP: for.cond => for.end28: +for (c2=0;c2<=1023;c2++) { + for (c4=0;c4<=1023;c4++) { + %for.body4(c2,c4); + } +} +Reading SCoP 'for.cond => for.end48' in function 'main' from './main___%for.cond---%for.end48.scop.opt.opt'. +In function: 'main' SCoP: for.cond => for.end48: +for (c2=0;c2<=1023;c2++) { + for (c4=0;c4<=1023;c4++) { + %for.body4(c2,c4); + } +} +for (c2=0;c2<=1023;c2++) { + for (c3=0;c3<=1023;c3++) { + for (c4=0;c4<=1023;c4++) { + %for.body12(c2,c4,c3); + } + } +} +</pre></li> + +<li><h4>Codegenerate the SCoPs</h4> +This generates new code for the SCoPs detected by polly. +If -polly-import is present, transformations specified in the imported openscop +files will be applied. +<pre class="code">opt -basicaa -polly-import -polly-import-postfix=.opt -polly-codegen matmul.preopt.ll | opt -O3 > matmul.pollyopt.ll</pre> +<pre> +Cannot open file: ./init_array___%for.cond---%for.end28.scop.opt +Skipping import. 
+Reading SCoP 'for.cond => for.end48' in function 'main' from './main___%for.cond---%for.end48.scop.opt'.</pre> + +<pre class="code">opt matmul.preopt.ll | opt -O3 > matmul.normalopt.ll</pre></li> + +<li><h4>Create the executables</h4> + +Create one executable optimized with plain -O3 as well as a set of executables +optimized in different ways with Polly. One changes only the loop structure, the +other adds tiling, the next adds vectorization and finally we use OpenMP +parallelism. +<pre class="code"> +llc matmul.normalopt.ll -o matmul.normalopt.s && \ + gcc matmul.normalopt.s -o matmul.normalopt.exe +llc matmul.polly.interchanged.ll -o matmul.polly.interchanged.s && \ + gcc matmul.polly.interchanged.s -o matmul.polly.interchanged.exe +llc matmul.polly.interchanged+tiled.ll -o matmul.polly.interchanged+tiled.s && \ + gcc matmul.polly.interchanged+tiled.s -o matmul.polly.interchanged+tiled.exe +llc matmul.polly.interchanged+tiled+vector.ll -o matmul.polly.interchanged+tiled+vector.s && \ + gcc matmul.polly.interchanged+tiled+vector.s -o matmul.polly.interchanged+tiled+vector.exe +llc matmul.polly.interchanged+tiled+vector+openmp.ll -o matmul.polly.interchanged+tiled+vector+openmp.s && \ + gcc -lgomp matmul.polly.interchanged+tiled+vector+openmp.s -o matmul.polly.interchanged+tiled+vector+openmp.exe + </pre> + +<li><h4>Compare the runtime of the executables</h4> + +By comparing the runtimes of the different code snippets we see that a simple +loop interchange gives here the largest performance boost. However by adding +vectorization and by using OpenMP we can further improve the performance +significantly. 
+<pre class="code">time ./matmul.normalopt.exe</pre> +<pre>42.68 real, 42.55 user, 0.00 sys</pre> +<pre class="code">time ./matmul.polly.interchanged.exe</pre> +<pre>04.33 real, 4.30 user, 0.01 sys</pre> +<pre class="code">time ./matmul.polly.interchanged+tiled.exe</pre> +<pre>04.11 real, 4.10 user, 0.00 sys</pre> +<pre class="code">time ./matmul.polly.interchanged+tiled+vector.exe</pre> +<pre>01.39 real, 1.36 user, 0.01 sys</pre> +<pre class="code">time ./matmul.polly.interchanged+tiled+vector+openmp.exe</pre> +<pre>00.66 real, 2.58 user, 0.02 sys</pre> +</li> +</ol> + +</div> +</body> +</html> diff --git a/polly/www/experiments/matmul/init_array___%1---%19.jscop b/polly/www/experiments/matmul/init_array___%1---%19.jscop new file mode 100644 index 00000000000..c7f9bb8c87a --- /dev/null +++ b/polly/www/experiments/matmul/init_array___%1---%19.jscop @@ -0,0 +1,21 @@ +{ + "context" : "{ [] }", + "name" : "%1 => %19", + "statements" : [ + { + "accesses" : [ + { + "kind" : "write", + "relation" : "{ Stmt_5[i0, i1] -> MemRef_A[1536i0 + i1] }" + }, + { + "kind" : "write", + "relation" : "{ Stmt_5[i0, i1] -> MemRef_B[1536i0 + i1] }" + } + ], + "domain" : "{ Stmt_5[i0, i1] : i0 >= 0 and i0 <= 1535 and i1 >= 0 and i1 <= 1535 }", + "name" : "Stmt_5", + "schedule" : "{ Stmt_5[i0, i1] -> scattering[0, i0, 0, i1, 0] }" + } + ] +} diff --git a/polly/www/experiments/matmul/main___%1---%17.jscop b/polly/www/experiments/matmul/main___%1---%17.jscop new file mode 100644 index 00000000000..a3692e52829 --- /dev/null +++ b/polly/www/experiments/matmul/main___%1---%17.jscop @@ -0,0 +1,40 @@ +{ + "context" : "{ [] }", + "name" : "%1 => %17", + "statements" : [ + { + "accesses" : [ + { + "kind" : "write", + "relation" : "{ Stmt_4[i0, i1] -> MemRef_C[1536i0 + i1] }" + } + ], + "domain" : "{ Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1535 and i1 >= 0 and i1 <= 1535 }", + "name" : "Stmt_4", + "schedule" : "{ Stmt_4[i0, i1] -> scattering[0, i0, 0, i1, 0, 0, 0] }" + }, + { + "accesses" : [ + { + 
"kind" : "read", + "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }" + }, + { + "kind" : "read", + "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_A[1536i0 + i2] }" + }, + { + "kind" : "read", + "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1536i2] }" + }, + { + "kind" : "write", + "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }" + } + ], + "domain" : "{ Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1535 and i1 >= 0 and i1 <= 1535 and i2 >= 0 and i2 <= 1535 }", + "name" : "Stmt_6", + "schedule" : "{ Stmt_6[i0, i1, i2] -> scattering[0, i0, 0, i1, 1, i2, 0] }" + } + ] +} diff --git a/polly/www/experiments/matmul/main___%1---%17.jscop.interchanged b/polly/www/experiments/matmul/main___%1---%17.jscop.interchanged new file mode 100644 index 00000000000..d992fe949aa --- /dev/null +++ b/polly/www/experiments/matmul/main___%1---%17.jscop.interchanged @@ -0,0 +1,40 @@ +{ + "context" : "{ [] }", + "name" : "%1 => %17", + "statements" : [ + { + "accesses" : [ + { + "kind" : "write", + "relation" : "{ Stmt_4[i0, i1] -> MemRef_C[1067i0 + i1] }" + } + ], + "domain" : "{ Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 }", + "name" : "Stmt_4", + "schedule" : "{ Stmt_4[i0, i1] -> scattering[0, i0, 0, i1, 0, 0, 0] }" + }, + { + "accesses" : [ + { + "kind" : "read", + "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] }" + }, + { + "kind" : "read", + "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_A[1037i0 + i2] }" + }, + { + "kind" : "read", + "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1047i2] }" + }, + { + "kind" : "write", + "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] }" + } + ], + "domain" : "{ Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023 }", + "name" : "Stmt_6", + "schedule" : "{ Stmt_6[i0, i1, i2] -> scattering[1, i0, 0, i2, 0, i1, 0] }" + } + ] +} diff --git a/polly/www/experiments/matmul/main___%1---%17.jscop.interchanged+tiled 
b/polly/www/experiments/matmul/main___%1---%17.jscop.interchanged+tiled new file mode 100644 index 00000000000..29fcca55f3f --- /dev/null +++ b/polly/www/experiments/matmul/main___%1---%17.jscop.interchanged+tiled @@ -0,0 +1,40 @@ +{ + "context" : "{ [] }", + "name" : "%1 => %17", + "statements" : [ + { + "accesses" : [ + { + "kind" : "write", + "relation" : "{ Stmt_4[i0, i1] -> MemRef_C[1067i0 + i1] }" + } + ], + "domain" : "{ Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 }", + "name" : "Stmt_4", + "schedule" : "{ Stmt_4[i0, i1] -> scattering[0, i0, 0, i1, 0, 0, 0] }" + }, + { + "accesses" : [ + { + "kind" : "read", + "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] }" + }, + { + "kind" : "read", + "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_A[1037i0 + i2] }" + }, + { + "kind" : "read", + "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1047i2] }" + }, + { + "kind" : "write", + "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] }" + } + ], + "domain" : "{ Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023 }", + "name" : "Stmt_6", + "schedule" : "{ Stmt_6[i0, i1, i2] -> scattering[1, o0, o1, o2, i0, i2, i1]: o0 <= i0 < o0 + 64 and o1 <= i1 < o1 + 64 and o2 <= i2 < o2 + 64 and o0 % 64 = 0 and o1 % 64 = 0 and o2 % 64 = 0 }" + } + ] +} diff --git a/polly/www/experiments/matmul/main___%1---%17.jscop.interchanged+tiled+vector b/polly/www/experiments/matmul/main___%1---%17.jscop.interchanged+tiled+vector new file mode 100644 index 00000000000..62a444d0b8a --- /dev/null +++ b/polly/www/experiments/matmul/main___%1---%17.jscop.interchanged+tiled+vector @@ -0,0 +1,40 @@ +{ + "context" : "{ [] }", + "name" : "%1 => %17", + "statements" : [ + { + "accesses" : [ + { + "kind" : "write", + "relation" : "{ Stmt_4[i0, i1] -> MemRef_C[1067i0 + i1] }" + } + ], + "domain" : "{ Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 }", + "name" : "Stmt_4", + "schedule" : "{ 
Stmt_4[i0, i1] -> scattering[0, i0, 0, i1, 0, 0, 0, 0] }" + }, + { + "accesses" : [ + { + "kind" : "read", + "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] }" + }, + { + "kind" : "read", + "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_A[1037i0 + i2] }" + }, + { + "kind" : "read", + "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1047i2] }" + }, + { + "kind" : "write", + "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] }" + } + ], + "domain" : "{ Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023 }", + "name" : "Stmt_6", + "schedule" : "{ Stmt_6[i0, i1, i2] -> scattering[1, o0, o1, o2, i0, i2, ii1, i1]: o0 <= i0 < o0 + 64 and o1 <= i1 < o1 + 64 and o2 <= i2 < o2 + 64 and o0 % 64 = 0 and o1 % 64 = 0 and o2 % 64 = 0 and ii1 % 4 = 0 and ii1 <= i1 < ii1 + 4}" + } + ] +} diff --git a/polly/www/experiments/matmul/matmul.c b/polly/www/experiments/matmul/matmul.c new file mode 100644 index 00000000000..edb2455ae8f --- /dev/null +++ b/polly/www/experiments/matmul/matmul.c @@ -0,0 +1,52 @@ +#include <stdio.h> + +#define N 1536 +float A[N][N]; +float B[N][N]; +float C[N][N]; + +void init_array() +{ + int i, j; + + for (i=0; i<N; i++) { + for (j=0; j<N; j++) { + A[i][j] = (1+(i*j)%1024)/2.0; + B[i][j] = (1+(i*j)%1024)/2.0; + } + } +} + +void print_array() +{ + int i, j; + + for (i=0; i<N; i++) { + for (j=0; j<N; j++) { + fprintf(stdout, "%lf ", C[i][j]); + if (j%80 == 79) fprintf(stdout, "\n"); + } + fprintf(stdout, "\n"); + } +} + +int main() +{ + int i, j, k; + double t_start, t_end; + + init_array(); + + for(i=0; i<N; i++) { + for(j=0; j<N; j++) { + C[i][j] = 0; + for(k=0; k<N; k++) + C[i][j] = C[i][j] + A[i][k] * B[k][j]; + } + } + +#ifdef TEST + print_array(); +#endif + return 0; +} diff --git a/polly/www/experiments/matmul/matmul.normalopt.exe b/polly/www/experiments/matmul/matmul.normalopt.exe Binary files differnew file mode 100755 index 00000000000..73b94752d8e --- /dev/null +++ 
b/polly/www/experiments/matmul/matmul.normalopt.exe diff --git a/polly/www/experiments/matmul/matmul.normalopt.ll b/polly/www/experiments/matmul/matmul.normalopt.ll Binary files differnew file mode 100644 index 00000000000..182ed9aa221 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.normalopt.ll diff --git a/polly/www/experiments/matmul/matmul.normalopt.s b/polly/www/experiments/matmul/matmul.normalopt.s new file mode 100644 index 00000000000..f10f6441182 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.normalopt.s @@ -0,0 +1,203 @@ + .file "matmul.normalopt.ll" + .section .rodata.cst8,"aM",@progbits,8 + .align 8 +.LCPI0_0: + .quad 4602678819172646912 # double 5.000000e-01 + .text + .globl init_array + .align 16, 0x90 + .type init_array,@function +init_array: # @init_array +# BB#0: + xorl %eax, %eax + movsd .LCPI0_0(%rip), %xmm0 + movq %rax, %rcx + .align 16, 0x90 +.LBB0_1: # %.preheader + # =>This Loop Header: Depth=1 + # Child Loop BB0_2 Depth 2 + movq $-1536, %rdx # imm = 0xFFFFFFFFFFFFFA00 + xorl %esi, %esi + .align 16, 0x90 +.LBB0_2: # Parent Loop BB0_1 Depth=1 + # => This Inner Loop Header: Depth=2 + movl %esi, %edi + sarl $31, %edi + shrl $22, %edi + addl %esi, %edi + andl $-1024, %edi # imm = 0xFFFFFFFFFFFFFC00 + negl %edi + leal 1(%rsi,%rdi), %edi + cvtsi2sd %edi, %xmm1 + mulsd %xmm0, %xmm1 + cvtsd2ss %xmm1, %xmm1 + movss %xmm1, A+6144(%rax,%rdx,4) + movss %xmm1, B+6144(%rax,%rdx,4) + addl %ecx, %esi + incq %rdx + jne .LBB0_2 +# BB#3: # in Loop: Header=BB0_1 Depth=1 + addq $6144, %rax # imm = 0x1800 + incq %rcx + cmpq $1536, %rcx # imm = 0x600 + jne .LBB0_1 +# BB#4: + ret +.Ltmp0: + .size init_array, .Ltmp0-init_array + + .globl print_array + .align 16, 0x90 + .type print_array,@function +print_array: # @print_array +# BB#0: + pushq %r14 + pushq %rbx + pushq %rax + movq $-9437184, %rbx # imm = 0xFFFFFFFFFF700000 + .align 16, 0x90 +.LBB1_1: # %.preheader + # =>This Loop Header: Depth=1 + # Child Loop BB1_2 Depth 2 + xorl %r14d, %r14d + movq 
stdout(%rip), %rdi + .align 16, 0x90 +.LBB1_2: # Parent Loop BB1_1 Depth=1 + # => This Inner Loop Header: Depth=2 + movss C+9437184(%rbx,%r14,4), %xmm0 + cvtss2sd %xmm0, %xmm0 + movl $.L.str, %esi + movb $1, %al + callq fprintf + movslq %r14d, %rax + imulq $1717986919, %rax, %rcx # imm = 0x66666667 + movq %rcx, %rdx + shrq $63, %rdx + sarq $37, %rcx + addl %edx, %ecx + imull $80, %ecx, %ecx + subl %ecx, %eax + cmpl $79, %eax + jne .LBB1_4 +# BB#3: # in Loop: Header=BB1_2 Depth=2 + movq stdout(%rip), %rsi + movl $10, %edi + callq fputc +.LBB1_4: # in Loop: Header=BB1_2 Depth=2 + incq %r14 + movq stdout(%rip), %rsi + cmpq $1536, %r14 # imm = 0x600 + movq %rsi, %rdi + jne .LBB1_2 +# BB#5: # in Loop: Header=BB1_1 Depth=1 + movl $10, %edi + callq fputc + addq $6144, %rbx # imm = 0x1800 + jne .LBB1_1 +# BB#6: + addq $8, %rsp + popq %rbx + popq %r14 + ret +.Ltmp1: + .size print_array, .Ltmp1-print_array + + .section .rodata.cst8,"aM",@progbits,8 + .align 8 +.LCPI2_0: + .quad 4602678819172646912 # double 5.000000e-01 + .text + .globl main + .align 16, 0x90 + .type main,@function +main: # @main +# BB#0: + xorl %eax, %eax + movsd .LCPI2_0(%rip), %xmm0 + movq %rax, %rcx + .align 16, 0x90 +.LBB2_1: # %.preheader.i + # =>This Loop Header: Depth=1 + # Child Loop BB2_2 Depth 2 + movq $-1536, %rdx # imm = 0xFFFFFFFFFFFFFA00 + xorl %esi, %esi + .align 16, 0x90 +.LBB2_2: # Parent Loop BB2_1 Depth=1 + # => This Inner Loop Header: Depth=2 + movl %esi, %edi + sarl $31, %edi + shrl $22, %edi + addl %esi, %edi + andl $-1024, %edi # imm = 0xFFFFFFFFFFFFFC00 + negl %edi + leal 1(%rsi,%rdi), %edi + cvtsi2sd %edi, %xmm1 + mulsd %xmm0, %xmm1 + cvtsd2ss %xmm1, %xmm1 + movss %xmm1, A+6144(%rax,%rdx,4) + movss %xmm1, B+6144(%rax,%rdx,4) + addl %ecx, %esi + incq %rdx + jne .LBB2_2 +# BB#3: # in Loop: Header=BB2_1 Depth=1 + addq $6144, %rax # imm = 0x1800 + incq %rcx + xorl %edx, %edx + cmpq $1536, %rcx # imm = 0x600 + jne .LBB2_1 + .align 16, 0x90 +.LBB2_4: # %.preheader + # =>This Loop Header: 
Depth=1 + # Child Loop BB2_5 Depth 2 + # Child Loop BB2_6 Depth 3 + xorl %eax, %eax + xorl %ecx, %ecx + .align 16, 0x90 +.LBB2_5: # Parent Loop BB2_4 Depth=1 + # => This Loop Header: Depth=2 + # Child Loop BB2_6 Depth 3 + movl $0, C(%rcx,%rdx) + leaq B(%rcx), %rsi + pxor %xmm0, %xmm0 + movq %rax, %rdi + .align 16, 0x90 +.LBB2_6: # Parent Loop BB2_4 Depth=1 + # Parent Loop BB2_5 Depth=2 + # => This Inner Loop Header: Depth=3 + movss A(%rdx,%rdi,4), %xmm1 + mulss (%rsi), %xmm1 + addss %xmm1, %xmm0 + addq $6144, %rsi # imm = 0x1800 + incq %rdi + cmpq $1536, %rdi # imm = 0x600 + jne .LBB2_6 +# BB#7: # in Loop: Header=BB2_5 Depth=2 + movss %xmm0, C(%rcx,%rdx) + addq $4, %rcx + cmpq $6144, %rcx # imm = 0x1800 + jne .LBB2_5 +# BB#8: # %init_array.exit + # in Loop: Header=BB2_4 Depth=1 + addq $6144, %rdx # imm = 0x1800 + cmpq $9437184, %rdx # imm = 0x900000 + jne .LBB2_4 +# BB#9: + xorl %eax, %eax + ret +.Ltmp2: + .size main, .Ltmp2-main + + .type A,@object # @A + .comm A,9437184,16 + .type B,@object # @B + .comm B,9437184,16 + .type .L.str,@object # @.str + .section .rodata.str1.1,"aMS",@progbits,1 +.L.str: + .asciz "%lf " + .size .L.str, 5 + + .type C,@object # @C + .comm C,9437184,16 + + .section ".note.GNU-stack","",@progbits diff --git a/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector+openmp.exe b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector+openmp.exe Binary files differnew file mode 100755 index 00000000000..7a2e6de6138 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector+openmp.exe diff --git a/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector+openmp.ll b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector+openmp.ll Binary files differnew file mode 100644 index 00000000000..710f706f68e --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector+openmp.ll diff --git 
a/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector+openmp.s b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector+openmp.s new file mode 100644 index 00000000000..04dc0656c06 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector+openmp.s @@ -0,0 +1,628 @@ + .file "matmul.polly.interchanged+tiled+vector+openmp.ll" + .text + .globl init_array + .align 16, 0x90 + .type init_array,@function +init_array: # @init_array +# BB#0: # %pollyBB + pushq %rbx + subq $16, %rsp + movq $A, (%rsp) + movq $B, 8(%rsp) + movl $init_array.omp_subfn, %edi + leaq (%rsp), %rbx + xorl %edx, %edx + xorl %ecx, %ecx + movl $1536, %r8d # imm = 0x600 + movl $1, %r9d + movq %rbx, %rsi + callq GOMP_parallel_loop_runtime_start + movq %rbx, %rdi + callq init_array.omp_subfn + callq GOMP_parallel_end + addq $16, %rsp + popq %rbx + ret +.Ltmp0: + .size init_array, .Ltmp0-init_array + + .globl print_array + .align 16, 0x90 + .type print_array,@function +print_array: # @print_array +# BB#0: + pushq %r14 + pushq %rbx + pushq %rax + movq $-9437184, %rbx # imm = 0xFFFFFFFFFF700000 + .align 16, 0x90 +.LBB1_1: # %.preheader + # =>This Loop Header: Depth=1 + # Child Loop BB1_2 Depth 2 + xorl %r14d, %r14d + movq stdout(%rip), %rdi + .align 16, 0x90 +.LBB1_2: # Parent Loop BB1_1 Depth=1 + # => This Inner Loop Header: Depth=2 + movss C+9437184(%rbx,%r14,4), %xmm0 + cvtss2sd %xmm0, %xmm0 + movl $.L.str, %esi + movb $1, %al + callq fprintf + movslq %r14d, %rax + imulq $1717986919, %rax, %rcx # imm = 0x66666667 + movq %rcx, %rdx + shrq $63, %rdx + sarq $37, %rcx + addl %edx, %ecx + imull $80, %ecx, %ecx + subl %ecx, %eax + cmpl $79, %eax + jne .LBB1_4 +# BB#3: # in Loop: Header=BB1_2 Depth=2 + movq stdout(%rip), %rsi + movl $10, %edi + callq fputc +.LBB1_4: # in Loop: Header=BB1_2 Depth=2 + incq %r14 + movq stdout(%rip), %rsi + cmpq $1536, %r14 # imm = 0x600 + movq %rsi, %rdi + jne .LBB1_2 +# BB#5: # in Loop: Header=BB1_1 Depth=1 + movl $10, %edi 
+ callq fputc + addq $6144, %rbx # imm = 0x1800 + jne .LBB1_1 +# BB#6: + addq $8, %rsp + popq %rbx + popq %r14 + ret +.Ltmp1: + .size print_array, .Ltmp1-print_array + + .globl main + .align 16, 0x90 + .type main,@function +main: # @main +# BB#0: # %pollyBB + pushq %rbp + movq %rsp, %rbp + pushq %r15 + pushq %r14 + pushq %r13 + pushq %r12 + pushq %rbx + subq $56, %rsp + movq $A, -72(%rbp) + movq $B, -64(%rbp) + movl $init_array.omp_subfn, %edi + leaq -72(%rbp), %rbx + movq %rbx, %rsi + xorl %edx, %edx + xorl %ecx, %ecx + movl $1536, %r8d # imm = 0x600 + movl $1, %r9d + callq GOMP_parallel_loop_runtime_start + movq %rbx, %rdi + callq init_array.omp_subfn + callq GOMP_parallel_end + movl $main.omp_subfn, %edi + leaq -96(%rbp), %rsi + movq $C, -96(%rbp) + movq $A, -88(%rbp) + movq $B, -80(%rbp) + xorl %edx, %edx + xorl %ecx, %ecx + movl $1536, %r8d # imm = 0x600 + movl $1, %r9d + callq GOMP_parallel_loop_runtime_start + leaq -48(%rbp), %rdi + leaq -56(%rbp), %rsi + callq GOMP_loop_runtime_next + testb $1, %al + je .LBB2_6 +# BB#1: + leaq -48(%rbp), %rbx + leaq -56(%rbp), %r14 + .align 16, 0x90 +.LBB2_3: # %omp.loadIVBounds.i + # =>This Loop Header: Depth=1 + # Child Loop BB2_5 Depth 2 + movq -56(%rbp), %r15 + decq %r15 + movq -48(%rbp), %r12 + cmpq %r15, %r12 + jg .LBB2_2 +# BB#4: # %polly.loop_header2.preheader.lr.ph.i + # in Loop: Header=BB2_3 Depth=1 + leaq (%r12,%r12,2), %rax + shlq $11, %rax + leaq C(%rax), %r13 + .align 16, 0x90 +.LBB2_5: # %polly.loop_header2.preheader.i + # Parent Loop BB2_3 Depth=1 + # => This Inner Loop Header: Depth=2 + movq %r13, %rdi + xorl %esi, %esi + movl $6144, %edx # imm = 0x1800 + callq memset + addq $6144, %r13 # imm = 0x1800 + incq %r12 + cmpq %r15, %r12 + jle .LBB2_5 +.LBB2_2: # %omp.checkNext.loopexit.i + # in Loop: Header=BB2_3 Depth=1 + movq %rbx, %rdi + movq %r14, %rsi + callq GOMP_loop_runtime_next + testb $1, %al + jne .LBB2_3 +.LBB2_6: # %main.omp_subfn.exit + callq GOMP_loop_end_nowait + callq GOMP_parallel_end + movq 
%rsp, %rax + leaq -32(%rax), %rbx + movl $main.omp_subfn1, %edi + xorl %ecx, %ecx + movl $1536, %r8d # imm = 0x600 + movl $64, %r9d + movq %rbx, %rsp + movq $C, -32(%rax) + movq $A, -24(%rax) + movq $B, -16(%rax) + movq %rbx, %rsi + xorl %edx, %edx + callq GOMP_parallel_loop_runtime_start + movq %rbx, %rdi + callq main.omp_subfn1 + callq GOMP_parallel_end + xorl %eax, %eax + leaq -40(%rbp), %rsp + popq %rbx + popq %r12 + popq %r13 + popq %r14 + popq %r15 + popq %rbp + ret +.Ltmp2: + .size main, .Ltmp2-main + + .section .rodata.cst8,"aM",@progbits,8 + .align 8 +.LCPI3_0: + .quad 4602678819172646912 # double 5.000000e-01 + .text + .align 16, 0x90 + .type init_array.omp_subfn,@function +init_array.omp_subfn: # @init_array.omp_subfn +.Leh_func_begin3: +.Ltmp6: + .cfi_startproc +# BB#0: # %omp.setup + pushq %r14 +.Ltmp7: + .cfi_def_cfa_offset 16 + pushq %rbx +.Ltmp8: + .cfi_def_cfa_offset 24 + subq $24, %rsp +.Ltmp9: + .cfi_def_cfa_offset 48 +.Ltmp10: + .cfi_offset 3, -24 +.Ltmp11: + .cfi_offset 14, -16 + leaq 16(%rsp), %rdi + leaq 8(%rsp), %rsi + callq GOMP_loop_runtime_next + testb $1, %al + je .LBB3_2 +# BB#1: + leaq 16(%rsp), %rbx + leaq 8(%rsp), %r14 + jmp .LBB3_4 +.LBB3_2: # %omp.exit + callq GOMP_loop_end_nowait + addq $24, %rsp + popq %rbx + popq %r14 + ret + .align 16, 0x90 +.LBB3_3: # %omp.checkNext.loopexit + # in Loop: Header=BB3_4 Depth=1 + movq %rbx, %rdi + movq %r14, %rsi + callq GOMP_loop_runtime_next + testb $1, %al + je .LBB3_2 +.LBB3_4: # %omp.loadIVBounds + # =>This Loop Header: Depth=1 + # Child Loop BB3_7 Depth 2 + # Child Loop BB3_8 Depth 3 + movq 8(%rsp), %rax + decq %rax + movq 16(%rsp), %rcx + cmpq %rax, %rcx + jg .LBB3_3 +# BB#5: # %polly.loop_header2.preheader.lr.ph + # in Loop: Header=BB3_4 Depth=1 + movq %rcx, %rdx + shlq $11, %rdx + leaq (%rdx,%rdx,2), %rdx + jmp .LBB3_7 + .align 16, 0x90 +.LBB3_6: # %polly.loop_header.loopexit + # in Loop: Header=BB3_7 Depth=2 + addq $6144, %rdx # imm = 0x1800 + incq %rcx + cmpq %rax, %rcx + jg .LBB3_3 
+.LBB3_7: # %polly.loop_header2.preheader + # Parent Loop BB3_4 Depth=1 + # => This Loop Header: Depth=2 + # Child Loop BB3_8 Depth 3 + movq $-1536, %rsi # imm = 0xFFFFFFFFFFFFFA00 + xorl %edi, %edi + .align 16, 0x90 +.LBB3_8: # %polly.loop_body3 + # Parent Loop BB3_4 Depth=1 + # Parent Loop BB3_7 Depth=2 + # => This Inner Loop Header: Depth=3 + movl %edi, %r8d + sarl $31, %r8d + shrl $22, %r8d + addl %edi, %r8d + andl $-1024, %r8d # imm = 0xFFFFFFFFFFFFFC00 + negl %r8d + leal 1(%rdi,%r8), %r8d + cvtsi2sd %r8d, %xmm0 + mulsd .LCPI3_0(%rip), %xmm0 + cvtsd2ss %xmm0, %xmm0 + movss %xmm0, A+6144(%rdx,%rsi,4) + movss %xmm0, B+6144(%rdx,%rsi,4) + addl %ecx, %edi + incq %rsi + jne .LBB3_8 + jmp .LBB3_6 +.Ltmp12: + .size init_array.omp_subfn, .Ltmp12-init_array.omp_subfn +.Ltmp13: + .cfi_endproc +.Leh_func_end3: + + .align 16, 0x90 + .type main.omp_subfn,@function +main.omp_subfn: # @main.omp_subfn +.Leh_func_begin4: +.Ltmp20: + .cfi_startproc +# BB#0: # %omp.setup + pushq %r15 +.Ltmp21: + .cfi_def_cfa_offset 16 + pushq %r14 +.Ltmp22: + .cfi_def_cfa_offset 24 + pushq %r13 +.Ltmp23: + .cfi_def_cfa_offset 32 + pushq %r12 +.Ltmp24: + .cfi_def_cfa_offset 40 + pushq %rbx +.Ltmp25: + .cfi_def_cfa_offset 48 + subq $16, %rsp +.Ltmp26: + .cfi_def_cfa_offset 64 +.Ltmp27: + .cfi_offset 3, -48 +.Ltmp28: + .cfi_offset 12, -40 +.Ltmp29: + .cfi_offset 13, -32 +.Ltmp30: + .cfi_offset 14, -24 +.Ltmp31: + .cfi_offset 15, -16 + leaq 8(%rsp), %rdi + leaq (%rsp), %rsi + callq GOMP_loop_runtime_next + testb $1, %al + je .LBB4_2 +# BB#1: + leaq 8(%rsp), %rbx + leaq (%rsp), %r14 + jmp .LBB4_4 +.LBB4_2: # %omp.exit + callq GOMP_loop_end_nowait + addq $16, %rsp + popq %rbx + popq %r12 + popq %r13 + popq %r14 + popq %r15 + ret + .align 16, 0x90 +.LBB4_3: # %omp.checkNext.loopexit + # in Loop: Header=BB4_4 Depth=1 + movq %rbx, %rdi + movq %r14, %rsi + callq GOMP_loop_runtime_next + testb $1, %al + je .LBB4_2 +.LBB4_4: # %omp.loadIVBounds + # =>This Loop Header: Depth=1 + # Child Loop BB4_6 Depth 2 + 
movq (%rsp), %r15 + decq %r15 + movq 8(%rsp), %r12 + cmpq %r15, %r12 + jg .LBB4_3 +# BB#5: # %polly.loop_header2.preheader.lr.ph + # in Loop: Header=BB4_4 Depth=1 + leaq (%r12,%r12,2), %rax + shlq $11, %rax + leaq C(%rax), %r13 + .align 16, 0x90 +.LBB4_6: # %polly.loop_header2.preheader + # Parent Loop BB4_4 Depth=1 + # => This Inner Loop Header: Depth=2 + movq %r13, %rdi + xorl %esi, %esi + movl $6144, %edx # imm = 0x1800 + callq memset + addq $6144, %r13 # imm = 0x1800 + incq %r12 + cmpq %r15, %r12 + jle .LBB4_6 + jmp .LBB4_3 +.Ltmp32: + .size main.omp_subfn, .Ltmp32-main.omp_subfn +.Ltmp33: + .cfi_endproc +.Leh_func_end4: + + .align 16, 0x90 + .type main.omp_subfn1,@function +main.omp_subfn1: # @main.omp_subfn1 +.Leh_func_begin5: +.Ltmp41: + .cfi_startproc +# BB#0: # %omp.setup + pushq %rbp +.Ltmp42: + .cfi_def_cfa_offset 16 + pushq %r15 +.Ltmp43: + .cfi_def_cfa_offset 24 + pushq %r14 +.Ltmp44: + .cfi_def_cfa_offset 32 + pushq %r13 +.Ltmp45: + .cfi_def_cfa_offset 40 + pushq %r12 +.Ltmp46: + .cfi_def_cfa_offset 48 + pushq %rbx +.Ltmp47: + .cfi_def_cfa_offset 56 + subq $40, %rsp +.Ltmp48: + .cfi_def_cfa_offset 96 +.Ltmp49: + .cfi_offset 3, -56 +.Ltmp50: + .cfi_offset 12, -48 +.Ltmp51: + .cfi_offset 13, -40 +.Ltmp52: + .cfi_offset 14, -32 +.Ltmp53: + .cfi_offset 15, -24 +.Ltmp54: + .cfi_offset 6, -16 + leaq 32(%rsp), %rdi + leaq 24(%rsp), %rsi + jmp .LBB5_1 + .align 16, 0x90 +.LBB5_4: # %omp.loadIVBounds + # in Loop: Header=BB5_1 Depth=1 + movq 24(%rsp), %rax + decq %rax + movq %rax, (%rsp) # 8-byte Spill + movq 32(%rsp), %rcx + cmpq %rax, %rcx + jg .LBB5_3 +# BB#5: # %polly.loop_header2.preheader.lr.ph + # in Loop: Header=BB5_1 Depth=1 + leaq (%rcx,%rcx,2), %rax + movq %rcx, %rdx + shlq $9, %rdx + leaq (%rdx,%rdx,2), %rdx + movq %rdx, 16(%rsp) # 8-byte Spill + shlq $11, %rax + leaq A(%rax), %rax + movq %rax, 8(%rsp) # 8-byte Spill + jmp .LBB5_7 + .align 16, 0x90 +.LBB5_6: # %polly.loop_header.loopexit + # in Loop: Header=BB5_7 Depth=2 + addq $98304, 16(%rsp) # 
8-byte Folded Spill + # imm = 0x18000 + addq $393216, 8(%rsp) # 8-byte Folded Spill + # imm = 0x60000 + addq $64, %rcx + cmpq (%rsp), %rcx # 8-byte Folded Reload + jg .LBB5_3 +.LBB5_7: # %polly.loop_header2.preheader + # Parent Loop BB5_1 Depth=1 + # => This Loop Header: Depth=2 + # Child Loop BB5_9 Depth 3 + # Child Loop BB5_11 Depth 4 + # Child Loop BB5_14 Depth 5 + # Child Loop BB5_18 Depth 6 + # Child Loop BB5_19 Depth 7 + leaq 63(%rcx), %rax + xorl %edx, %edx + jmp .LBB5_9 + .align 16, 0x90 +.LBB5_8: # %polly.loop_header2.loopexit + # in Loop: Header=BB5_9 Depth=3 + addq $64, %rdx + cmpq $1536, %rdx # imm = 0x600 + je .LBB5_6 +.LBB5_9: # %polly.loop_header7.preheader + # Parent Loop BB5_1 Depth=1 + # Parent Loop BB5_7 Depth=2 + # => This Loop Header: Depth=3 + # Child Loop BB5_11 Depth 4 + # Child Loop BB5_14 Depth 5 + # Child Loop BB5_18 Depth 6 + # Child Loop BB5_19 Depth 7 + movq 16(%rsp), %rsi # 8-byte Reload + leaq (%rsi,%rdx), %rsi + leaq 63(%rdx), %rdi + xorl %r8d, %r8d + movq 8(%rsp), %r9 # 8-byte Reload + movq %rdx, %r10 + jmp .LBB5_11 + .align 16, 0x90 +.LBB5_10: # %polly.loop_header7.loopexit + # in Loop: Header=BB5_11 Depth=4 + addq $256, %r9 # imm = 0x100 + addq $98304, %r10 # imm = 0x18000 + addq $64, %r8 + cmpq $1536, %r8 # imm = 0x600 + je .LBB5_8 +.LBB5_11: # %polly.loop_body8 + # Parent Loop BB5_1 Depth=1 + # Parent Loop BB5_7 Depth=2 + # Parent Loop BB5_9 Depth=3 + # => This Loop Header: Depth=4 + # Child Loop BB5_14 Depth 5 + # Child Loop BB5_18 Depth 6 + # Child Loop BB5_19 Depth 7 + movabsq $9223372036854775744, %r11 # imm = 0x7FFFFFFFFFFFFFC0 + cmpq %r11, %rcx + jg .LBB5_10 +# BB#12: # %polly.loop_body13.lr.ph + # in Loop: Header=BB5_11 Depth=4 + leaq 63(%r8), %r11 + movq %rcx, %rbx + movq %rsi, %r14 + movq %r9, %r15 + jmp .LBB5_14 + .align 16, 0x90 +.LBB5_13: # %polly.loop_header12.loopexit + # in Loop: Header=BB5_14 Depth=5 + addq $1536, %r14 # imm = 0x600 + addq $6144, %r15 # imm = 0x1800 + incq %rbx + cmpq %rax, %rbx + jg .LBB5_10 
+.LBB5_14: # %polly.loop_body13 + # Parent Loop BB5_1 Depth=1 + # Parent Loop BB5_7 Depth=2 + # Parent Loop BB5_9 Depth=3 + # Parent Loop BB5_11 Depth=4 + # => This Loop Header: Depth=5 + # Child Loop BB5_18 Depth 6 + # Child Loop BB5_19 Depth 7 + cmpq %r11, %r8 + jg .LBB5_13 +# BB#15: # %polly.loop_body13 + # in Loop: Header=BB5_14 Depth=5 + cmpq %rdi, %rdx + jg .LBB5_13 +# BB#16: # %polly.loop_body23.lr.ph.preheader + # in Loop: Header=BB5_14 Depth=5 + xorl %r12d, %r12d + movq %r10, %r13 + jmp .LBB5_18 + .align 16, 0x90 +.LBB5_17: # %polly.loop_header17.loopexit + # in Loop: Header=BB5_18 Depth=6 + addq $1536, %r13 # imm = 0x600 + incq %r12 + cmpq $64, %r12 + je .LBB5_13 +.LBB5_18: # %polly.loop_body23.lr.ph + # Parent Loop BB5_1 Depth=1 + # Parent Loop BB5_7 Depth=2 + # Parent Loop BB5_9 Depth=3 + # Parent Loop BB5_11 Depth=4 + # Parent Loop BB5_14 Depth=5 + # => This Loop Header: Depth=6 + # Child Loop BB5_19 Depth 7 + movss (%r15,%r12,4), %xmm0 + pshufd $0, %xmm0, %xmm0 # xmm0 = xmm0[0,0,0,0] + xorl %ebp, %ebp + .align 16, 0x90 +.LBB5_19: # %polly.loop_body23 + # Parent Loop BB5_1 Depth=1 + # Parent Loop BB5_7 Depth=2 + # Parent Loop BB5_9 Depth=3 + # Parent Loop BB5_11 Depth=4 + # Parent Loop BB5_14 Depth=5 + # Parent Loop BB5_18 Depth=6 + # => This Inner Loop Header: Depth=7 + movaps B(%rbp,%r13,4), %xmm1 + mulps %xmm0, %xmm1 + addps C(%rbp,%r14,4), %xmm1 + movaps %xmm1, C(%rbp,%r14,4) + addq $16, %rbp + cmpq $256, %rbp # imm = 0x100 + jne .LBB5_19 + jmp .LBB5_17 +.LBB5_3: # %omp.checkNext.loopexit + # in Loop: Header=BB5_1 Depth=1 + leaq 32(%rsp), %rax + movq %rax, %rdi + leaq 24(%rsp), %rax + movq %rax, %rsi +.LBB5_1: # %omp.setup + # =>This Loop Header: Depth=1 + # Child Loop BB5_7 Depth 2 + # Child Loop BB5_9 Depth 3 + # Child Loop BB5_11 Depth 4 + # Child Loop BB5_14 Depth 5 + # Child Loop BB5_18 Depth 6 + # Child Loop BB5_19 Depth 7 + callq GOMP_loop_runtime_next + testb $1, %al + jne .LBB5_4 +# BB#2: # %omp.exit + callq GOMP_loop_end_nowait + addq 
$40, %rsp + popq %rbx + popq %r12 + popq %r13 + popq %r14 + popq %r15 + popq %rbp + ret +.Ltmp55: + .size main.omp_subfn1, .Ltmp55-main.omp_subfn1 +.Ltmp56: + .cfi_endproc +.Leh_func_end5: + + .type A,@object # @A + .comm A,9437184,16 + .type B,@object # @B + .comm B,9437184,16 + .type .L.str,@object # @.str + .section .rodata.str1.1,"aMS",@progbits,1 +.L.str: + .asciz "%lf " + .size .L.str, 5 + + .type C,@object # @C + .comm C,9437184,16 + + .section ".note.GNU-stack","",@progbits diff --git a/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector.exe b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector.exe Binary files differnew file mode 100755 index 00000000000..fac17e21685 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector.exe diff --git a/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector.ll b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector.ll Binary files differnew file mode 100644 index 00000000000..7217bc92c80 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector.ll diff --git a/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector.s b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector.s new file mode 100644 index 00000000000..a1d6f0bf9b0 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector.s @@ -0,0 +1,318 @@ + .file "matmul.polly.interchanged+tiled+vector.ll" + .section .rodata.cst8,"aM",@progbits,8 + .align 8 +.LCPI0_0: + .quad 4602678819172646912 # double 5.000000e-01 + .text + .globl init_array + .align 16, 0x90 + .type init_array,@function +init_array: # @init_array +# BB#0: # %pollyBB + xorl %eax, %eax + movsd .LCPI0_0(%rip), %xmm0 + movq %rax, %rcx + .align 16, 0x90 +.LBB0_2: # %polly.loop_header1.preheader + # =>This Loop Header: Depth=1 + # Child Loop BB0_3 Depth 2 + movq $-1536, %rdx # imm = 0xFFFFFFFFFFFFFA00 + xorl %esi, %esi + .align 16, 0x90 
+.LBB0_3: # %polly.loop_body2 + # Parent Loop BB0_2 Depth=1 + # => This Inner Loop Header: Depth=2 + movl %esi, %edi + sarl $31, %edi + shrl $22, %edi + addl %esi, %edi + andl $-1024, %edi # imm = 0xFFFFFFFFFFFFFC00 + negl %edi + leal 1(%rsi,%rdi), %edi + cvtsi2sd %edi, %xmm1 + mulsd %xmm0, %xmm1 + cvtsd2ss %xmm1, %xmm1 + movss %xmm1, A+6144(%rax,%rdx,4) + movss %xmm1, B+6144(%rax,%rdx,4) + addl %ecx, %esi + incq %rdx + jne .LBB0_3 +# BB#1: # %polly.loop_header.loopexit + # in Loop: Header=BB0_2 Depth=1 + addq $6144, %rax # imm = 0x1800 + incq %rcx + cmpq $1536, %rcx # imm = 0x600 + jne .LBB0_2 +# BB#4: # %polly.after_loop + ret +.Ltmp0: + .size init_array, .Ltmp0-init_array + + .globl print_array + .align 16, 0x90 + .type print_array,@function +print_array: # @print_array +# BB#0: + pushq %r14 + pushq %rbx + pushq %rax + movq $-9437184, %rbx # imm = 0xFFFFFFFFFF700000 + .align 16, 0x90 +.LBB1_1: # %.preheader + # =>This Loop Header: Depth=1 + # Child Loop BB1_2 Depth 2 + xorl %r14d, %r14d + movq stdout(%rip), %rdi + .align 16, 0x90 +.LBB1_2: # Parent Loop BB1_1 Depth=1 + # => This Inner Loop Header: Depth=2 + movss C+9437184(%rbx,%r14,4), %xmm0 + cvtss2sd %xmm0, %xmm0 + movl $.L.str, %esi + movb $1, %al + callq fprintf + movslq %r14d, %rax + imulq $1717986919, %rax, %rcx # imm = 0x66666667 + movq %rcx, %rdx + shrq $63, %rdx + sarq $37, %rcx + addl %edx, %ecx + imull $80, %ecx, %ecx + subl %ecx, %eax + cmpl $79, %eax + jne .LBB1_4 +# BB#3: # in Loop: Header=BB1_2 Depth=2 + movq stdout(%rip), %rsi + movl $10, %edi + callq fputc +.LBB1_4: # in Loop: Header=BB1_2 Depth=2 + incq %r14 + movq stdout(%rip), %rsi + cmpq $1536, %r14 # imm = 0x600 + movq %rsi, %rdi + jne .LBB1_2 +# BB#5: # in Loop: Header=BB1_1 Depth=1 + movl $10, %edi + callq fputc + addq $6144, %rbx # imm = 0x1800 + jne .LBB1_1 +# BB#6: + addq $8, %rsp + popq %rbx + popq %r14 + ret +.Ltmp1: + .size print_array, .Ltmp1-print_array + + .section .rodata.cst8,"aM",@progbits,8 + .align 8 +.LCPI2_0: + .quad 
4602678819172646912 # double 5.000000e-01 + .text + .globl main + .align 16, 0x90 + .type main,@function +main: # @main +# BB#0: # %pollyBB + pushq %rbp + pushq %r15 + pushq %r14 + pushq %r13 + pushq %r12 + pushq %rbx + subq $24, %rsp + xorl %eax, %eax + movsd .LCPI2_0(%rip), %xmm0 + movq %rax, %rcx + .align 16, 0x90 +.LBB2_1: # %polly.loop_header1.preheader.i + # =>This Loop Header: Depth=1 + # Child Loop BB2_2 Depth 2 + movq $-1536, %rdx # imm = 0xFFFFFFFFFFFFFA00 + xorl %esi, %esi + .align 16, 0x90 +.LBB2_2: # %polly.loop_body2.i + # Parent Loop BB2_1 Depth=1 + # => This Inner Loop Header: Depth=2 + movl %esi, %edi + sarl $31, %edi + shrl $22, %edi + addl %esi, %edi + andl $-1024, %edi # imm = 0xFFFFFFFFFFFFFC00 + negl %edi + leal 1(%rsi,%rdi), %edi + cvtsi2sd %edi, %xmm1 + mulsd %xmm0, %xmm1 + cvtsd2ss %xmm1, %xmm1 + movss %xmm1, A+6144(%rax,%rdx,4) + movss %xmm1, B+6144(%rax,%rdx,4) + addl %ecx, %esi + incq %rdx + jne .LBB2_2 +# BB#3: # %polly.loop_header.loopexit.i + # in Loop: Header=BB2_1 Depth=1 + addq $6144, %rax # imm = 0x1800 + incq %rcx + cmpq $1536, %rcx # imm = 0x600 + jne .LBB2_1 +# BB#4: # %polly.loop_header.preheader + movl $C, %edi + xorl %esi, %esi + movl $9437184, %edx # imm = 0x900000 + callq memset + xorl %eax, %eax + movq %rax, 16(%rsp) # 8-byte Spill + movq %rax, (%rsp) # 8-byte Spill + jmp .LBB2_6 + .align 16, 0x90 +.LBB2_5: # %polly.loop_header7.loopexit + # in Loop: Header=BB2_6 Depth=1 + addq $393216, (%rsp) # 8-byte Folded Spill + # imm = 0x60000 + movq 16(%rsp), %rax # 8-byte Reload + addq $64, %rax + movq %rax, 16(%rsp) # 8-byte Spill + cmpq $1536, %rax # imm = 0x600 + je .LBB2_7 +.LBB2_6: # %polly.loop_header12.preheader + # =>This Loop Header: Depth=1 + # Child Loop BB2_9 Depth 2 + # Child Loop BB2_11 Depth 3 + # Child Loop BB2_14 Depth 4 + # Child Loop BB2_18 Depth 5 + # Child Loop BB2_19 Depth 6 + movq 16(%rsp), %rax # 8-byte Reload + leaq 63(%rax), %rax + movq (%rsp), %rcx # 8-byte Reload + leaq A(%rcx), %rdx + movq %rdx, 
8(%rsp) # 8-byte Spill + xorl %edx, %edx + jmp .LBB2_9 + .align 16, 0x90 +.LBB2_8: # %polly.loop_header12.loopexit + # in Loop: Header=BB2_9 Depth=2 + addq $256, %rcx # imm = 0x100 + addq $64, %rdx + cmpq $1536, %rdx # imm = 0x600 + je .LBB2_5 +.LBB2_9: # %polly.loop_header17.preheader + # Parent Loop BB2_6 Depth=1 + # => This Loop Header: Depth=2 + # Child Loop BB2_11 Depth 3 + # Child Loop BB2_14 Depth 4 + # Child Loop BB2_18 Depth 5 + # Child Loop BB2_19 Depth 6 + leaq 63(%rdx), %rsi + xorl %edi, %edi + movq 8(%rsp), %r8 # 8-byte Reload + movq %rdx, %r9 + jmp .LBB2_11 + .align 16, 0x90 +.LBB2_10: # %polly.loop_header17.loopexit + # in Loop: Header=BB2_11 Depth=3 + addq $256, %r8 # imm = 0x100 + addq $98304, %r9 # imm = 0x18000 + addq $64, %rdi + cmpq $1536, %rdi # imm = 0x600 + je .LBB2_8 +.LBB2_11: # %polly.loop_body18 + # Parent Loop BB2_6 Depth=1 + # Parent Loop BB2_9 Depth=2 + # => This Loop Header: Depth=3 + # Child Loop BB2_14 Depth 4 + # Child Loop BB2_18 Depth 5 + # Child Loop BB2_19 Depth 6 + cmpq %rax, 16(%rsp) # 8-byte Folded Reload + jg .LBB2_10 +# BB#12: # %polly.loop_body23.lr.ph + # in Loop: Header=BB2_11 Depth=3 + leaq 63(%rdi), %r10 + xorl %r11d, %r11d + jmp .LBB2_14 + .align 16, 0x90 +.LBB2_13: # %polly.loop_header22.loopexit + # in Loop: Header=BB2_14 Depth=4 + addq $6144, %r11 # imm = 0x1800 + cmpq $393216, %r11 # imm = 0x60000 + je .LBB2_10 +.LBB2_14: # %polly.loop_body23 + # Parent Loop BB2_6 Depth=1 + # Parent Loop BB2_9 Depth=2 + # Parent Loop BB2_11 Depth=3 + # => This Loop Header: Depth=4 + # Child Loop BB2_18 Depth 5 + # Child Loop BB2_19 Depth 6 + cmpq %r10, %rdi + jg .LBB2_13 +# BB#15: # %polly.loop_body23 + # in Loop: Header=BB2_14 Depth=4 + cmpq %rsi, %rdx + jg .LBB2_13 +# BB#16: # %polly.loop_body33.lr.ph.preheader + # in Loop: Header=BB2_14 Depth=4 + leaq (%r8,%r11), %rbx + xorl %r14d, %r14d + movq %r9, %r15 + movq %r14, %r12 + jmp .LBB2_18 + .align 16, 0x90 +.LBB2_17: # %polly.loop_header27.loopexit + # in Loop: Header=BB2_18 
Depth=5 + addq $1536, %r15 # imm = 0x600 + incq %r12 + cmpq $64, %r12 + je .LBB2_13 +.LBB2_18: # %polly.loop_body33.lr.ph + # Parent Loop BB2_6 Depth=1 + # Parent Loop BB2_9 Depth=2 + # Parent Loop BB2_11 Depth=3 + # Parent Loop BB2_14 Depth=4 + # => This Loop Header: Depth=5 + # Child Loop BB2_19 Depth 6 + movss (%rbx,%r12,4), %xmm0 + pshufd $0, %xmm0, %xmm0 # xmm0 = xmm0[0,0,0,0] + movq %r14, %r13 + .align 16, 0x90 +.LBB2_19: # %polly.loop_body33 + # Parent Loop BB2_6 Depth=1 + # Parent Loop BB2_9 Depth=2 + # Parent Loop BB2_11 Depth=3 + # Parent Loop BB2_14 Depth=4 + # Parent Loop BB2_18 Depth=5 + # => This Inner Loop Header: Depth=6 + movaps B(%r13,%r15,4), %xmm1 + mulps %xmm0, %xmm1 + leaq (%r11,%r13), %rbp + addps C(%rcx,%rbp), %xmm1 + movaps %xmm1, C(%rcx,%rbp) + addq $16, %r13 + cmpq $256, %r13 # imm = 0x100 + jne .LBB2_19 + jmp .LBB2_17 +.LBB2_7: # %polly.after_loop9 + xorl %eax, %eax + addq $24, %rsp + popq %rbx + popq %r12 + popq %r13 + popq %r14 + popq %r15 + popq %rbp + ret +.Ltmp2: + .size main, .Ltmp2-main + + .type A,@object # @A + .comm A,9437184,16 + .type B,@object # @B + .comm B,9437184,16 + .type .L.str,@object # @.str + .section .rodata.str1.1,"aMS",@progbits,1 +.L.str: + .asciz "%lf " + .size .L.str, 5 + + .type C,@object # @C + .comm C,9437184,16 + + .section ".note.GNU-stack","",@progbits diff --git a/polly/www/experiments/matmul/matmul.polly.interchanged+tiled.exe b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled.exe Binary files differnew file mode 100755 index 00000000000..4334522f458 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled.exe diff --git a/polly/www/experiments/matmul/matmul.polly.interchanged+tiled.ll b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled.ll Binary files differnew file mode 100644 index 00000000000..fa301cfa5eb --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled.ll diff --git 
a/polly/www/experiments/matmul/matmul.polly.interchanged+tiled.s b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled.s new file mode 100644 index 00000000000..0f86df25d35 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled.s @@ -0,0 +1,323 @@ + .file "matmul.polly.interchanged+tiled.ll" + .section .rodata.cst8,"aM",@progbits,8 + .align 8 +.LCPI0_0: + .quad 4602678819172646912 # double 5.000000e-01 + .text + .globl init_array + .align 16, 0x90 + .type init_array,@function +init_array: # @init_array +# BB#0: # %pollyBB + xorl %eax, %eax + movsd .LCPI0_0(%rip), %xmm0 + movq %rax, %rcx + .align 16, 0x90 +.LBB0_2: # %polly.loop_header1.preheader + # =>This Loop Header: Depth=1 + # Child Loop BB0_3 Depth 2 + movq $-1536, %rdx # imm = 0xFFFFFFFFFFFFFA00 + xorl %esi, %esi + .align 16, 0x90 +.LBB0_3: # %polly.loop_body2 + # Parent Loop BB0_2 Depth=1 + # => This Inner Loop Header: Depth=2 + movl %esi, %edi + sarl $31, %edi + shrl $22, %edi + addl %esi, %edi + andl $-1024, %edi # imm = 0xFFFFFFFFFFFFFC00 + negl %edi + leal 1(%rsi,%rdi), %edi + cvtsi2sd %edi, %xmm1 + mulsd %xmm0, %xmm1 + cvtsd2ss %xmm1, %xmm1 + movss %xmm1, A+6144(%rax,%rdx,4) + movss %xmm1, B+6144(%rax,%rdx,4) + addl %ecx, %esi + incq %rdx + jne .LBB0_3 +# BB#1: # %polly.loop_header.loopexit + # in Loop: Header=BB0_2 Depth=1 + addq $6144, %rax # imm = 0x1800 + incq %rcx + cmpq $1536, %rcx # imm = 0x600 + jne .LBB0_2 +# BB#4: # %polly.after_loop + ret +.Ltmp0: + .size init_array, .Ltmp0-init_array + + .globl print_array + .align 16, 0x90 + .type print_array,@function +print_array: # @print_array +# BB#0: + pushq %r14 + pushq %rbx + pushq %rax + movq $-9437184, %rbx # imm = 0xFFFFFFFFFF700000 + .align 16, 0x90 +.LBB1_1: # %.preheader + # =>This Loop Header: Depth=1 + # Child Loop BB1_2 Depth 2 + xorl %r14d, %r14d + movq stdout(%rip), %rdi + .align 16, 0x90 +.LBB1_2: # Parent Loop BB1_1 Depth=1 + # => This Inner Loop Header: Depth=2 + movss C+9437184(%rbx,%r14,4), %xmm0 + 
cvtss2sd %xmm0, %xmm0 + movl $.L.str, %esi + movb $1, %al + callq fprintf + movslq %r14d, %rax + imulq $1717986919, %rax, %rcx # imm = 0x66666667 + movq %rcx, %rdx + shrq $63, %rdx + sarq $37, %rcx + addl %edx, %ecx + imull $80, %ecx, %ecx + subl %ecx, %eax + cmpl $79, %eax + jne .LBB1_4 +# BB#3: # in Loop: Header=BB1_2 Depth=2 + movq stdout(%rip), %rsi + movl $10, %edi + callq fputc +.LBB1_4: # in Loop: Header=BB1_2 Depth=2 + incq %r14 + movq stdout(%rip), %rsi + cmpq $1536, %r14 # imm = 0x600 + movq %rsi, %rdi + jne .LBB1_2 +# BB#5: # in Loop: Header=BB1_1 Depth=1 + movl $10, %edi + callq fputc + addq $6144, %rbx # imm = 0x1800 + jne .LBB1_1 +# BB#6: + addq $8, %rsp + popq %rbx + popq %r14 + ret +.Ltmp1: + .size print_array, .Ltmp1-print_array + + .section .rodata.cst8,"aM",@progbits,8 + .align 8 +.LCPI2_0: + .quad 4602678819172646912 # double 5.000000e-01 + .text + .globl main + .align 16, 0x90 + .type main,@function +main: # @main +# BB#0: # %pollyBB + pushq %rbp + pushq %r15 + pushq %r14 + pushq %r13 + pushq %r12 + pushq %rbx + subq $40, %rsp + xorl %eax, %eax + movsd .LCPI2_0(%rip), %xmm0 + movq %rax, %rcx + .align 16, 0x90 +.LBB2_1: # %polly.loop_header1.preheader.i + # =>This Loop Header: Depth=1 + # Child Loop BB2_2 Depth 2 + movq $-1536, %rdx # imm = 0xFFFFFFFFFFFFFA00 + xorl %esi, %esi + .align 16, 0x90 +.LBB2_2: # %polly.loop_body2.i + # Parent Loop BB2_1 Depth=1 + # => This Inner Loop Header: Depth=2 + movl %esi, %edi + sarl $31, %edi + shrl $22, %edi + addl %esi, %edi + andl $-1024, %edi # imm = 0xFFFFFFFFFFFFFC00 + negl %edi + leal 1(%rsi,%rdi), %edi + cvtsi2sd %edi, %xmm1 + mulsd %xmm0, %xmm1 + cvtsd2ss %xmm1, %xmm1 + movss %xmm1, A+6144(%rax,%rdx,4) + movss %xmm1, B+6144(%rax,%rdx,4) + addl %ecx, %esi + incq %rdx + jne .LBB2_2 +# BB#3: # %polly.loop_header.loopexit.i + # in Loop: Header=BB2_1 Depth=1 + addq $6144, %rax # imm = 0x1800 + incq %rcx + cmpq $1536, %rcx # imm = 0x600 + jne .LBB2_1 +# BB#4: # %polly.loop_header.preheader + movl $C, %eax + 
movq %rax, 8(%rsp) # 8-byte Spill + xorl %esi, %esi + movl $9437184, %edx # imm = 0x900000 + movl $C, %edi + callq memset + movl $A, %eax + movq %rax, 16(%rsp) # 8-byte Spill + movq $0, 32(%rsp) # 8-byte Folded Spill + jmp .LBB2_6 + .align 16, 0x90 +.LBB2_5: # %polly.loop_header7.loopexit + # in Loop: Header=BB2_6 Depth=1 + addq $393216, 16(%rsp) # 8-byte Folded Spill + # imm = 0x60000 + addq $393216, 8(%rsp) # 8-byte Folded Spill + # imm = 0x60000 + movq 32(%rsp), %rax # 8-byte Reload + addq $64, %rax + movq %rax, 32(%rsp) # 8-byte Spill + cmpq $1536, %rax # imm = 0x600 + je .LBB2_7 +.LBB2_6: # %polly.loop_header12.preheader + # =>This Loop Header: Depth=1 + # Child Loop BB2_9 Depth 2 + # Child Loop BB2_11 Depth 3 + # Child Loop BB2_14 Depth 4 + # Child Loop BB2_18 Depth 5 + # Child Loop BB2_19 Depth 6 + movq 32(%rsp), %rax # 8-byte Reload + leaq 63(%rax), %rax + movl $B, %ecx + movq %rcx, 24(%rsp) # 8-byte Spill + xorl %ecx, %ecx + movq 8(%rsp), %rdx # 8-byte Reload + jmp .LBB2_9 + .align 16, 0x90 +.LBB2_8: # %polly.loop_header12.loopexit + # in Loop: Header=BB2_9 Depth=2 + addq $256, %rdx # imm = 0x100 + addq $256, 24(%rsp) # 8-byte Folded Spill + # imm = 0x100 + addq $64, %rcx + cmpq $1536, %rcx # imm = 0x600 + je .LBB2_5 +.LBB2_9: # %polly.loop_header17.preheader + # Parent Loop BB2_6 Depth=1 + # => This Loop Header: Depth=2 + # Child Loop BB2_11 Depth 3 + # Child Loop BB2_14 Depth 4 + # Child Loop BB2_18 Depth 5 + # Child Loop BB2_19 Depth 6 + leaq 63(%rcx), %rsi + xorl %edi, %edi + movq 16(%rsp), %r8 # 8-byte Reload + movq 24(%rsp), %r9 # 8-byte Reload + jmp .LBB2_11 + .align 16, 0x90 +.LBB2_10: # %polly.loop_header17.loopexit + # in Loop: Header=BB2_11 Depth=3 + addq $256, %r8 # imm = 0x100 + addq $393216, %r9 # imm = 0x60000 + addq $64, %rdi + cmpq $1536, %rdi # imm = 0x600 + je .LBB2_8 +.LBB2_11: # %polly.loop_body18 + # Parent Loop BB2_6 Depth=1 + # Parent Loop BB2_9 Depth=2 + # => This Loop Header: Depth=3 + # Child Loop BB2_14 Depth 4 + # Child Loop 
BB2_18 Depth 5 + # Child Loop BB2_19 Depth 6 + cmpq %rax, 32(%rsp) # 8-byte Folded Reload + jg .LBB2_10 +# BB#12: # %polly.loop_body23.lr.ph + # in Loop: Header=BB2_11 Depth=3 + leaq 63(%rdi), %r10 + xorl %r11d, %r11d + jmp .LBB2_14 + .align 16, 0x90 +.LBB2_13: # %polly.loop_header22.loopexit + # in Loop: Header=BB2_14 Depth=4 + addq $6144, %r11 # imm = 0x1800 + cmpq $393216, %r11 # imm = 0x60000 + je .LBB2_10 +.LBB2_14: # %polly.loop_body23 + # Parent Loop BB2_6 Depth=1 + # Parent Loop BB2_9 Depth=2 + # Parent Loop BB2_11 Depth=3 + # => This Loop Header: Depth=4 + # Child Loop BB2_18 Depth 5 + # Child Loop BB2_19 Depth 6 + cmpq %r10, %rdi + jg .LBB2_13 +# BB#15: # %polly.loop_body23 + # in Loop: Header=BB2_14 Depth=4 + cmpq %rsi, %rcx + jg .LBB2_13 +# BB#16: # %polly.loop_body33.lr.ph.preheader + # in Loop: Header=BB2_14 Depth=4 + leaq (%rdx,%r11), %rbx + leaq (%r8,%r11), %r14 + xorl %r15d, %r15d + movq %r9, %r12 + movq %r15, %r13 + jmp .LBB2_18 + .align 16, 0x90 +.LBB2_17: # %polly.loop_header27.loopexit + # in Loop: Header=BB2_18 Depth=5 + addq $6144, %r12 # imm = 0x1800 + incq %r13 + cmpq $64, %r13 + je .LBB2_13 +.LBB2_18: # %polly.loop_body33.lr.ph + # Parent Loop BB2_6 Depth=1 + # Parent Loop BB2_9 Depth=2 + # Parent Loop BB2_11 Depth=3 + # Parent Loop BB2_14 Depth=4 + # => This Loop Header: Depth=5 + # Child Loop BB2_19 Depth 6 + movss (%r14,%r13,4), %xmm0 + movq %r15, %rbp + .align 16, 0x90 +.LBB2_19: # %polly.loop_body33 + # Parent Loop BB2_6 Depth=1 + # Parent Loop BB2_9 Depth=2 + # Parent Loop BB2_11 Depth=3 + # Parent Loop BB2_14 Depth=4 + # Parent Loop BB2_18 Depth=5 + # => This Inner Loop Header: Depth=6 + movss (%r12,%rbp,4), %xmm1 + mulss %xmm0, %xmm1 + addss (%rbx,%rbp,4), %xmm1 + movss %xmm1, (%rbx,%rbp,4) + incq %rbp + cmpq $64, %rbp + jne .LBB2_19 + jmp .LBB2_17 +.LBB2_7: # %polly.after_loop9 + xorl %eax, %eax + addq $40, %rsp + popq %rbx + popq %r12 + popq %r13 + popq %r14 + popq %r15 + popq %rbp + ret +.Ltmp2: + .size main, .Ltmp2-main + + 
.type A,@object # @A + .comm A,9437184,16 + .type B,@object # @B + .comm B,9437184,16 + .type .L.str,@object # @.str + .section .rodata.str1.1,"aMS",@progbits,1 +.L.str: + .asciz "%lf " + .size .L.str, 5 + + .type C,@object # @C + .comm C,9437184,16 + + .section ".note.GNU-stack","",@progbits diff --git a/polly/www/experiments/matmul/matmul.polly.interchanged.exe b/polly/www/experiments/matmul/matmul.polly.interchanged.exe Binary files differnew file mode 100755 index 00000000000..cc125c4b2b1 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged.exe diff --git a/polly/www/experiments/matmul/matmul.polly.interchanged.ll b/polly/www/experiments/matmul/matmul.polly.interchanged.ll Binary files differnew file mode 100644 index 00000000000..c0a54bb64f4 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged.ll diff --git a/polly/www/experiments/matmul/matmul.polly.interchanged.s b/polly/www/experiments/matmul/matmul.polly.interchanged.s new file mode 100644 index 00000000000..8bbc523f764 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged.s @@ -0,0 +1,217 @@ + .file "matmul.polly.interchanged.ll" + .section .rodata.cst8,"aM",@progbits,8 + .align 8 +.LCPI0_0: + .quad 4602678819172646912 # double 5.000000e-01 + .text + .globl init_array + .align 16, 0x90 + .type init_array,@function +init_array: # @init_array +# BB#0: # %pollyBB + xorl %eax, %eax + movsd .LCPI0_0(%rip), %xmm0 + movq %rax, %rcx + .align 16, 0x90 +.LBB0_2: # %polly.loop_header1.preheader + # =>This Loop Header: Depth=1 + # Child Loop BB0_3 Depth 2 + movq $-1536, %rdx # imm = 0xFFFFFFFFFFFFFA00 + xorl %esi, %esi + .align 16, 0x90 +.LBB0_3: # %polly.loop_body2 + # Parent Loop BB0_2 Depth=1 + # => This Inner Loop Header: Depth=2 + movl %esi, %edi + sarl $31, %edi + shrl $22, %edi + addl %esi, %edi + andl $-1024, %edi # imm = 0xFFFFFFFFFFFFFC00 + negl %edi + leal 1(%rsi,%rdi), %edi + cvtsi2sd %edi, %xmm1 + mulsd %xmm0, %xmm1 + cvtsd2ss %xmm1, %xmm1 
+ movss %xmm1, A+6144(%rax,%rdx,4) + movss %xmm1, B+6144(%rax,%rdx,4) + addl %ecx, %esi + incq %rdx + jne .LBB0_3 +# BB#1: # %polly.loop_header.loopexit + # in Loop: Header=BB0_2 Depth=1 + addq $6144, %rax # imm = 0x1800 + incq %rcx + cmpq $1536, %rcx # imm = 0x600 + jne .LBB0_2 +# BB#4: # %polly.after_loop + ret +.Ltmp0: + .size init_array, .Ltmp0-init_array + + .globl print_array + .align 16, 0x90 + .type print_array,@function +print_array: # @print_array +# BB#0: + pushq %r14 + pushq %rbx + pushq %rax + movq $-9437184, %rbx # imm = 0xFFFFFFFFFF700000 + .align 16, 0x90 +.LBB1_1: # %.preheader + # =>This Loop Header: Depth=1 + # Child Loop BB1_2 Depth 2 + xorl %r14d, %r14d + movq stdout(%rip), %rdi + .align 16, 0x90 +.LBB1_2: # Parent Loop BB1_1 Depth=1 + # => This Inner Loop Header: Depth=2 + movss C+9437184(%rbx,%r14,4), %xmm0 + cvtss2sd %xmm0, %xmm0 + movl $.L.str, %esi + movb $1, %al + callq fprintf + movslq %r14d, %rax + imulq $1717986919, %rax, %rcx # imm = 0x66666667 + movq %rcx, %rdx + shrq $63, %rdx + sarq $37, %rcx + addl %edx, %ecx + imull $80, %ecx, %ecx + subl %ecx, %eax + cmpl $79, %eax + jne .LBB1_4 +# BB#3: # in Loop: Header=BB1_2 Depth=2 + movq stdout(%rip), %rsi + movl $10, %edi + callq fputc +.LBB1_4: # in Loop: Header=BB1_2 Depth=2 + incq %r14 + movq stdout(%rip), %rsi + cmpq $1536, %r14 # imm = 0x600 + movq %rsi, %rdi + jne .LBB1_2 +# BB#5: # in Loop: Header=BB1_1 Depth=1 + movl $10, %edi + callq fputc + addq $6144, %rbx # imm = 0x1800 + jne .LBB1_1 +# BB#6: + addq $8, %rsp + popq %rbx + popq %r14 + ret +.Ltmp1: + .size print_array, .Ltmp1-print_array + + .section .rodata.cst8,"aM",@progbits,8 + .align 8 +.LCPI2_0: + .quad 4602678819172646912 # double 5.000000e-01 + .text + .globl main + .align 16, 0x90 + .type main,@function +main: # @main +# BB#0: # %pollyBB + pushq %rax + xorl %eax, %eax + movsd .LCPI2_0(%rip), %xmm0 + movq %rax, %rcx + .align 16, 0x90 +.LBB2_1: # %polly.loop_header1.preheader.i + # =>This Loop Header: Depth=1 + # Child 
Loop BB2_2 Depth 2 + movq $-1536, %rdx # imm = 0xFFFFFFFFFFFFFA00 + xorl %esi, %esi + .align 16, 0x90 +.LBB2_2: # %polly.loop_body2.i + # Parent Loop BB2_1 Depth=1 + # => This Inner Loop Header: Depth=2 + movl %esi, %edi + sarl $31, %edi + shrl $22, %edi + addl %esi, %edi + andl $-1024, %edi # imm = 0xFFFFFFFFFFFFFC00 + negl %edi + leal 1(%rsi,%rdi), %edi + cvtsi2sd %edi, %xmm1 + mulsd %xmm0, %xmm1 + cvtsd2ss %xmm1, %xmm1 + movss %xmm1, A+6144(%rax,%rdx,4) + movss %xmm1, B+6144(%rax,%rdx,4) + addl %ecx, %esi + incq %rdx + jne .LBB2_2 +# BB#3: # %polly.loop_header.loopexit.i + # in Loop: Header=BB2_1 Depth=1 + addq $6144, %rax # imm = 0x1800 + incq %rcx + cmpq $1536, %rcx # imm = 0x600 + jne .LBB2_1 +# BB#4: # %polly.loop_header.preheader + movl $C, %edi + xorl %esi, %esi + movl $9437184, %edx # imm = 0x900000 + callq memset + xorl %eax, %eax + jmp .LBB2_6 + .align 16, 0x90 +.LBB2_5: # %polly.loop_header7.loopexit + # in Loop: Header=BB2_6 Depth=1 + addq $6144, %rax # imm = 0x1800 + cmpq $9437184, %rax # imm = 0x900000 + je .LBB2_7 +.LBB2_6: # %polly.loop_header12.preheader + # =>This Loop Header: Depth=1 + # Child Loop BB2_9 Depth 2 + # Child Loop BB2_10 Depth 3 + leaq A(%rax), %rcx + movq $-9437184, %rdx # imm = 0xFFFFFFFFFF700000 + jmp .LBB2_9 + .align 16, 0x90 +.LBB2_8: # %polly.loop_header12.loopexit + # in Loop: Header=BB2_9 Depth=2 + addq $4, %rcx + addq $6144, %rdx # imm = 0x1800 + je .LBB2_5 +.LBB2_9: # %polly.loop_header17.preheader + # Parent Loop BB2_6 Depth=1 + # => This Loop Header: Depth=2 + # Child Loop BB2_10 Depth 3 + movss (%rcx), %xmm0 + xorl %esi, %esi + .align 16, 0x90 +.LBB2_10: # %polly.loop_body18 + # Parent Loop BB2_6 Depth=1 + # Parent Loop BB2_9 Depth=2 + # => This Inner Loop Header: Depth=3 + movss B+9437184(%rdx,%rsi,4), %xmm1 + mulss %xmm0, %xmm1 + addss C(%rax,%rsi,4), %xmm1 + movss %xmm1, C(%rax,%rsi,4) + incq %rsi + cmpq $1536, %rsi # imm = 0x600 + jne .LBB2_10 + jmp .LBB2_8 +.LBB2_7: # %polly.after_loop9 + xorl %eax, %eax + popq 
%rdx + ret +.Ltmp2: + .size main, .Ltmp2-main + + .type A,@object # @A + .comm A,9437184,16 + .type B,@object # @B + .comm B,9437184,16 + .type .L.str,@object # @.str + .section .rodata.str1.1,"aMS",@progbits,1 +.L.str: + .asciz "%lf " + .size .L.str, 5 + + .type C,@object # @C + .comm C,9437184,16 + + .section ".note.GNU-stack","",@progbits diff --git a/polly/www/experiments/matmul/matmul.preopt.ll b/polly/www/experiments/matmul/matmul.preopt.ll new file mode 100644 index 00000000000..9287d7e141b --- /dev/null +++ b/polly/www/experiments/matmul/matmul.preopt.ll @@ -0,0 +1,180 @@ +; ModuleID = 'matmul.s' +target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64" +target triple = "x86_64-unknown-linux-gnu" + +%struct._IO_FILE = type { i32, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, %struct._IO_marker*, %struct._IO_FILE*, i32, i32, i64, i16, i8, [1 x i8], i8*, i64, i8*, i8*, i8*, i8*, i64, i32, [20 x i8] } +%struct._IO_marker = type { %struct._IO_marker*, %struct._IO_FILE*, i32 } + +@A = common global [1536 x [1536 x float]] zeroinitializer, align 16 +@B = common global [1536 x [1536 x float]] zeroinitializer, align 16 +@stdout = external global %struct._IO_FILE* +@.str = private unnamed_addr constant [5 x i8] c"%lf \00" +@C = common global [1536 x [1536 x float]] zeroinitializer, align 16 +@.str1 = private unnamed_addr constant [2 x i8] c"\0A\00" + +define void @init_array() nounwind { +; <label>:0 + br label %1 + +; <label>:1 ; preds = %18, %0 + %2 = phi i64 [ %indvar.next2, %18 ], [ 0, %0 ] + %exitcond5 = icmp ne i64 %2, 1536 + br i1 %exitcond5, label %3, label %19 + +; <label>:3 ; preds = %1 + br label %4 + +; <label>:4 ; preds = %16, %3 + %indvar = phi i64 [ %indvar.next, %16 ], [ 0, %3 ] + %scevgep4 = getelementptr [1536 x [1536 x float]]* @A, i64 0, i64 %2, i64 %indvar + %scevgep = getelementptr [1536 x [1536 x float]]* @B, i64 0, i64 %2, i64 
%indvar + %tmp = mul i64 %2, %indvar + %tmp3 = trunc i64 %tmp to i32 + %exitcond = icmp ne i64 %indvar, 1536 + br i1 %exitcond, label %5, label %17 + +; <label>:5 ; preds = %4 + %6 = srem i32 %tmp3, 1024 + %7 = add nsw i32 1, %6 + %8 = sitofp i32 %7 to double + %9 = fdiv double %8, 2.000000e+00 + %10 = fptrunc double %9 to float + store float %10, float* %scevgep4 + %11 = srem i32 %tmp3, 1024 + %12 = add nsw i32 1, %11 + %13 = sitofp i32 %12 to double + %14 = fdiv double %13, 2.000000e+00 + %15 = fptrunc double %14 to float + store float %15, float* %scevgep + br label %16 + +; <label>:16 ; preds = %5 + %indvar.next = add i64 %indvar, 1 + br label %4 + +; <label>:17 ; preds = %4 + br label %18 + +; <label>:18 ; preds = %17 + %indvar.next2 = add i64 %2, 1 + br label %1 + +; <label>:19 ; preds = %1 + ret void +} + +define void @print_array() nounwind { +; <label>:0 + br label %1 + +; <label>:1 ; preds = %19, %0 + %indvar1 = phi i64 [ %indvar.next2, %19 ], [ 0, %0 ] + %exitcond3 = icmp ne i64 %indvar1, 1536 + br i1 %exitcond3, label %2, label %20 + +; <label>:2 ; preds = %1 + br label %3 + +; <label>:3 ; preds = %15, %2 + %indvar = phi i64 [ %indvar.next, %15 ], [ 0, %2 ] + %scevgep = getelementptr [1536 x [1536 x float]]* @C, i64 0, i64 %indvar1, i64 %indvar + %j.0 = trunc i64 %indvar to i32 + %exitcond = icmp ne i64 %indvar, 1536 + br i1 %exitcond, label %4, label %16 + +; <label>:4 ; preds = %3 + %5 = load %struct._IO_FILE** @stdout, align 8 + %6 = load float* %scevgep + %7 = fpext float %6 to double + %8 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %5, i8* getelementptr inbounds ([5 x i8]* @.str, i32 0, i32 0), double %7) + %9 = srem i32 %j.0, 80 + %10 = icmp eq i32 %9, 79 + br i1 %10, label %11, label %14 + +; <label>:11 ; preds = %4 + %12 = load %struct._IO_FILE** @stdout, align 8 + %13 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %12, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0)) + br label %14 + +; 
<label>:14 ; preds = %11, %4 + br label %15 + +; <label>:15 ; preds = %14 + %indvar.next = add i64 %indvar, 1 + br label %3 + +; <label>:16 ; preds = %3 + %17 = load %struct._IO_FILE** @stdout, align 8 + %18 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %17, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0)) + br label %19 + +; <label>:19 ; preds = %16 + %indvar.next2 = add i64 %indvar1, 1 + br label %1 + +; <label>:20 ; preds = %1 + ret void +} + +declare i32 @fprintf(%struct._IO_FILE*, i8*, ...) + +define i32 @main() nounwind { +; <label>:0 + call void @init_array() + br label %1 + +; <label>:1 ; preds = %16, %0 + %indvar3 = phi i64 [ %indvar.next4, %16 ], [ 0, %0 ] + %exitcond9 = icmp ne i64 %indvar3, 1536 + br i1 %exitcond9, label %2, label %17 + +; <label>:2 ; preds = %1 + br label %3 + +; <label>:3 ; preds = %14, %2 + %indvar1 = phi i64 [ %indvar.next2, %14 ], [ 0, %2 ] + %scevgep8 = getelementptr [1536 x [1536 x float]]* @C, i64 0, i64 %indvar3, i64 %indvar1 + %exitcond6 = icmp ne i64 %indvar1, 1536 + br i1 %exitcond6, label %4, label %15 + +; <label>:4 ; preds = %3 + store float 0.000000e+00, float* %scevgep8 + br label %5 + +; <label>:5 ; preds = %12, %4 + %indvar = phi i64 [ %indvar.next, %12 ], [ 0, %4 ] + %scevgep5 = getelementptr [1536 x [1536 x float]]* @A, i64 0, i64 %indvar3, i64 %indvar + %scevgep = getelementptr [1536 x [1536 x float]]* @B, i64 0, i64 %indvar, i64 %indvar1 + %exitcond = icmp ne i64 %indvar, 1536 + br i1 %exitcond, label %6, label %13 + +; <label>:6 ; preds = %5 + %7 = load float* %scevgep8 + %8 = load float* %scevgep5 + %9 = load float* %scevgep + %10 = fmul float %8, %9 + %11 = fadd float %7, %10 + store float %11, float* %scevgep8 + br label %12 + +; <label>:12 ; preds = %6 + %indvar.next = add i64 %indvar, 1 + br label %5 + +; <label>:13 ; preds = %5 + br label %14 + +; <label>:14 ; preds = %13 + %indvar.next2 = add i64 %indvar1, 1 + br label %3 + +; <label>:15 ; preds = %3 + br label %16 + +; 
<label>:16 ; preds = %15 + %indvar.next4 = add i64 %indvar3, 1 + br label %1 + +; <label>:17 ; preds = %1 + ret i32 0 +} diff --git a/polly/www/experiments/matmul/matmul.s b/polly/www/experiments/matmul/matmul.s new file mode 100644 index 00000000000..bec9d2a7504 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.s @@ -0,0 +1,255 @@ +; ModuleID = 'matmul.c' +target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64" +target triple = "x86_64-unknown-linux-gnu" + +%struct._IO_FILE = type { i32, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, %struct._IO_marker*, %struct._IO_FILE*, i32, i32, i64, i16, i8, [1 x i8], i8*, i64, i8*, i8*, i8*, i8*, i64, i32, [20 x i8] } +%struct._IO_marker = type { %struct._IO_marker*, %struct._IO_FILE*, i32 } + +@A = common global [1536 x [1536 x float]] zeroinitializer, align 16 +@B = common global [1536 x [1536 x float]] zeroinitializer, align 16 +@stdout = external global %struct._IO_FILE* +@.str = private unnamed_addr constant [5 x i8] c"%lf \00" +@C = common global [1536 x [1536 x float]] zeroinitializer, align 16 +@.str1 = private unnamed_addr constant [2 x i8] c"\0A\00" + +define void @init_array() nounwind { + %i = alloca i32, align 4 + %j = alloca i32, align 4 + store i32 0, i32* %i, align 4 + br label %1 + +; <label>:1 ; preds = %41, %0 + %2 = load i32* %i, align 4 + %3 = icmp slt i32 %2, 1536 + br i1 %3, label %4, label %44 + +; <label>:4 ; preds = %1 + store i32 0, i32* %j, align 4 + br label %5 + +; <label>:5 ; preds = %37, %4 + %6 = load i32* %j, align 4 + %7 = icmp slt i32 %6, 1536 + br i1 %7, label %8, label %40 + +; <label>:8 ; preds = %5 + %9 = load i32* %i, align 4 + %10 = load i32* %j, align 4 + %11 = mul nsw i32 %9, %10 + %12 = srem i32 %11, 1024 + %13 = add nsw i32 1, %12 + %14 = sitofp i32 %13 to double + %15 = fdiv double %14, 2.000000e+00 + %16 = fptrunc double %15 to float + %17 = load i32* %j, 
align 4 + %18 = sext i32 %17 to i64 + %19 = load i32* %i, align 4 + %20 = sext i32 %19 to i64 + %21 = getelementptr inbounds [1536 x [1536 x float]]* @A, i32 0, i64 %20 + %22 = getelementptr inbounds [1536 x float]* %21, i32 0, i64 %18 + store float %16, float* %22 + %23 = load i32* %i, align 4 + %24 = load i32* %j, align 4 + %25 = mul nsw i32 %23, %24 + %26 = srem i32 %25, 1024 + %27 = add nsw i32 1, %26 + %28 = sitofp i32 %27 to double + %29 = fdiv double %28, 2.000000e+00 + %30 = fptrunc double %29 to float + %31 = load i32* %j, align 4 + %32 = sext i32 %31 to i64 + %33 = load i32* %i, align 4 + %34 = sext i32 %33 to i64 + %35 = getelementptr inbounds [1536 x [1536 x float]]* @B, i32 0, i64 %34 + %36 = getelementptr inbounds [1536 x float]* %35, i32 0, i64 %32 + store float %30, float* %36 + br label %37 + +; <label>:37 ; preds = %8 + %38 = load i32* %j, align 4 + %39 = add nsw i32 %38, 1 + store i32 %39, i32* %j, align 4 + br label %5 + +; <label>:40 ; preds = %5 + br label %41 + +; <label>:41 ; preds = %40 + %42 = load i32* %i, align 4 + %43 = add nsw i32 %42, 1 + store i32 %43, i32* %i, align 4 + br label %1 + +; <label>:44 ; preds = %1 + ret void +} + +define void @print_array() nounwind { + %i = alloca i32, align 4 + %j = alloca i32, align 4 + store i32 0, i32* %i, align 4 + br label %1 + +; <label>:1 ; preds = %32, %0 + %2 = load i32* %i, align 4 + %3 = icmp slt i32 %2, 1536 + br i1 %3, label %4, label %35 + +; <label>:4 ; preds = %1 + store i32 0, i32* %j, align 4 + br label %5 + +; <label>:5 ; preds = %26, %4 + %6 = load i32* %j, align 4 + %7 = icmp slt i32 %6, 1536 + br i1 %7, label %8, label %29 + +; <label>:8 ; preds = %5 + %9 = load %struct._IO_FILE** @stdout, align 8 + %10 = load i32* %j, align 4 + %11 = sext i32 %10 to i64 + %12 = load i32* %i, align 4 + %13 = sext i32 %12 to i64 + %14 = getelementptr inbounds [1536 x [1536 x float]]* @C, i32 0, i64 %13 + %15 = getelementptr inbounds [1536 x float]* %14, i32 0, i64 %11 + %16 = load float* %15 + %17 
= fpext float %16 to double + %18 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %9, i8* getelementptr inbounds ([5 x i8]* @.str, i32 0, i32 0), double %17) + %19 = load i32* %j, align 4 + %20 = srem i32 %19, 80 + %21 = icmp eq i32 %20, 79 + br i1 %21, label %22, label %25 + +; <label>:22 ; preds = %8 + %23 = load %struct._IO_FILE** @stdout, align 8 + %24 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %23, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0)) + br label %25 + +; <label>:25 ; preds = %22, %8 + br label %26 + +; <label>:26 ; preds = %25 + %27 = load i32* %j, align 4 + %28 = add nsw i32 %27, 1 + store i32 %28, i32* %j, align 4 + br label %5 + +; <label>:29 ; preds = %5 + %30 = load %struct._IO_FILE** @stdout, align 8 + %31 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %30, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0)) + br label %32 + +; <label>:32 ; preds = %29 + %33 = load i32* %i, align 4 + %34 = add nsw i32 %33, 1 + store i32 %34, i32* %i, align 4 + br label %1 + +; <label>:35 ; preds = %1 + ret void +} + +declare i32 @fprintf(%struct._IO_FILE*, i8*, ...) 
+ +define i32 @main() nounwind { + %1 = alloca i32, align 4 + %i = alloca i32, align 4 + %j = alloca i32, align 4 + %k = alloca i32, align 4 + %t_start = alloca double, align 8 + %t_end = alloca double, align 8 + store i32 0, i32* %1 + call void @init_array() + store i32 0, i32* %i, align 4 + br label %2 + +; <label>:2 ; preds = %57, %0 + %3 = load i32* %i, align 4 + %4 = icmp slt i32 %3, 1536 + br i1 %4, label %5, label %60 + +; <label>:5 ; preds = %2 + store i32 0, i32* %j, align 4 + br label %6 + +; <label>:6 ; preds = %53, %5 + %7 = load i32* %j, align 4 + %8 = icmp slt i32 %7, 1536 + br i1 %8, label %9, label %56 + +; <label>:9 ; preds = %6 + %10 = load i32* %j, align 4 + %11 = sext i32 %10 to i64 + %12 = load i32* %i, align 4 + %13 = sext i32 %12 to i64 + %14 = getelementptr inbounds [1536 x [1536 x float]]* @C, i32 0, i64 %13 + %15 = getelementptr inbounds [1536 x float]* %14, i32 0, i64 %11 + store float 0.000000e+00, float* %15 + store i32 0, i32* %k, align 4 + br label %16 + +; <label>:16 ; preds = %49, %9 + %17 = load i32* %k, align 4 + %18 = icmp slt i32 %17, 1536 + br i1 %18, label %19, label %52 + +; <label>:19 ; preds = %16 + %20 = load i32* %j, align 4 + %21 = sext i32 %20 to i64 + %22 = load i32* %i, align 4 + %23 = sext i32 %22 to i64 + %24 = getelementptr inbounds [1536 x [1536 x float]]* @C, i32 0, i64 %23 + %25 = getelementptr inbounds [1536 x float]* %24, i32 0, i64 %21 + %26 = load float* %25 + %27 = load i32* %k, align 4 + %28 = sext i32 %27 to i64 + %29 = load i32* %i, align 4 + %30 = sext i32 %29 to i64 + %31 = getelementptr inbounds [1536 x [1536 x float]]* @A, i32 0, i64 %30 + %32 = getelementptr inbounds [1536 x float]* %31, i32 0, i64 %28 + %33 = load float* %32 + %34 = load i32* %j, align 4 + %35 = sext i32 %34 to i64 + %36 = load i32* %k, align 4 + %37 = sext i32 %36 to i64 + %38 = getelementptr inbounds [1536 x [1536 x float]]* @B, i32 0, i64 %37 + %39 = getelementptr inbounds [1536 x float]* %38, i32 0, i64 %35 + %40 = load float* 
%39 + %41 = fmul float %33, %40 + %42 = fadd float %26, %41 + %43 = load i32* %j, align 4 + %44 = sext i32 %43 to i64 + %45 = load i32* %i, align 4 + %46 = sext i32 %45 to i64 + %47 = getelementptr inbounds [1536 x [1536 x float]]* @C, i32 0, i64 %46 + %48 = getelementptr inbounds [1536 x float]* %47, i32 0, i64 %44 + store float %42, float* %48 + br label %49 + +; <label>:49 ; preds = %19 + %50 = load i32* %k, align 4 + %51 = add nsw i32 %50, 1 + store i32 %51, i32* %k, align 4 + br label %16 + +; <label>:52 ; preds = %16 + br label %53 + +; <label>:53 ; preds = %52 + %54 = load i32* %j, align 4 + %55 = add nsw i32 %54, 1 + store i32 %55, i32* %j, align 4 + br label %6 + +; <label>:56 ; preds = %6 + br label %57 + +; <label>:57 ; preds = %56 + %58 = load i32* %i, align 4 + %59 = add nsw i32 %58, 1 + store i32 %59, i32* %i, align 4 + br label %2 + +; <label>:60 ; preds = %2 + ret i32 0 +} diff --git a/polly/www/experiments/matmul/runall.sh b/polly/www/experiments/matmul/runall.sh new file mode 100755 index 00000000000..0944bd4fb68 --- /dev/null +++ b/polly/www/experiments/matmul/runall.sh @@ -0,0 +1,92 @@ +#!/bin/sh -a + + +echo "--> 1. Create LLVM-IR from C" +clang -S -emit-llvm matmul.c -o matmul.s + +echo "--> 2. Load Polly automatically when calling the 'opt' tool" +export PATH_TO_POLLY_LIB="~/Projekte/polly/build_clang/lib/" +alias opt="opt -load ${PATH_TO_POLLY_LIB}/LLVMPolly.so" + +echo "--> 3. Prepare the LLVM-IR for Polly" +opt -S -mem2reg -loop-simplify -indvars matmul.s > matmul.preopt.ll + +echo "--> 4. Show the SCoPs detected by Polly" +opt -basicaa -polly-cloog -analyze -q matmul.preopt.ll + +echo "--> 5.1 Highlight the detected SCoPs in the CFGs of the program" +# We only create .dot files, as -view-scops directly calls graphviz, +# which would require user interaction to continue the script. 
+# opt -basicaa -view-scops -disable-output matmul.preopt.ll +opt -basicaa -dot-scops -disable-output matmul.preopt.ll + +echo "--> 5.2 Highlight the detected SCoPs in the CFGs of the program (print \ +no instructions)" +# We only create .dot files, as -view-scops-only directly calls +# graphviz, which would require user interaction to continue the script. +# opt -basicaa -view-scops-only -disable-output matmul.preopt.ll +opt -basicaa -dot-scops-only -disable-output matmul.preopt.ll + +echo "--> 5.3 Create .png files from the .dot files" +for i in *.dot; do dot -Tpng "$i" > "$i.png"; done + +echo "--> 6. View the polyhedral representation of the SCoPs" +opt -basicaa -polly-scops -analyze matmul.preopt.ll + +echo "--> 7. Show the dependences for the SCoPs" +opt -basicaa -polly-dependences -analyze matmul.preopt.ll + +echo "--> 8. Export jscop files" +opt -basicaa -polly-export-jscop matmul.preopt.ll + +echo "--> 9. Import the updated jscop files and print the new SCoPs. (optional)" +opt -basicaa -polly-import-jscop -polly-cloog -analyze matmul.preopt.ll \ + -polly-import-jscop-postfix=interchanged +opt -basicaa -polly-import-jscop -polly-cloog -analyze matmul.preopt.ll \ + -polly-import-jscop-postfix=interchanged+tiled + +echo "--> 10. 
Generate code for the SCoPs" +opt -basicaa -polly-import-jscop -polly-import-jscop-postfix=interchanged \ + -polly-codegen \ + matmul.preopt.ll | opt -O3 > matmul.polly.interchanged.ll +opt -basicaa -polly-import-jscop \ + -polly-import-jscop-postfix=interchanged+tiled -polly-codegen \ + matmul.preopt.ll | opt -O3 > matmul.polly.interchanged+tiled.ll +opt -basicaa -polly-import-jscop \ + -polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen \ + matmul.preopt.ll -enable-polly-vector \ + | opt -O3 > matmul.polly.interchanged+tiled+vector.ll +opt -basicaa -polly-import-jscop \ + -polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen \ + matmul.preopt.ll -enable-polly-vector -enable-polly-openmp \ + | opt -O3 > matmul.polly.interchanged+tiled+vector+openmp.ll +opt matmul.preopt.ll | opt -O3 > matmul.normalopt.ll + +echo "--> 11. Create the executables" +llc matmul.polly.interchanged.ll -o matmul.polly.interchanged.s && gcc matmul.polly.interchanged.s \ + -o matmul.polly.interchanged.exe +llc matmul.polly.interchanged+tiled.ll -o matmul.polly.interchanged+tiled.s && gcc matmul.polly.interchanged+tiled.s \ + -o matmul.polly.interchanged+tiled.exe +llc matmul.polly.interchanged+tiled+vector.ll \ + -o matmul.polly.interchanged+tiled+vector.s \ + && gcc matmul.polly.interchanged+tiled+vector.s \ + -o matmul.polly.interchanged+tiled+vector.exe +llc matmul.polly.interchanged+tiled+vector+openmp.ll \ + -o matmul.polly.interchanged+tiled+vector+openmp.s \ + && gcc matmul.polly.interchanged+tiled+vector+openmp.s -lgomp \ + -o matmul.polly.interchanged+tiled+vector+openmp.exe +llc matmul.normalopt.ll -o matmul.normalopt.s && gcc matmul.normalopt.s \ + -o matmul.normalopt.exe + +echo "--> 12. 
Compare the runtime of the executables" + +echo "time ./matmul.normalopt.exe" +time -f "%E real, %U user, %S sys" ./matmul.normalopt.exe +echo "time ./matmul.polly.interchanged.exe" +time -f "%E real, %U user, %S sys" ./matmul.polly.interchanged.exe +echo "time ./matmul.polly.interchanged+tiled.exe" +time -f "%E real, %U user, %S sys" ./matmul.polly.interchanged+tiled.exe +echo "time ./matmul.polly.interchanged+tiled+vector.exe" +time -f "%E real, %U user, %S sys" ./matmul.polly.interchanged+tiled+vector.exe +echo "time ./matmul.polly.interchanged+tiled+vector+openmp.exe" +time -f "%E real, %U user, %S sys" ./matmul.polly.interchanged+tiled+vector+openmp.exe diff --git a/polly/www/experiments/matmul/scops.init_array.dot b/polly/www/experiments/matmul/scops.init_array.dot new file mode 100644 index 00000000000..1b3f09284f9 --- /dev/null +++ b/polly/www/experiments/matmul/scops.init_array.dot @@ -0,0 +1,47 @@ +digraph "Scop Graph for 'init_array' function" { + label="Scop Graph for 'init_array' function"; + + Node0x26ade30 [shape=record,label="{%0:\l\l br label %1\l}"]; + Node0x26ade30 -> Node0x26acdd0; + Node0x26acdd0 [shape=record,label="{%1:\l\l %2 = phi i64 [ %indvar.next2, %18 ], [ 0, %0 ]\l %exitcond5 = icmp ne i64 %2, 1536\l br i1 %exitcond5, label %3, label %19\l}"]; + Node0x26acdd0 -> Node0x26acdf0; + Node0x26acdd0 -> Node0x26adce0; + Node0x26acdf0 [shape=record,label="{%3:\l\l br label %4\l}"]; + Node0x26acdf0 -> Node0x26addc0; + Node0x26addc0 [shape=record,label="{%4:\l\l %indvar = phi i64 [ %indvar.next, %16 ], [ 0, %3 ]\l %scevgep4 = getelementptr [1536 x [1536 x float]]* @A, i64 0, i64 %2, i64 %indvar\l %scevgep = getelementptr [1536 x [1536 x float]]* @B, i64 0, i64 %2, i64 %indvar\l %tmp = mul i64 %2, %indvar\l %tmp3 = trunc i64 %tmp to i32\l %exitcond = icmp ne i64 %indvar, 1536\l br i1 %exitcond, label %5, label %17\l}"]; + Node0x26addc0 -> Node0x26ace70; + Node0x26addc0 -> Node0x26ad010; + Node0x26ace70 [shape=record,label="{%5:\l\l %6 = srem i32 
%tmp3, 1024\l %7 = add nsw i32 1, %6\l %8 = sitofp i32 %7 to double\l %9 = fdiv double %8, 2.000000e+00\l %10 = fptrunc double %9 to float\l store float %10, float* %scevgep4\l %11 = srem i32 %tmp3, 1024\l %12 = add nsw i32 1, %11\l %13 = sitofp i32 %12 to double\l %14 = fdiv double %13, 2.000000e+00\l %15 = fptrunc double %14 to float\l store float %15, float* %scevgep\l br label %16\l}"]; + Node0x26ace70 -> Node0x26ace90; + Node0x26ace90 [shape=record,label="{%16:\l\l %indvar.next = add i64 %indvar, 1\l br label %4\l}"]; + Node0x26ace90 -> Node0x26addc0[constraint=false]; + Node0x26ad010 [shape=record,label="{%17:\l\l br label %18\l}"]; + Node0x26ad010 -> Node0x26ad6c0; + Node0x26ad6c0 [shape=record,label="{%18:\l\l %indvar.next2 = add i64 %2, 1\l br label %1\l}"]; + Node0x26ad6c0 -> Node0x26acdd0[constraint=false]; + Node0x26adce0 [shape=record,label="{%19:\l\l ret void\l}"]; + colorscheme = "paired12" + subgraph cluster_0x26a94c0 { + label = ""; + style = solid; + color = 1 + subgraph cluster_0x26aa4e0 { + label = ""; + style = filled; + color = 3 subgraph cluster_0x26a9780 { + label = ""; + style = solid; + color = 5 + Node0x26addc0; + Node0x26ace70; + Node0x26ace90; + } + Node0x26acdd0; + Node0x26acdf0; + Node0x26ad010; + Node0x26ad6c0; + } + Node0x26ade30; + Node0x26adce0; + } +} diff --git a/polly/www/experiments/matmul/scops.init_array.dot.png b/polly/www/experiments/matmul/scops.init_array.dot.png Binary files differ new file mode 100644 index 00000000000..ee04e8b7018 --- /dev/null +++ b/polly/www/experiments/matmul/scops.init_array.dot.png diff --git a/polly/www/experiments/matmul/scops.main.dot b/polly/www/experiments/matmul/scops.main.dot new file mode 100644 index 00000000000..0459c48fb50 --- /dev/null +++ b/polly/www/experiments/matmul/scops.main.dot @@ -0,0 +1,65 @@ +digraph "Scop Graph for 'main' function" { + label="Scop Graph for 'main' function"; + + Node0x26ace10 [shape=record,label="{%0:\l\l call void @init_array()\l br label %1\l}"]; + 
Node0x26ace10 -> Node0x26acd60; + Node0x26acd60 [shape=record,label="{%1:\l\l %indvar3 = phi i64 [ %indvar.next4, %16 ], [ 0, %0 ]\l %exitcond9 = icmp ne i64 %indvar3, 1536\l br i1 %exitcond9, label %2, label %17\l}"]; + Node0x26acd60 -> Node0x26acd80; + Node0x26acd60 -> Node0x26af2e0; + Node0x26acd80 [shape=record,label="{%2:\l\l br label %3\l}"]; + Node0x26acd80 -> Node0x26aee80; + Node0x26aee80 [shape=record,label="{%3:\l\l %indvar1 = phi i64 [ %indvar.next2, %14 ], [ 0, %2 ]\l %scevgep8 = getelementptr [1536 x [1536 x float]]* @C, i64 0, i64 %indvar3, i64 %indvar1\l %exitcond6 = icmp ne i64 %indvar1, 1536\l br i1 %exitcond6, label %4, label %15\l}"]; + Node0x26aee80 -> Node0x26aeea0; + Node0x26aee80 -> Node0x26aeec0; + Node0x26aeea0 [shape=record,label="{%4:\l\l store float 0.000000e+00, float* %scevgep8\l br label %5\l}"]; + Node0x26aeea0 -> Node0x26aced0; + Node0x26aced0 [shape=record,label="{%5:\l\l %indvar = phi i64 [ %indvar.next, %12 ], [ 0, %4 ]\l %scevgep5 = getelementptr [1536 x [1536 x float]]* @A, i64 0, i64 %indvar3, i64 %indvar\l %scevgep = getelementptr [1536 x [1536 x float]]* @B, i64 0, i64 %indvar, i64 %indvar1\l %exitcond = icmp ne i64 %indvar, 1536\l br i1 %exitcond, label %6, label %13\l}"]; + Node0x26aced0 -> Node0x26ace60; + Node0x26aced0 -> Node0x26af5e0; + Node0x26ace60 [shape=record,label="{%6:\l\l %7 = load float* %scevgep8\l %8 = load float* %scevgep5\l %9 = load float* %scevgep\l %10 = fmul float %8, %9\l %11 = fadd float %7, %10\l store float %11, float* %scevgep8\l br label %12\l}"]; + Node0x26ace60 -> Node0x26af640; + Node0x26af640 [shape=record,label="{%12:\l\l %indvar.next = add i64 %indvar, 1\l br label %5\l}"]; + Node0x26af640 -> Node0x26aced0[constraint=false]; + Node0x26af5e0 [shape=record,label="{%13:\l\l br label %14\l}"]; + Node0x26af5e0 -> Node0x26af6e0; + Node0x26af6e0 [shape=record,label="{%14:\l\l %indvar.next2 = add i64 %indvar1, 1\l br label %3\l}"]; + Node0x26af6e0 -> Node0x26aee80[constraint=false]; + 
Node0x26aeec0 [shape=record,label="{%15:\l\l br label %16\l}"]; + Node0x26aeec0 -> Node0x26af740; + Node0x26af740 [shape=record,label="{%16:\l\l %indvar.next4 = add i64 %indvar3, 1\l br label %1\l}"]; + Node0x26af740 -> Node0x26acd60[constraint=false]; + Node0x26af2e0 [shape=record,label="{%17:\l\l ret i32 0\l}"]; + colorscheme = "paired12" + subgraph cluster_0x26a8b20 { + label = ""; + style = solid; + color = 1 + subgraph cluster_0x26a9220 { + label = ""; + style = filled; + color = 3 subgraph cluster_0x26ad500 { + label = ""; + style = solid; + color = 5 + subgraph cluster_0x26ad480 { + label = ""; + style = solid; + color = 7 + Node0x26aced0; + Node0x26ace60; + Node0x26af640; + } + Node0x26aee80; + Node0x26aeea0; + Node0x26af5e0; + Node0x26af6e0; + } + Node0x26acd60; + Node0x26acd80; + Node0x26aeec0; + Node0x26af740; + } + Node0x26ace10; + Node0x26af2e0; + } +} diff --git a/polly/www/experiments/matmul/scops.main.dot.png b/polly/www/experiments/matmul/scops.main.dot.png Binary files differ new file mode 100644 index 00000000000..404d5f19f38 --- /dev/null +++ b/polly/www/experiments/matmul/scops.main.dot.png diff --git a/polly/www/experiments/matmul/scops.print_array.dot b/polly/www/experiments/matmul/scops.print_array.dot new file mode 100644 index 00000000000..6aafb40d666 --- /dev/null +++ b/polly/www/experiments/matmul/scops.print_array.dot @@ -0,0 +1,60 @@ +digraph "Scop Graph for 'print_array' function" { + label="Scop Graph for 'print_array' function"; + + Node0x26ac9a0 [shape=record,label="{%0:\l\l br label %1\l}"]; + Node0x26ac9a0 -> Node0x26acd00; + Node0x26acd00 [shape=record,label="{%1:\l\l %indvar1 = phi i64 [ %indvar.next2, %19 ], [ 0, %0 ]\l %exitcond3 = icmp ne i64 %indvar1, 1536\l br i1 %exitcond3, label %2, label %20\l}"]; + Node0x26acd00 -> Node0x26a8ac0; + Node0x26acd00 -> Node0x26ac9c0; + Node0x26a8ac0 [shape=record,label="{%2:\l\l br label %3\l}"]; + Node0x26a8ac0 -> Node0x26ad940; + Node0x26ad940 [shape=record,label="{%3:\l\l %indvar = phi 
i64 [ %indvar.next, %15 ], [ 0, %2 ]\l %scevgep = getelementptr [1536 x [1536 x float]]* @C, i64 0, i64 %indvar1, i64 %indvar\l %j.0 = trunc i64 %indvar to i32\l %exitcond = icmp ne i64 %indvar, 1536\l br i1 %exitcond, label %4, label %16\l}"]; + Node0x26ad940 -> Node0x26acde0; + Node0x26ad940 -> Node0x26ad9e0; + Node0x26acde0 [shape=record,label="{%4:\l\l %5 = load %struct._IO_FILE** @stdout, align 8\l %6 = load float* %scevgep\l %7 = fpext float %6 to double\l %8 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %5, i8* getelementptr inbounds ([5 x i8]* @.str, i32 0, i32 0), double %7)\l %9 = srem i32 %j.0, 80\l %10 = icmp eq i32 %9, 79\l br i1 %10, label %11, label %14\l}"]; + Node0x26acde0 -> Node0x26ada40; + Node0x26acde0 -> Node0x26acfa0; + Node0x26ada40 [shape=record,label="{%11:\l\l %12 = load %struct._IO_FILE** @stdout, align 8\l %13 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %12, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0))\l br label %14\l}"]; + Node0x26ada40 -> Node0x26acfa0; + Node0x26acfa0 [shape=record,label="{%14:\l\l br label %15\l}"]; + Node0x26acfa0 -> Node0x26ad6c0; + Node0x26ad6c0 [shape=record,label="{%15:\l\l %indvar.next = add i64 %indvar, 1\l br label %3\l}"]; + Node0x26ad6c0 -> Node0x26ad940[constraint=false]; + Node0x26ad9e0 [shape=record,label="{%16:\l\l %17 = load %struct._IO_FILE** @stdout, align 8\l %18 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %17, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0))\l br label %19\l}"]; + Node0x26ad9e0 -> Node0x26ace00; + Node0x26ace00 [shape=record,label="{%19:\l\l %indvar.next2 = add i64 %indvar1, 1\l br label %1\l}"]; + Node0x26ace00 -> Node0x26acd00[constraint=false]; + Node0x26ac9c0 [shape=record,label="{%20:\l\l ret void\l}"]; + colorscheme = "paired12" + subgraph cluster_0x26adae0 { + label = ""; + style = solid; + color = 1 + subgraph cluster_0x26aa030 { + label = ""; + style = solid; + color = 6 
+ subgraph cluster_0x26a9fb0 { + label = ""; + style = solid; + color = 5 + subgraph cluster_0x26adb60 { + label = ""; + style = solid; + color = 7 + Node0x26acde0; + Node0x26ada40; + } + Node0x26ad940; + Node0x26acfa0; + Node0x26ad6c0; + } + Node0x26acd00; + Node0x26a8ac0; + Node0x26ad9e0; + Node0x26ace00; + } + Node0x26ac9a0; + Node0x26ac9c0; + } +} diff --git a/polly/www/experiments/matmul/scops.print_array.dot.png b/polly/www/experiments/matmul/scops.print_array.dot.png Binary files differ new file mode 100644 index 00000000000..5b1658a291f --- /dev/null +++ b/polly/www/experiments/matmul/scops.print_array.dot.png diff --git a/polly/www/experiments/matmul/scopsonly.init_array.dot b/polly/www/experiments/matmul/scopsonly.init_array.dot new file mode 100644 index 00000000000..7ef7b1397a5 --- /dev/null +++ b/polly/www/experiments/matmul/scopsonly.init_array.dot @@ -0,0 +1,47 @@ +digraph "Scop Graph for 'init_array' function" { + label="Scop Graph for 'init_array' function"; + + Node0x24dfca0 [shape=record,label="{%0}"]; + Node0x24dfca0 -> Node0x24dfdf0; + Node0x24dfdf0 [shape=record,label="{%1}"]; + Node0x24dfdf0 -> Node0x24dee50; + Node0x24dfdf0 -> Node0x24def50; + Node0x24dee50 [shape=record,label="{%3}"]; + Node0x24dee50 -> Node0x24deec0; + Node0x24deec0 [shape=record,label="{%4}"]; + Node0x24deec0 -> Node0x24dfdc0; + Node0x24deec0 -> Node0x24df0c0; + Node0x24dfdc0 [shape=record,label="{%5}"]; + Node0x24dfdc0 -> Node0x24defb0; + Node0x24defb0 [shape=record,label="{%16}"]; + Node0x24defb0 -> Node0x24deec0[constraint=false]; + Node0x24df0c0 [shape=record,label="{%17}"]; + Node0x24df0c0 -> Node0x24deee0; + Node0x24deee0 [shape=record,label="{%18}"]; + Node0x24deee0 -> Node0x24dfdf0[constraint=false]; + Node0x24def50 [shape=record,label="{%19}"]; + colorscheme = "paired12" + subgraph cluster_0x24db4c0 { + label = ""; + style = solid; + color = 1 + subgraph cluster_0x24dc4e0 { + label = ""; + style = filled; + color = 3 subgraph cluster_0x24db780 { + label = ""; + 
style = solid; + color = 5 + Node0x24deec0; + Node0x24dfdc0; + Node0x24defb0; + } + Node0x24dfdf0; + Node0x24dee50; + Node0x24df0c0; + Node0x24deee0; + } + Node0x24dfca0; + Node0x24def50; + } +} diff --git a/polly/www/experiments/matmul/scopsonly.init_array.dot.png b/polly/www/experiments/matmul/scopsonly.init_array.dot.png Binary files differ new file mode 100644 index 00000000000..92c4f9882bd --- /dev/null +++ b/polly/www/experiments/matmul/scopsonly.init_array.dot.png diff --git a/polly/www/experiments/matmul/scopsonly.main.dot b/polly/www/experiments/matmul/scopsonly.main.dot new file mode 100644 index 00000000000..d375349730a --- /dev/null +++ b/polly/www/experiments/matmul/scopsonly.main.dot @@ -0,0 +1,65 @@ +digraph "Scop Graph for 'main' function" { + label="Scop Graph for 'main' function"; + + Node0x24deb60 [shape=record,label="{%0}"]; + Node0x24deb60 -> Node0x24deaa0; + Node0x24deaa0 [shape=record,label="{%1}"]; + Node0x24deaa0 -> Node0x24e12a0; + Node0x24deaa0 -> Node0x24e0e30; + Node0x24e12a0 [shape=record,label="{%2}"]; + Node0x24e12a0 -> Node0x24e0e00; + Node0x24e0e00 [shape=record,label="{%3}"]; + Node0x24e0e00 -> Node0x24e1410; + Node0x24e0e00 -> Node0x24e1470; + Node0x24e1410 [shape=record,label="{%4}"]; + Node0x24e1410 -> Node0x24e1380; + Node0x24e1380 [shape=record,label="{%5}"]; + Node0x24e1380 -> Node0x24deaf0; + Node0x24e1380 -> Node0x24e1620; + Node0x24deaf0 [shape=record,label="{%6}"]; + Node0x24deaf0 -> Node0x24e1680; + Node0x24e1680 [shape=record,label="{%12}"]; + Node0x24e1680 -> Node0x24e1380[constraint=false]; + Node0x24e1620 [shape=record,label="{%13}"]; + Node0x24e1620 -> Node0x24e16e0; + Node0x24e16e0 [shape=record,label="{%14}"]; + Node0x24e16e0 -> Node0x24e0e00[constraint=false]; + Node0x24e1470 [shape=record,label="{%15}"]; + Node0x24e1470 -> Node0x24e01a0; + Node0x24e01a0 [shape=record,label="{%16}"]; + Node0x24e01a0 -> Node0x24deaa0[constraint=false]; + Node0x24e0e30 [shape=record,label="{%17}"]; + colorscheme = "paired12" + 
subgraph cluster_0x24dfc10 { + label = ""; + style = solid; + color = 1 + subgraph cluster_0x24de570 { + label = ""; + style = filled; + color = 3 subgraph cluster_0x24de7a0 { + label = ""; + style = solid; + color = 5 + subgraph cluster_0x24de720 { + label = ""; + style = solid; + color = 7 + Node0x24e1380; + Node0x24deaf0; + Node0x24e1680; + } + Node0x24e0e00; + Node0x24e1410; + Node0x24e1620; + Node0x24e16e0; + } + Node0x24deaa0; + Node0x24e12a0; + Node0x24e1470; + Node0x24e01a0; + } + Node0x24deb60; + Node0x24e0e30; + } +} diff --git a/polly/www/experiments/matmul/scopsonly.main.dot.png b/polly/www/experiments/matmul/scopsonly.main.dot.png Binary files differ new file mode 100644 index 00000000000..f0cf154bc79 --- /dev/null +++ b/polly/www/experiments/matmul/scopsonly.main.dot.png diff --git a/polly/www/experiments/matmul/scopsonly.print_array.dot b/polly/www/experiments/matmul/scopsonly.print_array.dot new file mode 100644 index 00000000000..7c46729e31d --- /dev/null +++ b/polly/www/experiments/matmul/scopsonly.print_array.dot @@ -0,0 +1,60 @@ +digraph "Scop Graph for 'print_array' function" { + label="Scop Graph for 'print_array' function"; + + Node0x24df2c0 [shape=record,label="{%0}"]; + Node0x24df2c0 -> Node0x24df2a0; + Node0x24df2a0 [shape=record,label="{%1}"]; + Node0x24df2a0 -> Node0x24dee90; + Node0x24df2a0 -> Node0x24dee20; + Node0x24dee90 [shape=record,label="{%2}"]; + Node0x24dee90 -> Node0x24debd0; + Node0x24debd0 [shape=record,label="{%3}"]; + Node0x24debd0 -> Node0x24df150; + Node0x24debd0 -> Node0x24de990; + Node0x24df150 [shape=record,label="{%4}"]; + Node0x24df150 -> Node0x24df3a0; + Node0x24df150 -> Node0x24defb0; + Node0x24df3a0 [shape=record,label="{%11}"]; + Node0x24df3a0 -> Node0x24defb0; + Node0x24defb0 [shape=record,label="{%14}"]; + Node0x24defb0 -> Node0x24df530; + Node0x24df530 [shape=record,label="{%15}"]; + Node0x24df530 -> Node0x24debd0[constraint=false]; + Node0x24de990 [shape=record,label="{%16}"]; + Node0x24de990 -> 
Node0x24df9a0; + Node0x24df9a0 [shape=record,label="{%19}"]; + Node0x24df9a0 -> Node0x24df2a0[constraint=false]; + Node0x24dee20 [shape=record,label="{%20}"]; + colorscheme = "paired12" + subgraph cluster_0x24dbe40 { + label = ""; + style = solid; + color = 1 + subgraph cluster_0x24db6e0 { + label = ""; + style = solid; + color = 6 + subgraph cluster_0x24db660 { + label = ""; + style = solid; + color = 5 + subgraph cluster_0x24db5e0 { + label = ""; + style = solid; + color = 7 + Node0x24df150; + Node0x24df3a0; + } + Node0x24debd0; + Node0x24defb0; + Node0x24df530; + } + Node0x24df2a0; + Node0x24dee90; + Node0x24de990; + Node0x24df9a0; + } + Node0x24df2c0; + Node0x24dee20; + } +} diff --git a/polly/www/experiments/matmul/scopsonly.print_array.dot.png b/polly/www/experiments/matmul/scopsonly.print_array.dot.png Binary files differnew file mode 100644 index 00000000000..3426e7b06fb --- /dev/null +++ b/polly/www/experiments/matmul/scopsonly.print_array.dot.png diff --git a/polly/www/get_started.html b/polly/www/get_started.html new file mode 100644 index 00000000000..8c176f41620 --- /dev/null +++ b/polly/www/get_started.html @@ -0,0 +1,136 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" + "http://www.w3.org/TR/html4/strict.dtd"> +<html> +<head> + <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /> + <title>Polly - Getting Started</title> + <link type="text/css" rel="stylesheet" href="menu.css" /> + <link type="text/css" rel="stylesheet" href="content.css" /> +</head> +<body> + +<!--#include virtual="menu.html.incl"--> + +<div id="content"> + +<h1>Getting Started: Building and Installing Polly</h1> + + +<h2 id="prerequisites"> Prerequisites </h2> + +The following prerequisites can be installed through the package management +system of your operating system. 
+ +<ul> +<li>libgmp (library + developer package)</li> +</ul> + +<h3> Install ISL / CLooG libraries </h3> + +Polly requires the latest versions of <a href="http://www.cloog.org">CLooG</a> +and <a href="http://repo.or.cz/w/isl.git">isl</a> to be installed. The CLooG git +repository contains the latest versions of both CLooG and isl. + +<pre> +git clone git://repo.or.cz/cloog.git +cd cloog +./get_submodules.sh +./autogen.sh +./configure --with-gmp-prefix=/path/to/gmp/installation --prefix=/path/to/cloog/installation +make +make install +</pre> + +<h3> Install PoCC (Optional) </h3> + +Polly can use <a href="http://www.cse.ohio-state.edu/~pouchet/software/pocc"> +PoCC</a> as an external optimizer. PoCC provides an +integrated version of <a href="http://pluto.sf.net">Pluto</a>, an advanced +data-locality and tileability optimizer. To enable this feature install PoCC +1.0-rc3.1 (the one with Polly support) and add it to your PATH. + +<pre> +wget <a +href="http://www.cse.ohio-state.edu/~pouchet/software/pocc/download/pocc-1.0-rc3.1-full.tar.gz">http://www.cse.ohio-state.edu/~pouchet/software/pocc/download/pocc-1.0-rc3.1-full.tar.gz</a> +tar xzf pocc-1.0-rc3.1-full.tar.gz +cd pocc-1.0-rc3.1 +./install.sh +export PATH=$PATH:`pwd`/bin +</pre> + +Furthermore, scoplib-0.2.0 has to be installed so that Polly can link against +it. + +<pre> +wget <a +href="http://www.cse.ohio-state.edu/~pouchet/software/pocc/download/modules/scoplib-0.2.0.tar.gz" +>http://www.cse.ohio-state.edu/~pouchet/software/pocc/download/modules/scoplib-0.2.0.tar.gz</a> +tar xzf scoplib-0.2.0.tar.gz +cd scoplib-0.2.0 +./configure --enable-mp-version --prefix=/path/to/scoplib/installation +make && make install +</pre> + +<h2 id="source"> Get the code </h2> + +<p> +The Polly source code is available in the LLVM SVN repository. For convenience +we also provide a git mirror. To build Polly we extract its source code into the +<em>tools</em> directory of the llvm sources.</p> +<b>A recent LLVM checkout is needed. 
Older versions may not work!</b> + +<h3>SVN</h3> +<pre> +export LLVM_SRC=`pwd`/llvm +svn checkout http://llvm.org/svn/llvm-project/llvm/trunk ${LLVM_SRC} +cd ${LLVM_SRC}/tools +svn checkout http://llvm.org/svn/llvm-project/polly/trunk polly +</pre> +<h3>GIT</h3> +<pre> +export LLVM_SRC=`pwd`/llvm +git clone http://llvm.org/git/llvm.git ${LLVM_SRC} +cd ${LLVM_SRC}/tools +git clone git://repo.or.cz/polly.git +</pre> + + + +<h2 id="build">Build Polly</h2> + +To build Polly you can use either the autoconf or the cmake build system. At the +moment only the autoconf build system can run the llvm test-suite and only +the cmake build system can run 'make polly-test'. + +<h3>CMake</h3> + +<pre> +mkdir build +cd build +cmake ${LLVM_SRC} + +# If CMAKE cannot find CLooG and ISL +cmake -DCMAKE_PREFIX_PATH=/cloog/installation . + +# To point CMAKE to the scoplib source +cmake -DCMAKE_PREFIX_PATH=/scoplib/installation . + +make +</pre> + +<h3> Autoconf </h3> + +<pre> +mkdir build +cd build +${LLVM_SRC}/configure --with-cloog=/cloog/installation --with-isl=/cloog/installation --with-scoplib=/scoplib/installation +make +</pre> + +<h2> Test Polly</h2> + +To check if Polly works correctly you can run <em>make polly-test</em>. This +currently works only with a cmake build. 
+</div> +</body> +</html> diff --git a/polly/www/images/architecture.png b/polly/www/images/architecture.png Binary files differnew file mode 100644 index 00000000000..fdd26a075f9 --- /dev/null +++ b/polly/www/images/architecture.png diff --git a/polly/www/images/iit-madras.png b/polly/www/images/iit-madras.png Binary files differnew file mode 100644 index 00000000000..caf90ab0ad6 --- /dev/null +++ b/polly/www/images/iit-madras.png diff --git a/polly/www/images/osu.png b/polly/www/images/osu.png Binary files differnew file mode 100644 index 00000000000..154a04b1c67 --- /dev/null +++ b/polly/www/images/osu.png diff --git a/polly/www/images/performance/parallel-large.png b/polly/www/images/performance/parallel-large.png Binary files differnew file mode 100644 index 00000000000..76261bb4206 --- /dev/null +++ b/polly/www/images/performance/parallel-large.png diff --git a/polly/www/images/performance/parallel-small.png b/polly/www/images/performance/parallel-small.png Binary files differnew file mode 100644 index 00000000000..3c9f6ba3246 --- /dev/null +++ b/polly/www/images/performance/parallel-small.png diff --git a/polly/www/images/performance/sequential-large.png b/polly/www/images/performance/sequential-large.png Binary files differnew file mode 100644 index 00000000000..5c88354f188 --- /dev/null +++ b/polly/www/images/performance/sequential-large.png diff --git a/polly/www/images/performance/sequential-small.png b/polly/www/images/performance/sequential-small.png Binary files differnew file mode 100644 index 00000000000..94b248de8b7 --- /dev/null +++ b/polly/www/images/performance/sequential-small.png diff --git a/polly/www/images/sys-uni.png b/polly/www/images/sys-uni.png Binary files differnew file mode 100644 index 00000000000..e6b84e16acd --- /dev/null +++ b/polly/www/images/sys-uni.png diff --git a/polly/www/images/uni-passau.png b/polly/www/images/uni-passau.png Binary files differnew file mode 100644 index 00000000000..4bbfa216315 --- /dev/null +++ 
b/polly/www/images/uni-passau.png diff --git a/polly/www/index.html b/polly/www/index.html new file mode 100644 index 00000000000..e26ea6983d8 --- /dev/null +++ b/polly/www/index.html @@ -0,0 +1,73 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" + "http://www.w3.org/TR/html4/strict.dtd"> +<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ --> +<html> +<head> + <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> + <title>Polly - Polyhedral optimizations for LLVM</title> + <link type="text/css" rel="stylesheet" href="menu.css"> + <link type="text/css" rel="stylesheet" href="content.css"> +</head> +<body> +<!--#include virtual="menu.html.incl"--> +<div id="content"> + <!--*********************************************************************--> + <h1>Polly: Polyhedral optimizations for LLVM</h1> + <!--*********************************************************************--> + + <p>Polly is a project that works on advanced optimizations for data-locality + and parallelism. It uses the polyhedral model, a high-level mathematical + abstraction, to analyse and optimize the memory access pattern of a program. + Due to the use of a polyhedral representation Polly can easily calculate + detailed data dependency information which it uses to derive an optimized loop + structure. Polly can speed up sequential code by improving memory locality and + consequently the cache use. Furthermore, Polly is able to expose different + kinds of parallelism which it exploits by introducing (basic) OpenMP and SIMD + code. The automatic use of vector accelerators is planned and will take + advantage of the ongoing work on the LLVM PTX backend. + </p> + + <em> Polly is still a research project and is not yet production quality. We are + working on a robust implementation of Polly's core. 
You are invited to join us + by directly contributing to Polly or by using it for your own research.</em> + + <!--=====================================================================--> + <h2>Major changes in Polly</h2> + <!--=====================================================================--> + + <ul> + <li>April 2011 - Polly moves to the LLVM infrastructure </li> + <li>March 2011 - Polly is presented at <a + href="http://impact2011.inrialpes.fr/">CGO/IMPACT 2011</a>, Polly can compile + polybench 2.0 with vectorization and OpenMP code generation. </li> + <li> February 2011 - pollycc - a script to automatically compile with + polyhedral optimizations </li> + <li> January 2011 - Basic OpenMP support, Alias analysis integration, + Pluto/POCC support </li> + <li> December 2010 - Basic vectorization support </li> + <li> November 2010 - Talk about Polly at the <a + href="http://llvm.org/devmtg/2010-11/">LLVM Developer Meeting</a> </li> + <li> October 2010 - Added dependency analysis </li> + <li> October 2010 - Finished Phase 1 - Get something working </li> + <li> October 2010 - Support for scalar dependences and sequential SCoPs </li> + <li> August 2010 - RegionInfo pass committed to llvm </li> + <li> August 2010 - llvm-test suite compiles </li> + <li> July 2010 - Code generation works for normal SCoPs. </li> + <li> June 2010 - OpenSCoP import/export works (as far as openscop is finished) + </li> + <li> May 2010 - The CLooG AST can be parsed </li> + <li> April 2010 - SCoPs can automatically be detected (WIP) </li> + <li> March 2010 - The RegionInfo framework is almost completed. </li> + <li> February 2010 - Translating a simple loop to Polly-IR and passing it to + CLooG-isl to regenerate a loop structure works. </li> + <li> February 2010 - ISL and CLooG are integrated. </li> + <li> January 2010 - The RegionInfo pass is finished. </li> + <li> End of 2009 - Work on the infrastructure started. 
</li> + </ul> + <!--=====================================================================--> + <h2> The architecture of Polly</h2> + <!--=====================================================================--> + <img src='images/architecture.png' /> +</div> +</body> +</html> diff --git a/polly/www/menu.css b/polly/www/menu.css new file mode 100644 index 00000000000..9f26687b437 --- /dev/null +++ b/polly/www/menu.css @@ -0,0 +1,39 @@ +/***************/ +/* page layout */ +/***************/ + +[id=menu] { + width:25ex; + float: left; +} +[id=content] { + /* ***** EDIT THIS VALUE IF CONTENT OVERLAPS MENU ***** */ + position:absolute; + left:29ex; + padding-right:4ex; +} + +/**************/ +/* menu style */ +/**************/ + +#menu .submenu { + padding-top:1em; + display:block; +} + +#menu label { + display:block; + font-weight: bold; + text-align: center; + background-color: rgb(192,192,192); +} +#menu a { + padding:0 .2em; + display:block; + text-align: center; + background-color: rgb(235,235,235); +} +#menu a:visited { + color:rgb(100,50,100); +} diff --git a/polly/www/menu.html.incl b/polly/www/menu.html.incl new file mode 100644 index 00000000000..803c724b819 --- /dev/null +++ b/polly/www/menu.html.incl @@ -0,0 +1,36 @@ +<div id="menu"> + <div> + <a href="http://llvm.org/">LLVM Home</a> + </div> + + <div class="submenu"> + <label>Polly Info</label> + <a href="index.html">About</a> + <a href="todo.html">Todo</a> + <a href="passes.html">LLVM Passes</a> +<!-- <a href="examples.html">Examples</a> --> + <a href="performance.html">Performance</a> + <a href="publications.html">Publications</a> + <a href="contributors.html">Contributors</a> + </div> + + <div class="submenu"> + <label>Communication</label> + <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits"> + llvm-commits List + </a> + <a href="http://groups.google.com/group/polly-dev">polly-dev List</a> + <a href="http://llvm.org/bugs/">Bug Reports</a> + </div> + + <div class="submenu"> + <label>The 
Code</label> + <a href="get_started.html#prerequisites">Prerequisites</a> + <a href="get_started.html#source">Download</a> + <a href="get_started.html#build">Build</a> + <a href="http://llvm.org/viewvc/llvm-project/polly/trunk/"> + Browse (ViewVC) + </a> + <a href="http://repo.or.cz/w/polly-mirror.git">Browse (GitWeb)</a> + </div> +</div> diff --git a/polly/www/passes.html b/polly/www/passes.html new file mode 100644 index 00000000000..d53ccba6b2f --- /dev/null +++ b/polly/www/passes.html @@ -0,0 +1,68 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" + "http://www.w3.org/TR/html4/strict.dtd"> +<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ --> +<html> +<head> + <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> + <title>Polly - The available LLVM passes</title> + <link type="text/css" rel="stylesheet" href="menu.css"> + <link type="text/css" rel="stylesheet" href="content.css"> +</head> +<body> +<!--#include virtual="menu.html.incl"--> +<div id="content"> + <!--*********************************************************************--> + <h1>Polly: The available LLVM passes</h1> + <!--*********************************************************************--> + + <p>Polly consists of a set of LLVM passes. 
</p> + +<h2>Front End</h2> +<ul> +<li><em>polly-prepare</em> Prepare code for Polly</li> +<li><em>polly-region-simplify</em> Transform refined regions into simple regions</li> +<li><em>polly-detect</em> Detect SCoPs in functions</li> +<li><em>polly-analyze-ir</em> Analyse the LLVM-IR in the detected SCoPs</li> +<li><em>polly-independent</em> Create independent blocks</li> +<li><em>polly-scops</em> Create polyhedral description of SCoPs</li> +</ul> +<h2>Middle End</h2> +<ul> +<li><em>polly-dependences</em> Calculate the dependences in a SCoP</li> +<li><em>polly-interchange</em> Perform loop interchange (work in progress)</li> +<li><em>polly-optimize</em> Optimize the SCoP using PoCC</li> +<li>Import/Export +<ul> +<li><em>polly-export-cloog</em> Export the CLooG input file +(Writes a .cloog file for each SCoP)</li> +<li><em>polly-export</em> Export SCoPs with OpenScop library +(Writes a .scop file for each SCoP)</li> +<li><em>polly-import</em> Import SCoPs with OpenScop library +(Reads a .scop file for each SCoP)</li> +<li><em>polly-export-scoplib</em> Export SCoPs with ScopLib library +(Writes a .scoplib file for each SCoP)</li> +<li><em>polly-import-scoplib</em> Import SCoPs with ScopLib library +(Reads a .scoplib file for each SCoP)</li> +<li><em>polly-export-jscop</em> Export SCoPs as JSON +(Writes a .jscop file for each SCoP)</li> +<li><em>polly-import-jscop</em> Import SCoPs from JSON +(Reads a .jscop file for each SCoP)</li> +</ul> +</li> +<li>Graphviz +<ul> +<li><em>dot-scops</em> Print SCoPs of function</li> +<li><em>dot-scops-only</em> Print SCoPs of function (without function bodies)</li> +<li><em>view-scops</em> View SCoPs of function</li> +<li><em>view-scops-only</em> View SCoPs of function (without function bodies)</li> +</ul></li> +</ul> +<h2>Back End</h2> +<ul> +<li><em>polly-cloog</em> Execute CLooG code generation</li> +<li><em>polly-codegen</em> Create LLVM-IR from the polyhedral information</li> +</ul> + +</div> +</body> +</html> diff --git 
a/polly/www/performance.html b/polly/www/performance.html new file mode 100644 index 00000000000..0d4475b92bd --- /dev/null +++ b/polly/www/performance.html @@ -0,0 +1,66 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" + "http://www.w3.org/TR/html4/strict.dtd"> +<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ --> +<html> +<head> <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> + <title>Polly - Performance</title> + <link type="text/css" rel="stylesheet" href="menu.css"> + <link type="text/css" rel="stylesheet" href="content.css"> +</head> +<body> +<!--#include virtual="menu.html.incl"--> +<div id="content"> +<h1>Polly: Performance</h1> + +<p>To evaluate the performance benefits Polly currently provides we compiled the +<a href="http://www.cse.ohio-state.edu/~pouchet/software/polybench/">Polybench +2.0</a> benchmark suite. Each benchmark was run with double precision floating +point values on an Intel Core Xeon X5670 CPU @ 2.93GHz (12 cores, 24 threads) +system. We used <a href="http://pocc.sf.net">PoCC</a> and the included <a +href="http://pluto-compiler.sf.net">Pluto</a> transformations to optimize the +code. The source code of Polly and LLVM/clang was checked out on +25/03/2011.</p> + +<p>The results shown were created fully automatically without manual +interaction. We did not yet spend any time to tune the results. Hence +further improvements may be achieved by tuning the code generated by Polly, the +heuristics used by Pluto or by investigating if more code could be optimized. +As Pluto was never used at such a low level, its heuristics are probably +far from perfect. Another area where we expect larger performance improvements +is the SIMD vector code generation. At the moment, it rarely yields +performance improvements, as we did not yet include vectorization in our +heuristics. 
By changing this we should be able to significantly increase the +number of test cases that show improvements.</p> + +<p>The polybench test suite contains computation kernels from linear algebra +routines, stencil computations, image processing and data mining. Polly +recognizes the majority of them and is able to show good speedup. However, +to show similar speedup on larger examples like the SPEC CPU benchmarks Polly +still lacks support for integer casts, variable-sized multi-dimensional arrays +and probably several other constructs. This support is necessary as such +constructs appear in larger programs, but not in our limited test suite.</p> + +<h2> Sequential runs</h2> + +For the sequential runs we used Polly to create a program structure that is +optimized for data-locality. One of the major optimizations performed is tiling. +The speedups shown are without the use of any multi-core parallelism. No +additional hardware is used, but the single available core is used more +efficiently. +<h3> Small data size</h3> +<img src="images/performance/sequential-small.png" /><br /> +<h3> Large data size</h3> +<img src="images/performance/sequential-large.png" /> +<h2> Parallel runs</h2> +For the parallel runs we used Polly to expose parallelism and to add calls to an +OpenMP runtime library. With OpenMP we can use all 12 hardware cores +instead of the single core that was used before. We can see that in several +cases we obtain more than linear speedup. This additional speedup is due to +improved data-locality. 
+<h3> Small data size</h3> +<img src="images/performance/parallel-small.png" /><br /> +<h3> Large data size</h3> +<img src="images/performance/parallel-large.png" /> +</div> +</body> +</html> diff --git a/polly/www/publications.html b/polly/www/publications.html new file mode 100644 index 00000000000..afde2b1346a --- /dev/null +++ b/polly/www/publications.html @@ -0,0 +1,43 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" + "http://www.w3.org/TR/html4/strict.dtd"> +<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ --> +<html> +<head> + <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> + <title>Polly - Publications</title> + <link type="text/css" rel="stylesheet" href="menu.css"> + <link type="text/css" rel="stylesheet" href="content.css"> +</head> +<body> +<!--#include virtual="menu.html.incl"--> +<div id="content"> + <!--*********************************************************************--> + <h1>Polly: Publications</h1> + <!--*********************************************************************--> + + <h2> 2011 </h2> + <ul> + <li>Polly - Polyhedral Optimization in LLVM<br /> + Tobias Grosser, Hongbin Zheng, Raghesh Aloor, Andreas Simbürger, Armin + Größlinger, Louis-Noël Pouchet<br /> + IMPACT at CGO 2011 <br /> + <a + href="publications/grosser-impact-2011.pdf">Paper</a>, <a + href="publications/grosser-impact-2011-slides.pdf">Slides + </a> + </li> + </ul> + <h2> 2010 </h2> + <ul> + <li>Polly - Polyhedral Transformations for LLVM<br /> + Tobias Grosser, Hongbin Zheng<br /> + LLVM Developer Meeting <br /><a + href="http://llvm.org/devmtg/2010-11/Grosser-Polly.pdf">Slides</a>, <a + href="http://llvm.org/devmtg/2010-11/videos/Grosser_Polly-desktop.mp4">Video + (Computer)</a>, <a + href="http://llvm.org/devmtg/2010-11/videos/Grosser_Polly-mobile.mp4">Video + (Mobile)</a></li> + </ul> +</div> +</body> +</html> diff --git a/polly/www/publications/grosser-impact-2011-slides.pdf 
b/polly/www/publications/grosser-impact-2011-slides.pdf Binary files differnew file mode 100644 index 00000000000..3e604108a8d --- /dev/null +++ b/polly/www/publications/grosser-impact-2011-slides.pdf diff --git a/polly/www/publications/grosser-impact-2011.pdf b/polly/www/publications/grosser-impact-2011.pdf Binary files differnew file mode 100644 index 00000000000..9b79bd26fa0 --- /dev/null +++ b/polly/www/publications/grosser-impact-2011.pdf diff --git a/polly/www/todo.html b/polly/www/todo.html new file mode 100644 index 00000000000..a8b589d3b62 --- /dev/null +++ b/polly/www/todo.html @@ -0,0 +1,363 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" + "http://www.w3.org/TR/html4/strict.dtd"> +<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ --> +<html> +<head> <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> + <title>Polly - Todo</title> + <link type="text/css" rel="stylesheet" href="menu.css"> + <link type="text/css" rel="stylesheet" href="content.css"> +</head> +<body> +<!--#include virtual="menu.html.incl"--> +<div id="content"> +<h3> Setup infrastructure at LLVM </h3> + +<p>We are currently moving to the LLVM infrastructure +</p> +<table class="wikitable" cellpadding="2"> + +<tbody><tr> +<th width="400px"> Task +</th><th width="150px"> Status +</th><th>Owner +</th></tr> +<tr> +<th align="left"> Move to LLVM svn +</th><td align="center"> done +</td><td> Tobias + +</td></tr> +<tr> +<th align="left"> Get git mirror at llvm.org/git/polly.git +</th><td align="center"> done +</td><td> Tobias +</td></tr> +<tr> +<th align="left"> Setup on commit mails +</th><td align="center"> +</td><td> Tobias +</td></tr> +<tr> + +<th align="left"> Add LLVM Bugzilla category +</th></tr> +<tr> +<th align="left"> Create polly.llvm.org website +</th><td> +</td></tr> +<tr> +<th align="left"> Add Polly buildbot that runs 'polly-test' +</th><td> +</td></tr> +<tr> +<th align="left"> Run nightly performance/coverage tests with the llvm 
+test-suite +</th><td> + +</td></tr> +</tbody></table> +<h3> Phase 2 </h3> +<p>The second phase of Polly can build on a robust, but very limited framework. +In this phase work on removing limitations and extending the framework is +planned. Also we plan the first very simple transformations. Furthermore the +build system will be improved to simplify deployment. +</p> +<table class="wikitable" cellpadding="2"> + +<tbody><tr> + +<th colspan="3" style="background: none repeat scroll 0% 0% rgb(239, 239, +239);"> Frontend +</th></tr> +<tr> +<th width="400px"> Task +</th><th width="150px"> Status +</th><th>Owner +</th></tr> +<tr> +<th align="left"> Support for casts in expressions +</th><td> +</td><td> +</td><td> +</td></tr> + +<tr> +<th align="left"> Support multi dimensional arrays. +</th><td align="center"> planning +</td><td> Tobias +</td></tr> +<tr> +<th align="left"> Alias sets +</th></tr> +<tr> +<th colspan="4" style="background: none repeat scroll 0% 0% rgb(239, 239, +239);"> Middle end +</th></tr> +<tr> + +<th width="400px"> Task +</th><th width="80px"> Status +</th><th>Owner +</th></tr> +<tr> +<th align="left"> Implement ISL dependency analysis pass +</th><td align="center"> working +</td><td> Tobias +</td></tr> +<tr> +<th align="left"> Connect pocc/pluto + +</th><td align="center"> working +</td><td> Tobias +</td></tr> +<tr> +<th align="left"> Finish OpenSCoP support +</th><td> +</td><td> +</td><td> +</td></tr> + +<tr> +<th align="left"> Add SCoPLib 0.2 support to connect pocc +</th><td align="center">done + +</td><td> Tobias +</td></tr> +<tr> +<th align="left"> Write simple loop blocking +</th><td> +</td><td> +</td><td> +</td></tr> +<tr> +<th colspan="4" style="background: none repeat scroll 0% 0% rgb(239, 239, +239);"> Backend +</th></tr> +<tr> +<th width="400px"> Task + +</th><th width="80px"> Status +</th><th>Owner +</th></tr> +<tr> +<th align="left"> Code generation for non 64bit targets +</th><td> +</td><td> +</td><td> +</td></tr> +<tr> +<th 
align="left"> Add write support for data access functions +</th><td> +</td><td> +</td><td> + +</td></tr> +<tr> +<th align="left"> Create vector loops +</th><td align="center">70% done +</td><td>Tobias +</td></tr> +<tr> +<th align="left">Create OpenMP +loops +</th><td align="center">90% done +</td><td> Raghesh & Tobias + +</td></tr> +<tr> +<th colspan="4" style="background: none repeat scroll 0% 0% rgb(239, 239, +239);"> General tasks +</th></tr> +<tr> +<th width="300px"> Task +</th><th width="80px"> Status +</th><th>Owner +</th></tr> + +<tr> +<th align="left"> Commit RegionPass patch upstream +</th><td align="center"> done + +</td><td> Tobias +</td></tr> +<tr> +<th align="left"> Build against an installed LLVM +</th><td> works for cmake +</td><td> Ether +</td></tr> +<tr> +<th align="left"> Setup buildbot regression testers using LNT +</th><td> +</td><td> +</td><td> + +</td></tr> +</tbody></table> +<h3>Phase 1 - Get something +working (Finished October 2010)</h3> +<p>The first iteration of this project aims to create a minimal working version +of this framework that is capable of transforming an LLVM-IR program to the +polyhedral model +and back to LLVM-IR without applying any transformations. 
+</p> +<table class="wikitable" cellpadding="2"> + +<tbody><tr> +<th colspan="4" style="background: none repeat scroll 0% 0% rgb(239, 239, +239);"> Frontend +</th></tr> + +<tr> +<th width="300px"> Task +</th><th width="150px"> Status +</th><th>Owner +</th></tr> +<tr> +<th align="left"> Region detection +</th><td> works + Committed upstream +</td><td>Ether +</td></tr> +<tr> +<th align="left"> Access Functions +</th><td> Working +</td><td> John + Ether +</td></tr> +<tr> +<th align="left"> Alias sets +</th><td> Still open +</td></tr> +<tr> +<th align="left"> Scalar evolution to affine expression +</th><td> Done + +</td><td> +Ether +</td></tr> +<tr> +<th align="left"> SCoP extraction +</th><td> Working +</td><td>Tobias + Ether + +</td></tr> +<tr> +<th align="left"> SCoPs to polyhedral model +</th><td> Working +</td><td>Tobias + Ether +</td></tr> +<tr> +<th colspan="4" style="background: none repeat scroll 0% 0% rgb(239, 239, +239);"> Middle end + +</th></tr> +<tr> +<th width="300px"> Task +</th><th width="80px"> Status +</th><th>Owner +</th></tr> +<tr> +<th align="left"> Define polyhedral description +</th><td> Working +</td><td>Tobias + +</td></tr> +<tr> +<th align="left"> Import/Export using openscop +</th><td> working +</td><td>Tobias +</td></tr> +<tr> +<th colspan="4" style="background: none repeat scroll 0% 0% rgb(239, 239, +239);"> Backend +</th></tr> +<tr> + +<th width="300px"> Task +</th><th width="80px"> Status +</th><th>Owner +</th></tr> +<tr> +<th align="left"> Create LLVM-IR using CLooG +</th><td> Working +</td><td> Tobias + +</td></tr> +<tr> +<th colspan="4" style="background: none repeat scroll 0% 0% rgb(239, 239, +239);"> General tasks +</th></tr> +<tr> +<th width="300px"> Task +</th><th width="80px"> Status +</th><th>Owner +</th></tr> +<tr> +<th align="left"> Setup git repositories + +</th><td> Done +</td><td> Tobias +</td></tr> +<tr> +<th align="left"> Add CLooG/isl to build system +</th><td> Works on Unix +</td><td> Tobias + 
+</td></tr></tbody></table> +<h3>Further projects </h3> +<p>There are several great projects related to Polly that can already be started +or are possible in the near future. </p> +<h4>Extend the post dominance analysis for infinite loops (small)</h4> + +<p>At the moment the post dominance analysis cannot handle infinite loops. All +basic blocks in the CFG that do not return are currently not part of the +post dominator tree. +However, by adding some virtual edges, they could be added to the post dominator +tree. Where to add the edges needs some research. +</p><p>This is a small project that is well defined. As it is directly in +LLVM it can be easily committed upstream. It is useful for Polly, as the +RegionInfo pass will be able to detect regions in parts of the CFG that never +return. +</p><p><i>A good starter to get into LLVM</i> +</p> +<h4>Vectorization </h4> +<p>It is planned to use Polly to support vectorization in LLVM. +</p><p>The basic idea is to use Polly and the polyhedral tools to transform code +such that the innermost loops can be executed in parallel. Afterwards during +code generation in LLVM the loops will be created using vector instructions. +</p><p>To start we plan to use <a href="http://pluto-compiler.sf.net" +class="external text" title="http://pluto-compiler.sf.net" +rel="nofollow">Pluto</a> to transform the loop nests. Pluto can generate vector +parallel code and annotate the vector parallel loops. Some impressive results +were shown on code that was afterwards vectorized by icc with vectorization +enforced. We believe LLVM can do even better, as it can interact directly +with the polyhedral information. 
+ +</p><p>As an example, consider a simple matrix multiplication: +</p> +<pre> +for(i=0; i<M; i++) + for(j=0; j<N; j++) + for(k=0; k<K; k++) + C[i][j] = beta*C[i][j] + alpha*A[i][k] * B[k][j]; +</pre> +<p>After Pluto's transformations with added tiling and +vectorization hints: +</p> +<pre> +if ((K >= 1) && (M >= 1) && (N >= 1)) + for (t1=0;t1<=floord(M-1,32);t1++) + for (t2=0;t2<=floord(N-1,32);t2++) + for (t3=0;t3<=floord(K-1,32);t3++) + for (t4=32*t1;t4<=min(M-1,32*t1+31);t4++) + for (t5=32*t3;t5<=min(K-1,32*t3+31);t5++) { + lbv=32*t2; + ubv=min(N-1,32*t2+31); + #pragma ivdep + #pragma vector always + for (t6=lbv; t6<=ubv; t6++) + C[t4][t6]=beta*C[t4][t6]+alpha*A[t4][t5]*B[t5][t6]; + } +</pre> +<p>In this example the innermost loop is parallel without any dependencies. </p> +</div> +</body> +</html>
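The tiled loop nest in the todo.html example can be checked against the original three-loop version with a small sketch. This is only an illustration in Python, not part of the commit: the function names `matmul_naive` and `matmul_tiled` and the `tile` parameter are assumptions made here, `floord` is modelled with Python's floor division, and the `#pragma` vectorization hints have no equivalent and are omitted.

```python
def matmul_naive(A, B, C, alpha, beta):
    """Original loop nest: i, j, k order, updating a copy of C."""
    M, K = len(A), len(B)
    N = len(C[0]) if C else 0
    C = [row[:] for row in C]  # leave the caller's matrix untouched
    for i in range(M):
        for j in range(N):
            for k in range(K):
                C[i][j] = beta * C[i][j] + alpha * A[i][k] * B[k][j]
    return C


def matmul_tiled(A, B, C, alpha, beta, tile=32):
    """Tiled loop nest following the transformed code: t1/t2/t3 walk the
    tiles of i/j/k, t4/t5/t6 walk the points i/k/j inside one tile."""
    M, K = len(A), len(B)
    N = len(C[0]) if C else 0
    C = [row[:] for row in C]
    if K >= 1 and M >= 1 and N >= 1:
        for t1 in range((M - 1) // tile + 1):          # floord(M-1, tile)
            for t2 in range((N - 1) // tile + 1):      # floord(N-1, tile)
                for t3 in range((K - 1) // tile + 1):  # floord(K-1, tile)
                    for t4 in range(tile * t1, min(M - 1, tile * t1 + tile - 1) + 1):
                        for t5 in range(tile * t3, min(K - 1, tile * t3 + tile - 1) + 1):
                            lbv = tile * t2
                            ubv = min(N - 1, tile * t2 + tile - 1)
                            # innermost loop over j: iterations are independent
                            for t6 in range(lbv, ubv + 1):
                                C[t4][t6] = beta * C[t4][t6] + alpha * A[t4][t5] * B[t5][t6]
    return C
```

For every element C[i][j], both versions visit k in increasing order, so they perform the same sequence of updates per element; only the interleaving between different elements changes, which is why the innermost loop over j carries no dependences.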

