| author | Tobias Grosser <grosser@fim.uni-passau.de> | 2011-05-02 07:48:29 +0000 | 
|---|---|---|
| committer | Tobias Grosser <grosser@fim.uni-passau.de> | 2011-05-02 07:48:29 +0000 | 
| commit | fe6d9849a1a425e2e87bc8db7292bfa5324aa5bc (patch) | |
| tree | 500f31fab9b3eaba1cded84e7a65b2ffac71efd1 | |
| parent | 5e9f05c2b84314e3a5e717c5dfd56862db5ebb39 (diff) | |
Add new website for Polly
Use the content of the Polly wiki page[1] to create a new website. I do not yet
plan to officially promote this website, but it is already a solid base that we
can improve and peer review.
[1] http://wiki.llvm.org/Polly
llvm-svn: 130689
58 files changed, 4022 insertions, 0 deletions
| diff --git a/polly/www/content.css b/polly/www/content.css new file mode 100644 index 00000000000..29b2c9886d9 --- /dev/null +++ b/polly/www/content.css @@ -0,0 +1,33 @@ +html { margin: 0px; } body { margin: 8px; } + +html, body { +  padding:0px; +  font-size:small; font-family:"Lucida Grande", "Lucida Sans Unicode", Arial, Verdana, Helvetica, sans-serif; background-color: #fff; color: #222; +  line-height:1.5; +} + +h1, h2, h3, tt { color: #000 } + +h1 { padding-top:0px; margin-top:0px;} +h2 { color:#333333; padding-top:0.5em; } +h3 { padding-top: 0.5em; margin-bottom: -0.25em; color:#2d58b7} +li { padding-bottom: 0.5em; } +ul { padding-left:1.5em; } + +PRE.code {padding-left: 0.5em; background-color: #eeeeee} +PRE {padding-left: 0.5em} + +/* Slides */ +IMG.img_slide { +    display: block; +    margin-left: auto; +    margin-right: auto +} + +.itemTitle { color:#2d58b7 } + +span.error { color:red } +span.caret { color:green; font-weight:bold } + +/* Tables */ +tr { vertical-align:top } diff --git a/polly/www/contributors.html b/polly/www/contributors.html new file mode 100644 index 00000000000..9c92ab27f8d --- /dev/null +++ b/polly/www/contributors.html @@ -0,0 +1,55 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" +          "http://www.w3.org/TR/html4/strict.dtd"> +<html> +<head> +  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /> +  <title>Polly - Contributors</title> +  <link type="text/css" rel="stylesheet" href="menu.css" /> +  <link type="text/css" rel="stylesheet" href="content.css" /> +</head> +<body> + +<!--#include virtual="menu.html.incl"--> + +<div id="content"> + +<h1>Polly: Contributors</h1> + +Polly is developed by a team of students supported by different universities. + +<h2>People</h2> +<h3>Raghesh Aloor</h3> +<p>Raghesh works on OpenMP code generation. He is funded as Google Summer of Code +Student 2011.</p> + +<h3>Tobias Grosser</h3> +<p>Tobias is one of the two Co-founders of Polly. 
He designed the overall +architecture and contributed to almost every part of Polly. He did his work +during his diploma studies at University of Passau. Furthermore, he spent 6 +months at Ohio State University where he was founded by the U.S. National +Science Foundation through awards 0811781 and 0926688.</p> + +<p>Website: <a href="http://www.grosser.es">www.grosser.es</a></p> + + +<h3>Andreas Simbuerger</h3> +<p> +Andreas works on the profiling infrastructure during his PhD at University of +Passau. +</p> +<p>Website: <a href="http://www.infosun.fim.uni-passau.de/cl/staff/simbuerger/"> +http://www.infosun.fim.uni-passau.de/cl/staff/simbuerger/</a></p> +<h3>Hongbin Zheng</h3> +<p>Hongbin Zheng is one of the two Co-founders of Polly. He was funded as a +Google Summer of Code Student 2010 and implemented parts of the Polly frontends +as well as the automake/cmake infrastructure.</p> + +<h2> Universities</h2> + +<p>Polly is supported by the following Universities.</p> +<img src="images/iit-madras.png" style="padding:1em" /> +<img src="images/uni-passau.png" style="padding: 1em; padding-bottom:2em;"/> +<img src="images/osu.png" style="padding:1em"/> +<img src="images/sys-uni.png" style="padding:1em"/> +</body> +</html> diff --git a/polly/www/examples.html b/polly/www/examples.html new file mode 100644 index 00000000000..706198b9a8a --- /dev/null +++ b/polly/www/examples.html @@ -0,0 +1,317 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"  +          "http://www.w3.org/TR/html4/strict.dtd"> +<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ --> +<html> +<head> +  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> +  <title>Polly - Examples</title> +  <link type="text/css" rel="stylesheet" href="menu.css"> +  <link type="text/css" rel="stylesheet" href="content.css"> +</head> +<body> +<!--#include virtual="menu.html.incl"--> +<div id="content"> +  
<!--*********************************************************************--> +  <h1>Polly: Examples</h1> +  <!--*********************************************************************--> +<!--=====================================================================--> +<h2>Optimize Matrix Multiplication Manually</h2> +<!--=====================================================================--> + +<p>Polly does not yet focus on end user, but on research and the development of +new optimizations. Hence for the users of Polly it is often necessary to +understand how Polly works internally. To get an overview of the different steps +taken during polyhedral compilation, we give a step by step example on how to +use the different Polly passes. For this we optimize a simple matrix +multiplication kernel. In case you look for a more automated way of executing +Polly, check out the pollycc tool in utils/pollycc.</p> + +The files used and created in this example are available <a +href="experiments/matmul">here</a>. + +<ol> +<li><h4>Create LLVM-IR from the C code</h4> + +Polly works on LLVM-IR. Hence it is necessary to translate the source files into +LLVM-IR. If more than on file should be optimized the files can be combined into +a single file with llvm-link. + +<pre class="code">clang -S -emit-llvm matmul.c -o matmul.s</pre> +</li> + + +<li><h4>Load Polly automatically when calling the 'opt' tool</h4> + +Polly is not built into opt or bugpoint, but it is a shared library that needs +to be loaded into these tools explicitally. The Polly library is called +LVMPolly.so. For a cmake build it is available in the build/lib/ directory, +autoconf creates the same file in +build/tools/polly/{Release+Asserts|Asserts|Debug}/lib. For convenience we create +an alias that automatically loads Polly if 'opt' is called. 
+<pre class="code"> +export PATH_TO_POLLY_LIB="~/polly/build/lib/" +alias opt="opt -load ${PATH_TO_POLLY_LIB}/LLVMPolly.so"</pre> +</li> + +<li><h4>Prepare the LLVM-IR for Polly</h4> + +Polly is only able to work with code that matches a canonical form. To translate +the LLVM-IR into this form we use a set of canonicalication passes. For this +example only three passes are necessary. To get good coverage on a larger set +of input files a larger set is needed. pollycc contains a set of passes that has +shown to be beneficial. +<pre class="code">opt -S -mem2reg -loop-simplify -indvars matmul.s > matmul.preopt.ll</pre></li> + +<li><h4>Show the SCoPs detected by Polly (optional)</h4> + +To understand if Polly was able to detect some SCoPs, we print the +structure of the detected SCoPs. In our example two SCoPs were detected. One in +'init_array' the other in 'main'. + +<pre class="code">opt -basicaa -polly-cloog -analyze -q matmul.preopt.ll</pre> + +<pre> +init_array(): +for (c2=0;c2<=1023;c2++) { +  for (c4=0;c4<=1023;c4++) { +    Stmt_5(c2,c4); +  } +} + +main(): +for (c2=0;c2<=1023;c2++) { +  for (c4=0;c4<=1023;c4++) { +    Stmt_4(c2,c4); +    for (c6=0;c6<=1023;c6++) { +      Stmt_6(c2,c4,c6); +    } +  } +} +</pre> +</li> +<li><h4>Highlight the detected SCoPs in the CFGs of the program (requires graphviz/dotty)</h4> + +Polly can use graphviz to graphically show a CFG in which the detected SCoPs are +highlighted. It can also create '.dot' files that can be translated by +the 'dot' utility into various graphic formats. 
+ +<pre class="code">opt -basicaa -view-scops -disable-output matmul.preopt.ll +opt -basicaa -view-scops-only -disable-output matmul.preopt.ll</pre> +The output for the different functions<br /> +view-scops: +<a href="experiments/matmul/scops.main.dot.png">main</a>, +<a href="experiments/matmul/scops.init_array.dot.png">init_array</a>, +<a href="experiments/matmul/scops.print_array.dot.png">print_array</a><br /> +view-scops-only: +<a href="experiments/matmul/scopsonly.main.dot.png">main</a>, +<a href="experiments/matmul/scopsonly.init_array.dot.png">init_array</a>, +<a href="experiments/matmul/scopsonly.print_array.dot.png">print_array</a> +</li> + +<li><h4>View the polyhedral representation of the SCoPs</h4> +<pre class="code">opt -basicaa -polly-scops -analyze matmul.preopt.ll</pre> +<pre> +[...] +Printing analysis 'Polly - Create polyhedral description of Scops' for region: '%1 => %17' in function 'init_array': +   Context: +   { [] } +   Statements { +   	Stmt_5 +           Domain := +               { Stmt_5[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 }; +           Scattering := +               { Stmt_5[i0, i1] -> scattering[0, i0, 0, i1, 0] }; +           WriteAccess :=  +               { Stmt_5[i0, i1] -> MemRef_A[1037i0 + i1] }; +           WriteAccess :=  +               { Stmt_5[i0, i1] -> MemRef_B[1047i0 + i1] }; +   	FinalRead +           Domain := +               { FinalRead[0] }; +           Scattering := +               { FinalRead[i0] -> scattering[200000000, o1, o2, o3, o4] }; +           ReadAccess :=  +               { FinalRead[i0] -> MemRef_A[o0] }; +           ReadAccess :=  +               { FinalRead[i0] -> MemRef_B[o0] }; +   } +Printing analysis 'Polly - Create polyhedral description of Scops' for region: '%0 => <Function  Return>' in function 'init_array': +[...] 
+Printing analysis 'Polly - Create polyhedral description of Scops' for region: '%1 => %17' in function 'main': +   Context: +   { [] } +   Statements { +   	Stmt_4 +           Domain := +               { Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 }; +           Scattering := +               { Stmt_4[i0, i1] -> scattering[0, i0, 0, i1, 0, 0, 0] }; +           WriteAccess :=  +               { Stmt_4[i0, i1] -> MemRef_C[1067i0 + i1] }; +   	Stmt_6 +           Domain := +               { Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023 }; +           Scattering := +               { Stmt_6[i0, i1, i2] -> scattering[0, i0, 0, i1, 1, i2, 0] }; +           ReadAccess :=  +               { Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] }; +           ReadAccess :=  +               { Stmt_6[i0, i1, i2] -> MemRef_A[1037i0 + i2] }; +           ReadAccess :=  +               { Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1047i2] }; +           WriteAccess :=  +               { Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] }; +   	FinalRead +           Domain := +               { FinalRead[0] }; +           Scattering := +               { FinalRead[i0] -> scattering[200000000, o1, o2, o3, o4, o5, o6] }; +           ReadAccess :=  +               { FinalRead[i0] -> MemRef_C[o0] }; +           ReadAccess :=  +               { FinalRead[i0] -> MemRef_A[o0] }; +           ReadAccess :=  +               { FinalRead[i0] -> MemRef_B[o0] }; +   } +Printing analysis 'Polly - Create polyhedral description of Scops' for region: '%0 => <Function  Return>' in function 'main': +Invalid Scop! 
+</pre> +</li> + +<li><h4>Show the dependences for the SCoPs</h4> +<pre class="code">opt -basicaa -polly-dependences -analyze matmul.preopt.ll</pre> +<pre>Printing analysis 'Polly - Calculate dependences for SCoP' for region: 'for.cond => for.end28' in function 'init_array': +   Must dependences: +       {  } +   May dependences: +       {  } +   Must no source: +       {  } +   May no source: +       {  } +Printing analysis 'Polly - Calculate dependences for SCoP' for region: 'for.cond => for.end48' in function 'main': +   Must dependences: +       {  Stmt_4[i0, i1] -> Stmt_6[i0, i1, 0] : +              i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023; +          Stmt_6[i0, i1, i2] -> Stmt_6[i0, i1, 1 + i2] : +              i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1022;  +          Stmt_6[i0, i1, 1023] -> FinalRead[0] : +              i1 <= 1091540 - 1067i0 and i1 >= -1067i0 and i1 >= 0 and i1 <= 1023; +          Stmt_6[1023, i1, 1023] -> FinalRead[0] : +              i1 >= 0 and i1 <= 1023  +       } +   May dependences: +       {  } +   Must no source: +       {  Stmt_6[i0, i1, i2] -> MemRef_A[1037i0 + i2] : +              i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023;  +          Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1047i2] : +              i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023;  +          FinalRead[0] -> MemRef_A[o0]; +          FinalRead[0] -> MemRef_B[o0] +          FinalRead[0] -> MemRef_C[o0] : +              o0 >= 1092565 or (exists (e0 = [(o0)/1067]: o0 <= 1091540 and o0 >= 0 +              and 1067e0 <= -1024 + o0 and 1067e0 >= -1066 + o0)) or o0 <= -1; +       } +   May no source: +       {  } +</pre></li> + +<li><h4>Export jscop files</h4> + +Polly can export the polyhedral representation in so called jscop files. Jscop +files contain the polyhedral representation stored in a JSON file. 
+<pre class="code">opt -basicaa -polly-export-jscop matmul.preopt.ll</pre> +<pre>Writing SCoP 'for.cond => for.end28' in function 'init_array' to './init_array___%for.cond---%for.end28.jscop'. +Writing SCoP 'for.cond => for.end48' in function 'main' to './main___%for.cond---%for.end48.jscop'. +</pre></li> + +<li><h4>Import the changed jscop files and print the updated SCoP structure +(optional)</h4> +<p>Polly can import jscop files, where the schedules of the statements were +changed. With the help of these updated files we can import transformations into +Polly. It is possible to import different jscop files by providing the postfix +of the jscop file that is imported.</p> +<p> The optimized jscop files for this example are hand written. The schedule +used was inspired by looking at the optimizations PoCC performs. If PoCC is +installed Polly can often calculate such schedules fully automatically.</p> + + +<pre class="code">opt -basicaa -polly-import-jscop -polly-print -disable-output matmul.preopt.ll -polly-import-jscop-postfix=.opt</pre> +<pre>Cannot open file: ./init_array___%for.cond---%for.end28.jscop.opt +Skipping import. +In function: 'init_array' SCoP: for.cond => for.end28: +for (c2=0;c2<=1023;c2++) { +  for (c4=0;c4<=1023;c4++) { +    %for.body4(c2,c4); +  } +} +Reading SCoP 'for.cond => for.end48' in function 'main' from './main___%for.cond---%for.end48.scop.opt.opt'. +In function: 'main' SCoP: for.cond => for.end48: +for (c2=0;c2<=1023;c2++) { +  for (c4=0;c4<=1023;c4++) { +    %for.body4(c2,c4); +  } +} +for (c2=0;c2<=1023;c2++) { +  for (c3=0;c3<=1023;c3++) { +    for (c4=0;c4<=1023;c4++) { +      %for.body12(c2,c4,c3); +    } +  } +} +</pre></li> + +<li><h4>Codegenerate the SCoPs</h4> +This generates new code for the SCoPs detected by polly. +If -polly-import is present, transformations specified in the imported openscop +files will be applied. 
+<pre class="code">opt -basicaa -polly-import -polly-import-postfix=.opt -polly-codegen matmul.preopt.ll | opt -O3 > matmul.pollyopt.ll</pre> +<pre> +Cannot open file: ./init_array___%for.cond---%for.end28.scop.opt +Skipping import. +Reading SCoP 'for.cond => for.end48' in function 'main' from './main___%for.cond---%for.end48.scop.opt'.</pre> + +<pre class="code">opt matmul.preopt.ll | opt -O3 > matmul.normalopt.ll</pre></li> + +<li><h4>Create the executables</h4> + +Create one executable optimized with plain -O3 as well as a set of executables +optimized in different ways with Polly. One changes only the loop structure, the +other adds tiling, the next adds vectorization and finally we use OpenMP +parallelism. +<pre class="code"> +llc matmul.normalopt.ll -o matmul.normalopt.s && \ +    gcc matmul.normalopt.s -o matmul.normalopt.exe +llc matmul.polly.interchanged.ll -o matmul.polly.interchanged.s && \ +    gcc matmul.polly.interchanged.s -o matmul.polly.interchanged.exe +llc matmul.polly.interchanged+tiled.ll -o matmul.polly.interchanged+tiled.s && \ +    gcc matmul.polly.interchanged+tiled.s -o matmul.polly.interchanged+tiled.exe +llc matmul.polly.interchanged+tiled+vector.ll -o matmul.polly.interchanged+tiled+vector.s && \ +    gcc matmul.polly.interchanged+tiled+vector.s -o matmul.polly.interchanged+tiled+vector.exe +llc matmul.polly.interchanged+tiled+vector+openmp.ll -o matmul.polly.interchanged+tiled+vector+openmp.s && \ +    gcc -lgomp matmul.polly.interchanged+tiled+vector+openmp.s -o matmul.polly.interchanged+tiled+vector+openmp.exe +    </pre> + +<li><h4>Compare the runtime of the executables</h4> + +By comparing the runtimes of the different code snippets we see that a simple +loop interchange gives here the largest performance boost. However by adding +vectorization and by using OpenMP we can further improve the performance +significantly. 
+<pre class="code">time ./matmul.normalopt.exe</pre> +<pre>42.68 real, 42.55 user, 0.00 sys</pre> +<pre class="code">time ./matmul.polly.interchanged.exe</pre> +<pre>04.33 real, 4.30 user, 0.01 sys</pre> +<pre class="code">time ./matmul.polly.interchanged+tiled.exe</pre> +<pre>04.11 real, 4.10 user, 0.00 sys</pre> +<pre class="code">time ./matmul.polly.interchanged+tiled+vector.exe</pre> +<pre>01.39 real, 1.36 user, 0.01 sys</pre> +<pre class="code">time ./matmul.polly.interchanged+tiled+vector+openmp.exe</pre> +<pre>00.66 real, 2.58 user, 0.02 sys</pre> +</li> +</ol> + +</div> +</body> +</html> diff --git a/polly/www/experiments/matmul/init_array___%1---%19.jscop b/polly/www/experiments/matmul/init_array___%1---%19.jscop new file mode 100644 index 00000000000..c7f9bb8c87a --- /dev/null +++ b/polly/www/experiments/matmul/init_array___%1---%19.jscop @@ -0,0 +1,21 @@ +{ +   "context" : "{ [] }", +   "name" : "%1 => %19", +   "statements" : [ +      { +         "accesses" : [ +            { +               "kind" : "write", +               "relation" : "{ Stmt_5[i0, i1] -> MemRef_A[1536i0 + i1] }" +            }, +            { +               "kind" : "write", +               "relation" : "{ Stmt_5[i0, i1] -> MemRef_B[1536i0 + i1] }" +            } +         ], +         "domain" : "{ Stmt_5[i0, i1] : i0 >= 0 and i0 <= 1535 and i1 >= 0 and i1 <= 1535 }", +         "name" : "Stmt_5", +         "schedule" : "{ Stmt_5[i0, i1] -> scattering[0, i0, 0, i1, 0] }" +      } +   ] +} diff --git a/polly/www/experiments/matmul/main___%1---%17.jscop b/polly/www/experiments/matmul/main___%1---%17.jscop new file mode 100644 index 00000000000..a3692e52829 --- /dev/null +++ b/polly/www/experiments/matmul/main___%1---%17.jscop @@ -0,0 +1,40 @@ +{ +   "context" : "{ [] }", +   "name" : "%1 => %17", +   "statements" : [ +      { +         "accesses" : [ +            { +               "kind" : "write", +               "relation" : "{ Stmt_4[i0, i1] -> MemRef_C[1536i0 + i1] }" +           
 } +         ], +         "domain" : "{ Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1535 and i1 >= 0 and i1 <= 1535 }", +         "name" : "Stmt_4", +         "schedule" : "{ Stmt_4[i0, i1] -> scattering[0, i0, 0, i1, 0, 0, 0] }" +      }, +      { +         "accesses" : [ +            { +               "kind" : "read", +               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }" +            }, +            { +               "kind" : "read", +               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_A[1536i0 + i2] }" +            }, +            { +               "kind" : "read", +               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1536i2] }" +            }, +            { +               "kind" : "write", +               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }" +            } +         ], +         "domain" : "{ Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1535 and i1 >= 0 and i1 <= 1535 and i2 >= 0 and i2 <= 1535 }", +         "name" : "Stmt_6", +         "schedule" : "{ Stmt_6[i0, i1, i2] -> scattering[0, i0, 0, i1, 1, i2, 0] }" +      } +   ] +} diff --git a/polly/www/experiments/matmul/main___%1---%17.jscop.interchanged b/polly/www/experiments/matmul/main___%1---%17.jscop.interchanged new file mode 100644 index 00000000000..d992fe949aa --- /dev/null +++ b/polly/www/experiments/matmul/main___%1---%17.jscop.interchanged @@ -0,0 +1,40 @@ +{ +   "context" : "{ [] }", +   "name" : "%1 => %17", +   "statements" : [ +      { +         "accesses" : [ +            { +               "kind" : "write", +               "relation" : "{ Stmt_4[i0, i1] -> MemRef_C[1067i0 + i1] }" +            } +         ], +         "domain" : "{ Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 }", +         "name" : "Stmt_4", +         "schedule" : "{ Stmt_4[i0, i1] -> scattering[0, i0, 0, i1, 0, 0, 0] }" +      }, +      { +         "accesses" : [ +            { +               "kind" : "read", +               "relation" : "{ 
Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] }" +            }, +            { +               "kind" : "read", +               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_A[1037i0 + i2] }" +            }, +            { +               "kind" : "read", +               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1047i2] }" +            }, +            { +               "kind" : "write", +               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] }" +            } +         ], +         "domain" : "{ Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023 }", +         "name" : "Stmt_6", +         "schedule" : "{ Stmt_6[i0, i1, i2] -> scattering[1, i0, 0, i2, 0, i1, 0] }" +      } +   ] +} diff --git a/polly/www/experiments/matmul/main___%1---%17.jscop.interchanged+tiled b/polly/www/experiments/matmul/main___%1---%17.jscop.interchanged+tiled new file mode 100644 index 00000000000..29fcca55f3f --- /dev/null +++ b/polly/www/experiments/matmul/main___%1---%17.jscop.interchanged+tiled @@ -0,0 +1,40 @@ +{ +   "context" : "{ [] }", +   "name" : "%1 => %17", +   "statements" : [ +      { +         "accesses" : [ +            { +               "kind" : "write", +               "relation" : "{ Stmt_4[i0, i1] -> MemRef_C[1067i0 + i1] }" +            } +         ], +         "domain" : "{ Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 }", +         "name" : "Stmt_4", +         "schedule" : "{ Stmt_4[i0, i1] -> scattering[0, i0, 0, i1, 0, 0, 0] }" +      }, +      { +         "accesses" : [ +            { +               "kind" : "read", +               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] }" +            }, +            { +               "kind" : "read", +               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_A[1037i0 + i2] }" +            }, +            { +               "kind" : "read", +               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1047i2] 
}" +            }, +            { +               "kind" : "write", +               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] }" +            } +         ], +         "domain" : "{ Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023 }", +         "name" : "Stmt_6", +         "schedule" : "{ Stmt_6[i0, i1, i2] -> scattering[1, o0, o1, o2, i0, i2, i1]: o0 <= i0 < o0 + 64 and o1 <= i1 < o1 + 64 and o2 <= i2 < o2 + 64 and o0 % 64 = 0 and o1 % 64 = 0 and o2 % 64 = 0 }" +      } +   ] +} diff --git a/polly/www/experiments/matmul/main___%1---%17.jscop.interchanged+tiled+vector b/polly/www/experiments/matmul/main___%1---%17.jscop.interchanged+tiled+vector new file mode 100644 index 00000000000..62a444d0b8a --- /dev/null +++ b/polly/www/experiments/matmul/main___%1---%17.jscop.interchanged+tiled+vector @@ -0,0 +1,40 @@ +{ +   "context" : "{ [] }", +   "name" : "%1 => %17", +   "statements" : [ +      { +         "accesses" : [ +            { +               "kind" : "write", +               "relation" : "{ Stmt_4[i0, i1] -> MemRef_C[1067i0 + i1] }" +            } +         ], +         "domain" : "{ Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 }", +         "name" : "Stmt_4", +         "schedule" : "{ Stmt_4[i0, i1] -> scattering[0, i0, 0, i1, 0, 0, 0, 0] }" +      }, +      { +         "accesses" : [ +            { +               "kind" : "read", +               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] }" +            }, +            { +               "kind" : "read", +               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_A[1037i0 + i2] }" +            }, +            { +               "kind" : "read", +               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1047i2] }" +            }, +            { +               "kind" : "write", +               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] }" +            } +         ], +         
"domain" : "{ Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023 }", +         "name" : "Stmt_6", +         "schedule" : "{ Stmt_6[i0, i1, i2] -> scattering[1, o0, o1, o2, i0, i2, ii1, i1]: o0 <= i0 < o0 + 64 and o1 <= i1 < o1 + 64 and o2 <= i2 < o2 + 64 and o0 % 64 = 0 and o1 % 64 = 0 and o2 % 64 = 0 and ii1 % 4 = 0 and ii1 <= i1 < ii1 + 4}" +      } +   ] +} diff --git a/polly/www/experiments/matmul/matmul.c b/polly/www/experiments/matmul/matmul.c new file mode 100644 index 00000000000..edb2455ae8f --- /dev/null +++ b/polly/www/experiments/matmul/matmul.c @@ -0,0 +1,52 @@ +#include <stdio.h> + +#define N 1536 +float A[N][N]; +float B[N][N]; +float C[N][N]; + +void init_array() +{ +    int i, j; + +    for (i=0; i<N; i++) { +        for (j=0; j<N; j++) { +            A[i][j] = (1+(i*j)%1024)/2.0; +            B[i][j] = (1+(i*j)%1024)/2.0; +        } +    } +} + +void print_array() +{ +    int i, j; + +    for (i=0; i<N; i++) { +        for (j=0; j<N; j++) { +            fprintf(stdout, "%lf ", C[i][j]); +            if (j%80 == 79) fprintf(stdout, "\n"); +        } +        fprintf(stdout, "\n"); +    } +} + +int main() +{ +    int i, j, k; +    double t_start, t_end; + +    init_array(); + +    for(i=0; i<N; i++)  { +        for(j=0; j<N; j++)  { +            C[i][j] = 0; +            for(k=0; k<N; k++) +                C[i][j] = C[i][j] + A[i][k] * B[k][j]; +        } +    } + +#ifdef TEST +    print_array(); +#endif +    return 0; +} diff --git a/polly/www/experiments/matmul/matmul.normalopt.exe b/polly/www/experiments/matmul/matmul.normalopt.exeBinary files differ new file mode 100755 index 00000000000..73b94752d8e --- /dev/null +++ b/polly/www/experiments/matmul/matmul.normalopt.exe diff --git a/polly/www/experiments/matmul/matmul.normalopt.ll b/polly/www/experiments/matmul/matmul.normalopt.llBinary files differ new file mode 100644 index 00000000000..182ed9aa221 --- /dev/null +++ 
b/polly/www/experiments/matmul/matmul.normalopt.ll diff --git a/polly/www/experiments/matmul/matmul.normalopt.s b/polly/www/experiments/matmul/matmul.normalopt.s new file mode 100644 index 00000000000..f10f6441182 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.normalopt.s @@ -0,0 +1,203 @@ +	.file	"matmul.normalopt.ll" +	.section	.rodata.cst8,"aM",@progbits,8 +	.align	8 +.LCPI0_0: +	.quad	4602678819172646912     # double 5.000000e-01 +	.text +	.globl	init_array +	.align	16, 0x90 +	.type	init_array,@function +init_array:                             # @init_array +# BB#0: +	xorl	%eax, %eax +	movsd	.LCPI0_0(%rip), %xmm0 +	movq	%rax, %rcx +	.align	16, 0x90 +.LBB0_1:                                # %.preheader +                                        # =>This Loop Header: Depth=1 +                                        #     Child Loop BB0_2 Depth 2 +	movq	$-1536, %rdx            # imm = 0xFFFFFFFFFFFFFA00 +	xorl	%esi, %esi +	.align	16, 0x90 +.LBB0_2:                                #   Parent Loop BB0_1 Depth=1 +                                        # =>  This Inner Loop Header: Depth=2 +	movl	%esi, %edi +	sarl	$31, %edi +	shrl	$22, %edi +	addl	%esi, %edi +	andl	$-1024, %edi            # imm = 0xFFFFFFFFFFFFFC00 +	negl	%edi +	leal	1(%rsi,%rdi), %edi +	cvtsi2sd	%edi, %xmm1 +	mulsd	%xmm0, %xmm1 +	cvtsd2ss	%xmm1, %xmm1 +	movss	%xmm1, A+6144(%rax,%rdx,4) +	movss	%xmm1, B+6144(%rax,%rdx,4) +	addl	%ecx, %esi +	incq	%rdx +	jne	.LBB0_2 +# BB#3:                                 #   in Loop: Header=BB0_1 Depth=1 +	addq	$6144, %rax             # imm = 0x1800 +	incq	%rcx +	cmpq	$1536, %rcx             # imm = 0x600 +	jne	.LBB0_1 +# BB#4: +	ret +.Ltmp0: +	.size	init_array, .Ltmp0-init_array + +	.globl	print_array +	.align	16, 0x90 +	.type	print_array,@function +print_array:                            # @print_array +# BB#0: +	pushq	%r14 +	pushq	%rbx +	pushq	%rax +	movq	$-9437184, %rbx         # imm = 0xFFFFFFFFFF700000 +	.align	16, 0x90 +.LBB1_1:                          
      # %.preheader +                                        # =>This Loop Header: Depth=1 +                                        #     Child Loop BB1_2 Depth 2 +	xorl	%r14d, %r14d +	movq	stdout(%rip), %rdi +	.align	16, 0x90 +.LBB1_2:                                #   Parent Loop BB1_1 Depth=1 +                                        # =>  This Inner Loop Header: Depth=2 +	movss	C+9437184(%rbx,%r14,4), %xmm0 +	cvtss2sd	%xmm0, %xmm0 +	movl	$.L.str, %esi +	movb	$1, %al +	callq	fprintf +	movslq	%r14d, %rax +	imulq	$1717986919, %rax, %rcx # imm = 0x66666667 +	movq	%rcx, %rdx +	shrq	$63, %rdx +	sarq	$37, %rcx +	addl	%edx, %ecx +	imull	$80, %ecx, %ecx +	subl	%ecx, %eax +	cmpl	$79, %eax +	jne	.LBB1_4 +# BB#3:                                 #   in Loop: Header=BB1_2 Depth=2 +	movq	stdout(%rip), %rsi +	movl	$10, %edi +	callq	fputc +.LBB1_4:                                #   in Loop: Header=BB1_2 Depth=2 +	incq	%r14 +	movq	stdout(%rip), %rsi +	cmpq	$1536, %r14             # imm = 0x600 +	movq	%rsi, %rdi +	jne	.LBB1_2 +# BB#5:                                 #   in Loop: Header=BB1_1 Depth=1 +	movl	$10, %edi +	callq	fputc +	addq	$6144, %rbx             # imm = 0x1800 +	jne	.LBB1_1 +# BB#6: +	addq	$8, %rsp +	popq	%rbx +	popq	%r14 +	ret +.Ltmp1: +	.size	print_array, .Ltmp1-print_array + +	.section	.rodata.cst8,"aM",@progbits,8 +	.align	8 +.LCPI2_0: +	.quad	4602678819172646912     # double 5.000000e-01 +	.text +	.globl	main +	.align	16, 0x90 +	.type	main,@function +main:                                   # @main +# BB#0: +	xorl	%eax, %eax +	movsd	.LCPI2_0(%rip), %xmm0 +	movq	%rax, %rcx +	.align	16, 0x90 +.LBB2_1:                                # %.preheader.i +                                        # =>This Loop Header: Depth=1 +                                        #     Child Loop BB2_2 Depth 2 +	movq	$-1536, %rdx            # imm = 0xFFFFFFFFFFFFFA00 +	xorl	%esi, %esi +	.align	16, 0x90 +.LBB2_2:                                #   Parent Loop BB2_1 Depth=1 +            
                            # =>  This Inner Loop Header: Depth=2 +	movl	%esi, %edi +	sarl	$31, %edi +	shrl	$22, %edi +	addl	%esi, %edi +	andl	$-1024, %edi            # imm = 0xFFFFFFFFFFFFFC00 +	negl	%edi +	leal	1(%rsi,%rdi), %edi +	cvtsi2sd	%edi, %xmm1 +	mulsd	%xmm0, %xmm1 +	cvtsd2ss	%xmm1, %xmm1 +	movss	%xmm1, A+6144(%rax,%rdx,4) +	movss	%xmm1, B+6144(%rax,%rdx,4) +	addl	%ecx, %esi +	incq	%rdx +	jne	.LBB2_2 +# BB#3:                                 #   in Loop: Header=BB2_1 Depth=1 +	addq	$6144, %rax             # imm = 0x1800 +	incq	%rcx +	xorl	%edx, %edx +	cmpq	$1536, %rcx             # imm = 0x600 +	jne	.LBB2_1 +	.align	16, 0x90 +.LBB2_4:                                # %.preheader +                                        # =>This Loop Header: Depth=1 +                                        #     Child Loop BB2_5 Depth 2 +                                        #       Child Loop BB2_6 Depth 3 +	xorl	%eax, %eax +	xorl	%ecx, %ecx +	.align	16, 0x90 +.LBB2_5:                                #   Parent Loop BB2_4 Depth=1 +                                        # =>  This Loop Header: Depth=2 +                                        #       Child Loop BB2_6 Depth 3 +	movl	$0, C(%rcx,%rdx) +	leaq	B(%rcx), %rsi +	pxor	%xmm0, %xmm0 +	movq	%rax, %rdi +	.align	16, 0x90 +.LBB2_6:                                #   Parent Loop BB2_4 Depth=1 +                                        #     Parent Loop BB2_5 Depth=2 +                                        # =>    This Inner Loop Header: Depth=3 +	movss	A(%rdx,%rdi,4), %xmm1 +	mulss	(%rsi), %xmm1 +	addss	%xmm1, %xmm0 +	addq	$6144, %rsi             # imm = 0x1800 +	incq	%rdi +	cmpq	$1536, %rdi             # imm = 0x600 +	jne	.LBB2_6 +# BB#7:                                 #   in Loop: Header=BB2_5 Depth=2 +	movss	%xmm0, C(%rcx,%rdx) +	addq	$4, %rcx +	cmpq	$6144, %rcx             # imm = 0x1800 +	jne	.LBB2_5 +# BB#8:                                 # %init_array.exit +                                        #   in Loop: 
Header=BB2_4 Depth=1 +	addq	$6144, %rdx             # imm = 0x1800 +	cmpq	$9437184, %rdx          # imm = 0x900000 +	jne	.LBB2_4 +# BB#9: +	xorl	%eax, %eax +	ret +.Ltmp2: +	.size	main, .Ltmp2-main + +	.type	A,@object               # @A +	.comm	A,9437184,16 +	.type	B,@object               # @B +	.comm	B,9437184,16 +	.type	.L.str,@object          # @.str +	.section	.rodata.str1.1,"aMS",@progbits,1 +.L.str: +	.asciz	 "%lf " +	.size	.L.str, 5 + +	.type	C,@object               # @C +	.comm	C,9437184,16 + +	.section	".note.GNU-stack","",@progbits diff --git a/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector+openmp.exe b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector+openmp.exe Binary files differ new file mode 100755 index 00000000000..7a2e6de6138 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector+openmp.exe diff --git a/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector+openmp.ll b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector+openmp.ll Binary files differ new file mode 100644 index 00000000000..710f706f68e --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector+openmp.ll diff --git a/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector+openmp.s b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector+openmp.s new file mode 100644 index 00000000000..04dc0656c06 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector+openmp.s @@ -0,0 +1,628 @@ +	.file	"matmul.polly.interchanged+tiled+vector+openmp.ll" +	.text +	.globl	init_array +	.align	16, 0x90 +	.type	init_array,@function +init_array:                             # @init_array +# BB#0:                                 # %pollyBB +	pushq	%rbx +	subq	$16, %rsp +	movq	$A, (%rsp) +	movq	$B, 8(%rsp) +	movl	$init_array.omp_subfn, %edi +	leaq	(%rsp), %rbx +	xorl	%edx, %edx +	xorl	%ecx, %ecx +	movl	$1536, %r8d             # 
imm = 0x600 +	movl	$1, %r9d +	movq	%rbx, %rsi +	callq	GOMP_parallel_loop_runtime_start +	movq	%rbx, %rdi +	callq	init_array.omp_subfn +	callq	GOMP_parallel_end +	addq	$16, %rsp +	popq	%rbx +	ret +.Ltmp0: +	.size	init_array, .Ltmp0-init_array + +	.globl	print_array +	.align	16, 0x90 +	.type	print_array,@function +print_array:                            # @print_array +# BB#0: +	pushq	%r14 +	pushq	%rbx +	pushq	%rax +	movq	$-9437184, %rbx         # imm = 0xFFFFFFFFFF700000 +	.align	16, 0x90 +.LBB1_1:                                # %.preheader +                                        # =>This Loop Header: Depth=1 +                                        #     Child Loop BB1_2 Depth 2 +	xorl	%r14d, %r14d +	movq	stdout(%rip), %rdi +	.align	16, 0x90 +.LBB1_2:                                #   Parent Loop BB1_1 Depth=1 +                                        # =>  This Inner Loop Header: Depth=2 +	movss	C+9437184(%rbx,%r14,4), %xmm0 +	cvtss2sd	%xmm0, %xmm0 +	movl	$.L.str, %esi +	movb	$1, %al +	callq	fprintf +	movslq	%r14d, %rax +	imulq	$1717986919, %rax, %rcx # imm = 0x66666667 +	movq	%rcx, %rdx +	shrq	$63, %rdx +	sarq	$37, %rcx +	addl	%edx, %ecx +	imull	$80, %ecx, %ecx +	subl	%ecx, %eax +	cmpl	$79, %eax +	jne	.LBB1_4 +# BB#3:                                 #   in Loop: Header=BB1_2 Depth=2 +	movq	stdout(%rip), %rsi +	movl	$10, %edi +	callq	fputc +.LBB1_4:                                #   in Loop: Header=BB1_2 Depth=2 +	incq	%r14 +	movq	stdout(%rip), %rsi +	cmpq	$1536, %r14             # imm = 0x600 +	movq	%rsi, %rdi +	jne	.LBB1_2 +# BB#5:                                 #   in Loop: Header=BB1_1 Depth=1 +	movl	$10, %edi +	callq	fputc +	addq	$6144, %rbx             # imm = 0x1800 +	jne	.LBB1_1 +# BB#6: +	addq	$8, %rsp +	popq	%rbx +	popq	%r14 +	ret +.Ltmp1: +	.size	print_array, .Ltmp1-print_array + +	.globl	main +	.align	16, 0x90 +	.type	main,@function +main:                                   # @main +# BB#0:                                 # %pollyBB +	pushq	%rbp +	
movq	%rsp, %rbp +	pushq	%r15 +	pushq	%r14 +	pushq	%r13 +	pushq	%r12 +	pushq	%rbx +	subq	$56, %rsp +	movq	$A, -72(%rbp) +	movq	$B, -64(%rbp) +	movl	$init_array.omp_subfn, %edi +	leaq	-72(%rbp), %rbx +	movq	%rbx, %rsi +	xorl	%edx, %edx +	xorl	%ecx, %ecx +	movl	$1536, %r8d             # imm = 0x600 +	movl	$1, %r9d +	callq	GOMP_parallel_loop_runtime_start +	movq	%rbx, %rdi +	callq	init_array.omp_subfn +	callq	GOMP_parallel_end +	movl	$main.omp_subfn, %edi +	leaq	-96(%rbp), %rsi +	movq	$C, -96(%rbp) +	movq	$A, -88(%rbp) +	movq	$B, -80(%rbp) +	xorl	%edx, %edx +	xorl	%ecx, %ecx +	movl	$1536, %r8d             # imm = 0x600 +	movl	$1, %r9d +	callq	GOMP_parallel_loop_runtime_start +	leaq	-48(%rbp), %rdi +	leaq	-56(%rbp), %rsi +	callq	GOMP_loop_runtime_next +	testb	$1, %al +	je	.LBB2_6 +# BB#1: +	leaq	-48(%rbp), %rbx +	leaq	-56(%rbp), %r14 +	.align	16, 0x90 +.LBB2_3:                                # %omp.loadIVBounds.i +                                        # =>This Loop Header: Depth=1 +                                        #     Child Loop BB2_5 Depth 2 +	movq	-56(%rbp), %r15 +	decq	%r15 +	movq	-48(%rbp), %r12 +	cmpq	%r15, %r12 +	jg	.LBB2_2 +# BB#4:                                 # %polly.loop_header2.preheader.lr.ph.i +                                        #   in Loop: Header=BB2_3 Depth=1 +	leaq	(%r12,%r12,2), %rax +	shlq	$11, %rax +	leaq	C(%rax), %r13 +	.align	16, 0x90 +.LBB2_5:                                # %polly.loop_header2.preheader.i +                                        #   Parent Loop BB2_3 Depth=1 +                                        # =>  This Inner Loop Header: Depth=2 +	movq	%r13, %rdi +	xorl	%esi, %esi +	movl	$6144, %edx             # imm = 0x1800 +	callq	memset +	addq	$6144, %r13             # imm = 0x1800 +	incq	%r12 +	cmpq	%r15, %r12 +	jle	.LBB2_5 +.LBB2_2:                                # %omp.checkNext.loopexit.i +                                        #   in Loop: Header=BB2_3 Depth=1 +	movq	%rbx, %rdi +	movq	%r14, %rsi +	callq	
GOMP_loop_runtime_next +	testb	$1, %al +	jne	.LBB2_3 +.LBB2_6:                                # %main.omp_subfn.exit +	callq	GOMP_loop_end_nowait +	callq	GOMP_parallel_end +	movq	%rsp, %rax +	leaq	-32(%rax), %rbx +	movl	$main.omp_subfn1, %edi +	xorl	%ecx, %ecx +	movl	$1536, %r8d             # imm = 0x600 +	movl	$64, %r9d +	movq	%rbx, %rsp +	movq	$C, -32(%rax) +	movq	$A, -24(%rax) +	movq	$B, -16(%rax) +	movq	%rbx, %rsi +	xorl	%edx, %edx +	callq	GOMP_parallel_loop_runtime_start +	movq	%rbx, %rdi +	callq	main.omp_subfn1 +	callq	GOMP_parallel_end +	xorl	%eax, %eax +	leaq	-40(%rbp), %rsp +	popq	%rbx +	popq	%r12 +	popq	%r13 +	popq	%r14 +	popq	%r15 +	popq	%rbp +	ret +.Ltmp2: +	.size	main, .Ltmp2-main + +	.section	.rodata.cst8,"aM",@progbits,8 +	.align	8 +.LCPI3_0: +	.quad	4602678819172646912     # double 5.000000e-01 +	.text +	.align	16, 0x90 +	.type	init_array.omp_subfn,@function +init_array.omp_subfn:                   # @init_array.omp_subfn +.Leh_func_begin3: +.Ltmp6: +	.cfi_startproc +# BB#0:                                 # %omp.setup +	pushq	%r14 +.Ltmp7: +	.cfi_def_cfa_offset 16 +	pushq	%rbx +.Ltmp8: +	.cfi_def_cfa_offset 24 +	subq	$24, %rsp +.Ltmp9: +	.cfi_def_cfa_offset 48 +.Ltmp10: +	.cfi_offset 3, -24 +.Ltmp11: +	.cfi_offset 14, -16 +	leaq	16(%rsp), %rdi +	leaq	8(%rsp), %rsi +	callq	GOMP_loop_runtime_next +	testb	$1, %al +	je	.LBB3_2 +# BB#1: +	leaq	16(%rsp), %rbx +	leaq	8(%rsp), %r14 +	jmp	.LBB3_4 +.LBB3_2:                                # %omp.exit +	callq	GOMP_loop_end_nowait +	addq	$24, %rsp +	popq	%rbx +	popq	%r14 +	ret +	.align	16, 0x90 +.LBB3_3:                                # %omp.checkNext.loopexit +                                        #   in Loop: Header=BB3_4 Depth=1 +	movq	%rbx, %rdi +	movq	%r14, %rsi +	callq	GOMP_loop_runtime_next +	testb	$1, %al +	je	.LBB3_2 +.LBB3_4:                                # %omp.loadIVBounds +                                        # =>This Loop Header: Depth=1 +                                        #     Child 
Loop BB3_7 Depth 2 +                                        #       Child Loop BB3_8 Depth 3 +	movq	8(%rsp), %rax +	decq	%rax +	movq	16(%rsp), %rcx +	cmpq	%rax, %rcx +	jg	.LBB3_3 +# BB#5:                                 # %polly.loop_header2.preheader.lr.ph +                                        #   in Loop: Header=BB3_4 Depth=1 +	movq	%rcx, %rdx +	shlq	$11, %rdx +	leaq	(%rdx,%rdx,2), %rdx +	jmp	.LBB3_7 +	.align	16, 0x90 +.LBB3_6:                                # %polly.loop_header.loopexit +                                        #   in Loop: Header=BB3_7 Depth=2 +	addq	$6144, %rdx             # imm = 0x1800 +	incq	%rcx +	cmpq	%rax, %rcx +	jg	.LBB3_3 +.LBB3_7:                                # %polly.loop_header2.preheader +                                        #   Parent Loop BB3_4 Depth=1 +                                        # =>  This Loop Header: Depth=2 +                                        #       Child Loop BB3_8 Depth 3 +	movq	$-1536, %rsi            # imm = 0xFFFFFFFFFFFFFA00 +	xorl	%edi, %edi +	.align	16, 0x90 +.LBB3_8:                                # %polly.loop_body3 +                                        #   Parent Loop BB3_4 Depth=1 +                                        #     Parent Loop BB3_7 Depth=2 +                                        # =>    This Inner Loop Header: Depth=3 +	movl	%edi, %r8d +	sarl	$31, %r8d +	shrl	$22, %r8d +	addl	%edi, %r8d +	andl	$-1024, %r8d            # imm = 0xFFFFFFFFFFFFFC00 +	negl	%r8d +	leal	1(%rdi,%r8), %r8d +	cvtsi2sd	%r8d, %xmm0 +	mulsd	.LCPI3_0(%rip), %xmm0 +	cvtsd2ss	%xmm0, %xmm0 +	movss	%xmm0, A+6144(%rdx,%rsi,4) +	movss	%xmm0, B+6144(%rdx,%rsi,4) +	addl	%ecx, %edi +	incq	%rsi +	jne	.LBB3_8 +	jmp	.LBB3_6 +.Ltmp12: +	.size	init_array.omp_subfn, .Ltmp12-init_array.omp_subfn +.Ltmp13: +	.cfi_endproc +.Leh_func_end3: + +	.align	16, 0x90 +	.type	main.omp_subfn,@function +main.omp_subfn:                         # @main.omp_subfn +.Leh_func_begin4: +.Ltmp20: +	.cfi_startproc +# BB#0:                    
             # %omp.setup +	pushq	%r15 +.Ltmp21: +	.cfi_def_cfa_offset 16 +	pushq	%r14 +.Ltmp22: +	.cfi_def_cfa_offset 24 +	pushq	%r13 +.Ltmp23: +	.cfi_def_cfa_offset 32 +	pushq	%r12 +.Ltmp24: +	.cfi_def_cfa_offset 40 +	pushq	%rbx +.Ltmp25: +	.cfi_def_cfa_offset 48 +	subq	$16, %rsp +.Ltmp26: +	.cfi_def_cfa_offset 64 +.Ltmp27: +	.cfi_offset 3, -48 +.Ltmp28: +	.cfi_offset 12, -40 +.Ltmp29: +	.cfi_offset 13, -32 +.Ltmp30: +	.cfi_offset 14, -24 +.Ltmp31: +	.cfi_offset 15, -16 +	leaq	8(%rsp), %rdi +	leaq	(%rsp), %rsi +	callq	GOMP_loop_runtime_next +	testb	$1, %al +	je	.LBB4_2 +# BB#1: +	leaq	8(%rsp), %rbx +	leaq	(%rsp), %r14 +	jmp	.LBB4_4 +.LBB4_2:                                # %omp.exit +	callq	GOMP_loop_end_nowait +	addq	$16, %rsp +	popq	%rbx +	popq	%r12 +	popq	%r13 +	popq	%r14 +	popq	%r15 +	ret +	.align	16, 0x90 +.LBB4_3:                                # %omp.checkNext.loopexit +                                        #   in Loop: Header=BB4_4 Depth=1 +	movq	%rbx, %rdi +	movq	%r14, %rsi +	callq	GOMP_loop_runtime_next +	testb	$1, %al +	je	.LBB4_2 +.LBB4_4:                                # %omp.loadIVBounds +                                        # =>This Loop Header: Depth=1 +                                        #     Child Loop BB4_6 Depth 2 +	movq	(%rsp), %r15 +	decq	%r15 +	movq	8(%rsp), %r12 +	cmpq	%r15, %r12 +	jg	.LBB4_3 +# BB#5:                                 # %polly.loop_header2.preheader.lr.ph +                                        #   in Loop: Header=BB4_4 Depth=1 +	leaq	(%r12,%r12,2), %rax +	shlq	$11, %rax +	leaq	C(%rax), %r13 +	.align	16, 0x90 +.LBB4_6:                                # %polly.loop_header2.preheader +                                        #   Parent Loop BB4_4 Depth=1 +                                        # =>  This Inner Loop Header: Depth=2 +	movq	%r13, %rdi +	xorl	%esi, %esi +	movl	$6144, %edx             # imm = 0x1800 +	callq	memset +	addq	$6144, %r13             # imm = 0x1800 +	incq	%r12 +	cmpq	%r15, %r12 +	jle	.LBB4_6 +	
jmp	.LBB4_3 +.Ltmp32: +	.size	main.omp_subfn, .Ltmp32-main.omp_subfn +.Ltmp33: +	.cfi_endproc +.Leh_func_end4: + +	.align	16, 0x90 +	.type	main.omp_subfn1,@function +main.omp_subfn1:                        # @main.omp_subfn1 +.Leh_func_begin5: +.Ltmp41: +	.cfi_startproc +# BB#0:                                 # %omp.setup +	pushq	%rbp +.Ltmp42: +	.cfi_def_cfa_offset 16 +	pushq	%r15 +.Ltmp43: +	.cfi_def_cfa_offset 24 +	pushq	%r14 +.Ltmp44: +	.cfi_def_cfa_offset 32 +	pushq	%r13 +.Ltmp45: +	.cfi_def_cfa_offset 40 +	pushq	%r12 +.Ltmp46: +	.cfi_def_cfa_offset 48 +	pushq	%rbx +.Ltmp47: +	.cfi_def_cfa_offset 56 +	subq	$40, %rsp +.Ltmp48: +	.cfi_def_cfa_offset 96 +.Ltmp49: +	.cfi_offset 3, -56 +.Ltmp50: +	.cfi_offset 12, -48 +.Ltmp51: +	.cfi_offset 13, -40 +.Ltmp52: +	.cfi_offset 14, -32 +.Ltmp53: +	.cfi_offset 15, -24 +.Ltmp54: +	.cfi_offset 6, -16 +	leaq	32(%rsp), %rdi +	leaq	24(%rsp), %rsi +	jmp	.LBB5_1 +	.align	16, 0x90 +.LBB5_4:                                # %omp.loadIVBounds +                                        #   in Loop: Header=BB5_1 Depth=1 +	movq	24(%rsp), %rax +	decq	%rax +	movq	%rax, (%rsp)            # 8-byte Spill +	movq	32(%rsp), %rcx +	cmpq	%rax, %rcx +	jg	.LBB5_3 +# BB#5:                                 # %polly.loop_header2.preheader.lr.ph +                                        #   in Loop: Header=BB5_1 Depth=1 +	leaq	(%rcx,%rcx,2), %rax +	movq	%rcx, %rdx +	shlq	$9, %rdx +	leaq	(%rdx,%rdx,2), %rdx +	movq	%rdx, 16(%rsp)          # 8-byte Spill +	shlq	$11, %rax +	leaq	A(%rax), %rax +	movq	%rax, 8(%rsp)           # 8-byte Spill +	jmp	.LBB5_7 +	.align	16, 0x90 +.LBB5_6:                                # %polly.loop_header.loopexit +                                        #   in Loop: Header=BB5_7 Depth=2 +	addq	$98304, 16(%rsp)        # 8-byte Folded Spill +                                        # imm = 0x18000 +	addq	$393216, 8(%rsp)        # 8-byte Folded Spill +                                        # imm = 0x60000 +	addq	$64, %rcx +	cmpq	
(%rsp), %rcx            # 8-byte Folded Reload +	jg	.LBB5_3 +.LBB5_7:                                # %polly.loop_header2.preheader +                                        #   Parent Loop BB5_1 Depth=1 +                                        # =>  This Loop Header: Depth=2 +                                        #       Child Loop BB5_9 Depth 3 +                                        #         Child Loop BB5_11 Depth 4 +                                        #           Child Loop BB5_14 Depth 5 +                                        #             Child Loop BB5_18 Depth 6 +                                        #               Child Loop BB5_19 Depth 7 +	leaq	63(%rcx), %rax +	xorl	%edx, %edx +	jmp	.LBB5_9 +	.align	16, 0x90 +.LBB5_8:                                # %polly.loop_header2.loopexit +                                        #   in Loop: Header=BB5_9 Depth=3 +	addq	$64, %rdx +	cmpq	$1536, %rdx             # imm = 0x600 +	je	.LBB5_6 +.LBB5_9:                                # %polly.loop_header7.preheader +                                        #   Parent Loop BB5_1 Depth=1 +                                        #     Parent Loop BB5_7 Depth=2 +                                        # =>    This Loop Header: Depth=3 +                                        #         Child Loop BB5_11 Depth 4 +                                        #           Child Loop BB5_14 Depth 5 +                                        #             Child Loop BB5_18 Depth 6 +                                        #               Child Loop BB5_19 Depth 7 +	movq	16(%rsp), %rsi          # 8-byte Reload +	leaq	(%rsi,%rdx), %rsi +	leaq	63(%rdx), %rdi +	xorl	%r8d, %r8d +	movq	8(%rsp), %r9            # 8-byte Reload +	movq	%rdx, %r10 +	jmp	.LBB5_11 +	.align	16, 0x90 +.LBB5_10:                               # %polly.loop_header7.loopexit +                                        #   in Loop: Header=BB5_11 Depth=4 +	addq	$256, %r9               # imm = 0x100 +	addq	$98304, %r10 
           # imm = 0x18000 +	addq	$64, %r8 +	cmpq	$1536, %r8              # imm = 0x600 +	je	.LBB5_8 +.LBB5_11:                               # %polly.loop_body8 +                                        #   Parent Loop BB5_1 Depth=1 +                                        #     Parent Loop BB5_7 Depth=2 +                                        #       Parent Loop BB5_9 Depth=3 +                                        # =>      This Loop Header: Depth=4 +                                        #           Child Loop BB5_14 Depth 5 +                                        #             Child Loop BB5_18 Depth 6 +                                        #               Child Loop BB5_19 Depth 7 +	movabsq	$9223372036854775744, %r11 # imm = 0x7FFFFFFFFFFFFFC0 +	cmpq	%r11, %rcx +	jg	.LBB5_10 +# BB#12:                                # %polly.loop_body13.lr.ph +                                        #   in Loop: Header=BB5_11 Depth=4 +	leaq	63(%r8), %r11 +	movq	%rcx, %rbx +	movq	%rsi, %r14 +	movq	%r9, %r15 +	jmp	.LBB5_14 +	.align	16, 0x90 +.LBB5_13:                               # %polly.loop_header12.loopexit +                                        #   in Loop: Header=BB5_14 Depth=5 +	addq	$1536, %r14             # imm = 0x600 +	addq	$6144, %r15             # imm = 0x1800 +	incq	%rbx +	cmpq	%rax, %rbx +	jg	.LBB5_10 +.LBB5_14:                               # %polly.loop_body13 +                                        #   Parent Loop BB5_1 Depth=1 +                                        #     Parent Loop BB5_7 Depth=2 +                                        #       Parent Loop BB5_9 Depth=3 +                                        #         Parent Loop BB5_11 Depth=4 +                                        # =>        This Loop Header: Depth=5 +                                        #             Child Loop BB5_18 Depth 6 +                                        #               Child Loop BB5_19 Depth 7 +	cmpq	%r11, %r8 +	jg	.LBB5_13 +# BB#15:                          
      # %polly.loop_body13 +                                        #   in Loop: Header=BB5_14 Depth=5 +	cmpq	%rdi, %rdx +	jg	.LBB5_13 +# BB#16:                                # %polly.loop_body23.lr.ph.preheader +                                        #   in Loop: Header=BB5_14 Depth=5 +	xorl	%r12d, %r12d +	movq	%r10, %r13 +	jmp	.LBB5_18 +	.align	16, 0x90 +.LBB5_17:                               # %polly.loop_header17.loopexit +                                        #   in Loop: Header=BB5_18 Depth=6 +	addq	$1536, %r13             # imm = 0x600 +	incq	%r12 +	cmpq	$64, %r12 +	je	.LBB5_13 +.LBB5_18:                               # %polly.loop_body23.lr.ph +                                        #   Parent Loop BB5_1 Depth=1 +                                        #     Parent Loop BB5_7 Depth=2 +                                        #       Parent Loop BB5_9 Depth=3 +                                        #         Parent Loop BB5_11 Depth=4 +                                        #           Parent Loop BB5_14 Depth=5 +                                        # =>          This Loop Header: Depth=6 +                                        #               Child Loop BB5_19 Depth 7 +	movss	(%r15,%r12,4), %xmm0 +	pshufd	$0, %xmm0, %xmm0        # xmm0 = xmm0[0,0,0,0] +	xorl	%ebp, %ebp +	.align	16, 0x90 +.LBB5_19:                               # %polly.loop_body23 +                                        #   Parent Loop BB5_1 Depth=1 +                                        #     Parent Loop BB5_7 Depth=2 +                                        #       Parent Loop BB5_9 Depth=3 +                                        #         Parent Loop BB5_11 Depth=4 +                                        #           Parent Loop BB5_14 Depth=5 +                                        #             Parent Loop BB5_18 Depth=6 +                                        # =>            This Inner Loop Header: Depth=7 +	movaps	B(%rbp,%r13,4), %xmm1 +	mulps	%xmm0, %xmm1 +	addps	
C(%rbp,%r14,4), %xmm1 +	movaps	%xmm1, C(%rbp,%r14,4) +	addq	$16, %rbp +	cmpq	$256, %rbp              # imm = 0x100 +	jne	.LBB5_19 +	jmp	.LBB5_17 +.LBB5_3:                                # %omp.checkNext.loopexit +                                        #   in Loop: Header=BB5_1 Depth=1 +	leaq	32(%rsp), %rax +	movq	%rax, %rdi +	leaq	24(%rsp), %rax +	movq	%rax, %rsi +.LBB5_1:                                # %omp.setup +                                        # =>This Loop Header: Depth=1 +                                        #     Child Loop BB5_7 Depth 2 +                                        #       Child Loop BB5_9 Depth 3 +                                        #         Child Loop BB5_11 Depth 4 +                                        #           Child Loop BB5_14 Depth 5 +                                        #             Child Loop BB5_18 Depth 6 +                                        #               Child Loop BB5_19 Depth 7 +	callq	GOMP_loop_runtime_next +	testb	$1, %al +	jne	.LBB5_4 +# BB#2:                                 # %omp.exit +	callq	GOMP_loop_end_nowait +	addq	$40, %rsp +	popq	%rbx +	popq	%r12 +	popq	%r13 +	popq	%r14 +	popq	%r15 +	popq	%rbp +	ret +.Ltmp55: +	.size	main.omp_subfn1, .Ltmp55-main.omp_subfn1 +.Ltmp56: +	.cfi_endproc +.Leh_func_end5: + +	.type	A,@object               # @A +	.comm	A,9437184,16 +	.type	B,@object               # @B +	.comm	B,9437184,16 +	.type	.L.str,@object          # @.str +	.section	.rodata.str1.1,"aMS",@progbits,1 +.L.str: +	.asciz	 "%lf " +	.size	.L.str, 5 + +	.type	C,@object               # @C +	.comm	C,9437184,16 + +	.section	".note.GNU-stack","",@progbits diff --git a/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector.exe b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector.exe Binary files differ new file mode 100755 index 00000000000..fac17e21685 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector.exe diff --git 
a/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector.ll b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector.ll Binary files differ new file mode 100644 index 00000000000..7217bc92c80 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector.ll diff --git a/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector.s b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector.s new file mode 100644 index 00000000000..a1d6f0bf9b0 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled+vector.s @@ -0,0 +1,318 @@ +	.file	"matmul.polly.interchanged+tiled+vector.ll" +	.section	.rodata.cst8,"aM",@progbits,8 +	.align	8 +.LCPI0_0: +	.quad	4602678819172646912     # double 5.000000e-01 +	.text +	.globl	init_array +	.align	16, 0x90 +	.type	init_array,@function +init_array:                             # @init_array +# BB#0:                                 # %pollyBB +	xorl	%eax, %eax +	movsd	.LCPI0_0(%rip), %xmm0 +	movq	%rax, %rcx +	.align	16, 0x90 +.LBB0_2:                                # %polly.loop_header1.preheader +                                        # =>This Loop Header: Depth=1 +                                        #     Child Loop BB0_3 Depth 2 +	movq	$-1536, %rdx            # imm = 0xFFFFFFFFFFFFFA00 +	xorl	%esi, %esi +	.align	16, 0x90 +.LBB0_3:                                # %polly.loop_body2 +                                        #   Parent Loop BB0_2 Depth=1 +                                        # =>  This Inner Loop Header: Depth=2 +	movl	%esi, %edi +	sarl	$31, %edi +	shrl	$22, %edi +	addl	%esi, %edi +	andl	$-1024, %edi            # imm = 0xFFFFFFFFFFFFFC00 +	negl	%edi +	leal	1(%rsi,%rdi), %edi +	cvtsi2sd	%edi, %xmm1 +	mulsd	%xmm0, %xmm1 +	cvtsd2ss	%xmm1, %xmm1 +	movss	%xmm1, A+6144(%rax,%rdx,4) +	movss	%xmm1, B+6144(%rax,%rdx,4) +	addl	%ecx, %esi +	incq	%rdx +	jne	.LBB0_3 +# BB#1:                                 # %polly.loop_header.loopexit + 
                                       #   in Loop: Header=BB0_2 Depth=1 +	addq	$6144, %rax             # imm = 0x1800 +	incq	%rcx +	cmpq	$1536, %rcx             # imm = 0x600 +	jne	.LBB0_2 +# BB#4:                                 # %polly.after_loop +	ret +.Ltmp0: +	.size	init_array, .Ltmp0-init_array + +	.globl	print_array +	.align	16, 0x90 +	.type	print_array,@function +print_array:                            # @print_array +# BB#0: +	pushq	%r14 +	pushq	%rbx +	pushq	%rax +	movq	$-9437184, %rbx         # imm = 0xFFFFFFFFFF700000 +	.align	16, 0x90 +.LBB1_1:                                # %.preheader +                                        # =>This Loop Header: Depth=1 +                                        #     Child Loop BB1_2 Depth 2 +	xorl	%r14d, %r14d +	movq	stdout(%rip), %rdi +	.align	16, 0x90 +.LBB1_2:                                #   Parent Loop BB1_1 Depth=1 +                                        # =>  This Inner Loop Header: Depth=2 +	movss	C+9437184(%rbx,%r14,4), %xmm0 +	cvtss2sd	%xmm0, %xmm0 +	movl	$.L.str, %esi +	movb	$1, %al +	callq	fprintf +	movslq	%r14d, %rax +	imulq	$1717986919, %rax, %rcx # imm = 0x66666667 +	movq	%rcx, %rdx +	shrq	$63, %rdx +	sarq	$37, %rcx +	addl	%edx, %ecx +	imull	$80, %ecx, %ecx +	subl	%ecx, %eax +	cmpl	$79, %eax +	jne	.LBB1_4 +# BB#3:                                 #   in Loop: Header=BB1_2 Depth=2 +	movq	stdout(%rip), %rsi +	movl	$10, %edi +	callq	fputc +.LBB1_4:                                #   in Loop: Header=BB1_2 Depth=2 +	incq	%r14 +	movq	stdout(%rip), %rsi +	cmpq	$1536, %r14             # imm = 0x600 +	movq	%rsi, %rdi +	jne	.LBB1_2 +# BB#5:                                 #   in Loop: Header=BB1_1 Depth=1 +	movl	$10, %edi +	callq	fputc +	addq	$6144, %rbx             # imm = 0x1800 +	jne	.LBB1_1 +# BB#6: +	addq	$8, %rsp +	popq	%rbx +	popq	%r14 +	ret +.Ltmp1: +	.size	print_array, .Ltmp1-print_array + +	.section	.rodata.cst8,"aM",@progbits,8 +	.align	8 +.LCPI2_0: +	.quad	4602678819172646912     # double 
5.000000e-01 +	.text +	.globl	main +	.align	16, 0x90 +	.type	main,@function +main:                                   # @main +# BB#0:                                 # %pollyBB +	pushq	%rbp +	pushq	%r15 +	pushq	%r14 +	pushq	%r13 +	pushq	%r12 +	pushq	%rbx +	subq	$24, %rsp +	xorl	%eax, %eax +	movsd	.LCPI2_0(%rip), %xmm0 +	movq	%rax, %rcx +	.align	16, 0x90 +.LBB2_1:                                # %polly.loop_header1.preheader.i +                                        # =>This Loop Header: Depth=1 +                                        #     Child Loop BB2_2 Depth 2 +	movq	$-1536, %rdx            # imm = 0xFFFFFFFFFFFFFA00 +	xorl	%esi, %esi +	.align	16, 0x90 +.LBB2_2:                                # %polly.loop_body2.i +                                        #   Parent Loop BB2_1 Depth=1 +                                        # =>  This Inner Loop Header: Depth=2 +	movl	%esi, %edi +	sarl	$31, %edi +	shrl	$22, %edi +	addl	%esi, %edi +	andl	$-1024, %edi            # imm = 0xFFFFFFFFFFFFFC00 +	negl	%edi +	leal	1(%rsi,%rdi), %edi +	cvtsi2sd	%edi, %xmm1 +	mulsd	%xmm0, %xmm1 +	cvtsd2ss	%xmm1, %xmm1 +	movss	%xmm1, A+6144(%rax,%rdx,4) +	movss	%xmm1, B+6144(%rax,%rdx,4) +	addl	%ecx, %esi +	incq	%rdx +	jne	.LBB2_2 +# BB#3:                                 # %polly.loop_header.loopexit.i +                                        #   in Loop: Header=BB2_1 Depth=1 +	addq	$6144, %rax             # imm = 0x1800 +	incq	%rcx +	cmpq	$1536, %rcx             # imm = 0x600 +	jne	.LBB2_1 +# BB#4:                                 # %polly.loop_header.preheader +	movl	$C, %edi +	xorl	%esi, %esi +	movl	$9437184, %edx          # imm = 0x900000 +	callq	memset +	xorl	%eax, %eax +	movq	%rax, 16(%rsp)          # 8-byte Spill +	movq	%rax, (%rsp)            # 8-byte Spill +	jmp	.LBB2_6 +	.align	16, 0x90 +.LBB2_5:                                # %polly.loop_header7.loopexit +                                        #   in Loop: Header=BB2_6 Depth=1 +	addq	$393216, (%rsp)         # 8-byte Folded 
Spill +                                        # imm = 0x60000 +	movq	16(%rsp), %rax          # 8-byte Reload +	addq	$64, %rax +	movq	%rax, 16(%rsp)          # 8-byte Spill +	cmpq	$1536, %rax             # imm = 0x600 +	je	.LBB2_7 +.LBB2_6:                                # %polly.loop_header12.preheader +                                        # =>This Loop Header: Depth=1 +                                        #     Child Loop BB2_9 Depth 2 +                                        #       Child Loop BB2_11 Depth 3 +                                        #         Child Loop BB2_14 Depth 4 +                                        #           Child Loop BB2_18 Depth 5 +                                        #             Child Loop BB2_19 Depth 6 +	movq	16(%rsp), %rax          # 8-byte Reload +	leaq	63(%rax), %rax +	movq	(%rsp), %rcx            # 8-byte Reload +	leaq	A(%rcx), %rdx +	movq	%rdx, 8(%rsp)           # 8-byte Spill +	xorl	%edx, %edx +	jmp	.LBB2_9 +	.align	16, 0x90 +.LBB2_8:                                # %polly.loop_header12.loopexit +                                        #   in Loop: Header=BB2_9 Depth=2 +	addq	$256, %rcx              # imm = 0x100 +	addq	$64, %rdx +	cmpq	$1536, %rdx             # imm = 0x600 +	je	.LBB2_5 +.LBB2_9:                                # %polly.loop_header17.preheader +                                        #   Parent Loop BB2_6 Depth=1 +                                        # =>  This Loop Header: Depth=2 +                                        #       Child Loop BB2_11 Depth 3 +                                        #         Child Loop BB2_14 Depth 4 +                                        #           Child Loop BB2_18 Depth 5 +                                        #             Child Loop BB2_19 Depth 6 +	leaq	63(%rdx), %rsi +	xorl	%edi, %edi +	movq	8(%rsp), %r8            # 8-byte Reload +	movq	%rdx, %r9 +	jmp	.LBB2_11 +	.align	16, 0x90 +.LBB2_10:                               # 
%polly.loop_header17.loopexit +                                        #   in Loop: Header=BB2_11 Depth=3 +	addq	$256, %r8               # imm = 0x100 +	addq	$98304, %r9             # imm = 0x18000 +	addq	$64, %rdi +	cmpq	$1536, %rdi             # imm = 0x600 +	je	.LBB2_8 +.LBB2_11:                               # %polly.loop_body18 +                                        #   Parent Loop BB2_6 Depth=1 +                                        #     Parent Loop BB2_9 Depth=2 +                                        # =>    This Loop Header: Depth=3 +                                        #         Child Loop BB2_14 Depth 4 +                                        #           Child Loop BB2_18 Depth 5 +                                        #             Child Loop BB2_19 Depth 6 +	cmpq	%rax, 16(%rsp)          # 8-byte Folded Reload +	jg	.LBB2_10 +# BB#12:                                # %polly.loop_body23.lr.ph +                                        #   in Loop: Header=BB2_11 Depth=3 +	leaq	63(%rdi), %r10 +	xorl	%r11d, %r11d +	jmp	.LBB2_14 +	.align	16, 0x90 +.LBB2_13:                               # %polly.loop_header22.loopexit +                                        #   in Loop: Header=BB2_14 Depth=4 +	addq	$6144, %r11             # imm = 0x1800 +	cmpq	$393216, %r11           # imm = 0x60000 +	je	.LBB2_10 +.LBB2_14:                               # %polly.loop_body23 +                                        #   Parent Loop BB2_6 Depth=1 +                                        #     Parent Loop BB2_9 Depth=2 +                                        #       Parent Loop BB2_11 Depth=3 +                                        # =>      This Loop Header: Depth=4 +                                        #           Child Loop BB2_18 Depth 5 +                                        #             Child Loop BB2_19 Depth 6 +	cmpq	%r10, %rdi +	jg	.LBB2_13 +# BB#15:                                # %polly.loop_body23 +                                        #   in 
Loop: Header=BB2_14 Depth=4 +	cmpq	%rsi, %rdx +	jg	.LBB2_13 +# BB#16:                                # %polly.loop_body33.lr.ph.preheader +                                        #   in Loop: Header=BB2_14 Depth=4 +	leaq	(%r8,%r11), %rbx +	xorl	%r14d, %r14d +	movq	%r9, %r15 +	movq	%r14, %r12 +	jmp	.LBB2_18 +	.align	16, 0x90 +.LBB2_17:                               # %polly.loop_header27.loopexit +                                        #   in Loop: Header=BB2_18 Depth=5 +	addq	$1536, %r15             # imm = 0x600 +	incq	%r12 +	cmpq	$64, %r12 +	je	.LBB2_13 +.LBB2_18:                               # %polly.loop_body33.lr.ph +                                        #   Parent Loop BB2_6 Depth=1 +                                        #     Parent Loop BB2_9 Depth=2 +                                        #       Parent Loop BB2_11 Depth=3 +                                        #         Parent Loop BB2_14 Depth=4 +                                        # =>        This Loop Header: Depth=5 +                                        #             Child Loop BB2_19 Depth 6 +	movss	(%rbx,%r12,4), %xmm0 +	pshufd	$0, %xmm0, %xmm0        # xmm0 = xmm0[0,0,0,0] +	movq	%r14, %r13 +	.align	16, 0x90 +.LBB2_19:                               # %polly.loop_body33 +                                        #   Parent Loop BB2_6 Depth=1 +                                        #     Parent Loop BB2_9 Depth=2 +                                        #       Parent Loop BB2_11 Depth=3 +                                        #         Parent Loop BB2_14 Depth=4 +                                        #           Parent Loop BB2_18 Depth=5 +                                        # =>          This Inner Loop Header: Depth=6 +	movaps	B(%r13,%r15,4), %xmm1 +	mulps	%xmm0, %xmm1 +	leaq	(%r11,%r13), %rbp +	addps	C(%rcx,%rbp), %xmm1 +	movaps	%xmm1, C(%rcx,%rbp) +	addq	$16, %r13 +	cmpq	$256, %r13              # imm = 0x100 +	jne	.LBB2_19 +	jmp	.LBB2_17 +.LBB2_7:                            
    # %polly.after_loop9 +	xorl	%eax, %eax +	addq	$24, %rsp +	popq	%rbx +	popq	%r12 +	popq	%r13 +	popq	%r14 +	popq	%r15 +	popq	%rbp +	ret +.Ltmp2: +	.size	main, .Ltmp2-main + +	.type	A,@object               # @A +	.comm	A,9437184,16 +	.type	B,@object               # @B +	.comm	B,9437184,16 +	.type	.L.str,@object          # @.str +	.section	.rodata.str1.1,"aMS",@progbits,1 +.L.str: +	.asciz	 "%lf " +	.size	.L.str, 5 + +	.type	C,@object               # @C +	.comm	C,9437184,16 + +	.section	".note.GNU-stack","",@progbits diff --git a/polly/www/experiments/matmul/matmul.polly.interchanged+tiled.exe b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled.exeBinary files differ new file mode 100755 index 00000000000..4334522f458 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled.exe diff --git a/polly/www/experiments/matmul/matmul.polly.interchanged+tiled.ll b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled.llBinary files differ new file mode 100644 index 00000000000..fa301cfa5eb --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled.ll diff --git a/polly/www/experiments/matmul/matmul.polly.interchanged+tiled.s b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled.s new file mode 100644 index 00000000000..0f86df25d35 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged+tiled.s @@ -0,0 +1,323 @@ +	.file	"matmul.polly.interchanged+tiled.ll" +	.section	.rodata.cst8,"aM",@progbits,8 +	.align	8 +.LCPI0_0: +	.quad	4602678819172646912     # double 5.000000e-01 +	.text +	.globl	init_array +	.align	16, 0x90 +	.type	init_array,@function +init_array:                             # @init_array +# BB#0:                                 # %pollyBB +	xorl	%eax, %eax +	movsd	.LCPI0_0(%rip), %xmm0 +	movq	%rax, %rcx +	.align	16, 0x90 +.LBB0_2:                                # %polly.loop_header1.preheader +                                        # =>This Loop Header: Depth=1 +           
                             #     Child Loop BB0_3 Depth 2 +	movq	$-1536, %rdx            # imm = 0xFFFFFFFFFFFFFA00 +	xorl	%esi, %esi +	.align	16, 0x90 +.LBB0_3:                                # %polly.loop_body2 +                                        #   Parent Loop BB0_2 Depth=1 +                                        # =>  This Inner Loop Header: Depth=2 +	movl	%esi, %edi +	sarl	$31, %edi +	shrl	$22, %edi +	addl	%esi, %edi +	andl	$-1024, %edi            # imm = 0xFFFFFFFFFFFFFC00 +	negl	%edi +	leal	1(%rsi,%rdi), %edi +	cvtsi2sd	%edi, %xmm1 +	mulsd	%xmm0, %xmm1 +	cvtsd2ss	%xmm1, %xmm1 +	movss	%xmm1, A+6144(%rax,%rdx,4) +	movss	%xmm1, B+6144(%rax,%rdx,4) +	addl	%ecx, %esi +	incq	%rdx +	jne	.LBB0_3 +# BB#1:                                 # %polly.loop_header.loopexit +                                        #   in Loop: Header=BB0_2 Depth=1 +	addq	$6144, %rax             # imm = 0x1800 +	incq	%rcx +	cmpq	$1536, %rcx             # imm = 0x600 +	jne	.LBB0_2 +# BB#4:                                 # %polly.after_loop +	ret +.Ltmp0: +	.size	init_array, .Ltmp0-init_array + +	.globl	print_array +	.align	16, 0x90 +	.type	print_array,@function +print_array:                            # @print_array +# BB#0: +	pushq	%r14 +	pushq	%rbx +	pushq	%rax +	movq	$-9437184, %rbx         # imm = 0xFFFFFFFFFF700000 +	.align	16, 0x90 +.LBB1_1:                                # %.preheader +                                        # =>This Loop Header: Depth=1 +                                        #     Child Loop BB1_2 Depth 2 +	xorl	%r14d, %r14d +	movq	stdout(%rip), %rdi +	.align	16, 0x90 +.LBB1_2:                                #   Parent Loop BB1_1 Depth=1 +                                        # =>  This Inner Loop Header: Depth=2 +	movss	C+9437184(%rbx,%r14,4), %xmm0 +	cvtss2sd	%xmm0, %xmm0 +	movl	$.L.str, %esi +	movb	$1, %al +	callq	fprintf +	movslq	%r14d, %rax +	imulq	$1717986919, %rax, %rcx # imm = 0x66666667 +	movq	%rcx, %rdx +	shrq	$63, %rdx +	sarq	$37, %rcx +	addl	
%edx, %ecx +	imull	$80, %ecx, %ecx +	subl	%ecx, %eax +	cmpl	$79, %eax +	jne	.LBB1_4 +# BB#3:                                 #   in Loop: Header=BB1_2 Depth=2 +	movq	stdout(%rip), %rsi +	movl	$10, %edi +	callq	fputc +.LBB1_4:                                #   in Loop: Header=BB1_2 Depth=2 +	incq	%r14 +	movq	stdout(%rip), %rsi +	cmpq	$1536, %r14             # imm = 0x600 +	movq	%rsi, %rdi +	jne	.LBB1_2 +# BB#5:                                 #   in Loop: Header=BB1_1 Depth=1 +	movl	$10, %edi +	callq	fputc +	addq	$6144, %rbx             # imm = 0x1800 +	jne	.LBB1_1 +# BB#6: +	addq	$8, %rsp +	popq	%rbx +	popq	%r14 +	ret +.Ltmp1: +	.size	print_array, .Ltmp1-print_array + +	.section	.rodata.cst8,"aM",@progbits,8 +	.align	8 +.LCPI2_0: +	.quad	4602678819172646912     # double 5.000000e-01 +	.text +	.globl	main +	.align	16, 0x90 +	.type	main,@function +main:                                   # @main +# BB#0:                                 # %pollyBB +	pushq	%rbp +	pushq	%r15 +	pushq	%r14 +	pushq	%r13 +	pushq	%r12 +	pushq	%rbx +	subq	$40, %rsp +	xorl	%eax, %eax +	movsd	.LCPI2_0(%rip), %xmm0 +	movq	%rax, %rcx +	.align	16, 0x90 +.LBB2_1:                                # %polly.loop_header1.preheader.i +                                        # =>This Loop Header: Depth=1 +                                        #     Child Loop BB2_2 Depth 2 +	movq	$-1536, %rdx            # imm = 0xFFFFFFFFFFFFFA00 +	xorl	%esi, %esi +	.align	16, 0x90 +.LBB2_2:                                # %polly.loop_body2.i +                                        #   Parent Loop BB2_1 Depth=1 +                                        # =>  This Inner Loop Header: Depth=2 +	movl	%esi, %edi +	sarl	$31, %edi +	shrl	$22, %edi +	addl	%esi, %edi +	andl	$-1024, %edi            # imm = 0xFFFFFFFFFFFFFC00 +	negl	%edi +	leal	1(%rsi,%rdi), %edi +	cvtsi2sd	%edi, %xmm1 +	mulsd	%xmm0, %xmm1 +	cvtsd2ss	%xmm1, %xmm1 +	movss	%xmm1, A+6144(%rax,%rdx,4) +	movss	%xmm1, B+6144(%rax,%rdx,4) +	addl	%ecx, %esi +	incq	%rdx +	
jne	.LBB2_2 +# BB#3:                                 # %polly.loop_header.loopexit.i +                                        #   in Loop: Header=BB2_1 Depth=1 +	addq	$6144, %rax             # imm = 0x1800 +	incq	%rcx +	cmpq	$1536, %rcx             # imm = 0x600 +	jne	.LBB2_1 +# BB#4:                                 # %polly.loop_header.preheader +	movl	$C, %eax +	movq	%rax, 8(%rsp)           # 8-byte Spill +	xorl	%esi, %esi +	movl	$9437184, %edx          # imm = 0x900000 +	movl	$C, %edi +	callq	memset +	movl	$A, %eax +	movq	%rax, 16(%rsp)          # 8-byte Spill +	movq	$0, 32(%rsp)            # 8-byte Folded Spill +	jmp	.LBB2_6 +	.align	16, 0x90 +.LBB2_5:                                # %polly.loop_header7.loopexit +                                        #   in Loop: Header=BB2_6 Depth=1 +	addq	$393216, 16(%rsp)       # 8-byte Folded Spill +                                        # imm = 0x60000 +	addq	$393216, 8(%rsp)        # 8-byte Folded Spill +                                        # imm = 0x60000 +	movq	32(%rsp), %rax          # 8-byte Reload +	addq	$64, %rax +	movq	%rax, 32(%rsp)          # 8-byte Spill +	cmpq	$1536, %rax             # imm = 0x600 +	je	.LBB2_7 +.LBB2_6:                                # %polly.loop_header12.preheader +                                        # =>This Loop Header: Depth=1 +                                        #     Child Loop BB2_9 Depth 2 +                                        #       Child Loop BB2_11 Depth 3 +                                        #         Child Loop BB2_14 Depth 4 +                                        #           Child Loop BB2_18 Depth 5 +                                        #             Child Loop BB2_19 Depth 6 +	movq	32(%rsp), %rax          # 8-byte Reload +	leaq	63(%rax), %rax +	movl	$B, %ecx +	movq	%rcx, 24(%rsp)          # 8-byte Spill +	xorl	%ecx, %ecx +	movq	8(%rsp), %rdx           # 8-byte Reload +	jmp	.LBB2_9 +	.align	16, 0x90 +.LBB2_8:                                # 
%polly.loop_header12.loopexit +                                        #   in Loop: Header=BB2_9 Depth=2 +	addq	$256, %rdx              # imm = 0x100 +	addq	$256, 24(%rsp)          # 8-byte Folded Spill +                                        # imm = 0x100 +	addq	$64, %rcx +	cmpq	$1536, %rcx             # imm = 0x600 +	je	.LBB2_5 +.LBB2_9:                                # %polly.loop_header17.preheader +                                        #   Parent Loop BB2_6 Depth=1 +                                        # =>  This Loop Header: Depth=2 +                                        #       Child Loop BB2_11 Depth 3 +                                        #         Child Loop BB2_14 Depth 4 +                                        #           Child Loop BB2_18 Depth 5 +                                        #             Child Loop BB2_19 Depth 6 +	leaq	63(%rcx), %rsi +	xorl	%edi, %edi +	movq	16(%rsp), %r8           # 8-byte Reload +	movq	24(%rsp), %r9           # 8-byte Reload +	jmp	.LBB2_11 +	.align	16, 0x90 +.LBB2_10:                               # %polly.loop_header17.loopexit +                                        #   in Loop: Header=BB2_11 Depth=3 +	addq	$256, %r8               # imm = 0x100 +	addq	$393216, %r9            # imm = 0x60000 +	addq	$64, %rdi +	cmpq	$1536, %rdi             # imm = 0x600 +	je	.LBB2_8 +.LBB2_11:                               # %polly.loop_body18 +                                        #   Parent Loop BB2_6 Depth=1 +                                        #     Parent Loop BB2_9 Depth=2 +                                        # =>    This Loop Header: Depth=3 +                                        #         Child Loop BB2_14 Depth 4 +                                        #           Child Loop BB2_18 Depth 5 +                                        #             Child Loop BB2_19 Depth 6 +	cmpq	%rax, 32(%rsp)          # 8-byte Folded Reload +	jg	.LBB2_10 +# BB#12:                                # %polly.loop_body23.lr.ph 
+                                        #   in Loop: Header=BB2_11 Depth=3 +	leaq	63(%rdi), %r10 +	xorl	%r11d, %r11d +	jmp	.LBB2_14 +	.align	16, 0x90 +.LBB2_13:                               # %polly.loop_header22.loopexit +                                        #   in Loop: Header=BB2_14 Depth=4 +	addq	$6144, %r11             # imm = 0x1800 +	cmpq	$393216, %r11           # imm = 0x60000 +	je	.LBB2_10 +.LBB2_14:                               # %polly.loop_body23 +                                        #   Parent Loop BB2_6 Depth=1 +                                        #     Parent Loop BB2_9 Depth=2 +                                        #       Parent Loop BB2_11 Depth=3 +                                        # =>      This Loop Header: Depth=4 +                                        #           Child Loop BB2_18 Depth 5 +                                        #             Child Loop BB2_19 Depth 6 +	cmpq	%r10, %rdi +	jg	.LBB2_13 +# BB#15:                                # %polly.loop_body23 +                                        #   in Loop: Header=BB2_14 Depth=4 +	cmpq	%rsi, %rcx +	jg	.LBB2_13 +# BB#16:                                # %polly.loop_body33.lr.ph.preheader +                                        #   in Loop: Header=BB2_14 Depth=4 +	leaq	(%rdx,%r11), %rbx +	leaq	(%r8,%r11), %r14 +	xorl	%r15d, %r15d +	movq	%r9, %r12 +	movq	%r15, %r13 +	jmp	.LBB2_18 +	.align	16, 0x90 +.LBB2_17:                               # %polly.loop_header27.loopexit +                                        #   in Loop: Header=BB2_18 Depth=5 +	addq	$6144, %r12             # imm = 0x1800 +	incq	%r13 +	cmpq	$64, %r13 +	je	.LBB2_13 +.LBB2_18:                               # %polly.loop_body33.lr.ph +                                        #   Parent Loop BB2_6 Depth=1 +                                        #     Parent Loop BB2_9 Depth=2 +                                        #       Parent Loop BB2_11 Depth=3 +                                        #         
Parent Loop BB2_14 Depth=4 +                                        # =>        This Loop Header: Depth=5 +                                        #             Child Loop BB2_19 Depth 6 +	movss	(%r14,%r13,4), %xmm0 +	movq	%r15, %rbp +	.align	16, 0x90 +.LBB2_19:                               # %polly.loop_body33 +                                        #   Parent Loop BB2_6 Depth=1 +                                        #     Parent Loop BB2_9 Depth=2 +                                        #       Parent Loop BB2_11 Depth=3 +                                        #         Parent Loop BB2_14 Depth=4 +                                        #           Parent Loop BB2_18 Depth=5 +                                        # =>          This Inner Loop Header: Depth=6 +	movss	(%r12,%rbp,4), %xmm1 +	mulss	%xmm0, %xmm1 +	addss	(%rbx,%rbp,4), %xmm1 +	movss	%xmm1, (%rbx,%rbp,4) +	incq	%rbp +	cmpq	$64, %rbp +	jne	.LBB2_19 +	jmp	.LBB2_17 +.LBB2_7:                                # %polly.after_loop9 +	xorl	%eax, %eax +	addq	$40, %rsp +	popq	%rbx +	popq	%r12 +	popq	%r13 +	popq	%r14 +	popq	%r15 +	popq	%rbp +	ret +.Ltmp2: +	.size	main, .Ltmp2-main + +	.type	A,@object               # @A +	.comm	A,9437184,16 +	.type	B,@object               # @B +	.comm	B,9437184,16 +	.type	.L.str,@object          # @.str +	.section	.rodata.str1.1,"aMS",@progbits,1 +.L.str: +	.asciz	 "%lf " +	.size	.L.str, 5 + +	.type	C,@object               # @C +	.comm	C,9437184,16 + +	.section	".note.GNU-stack","",@progbits diff --git a/polly/www/experiments/matmul/matmul.polly.interchanged.exe b/polly/www/experiments/matmul/matmul.polly.interchanged.exeBinary files differ new file mode 100755 index 00000000000..cc125c4b2b1 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged.exe diff --git a/polly/www/experiments/matmul/matmul.polly.interchanged.ll b/polly/www/experiments/matmul/matmul.polly.interchanged.llBinary files differ new file mode 100644 index 00000000000..c0a54bb64f4 --- 
/dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged.ll diff --git a/polly/www/experiments/matmul/matmul.polly.interchanged.s b/polly/www/experiments/matmul/matmul.polly.interchanged.s new file mode 100644 index 00000000000..8bbc523f764 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.polly.interchanged.s @@ -0,0 +1,217 @@ +	.file	"matmul.polly.interchanged.ll" +	.section	.rodata.cst8,"aM",@progbits,8 +	.align	8 +.LCPI0_0: +	.quad	4602678819172646912     # double 5.000000e-01 +	.text +	.globl	init_array +	.align	16, 0x90 +	.type	init_array,@function +init_array:                             # @init_array +# BB#0:                                 # %pollyBB +	xorl	%eax, %eax +	movsd	.LCPI0_0(%rip), %xmm0 +	movq	%rax, %rcx +	.align	16, 0x90 +.LBB0_2:                                # %polly.loop_header1.preheader +                                        # =>This Loop Header: Depth=1 +                                        #     Child Loop BB0_3 Depth 2 +	movq	$-1536, %rdx            # imm = 0xFFFFFFFFFFFFFA00 +	xorl	%esi, %esi +	.align	16, 0x90 +.LBB0_3:                                # %polly.loop_body2 +                                        #   Parent Loop BB0_2 Depth=1 +                                        # =>  This Inner Loop Header: Depth=2 +	movl	%esi, %edi +	sarl	$31, %edi +	shrl	$22, %edi +	addl	%esi, %edi +	andl	$-1024, %edi            # imm = 0xFFFFFFFFFFFFFC00 +	negl	%edi +	leal	1(%rsi,%rdi), %edi +	cvtsi2sd	%edi, %xmm1 +	mulsd	%xmm0, %xmm1 +	cvtsd2ss	%xmm1, %xmm1 +	movss	%xmm1, A+6144(%rax,%rdx,4) +	movss	%xmm1, B+6144(%rax,%rdx,4) +	addl	%ecx, %esi +	incq	%rdx +	jne	.LBB0_3 +# BB#1:                                 # %polly.loop_header.loopexit +                                        #   in Loop: Header=BB0_2 Depth=1 +	addq	$6144, %rax             # imm = 0x1800 +	incq	%rcx +	cmpq	$1536, %rcx             # imm = 0x600 +	jne	.LBB0_2 +# BB#4:                                 # %polly.after_loop +	ret +.Ltmp0: +	.size	init_array, 
.Ltmp0-init_array + +	.globl	print_array +	.align	16, 0x90 +	.type	print_array,@function +print_array:                            # @print_array +# BB#0: +	pushq	%r14 +	pushq	%rbx +	pushq	%rax +	movq	$-9437184, %rbx         # imm = 0xFFFFFFFFFF700000 +	.align	16, 0x90 +.LBB1_1:                                # %.preheader +                                        # =>This Loop Header: Depth=1 +                                        #     Child Loop BB1_2 Depth 2 +	xorl	%r14d, %r14d +	movq	stdout(%rip), %rdi +	.align	16, 0x90 +.LBB1_2:                                #   Parent Loop BB1_1 Depth=1 +                                        # =>  This Inner Loop Header: Depth=2 +	movss	C+9437184(%rbx,%r14,4), %xmm0 +	cvtss2sd	%xmm0, %xmm0 +	movl	$.L.str, %esi +	movb	$1, %al +	callq	fprintf +	movslq	%r14d, %rax +	imulq	$1717986919, %rax, %rcx # imm = 0x66666667 +	movq	%rcx, %rdx +	shrq	$63, %rdx +	sarq	$37, %rcx +	addl	%edx, %ecx +	imull	$80, %ecx, %ecx +	subl	%ecx, %eax +	cmpl	$79, %eax +	jne	.LBB1_4 +# BB#3:                                 #   in Loop: Header=BB1_2 Depth=2 +	movq	stdout(%rip), %rsi +	movl	$10, %edi +	callq	fputc +.LBB1_4:                                #   in Loop: Header=BB1_2 Depth=2 +	incq	%r14 +	movq	stdout(%rip), %rsi +	cmpq	$1536, %r14             # imm = 0x600 +	movq	%rsi, %rdi +	jne	.LBB1_2 +# BB#5:                                 #   in Loop: Header=BB1_1 Depth=1 +	movl	$10, %edi +	callq	fputc +	addq	$6144, %rbx             # imm = 0x1800 +	jne	.LBB1_1 +# BB#6: +	addq	$8, %rsp +	popq	%rbx +	popq	%r14 +	ret +.Ltmp1: +	.size	print_array, .Ltmp1-print_array + +	.section	.rodata.cst8,"aM",@progbits,8 +	.align	8 +.LCPI2_0: +	.quad	4602678819172646912     # double 5.000000e-01 +	.text +	.globl	main +	.align	16, 0x90 +	.type	main,@function +main:                                   # @main +# BB#0:                                 # %pollyBB +	pushq	%rax +	xorl	%eax, %eax +	movsd	.LCPI2_0(%rip), %xmm0 +	movq	%rax, %rcx +	.align	16, 0x90 +.LBB2_1:         
                       # %polly.loop_header1.preheader.i +                                        # =>This Loop Header: Depth=1 +                                        #     Child Loop BB2_2 Depth 2 +	movq	$-1536, %rdx            # imm = 0xFFFFFFFFFFFFFA00 +	xorl	%esi, %esi +	.align	16, 0x90 +.LBB2_2:                                # %polly.loop_body2.i +                                        #   Parent Loop BB2_1 Depth=1 +                                        # =>  This Inner Loop Header: Depth=2 +	movl	%esi, %edi +	sarl	$31, %edi +	shrl	$22, %edi +	addl	%esi, %edi +	andl	$-1024, %edi            # imm = 0xFFFFFFFFFFFFFC00 +	negl	%edi +	leal	1(%rsi,%rdi), %edi +	cvtsi2sd	%edi, %xmm1 +	mulsd	%xmm0, %xmm1 +	cvtsd2ss	%xmm1, %xmm1 +	movss	%xmm1, A+6144(%rax,%rdx,4) +	movss	%xmm1, B+6144(%rax,%rdx,4) +	addl	%ecx, %esi +	incq	%rdx +	jne	.LBB2_2 +# BB#3:                                 # %polly.loop_header.loopexit.i +                                        #   in Loop: Header=BB2_1 Depth=1 +	addq	$6144, %rax             # imm = 0x1800 +	incq	%rcx +	cmpq	$1536, %rcx             # imm = 0x600 +	jne	.LBB2_1 +# BB#4:                                 # %polly.loop_header.preheader +	movl	$C, %edi +	xorl	%esi, %esi +	movl	$9437184, %edx          # imm = 0x900000 +	callq	memset +	xorl	%eax, %eax +	jmp	.LBB2_6 +	.align	16, 0x90 +.LBB2_5:                                # %polly.loop_header7.loopexit +                                        #   in Loop: Header=BB2_6 Depth=1 +	addq	$6144, %rax             # imm = 0x1800 +	cmpq	$9437184, %rax          # imm = 0x900000 +	je	.LBB2_7 +.LBB2_6:                                # %polly.loop_header12.preheader +                                        # =>This Loop Header: Depth=1 +                                        #     Child Loop BB2_9 Depth 2 +                                        #       Child Loop BB2_10 Depth 3 +	leaq	A(%rax), %rcx +	movq	$-9437184, %rdx         # imm = 0xFFFFFFFFFF700000 +	jmp	.LBB2_9 +	.align	16, 0x90 
+.LBB2_8:                                # %polly.loop_header12.loopexit +                                        #   in Loop: Header=BB2_9 Depth=2 +	addq	$4, %rcx +	addq	$6144, %rdx             # imm = 0x1800 +	je	.LBB2_5 +.LBB2_9:                                # %polly.loop_header17.preheader +                                        #   Parent Loop BB2_6 Depth=1 +                                        # =>  This Loop Header: Depth=2 +                                        #       Child Loop BB2_10 Depth 3 +	movss	(%rcx), %xmm0 +	xorl	%esi, %esi +	.align	16, 0x90 +.LBB2_10:                               # %polly.loop_body18 +                                        #   Parent Loop BB2_6 Depth=1 +                                        #     Parent Loop BB2_9 Depth=2 +                                        # =>    This Inner Loop Header: Depth=3 +	movss	B+9437184(%rdx,%rsi,4), %xmm1 +	mulss	%xmm0, %xmm1 +	addss	C(%rax,%rsi,4), %xmm1 +	movss	%xmm1, C(%rax,%rsi,4) +	incq	%rsi +	cmpq	$1536, %rsi             # imm = 0x600 +	jne	.LBB2_10 +	jmp	.LBB2_8 +.LBB2_7:                                # %polly.after_loop9 +	xorl	%eax, %eax +	popq	%rdx +	ret +.Ltmp2: +	.size	main, .Ltmp2-main + +	.type	A,@object               # @A +	.comm	A,9437184,16 +	.type	B,@object               # @B +	.comm	B,9437184,16 +	.type	.L.str,@object          # @.str +	.section	.rodata.str1.1,"aMS",@progbits,1 +.L.str: +	.asciz	 "%lf " +	.size	.L.str, 5 + +	.type	C,@object               # @C +	.comm	C,9437184,16 + +	.section	".note.GNU-stack","",@progbits diff --git a/polly/www/experiments/matmul/matmul.preopt.ll b/polly/www/experiments/matmul/matmul.preopt.ll new file mode 100644 index 00000000000..9287d7e141b --- /dev/null +++ b/polly/www/experiments/matmul/matmul.preopt.ll @@ -0,0 +1,180 @@ +; ModuleID = 'matmul.s' +target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64" +target triple = 
"x86_64-unknown-linux-gnu" + +%struct._IO_FILE = type { i32, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, %struct._IO_marker*, %struct._IO_FILE*, i32, i32, i64, i16, i8, [1 x i8], i8*, i64, i8*, i8*, i8*, i8*, i64, i32, [20 x i8] } +%struct._IO_marker = type { %struct._IO_marker*, %struct._IO_FILE*, i32 } + +@A = common global [1536 x [1536 x float]] zeroinitializer, align 16 +@B = common global [1536 x [1536 x float]] zeroinitializer, align 16 +@stdout = external global %struct._IO_FILE* +@.str = private unnamed_addr constant [5 x i8] c"%lf \00" +@C = common global [1536 x [1536 x float]] zeroinitializer, align 16 +@.str1 = private unnamed_addr constant [2 x i8] c"\0A\00" + +define void @init_array() nounwind { +; <label>:0 +  br label %1 + +; <label>:1                                       ; preds = %18, %0 +  %2 = phi i64 [ %indvar.next2, %18 ], [ 0, %0 ] +  %exitcond5 = icmp ne i64 %2, 1536 +  br i1 %exitcond5, label %3, label %19 + +; <label>:3                                       ; preds = %1 +  br label %4 + +; <label>:4                                       ; preds = %16, %3 +  %indvar = phi i64 [ %indvar.next, %16 ], [ 0, %3 ] +  %scevgep4 = getelementptr [1536 x [1536 x float]]* @A, i64 0, i64 %2, i64 %indvar +  %scevgep = getelementptr [1536 x [1536 x float]]* @B, i64 0, i64 %2, i64 %indvar +  %tmp = mul i64 %2, %indvar +  %tmp3 = trunc i64 %tmp to i32 +  %exitcond = icmp ne i64 %indvar, 1536 +  br i1 %exitcond, label %5, label %17 + +; <label>:5                                       ; preds = %4 +  %6 = srem i32 %tmp3, 1024 +  %7 = add nsw i32 1, %6 +  %8 = sitofp i32 %7 to double +  %9 = fdiv double %8, 2.000000e+00 +  %10 = fptrunc double %9 to float +  store float %10, float* %scevgep4 +  %11 = srem i32 %tmp3, 1024 +  %12 = add nsw i32 1, %11 +  %13 = sitofp i32 %12 to double +  %14 = fdiv double %13, 2.000000e+00 +  %15 = fptrunc double %14 to float +  store float %15, float* %scevgep +  br label %16 + +; <label>:16                        
              ; preds = %5 +  %indvar.next = add i64 %indvar, 1 +  br label %4 + +; <label>:17                                      ; preds = %4 +  br label %18 + +; <label>:18                                      ; preds = %17 +  %indvar.next2 = add i64 %2, 1 +  br label %1 + +; <label>:19                                      ; preds = %1 +  ret void +} + +define void @print_array() nounwind { +; <label>:0 +  br label %1 + +; <label>:1                                       ; preds = %19, %0 +  %indvar1 = phi i64 [ %indvar.next2, %19 ], [ 0, %0 ] +  %exitcond3 = icmp ne i64 %indvar1, 1536 +  br i1 %exitcond3, label %2, label %20 + +; <label>:2                                       ; preds = %1 +  br label %3 + +; <label>:3                                       ; preds = %15, %2 +  %indvar = phi i64 [ %indvar.next, %15 ], [ 0, %2 ] +  %scevgep = getelementptr [1536 x [1536 x float]]* @C, i64 0, i64 %indvar1, i64 %indvar +  %j.0 = trunc i64 %indvar to i32 +  %exitcond = icmp ne i64 %indvar, 1536 +  br i1 %exitcond, label %4, label %16 + +; <label>:4                                       ; preds = %3 +  %5 = load %struct._IO_FILE** @stdout, align 8 +  %6 = load float* %scevgep +  %7 = fpext float %6 to double +  %8 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %5, i8* getelementptr inbounds ([5 x i8]* @.str, i32 0, i32 0), double %7) +  %9 = srem i32 %j.0, 80 +  %10 = icmp eq i32 %9, 79 +  br i1 %10, label %11, label %14 + +; <label>:11                                      ; preds = %4 +  %12 = load %struct._IO_FILE** @stdout, align 8 +  %13 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %12, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0)) +  br label %14 + +; <label>:14                                      ; preds = %11, %4 +  br label %15 + +; <label>:15                                      ; preds = %14 +  %indvar.next = add i64 %indvar, 1 +  br label %3 + +; <label>:16                                      ; 
preds = %3 +  %17 = load %struct._IO_FILE** @stdout, align 8 +  %18 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %17, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0)) +  br label %19 + +; <label>:19                                      ; preds = %16 +  %indvar.next2 = add i64 %indvar1, 1 +  br label %1 + +; <label>:20                                      ; preds = %1 +  ret void +} + +declare i32 @fprintf(%struct._IO_FILE*, i8*, ...) + +define i32 @main() nounwind { +; <label>:0 +  call void @init_array() +  br label %1 + +; <label>:1                                       ; preds = %16, %0 +  %indvar3 = phi i64 [ %indvar.next4, %16 ], [ 0, %0 ] +  %exitcond9 = icmp ne i64 %indvar3, 1536 +  br i1 %exitcond9, label %2, label %17 + +; <label>:2                                       ; preds = %1 +  br label %3 + +; <label>:3                                       ; preds = %14, %2 +  %indvar1 = phi i64 [ %indvar.next2, %14 ], [ 0, %2 ] +  %scevgep8 = getelementptr [1536 x [1536 x float]]* @C, i64 0, i64 %indvar3, i64 %indvar1 +  %exitcond6 = icmp ne i64 %indvar1, 1536 +  br i1 %exitcond6, label %4, label %15 + +; <label>:4                                       ; preds = %3 +  store float 0.000000e+00, float* %scevgep8 +  br label %5 + +; <label>:5                                       ; preds = %12, %4 +  %indvar = phi i64 [ %indvar.next, %12 ], [ 0, %4 ] +  %scevgep5 = getelementptr [1536 x [1536 x float]]* @A, i64 0, i64 %indvar3, i64 %indvar +  %scevgep = getelementptr [1536 x [1536 x float]]* @B, i64 0, i64 %indvar, i64 %indvar1 +  %exitcond = icmp ne i64 %indvar, 1536 +  br i1 %exitcond, label %6, label %13 + +; <label>:6                                       ; preds = %5 +  %7 = load float* %scevgep8 +  %8 = load float* %scevgep5 +  %9 = load float* %scevgep +  %10 = fmul float %8, %9 +  %11 = fadd float %7, %10 +  store float %11, float* %scevgep8 +  br label %12 + +; <label>:12                                      ; preds = %6 
+  %indvar.next = add i64 %indvar, 1 +  br label %5 + +; <label>:13                                      ; preds = %5 +  br label %14 + +; <label>:14                                      ; preds = %13 +  %indvar.next2 = add i64 %indvar1, 1 +  br label %3 + +; <label>:15                                      ; preds = %3 +  br label %16 + +; <label>:16                                      ; preds = %15 +  %indvar.next4 = add i64 %indvar3, 1 +  br label %1 + +; <label>:17                                      ; preds = %1 +  ret i32 0 +} diff --git a/polly/www/experiments/matmul/matmul.s b/polly/www/experiments/matmul/matmul.s new file mode 100644 index 00000000000..bec9d2a7504 --- /dev/null +++ b/polly/www/experiments/matmul/matmul.s @@ -0,0 +1,255 @@ +; ModuleID = 'matmul.c' +target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64" +target triple = "x86_64-unknown-linux-gnu" + +%struct._IO_FILE = type { i32, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, %struct._IO_marker*, %struct._IO_FILE*, i32, i32, i64, i16, i8, [1 x i8], i8*, i64, i8*, i8*, i8*, i8*, i64, i32, [20 x i8] } +%struct._IO_marker = type { %struct._IO_marker*, %struct._IO_FILE*, i32 } + +@A = common global [1536 x [1536 x float]] zeroinitializer, align 16 +@B = common global [1536 x [1536 x float]] zeroinitializer, align 16 +@stdout = external global %struct._IO_FILE* +@.str = private unnamed_addr constant [5 x i8] c"%lf \00" +@C = common global [1536 x [1536 x float]] zeroinitializer, align 16 +@.str1 = private unnamed_addr constant [2 x i8] c"\0A\00" + +define void @init_array() nounwind { +  %i = alloca i32, align 4 +  %j = alloca i32, align 4 +  store i32 0, i32* %i, align 4 +  br label %1 + +; <label>:1                                       ; preds = %41, %0 +  %2 = load i32* %i, align 4 +  %3 = icmp slt i32 %2, 1536 +  br i1 %3, label %4, label %44 + +; <label>:4                     
                  ; preds = %1 +  store i32 0, i32* %j, align 4 +  br label %5 + +; <label>:5                                       ; preds = %37, %4 +  %6 = load i32* %j, align 4 +  %7 = icmp slt i32 %6, 1536 +  br i1 %7, label %8, label %40 + +; <label>:8                                       ; preds = %5 +  %9 = load i32* %i, align 4 +  %10 = load i32* %j, align 4 +  %11 = mul nsw i32 %9, %10 +  %12 = srem i32 %11, 1024 +  %13 = add nsw i32 1, %12 +  %14 = sitofp i32 %13 to double +  %15 = fdiv double %14, 2.000000e+00 +  %16 = fptrunc double %15 to float +  %17 = load i32* %j, align 4 +  %18 = sext i32 %17 to i64 +  %19 = load i32* %i, align 4 +  %20 = sext i32 %19 to i64 +  %21 = getelementptr inbounds [1536 x [1536 x float]]* @A, i32 0, i64 %20 +  %22 = getelementptr inbounds [1536 x float]* %21, i32 0, i64 %18 +  store float %16, float* %22 +  %23 = load i32* %i, align 4 +  %24 = load i32* %j, align 4 +  %25 = mul nsw i32 %23, %24 +  %26 = srem i32 %25, 1024 +  %27 = add nsw i32 1, %26 +  %28 = sitofp i32 %27 to double +  %29 = fdiv double %28, 2.000000e+00 +  %30 = fptrunc double %29 to float +  %31 = load i32* %j, align 4 +  %32 = sext i32 %31 to i64 +  %33 = load i32* %i, align 4 +  %34 = sext i32 %33 to i64 +  %35 = getelementptr inbounds [1536 x [1536 x float]]* @B, i32 0, i64 %34 +  %36 = getelementptr inbounds [1536 x float]* %35, i32 0, i64 %32 +  store float %30, float* %36 +  br label %37 + +; <label>:37                                      ; preds = %8 +  %38 = load i32* %j, align 4 +  %39 = add nsw i32 %38, 1 +  store i32 %39, i32* %j, align 4 +  br label %5 + +; <label>:40                                      ; preds = %5 +  br label %41 + +; <label>:41                                      ; preds = %40 +  %42 = load i32* %i, align 4 +  %43 = add nsw i32 %42, 1 +  store i32 %43, i32* %i, align 4 +  br label %1 + +; <label>:44                                      ; preds = %1 +  ret void +} + +define void @print_array() nounwind { +  %i = alloca 
i32, align 4 +  %j = alloca i32, align 4 +  store i32 0, i32* %i, align 4 +  br label %1 + +; <label>:1                                       ; preds = %32, %0 +  %2 = load i32* %i, align 4 +  %3 = icmp slt i32 %2, 1536 +  br i1 %3, label %4, label %35 + +; <label>:4                                       ; preds = %1 +  store i32 0, i32* %j, align 4 +  br label %5 + +; <label>:5                                       ; preds = %26, %4 +  %6 = load i32* %j, align 4 +  %7 = icmp slt i32 %6, 1536 +  br i1 %7, label %8, label %29 + +; <label>:8                                       ; preds = %5 +  %9 = load %struct._IO_FILE** @stdout, align 8 +  %10 = load i32* %j, align 4 +  %11 = sext i32 %10 to i64 +  %12 = load i32* %i, align 4 +  %13 = sext i32 %12 to i64 +  %14 = getelementptr inbounds [1536 x [1536 x float]]* @C, i32 0, i64 %13 +  %15 = getelementptr inbounds [1536 x float]* %14, i32 0, i64 %11 +  %16 = load float* %15 +  %17 = fpext float %16 to double +  %18 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %9, i8* getelementptr inbounds ([5 x i8]* @.str, i32 0, i32 0), double %17) +  %19 = load i32* %j, align 4 +  %20 = srem i32 %19, 80 +  %21 = icmp eq i32 %20, 79 +  br i1 %21, label %22, label %25 + +; <label>:22                                      ; preds = %8 +  %23 = load %struct._IO_FILE** @stdout, align 8 +  %24 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %23, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0)) +  br label %25 + +; <label>:25                                      ; preds = %22, %8 +  br label %26 + +; <label>:26                                      ; preds = %25 +  %27 = load i32* %j, align 4 +  %28 = add nsw i32 %27, 1 +  store i32 %28, i32* %j, align 4 +  br label %5 + +; <label>:29                                      ; preds = %5 +  %30 = load %struct._IO_FILE** @stdout, align 8 +  %31 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %30, i8* getelementptr 
inbounds ([2 x i8]* @.str1, i32 0, i32 0)) +  br label %32 + +; <label>:32                                      ; preds = %29 +  %33 = load i32* %i, align 4 +  %34 = add nsw i32 %33, 1 +  store i32 %34, i32* %i, align 4 +  br label %1 + +; <label>:35                                      ; preds = %1 +  ret void +} + +declare i32 @fprintf(%struct._IO_FILE*, i8*, ...) + +define i32 @main() nounwind { +  %1 = alloca i32, align 4 +  %i = alloca i32, align 4 +  %j = alloca i32, align 4 +  %k = alloca i32, align 4 +  %t_start = alloca double, align 8 +  %t_end = alloca double, align 8 +  store i32 0, i32* %1 +  call void @init_array() +  store i32 0, i32* %i, align 4 +  br label %2 + +; <label>:2                                       ; preds = %57, %0 +  %3 = load i32* %i, align 4 +  %4 = icmp slt i32 %3, 1536 +  br i1 %4, label %5, label %60 + +; <label>:5                                       ; preds = %2 +  store i32 0, i32* %j, align 4 +  br label %6 + +; <label>:6                                       ; preds = %53, %5 +  %7 = load i32* %j, align 4 +  %8 = icmp slt i32 %7, 1536 +  br i1 %8, label %9, label %56 + +; <label>:9                                       ; preds = %6 +  %10 = load i32* %j, align 4 +  %11 = sext i32 %10 to i64 +  %12 = load i32* %i, align 4 +  %13 = sext i32 %12 to i64 +  %14 = getelementptr inbounds [1536 x [1536 x float]]* @C, i32 0, i64 %13 +  %15 = getelementptr inbounds [1536 x float]* %14, i32 0, i64 %11 +  store float 0.000000e+00, float* %15 +  store i32 0, i32* %k, align 4 +  br label %16 + +; <label>:16                                      ; preds = %49, %9 +  %17 = load i32* %k, align 4 +  %18 = icmp slt i32 %17, 1536 +  br i1 %18, label %19, label %52 + +; <label>:19                                      ; preds = %16 +  %20 = load i32* %j, align 4 +  %21 = sext i32 %20 to i64 +  %22 = load i32* %i, align 4 +  %23 = sext i32 %22 to i64 +  %24 = getelementptr inbounds [1536 x [1536 x float]]* @C, i32 0, i64 %23 +  %25 = 
getelementptr inbounds [1536 x float]* %24, i32 0, i64 %21 +  %26 = load float* %25 +  %27 = load i32* %k, align 4 +  %28 = sext i32 %27 to i64 +  %29 = load i32* %i, align 4 +  %30 = sext i32 %29 to i64 +  %31 = getelementptr inbounds [1536 x [1536 x float]]* @A, i32 0, i64 %30 +  %32 = getelementptr inbounds [1536 x float]* %31, i32 0, i64 %28 +  %33 = load float* %32 +  %34 = load i32* %j, align 4 +  %35 = sext i32 %34 to i64 +  %36 = load i32* %k, align 4 +  %37 = sext i32 %36 to i64 +  %38 = getelementptr inbounds [1536 x [1536 x float]]* @B, i32 0, i64 %37 +  %39 = getelementptr inbounds [1536 x float]* %38, i32 0, i64 %35 +  %40 = load float* %39 +  %41 = fmul float %33, %40 +  %42 = fadd float %26, %41 +  %43 = load i32* %j, align 4 +  %44 = sext i32 %43 to i64 +  %45 = load i32* %i, align 4 +  %46 = sext i32 %45 to i64 +  %47 = getelementptr inbounds [1536 x [1536 x float]]* @C, i32 0, i64 %46 +  %48 = getelementptr inbounds [1536 x float]* %47, i32 0, i64 %44 +  store float %42, float* %48 +  br label %49 + +; <label>:49                                      ; preds = %19 +  %50 = load i32* %k, align 4 +  %51 = add nsw i32 %50, 1 +  store i32 %51, i32* %k, align 4 +  br label %16 + +; <label>:52                                      ; preds = %16 +  br label %53 + +; <label>:53                                      ; preds = %52 +  %54 = load i32* %j, align 4 +  %55 = add nsw i32 %54, 1 +  store i32 %55, i32* %j, align 4 +  br label %6 + +; <label>:56                                      ; preds = %6 +  br label %57 + +; <label>:57                                      ; preds = %56 +  %58 = load i32* %i, align 4 +  %59 = add nsw i32 %58, 1 +  store i32 %59, i32* %i, align 4 +  br label %2 + +; <label>:60                                      ; preds = %2 +  ret i32 0 +} diff --git a/polly/www/experiments/matmul/runall.sh b/polly/www/experiments/matmul/runall.sh new file mode 100755 index 00000000000..0944bd4fb68 --- /dev/null +++ 
b/polly/www/experiments/matmul/runall.sh @@ -0,0 +1,92 @@ +#!/bin/sh -a + + +echo "--> 1. Create LLVM-IR from C" +clang -S -emit-llvm matmul.c -o matmul.s + +echo "--> 2. Load Polly automatically when calling the 'opt' tool" +export PATH_TO_POLLY_LIB="~/Projekte/polly/build_clang/lib/" +alias opt="opt -load ${PATH_TO_POLLY_LIB}/LLVMPolly.so" + +echo "--> 3. Prepare the LLVM-IR for Polly" +opt -S -mem2reg -loop-simplify -indvars matmul.s > matmul.preopt.ll + +echo "--> 4. Show the SCoPs detected by Polly" +opt -basicaa -polly-cloog -analyze -q matmul.preopt.ll + +echo "--> 5.1 Highlight the detected SCoPs in the CFGs of the program" +# We only create .dot files, because -view-scops calls graphviz directly, +# which would require user interaction to continue the script. +# opt -basicaa -view-scops -disable-output matmul.preopt.ll +opt -basicaa -dot-scops -disable-output matmul.preopt.ll + +echo "--> 5.2 Highlight the detected SCoPs in the CFGs of the program (print \ +no instructions)" +# We only create .dot files, because -view-scops-only calls graphviz +# directly, which would require user interaction to continue the script. +# opt -basicaa -view-scops-only -disable-output matmul.preopt.ll +opt -basicaa -dot-scops-only -disable-output matmul.preopt.ll + +echo "--> 5.3 Create .png files from the .dot files" +for i in *.dot; do dot -Tpng "$i" > "$i.png"; done + +echo "--> 6. View the polyhedral representation of the SCoPs" +opt -basicaa -polly-scops -analyze matmul.preopt.ll + +echo "--> 7. Show the dependences for the SCoPs" +opt -basicaa -polly-dependences -analyze matmul.preopt.ll + +echo "--> 8. Export jscop files" +opt -basicaa -polly-export-jscop matmul.preopt.ll + +echo "--> 9. Import the updated jscop files and print the new SCoPs. 
(optional)" +opt -basicaa -polly-import-jscop -polly-cloog -analyze matmul.preopt.ll \ +    -polly-import-jscop-postfix=interchanged +opt -basicaa -polly-import-jscop -polly-cloog -analyze matmul.preopt.ll \ +    -polly-import-jscop-postfix=interchanged+tiled + +echo "--> 10. Codegenerate the SCoPs" +opt -basicaa -polly-import-jscop -polly-import-jscop-postfix=interchanged \ +    -polly-codegen \ +    matmul.preopt.ll | opt -O3 > matmul.polly.interchanged.ll +opt -basicaa -polly-import-jscop \ +    -polly-import-jscop-postfix=interchanged+tiled -polly-codegen \ +    matmul.preopt.ll | opt -O3 > matmul.polly.interchanged+tiled.ll +opt -basicaa -polly-import-jscop \ +    -polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen \ +    matmul.preopt.ll -enable-polly-vector\ +    | opt -O3 > matmul.polly.interchanged+tiled+vector.ll +opt -basicaa -polly-import-jscop \ +    -polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen \ +    matmul.preopt.ll -enable-polly-vector -enable-polly-openmp\ +    | opt -O3 > matmul.polly.interchanged+tiled+vector+openmp.ll +opt matmul.preopt.ll | opt -O3 > matmul.normalopt.ll + +echo "--> 11. 
Create the executables" +llc matmul.polly.interchanged.ll -o matmul.polly.interchanged.s && gcc matmul.polly.interchanged.s \ +    -o matmul.polly.interchanged.exe +llc matmul.polly.interchanged+tiled.ll -o matmul.polly.interchanged+tiled.s && gcc matmul.polly.interchanged+tiled.s \ +    -o matmul.polly.interchanged+tiled.exe +llc matmul.polly.interchanged+tiled+vector.ll \ +    -o matmul.polly.interchanged+tiled+vector.s \ +    && gcc matmul.polly.interchanged+tiled+vector.s \ +    -o matmul.polly.interchanged+tiled+vector.exe +llc matmul.polly.interchanged+tiled+vector+openmp.ll \ +    -o matmul.polly.interchanged+tiled+vector+openmp.s \ +    && gcc -lgomp matmul.polly.interchanged+tiled+vector+openmp.s \ +    -o matmul.polly.interchanged+tiled+vector+openmp.exe +llc matmul.normalopt.ll -o matmul.normalopt.s && gcc matmul.normalopt.s \ +    -o matmul.normalopt.exe + +echo "--> 12. Compare the runtime of the executables" + +echo "time ./matmul.normalopt.exe" +time -f "%E real, %U user, %S sys" ./matmul.normalopt.exe +echo "time ./matmul.polly.interchanged.exe" +time -f "%E real, %U user, %S sys" ./matmul.polly.interchanged.exe +echo "time ./matmul.polly.interchanged+tiled.exe" +time -f "%E real, %U user, %S sys" ./matmul.polly.interchanged+tiled.exe +echo "time ./matmul.polly.interchanged+tiled+vector.exe" +time -f "%E real, %U user, %S sys" ./matmul.polly.interchanged+tiled+vector.exe +echo "time ./matmul.polly.interchanged+tiled+vector+openmp.exe" +time -f "%E real, %U user, %S sys" ./matmul.polly.interchanged+tiled+vector+openmp.exe diff --git a/polly/www/experiments/matmul/scops.init_array.dot b/polly/www/experiments/matmul/scops.init_array.dot new file mode 100644 index 00000000000..1b3f09284f9 --- /dev/null +++ b/polly/www/experiments/matmul/scops.init_array.dot @@ -0,0 +1,47 @@ +digraph "Scop Graph for 'init_array' function" { +	label="Scop Graph for 'init_array' function"; + +	Node0x26ade30 [shape=record,label="{%0:\l\l  br label %1\l}"]; +	Node0x26ade30 
-> Node0x26acdd0; +	Node0x26acdd0 [shape=record,label="{%1:\l\l  %2 = phi i64 [ %indvar.next2, %18 ], [ 0, %0 ]\l  %exitcond5 = icmp ne i64 %2, 1536\l  br i1 %exitcond5, label %3, label %19\l}"]; +	Node0x26acdd0 -> Node0x26acdf0; +	Node0x26acdd0 -> Node0x26adce0; +	Node0x26acdf0 [shape=record,label="{%3:\l\l  br label %4\l}"]; +	Node0x26acdf0 -> Node0x26addc0; +	Node0x26addc0 [shape=record,label="{%4:\l\l  %indvar = phi i64 [ %indvar.next, %16 ], [ 0, %3 ]\l  %scevgep4 = getelementptr [1536 x [1536 x float]]* @A, i64 0, i64 %2, i64 %indvar\l  %scevgep = getelementptr [1536 x [1536 x float]]* @B, i64 0, i64 %2, i64 %indvar\l  %tmp = mul i64 %2, %indvar\l  %tmp3 = trunc i64 %tmp to i32\l  %exitcond = icmp ne i64 %indvar, 1536\l  br i1 %exitcond, label %5, label %17\l}"]; +	Node0x26addc0 -> Node0x26ace70; +	Node0x26addc0 -> Node0x26ad010; +	Node0x26ace70 [shape=record,label="{%5:\l\l  %6 = srem i32 %tmp3, 1024\l  %7 = add nsw i32 1, %6\l  %8 = sitofp i32 %7 to double\l  %9 = fdiv double %8, 2.000000e+00\l  %10 = fptrunc double %9 to float\l  store float %10, float* %scevgep4\l  %11 = srem i32 %tmp3, 1024\l  %12 = add nsw i32 1, %11\l  %13 = sitofp i32 %12 to double\l  %14 = fdiv double %13, 2.000000e+00\l  %15 = fptrunc double %14 to float\l  store float %15, float* %scevgep\l  br label %16\l}"]; +	Node0x26ace70 -> Node0x26ace90; +	Node0x26ace90 [shape=record,label="{%16:\l\l  %indvar.next = add i64 %indvar, 1\l  br label %4\l}"]; +	Node0x26ace90 -> Node0x26addc0[constraint=false]; +	Node0x26ad010 [shape=record,label="{%17:\l\l  br label %18\l}"]; +	Node0x26ad010 -> Node0x26ad6c0; +	Node0x26ad6c0 [shape=record,label="{%18:\l\l  %indvar.next2 = add i64 %2, 1\l  br label %1\l}"]; +	Node0x26ad6c0 -> Node0x26acdd0[constraint=false]; +	Node0x26adce0 [shape=record,label="{%19:\l\l  ret void\l}"]; +	colorscheme = "paired12" +        subgraph cluster_0x26a94c0 { +          label = ""; +          style = solid; +          color = 1 +          subgraph cluster_0x26aa4e0 { +     
       label = ""; +            style = filled; +            color = 3            subgraph cluster_0x26a9780 { +              label = ""; +              style = solid; +              color = 5 +              Node0x26addc0; +              Node0x26ace70; +              Node0x26ace90; +            } +            Node0x26acdd0; +            Node0x26acdf0; +            Node0x26ad010; +            Node0x26ad6c0; +          } +          Node0x26ade30; +          Node0x26adce0; +        } +} diff --git a/polly/www/experiments/matmul/scops.init_array.dot.png b/polly/www/experiments/matmul/scops.init_array.dot.pngBinary files differ new file mode 100644 index 00000000000..ee04e8b7018 --- /dev/null +++ b/polly/www/experiments/matmul/scops.init_array.dot.png diff --git a/polly/www/experiments/matmul/scops.main.dot b/polly/www/experiments/matmul/scops.main.dot new file mode 100644 index 00000000000..0459c48fb50 --- /dev/null +++ b/polly/www/experiments/matmul/scops.main.dot @@ -0,0 +1,65 @@ +digraph "Scop Graph for 'main' function" { +	label="Scop Graph for 'main' function"; + +	Node0x26ace10 [shape=record,label="{%0:\l\l  call void @init_array()\l  br label %1\l}"]; +	Node0x26ace10 -> Node0x26acd60; +	Node0x26acd60 [shape=record,label="{%1:\l\l  %indvar3 = phi i64 [ %indvar.next4, %16 ], [ 0, %0 ]\l  %exitcond9 = icmp ne i64 %indvar3, 1536\l  br i1 %exitcond9, label %2, label %17\l}"]; +	Node0x26acd60 -> Node0x26acd80; +	Node0x26acd60 -> Node0x26af2e0; +	Node0x26acd80 [shape=record,label="{%2:\l\l  br label %3\l}"]; +	Node0x26acd80 -> Node0x26aee80; +	Node0x26aee80 [shape=record,label="{%3:\l\l  %indvar1 = phi i64 [ %indvar.next2, %14 ], [ 0, %2 ]\l  %scevgep8 = getelementptr [1536 x [1536 x float]]* @C, i64 0, i64 %indvar3, i64 %indvar1\l  %exitcond6 = icmp ne i64 %indvar1, 1536\l  br i1 %exitcond6, label %4, label %15\l}"]; +	Node0x26aee80 -> Node0x26aeea0; +	Node0x26aee80 -> Node0x26aeec0; +	Node0x26aeea0 [shape=record,label="{%4:\l\l  store float 0.000000e+00, float* 
%scevgep8\l  br label %5\l}"]; +	Node0x26aeea0 -> Node0x26aced0; +	Node0x26aced0 [shape=record,label="{%5:\l\l  %indvar = phi i64 [ %indvar.next, %12 ], [ 0, %4 ]\l  %scevgep5 = getelementptr [1536 x [1536 x float]]* @A, i64 0, i64 %indvar3, i64 %indvar\l  %scevgep = getelementptr [1536 x [1536 x float]]* @B, i64 0, i64 %indvar, i64 %indvar1\l  %exitcond = icmp ne i64 %indvar, 1536\l  br i1 %exitcond, label %6, label %13\l}"]; +	Node0x26aced0 -> Node0x26ace60; +	Node0x26aced0 -> Node0x26af5e0; +	Node0x26ace60 [shape=record,label="{%6:\l\l  %7 = load float* %scevgep8\l  %8 = load float* %scevgep5\l  %9 = load float* %scevgep\l  %10 = fmul float %8, %9\l  %11 = fadd float %7, %10\l  store float %11, float* %scevgep8\l  br label %12\l}"]; +	Node0x26ace60 -> Node0x26af640; +	Node0x26af640 [shape=record,label="{%12:\l\l  %indvar.next = add i64 %indvar, 1\l  br label %5\l}"]; +	Node0x26af640 -> Node0x26aced0[constraint=false]; +	Node0x26af5e0 [shape=record,label="{%13:\l\l  br label %14\l}"]; +	Node0x26af5e0 -> Node0x26af6e0; +	Node0x26af6e0 [shape=record,label="{%14:\l\l  %indvar.next2 = add i64 %indvar1, 1\l  br label %3\l}"]; +	Node0x26af6e0 -> Node0x26aee80[constraint=false]; +	Node0x26aeec0 [shape=record,label="{%15:\l\l  br label %16\l}"]; +	Node0x26aeec0 -> Node0x26af740; +	Node0x26af740 [shape=record,label="{%16:\l\l  %indvar.next4 = add i64 %indvar3, 1\l  br label %1\l}"]; +	Node0x26af740 -> Node0x26acd60[constraint=false]; +	Node0x26af2e0 [shape=record,label="{%17:\l\l  ret i32 0\l}"]; +	colorscheme = "paired12" +        subgraph cluster_0x26a8b20 { +          label = ""; +          style = solid; +          color = 1 +          subgraph cluster_0x26a9220 { +            label = ""; +            style = filled; +            color = 3            subgraph cluster_0x26ad500 { +              label = ""; +              style = solid; +              color = 5 +              subgraph cluster_0x26ad480 { +                label = ""; +                style = solid; +     
           color = 7 +                Node0x26aced0; +                Node0x26ace60; +                Node0x26af640; +              } +              Node0x26aee80; +              Node0x26aeea0; +              Node0x26af5e0; +              Node0x26af6e0; +            } +            Node0x26acd60; +            Node0x26acd80; +            Node0x26aeec0; +            Node0x26af740; +          } +          Node0x26ace10; +          Node0x26af2e0; +        } +} diff --git a/polly/www/experiments/matmul/scops.main.dot.png b/polly/www/experiments/matmul/scops.main.dot.pngBinary files differ new file mode 100644 index 00000000000..404d5f19f38 --- /dev/null +++ b/polly/www/experiments/matmul/scops.main.dot.png diff --git a/polly/www/experiments/matmul/scops.print_array.dot b/polly/www/experiments/matmul/scops.print_array.dot new file mode 100644 index 00000000000..6aafb40d666 --- /dev/null +++ b/polly/www/experiments/matmul/scops.print_array.dot @@ -0,0 +1,60 @@ +digraph "Scop Graph for 'print_array' function" { +	label="Scop Graph for 'print_array' function"; + +	Node0x26ac9a0 [shape=record,label="{%0:\l\l  br label %1\l}"]; +	Node0x26ac9a0 -> Node0x26acd00; +	Node0x26acd00 [shape=record,label="{%1:\l\l  %indvar1 = phi i64 [ %indvar.next2, %19 ], [ 0, %0 ]\l  %exitcond3 = icmp ne i64 %indvar1, 1536\l  br i1 %exitcond3, label %2, label %20\l}"]; +	Node0x26acd00 -> Node0x26a8ac0; +	Node0x26acd00 -> Node0x26ac9c0; +	Node0x26a8ac0 [shape=record,label="{%2:\l\l  br label %3\l}"]; +	Node0x26a8ac0 -> Node0x26ad940; +	Node0x26ad940 [shape=record,label="{%3:\l\l  %indvar = phi i64 [ %indvar.next, %15 ], [ 0, %2 ]\l  %scevgep = getelementptr [1536 x [1536 x float]]* @C, i64 0, i64 %indvar1, i64 %indvar\l  %j.0 = trunc i64 %indvar to i32\l  %exitcond = icmp ne i64 %indvar, 1536\l  br i1 %exitcond, label %4, label %16\l}"]; +	Node0x26ad940 -> Node0x26acde0; +	Node0x26ad940 -> Node0x26ad9e0; +	Node0x26acde0 [shape=record,label="{%4:\l\l  %5 = load %struct._IO_FILE** @stdout, align 8\l  
%6 = load float* %scevgep\l  %7 = fpext float %6 to double\l  %8 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %5, i8* getelementptr inbounds ([5 x i8]* @.str, i32 0, i32 0), double %7)\l  %9 = srem i32 %j.0, 80\l  %10 = icmp eq i32 %9, 79\l  br i1 %10, label %11, label %14\l}"]; +	Node0x26acde0 -> Node0x26ada40; +	Node0x26acde0 -> Node0x26acfa0; +	Node0x26ada40 [shape=record,label="{%11:\l\l  %12 = load %struct._IO_FILE** @stdout, align 8\l  %13 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %12, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0))\l  br label %14\l}"]; +	Node0x26ada40 -> Node0x26acfa0; +	Node0x26acfa0 [shape=record,label="{%14:\l\l  br label %15\l}"]; +	Node0x26acfa0 -> Node0x26ad6c0; +	Node0x26ad6c0 [shape=record,label="{%15:\l\l  %indvar.next = add i64 %indvar, 1\l  br label %3\l}"]; +	Node0x26ad6c0 -> Node0x26ad940[constraint=false]; +	Node0x26ad9e0 [shape=record,label="{%16:\l\l  %17 = load %struct._IO_FILE** @stdout, align 8\l  %18 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %17, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0))\l  br label %19\l}"]; +	Node0x26ad9e0 -> Node0x26ace00; +	Node0x26ace00 [shape=record,label="{%19:\l\l  %indvar.next2 = add i64 %indvar1, 1\l  br label %1\l}"]; +	Node0x26ace00 -> Node0x26acd00[constraint=false]; +	Node0x26ac9c0 [shape=record,label="{%20:\l\l  ret void\l}"]; +	colorscheme = "paired12" +        subgraph cluster_0x26adae0 { +          label = ""; +          style = solid; +          color = 1 +          subgraph cluster_0x26aa030 { +            label = ""; +            style = solid; +            color = 6 +            subgraph cluster_0x26a9fb0 { +              label = ""; +              style = solid; +              color = 5 +              subgraph cluster_0x26adb60 { +                label = ""; +                style = solid; +                color = 7 +                Node0x26acde0; +                
Node0x26ada40; +              } +              Node0x26ad940; +              Node0x26acfa0; +              Node0x26ad6c0; +            } +            Node0x26acd00; +            Node0x26a8ac0; +            Node0x26ad9e0; +            Node0x26ace00; +          } +          Node0x26ac9a0; +          Node0x26ac9c0; +        } +} diff --git a/polly/www/experiments/matmul/scops.print_array.dot.png b/polly/www/experiments/matmul/scops.print_array.dot.pngBinary files differ new file mode 100644 index 00000000000..5b1658a291f --- /dev/null +++ b/polly/www/experiments/matmul/scops.print_array.dot.png diff --git a/polly/www/experiments/matmul/scopsonly.init_array.dot b/polly/www/experiments/matmul/scopsonly.init_array.dot new file mode 100644 index 00000000000..7ef7b1397a5 --- /dev/null +++ b/polly/www/experiments/matmul/scopsonly.init_array.dot @@ -0,0 +1,47 @@ +digraph "Scop Graph for 'init_array' function" { +	label="Scop Graph for 'init_array' function"; + +	Node0x24dfca0 [shape=record,label="{%0}"]; +	Node0x24dfca0 -> Node0x24dfdf0; +	Node0x24dfdf0 [shape=record,label="{%1}"]; +	Node0x24dfdf0 -> Node0x24dee50; +	Node0x24dfdf0 -> Node0x24def50; +	Node0x24dee50 [shape=record,label="{%3}"]; +	Node0x24dee50 -> Node0x24deec0; +	Node0x24deec0 [shape=record,label="{%4}"]; +	Node0x24deec0 -> Node0x24dfdc0; +	Node0x24deec0 -> Node0x24df0c0; +	Node0x24dfdc0 [shape=record,label="{%5}"]; +	Node0x24dfdc0 -> Node0x24defb0; +	Node0x24defb0 [shape=record,label="{%16}"]; +	Node0x24defb0 -> Node0x24deec0[constraint=false]; +	Node0x24df0c0 [shape=record,label="{%17}"]; +	Node0x24df0c0 -> Node0x24deee0; +	Node0x24deee0 [shape=record,label="{%18}"]; +	Node0x24deee0 -> Node0x24dfdf0[constraint=false]; +	Node0x24def50 [shape=record,label="{%19}"]; +	colorscheme = "paired12" +        subgraph cluster_0x24db4c0 { +          label = ""; +          style = solid; +          color = 1 +          subgraph cluster_0x24dc4e0 { +            label = ""; +            style = filled; +            color = 
3            subgraph cluster_0x24db780 { +              label = ""; +              style = solid; +              color = 5 +              Node0x24deec0; +              Node0x24dfdc0; +              Node0x24defb0; +            } +            Node0x24dfdf0; +            Node0x24dee50; +            Node0x24df0c0; +            Node0x24deee0; +          } +          Node0x24dfca0; +          Node0x24def50; +        } +} diff --git a/polly/www/experiments/matmul/scopsonly.init_array.dot.png b/polly/www/experiments/matmul/scopsonly.init_array.dot.pngBinary files differ new file mode 100644 index 00000000000..92c4f9882bd --- /dev/null +++ b/polly/www/experiments/matmul/scopsonly.init_array.dot.png diff --git a/polly/www/experiments/matmul/scopsonly.main.dot b/polly/www/experiments/matmul/scopsonly.main.dot new file mode 100644 index 00000000000..d375349730a --- /dev/null +++ b/polly/www/experiments/matmul/scopsonly.main.dot @@ -0,0 +1,65 @@ +digraph "Scop Graph for 'main' function" { +	label="Scop Graph for 'main' function"; + +	Node0x24deb60 [shape=record,label="{%0}"]; +	Node0x24deb60 -> Node0x24deaa0; +	Node0x24deaa0 [shape=record,label="{%1}"]; +	Node0x24deaa0 -> Node0x24e12a0; +	Node0x24deaa0 -> Node0x24e0e30; +	Node0x24e12a0 [shape=record,label="{%2}"]; +	Node0x24e12a0 -> Node0x24e0e00; +	Node0x24e0e00 [shape=record,label="{%3}"]; +	Node0x24e0e00 -> Node0x24e1410; +	Node0x24e0e00 -> Node0x24e1470; +	Node0x24e1410 [shape=record,label="{%4}"]; +	Node0x24e1410 -> Node0x24e1380; +	Node0x24e1380 [shape=record,label="{%5}"]; +	Node0x24e1380 -> Node0x24deaf0; +	Node0x24e1380 -> Node0x24e1620; +	Node0x24deaf0 [shape=record,label="{%6}"]; +	Node0x24deaf0 -> Node0x24e1680; +	Node0x24e1680 [shape=record,label="{%12}"]; +	Node0x24e1680 -> Node0x24e1380[constraint=false]; +	Node0x24e1620 [shape=record,label="{%13}"]; +	Node0x24e1620 -> Node0x24e16e0; +	Node0x24e16e0 [shape=record,label="{%14}"]; +	Node0x24e16e0 -> Node0x24e0e00[constraint=false]; +	Node0x24e1470 
[shape=record,label="{%15}"]; +	Node0x24e1470 -> Node0x24e01a0; +	Node0x24e01a0 [shape=record,label="{%16}"]; +	Node0x24e01a0 -> Node0x24deaa0[constraint=false]; +	Node0x24e0e30 [shape=record,label="{%17}"]; +	colorscheme = "paired12" +        subgraph cluster_0x24dfc10 { +          label = ""; +          style = solid; +          color = 1 +          subgraph cluster_0x24de570 { +            label = ""; +            style = filled; +            color = 3            subgraph cluster_0x24de7a0 { +              label = ""; +              style = solid; +              color = 5 +              subgraph cluster_0x24de720 { +                label = ""; +                style = solid; +                color = 7 +                Node0x24e1380; +                Node0x24deaf0; +                Node0x24e1680; +              } +              Node0x24e0e00; +              Node0x24e1410; +              Node0x24e1620; +              Node0x24e16e0; +            } +            Node0x24deaa0; +            Node0x24e12a0; +            Node0x24e1470; +            Node0x24e01a0; +          } +          Node0x24deb60; +          Node0x24e0e30; +        } +} diff --git a/polly/www/experiments/matmul/scopsonly.main.dot.png b/polly/www/experiments/matmul/scopsonly.main.dot.pngBinary files differ new file mode 100644 index 00000000000..f0cf154bc79 --- /dev/null +++ b/polly/www/experiments/matmul/scopsonly.main.dot.png diff --git a/polly/www/experiments/matmul/scopsonly.print_array.dot b/polly/www/experiments/matmul/scopsonly.print_array.dot new file mode 100644 index 00000000000..7c46729e31d --- /dev/null +++ b/polly/www/experiments/matmul/scopsonly.print_array.dot @@ -0,0 +1,60 @@ +digraph "Scop Graph for 'print_array' function" { +	label="Scop Graph for 'print_array' function"; + +	Node0x24df2c0 [shape=record,label="{%0}"]; +	Node0x24df2c0 -> Node0x24df2a0; +	Node0x24df2a0 [shape=record,label="{%1}"]; +	Node0x24df2a0 -> Node0x24dee90; +	Node0x24df2a0 -> Node0x24dee20; +	Node0x24dee90 
[shape=record,label="{%2}"]; +	Node0x24dee90 -> Node0x24debd0; +	Node0x24debd0 [shape=record,label="{%3}"]; +	Node0x24debd0 -> Node0x24df150; +	Node0x24debd0 -> Node0x24de990; +	Node0x24df150 [shape=record,label="{%4}"]; +	Node0x24df150 -> Node0x24df3a0; +	Node0x24df150 -> Node0x24defb0; +	Node0x24df3a0 [shape=record,label="{%11}"]; +	Node0x24df3a0 -> Node0x24defb0; +	Node0x24defb0 [shape=record,label="{%14}"]; +	Node0x24defb0 -> Node0x24df530; +	Node0x24df530 [shape=record,label="{%15}"]; +	Node0x24df530 -> Node0x24debd0[constraint=false]; +	Node0x24de990 [shape=record,label="{%16}"]; +	Node0x24de990 -> Node0x24df9a0; +	Node0x24df9a0 [shape=record,label="{%19}"]; +	Node0x24df9a0 -> Node0x24df2a0[constraint=false]; +	Node0x24dee20 [shape=record,label="{%20}"]; +	colorscheme = "paired12" +        subgraph cluster_0x24dbe40 { +          label = ""; +          style = solid; +          color = 1 +          subgraph cluster_0x24db6e0 { +            label = ""; +            style = solid; +            color = 6 +            subgraph cluster_0x24db660 { +              label = ""; +              style = solid; +              color = 5 +              subgraph cluster_0x24db5e0 { +                label = ""; +                style = solid; +                color = 7 +                Node0x24df150; +                Node0x24df3a0; +              } +              Node0x24debd0; +              Node0x24defb0; +              Node0x24df530; +            } +            Node0x24df2a0; +            Node0x24dee90; +            Node0x24de990; +            Node0x24df9a0; +          } +          Node0x24df2c0; +          Node0x24dee20; +        } +} diff --git a/polly/www/experiments/matmul/scopsonly.print_array.dot.png b/polly/www/experiments/matmul/scopsonly.print_array.dot.pngBinary files differ new file mode 100644 index 00000000000..3426e7b06fb --- /dev/null +++ b/polly/www/experiments/matmul/scopsonly.print_array.dot.png diff --git a/polly/www/get_started.html 
b/polly/www/get_started.html new file mode 100644 index 00000000000..8c176f41620 --- /dev/null +++ b/polly/www/get_started.html @@ -0,0 +1,136 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" +          "http://www.w3.org/TR/html4/strict.dtd"> +<html> +<head> +  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /> +  <title>Polly - Getting Started</title> +  <link type="text/css" rel="stylesheet" href="menu.css" /> +  <link type="text/css" rel="stylesheet" href="content.css" /> +</head> +<body> + +<!--#include virtual="menu.html.incl"--> + +<div id="content"> + +<h1>Getting Started: Building and Installing Polly</h1> + + +<h2 id="prerequisites"> Prerequisites </h2> + +The following prerequisites can be installed through the package management +system of your operating system. + +<ul> +<li>libgmp (library + developer package)</li> +</ul> + +<h3> Install ISL / CLooG libraries </h3> + +Polly requires the latest versions of <a href="http://www.cloog.org">CLooG</a> +and <a href="http://repo.or.cz/w/isl.git">isl</a> to be installed. The CLooG git +repository contains both the latest version of CLooG and isl. + +<pre> +git clone git://repo.or.cz/cloog.git +cd cloog +./get_submodules.sh +./autogen.sh +./configure --with-gmp-prefix=/path/to/gmp/installation --prefix=/path/to/cloog/installation +make +make install +</pre> + +<h3> Install Pocc (Optional) </h3> + +Polly can use <a href="http://www.cse.ohio-state.edu/~pouchet/software/pocc"> +PoCC</a> as an external optimizer. PoCC provides an +integrated version of <a href="http://pluto.sf.net">Pluto</a>, an advanced +data-locality and tileability optimizer.  To enable this feature install PoCC +1.0-rc3.1 (the one with Polly support) and add it to your PATH. 
+ +<pre> +wget <a +href="http://www.cse.ohio-state.edu/~pouchet/software/pocc/download/pocc-1.0-rc3.1-full.tar.gz">http://www.cse.ohio-state.edu/~pouchet/software/pocc/download/pocc-1.0-rc3.1-full.tar.gz</a> +tar xzf pocc-1.0-rc3.1-full.tar.gz +cd pocc-1.0-rc3.1 +./install.sh +export PATH=$PATH:`pwd`/bin +</pre> + +Furthermore, scoplib-0.2.0 has to be installed so that Polly can link against +it. + +<pre> +wget <a +href="http://www.cse.ohio-state.edu/~pouchet/software/pocc/download/modules/scoplib-0.2.0.tar.gz" +>http://www.cse.ohio-state.edu/~pouchet/software/pocc/download/modules/scoplib-0.2.0.tar.gz</a> +tar xzf scoplib-0.2.0.tar.gz +cd scoplib-0.2.0 +./configure --enable-mp-version --prefix=/path/to/scoplib/installation +make && make install +</pre> + +<h2 id="source"> Get the code </h2> + +<p> +The Polly source code is available in the LLVM SVN repository. For convenience +we also provide a git mirror. To build Polly we extract its source code into the +<em>tools</em> directory of the LLVM sources.</p> +<b>A recent LLVM checkout is needed. Older versions may not work!</b> + +<h3>SVN</h3> +<pre> +export LLVM_SRC=`pwd`/llvm +svn checkout http://llvm.org/svn/llvm-project/llvm/trunk ${LLVM_SRC} +cd ${LLVM_SRC}/tools +svn checkout http://llvm.org/svn/llvm-project/polly/trunk polly +</pre> +<h3>GIT</h3> +<pre> +export LLVM_SRC=`pwd`/llvm +git clone http://llvm.org/git/llvm.git ${LLVM_SRC} +cd ${LLVM_SRC}/tools +git clone git://repo.or.cz/polly.git +</pre> + + + +<h2 id="build">Build Polly</h2> + +To build Polly you can use either the autoconf or the cmake build system. At the +moment, only the autoconf build system can run the LLVM test-suite and only +the cmake build system can run 'make polly-test'. + +<h3>CMake</h3> + +<pre> +mkdir build +cd build +cmake ${LLVM_SRC} + +# If CMAKE cannot find CLooG and ISL +cmake -DCMAKE_PREFIX_PATH=/cloog/installation . + +# To point CMAKE to the scoplib installation +cmake -DCMAKE_PREFIX_PATH=/scoplib/installation . 
+ +make +</pre> + +<h3> Autoconf </h3> + +<pre> +mkdir build +cd build +${LLVM_SRC}/configure --with-cloog=/cloog/installation --with-isl=/cloog/installation --with-scoplib=/scoplib/installation +make +</pre> + +<h2> Test Polly</h2> + +To check if Polly works correctly you can run <em>make polly-test</em>. This +currently works only with a cmake build. +</div> +</body> +</html> diff --git a/polly/www/images/architecture.png b/polly/www/images/architecture.pngBinary files differ new file mode 100644 index 00000000000..fdd26a075f9 --- /dev/null +++ b/polly/www/images/architecture.png diff --git a/polly/www/images/iit-madras.png b/polly/www/images/iit-madras.pngBinary files differ new file mode 100644 index 00000000000..caf90ab0ad6 --- /dev/null +++ b/polly/www/images/iit-madras.png diff --git a/polly/www/images/osu.png b/polly/www/images/osu.pngBinary files differ new file mode 100644 index 00000000000..154a04b1c67 --- /dev/null +++ b/polly/www/images/osu.png diff --git a/polly/www/images/performance/parallel-large.png b/polly/www/images/performance/parallel-large.pngBinary files differ new file mode 100644 index 00000000000..76261bb4206 --- /dev/null +++ b/polly/www/images/performance/parallel-large.png diff --git a/polly/www/images/performance/parallel-small.png b/polly/www/images/performance/parallel-small.pngBinary files differ new file mode 100644 index 00000000000..3c9f6ba3246 --- /dev/null +++ b/polly/www/images/performance/parallel-small.png diff --git a/polly/www/images/performance/sequential-large.png b/polly/www/images/performance/sequential-large.pngBinary files differ new file mode 100644 index 00000000000..5c88354f188 --- /dev/null +++ b/polly/www/images/performance/sequential-large.png diff --git a/polly/www/images/performance/sequential-small.png b/polly/www/images/performance/sequential-small.pngBinary files differ new file mode 100644 index 00000000000..94b248de8b7 --- /dev/null +++ b/polly/www/images/performance/sequential-small.png diff --git 
a/polly/www/images/sys-uni.png b/polly/www/images/sys-uni.pngBinary files differ new file mode 100644 index 00000000000..e6b84e16acd --- /dev/null +++ b/polly/www/images/sys-uni.png diff --git a/polly/www/images/uni-passau.png b/polly/www/images/uni-passau.pngBinary files differ new file mode 100644 index 00000000000..4bbfa216315 --- /dev/null +++ b/polly/www/images/uni-passau.png diff --git a/polly/www/index.html b/polly/www/index.html new file mode 100644 index 00000000000..e26ea6983d8 --- /dev/null +++ b/polly/www/index.html @@ -0,0 +1,73 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"  +          "http://www.w3.org/TR/html4/strict.dtd"> +<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ --> +<html> +<head> +  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> +  <title>Polly - Polyhedral optimizations for LLVM</title> +  <link type="text/css" rel="stylesheet" href="menu.css"> +  <link type="text/css" rel="stylesheet" href="content.css"> +</head> +<body> +<!--#include virtual="menu.html.incl"--> +<div id="content"> +  <!--*********************************************************************--> +  <h1>Polly: Polyhedral optimizations for LLVM</h1> +  <!--*********************************************************************--> + +  <p>Polly is a project that works on advanced optimizations for data-locality +  and parallelism. It uses the polyhedral model, a high-level mathematical +  abstraction, to analyse and optimize the memory access pattern of a program. +  Due to the use of a polyhedral representation Polly can easily calculate +  detailed data dependency information which it uses to derive an optimized loop +  structure. Polly can speed up sequential code by improving memory locality and +  consequently the cache use.  Furthermore, Polly is able to expose different +  kinds of parallelism which it exploits by introducing (basic) OpenMP and SIMD +  code.  
The automatic use of vector accelerators is planned and will take +  advantage of the ongoing work on the LLVM PTX backend. +  </p> + +  <em> Polly is still a research project and is not yet of production quality. We are +  working on a robust implementation of Polly's core. You are invited to join us +  by directly contributing to Polly or by using it for your own research.</em> + +  <!--=====================================================================--> +  <h2>Major changes in Polly</h2> +  <!--=====================================================================--> + +  <ul> +  <li>April 2011 - Polly moves to the LLVM infrastructure </li> +  <li>March 2011 - Polly is presented at <a +  href="http://impact2011.inrialpes.fr/">CGO/IMPACT 2011</a>. Polly can compile +  polybench 2.0 with vectorization and OpenMP code generation.  </li> +  <li> February 2011 - pollycc - a script to automatically compile with +  polyhedral optimizations </li> +  <li> January 2011 - Basic OpenMP support, Alias analysis integration, +  Pluto/POCC support </li> +  <li> December 2010 - Basic vectorization support </li> +  <li> November 2010 - Talk about Polly at the <a +  href="http://llvm.org/devmtg/2010-11/">LLVM Developer Meeting</a> </li> +  <li> October 2010 - Added dependency analysis </li> +  <li> October 2010 - Finished Phase 1 - Get something working </li> +  <li> October 2010 - Support for scalar dependences and sequential SCoPs </li> +  <li> August 2010 - RegionInfo pass committed to llvm </li> +  <li> August 2010 - llvm-test suite compiles </li> +  <li> July 2010 - Code generation works for normal SCoPs.  </li> +  <li> June 2010 - OpenSCoP import/export works (as far as openscop is finished) +  </li> +  <li> May 2010 - The CLooG AST can be parsed </li> +  <li> April 2010 - SCoPs can automatically be detected (WIP) </li> +  <li> March 2010 - The RegionInfo framework is almost completed.  
</li> +  <li> February 2010 - Translating a simple loop to Polly-IR and passing it to +  CLooG-isl to regenerate a loop structure works.  </li> +  <li> February 2010 - ISL and CLooG are integrated.  </li> +  <li> January 2010 - The RegionInfo pass is finished.  </li> +  <li> End of 2009 - Work on the infrastructure started.  </li> +  </ul> +  <!--=====================================================================--> +  <h2> The architecture of Polly</h2> +  <!--=====================================================================--> +  <img src='images/architecture.png' /> +</div> +</body> +</html> diff --git a/polly/www/menu.css b/polly/www/menu.css new file mode 100644 index 00000000000..9f26687b437 --- /dev/null +++ b/polly/www/menu.css @@ -0,0 +1,39 @@ +/***************/ +/* page layout */ +/***************/ + +[id=menu] { +	width:25ex; +        float: left; +} +[id=content] { +	/* *****  EDIT THIS VALUE IF CONTENT OVERLAPS MENU ***** */ +	position:absolute; +  left:29ex; +	padding-right:4ex; +} + +/**************/ +/* menu style */ +/**************/ + +#menu .submenu { +	padding-top:1em; +	display:block; +} + +#menu label { +	display:block; +	font-weight: bold; +	text-align: center; +	background-color: rgb(192,192,192); +} +#menu a { +	padding:0 .2em; +	display:block; +	text-align: center; +	background-color: rgb(235,235,235); +} +#menu a:visited { +	color:rgb(100,50,100); +} diff --git a/polly/www/menu.html.incl b/polly/www/menu.html.incl new file mode 100644 index 00000000000..803c724b819 --- /dev/null +++ b/polly/www/menu.html.incl @@ -0,0 +1,36 @@ +<div id="menu"> +  <div> +    <a href="http://llvm.org/">LLVM Home</a> +  </div> + +  <div class="submenu"> +    <label>Polly Info</label> +    <a href="index.html">About</a> +    <a href="todo.html">Todo</a> +    <a href="passes.html">LLVM Passes</a> +<!--    <a href="examples.html">Examples</a> --> +    <a href="performance.html">Performance</a> +    <a href="publications.html">Publications</a> +    <a 
href="contributors.html">Contributors</a> +  </div> + +  <div class="submenu"> +    <label>Communication</label> +    <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits"> +      llvm-commits List +    </a> +    <a href="http://groups.google.com/group/polly-dev">polly-dev List</a> +    <a href="http://llvm.org/bugs/">Bug Reports</a> +  </div> + +  <div class="submenu"> +    <label>The Code</label> +    <a href="get_started.html#prerequisites">Prerequisites</a> +    <a href="get_started.html#source">Download</a> +    <a href="get_started.html#build">Build</a> +    <a href="http://llvm.org/viewvc/llvm-project/polly/trunk/"> +      Browse (ViewVC) +    </a> +    <a href="http://repo.or.cz/w/polly-mirror.git">Browse (GitWeb)</a> +  </div> +</div> diff --git a/polly/www/passes.html b/polly/www/passes.html new file mode 100644 index 00000000000..d53ccba6b2f --- /dev/null +++ b/polly/www/passes.html @@ -0,0 +1,68 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"  +          "http://www.w3.org/TR/html4/strict.dtd"> +<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ --> +<html> +<head> +  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> +  <title>Polly - The available LLVM passes</title> +  <link type="text/css" rel="stylesheet" href="menu.css"> +  <link type="text/css" rel="stylesheet" href="content.css"> +</head> +<body> +<!--#include virtual="menu.html.incl"--> +<div id="content"> +  <!--*********************************************************************--> +  <h1>Polly: The available LLVM passes</h1> +  <!--*********************************************************************--> + +  <p>Polly consists of a set of LLVM passes.  
</p> + +<h2>Front End</h2> +<ul> +<li><em>polly-prepare</em> Prepare code for Polly</li> +<li><em>polly-region-simplify</em> Transform refined regions into simple regions</li> +<li><em>polly-detect</em> Detect SCoPs in functions</li> +<li><em>polly-analyze-ir</em> Analyse the LLVM-IR in the detected SCoPs</li> +<li><em>polly-independent</em> Create independent blocks</li> +<li><em>polly-scops</em> Create polyhedral description of SCoPs</li> +</ul> +<h2>Middle End</h2> +<ul> +<li><em>polly-dependences</em> Calculate the dependences in a SCoP</li> +<li><em>polly-interchange</em> Perform loop interchange (work in progress)</li> +<li><em>polly-optimize</em> Optimize the SCoP using PoCC</li> +<li>Import/Export +<ul> +<li><em>polly-export-cloog</em> Export the CLooG input file +(Writes a .cloog file for each SCoP)</li> +<li><em>polly-export</em> Export SCoPs with OpenScop library +(Writes a .scop file for each SCoP)</li> +<li><em>polly-import</em> Import SCoPs with OpenScop library +(Reads a .scop file for each SCoP)</li> +<li><em>polly-export-scoplib</em> Export SCoPs with ScopLib library +(Writes a .scoplib file for each SCoP)</li> +<li><em>polly-import-scoplib</em> Import SCoPs with ScopLib library +(Reads a .scoplib file for each SCoP)</li> +<li><em>polly-export-jscop</em> Export SCoPs as JSON +(Writes a .jscop file for each SCoP)</li> +<li><em>polly-import-jscop</em> Import SCoPs from JSON +(Reads a .jscop file for each SCoP)</li> +</ul> +</li> +<li>Graphviz +<ul> +<li><em>dot-scops</em> Print SCoPs of function</li> +<li><em>dot-scops-only</em> Print SCoPs of function (without function bodies)</li> +<li><em>view-scops</em> View SCoPs of function</li> +<li><em>view-scops-only</em> View SCoPs of function (without function bodies)</li> +</ul></li> +</ul> +<h2>Back End</h2> +<ul> +<li><em>polly-cloog</em> Execute CLooG code generation</li> +<li><em>polly-codegen</em> Create LLVM-IR from the polyhedral information</li> +</ul> + +</div> +</body> +</html> diff --git 
a/polly/www/performance.html b/polly/www/performance.html new file mode 100644 index 00000000000..0d4475b92bd --- /dev/null +++ b/polly/www/performance.html @@ -0,0 +1,66 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"  +          "http://www.w3.org/TR/html4/strict.dtd"> +<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ --> +<html> +<head> <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> +  <title>Polly - Performance</title> +  <link type="text/css" rel="stylesheet" href="menu.css"> +  <link type="text/css" rel="stylesheet" href="content.css"> +</head> +<body> +<!--#include virtual="menu.html.incl"--> +<div id="content"> +<h1>Polly: Performance</h1> + +<p>To evaluate the performance benefits Polly currently provides, we compiled the +<a href="http://www.cse.ohio-state.edu/~pouchet/software/polybench/">Polybench +2.0</a> benchmark suite.  Each benchmark was run with double precision floating +point values on an Intel Core Xeon X5670 CPU @ 2.93GHz (12 cores, 24 threads) +system. We used <a href="http://pocc.sf.net">PoCC</a> and the included <a +href="http://pluto-compiler.sf.net">Pluto</a> transformations to optimize the +code. The source code of Polly and LLVM/clang was checked out on +25/03/2011.</p> + +<p>The results shown were created fully automatically without manual +interaction. We have not yet spent any time tuning the results. Hence +further improvements may be achieved by tuning the code generated by Polly, the +heuristics used by Pluto or by investigating if more code could be optimized. +As Pluto was never used at such a low level, its heuristics are probably +far from perfect. Another area where we expect larger performance improvements +is the SIMD vector code generation. At the moment, it rarely yields +performance improvements, as we did not yet include vectorization in our +heuristics. 
By changing this we should be able to significantly increase the +number of test cases that show improvements.</p> + +<p>The polybench test suite contains computation kernels from linear algebra +routines, stencil computations, image processing and data mining. Polly +recognizes the majority of them and is able to show good speedup. However, +to show similar speedup on larger examples like the SPEC CPU benchmarks Polly +still lacks support for integer casts, variable-sized multi-dimensional arrays +and probably several other constructs. This support is necessary as such +constructs appear in larger programs, but not in our limited test suite.</p> + +<h2> Sequential runs</h2> + +For the sequential runs we used Polly to create a program structure that is +optimized for data-locality. One of the major optimizations performed is tiling. +The speedups shown are without the use of any multi-core parallelism. No +additional hardware is used, but the single available core is used more +efficiently. +<h3> Small data size</h3> +<img src="images/performance/sequential-small.png" /><br /> +<h3> Large data size</h3> +<img src="images/performance/sequential-large.png" /> +<h2> Parallel runs</h2> +For the parallel runs we used Polly to expose parallelism and to add calls to an +OpenMP runtime library. With OpenMP we can use all 12 hardware cores +instead of the single core that was used before. We can see that in several +cases we obtain more than linear speedup. This additional speedup is due to +improved data-locality. 
+<h3> Small data size</h3> +<img src="images/performance/parallel-small.png" /><br /> +<h3> Large data size</h3> +<img src="images/performance/parallel-large.png" /> +</div> +</body> +</html> diff --git a/polly/www/publications.html b/polly/www/publications.html new file mode 100644 index 00000000000..afde2b1346a --- /dev/null +++ b/polly/www/publications.html @@ -0,0 +1,43 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"  +          "http://www.w3.org/TR/html4/strict.dtd"> +<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ --> +<html> +<head> +  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> +  <title>Polly - Publications</title> +  <link type="text/css" rel="stylesheet" href="menu.css"> +  <link type="text/css" rel="stylesheet" href="content.css"> +</head> +<body> +<!--#include virtual="menu.html.incl"--> +<div id="content"> +  <!--*********************************************************************--> +  <h1>Polly: Publications</h1> +  <!--*********************************************************************--> + +  <h2> 2011 </h2> +  <ul> +  <li>Polly - Polyhedral Optimization in LLVM<br /> +  Tobias Grosser, Hongbin Zheng, Raghesh Aloor, Andreas Simbürger, Armin +  Größlinger, Louis-Noël Pouchet<br /> +  IMPACT at CGO 2011 <br /> +  <a +  href="publications/grosser-impact-2011.pdf">Paper</a>, <a +  href="publications/grosser-impact-2011-slides.pdf">Slides +  </a> +  </li> +  </ul> +  <h2> 2010 </h2> +  <ul> +  <li>Polly - Polyhedral Transformations for LLVM<br /> +  Tobias Grosser, Hongbin Zheng<br /> +  LLVM Developer Meeting <br /><a +  href="http://llvm.org/devmtg/2010-11/Grosser-Polly.pdf">Slides</a>, <a +  href="http://llvm.org/devmtg/2010-11/videos/Grosser_Polly-desktop.mp4">Video +  (Computer)</a>, <a +  href="http://llvm.org/devmtg/2010-11/videos/Grosser_Polly-mobile.mp4">Video +  (Mobile)</a></li> +  </ul> +</div> +</body> +</html> diff --git a/polly/www/publications/grosser-impact-2011-slides.pdf 
b/polly/www/publications/grosser-impact-2011-slides.pdfBinary files differ new file mode 100644 index 00000000000..3e604108a8d --- /dev/null +++ b/polly/www/publications/grosser-impact-2011-slides.pdf diff --git a/polly/www/publications/grosser-impact-2011.pdf b/polly/www/publications/grosser-impact-2011.pdfBinary files differ new file mode 100644 index 00000000000..9b79bd26fa0 --- /dev/null +++ b/polly/www/publications/grosser-impact-2011.pdf diff --git a/polly/www/todo.html b/polly/www/todo.html new file mode 100644 index 00000000000..a8b589d3b62 --- /dev/null +++ b/polly/www/todo.html @@ -0,0 +1,363 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"  +          "http://www.w3.org/TR/html4/strict.dtd"> +<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ --> +<html> +<head> <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> +  <title>Polly - Todo</title> +  <link type="text/css" rel="stylesheet" href="menu.css"> +  <link type="text/css" rel="stylesheet" href="content.css"> +</head> +<body> +<!--#include virtual="menu.html.incl"--> +<div id="content"> +<h3> Setup infrastructure at LLVM </h3> + +<p>We are currently moving to the LLVM infrastructure +</p> +<table class="wikitable" cellpadding="2"> + +<tbody><tr> +<th width="400px"> Task +</th><th width="150px"> Status +</th><th>Owner +</th></tr> +<tr> +<th align="left"> Move to LLVM svn +</th><td align="center"> done +</td><td> Tobias + +</td></tr> +<tr> +<th align="left"> Get git mirror at llvm.org/git/polly.git +</th><td align="center"> done +</td><td> Tobias +</td></tr> +<tr> +<th align="left"> Setup on commit mails +</th><td align="center"> +</td><td> Tobias +</td></tr> +<tr> + +<th align="left"> Add LLVM Bugzilla category +</th></tr> +<tr> +<th align="left"> Create polly.llvm.org website +</th><td> +</td></tr> +<tr> +<th align="left"> Add Polly buildbot that runs 'polly-test' +</th><td> +</td></tr> +<tr> +<th align="left"> Run nightly performance/coverage tests with 
the llvm +test-suite +</th><td> + +</td></tr> +</tbody></table> +<h3> Phase 2 </h3> +<p>The second phase of Polly can build on a robust, but very limited framework. +In this phase work on removing limitations and extending the framework is +planned. Also we plan the first very simple transformations. Furthermore the +build system will be improved to simplify deployment. +</p> +<table class="wikitable" cellpadding="2"> + +<tbody><tr> + +<th colspan="3" style="background: none repeat scroll 0% 0% rgb(239, 239, +239);"> Frontend +</th></tr> +<tr> +<th width="400px"> Task +</th><th width="150px"> Status +</th><th>Owner +</th></tr> +<tr> +<th align="left"> Support for casts in expressions +</th><td> +</td><td> +</td><td> +</td></tr> + +<tr> +<th align="left"> Support multi dimensional arrays. +</th><td align="center"> planning +</td><td> Tobias +</td></tr> +<tr> +<th align="left"> Alias sets +</th></tr> +<tr> +<th colspan="4" style="background: none repeat scroll 0% 0% rgb(239, 239, +239);"> Middle end +</th></tr> +<tr> + +<th width="400px"> Task +</th><th width="80px"> Status +</th><th>Owner +</th></tr> +<tr> +<th align="left"> Implement ISL dependency analysis pass +</th><td align="center"> working +</td><td> Tobias +</td></tr> +<tr> +<th align="left"> Connect pocc/pluto + +</th><td align="center"> working +</td><td> Tobias +</td></tr> +<tr> +<th align="left"> Finish OpenSCoP support +</th><td> +</td><td> +</td><td> +</td></tr> + +<tr> +<th align="left"> Add SCoPLib 0.2 support to connect pocc +</th><td align="center">done + +</td><td> Tobias +</td></tr> +<tr> +<th align="left"> Write simple loop blocking +</th><td> +</td><td> +</td><td> +</td></tr> +<tr> +<th colspan="4" style="background: none repeat scroll 0% 0% rgb(239, 239, +239);"> Backend +</th></tr> +<tr> +<th width="400px"> Task + +</th><th width="80px"> Status +</th><th>Owner +</th></tr> +<tr> +<th align="left"> Code generation for non 64bit targets +</th><td> +</td><td> +</td><td> +</td></tr> +<tr> +<th 
align="left"> Add write support for data access functions +</th><td> +</td><td> +</td><td> + +</td></tr> +<tr> +<th align="left"> Create vector loops +</th><td align="center">70% done +</td><td>Tobias +</td></tr> +<tr> +<th align="left">Create OpenMP +loops +</th><td align="center">90% done +</td><td> Raghesh &amp; Tobias + +</td></tr> +<tr> +<th colspan="4" style="background: none repeat scroll 0% 0% rgb(239, 239, +239);"> General tasks +</th></tr> +<tr> +<th width="300px"> Task +</th><th width="80px"> Status +</th><th>Owner +</th></tr> + +<tr> +<th align="left"> Commit RegionPass patch upstream +</th><td align="center"> done + +</td><td> Tobias +</td></tr> +<tr> +<th align="left"> Build against an installed LLVM +</th><td> works for cmake +</td><td> Ether +</td></tr> +<tr> +<th align="left"> Set up buildbot regression testers using LNT +</th><td> +</td><td> +</td><td> + +</td></tr> +</tbody></table> +<h3>Phase 1 - Get something +working (Finished October 2010)</h3> +<p>The first iteration of this project aims to create a minimal working version +of this framework that is capable of transforming an LLVM-IR program to the +polyhedral model +and back to LLVM-IR without applying any transformations. 
+</p> +<table class="wikitable" cellpadding="2"> + +<tbody><tr> +<th colspan="4" style="background: none repeat scroll 0% 0% rgb(239, 239, +239);"> Frontend +</th></tr> + +<tr> +<th width="300px"> Task +</th><th width="150px"> Status +</th><th>Owner +</th></tr> +<tr> +<th align="left"> Region detection +</th><td> works + Committed upstream +</td><td>Ether +</td></tr> +<tr> +<th align="left"> Access Functions +</th><td> Working +</td><td> John + Ether +</td></tr> +<tr> +<th align="left"> Alias sets +</th><td> Still open +</td></tr> +<tr> +<th align="left"> Scalar evolution to affine expression +</th><td> Done + +</td><td> +Ether +</td></tr> +<tr> +<th align="left"> SCoP extraction +</th><td> Working +</td><td>Tobias + Ether + +</td></tr> +<tr> +<th align="left"> SCoPs to polyhedral model +</th><td> Working +</td><td>Tobias + Ether +</td></tr> +<tr> +<th colspan="4" style="background: none repeat scroll 0% 0% rgb(239, 239, +239);"> Middle end + +</th></tr> +<tr> +<th width="300px"> Task +</th><th width="80px"> Status +</th><th>Owner +</th></tr> +<tr> +<th align="left"> Define polyhedral description +</th><td> Working +</td><td>Tobias + +</td></tr> +<tr> +<th align="left"> Import/Export using openscop +</th><td> working +</td><td>Tobias +</td></tr> +<tr> +<th colspan="4" style="background: none repeat scroll 0% 0% rgb(239, 239, +239);"> Backend +</th></tr> +<tr> + +<th width="300px"> Task +</th><th width="80px"> Status +</th><th>Owner +</th></tr> +<tr> +<th align="left"> Create LLVM-IR using CLooG +</th><td> Working +</td><td> Tobias + +</td></tr> +<tr> +<th colspan="4" style="background: none repeat scroll 0% 0% rgb(239, 239, +239);"> General tasks +</th></tr> +<tr> +<th width="300px"> Task +</th><th width="80px"> Status +</th><th>Owner +</th></tr> +<tr> +<th align="left"> Setup git repositories + +</th><td> Done +</td><td> Tobias +</td></tr> +<tr> +<th align="left"> Add CLooG/isl to build system +</th><td> Works on Unix +</td><td> Tobias + 
+</td></tr></tbody></table> +<h3>Further projects </h3> +<p>There are several great projects related to Polly that can already be started +or are possible in the near future.  </p> +<h4>Extend the post dominance analysis for infinite loops (small)</h4> + +<p>At the moment the post dominance analysis cannot handle infinite loops. All +basic blocks in the CFG that do not return are - at the moment - not part of the +post dominator tree. +However, by adding some virtual edges, they could be added to the post dominator +tree. Where to add the edges needs some research. +</p><p>This is a small project that is well defined. As it is directly in +LLVM it can be easily committed upstream. It is useful for Polly, as the +RegionInfo pass will be able to detect regions in parts of the CFG that never +return. +</p><p><i>A good starter to get into LLVM</i> +</p> +<h4>Vectorization </h4> +<p>It is planned to use Polly to support vectorization in LLVM. +</p><p>The basic idea is to use Polly and the polyhedral tools to transform code +such that the innermost loops can be executed in parallel. Afterwards, during +code generation in LLVM, the loops will be created using vector instructions. +</p><p>To start we plan to use <a href="http://pluto-compiler.sf.net" +class="external text" title="http://pluto-compiler.sf.net" +rel="nofollow">Pluto</a> to transform the loop nests. Pluto can generate vector +parallel code and annotate the vector parallel loops. Some impressive results +were shown on code that was afterwards vectorized by icc with vectorization +enforced. We believe LLVM can do even better, as it can interact directly +with the polyhedral information. 
+ +</p><p>As an example, consider a simple matrix multiplication: +</p> +<pre> +for(i=0; i&lt;M; i++) +  for(j=0; j&lt;N; j++) +    for(k=0; k&lt;K; k++) +      C[i][j] = beta*C[i][j] + alpha*A[i][k] * B[k][j]; +</pre> +<p>After Pluto's transformations with added tiling and +vectorization hints: +</p> +<pre> +if ((K &gt;= 1) &amp;&amp; (M &gt;= 1) &amp;&amp; (N &gt;= 1)) +  for (t1=0;t1&lt;=floord(M-1,32);t1++) +    for (t2=0;t2&lt;=floord(N-1,32);t2++) +      for (t3=0;t3&lt;=floord(K-1,32);t3++) +        for (t4=32*t1;t4&lt;=min(M-1,32*t1+31);t4++) +          for (t5=32*t3;t5&lt;=min(K-1,32*t3+31);t5++) { +            lbv=32*t2; +            ubv=min(N-1,32*t2+31); +            #pragma ivdep +            #pragma vector always +            for (t6=lbv; t6&lt;=ubv; t6++) +              C[t4][t6]=beta*C[t4][t6]+alpha*A[t4][t5]*B[t5][t6];; +         } +</pre> +<p>In this example the innermost loop is parallel without any dependencies. </p> +</div> +</body> +</html> | 
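The tiled loop nest quoted in todo.html can be checked against the original loop nest with a small stand-alone C sketch. This is not Pluto or Polly output: the function names (`matmul_naive`, `matmul_tiled`, `tiled_matches_naive`), the flat row-major storage and the test data are assumptions made for illustration. Integer data is used instead of floating point so that the comparison is exact.

```c
#include <stdlib.h>

/* Tile size; 32 matches the tile size in the quoted Pluto output. */
enum { TILE = 32 };

static int min_int(int a, int b) { return a < b ? a : b; }

/* Reference kernel: the untransformed loop nest.
 * Matrices are stored row-major in flat arrays (an assumption of this sketch). */
static void matmul_naive(int M, int N, int K, long alpha, long beta,
                         long *C, const long *A, const long *B)
{
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < K; k++)
                C[i * N + j] = beta * C[i * N + j]
                             + alpha * A[i * K + k] * B[k * N + j];
}

/* Tiled kernel using the loop order of the quoted output: tiles over
 * i, j, k, then i and k inside the tile, with j as the innermost loop. */
static void matmul_tiled(int M, int N, int K, long alpha, long beta,
                         long *C, const long *A, const long *B)
{
    if (K < 1 || M < 1 || N < 1)
        return;
    for (int t1 = 0; t1 <= (M - 1) / TILE; t1++)
        for (int t2 = 0; t2 <= (N - 1) / TILE; t2++)
            for (int t3 = 0; t3 <= (K - 1) / TILE; t3++)
                for (int t4 = TILE * t1; t4 <= min_int(M - 1, TILE * t1 + TILE - 1); t4++)
                    for (int t5 = TILE * t3; t5 <= min_int(K - 1, TILE * t3 + TILE - 1); t5++)
                        for (int t6 = TILE * t2; t6 <= min_int(N - 1, TILE * t2 + TILE - 1); t6++)
                            C[t4 * N + t6] = beta * C[t4 * N + t6]
                                           + alpha * A[t4 * K + t5] * B[t5 * N + t6];
}

/* Hypothetical helper: returns 1 if both kernels produce the same C,
 * 0 if they differ, -1 on allocation failure. Expects M, N, K >= 1. */
int tiled_matches_naive(int M, int N, int K)
{
    size_t m = (size_t)M, n = (size_t)N, k = (size_t)K;
    long *A = malloc(m * k * sizeof *A);
    long *B = malloc(k * n * sizeof *B);
    long *C1 = malloc(m * n * sizeof *C1);
    long *C2 = malloc(m * n * sizeof *C2);
    int ok = -1;

    if (A && B && C1 && C2) {
        /* Small deterministic values keep every intermediate result tiny. */
        for (size_t x = 0; x < m * k; x++) A[x] = (long)(x % 7);
        for (size_t x = 0; x < k * n; x++) B[x] = (long)(x % 5) - 2;
        for (size_t x = 0; x < m * n; x++) C1[x] = C2[x] = (long)(x % 3);

        /* beta = -1 makes the result depend on the order of the k updates. */
        matmul_naive(M, N, K, 2, -1, C1, A, B);
        matmul_tiled(M, N, K, 2, -1, C2, A, B);

        ok = 1;
        for (size_t x = 0; x < m * n; x++)
            if (C1[x] != C2[x])
                ok = 0;
    }
    free(A); free(B); free(C1); free(C2);
    return ok;
}
```

Calling `tiled_matches_naive(70, 45, 33)` exercises partial tiles in all three dimensions. Both kernels visit k in increasing order for every element of C, so the update sequence per element is unchanged by the tiling.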

