summaryrefslogtreecommitdiffstats
path: root/polly/lib/Transform/ScheduleOptimizer.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* Full/partial tile separation for vectorizationTobias Grosser2015-10-201-0/+99
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We isolate full tiles from partial tiles to be able to, for example, vectorize loops with parametric lower and/or upper bounds. If we use -polly-vectorizer=stripmine, we can see execution-time improvements: correlation from 1m7361s to 0m5720s (-67.05 %), covariance from 1m5561s to 0m5680s (-63.50 %), ary3 from 2m3201s to 1m2361s (-46.72 %), CrystalMk from 8m5565s to 7m4285s (-13.18 %). The current full/partial tile separation increases compile-time more than necessary. As a result, we see in compile time regressions, for example, for 3mm from 0m6320s to 0m9881s (56.34%). Some of this compile time increase is expected as we generate more IR and consequently more time is spent in the LLVM backends. However, a first investiagation has shown that a larger portion of compile time is unnecessarily spent inside Polly's parallelism detection and could be eliminated by propagating existing knowledge about vector loop parallelism. Before enabling -polly-vectorizer=stripmine by default, it is necessary to address this compile-time issue. Contributed-by: Roman Gareev <gareevroman@gmail.com> Reviewers: jdoerfert, grosser Subscribers: grosser, #polly Differential Revision: http://reviews.llvm.org/D13779 llvm-svn: 250809
* [NFC] Consistenly use commented and annotated ScopPass functionsJohannes Doerfert2015-09-271-3/+8
| | | | | | | | | | | The changes affect methods that are part of the Pass interface and include: - Comments that describe the methods purpose. - A consistent use of the keywords override and virtual. Additionally, the printScop method is now optional and removed from SCoP passes that do not implement it. llvm-svn: 248685
* [NFC] Use releaseMemory to release internal memoryJohannes Doerfert2015-09-271-4/+2
| | | | llvm-svn: 248684
* Make our data-locality schedule tree transforms externally accessibleTobias Grosser2015-08-241-141/+48
| | | | | | | | Other passes which perform different optimizations might be interested in also applying data-locality transformations as part of their overall transformation. llvm-svn: 245824
* Use marker nodes to annotate the different levels of tilingTobias Grosser2015-08-231-7/+26
| | | | | | | Currently, marker nodes are ignored during AST generation, but visible in the -debug-only=polly-ast output. llvm-svn: 245809
* Do really not unroll the vector loop in combination with register tilingTobias Grosser2015-08-201-3/+2
| | | | | | | The previous commit lacked a test case for register tiling + pre-vectorization and we obviously got it immediately wrong. llvm-svn: 245599
* Add experimental support for trivial register tilingTobias Grosser2015-08-201-0/+30
| | | | | | | | | | Register tiling in Polly is for now just an additional level of tiling which is fully unrolled. It is disabled by default. To make this useful for more than experiments, we still need a cost function as well as possibly further optimizations that teach LLVM to actually put some of the values we got into scalar registers. llvm-svn: 245564
* Add support for two-level tilingTobias Grosser2015-08-201-14/+34
| | | | | | | | | | By default we only use one level of tiling for loops, but in general tiling for multiple levels is trivial for us. Hence, we add a set of options that allow people to play with a second level of tiling. If this is profitable for some cases we can work on heuristics that allow us to identify these cases and use two-level tiling for them. llvm-svn: 245563
* Factor out check for tileable band node.Tobias Grosser2015-08-201-7/+27
| | | | llvm-svn: 245559
* Introduce tileBand function to simplify codeTobias Grosser2015-08-201-19/+31
| | | | llvm-svn: 245558
* Add some forgotten isl memory annotationsTobias Grosser2015-08-201-2/+3
| | | | llvm-svn: 245557
* Make prevectorization width configurableTobias Grosser2015-08-191-2/+8
| | | | | | | | | | | | | | | Polly uses 'prevectorization' to enable outer loop vectorization. When vectorizing an outer loop, we strip-mine <number-of-prevec-dims> loop iterations which are than interchanged to the innermost level such that LLVM's inner loop vectorizer (or Polly's simple vectorizer) can easily vectorize this loop. The number of loop iterations to strip-mine is now configurable with the option -polly-prevect-width=<number-of-prevec-dims>. This is mostly a debugging option. We should probably add a heuristic that derives the number of prevectorization dimensions from the target data and the data types used. llvm-svn: 245424
* Do not use negative option nameTobias Grosser2015-08-191-9/+5
| | | | | | Instead of -polly-no-tiling, we use -polly-tiling=false to disable tiling. llvm-svn: 245423
* Simplify tiling code a bitTobias Grosser2015-08-191-19/+15
| | | | | | | We only need to allocate the tile size vector if we actually want to perform a tiling. llvm-svn: 245422
* AST Generation Paper published in TOPLASTobias Grosser2015-08-151-7/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The July issue of TOPLAS contains a 50 page discussion of the AST generation techniques used in Polly. This discussion gives not only an in-depth description of how we (re)generate an imperative AST from our polyhedral based mathematical program description, but also gives interesting insights about: - Schedule trees: A tree-based mathematical program description that enables us to perform loop transformations on an abstract level, while issues like the generation of the correct loop structure and loop bounds will be taken care of by our AST generator. - Polyhedral unrolling: We discuss techniques that allow the unrolling of non-trivial loops in the context of parameteric loop bounds, complex tile shapes and conditionally executed statements. Such unrolling support enables the generation of predicated code e.g. in the context of GPGPU computing. - Isolation for full/partial tile separation: We discuss native support for handling full/partial tile separation and -- in general -- native support for isolation of boundary cases to enable smooth code generation for core computations. - AST generation with modulo constraints: We discuss how modulo mappings are lowered to efficient C/LLVM code. - User-defined constraint sets for run-time checks We discuss how arbitrary sets of constraints can be used to automatically create run-time checks that ensure a set of constrainst actually hold. This feature is very useful to verify at run-time various assumptions that have been taken program optimization. Polyhedral AST generation is more than scanning polyhedra Tobias Grosser, Sven Verdoolaege, Albert Cohen ACM Transations on Programming Languages and Systems (TOPLAS), 37(4), July 2015 llvm-svn: 245157
* Rewrite getPrevectorMap using schedule trees operationsTobias Grosser2015-07-281-82/+41
| | | | | | | | | | | | | | | | Schedule trees are a lot easier to work with, for both humans and machines. For humans the more structured schedule representation is easier to reason about. Together with the more abstract isl programming interface this can result in a lot cleaner code (see this changeset). For machines, the structured schedule and the fact that we now use explicit piecewise affine expressions instead of integer maps makes it easier to generate code from this schedule tree. As a result, we can already see a slight compile-time improvement -- for 3mm from 0m0.593s to 0m0.551s seconds (-7 %). More importantly, future optimizations such as full-partial tile separation will most likely result in more streamlined code to be generated. Contributed-by: Roman Gareev <gareevroman@gmail.com> llvm-svn: 243458
* Simplify some isl expression we useTobias Grosser2015-07-261-3/+1
| | | | | Suggested-by: Sven Verdoolaege <skimo-polly@kotnet.org> llvm-svn: 243254
* Prevectorize the schedule of the band (or the point loop in case of tiling)Tobias Grosser2015-07-251-10/+12
| | | | | Contributed-by: Roman Gareev <gareevroman@gmail.com> llvm-svn: 243214
* Use schedule trees to represent execution order of statementsTobias Grosser2015-07-141-31/+26
| | | | | | | | | | | | | | | | | | Instead of flat schedules, we now use so-called schedule trees to represent the execution order of the statements in a SCoP. Schedule trees make it a lot easier to analyze, understand and modify properties of a schedule, as specific nodes in the tree can be choosen and possibly replaced. This patch does not yet fully move our DependenceInfo pass to schedule trees, as some additional performance analysis is needed here. (In general schedule trees should be faster in compile-time, as the more structured representation is generally easier to analyze and work with). We also can not yet perform the reduction analysis on schedule trees. For more information regarding schedule trees, please see Section 6 of https://lirias.kuleuven.be/handle/123456789/497238 llvm-svn: 242130
* Update ISL to isl-0.15-3-g532568aMichael Kruse2015-06-181-5/+5
| | | | | | | | | | | | | | This version adds small integer optimization, but is not active by default. It will be enabled in a later commit. The schedule-fuse=min/max option has been replaced by the serialize-sccs option. Adapting Polly was necessary, but retaining the name polly-opt-fusion=min/max. Differential Revision: http://reviews.llvm.org/D10505 Reviewers: grosser llvm-svn: 240027
* Dump YAML schedule tree as properly indented tree in DEBUG outputTobias Grosser2015-05-301-1/+8
| | | | llvm-svn: 238645
* Update isl to 93b8e43dTobias Grosser2015-05-281-1/+1
| | | | | | | This update brings mostly interface cleanups, but also fixes two bugs in imath (a memory leak, some undefined behavior). llvm-svn: 238422
* Use value semantics for list of ScopStmt(s) instead of std::owningptrTobias Grosser2015-05-271-4/+4
| | | | | | | | | | | | | | | | David Blaike suggested this as an alternative to the use of owningptr(s) for our memory management, as value semantics allow to avoid the additional interface complexity caused by owningptr while still providing similar memory consistency guarantees. We could also have used a std::vector, but the use of std::vector would yield possibly changing pointers which currently causes problems as for example the memory accesses carry pointers to their parent statements. Such pointers should not change. Reviewer: jblaikie, jdoerfert Differential Revision: http://reviews.llvm.org/D10041 llvm-svn: 238290
* Use unique_ptr to clarify ownership of ScopStmtTobias Grosser2015-05-231-1/+1
| | | | llvm-svn: 238090
* Replace low-level constraint building with higher level functionsTobias Grosser2015-05-211-13/+6
| | | | | | | | Instead of explicitly building constraints and adding them to our maps we now use functions like map_order_le to add the relevant information to the maps. llvm-svn: 237934
* Add explicit #includes for used isl featuresTobias Grosser2015-05-091-0/+2
| | | | llvm-svn: 236931
* Sort include directivesTobias Grosser2015-05-091-7/+7
| | | | | | | | | | Upcoming revisions of isl require us to include header files explicitly, which have previously been already transitively included. Before we add them, we sort the existing includes. Thanks to Chandler for sort_includes.py. A simple, but very convenient script. llvm-svn: 236930
* Rename 'scattering' to 'schedule'Tobias Grosser2015-04-211-1/+1
| | | | | | | | | | | | | | | | In Polly we used both the term 'scattering' and the term 'schedule' to describe the execution order of a statement without actually distinguishing between them. We now uniformly use the term 'schedule' for the execution order. This corresponds to the terminology of isl. History: CLooG introduced the term scattering as the generated code can be used as a sequential execution order (schedule) or as a parallel dimension enumerating different threads of execution (placement). In Polly and/or isl the term placement was never used, but we uniformly refer to an execution order as a schedule and only later introduce parallelism. When doing so we do not talk about about specific placement dimensions. llvm-svn: 235380
* Make -polly-no-tiling work againTobias Grosser2015-04-051-1/+8
| | | | llvm-svn: 234125
* Do not scale tile loopsTobias Grosser2015-03-311-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We now generate tile loops as: for (int c1 = 0; c1 <= 47; c1 += 1) for (int c2 = 0; c2 <= 47; c2 += 1) for (int c3 = 0; c3 <= 31; c3 += 1) for (int c4 = 0; c4 <= 31; c4 += 4) #pragma simd for (int c5 = c4; c5 <= c4 + 3; c5 += 1) Stmt_for_body3(32 * c1 + c3, 32 * c2 + c5); instead of for (int c1 = 0; c1 <= 1535; c1 += 32) for (int c2 = 0; c2 <= 1535; c2 += 32) for (int c3 = 0; c3 <= 31; c3 += 1) for (int c4 = 0; c4 <= 31; c4 += 4) #pragma simd for (int c5 = c4; c5 <= c4 + 3; c5 += 1) Stmt_for_body3(c1 + c3, c2 + c5); Run-time performance-wise this makes little difference, but this gives a large reduction in compile time (10-30% on 17 LNT benchmarks). Apparently the isl AST generator is not yet very efficient in generating the latter. llvm-svn: 233675
* Use schedule trees to perform post-scheduling transformationsTobias Grosser2015-03-221-220/+76
| | | | | | | | | | | | | | | | | | | Replacing the old band_tree based code with code that is based on the new schedule tree [1] interface makes applying complex schedule transformations a lot more straightforward. We now do not need to reason about the meaning of flat schedules, but can use a more straightforward tree structure. We do not yet exploit this a lot in the current code, but hopefully we will be able to do so soon. This change also allows us to drop some code, as isl now provides some higher level interfaces to apply loop transformations such as tiling. This change causes some small test case changes as isl uses a slightly different way to perform loop tiling, but no significant functional changes are intended. [1] http://impact.gforge.inria.fr/impact2014/papers/impact2014-verdoolaege.pdf llvm-svn: 232911
* Add some missing __isl_give/__isl_keep annotationsTobias Grosser2015-03-191-7/+10
| | | | llvm-svn: 232711
* Create a dependence struct to hold dependence information for a SCoP.Johannes Doerfert2015-03-051-11/+11
| | | | | | | | | | | The new Dependences struct in the DependenceInfo holds all information that was formerly part of the DependenceInfo. It also provides the same interface for the user to access this information. This is another step to a more general ScopPass interface that does allow multiple SCoPs to be "in flight". llvm-svn: 231327
* Rename the Dependences pass to DependenceInfo [NFC]Johannes Doerfert2015-03-041-11/+11
| | | | | | | | | | We rename the Dependences pass to DependenceInfo as a first step to a caching pass policy. The new DependenceInfo pass will later provide "Dependences" for a SCoP. To keep consistency the test folder is renamed too. llvm-svn: 231308
* [Refactor] Use virtual and override appropriatelyJohannes Doerfert2015-03-011-4/+4
| | | | | | | + Add override for overwritten methods. + Remove virtual for methods we do not want to be overwritten. llvm-svn: 230898
* [Refactor] Add a Scop & as argument to printScopJohannes Doerfert2015-03-011-2/+2
| | | | | | This is the first step in the interface simplification. llvm-svn: 230897
* Do not try to optimize empty SCoPs.Johannes Doerfert2015-02-141-0/+8
| | | | llvm-svn: 229253
* Add early exits for SCoPs we did not optimizeJohannes Doerfert2015-02-111-3/+37
| | | | | | | | | | | | | This allows us to skip ast and code generation if we did not optimize a SCoP and will not generate parallel or alias annotations. The initial heuristic to exit is simple but allows improvements later on. All failing test cases have been modified to disable early exit, thus to keep their coverage. Differential Revision: http://reviews.llvm.org/D7254 llvm-svn: 228851
* Drop an unused parameterTobias Grosser2015-01-211-5/+3
| | | | llvm-svn: 226739
* Use stringFromIslObj instead of isl_..._dump to print to dbgs()Tobias Grosser2014-10-221-6/+5
| | | | | | | | | This makes sure we consistently use dbgs() when printing debug output. Previously, the code just mixed calls to isl_*_dump() with printing to dbgs() and was relying for both methods to interact in predictable ways (same output stream, no unexpected reordering of outputs). llvm-svn: 220443
* [Fix] isl usage errors in ScheduleOptimizerJohannes Doerfert2014-08-201-3/+3
| | | | llvm-svn: 216084
* clang-format polly to avoid buildbot noiseTobias Grosser2014-07-091-36/+34
| | | | llvm-svn: 212609
* [C++11] Use more range based forsTobias Grosser2014-06-281-6/+4
| | | | llvm-svn: 211981
* Added option for n-dimensional rectangular tilingJohannes Doerfert2014-05-281-9/+23
| | | | | | | | | | | | | | + CL-option --polly-tile-sizes=<int,...,int> The i'th value is used as a tile size for dimension i, if there is no i'th value, the value of --polly-default-tile-size is used + CL-option --polly-default-tile-size=int Used if no tile size is given for a dimension i + 3 Simple testcases llvm-svn: 209753
* [Modules] Fix potential ODR violations by sinking the DEBUG_TYPEChandler Carruth2014-04-221-2/+2
| | | | | | | | | | definition below all of the header #include lines, Polly edition. If you want to know more details about this, you can see the recent commits to Debug.h in LLVM. This is just the Polly segment of a cleanup I'm doing globally for this macro. llvm-svn: 206852
* [C++11] Use nullptrTobias Grosser2014-04-161-3/+3
| | | | llvm-svn: 206361
* Fixed gcc build warningsTobias Grosser2014-04-111-1/+3
| | | | | | | + vim 'fixed' line endings in json_value.cpp Contributed-by: Johannes Doerfert <doerfert@cs.uni-saarland.de> llvm-svn: 206044
* clang-format: Remove empty linesTobias Grosser2014-03-211-1/+0
| | | | llvm-svn: 204468
* Allow several polly command line options to be provided multiple timesTobias Grosser2014-03-131-7/+10
| | | | | Contributed-by: Sam Novak <snovak@uwsp.edu> llvm-svn: 203869
* Move transformations into own directoryAndreas Simbuerger2014-03-111-0/+610
Move all transformations into their own directory. CMakeLists are adjusted accordingly. llvm-svn: 203607
OpenPOWER on IntegriCloud