| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
| |
This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in
http://reviews.llvm.org/D21488 and was split out to increase readability.
llvm-svn: 273437
|
|
|
|
|
|
|
|
|
|
| |
Instead of using 0 or NULL use the C++11 nullptr symbol when referencing null
pointers.
This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in
http://reviews.llvm.org/D21488 and was split out to increase readability.
llvm-svn: 273435
|
|
|
|
|
|
| |
a band node.
llvm-svn: 273400
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the first patch to apply the BLIS matmul optimization pattern
on matmul kernels
(http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel,
plus two packing routines. The macro-kernel is implemented in terms
of two additional loops around a micro-kernel. The micro-kernel
is a loop around a rank-1 (i.e., outer product) update.
In this change we create the BLIS micro-kernel by applying
a combination of tiling and unrolling. In subsequent changes
we will add the extraction of the BLIS macro-kernel
and implement the packing transformation.
Contributed-by: Roman Gareev <gareevroman@gmail.com>
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: http://reviews.llvm.org/D21140
llvm-svn: 273397
|
|
|
|
| |
llvm-svn: 272515
|
|
|
|
|
|
| |
by "&&", "||".
llvm-svn: 272168
|
|
|
|
|
|
| |
contains a correct partial schedule
llvm-svn: 271780
|
|
|
|
|
|
|
|
|
|
|
| |
circularShiftOutputDims
Reported-by: Mehdi Amini <mehdi.amini@apple.com>
Contributed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: http://reviews.llvm.org/D20969
llvm-svn: 271705
|
|
|
|
|
|
|
|
|
|
| |
multiplication
Fix small issues related to characters, operators and descriptions of tests.
Differential Revision: http://reviews.llvm.org/D20806
llvm-svn: 271264
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Created a new pass ScopInfoRegionPass. As name suggests, it is a
region pass and it is there to preserve compatibility with our
existing Polly passes. ScopInfoRegionPass will return a SCoP object
for a valid region while the creation of the SCoP stays in the
ScopInfo class.
Contributed-by: Utpal Bora <cs14mtech11017@iith.ac.in>
Reviewed-by: Tobias Grosser <tobias@grosser.es>,
Johannes Doerfert <doerfert@cs.uni-saarland.de>
Differential Revision: http://reviews.llvm.org/D20770
llvm-svn: 271259
|
|
|
|
|
|
|
|
|
| |
This header is required to make the ISO 646 alternative operator
spellings ("and", "or" instead of "&&", "||") work. Should these
operators be replaced by the standard ones as already suggested by
Johannes, also remove this #include again.
llvm-svn: 271206
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add determination of statements that contain, in particular,
matrix multiplications and can be optimized with [1] to try to
get close-to-peak performance. It can be enabled
via polly-pm-based-opts, which is false by default.
Refs:
[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf
Contributed-by: Roman Gareev <gareevroman@gmail.com>
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: http://reviews.llvm.org/D20575
llvm-svn: 271128
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a command line switch to set the
isl_options_set_schedule_outer_coincidence option. ISL then tries to
build schedules where the outer member of a band satisfies the
coincidence constraints.
In practice this allows loop skewing for more parallelism in inner
loops.
llvm-svn: 268222
|
|
|
|
| |
llvm-svn: 265882
|
|
|
|
|
|
| |
granularities.
llvm-svn: 262591
|
|
|
|
| |
llvm-svn: 261905
|
|
|
|
|
|
| |
This reverts commit 4d3753b9646a69c00d234ccd6e91dc3d0ea5d643.
llvm-svn: 261892
|
|
|
|
| |
llvm-svn: 261886
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use 'mark' nodes annotate a SIMD loop during ScheduleTransformation and skip
parallelism checks.
The buildbot shows the following compile/execution time changes:
Compile time:
Improvements Δ Previous Current σ
…/gesummv -6.06% 0.2640 0.2480 0.0055
…/gemver -4.46% 0.4480 0.4280 0.0044
…/covariance -4.31% 0.8360 0.8000 0.0065
…/adi -3.23% 0.9920 0.9600 0.0065
…/doitgen -2.53% 0.9480 0.9240 0.0090
…/3mm -2.33% 1.0320 1.0080 0.0087
Execution time:
Regressions Δ Previous Current σ
…/viterbi 1.70% 5.1840 5.2720 0.0074
…/smallpt 1.06% 12.4920 12.6240 0.0040
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: http://reviews.llvm.org/D14491
llvm-svn: 261620
|
|
|
|
| |
llvm-svn: 256151
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We isolate full tiles from partial tiles to be able to, for example, vectorize
loops with parametric lower and/or upper bounds.
If we use -polly-vectorizer=stripmine, we can see execution-time improvements:
correlation from 1m7361s to 0m5720s (-67.05 %), covariance from 1m5561s to
0m5680s (-63.50 %), ary3 from 2m3201s to 1m2361s (-46.72 %), CrystalMk from
8m5565s to 7m4285s (-13.18 %).
The current full/partial tile separation increases compile-time more than
necessary. As a result, we see in compile time regressions, for example, for 3mm
from 0m6320s to 0m9881s (56.34%). Some of this compile time increase is expected
as we generate more IR and consequently more time is spent in the LLVM backends.
However, a first investiagation has shown that a larger portion of compile time
is unnecessarily spent inside Polly's parallelism detection and could be
eliminated by propagating existing knowledge about vector loop parallelism.
Before enabling -polly-vectorizer=stripmine by default, it is necessary to
address this compile-time issue.
Contributed-by: Roman Gareev <gareevroman@gmail.com>
Reviewers: jdoerfert, grosser
Subscribers: grosser, #polly
Differential Revision: http://reviews.llvm.org/D13779
llvm-svn: 250809
|
|
|
|
|
|
|
|
|
|
|
| |
Polly can now be used as a analysis only tool as long as the code
generation is disabled. However, we do not have an alternative to the
independent blocks pass in place yet, though in the relevant cases
this does not seem to impact the performance much. Nevertheless, a
virtual alternative that allows the same transformations without
changing the input region will follow shortly.
llvm-svn: 250652
|
|
|
|
| |
llvm-svn: 250392
|
|
|
|
|
|
|
|
| |
This will allow us to optimize C++ template code with Polly. This support is
mostly for debugging purpose and individual experiments. The ultimate goal is
still to run Polly later in the pass manager when inlining already happened.
llvm-svn: 250092
|
|
|
|
|
|
|
|
| |
Helper functions in the BlockGenerators.h/cpp introduce dependences
from the frontend to the backend of Polly. As they are used in
ScopDetection, ScopInfo, etc. we move them to the ScopHelper file.
llvm-svn: 249919
|
|
|
|
|
|
|
|
|
|
|
| |
The changes affect methods that are part of the Pass interface and
include:
- Comments that describe the methods purpose.
- A consistent use of the keywords override and virtual.
Additionally, the printScop method is now optional and removed from
SCoP passes that do not implement it.
llvm-svn: 248685
|
|
|
|
| |
llvm-svn: 248684
|
|
|
|
| |
llvm-svn: 247198
|
|
|
|
|
|
|
|
| |
Other passes which perform different optimizations might be interested in
also applying data-locality transformations as part of their overall
transformation.
llvm-svn: 245824
|
|
|
|
|
|
|
| |
Currently, marker nodes are ignored during AST generation, but visible in the
-debug-only=polly-ast output.
llvm-svn: 245809
|
|
|
|
|
|
|
| |
The previous commit lacked a test case for register tiling + pre-vectorization
and we obviously got it immediately wrong.
llvm-svn: 245599
|
|
|
|
|
|
|
|
|
|
| |
Register tiling in Polly is for now just an additional level of tiling which
is fully unrolled. It is disabled by default. To make this useful for more than
experiments, we still need a cost function as well as possibly further
optimizations that teach LLVM to actually put some of the values we got into
scalar registers.
llvm-svn: 245564
|
|
|
|
|
|
|
|
|
|
| |
By default we only use one level of tiling for loops, but in general tiling
for multiple levels is trivial for us. Hence, we add a set of options that
allow people to play with a second level of tiling. If this is profitable for
some cases we can work on heuristics that allow us to identify these cases
and use two-level tiling for them.
llvm-svn: 245563
|
|
|
|
| |
llvm-svn: 245559
|
|
|
|
| |
llvm-svn: 245558
|
|
|
|
| |
llvm-svn: 245557
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Polly uses 'prevectorization' to enable outer loop vectorization. When
vectorizing an outer loop, we strip-mine <number-of-prevec-dims> loop
iterations which are than interchanged to the innermost level such that LLVM's
inner loop vectorizer (or Polly's simple vectorizer) can easily vectorize this
loop. The number of loop iterations to strip-mine is now configurable with the
option -polly-prevect-width=<number-of-prevec-dims>.
This is mostly a debugging option. We should probably add a heuristic that
derives the number of prevectorization dimensions from the target data and
the data types used.
llvm-svn: 245424
|
|
|
|
|
|
| |
Instead of -polly-no-tiling, we use -polly-tiling=false to disable tiling.
llvm-svn: 245423
|
|
|
|
|
|
|
| |
We only need to allocate the tile size vector if we actually want to perform
a tiling.
llvm-svn: 245422
|
|
|
|
|
|
|
| |
Since Polly has now support for the code generation of scalar and PHI
dependences this code was unused and is now dropped.
llvm-svn: 245284
|
|
|
|
|
|
| |
This fixes compilation after LLVM commit r245193.
llvm-svn: 245211
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The July issue of TOPLAS contains a 50 page discussion of the AST generation
techniques used in Polly. This discussion gives not only an in-depth
description of how we (re)generate an imperative AST from our polyhedral based
mathematical program description, but also gives interesting insights about:
- Schedule trees: A tree-based mathematical program description that enables us
to perform loop transformations on an abstract level, while issues like the
generation of the correct loop structure and loop bounds will be taken care of
by our AST generator.
- Polyhedral unrolling: We discuss techniques that allow the unrolling of
non-trivial loops in the context of parameteric loop bounds, complex tile
shapes and conditionally executed statements. Such unrolling support enables
the generation of predicated code e.g. in the context of GPGPU computing.
- Isolation for full/partial tile separation: We discuss native support for
handling full/partial tile separation and -- in general -- native support for
isolation of boundary cases to enable smooth code generation for core
computations.
- AST generation with modulo constraints: We discuss how modulo mappings are
lowered to efficient C/LLVM code.
- User-defined constraint sets for run-time checks We discuss how arbitrary
sets of constraints can be used to automatically create run-time checks that
ensure a set of constrainst actually hold. This feature is very useful to
verify at run-time various assumptions that have been taken program
optimization.
Polyhedral AST generation is more than scanning polyhedra
Tobias Grosser, Sven Verdoolaege, Albert Cohen
ACM Transations on Programming Languages and Systems (TOPLAS), 37(4), July 2015
llvm-svn: 245157
|
|
|
|
|
|
|
| |
The function to which this commit applies has been removed in a
previous commit.
llvm-svn: 244450
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: The splitExitBlock function is never called. Going to replace its functionality in successive patches that do not modify the IR.
Reviewers: grosser
Subscribers: pollydev
Projects: #polly
Differential Revision: http://reviews.llvm.org/D11865
llvm-svn: 244404
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Schedule trees are a lot easier to work with, for both humans and machines. For
humans the more structured schedule representation is easier to reason about.
Together with the more abstract isl programming interface this can result in a
lot cleaner code (see this changeset). For machines, the structured schedule and
the fact that we now use explicit piecewise affine expressions instead of
integer maps makes it easier to generate code from this schedule tree. As a
result, we can already see a slight compile-time improvement -- for 3mm from
0m0.593s to 0m0.551s seconds (-7 %). More importantly, future optimizations such
as full-partial tile separation will most likely result in more streamlined code
to be generated.
Contributed-by: Roman Gareev <gareevroman@gmail.com>
llvm-svn: 243458
|
|
|
|
|
| |
Suggested-by: Sven Verdoolaege <skimo-polly@kotnet.org>
llvm-svn: 243254
|
|
|
|
|
| |
Contributed-by: Roman Gareev <gareevroman@gmail.com>
llvm-svn: 243214
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of flat schedules, we now use so-called schedule trees to represent the
execution order of the statements in a SCoP. Schedule trees make it a lot easier
to analyze, understand and modify properties of a schedule, as specific nodes
in the tree can be choosen and possibly replaced.
This patch does not yet fully move our DependenceInfo pass to schedule trees,
as some additional performance analysis is needed here. (In general schedule
trees should be faster in compile-time, as the more structured representation
is generally easier to analyze and work with). We also can not yet perform the
reduction analysis on schedule trees.
For more information regarding schedule trees, please see Section 6 of
https://lirias.kuleuven.be/handle/123456789/497238
llvm-svn: 242130
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes old code that has been disabled since several weeks and was hidden
behind the flags -disable-polly-intra-scop-scalar-to-array=false and
-polly-model-phi-nodes=false. Earlier, Polly used to translate scalars and
PHI nodes to single element arrays, as this avoided the need for their special
handling in Polly. With Johannes' patches adding native support for such scalar
references to Polly, this code is not needed any more. After this commit both
-polly-prepare and -polly-independent are now mostly no-ops. Only a couple of
simple transformations still remain, but they are scheduled for removal too.
Thanks again to Johannes Doerfert for his nice work in making all this code
obsolete.
llvm-svn: 240766
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This version adds small integer optimization, but is not active by
default. It will be enabled in a later commit.
The schedule-fuse=min/max option has been replaced by the
serialize-sccs option. Adapting Polly was necessary, but retaining the
name polly-opt-fusion=min/max.
Differential Revision: http://reviews.llvm.org/D10505
Reviewers: grosser
llvm-svn: 240027
|