"frame-pointer"="none" as cleanups after D56351
|
compiler identification lines in test-cases.
(Doing so only because it's then easier to search for references which
are actually important and need fixing.)
llvm-svn: 351200
|
Summary:
When checking the parallelism of a scheduling dimension, we first check whether
the loop is parallel once reduction dependences are excluded.
If the loop is not parallel, we need to return the minimal dependence distance
over all data dependences, including the previously subtracted reduction
dependences.
Reviewers: grosser, Meinersbur, efriedma, eli.friedman, jdoerfert, bollu
Reviewed By: Meinersbur
Subscribers: llvm-commits, pollydev
Tags: #polly
Differential Revision: https://reviews.llvm.org/D45236
llvm-svn: 329214
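As an illustration (constructed for this note, not part of the original commit),
consider a dimension that carries both a reduction dependence and an ordinary
flow dependence:

  /* Sketch only: the i-loop carries a reduction dependence on *sum and a
     flow dependence through A. Excluding the reduction dependence alone does
     not make the loop parallel, so the reported minimal dependence distance
     (here 1) must also take the reduction dependence into account. */
  void accumulate(int n, float *A, float *sum) {
    for (int i = 1; i < n; i++) {
      *sum += A[i];      /* reduction dependence, distance 1 */
      A[i] += A[i - 1];  /* flow dependence, distance 1 */
    }
  }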
|
Splitting basic blocks into multiple statements if there are no additional
scalar dependencies gives more freedom to the scheduler, but more statements
also mean higher compile-time complexity. Switch to the finer statement
granularity; the additional compile time should be limited by the
number-of-operations quota.
The regression tests are written for the -polly-stmt-granularity=bb
setting, therefore we add that flag to those tests that break with the
new default. Some of the tests only fail because the statements are
named differently, as a basic block now results in multiple statements
that are later removed again when statements without side-effects are
simplified. Previous commits tried to reduce this effect, but it is
not completely avoidable.
Differential Revision: https://reviews.llvm.org/D42151
llvm-svn: 324169
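A minimal sketch (hypothetical code and statement names, not from the patch)
of a basic block that the finer granularity may split into two statements:

  for (int i = 0; i < n; i++) {
    A[i] = B[i] + 1;   /* could become its own statement, e.g. Stmt_body_a */
    C[i] = D[i] * 2;   /* could become its own statement, e.g. Stmt_body_b */
  }

The two stores do not exchange any scalar values, so modeling them as separate
statements introduces no additional scalar dependencies while giving the
scheduler the freedom to schedule them independently.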
|
In certain situations, the context in the isl_ast_build could cause the
min/max locations of our alias sets to become empty, which would trigger an
internal error in isl, as it is then unable to derive a value for these
expressions. Check this condition before generating code for these expressions
and instead assume that the alias check succeeded. This is valid, as the
corresponding memory accesses will not be executed under any valid context.
This fixes llvm.org/PR34432. Thanks to Qirun Zhang for reporting.
llvm-svn: 312455
|
This possibly helps to avoid run-time check failures in the COSMO kernels.
llvm-svn: 311920
|
This simplifies the test cases.
llvm-svn: 307645
|
When providing the option "-polly-ast-print-accesses" Polly also prints the
memory accesses that are generated:
  #pragma known-parallel
  for (int c0 = 0; c0 <= 1023; c0 += 4)
    #pragma simd
    for (int c1 = c0; c1 <= c0 + 3; c1 += 1)
      Stmt_for_body(
        /* read */ &MemRef_B[0]
        /* write */ MemRef_A[c1]
        );
This makes writing and debugging memory layout transformations easier.
Based on a patch contributed by Thomas Lang (ETH Zurich)
llvm-svn: 307579
|
Multi-disjunct access maps can easily result in inbound assumptions which
explode in case of many memory accesses and many parameters. This change reduces
compilation time of some larger kernel from over 15 minutes to less than 16
seconds.
An interesting test case is test/ScopInfo/multidim_param_in_subscript.ll,
which has a memory access

  [n] -> { Stmt_for_body3[i0, i1] -> MemRef_A[i0, -1 + n - i1] }

which requires folding, but where only a single disjunct remains. We can still
model this test case even when only using limited memory folding.
For people only reading commit messages, here is the comment that explains what
memory folding is:

  To recover memory accesses with array size parameters in the subscript
  expression we post-process the delinearization results.

  We would normally recover from an access A[exp0(i) * N + exp1(i)] into an
  array A[][N] the 2D access A[exp0(i)][exp1(i)]. However, another valid
  delinearization is A[exp0(i) - 1][exp1(i) + N] which - depending on the
  range of exp1(i) - may be preferable. Specifically, for cases where we
  know exp1(i) is negative, we want to choose the latter expression.

  As we commonly do not have any information about the range of exp1(i),
  we do not choose one of the two options, but instead create a piecewise
  access function that adds the (-1, N) offsets as soon as exp1(i) becomes
  negative. For a 2D array such an access function is created by applying
  the piecewise map:

    [i,j] -> [i, j]     : j >= 0
    [i,j] -> [i-1, j+N] : j < 0

After this patch we generate only the first case, except for situations where
we can prove the first case to be invalid and can consequently select the
second without introducing disjuncts.
llvm-svn: 296679
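A small worked note (my own reasoning, assuming the statement domain bounds i1
to 0 <= i1 < n as in the test): the second subscript -1 + n - i1 then lies in
the range 0 .. n - 1 and is provably non-negative, so the offset-free
delinearization alone is valid and no second disjunct is needed.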
|
Without this simplification for a loop nest:

  void foo(long n1_a, long n1_b, long n1_c, long n1_d,
           long p1_b, long p1_c, long p1_d,
           float A_1[][p1_b][p1_c][p1_d]) {
    for (long i = 0; i < n1_a; i++)
      for (long j = 0; j < n1_b; j++)
        for (long k = 0; k < n1_c; k++)
          for (long l = 0; l < n1_d; l++)
            A_1[i][j][k][l] += i + j + k + l;
  }

the assumption:

  n1_a <= 0 or (n1_a > 0 and n1_b <= 0) or
  (n1_a > 0 and n1_b > 0 and n1_c <= 0) or
  (n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d <= 0) or
  (n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d > 0 and
   p1_b >= n1_b and p1_c >= n1_c and p1_d >= n1_d)

is taken rather than the simpler assumption:

  p1_b >= n1_b and p1_c >= n1_c and p1_d >= n1_d.

The former is less strict, as it allows arbitrary values of p1_* in case the
loop is not executed at all. However, in practice these precise constraints
explode when combined across different accesses and loops. For now it seems
to make more sense to take less precise, but more scalable constraints by
default. In case we find a practical example where more precise constraints
are needed, we can think about allowing such precise constraints in specific
situations where they help.
This change speeds up the new test case from taking very long (we waited at
least a minute, but it probably takes a lot more) to below a second.
llvm-svn: 296456
|
This update includes a couple more coalescing changes as well as a large
number of isl-internal code cleanups (dead assignments, ...).
llvm-svn: 295419
|
Before this change we used the name of the base pointer to mark reductions. This
is imprecise, as the canonical reference is the ScopArray itself and not the
base pointer of a reduction. Using the base pointer of reductions is problematic
in cases where a single ScopArray is referenced through two different base
pointers.
This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.
llvm-svn: 294568
|
Providing the context to the ast generator allows for additional simplifications
and -- more importantly -- allows us to generate loops with only partially bounded
domains, assuming the domains are bounded for all parameter configurations
that are valid as defined by the context.
This change fixes the crash reported in http://llvm.org/PR30956
The original reason why we did not include the context when generating an
AST was that CLooG and later isl used to sometimes transfer some of the
constraints that bound the size of parameters from the context into the
generated AST. This resulted in operations with very large constants, which
sometimes introduced problematic integer overflows. The latest versions of
the isl AST generator are careful to not introduce such constants.
Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286442
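A constructed illustration (not from the commit): for the domain
[n, b] -> { Stmt[i] : i >= 0 and (i < n or b < 0) }, the instance set is
unbounded whenever b < 0; passing the context [n, b] -> { : b >= 0 } tells the
AST generator that this case cannot occur, so it can still emit the bounded
loop for (int c0 = 0; c0 < n; c0 += 1).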
|
llvm-svn: 278673
|
Adding a new pass PolyhedralInfo. This pass will be the interface to Polly.
Initially, we will provide the following interface:
  - #IsParallel(Loop *L) - returns a bool indicating whether the loop is
    parallel for the given program order.
Patch by Utpal Bora <cs14mtech11017@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D21486
llvm-svn: 276637
|
With this update the isl AST generation extracts disjunctive constraints early
on. As a result, code that previously resulted in two branches with (close-to)
identical code within them:
  if (P <= -1) {
    for (int c0 = 0; c0 < N; c0 += 1)
      Stmt_store(c0);
  } else if (P >= 1)
    for (int c0 = 0; c0 < N; c0 += 1)
      Stmt_store(c0);

results now in only a single branch body:

  if (P <= -1 || P >= 1)
    for (int c0 = 0; c0 < N; c0 += 1)
      Stmt_store(c0);
This resolves http://llvm.org/PR27559
Besides the above change, this isl update brings better simplification of
sets/maps containing existentially quantified dimensions and fixes a bug in
isl's coalescing.
llvm-svn: 272500
|
As these test cases will be changed in a subsequent commit, we expand and
tighten them to make the subsequent changes to them more obvious. As part of
this we add more context to some test cases and add CHECK-NEXT lines to ensure
no intermediate lines are missed by accident.
llvm-svn: 272499
|
Min/max expressions are easier to read and can in some cases also result in
more concise generated IR, as the min/max -- when lowered to a cmp+select
pattern -- commonly has a simpler condition than the ternary condition isl
would normally generate.
llvm-svn: 268855
|
Utilizing the record option for assumptions we can simplify the wrapping
assumption generation a lot. Additionally, we can now report locations
together with wrapping assumptions, though they might not be accurate yet.
llvm-svn: 266069
|
Just an import to keep up with the latest version of isl. We are not looking
for specific features.
llvm-svn: 264452
|
Delinearization is now enabled by default and no longer needs to be explicitly
enabled in our tests.
llvm-svn: 264154
|
In order to speed up compile time and to avoid random timeouts we now
separately track assumptions and restrictions. In this context,
assumptions describe parameter valuations we need and restrictions
describe parameter valuations we do not allow. During AST generation
we create a runtime check for both, whereas the one for the
restrictions is negated before the conjunction is built.
Except for the in-bounds assumptions, we currently only track restrictions.
Differential Revision: http://reviews.llvm.org/D17247
llvm-svn: 262328
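A minimal sketch of how the two run-time checks are combined (pseudo-code with
invented names, not the actual Polly API):

  /* The optimized version may only run if the assumptions hold and none of
     the restricted parameter valuations occurs. */
  run_optimized = assumptions_hold(params) && !restrictions_hold(params);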
|
From now on we bail out only if a non-trivial alias group contains a non-affine
access, not as soon as we discover aliasing while non-affine accesses are allowed.
llvm-svn: 261863
|
This has been the default for a long time. Setting it again does not add value
in any of these test cases.
llvm-svn: 253800
|
In case the original parameter instruction does not have a name, but it comes
from a load instruction where the base pointer has a name, we used the name of
the load instruction to give some more intuition about where the parameter came
from. To ensure this works also through GEPs, which may have complex offsets,
we originally just dropped the offsets and _only_ used the base pointer name.
As this can result in multiple parameters getting the same name, we now prefix
the parameter ID to ensure parameter names are unique. This will make it easier
to understand debug output.
This change does not affect correctness, as parameter IDs (even of the same
name) can always be distinguished through the SCEV pointer stored inside them.
llvm-svn: 253330
|
If a SCoP contains error blocks we cannot use the domain constraints
to simplify the assumptions as the domain is already influenced by the
assumptions we took. Before this patch we did that and some assumptions
became self-fulfilling as they were implied by the domain constraints.
llvm-svn: 252424
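As an illustration (my own example, not from the commit): if the assumption
p >= 1 was already used to restrict a statement's domain, then simplifying that
assumption against the domain makes it look trivially satisfied, even though it
still has to be verified by the run-time check.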
|
These flags are now always passed to all tests and need to be disabled if
not needed. Disabling these flags, rather than passing them to almost all
tests, significantly simplifies our RUN: lines.
llvm-svn: 249422
|
If the GEP instructions give us enough insights, model scalar accesses as
multi-dimensional (and generate the relevant run-time checks to ensure
correctness). This will allow us to simplify the dependence computation in
a subsequent commit.
llvm-svn: 247906
|
This will allow us to generate non-wrap assumptions for integer expressions
that are part of the SCoP. We compare the common isl representation of
the expression with one computed with modulo semantics. For all parameter
combinations for which they are not equal we can have integer overflows.
The nsw flags are respected when the modulo representation is computed,
nuw and nw flags are ignored for now.
In order not to increase compile time too much, the non-wrap assumptions
are collected in a separate boundary context instead of the assumed
context. This helps compile time as the boundary context can become
complex and it is therefore not advised to use it in other operations
except runtime check generation. However, the assumed context is e.g.,
used to tighten dependences. While the boundary context might help to
tighten the assumed context, it is doubtful that it will help in practice
(it does not affect LNT much) as the boundary (or no-wrap) assumptions
only restrict the very end of the possible value range of parameters.
PET uses a different approach to compute the no-wrap context, though LNT runs
have shown that this version performs slightly better for us.
llvm-svn: 247732
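A small worked example (chosen for illustration, not from the patch): for a
signed 8-bit parameter p, the expression p + 1 evaluated with modulo semantics
is ((p + 129) mod 256) - 128, which differs from the mathematical value p + 1
exactly for p = 127; the recorded non-wrap assumption is therefore p <= 126.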
|
As we do not rely on ScalarEvolution any more, we do not need to get
the backedge taken count. Additionally, our domain generation handles
everything that is affine and has one latch, and our ScopDetection will
over-approximate everything else.
This change will therefore allow loops with:
  - one latch
  - exiting conditions that are affine
Additionally, it will not check for structured control flow anymore.
Hence, loops and conditionals are not necessarily single entry single
exit regions any more.
Differential Revision: http://reviews.llvm.org/D12758
llvm-svn: 247289
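An illustrative example (constructed for this note, not from the commit) of a
loop shape that can now be handled: a single latch with two exiting blocks
whose conditions are both affine:

  for (long i = 0; i < n; i++) {
    if (i >= m)
      break;        /* second exit, also with an affine condition */
    A[i] = i;
  }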
|
llvm-svn: 247198
|
If a region does not have more than one loop, we do not identify it as
a Scop in ScopDetection. The main optimizations Polly is currently performing
(tiling, preparation for outer-loop vectorization and loop fusion) are unlikely
to have a positive impact on individual loops. In some cases, Polly's run-time
alias checks or conditional hoisting may still have a positive impact, but those
are mostly enabling transformations which LLVM already performs for individual
loops. As we do not focus on individual loops, we leave them untouched to not
introduce compile time regressions and execution time noise. This results in
good compile time reduction (oourafft: -73.99%, smg2000: -56.25%).
Contributed-by: Pratik Bhatu <cs12b1010@iith.ac.in>
Reviewers: grosser
Differential Revision: http://reviews.llvm.org/D12268
llvm-svn: 246161
|
Instead of generating code for an empty assumed context we bail out
early. As the number of assumptions we generate increases this becomes
more and more important. Additionally, this change will allow us to
hide internal contexts that are only used in runtime checks, e.g., a
boundary context with constraints not suited for simplifications.
llvm-svn: 245540
|
As specified in PR23888, run-time alias check generation is expensive
in terms of compile-time. This reduces the compile time by computing
the minimal/maximal accesses only once for each base pointer.
Contributed-by: Pratik Bhatu <cs12b1010@iith.ac.in>
llvm-svn: 243024
|
Instead of flat schedules, we now use so-called schedule trees to represent the
execution order of the statements in a SCoP. Schedule trees make it a lot easier
to analyze, understand and modify properties of a schedule, as specific nodes
in the tree can be chosen and possibly replaced.
This patch does not yet fully move our DependenceInfo pass to schedule trees,
as some additional performance analysis is needed here. (In general schedule
trees should be faster in compile-time, as the more structured representation
is generally easier to analyze and work with.) We also cannot yet perform the
reduction analysis on schedule trees.
For more information regarding schedule trees, please see Section 6 of
https://lirias.kuleuven.be/handle/123456789/497238
llvm-svn: 242130
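A small illustration (not from the commit): the flat schedule
{ Stmt_S1[i] -> [i, 0]; Stmt_S2[i] -> [i, 1] } corresponds to a schedule tree
consisting of a band node that schedules the shared i dimension and, below it,
a sequence node whose two children filter Stmt_S1 and Stmt_S2; the ordering
between the statements is expressed by the tree structure instead of an extra
constant schedule dimension.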
|
I just learned that target triples prevent test cases from being run on other
architectures. Polly's test cases have so far been sufficiently target independent
to not require any target triples. Hence, we drop them.
llvm-svn: 235384
|
This allows us to delinearize code such as:

  A[][n]
  for (i ...)
    for (j ...)
      A[i][n-j-1] = ...

which would previously have been delinearized to an access A[i+1][-j-1].
To recover the correct access we apply the piecewise expression:

  { A[i][j] -> A[i][j] : j >= 0; A[i][j] -> A[i-1][j+N] : j < 0 }

This approach generalizes to higher dimensions.
llvm-svn: 233566
|
llvm-svn: 230796
|
llvm-svn: 230784
|
isl recently introduced a new interface to create run-time checks from
constraint sets. Use this interface to simplify our run-time check generation.
llvm-svn: 230640
|
Scops that only read seem generally uninteresting and scops that only write are
most likely initializations where there is also little to optimize. To not
waste compile time we bail early.
Differential Revision: http://reviews.llvm.org/D7735
llvm-svn: 229820
|
This commit imports the latest isl version into lib/External/isl. The changes
relevant for Polly are:
1) Schedule trees [1] have been introduced as a more structured way to
describe schedules. Polly does not yet use them, but we may switch to them
in the near future.
2) Another set of coalescing changes [2] simplifies some data dependences and
removes a couple of code generation artifacts.
We now understand that the following sets can be merged:

  { Stmt_S1[i0, i1] -> Stmt_S2[i0 + i1] :
      i0 >= 0 and i1 <= 1023 - i0 and i1 >= 1;
    Stmt_S1[i0, 0] -> Stmt_S2[i0] : i0 <= 1023 and i0 >= 1 }

into:

  { Stmt_S1[i0, i1] -> Stmt_S2[i0 + i1] : i1 <= 1023 - i0 and i1 >= 0 and
      i1 >= 1 - i0 and i0 >= 0 }

Changes of this kind reduce unnecessary specialization during code
generation.
- for (int c3 = 0; c3 <= 1023; c3 += 1) {
- if (c3 % 2 == 0) {
- Stmt_for_body3(c1, c3);
- } else
- Stmt_for_body3(c1, c3);
- }
+ for (int c3 = 0; c3 <= 1023; c3 += 1)
+ Stmt_for_body3(c1, c3);
[1] http://impact.gforge.inria.fr/impact2014/papers/impact2014-verdoolaege.pdf
[2] http://impact.gforge.inria.fr/impact2015/papers/impact2015-verdoolaege.pdf
llvm-svn: 229423
|
This allows us to skip ast and code generation if we did not optimize
a SCoP and will not generate parallel or alias annotations. The
initial heuristic to exit is simple but allows improvements later on.
All failing test cases have been modified to disable the early exit and
thus keep their coverage.
Differential Revision: http://reviews.llvm.org/D7254
llvm-svn: 228851
|
llvm-svn: 227801
|
Schedule dimensions that have the same constant value across all statements do
not carry any information but, due to the increased dimensionality of the
schedule, cost compile time. To not pay this cost, we remove such constant
dimensions if possible.
llvm-svn: 225067
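For example (constructed for this note): in the schedule
{ Stmt_S1[i] -> [0, i]; Stmt_S2[i] -> [0, i] } the leading dimension is the
constant 0 for every statement instance, so it can be dropped to obtain
{ Stmt_S1[i] -> [i]; Stmt_S2[i] -> [i] } without changing the execution order.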
|
Update tests for LLVM assembly format change in r224257 using the script
attached to PR21532. I'm hoping this unsticks the bot [1].
[1]: http://lab.llvm.org:8011/builders/polly-amd64-linux/builds/25432
llvm-svn: 224269
|
Isl now specifically marks modulo operations that are compared against zero.
They can be implemented with the C/LLVM remainder operation.
We also update a couple of test cases where the output of isl has slightly
changed.
llvm-svn: 223607
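The reason this is safe (my own summary): for a positive divisor the C remainder
operation truncates towards zero, so for negative operands it can differ from
the mathematical modulo (e.g. -3 % 2 is -1 while -3 mod 2 is 1), but both are
zero for exactly the same operands, and hence a comparison against zero can be
lowered to the cheap remainder operation.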
|
In case a GEP instruction references into a fixed size array, e.g., an access
A[i][j] into an array A[100x100], LLVM-IR does not guarantee that the subscripts
always compute values that are within array bounds. We now derive the set of
parameter values for which all accesses are within bounds and add the assumption
that the scop is only ever executed with this set of parameter values.
Example:

  void foo(float A[][20], long n, long m) {
    for (long i = 0; i < n; i++)
      for (long j = 0; j < m; j++)
        A[i][j] = ...
  }

This loop yields out-of-bound accesses if m is larger than 20 and at the same
time at least one iteration of the outer loop is executed. Hence, we assume:

  n <= 0 or m <= 20.
Doing so simplifies the dependence analysis problem, allows us to perform
more optimizations and generate better code.
TODO: The location where the GEP instruction is executed is not necessarily the
location where the memory is actually accessed. As a result scanning for GEP[s]
is imprecise. Even though this is not a correctness problem, this imprecision
may result in missed optimizations or non-optimal run-time checks.
In polybench where this mismatch between parametric loop bounds and fixed size
arrays is common, we see with this patch significant reductions in compile time
(up to 50%) and execution time (up to 70%). We see two significant compile time
regressions (fdtd-2d, jacobi-2d-imper), and one execution time regression
(trmm). Both regressions arise due to additional optimizations that have been
enabled by this patch. They can be addressed in subsequent commits.
http://reviews.llvm.org/D6369
llvm-svn: 222754
|
Instead of parallelizing every parallel outermost loop, we now use a very
minimalistic cost model. Specifically, we assume innermost loops are not
worth parallelizing and all non-innermost loops are.
When parallelizing all loops in LNT we got several slowdowns/timeouts due to
us parallelizing innermost loops that are executed only a couple of times
(number of iterations not known statically). With this basic heuristic enabled
LNT does not show any more timeouts, while several interesting loops are still
parallelized.
There are many ways to obtain an improved heuristic. Constructing such an
improved heuristic from a position of minimal slow-down and zero code size
increase seems to be the best, as it allows us to track progress on LNT.
llvm-svn: 222096
|
We introduce a new flag -polly-parallel and use it to annotate the for-nodes in
the isl ast that we want to execute thread parallel (e.g., using OpenMP). We
previously already emitted OpenMP annotations, but we did this for various
kinds of parallel loops, including some which we cannot run in parallel.
With this patch we now have three annotations:
  1) #pragma known-parallel [reduction]
  2) #pragma omp for
  3) #pragma simd
meaning:
  1) loop has no loop carried dependences
  2) loop will be executed thread-parallel
  3) loop can possibly be vectorized
This patch introduces 1) and reduces the use of 2) to only the cases where we
will actually generate thread parallel code.
It is in preparation for OpenMP code generation in our isl backend.
Legacy:
- We also have a command line option -enable-polly-openmp. This option controls
  the OpenMP code generation in CLooG. It will become an alias of
  -polly-parallel after the CLooG code generation has been dropped.
http://reviews.llvm.org/D6142
llvm-svn: 221479
|