| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After loop versioning, a dominance check against a non-affine subregion's
exit node always fails for every block in the subregion if the subregion
shares its exit block with the scop. The subregion's exit block has become
polly_merge_new_and_old, which also receives the control flow of the
generated code. As a result, any value for implicit stores was assumed not
to come from the scop.
We check dominance against the generated exit node instead.
This fixes llvm.org/PR25438
llvm-svn: 252375
|
|
|
|
| |
llvm-svn: 252302
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were adding all generated values in non-affine subregions to the list of
values usable in the subregion's generated exit block. The thought was that
only values that dominate the original exit block can be used there.
But it is possible for synthesizable values to be expanded in any
block. If the same value is also used for implicit writes, it would
try to reuse already synthesized values even if they do not dominate the
exit block.
The fix is to add a value to the list of values usable in the exit
block only if it dominates the exit block. This fixes
llvm.org/PR25412.
llvm-svn: 252301
|
|
|
|
| |
llvm-svn: 252273
|
|
|
|
|
|
|
|
| |
Before this commit, memory reference identifiers were unique only per
basic block, but not per (non-affine) ScopStmt. This commit now uses the
MemoryAccess base pointer to uniquely identify each memory access.
llvm-svn: 252200
|
|
|
|
|
|
|
|
|
|
|
| |
For generating scalar writes of non-affine subregions, all except phi
writes are generated in the exit block. The phi writes are generated in
the incoming block for which we erroneously used the same BBMap. This
can conflict if a value for one block is synthesized, and then reused
for another block which is not dominated by the first block. This is
fixed by using block-specific BBMaps for phi writes.
llvm-svn: 252172
|
|
|
|
|
|
|
|
|
| |
To simplify and correct the preloading of a base pointer origin, e.g.,
the base pointer for the current indirect invariant load, we now just
check if there is an invariant access class that involves the base
pointer of the current class.
llvm-svn: 251962
|
|
|
|
|
|
|
|
| |
If a base pointer of a preloaded value has a base pointer origin, i.e., it is
an indirect invariant load, we have to make sure the base pointer origin is
preloaded first.
llvm-svn: 251946
|
|
|
|
|
|
|
|
|
| |
If a base pointer load is preloaded, we have to change the base pointer of
the derived SAI. However, as the derived SAI relationship is
coarse grained, we need to check if we actually preloaded the base
pointer or a different element of the base pointer SAI array.
llvm-svn: 251881
|
|
|
|
| |
llvm-svn: 251869
|
|
|
|
| |
llvm-svn: 251228
|
|
|
|
|
|
|
|
| |
When verifying if a scop is still valid we rerun all analyses, but did not
update DetectionContextMap. This change ensures that information, e.g., about
non-affine regions, is correctly updated.
llvm-svn: 251227
|
|
|
|
| |
llvm-svn: 251199
|
|
|
|
|
|
|
|
|
|
| |
the scop
Such PHI nodes can not only appear in the ExitBlock of the Scop, but indeed
any scalar PHI node above the scop and used in the scop is modeled as scalar
read access.
llvm-svn: 251198
|
|
|
|
|
|
| |
Thanks Tobias for the hint.
llvm-svn: 250695
|
|
|
|
|
|
|
|
|
|
|
|
| |
New values were always synthesized in the block of the instruction
that needed them. This is incorrect for PHI nodes, whose values must be
defined in the respective incoming block. This patch temporarily moves
the builder's insert point to the incoming block while synthesizing phi
node arguments.
This fixes PR25241 (http://llvm.org/bugs/show_bug.cgi?id=25241)
llvm-svn: 250693
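The "temporarily move the insert point" pattern described above can be sketched with a small RAII guard. This is only an illustration under stated assumptions: ToyBuilder, InsertBlockGuard and synthesizePhiOperand are hypothetical names and a toy model, not Polly's code. LLVM's IRBuilder offers a comparable InsertPointGuard helper, but whether this commit uses it is not stated here.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical toy builder: records which block each instruction lands in.
struct ToyBuilder {
  std::string InsertBlock;
  std::map<std::string, std::vector<std::string>> Blocks;
  void emit(const std::string &Inst) { Blocks[InsertBlock].push_back(Inst); }
};

// Restores the builder's insert block when the guard goes out of scope.
class InsertBlockGuard {
  ToyBuilder &B;
  std::string Saved;

public:
  explicit InsertBlockGuard(ToyBuilder &Builder)
      : B(Builder), Saved(Builder.InsertBlock) {}
  ~InsertBlockGuard() { B.InsertBlock = Saved; }
  InsertBlockGuard(const InsertBlockGuard &) = delete;
  InsertBlockGuard &operator=(const InsertBlockGuard &) = delete;
};

// Synthesize a PHI operand in its incoming block, then fall back to the
// block the builder was pointing at before (the block of the PHI itself).
void synthesizePhiOperand(ToyBuilder &B, const std::string &IncomingBlock,
                          const std::string &Inst) {
  InsertBlockGuard Guard(B);
  B.InsertBlock = IncomingBlock;
  B.emit(Inst);
}
```

The guard makes the restore unconditional, so an early return while synthesizing cannot leave the builder stuck in the incoming block.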
|
|
|
|
|
|
|
|
|
|
|
|
| |
Accesses that have a relative offset (in bytes) that is not divisible
by the type size (in bytes) will be represented as empty in the SCoP
description. This is on its own not good but it also crashed the
invariant load hoisting. This patch will fix the latter problem while
the former should be addressed too.
This fixes bug 25236.
llvm-svn: 250664
|
|
|
|
|
|
|
|
|
|
| |
If the base pointer of a load is invariant and defined in the SCoP but
not loaded we cannot hoist the load as we would not hoist the base
pointer definition.
This fixes bug 25237.
llvm-svn: 250663
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sorting is replaced by demand-driven code generation that will pre-load a
value when it is needed or, if it was not needed before, at some point
determined by the order of invariant accesses in the program. This
demand-driven pre-loading will kick in only in very few cases, but it will
prevent us from generating faulty code. An example where it is needed is
shown in:
test/ScopInfo/invariant_loads_complicated_dependences.ll
Invariant loads that appear in parameters but are not on the top-level (e.g.,
the parameter is not a SCEVUnknown) will now be treated correctly.
Differential Revision: http://reviews.llvm.org/D13831
llvm-svn: 250655
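As a rough illustration of demand-driven pre-loading, the sketch below walks the invariant loads in program order and pulls in each load's dependences first, skipping anything that was already pre-loaded. DependenceMap, preloadOnDemand and preloadAll are hypothetical names; the model uses plain strings and is not the IslNodeBuilder implementation.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each invariant load names the loads it depends on.
using DependenceMap = std::map<std::string, std::vector<std::string>>;

// Pre-load `Name` after everything it needs; values that were already
// pre-loaded (because an earlier load demanded them) are skipped.
void preloadOnDemand(const std::string &Name, const DependenceMap &Deps,
                     std::set<std::string> &Done,
                     std::vector<std::string> &Order) {
  if (!Done.insert(Name).second)
    return;
  auto It = Deps.find(Name);
  if (It != Deps.end())
    for (const std::string &Dep : It->second)
      preloadOnDemand(Dep, Deps, Done, Order);
  Order.push_back(Name);
}

// Walk the loads in program order; dependences are pulled in on demand.
std::vector<std::string>
preloadAll(const std::vector<std::string> &ProgramOrder,
           const DependenceMap &Deps) {
  std::set<std::string> Done;
  std::vector<std::string> Order;
  for (const std::string &Name : ProgramOrder)
    preloadOnDemand(Name, Deps, Done, Order);
  return Order;
}
```

With program order A, B, C and A depending on C, the resulting order is C, A, B: the demand from A moves C forward, and everything else keeps its program order.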
|
|
|
|
|
|
|
|
|
|
|
| |
Polly can now be used as an analysis-only tool as long as the code
generation is disabled. However, we do not have an alternative to the
independent blocks pass in place yet, though in the relevant cases
this does not seem to impact the performance much. Nevertheless, a
virtual alternative that allows the same transformations without
changing the input region will follow shortly.
llvm-svn: 250652
|
|
|
|
|
|
|
|
|
|
| |
Expressing this in terms of BlockGenerator::getOrCreateAlloca(const
ScopArrayInfo *Array) does not work, as the MemoryAccess BasePtr differs from
the ScopArrayInfo BasePtr when invariant load hoisting is involved. Until this
is investigated and fixed, we move back to code that just uses the BasePtr of
MemoryAccess.
llvm-svn: 250637
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of generating implicit loads within basic blocks, put them
before the instructions of the statement itself, including non-affine
subregions. The region's entry node is dominating all blocks in the
region and therefore the loaded value will be available there.
Implicit writes in block-stmts were already stored back at the end of
the block. Now, also generate the stores of non-affine subregions when
leaving the statement, i.e. in the exiting block.
This change is required for array-mapped implicits ("De-LICM") to
ensure that there are no dependencies of demoted scalars within
statements. Statements load all required values, operate on copies in
registers, and then write back the changed values to the demoted memory.
Lifetime analysis within statements becomes unnecessary.
Differential Revision: http://reviews.llvm.org/D13487
llvm-svn: 250625
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In r250408 'CHECK-NEXT: br' lines were removed as they also matched a
'%polly.subregion.iv.inc' instruction and consequently did not check what they
were supposed to check. However, without these lines we cannot test that the
.s2a instructions, which are no longer generated since r250411, really are not
emitted. Hence, we add back the CHECK-NEXT lines to ensure there are really no
instructions generated between the store that we check for and the branch at the
end of the basic block. To ensure we do not match too early, we now check for
'br i1' or 'br label'.
llvm-svn: 250435
|
|
|
|
|
|
|
|
|
|
|
|
| |
When pulling an llvm::Value to be written as a PHI write, the former
code only checked whether it is within the same basic block, but it
could also be within the same non-affine subregion. In that case an
unnecessary pair of MemoryAccesses would have been created.
Two unit tests were explicitly checking for the unnecessary writes,
including the comments that the writes are unnecessary.
llvm-svn: 250411
|
|
|
|
|
|
|
|
|
| |
They happen to match
%polly.subregion.iv.inc = add i32 %polly.subregion.iv, 1
^^ ^^
that is, are misleading in what they actually check.
llvm-svn: 250408
|
|
|
|
|
|
|
|
|
|
| |
When sharing the same map from old to new value, CodeGeneration would
reuse the same new value for each basic block. However, the SCEV
expander might emit code in a basic block that does not dominate a use
of the SCEV in another basic block. This test checks whether both such
blocks have their own expanded new values.
llvm-svn: 250389
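The difference between one shared old-to-new map and per-block maps can be sketched as follows. PerBlockValueMaps and getOrExpand are hypothetical names, and values are modeled as strings; this only illustrates the idea that each block gets its own expanded copy and does not reflect Polly's actual data structures.

```cpp
#include <map>
#include <string>

// Hypothetical model: one old-to-new value map per generated basic block.
class PerBlockValueMaps {
  std::map<std::string, std::map<std::string, std::string>> Maps;

public:
  // Return the copy of `Old` for `Block`, expanding a fresh one on first use.
  // A copy expanded for one block is never handed out to another block.
  const std::string &getOrExpand(const std::string &Block,
                                 const std::string &Old) {
    std::map<std::string, std::string> &BBMap = Maps[Block];
    auto It = BBMap.find(Old);
    if (It == BBMap.end())
      It = BBMap.emplace(Old, Old + "." + Block).first;
    return It->second;
  }
};
```

Keying the cache by block trades some duplicated expansions for the guarantee that a reused value was emitted in the block that uses it.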
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We harden one test case by ensuring no additional stores may possibly be
introduced between the stores we check for and the basic block terminator
statements.
We also add a test case for the situation where a value that is passed from
a non-affine region to a PHI node does not dominate the exit of the non-affine
region. This case has come up in patch reviews, so we make sure it is properly
handled today and in the future.
llvm-svn: 250217
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If an (assumed) invariant location is loaded multiple times we
generated a parameter for each location. However, this caused compile
time problems for several benchmarks (e.g., 445_gobmk in SPEC2006 and
BT in the NAS benchmarks). Additionally, the code we generate is
suboptimal as we preload the same location multiple times and perform
the same checks on all the parameters that refer to the same value.
With this patch we consolidate the invariant loads in three steps:
  1) During SCoP initialization required invariant loads are put in
     equivalence classes based on their pointer operand. One
     representative load is used to generate a parameter for the whole
     class, thus we never generate multiple parameters for the same
     location.
  2) During the SCoP simplification we remove invariant memory
     accesses that are in the same equivalence class. While doing so
     we build the union of all execution domains as it is only
     important that the location is at least accessed once.
  3) During code generation we only preload one element of each
     equivalence class with the unified execution domain. All others
     are mapped to that preloaded value.
Differential Revision: http://reviews.llvm.org/D13338
llvm-svn: 249853
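The grouping in step 1 of the commit above can be sketched as a map keyed by the pointer operand. InvariantLoad, InvariantClass and buildInvariantClasses are hypothetical names chosen for illustration; execution domains are kept as symbolic strings, and collecting them in a vector merely stands in for building their union.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an invariant load; not Polly's MemoryAccess.
struct InvariantLoad {
  std::string Name;      // name of the load instruction
  std::string PointerOp; // pointer operand, used as the class key
  std::string Domain;    // execution domain, kept symbolic here
};

// One equivalence class: a representative load plus all collected domains.
struct InvariantClass {
  std::string Representative;
  std::vector<std::string> Members;
  std::vector<std::string> Domains; // stands in for the union of domains
};

// Group loads by pointer operand so each location yields one parameter.
std::map<std::string, InvariantClass>
buildInvariantClasses(const std::vector<InvariantLoad> &Loads) {
  std::map<std::string, InvariantClass> Classes;
  for (const InvariantLoad &L : Loads) {
    InvariantClass &C = Classes[L.PointerOp];
    if (C.Members.empty())
      C.Representative = L.Name; // first load seen represents the class
    C.Members.push_back(L.Name);
    C.Domains.push_back(L.Domain);
  }
  return Classes;
}
```

Only the representative would be preloaded; the remaining members of a class map to that one value.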
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch allows invariant loads to be used in the SCoP description,
e.g., as loop bounds, conditions or in memory access functions.
First we collect "required invariant loads" during SCoP detection that
would otherwise make an expression we care about non-affine. To this
end a new level of abstraction was introduced before
SCEVValidator::isAffineExpr() namely ScopDetection::isAffine() and
ScopDetection::onlyValidRequiredInvariantLoads(). Here we can decide
if we want a load inside the region to be optimistically assumed
invariant or not. If we do, it will be marked as required and in the
SCoP generation we bail if it is actually not invariant. If we don't
it will be a non-affine expression as before. At the moment we
optimistically assume all "hoistable" (namely non-loop-carried) loads
to be invariant. This causes us to expand some SCoPs and dismiss them
later, but it also allows us to detect a lot we would dismiss directly
if we asked, e.g., AliasAnalysis::canBasicBlockModify(). We also
allow potential aliases between optimistically assumed invariant loads
and other pointers as our runtime alias checks are sound in case the
loads are actually invariant. Together with the invariant checks this
combination allows us to handle a lot more than LICM can.
The code generation of the invariant loads had to be extended as we
can now have dependences between parameters and invariant (hoisted)
loads as well as the other way around, e.g.,
test/Isl/CodeGen/invariant_load_parameters_cyclic_dependence.ll
First, it is important to note that we cannot have real cycles but
only dependences from a hoisted load to a parameter and from another
parameter to that hoisted load (and so on). To handle such cases we
materialize llvm::Values for parameters that are referred to by a hoisted
load on demand and then materialize the remaining parameters. Second,
there are new kinds of dependences between hoisted loads caused by the
constraints on their execution. If a hoisted load is conditionally
executed it might depend on the value of another hoisted load. To deal
with such situations we sort them already in the ScopInfo such that
they can be generated in the order they are listed in the
Scop::InvariantAccesses list (see compareInvariantAccesses). The
dependences between hoisted loads caused by indirect accesses are
handled the same way as before.
llvm-svn: 249607
|
|
|
|
|
|
|
|
| |
These flags are now always passed to all tests and need to be disabled if
not needed. Disabling these flags, rather than passing them to almost all
tests, significantly simplifies our RUN: lines.
llvm-svn: 249422
|
|
|
|
|
|
|
| |
By disabling our scop-profitability heuristics this also becomes visible in some
older test cases.
llvm-svn: 249411
|
|
|
|
|
|
|
|
|
|
| |
A statement with an empty domain complicates the invariant load
hoisting and does not help any subsequent analysis or transformation.
In fact it might introduce parameter dimensions or increase the
schedule dimensionality. To this end, we remove statements with an
empty domain early in the SCoP simplification.
llvm-svn: 249276
|
|
|
|
|
|
|
| |
The last invariant load fix was based on a later patch, not on
polly/master, and thus needs to be adjusted.
llvm-svn: 249145
|
|
|
|
|
|
|
|
| |
We have to skip accesses in non-affine subregions during hoisting as
they might not be executed under the same condition as the entry of
the non-affine subregion.
llvm-svn: 249139
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a value is globally mapped (IslNodeBuilder::ValueMap) and
referenced in the code that will be put into a subfunction, we hand
down the new value to the subfunction.
This patch also removes code that handed down all invariant loads to
the subfunction. Instead, only needed invariant loads are given to the
subfunction. There are two possible reasons for an invariant load to
be handed down:
1) The invariant load is used in a block that is placed in the
subfunction but which is not the parent of the load. In this
case, the scalar access that will read the loaded value, will
cause its base pointer (the preloaded value) to be handed down to
the subfunction.
2) The invariant load is defined and used in a block that is placed
in the subfunction. With this patch we will hand down the
preloaded value to the subfunction as the invariant load is
globally mapped to that value.
llvm-svn: 249126
|
|
|
|
|
|
| |
Hand down all preloaded values to the parallel subfunction.
llvm-svn: 249010
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instructions which we can synthesize from a SCEV expression are not
generated directly, but only when they are used as an operand of
another instruction. This avoids generating unnecessary instructions
and works more reliably than first inserting them and then deleting
them later on.
This commit was reverted in r248860 due to a remaining miscompile, where
we forgot to synthesize the operand values that were referenced from scalar
writes. test/Isl/CodeGen/scalar-store-from-same-bb.ll tests that we do this
now correctly.
llvm-svn: 248900
|
|
|
|
|
|
|
|
|
| |
Before, we unconditionally forced all users outside the SCoP to use
the preloaded value. However, if the SCoP is not executed due to the
runtime checks, we need to use the original value because it might not
be invariant in the first place.
llvm-svn: 248881
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As a first step in the direction of assumed invariant loads (loads
that are not written in some context) we now detect and hoist
definitively invariant loads. These invariant loads will be preloaded
in the code generation and used in the optimized version of the SCoP.
If the load is only conditionally executed the preloaded version will
also only be executed under the same condition, hence we will never
access memory that wouldn't have been accessed otherwise. This is also
the most distinguishing feature compared to LICM.
As hoisting can make statements empty we will simplify the SCoP and
remove empty statements that would otherwise cause artifacts in the
code generation.
Differential Revision: http://reviews.llvm.org/D13194
llvm-svn: 248861
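The point about conditionally executed loads can be illustrated with a small sketch: the hoisted load is guarded by the same condition under which the original load would have run, so no memory is touched that the original code would not have touched. preloadIfExecuted and sumScaled are hypothetical functions written for this illustration, not generated Polly code.

```cpp
// Hypothetical model of a hoisted load: only dereference the pointer if the
// original load would have executed at least once.
int preloadIfExecuted(const int *Ptr, bool WouldExecute) {
  return WouldExecute ? *Ptr : 0;
}

// The load of *Scale is invariant in the loop. It is preloaded before the
// loop, but under the loop's entry condition (N > 0).
int sumScaled(const int *Scale, const int *Data, int N) {
  int S = preloadIfExecuted(Scale, N > 0);
  int Sum = 0;
  for (int i = 0; i < N; i++)
    Sum += S * Data[i];
  return Sum;
}
```

An unconditional hoist would dereference Scale even when N is 0, which could fault if the pointer is only valid when the loop runs.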
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 07830c18d789ee72812d5b5b9b4f8ce72ebd4207.
The commit broke at least one test in lnt,
MultiSource/Benchmarks/Ptrdist/bc/number.c
was miscompiled and the test produced a wrong result.
One Polly test case that was added later was adjusted too.
llvm-svn: 248860
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Every once in a while we see code that accesses memory with different types,
e.g. to perform operations on a piece of memory using type 'float', but to copy
data to this memory using type 'int'. Modeled in C, such codes look like:
  void foo(float A[], float B[]) {
    for (long i = 0; i < 100; i++)
      *(int *)(&A[i]) = *(int *)(&B[i]);
    for (long i = 0; i < 100; i++)
      A[i] += 10;
  }
We already used the correct types during normal operations, but fell back to our
detected type as soon as we import changed memory access functions. For these
memory accesses we may generate invalid IR due to a mismatch between the element
type of the array we detect and the actual type used in the memory access. To
address this issue, we always cast the newly created address of a memory access
back to the type of the memory access where the address will be used.
llvm-svn: 248781
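For experimenting with the access pattern from the example above, here is a runnable variant. It is a sketch under one stated substitution: the pointer casts are replaced by std::memcpy so that the integer-typed copy is well-defined C++. The function name copyThenAdd and the length parameter N are assumptions added for illustration; this shows the source pattern only, not the fix in the code generator.

```cpp
#include <cstring>

// Copy B into A through an 'int'-typed temporary, then update A as 'float'.
// std::memcpy stands in for the *(int *) casts of the original example.
void copyThenAdd(float *A, const float *B, long N) {
  static_assert(sizeof(int) == sizeof(float),
                "sketch assumes int and float have the same size");
  for (long i = 0; i < N; i++) {
    int Bits;
    std::memcpy(&Bits, &B[i], sizeof(Bits)); // read the element as 'int'
    std::memcpy(&A[i], &Bits, sizeof(Bits)); // write it back as 'int'
  }
  for (long i = 0; i < N; i++)
    A[i] += 10; // same memory, now accessed as 'float'
}
```

Both loops touch the same array, once with an integer element type and once with a floating-point one, which is the situation where the detected array element type and the type of an individual access disagree.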
|
|
|
|
|
|
|
| |
While debugging, this makes it easier to understand due to which memory
reference these stores have been introduced.
llvm-svn: 248717
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instructions which we can synthesize from a SCEV expression are not generated
directly, but only when they are used as an operand of another instruction. This
avoids generating unnecessary instructions and works more reliably than first
inserting them and then deleting them later on.
Suggested-by: Johannes Doerfert <doerfert@cs.uni-saarland.de>
Differential Revision: http://reviews.llvm.org/D13208
llvm-svn: 248712
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch allows switch instructions with affine conditions in the
SCoP. Also switch instructions in non-affine subregions are allowed.
Neither required many changes to the code, though there was some
refactoring needed to integrate them without code duplication.
In the llvm-test suite the number of profitable SCoPs increased from
135 to 139 but more importantly we can handle more benchmarks and user
inputs without preprocessing.
Differential Revision: http://reviews.llvm.org/D13200
llvm-svn: 248701
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We now only delete trivially dead instructions in the BB we copy (copyBB), but
not in any other BB. Only for copyBB we know that there will _never_ be any
future uses of instructions that have no use after copyBB has been generated.
Other instructions in the AST that have been generated by IslNodeBuilder may
look dead at the moment, but may possibly still be referenced by GlobalMaps. If
we delete them now, later uses would break surprisingly.
We do not have a test case that breaks due to us deleting too many instructions.
This issue was found by inspection.
llvm-svn: 248688
|
|
|
|
|
|
|
|
|
|
|
|
| |
After having generated a new user statement a couple of inefficient or
trivially dead instructions may remain. This commit runs instruction
simplification over the newly generated blocks to ensure unneeded
instructions are removed right away.
This commit adds simplification for non-affine subregions, which was not
yet part of 248681.
llvm-svn: 248683
|
|
|
|
|
|
|
| |
Otherwise, part of the computation will be just simplified away when we add
instruction simplification support to the RegionGenerator.
llvm-svn: 248682
|
|
|
|
|
|
|
|
|
|
|
| |
After having generated a new user statement a couple of inefficient or trivially
dead instructions may remain. This commit runs instruction simplification over
the newly generated blocks to ensure unneeded instructions are removed right
away.
This commit does not yet add simplification for non-affine subregions.
llvm-svn: 248681
|
|
|
|
|
|
|
|
|
|
|
| |
This commit basically reverts r246427 but still solves the issue
tackled by that commit. Instead of emitting initialization code in the
beginning of the start block we now generate parallel code in its own
block and thereby guarantee separation. This is necessary as we cannot
generate code for hoisted loads prior to the start block but it still
needs to be placed prior to everything else.
llvm-svn: 248674
|
|
|
|
|
|
|
|
| |
We now add loop carried information during the second traversal of the
region instead of in an intermediate step in-between. This makes the
generation simpler, removes code and should even be faster.
llvm-svn: 248125
|