| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
They are not used and consequently do not even need to be computed. This reduces
the overall compile time for our kernel from 1m33s to 17s.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36868
llvm-svn: 311157
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This change reduces the overall number of synchronize calls for kernels with
a lot of output data at the cost of additional synchronize calls for kernels
launched in sequence without any device to host transfers in between. As the
latter pattern is a lot less frequent, this seems a better tradeoff.
Even though the above motivation would be motivation enough, this is just
a step towards enabling ppcg to not compute to and from device copy calls
at all, which would be incorrect in case we still relied on these calls to
place our synchronization statements.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, kbarton, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36867
llvm-svn: 311155
|
|
|
|
|
|
|
|
| |
interaction. [NFC]
This fixes compile errors.
llvm-svn: 311130
|
|
|
|
|
|
|
|
| |
This avoid the construction of very large sets and in many cases also keeps the
number of parameters low. As a result, we see a compile time reduction from 5
minutes to only slightly above 1 minute for one of our larger test cases.
llvm-svn: 311127
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We add a ScopInliner pass which inlines functions based on a simple heuristic:
Let `g` call `f`.
If we can model all of `f` as a Scop, we inline `f` into `g`.
This requires `-polly-detect-full-function` to be enabled. So, the pass
asserts that `-polly-detect-full-function` is enabled.
Differential Revision: https://reviews.llvm.org/D36832
llvm-svn: 311126
|
|
|
|
| |
llvm-svn: 311123
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reuse the machinery built for replacing global arrays to replace malloc/free as
well. Example replacement that was missed earlier:
```
call void \
bitcast (void (i8*)* @free to void (%custom_type*)*) (%custom_type* %13)
```
- Since the `bitcast` is a `ConstantExpr`, `replaceAllUsesWith` would miss
this. We don't miss this anymore.
Differential Revision: https://reviews.llvm.org/D36825
llvm-svn: 311121
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- If we have global arrays, we would like to rewrite them to global
pointers which are allocated using `cudaMallocManaged`.
- If we have allocas in a function, we would like to rewrite them to
heap-allocations with `cudaMallocManaged` and `cudaFree`.
- With these rewrite mechanisms, we can offload _any_ function to the
GPU with no code rewrite whatsover.
Differential Revision: https://reviews.llvm.org/D36516
llvm-svn: 311080
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This pass detangles induction variables from functions, which take variables by
reference. Most fortran functions compiled with gfortran pass variables by
reference. Unfortunately a common pattern, printf calls of induction variables,
prevent in this situation the promotion of the induction variable to a register,
which again inhibits any kind of loop analysis. To work around this issue
we developed a specialized pass which introduces separate alloca slots for
known-read-only references, which indicate the mem2reg pass that the induction
variables can be promoted to registers and consquently enable SCEV to work.
We currently hardcode the information that a function
_gfortran_transfer_integer_write does not read its second parameter, as
dragonegg does not add the right annotations and we cannot change old dragonegg
releases. Hopefully flang will produce the right annotations.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: mgorny, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36800
llvm-svn: 311066
|
|
|
|
|
|
|
| |
Before this change kernels that used invariant loads would have resulted in
invalid PTX code.
llvm-svn: 311042
|
|
|
|
|
|
|
|
| |
Reviewers: grosser, bollu
Differential Revision: https://reviews.llvm.org/D36714
llvm-svn: 310908
|
|
|
|
|
|
|
|
| |
Reviewers: grosser, Meinersbur, bollu
Differential Revision: https://reviews.llvm.org/D36660
llvm-svn: 310815
|
|
|
|
|
|
|
|
| |
Reviewers: grosser, Meinersbur, bollu
Differential Revision: https://reviews.llvm.org/D36659
llvm-svn: 310814
|
|
|
|
| |
llvm-svn: 310795
|
|
|
|
| |
llvm-svn: 310667
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Before, if we fail to parse a jscop file, this will be reported as an
error and importing is aborted. However, this isn't actually strong
enough, since although the import is aborted, the scop has already been
modified and is very likely broken. Instead, make this a hard failure
and throw an LLVM error. This new behaviour requires small changes to
the tests for the legacy pass, namely using `not` to verify the error.
Further, fixed the jscop file for the
base_pointer_load_is_inst_inside_invariant_1 testcase.
Reviewed By: Meinersbur
Split out of D36578.
llvm-svn: 310599
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
I pulled out all functionality into static functions, and use those both
in the legacy passes and in the new ones.
Reviewers: grosser, Meinersbur, bollu
Reviewed By: Meinersbur
Subscribers: llvm-commits, pollydev
Differential Revision: https://reviews.llvm.org/D36578
llvm-svn: 310597
|
|
|
|
|
|
| |
This is necessary for partial writes (as used by delicm) to work.
llvm-svn: 310553
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
During code generation for a Scop we modify the IR of a function.
While this shouldn't affect a Scop in the formal sense, the implementation
caches various information about the IR such as SCEV expressions for bounds or
parameters. This cached information needs to be updated or invalidated. To this
end, SPMUpdater allows passes to report when they've invalidated a Scop to the
PassManager, which will then flush and recompute all Scops. This in turn
invalidates all iterators, so references to Scops shouldn't be held.
Reviewers: grosser, Meinersbur, bollu
Reviewed By: grosser
Subscribers: llvm-commits, pollydev
Differential Revision: https://reviews.llvm.org/D36524
llvm-svn: 310551
|
|
|
|
|
|
|
| |
We do not need to keep `malloc` and `free` around since they are
replaced by `polly_{malloc,free}Managed.`
llvm-svn: 310504
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We are working towards removing uses of Scop::getStmtFor(BB). In this
patch, we remove dependency of Scop::getStmtFor(Inst) on getStmtFor(BB).
To do so, we introduce a map of instructions to their corresponding scop
statements and use it to get the instructions' statement.
Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D35663
llvm-svn: 310494
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
managed memory.
This pass is useful to automatically convert a codebase that uses malloc/free
to use their managed memory counterparts.
Currently, rewrite malloc and free to the `polly_{malloc,free}Managed` variants.
A future patch will teach ManagedMemoryRewrite to rewrite global arrays
as pointers to globally allocated managed memory.
Differential Revision: https://reviews.llvm.org/D36513
llvm-svn: 310471
|
|
|
|
|
|
|
|
|
|
| |
Codegen with -polly-parallel queried the unmapped MemoryAccess, but only
the MemoryKind after mapping is relevant for codegen.
This should fix various fails of the
perf-x86_64-penryn-O3-polly-parallel-fast buildbot.
llvm-svn: 310466
|
|
|
|
|
|
|
|
|
| |
The previous value of "polly-delicm" was forgotten to to be changed when
ForwardOpTree was split from DeLICM.
Thanks to Tobias for noticing!
llvm-svn: 310465
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
isl_error_quota proof.
distributeDomain() and filterKnownValInst() are used in a scop
of ForwardOpTree that limits the number of isl operations.
Therefore some isl functions may return null after any operation.
Remove assertion that assume non-null results and handle
isl_*_foreach returning isl::stat::error.
I hope this fixes the crash of the asop buildbot at ihevc_recon.c.
llvm-svn: 310461
|
|
|
|
|
| |
Suggested-by: Hongbin Zheng <etherzhhb@gmail.com>
llvm-svn: 310455
|
|
|
|
|
|
|
| |
No need to create an OptimizationRemarkMissed object if we are not going
to use it anyway.
llvm-svn: 310454
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we used to compute this with `elementSizeInBits / 8`. This
would yield an element size of 0 when the array had element size < 8 in
bits.
To fix this, ask data layout what the size in bytes should be.
Differential Revision: https://reviews.llvm.org/D36459
llvm-svn: 310448
|
|
|
|
|
|
|
|
| |
DeLICM and ZoneAlgo both implemented filterKnownValInst.
Declare ZoneAlgo's version in the header and let DeLCIM use it.
llvm-svn: 310381
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We introduce another level of alias metadata to distinguish the individual
non-aliasing accesses that have inter iteration alias-free base pointers
marked with "Inter iteration alias-free" mark nodes. To distinguish two
accesses, the comparison of raw pointers representing base pointers is used.
In case of, for example, ublas's prod function that implements GEMM, and
DeLiCM we can get accesses to same location represented by different raw
pointers. Consequently, we create different alias sets that can prevent
accesses from, for example, being sinked or hoisted.
To avoid the issue, we compare the corresponding SCEV information instead
of the corresponding raw pointers.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D35761
llvm-svn: 310380
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, only convex isolation sets can be efficiently processed by isl.
Consequently, as a temporary solution, we use a different algorithm for partial
tile isolation that helps to build convex isolation sets in some cases.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D36278
llvm-svn: 310374
|
|
|
|
|
|
|
|
| |
This allows us to get rid of stores that are overwritten within the very same
basic block, without ever being read beforehand. This simplification is
necessary for delicm to run on pb4's correlation.
llvm-svn: 310369
|
|
|
|
|
|
| |
"to conservative" -> "too conservative".
llvm-svn: 310353
|
|
|
|
|
|
|
|
|
|
|
| |
gracefully.
To do this, we replicate what `CodeGeneration` does. We expose
`markNodeUnreachable` from `CodeGeneration` to `PPCGCodeGeneration`.
Differential Revision: https://reviews.llvm.org/D36457
llvm-svn: 310350
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is possible that partial writes are empty (write is never executed).
In this case, when in PHINode's incoming edge is never taken such that
the incoming write becomes an empty partial write, if enabled. The
issue is that when converting the union_map to an map, it's space
cannot be derived from the union_map itself. Rather, we need to
determine its space independently.
This fixes test-suite's MultiSource/Benchmarks/ASC_Sequoia/CrystalMk.
llvm-svn: 310348
|
|
|
|
|
|
|
|
|
|
|
|
| |
In certain cases delicm might decide to not leave the original array write in
the loop body, but to remove it and instead leave a transformed phi node as
write access. This commit teached the matmul pattern detection to order the
memory accesses according to when the access actually happens and use this
information to detect the new pattern. This makes pattern based matmul
optimization work for 2mm and 3mm in polybench 4 after
polly-position=before-vectorizer has been enabled.
llvm-svn: 310338
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Polly has traditionally always been executed at the beginning of the pass
pipeline as LLVM's inliner and DeLICM passes introduced plenty of scalar
dependences which prevented any kind of useful high-level loop optimizations
later in the pass pipeline. With DeLICM now being available, Polly can also
run optimizations when folded into the pass pipeline. This has the benefit
that Polly should now be more effective on C++ code and as an additional bonus,
no additional early canonicalization phase must be run. As a result, Polly
touches the code only if it applies a transformation. Code that does not
benefit from Polly is not touched and consequently will have the very same
execution time as without Polly enabled. Random performance changes, as could
sometimes be observed with polly-position=early are consequently not possible
any more. If performance is changed, this is due to Polly is choosing to
perform a transformation. If this choice is wrong, it can be fixed directly
in Polly.
http://polly.llvm.org/docs/Architecture.html#polly-in-the-llvm-pass-pipeline
llvm-svn: 310319
|
|
|
|
|
|
|
| |
This allows us to remove more scalar dependences. While this feature is still
rather experimental, we want to give it sufficient test coverage.
llvm-svn: 310314
|
|
|
|
|
|
|
| |
While this code is still rather we enable it by default to get better test
coverage.
llvm-svn: 310313
|
|
|
|
|
|
|
|
|
| |
Two write statements which write into the very same array slot generally are
conflicting. However, in case the value that is written is identical, this
does not cause any problem. Hence, allow such write pairs in this specific
situation.
llvm-svn: 310311
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit implements the initial version of fully-indexed static
expansion.
```
for(int i = 0; i<Ni; i++)
for(int j = 0; j<Ni; j++)
S: B[j] = j;
T: A[i] = B[i]
```
After the pass, we want this :
```
for(int i = 0; i<Ni; i++)
for(int j = 0; j<Ni; j++)
S: B[i][j] = j;
T: A[i] = B[i][i]
```
For now we bail (fail) in the following cases:
- Scalar access
- Multiple writes per SAI
- MayWrite Access
- Expansion that leads to an access to the original array
Furthermore: We still miss checks for escaping references to the array
base pointers. A future commit will add the missing escape-checks to
stay correct in those cases. The expansion is still locked behind a
CLI-Option and should not yet be used.
Patch contributed by: Nicholas Bonfante <bonfante.nicolas@gmail.com>
Reviewers: simbuerg, Meinersbur, bollu
Reviewed By: Meinersbur
Subscribers: mgorny, llvm-commits, pollydev
Differential Revision: https://reviews.llvm.org/D34982
llvm-svn: 310304
|
|
|
|
| |
llvm-svn: 310284
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is an addition to the -polly-optree pass that reuses the array
content analysis from DeLICM to find array elements that contain the
same value as the value loaded when the target statement instance
is executed.
The analysis is now enabled by default.
The known content analysis could also be used to rematerialize any
llvm::Value that was written to some array element, but currently
only loads are forwarded.
Differential Revision: https://reviews.llvm.org/D36380
llvm-svn: 310279
|
|
|
|
| |
llvm-svn: 310236
|
|
|
|
| |
llvm-svn: 310235
|
|
|
|
| |
llvm-svn: 310231
|
|
|
|
| |
llvm-svn: 310230
|
|
|
|
| |
llvm-svn: 310229
|
|
|
|
| |
llvm-svn: 310228
|
|
|
|
| |
llvm-svn: 310225
|