path: root/polly/lib
Commit message | Author | Age | Files | Lines
...
* [GPGPU] Do not create copy statements when targeting managed memory | Tobias Grosser | 2017-08-18 | 3 | -5/+13
  Summary: They are not used and consequently do not even need to be computed. This reduces the overall compile time for our kernel from 1m33s to 17s.
  Reviewers: Meinersbur, bollu, singam-sanjay
  Reviewed By: bollu
  Subscribers: nemanjai, pollydev, llvm-commits, kbarton
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D36868
  llvm-svn: 311157
* [GPGPU] Synchronize after each kernel, not each copy out | Tobias Grosser | 2017-08-18 | 1 | -1/+2
  Summary: This change reduces the overall number of synchronize calls for kernels with a lot of output data, at the cost of additional synchronize calls for kernels launched in sequence without any device-to-host transfers in between. As the latter pattern is a lot less frequent, this seems a better tradeoff.
  Even though the above would be motivation enough, this is just a step towards enabling ppcg to not compute to-device and from-device copy calls at all, which would be incorrect if we still relied on these calls to place our synchronization statements.
  Reviewers: Meinersbur, bollu, singam-sanjay
  Reviewed By: bollu
  Subscribers: nemanjai, kbarton, pollydev, llvm-commits
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D36867
  llvm-svn: 311155
* [ScopInliner] Move DEBUG_TYPE to below all includes to prevent cross-module interaction. [NFC] | Siddharth Bhat | 2017-08-17 | 1 | -2/+2
  This fixes compile errors.
  llvm-svn: 311130
* [GPGPU] Only collect the accesses that belong to an array [NFC] | Tobias Grosser | 2017-08-17 | 2 | -6/+10
  This avoids the construction of very large sets and in many cases also keeps the number of parameters low. As a result, we see a compile time reduction from 5 minutes to only slightly above 1 minute for one of our larger test cases.
  llvm-svn: 311127
* [ScopInliner] Add a simple Scop-based inliner to polly. | Siddharth Bhat | 2017-08-17 | 4 | -7/+129
  We add a ScopInliner pass which inlines functions based on a simple heuristic: Let `g` call `f`. If we can model all of `f` as a Scop, we inline `f` into `g`.
  This requires `-polly-detect-full-function` to be enabled. So, the pass asserts that `-polly-detect-full-function` is enabled.
  Differential Revision: https://reviews.llvm.org/D36832
  llvm-svn: 311126
* [GPGPU] Move getExtend to C++ [NFC] | Tobias Grosser | 2017-08-17 | 1 | -54/+35
  llvm-svn: 311123
* [ManagedMemoryRewrite] Rewrite malloc, free correctly inside `Constant`s. | Siddharth Bhat | 2017-08-17 | 1 | -2/+33
  Reuse the machinery built for replacing global arrays to replace malloc/free as well.
  Example replacement that was missed earlier:
  ```
  call void \
      bitcast (void (i8*)* @free to void (%custom_type*)*) (%custom_type* %13)
  ```
  - Since the `bitcast` is a `ConstantExpr`, `replaceAllUsesWith` would miss this. We don't miss this anymore.
  Differential Revision: https://reviews.llvm.org/D36825
  llvm-svn: 311121
* [ManagedMemoryRewrite] Learn how to rewrite global arrays, allocas. | Siddharth Bhat | 2017-08-17 | 1 | -16/+263
  - If we have global arrays, we would like to rewrite them to global pointers which are allocated using `cudaMallocManaged`.
  - If we have allocas in a function, we would like to rewrite them to heap-allocations with `cudaMallocManaged` and `cudaFree`.
  - With these rewrite mechanisms, we can offload _any_ function to the GPU with no code rewrite whatsoever.
  Differential Revision: https://reviews.llvm.org/D36516
  llvm-svn: 311080
* Add rewrite by-reference parameter pass | Tobias Grosser | 2017-08-17 | 4 | -0/+102
  Summary: This pass untangles induction variables from functions that take variables by reference. Most Fortran functions compiled with gfortran pass variables by reference. Unfortunately, a common pattern, printf calls of induction variables, prevents the promotion of the induction variable to a register in this situation, which in turn inhibits any kind of loop analysis. To work around this issue we developed a specialized pass which introduces separate alloca slots for known-read-only references. These indicate to the mem2reg pass that the induction variables can be promoted to registers and consequently enable SCEV to work.
  We currently hardcode the information that the function _gfortran_transfer_integer_write only reads its second parameter, as dragonegg does not add the right annotations and we cannot change old dragonegg releases. Hopefully flang will produce the right annotations.
  Reviewers: Meinersbur, bollu, singam-sanjay
  Reviewed By: bollu
  Subscribers: mgorny, pollydev, llvm-commits
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D36800
  llvm-svn: 311066
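  The effect described in this commit can be illustrated at the source level. The following is only a sketch in plain C++, not the pass itself: `readOnlyCallee`, `Log`, `sumBefore` and `sumAfter` are hypothetical names, and the callee merely stands in for a by-reference I/O routine that reads its argument.

  ```cpp
  #include <cassert>
  #include <vector>

  // Hypothetical stand-in for an external routine that takes its argument by
  // reference but only reads it.
  static std::vector<int> Log;
  void readOnlyCallee(const int *Value) { Log.push_back(*Value); }

  // Before: the address of the induction variable escapes into the callee,
  // so the variable has to live in memory for the whole loop.
  int sumBefore(int N) {
    int Sum = 0;
    for (int I = 0; I < N; ++I) {
      readOnlyCallee(&I);
      Sum += I;
    }
    return Sum;
  }

  // After: the callee receives the address of a separate slot holding a copy,
  // so the induction variable itself never has its address taken.
  int sumAfter(int N) {
    int Sum = 0;
    for (int I = 0; I < N; ++I) {
      int Slot = I; // separate slot for the read-only reference
      readOnlyCallee(&Slot);
      Sum += I;
    }
    return Sum;
  }
  ```

  Both variants pass the same values to the callee; the rewrite is only valid because the callee is known not to modify what it is given.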
* [GPGPU] Also record invariant loads as kernel subtree values | Tobias Grosser | 2017-08-16 | 1 | -3/+9
  Before this change kernels that used invariant loads would have resulted in invalid PTX code.
  llvm-svn: 311042
* [Polly] Move ScopStmt::checkForReductions to islpp. NFC. | Tobias Grosser | 2017-08-15 | 1 | -18/+11
  Reviewers: grosser, bollu
  Differential Revision: https://reviews.llvm.org/D36714
  llvm-svn: 310908
* Move ScopStmt::getSchedule to islpp. NFC. | Tobias Grosser | 2017-08-14 | 1 | -23/+13
  Reviewers: grosser, Meinersbur, bollu
  Differential Revision: https://reviews.llvm.org/D36660
  llvm-svn: 310815
* [Polly] Move Scop::restrictDomains to islpp. NFC. | Tobias Grosser | 2017-08-14 | 2 | -18/+10
  Reviewers: grosser, Meinersbur, bollu
  Differential Revision: https://reviews.llvm.org/D36659
  llvm-svn: 310814
* [ScopInfo] Translate ParameterIds to isl++ | Tobias Grosser | 2017-08-13 | 1 | -5/+4
  llvm-svn: 310795
* Fix two warnings in polly, -Wmismatched-tags and -Wreorder | Reid Kleckner | 2017-08-10 | 1 | -1/+1
  llvm-svn: 310667
* [JSON] Make the failure to parse a jscop file a hard error | Philip Pfaffe | 2017-08-10 | 1 | -5/+10
  Summary: Previously, a failure to parse a jscop file was reported as an error and the import was aborted. However, this isn't actually strong enough, since although the import is aborted, the scop has already been modified and is very likely broken. Instead, make this a hard failure and throw an LLVM error.
  This new behaviour requires small changes to the tests for the legacy pass, namely using `not` to verify the error.
  Further, fix the jscop file for the base_pointer_load_is_inst_inside_invariant_1 testcase.
  Reviewed By: Meinersbur
  Split out of D36578.
  llvm-svn: 310599
* [JSON][PM] Port json import/export over to new pm | Philip Pfaffe | 2017-08-10 | 3 | -97/+126
  Summary: I pulled out all functionality into static functions, and use those both in the legacy passes and in the new ones.
  Reviewers: grosser, Meinersbur, bollu
  Reviewed By: Meinersbur
  Subscribers: llvm-commits, pollydev
  Differential Revision: https://reviews.llvm.org/D36578
  llvm-svn: 310597
* [GPGPU] Make the ast_build available to block generator | Tobias Grosser | 2017-08-10 | 1 | -0/+2
  This is necessary for partial writes (as used by delicm) to work.
  llvm-svn: 310553
* [Polly][PM] Improve invalidation in the Scop-Pipeline | Philip Pfaffe | 2017-08-10 | 3 | -7/+29
  Summary: During code generation for a Scop we modify the IR of a function. While this shouldn't affect a Scop in the formal sense, the implementation caches various information about the IR such as SCEV expressions for bounds or parameters. This cached information needs to be updated or invalidated.
  To this end, SPMUpdater allows passes to report when they've invalidated a Scop to the PassManager, which will then flush and recompute all Scops. This in turn invalidates all iterators, so references to Scops shouldn't be held.
  Reviewers: grosser, Meinersbur, bollu
  Reviewed By: grosser
  Subscribers: llvm-commits, pollydev
  Differential Revision: https://reviews.llvm.org/D36524
  llvm-svn: 310551
* [ManagedMemoryRewrite] [Polly] Erase original malloc and free. [NFC] | Siddharth Bhat | 2017-08-09 | 1 | -0/+2
  We do not need to keep `malloc` and `free` around since they are replaced by `polly_{malloc,free}Managed`.
  llvm-svn: 310504
* Remove dependency of Scop::getStmtFor(Inst) on getStmtFor(BB). NFC. | Michael Kruse | 2017-08-09 | 2 | -19/+32
  We are working towards removing uses of Scop::getStmtFor(BB). In this patch, we remove the dependency of Scop::getStmtFor(Inst) on getStmtFor(BB). To do so, we introduce a map of instructions to their corresponding scop statements and use it to get the instructions' statement.
  Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>
  Differential Revision: https://reviews.llvm.org/D35663
  llvm-svn: 310494
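  The lookup structure this commit describes can be pictured as a plain hash map from instruction to statement. The sketch below is an assumption-laden toy: `Inst`, `Stmt` and `StmtIndex` are hypothetical stand-ins, not the Polly or LLVM classes, and the real patch may organize the map differently.

  ```cpp
  #include <cassert>
  #include <string>
  #include <unordered_map>
  #include <vector>

  // Hypothetical stand-ins for an instruction and a scop statement.
  struct Inst { std::string Name; };
  struct Stmt { std::string Name; std::vector<const Inst *> Insts; };

  class StmtIndex {
    // Direct instruction-to-statement map; no basic-block lookup involved.
    std::unordered_map<const Inst *, const Stmt *> InstToStmt;

  public:
    // Register every instruction of S as belonging to S.
    void addStmt(const Stmt &S) {
      for (const Inst *I : S.Insts) {
        bool Inserted = InstToStmt.emplace(I, &S).second;
        assert(Inserted && "instruction already belongs to a statement");
        (void)Inserted;
      }
    }

    // Returns the statement containing I, or nullptr if I is not modeled.
    const Stmt *getStmtFor(const Inst *I) const {
      auto It = InstToStmt.find(I);
      return It == InstToStmt.end() ? nullptr : It->second;
    }
  };
  ```

  With such a map, a query by instruction no longer needs to go through the enclosing basic block, which matters once a block can be split into several statements.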
* [ManagedMemoryRewrite] Introduce a new pass to rewrite modules to use managed memory. | Siddharth Bhat | 2017-08-09 | 4 | -32/+190
  This pass is useful to automatically convert a codebase that uses malloc/free to use their managed memory counterparts.
  Currently, it rewrites malloc and free to the `polly_{malloc,free}Managed` variants.
  A future patch will teach ManagedMemoryRewrite to rewrite global arrays as pointers to globally allocated managed memory.
  Differential Revision: https://reviews.llvm.org/D36513
  llvm-svn: 310471
* [CodeGen] Use isLatestArrayKind(). | Michael Kruse | 2017-08-09 | 1 | -1/+1
  Codegen with -polly-parallel queried the unmapped MemoryAccess, but only the MemoryKind after mapping is relevant for codegen.
  This should fix various failures of the perf-x86_64-penryn-O3-polly-parallel-fast buildbot.
  llvm-svn: 310466
* [ForwardOpTree] Set DEBUG_TYPE to "polly-optree". | Michael Kruse | 2017-08-09 | 1 | -1/+1
  The previous value of "polly-delicm" was not updated when ForwardOpTree was split from DeLICM.
  Thanks to Tobias for noticing!
  llvm-svn: 310465
* [ISLTools/ZoneAlgo] Make distributeDomain and filterKnownValInst isl_error_quota proof. | Michael Kruse | 2017-08-09 | 2 | -5/+6
  distributeDomain() and filterKnownValInst() are used in a scope of ForwardOpTree that limits the number of isl operations. Therefore some isl functions may return null after any operation. Remove assertions that assume non-null results and handle isl_*_foreach returning isl::stat::error.
  I hope this fixes the crash of the AOSP buildbot at ihevc_recon.c.
  llvm-svn: 310461
* [ZoneAlgo] Add motivation for exception. NFC. | Michael Kruse | 2017-08-09 | 1 | -0/+20
  Suggested-by: Hongbin Zheng <etherzhhb@gmail.com>
  llvm-svn: 310455
* [ZoneAlgo] Consolidate condition. NFC. | Michael Kruse | 2017-08-09 | 1 | -8/+7
  No need to create an OptimizationRemarkMissed object if we are not going to use it anyway.
  llvm-svn: 310454
* [PPCGCodeGeneration] Compute element size in bytes for arrays correctly. | Siddharth Bhat | 2017-08-09 | 1 | -1/+14
  Previously, we computed this with `elementSizeInBits / 8`. This would yield an element size of 0 when the array had an element size of less than 8 bits.
  To fix this, ask data layout what the size in bytes should be.
  Differential Revision: https://reviews.llvm.org/D36459
  llvm-svn: 310448
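  The arithmetic problem in this commit is easy to reproduce in isolation. The sketch below is not the patch: the actual fix queries the data layout, whereas the rounding-up helper here is only an assumed approximation (it ignores any alignment padding a data layout might add), and both function names are made up for illustration.

  ```cpp
  #include <cassert>
  #include <cstdint>

  // Old computation: integer division truncates, so a 1-bit element
  // is reported as occupying 0 bytes.
  std::uint64_t truncatedSizeInBytes(std::uint64_t Bits) { return Bits / 8; }

  // Assumed approximation of a sane answer: round up to whole bytes,
  // so every non-empty element occupies at least one byte.
  std::uint64_t roundedUpSizeInBytes(std::uint64_t Bits) {
    assert(Bits > 0 && "element must have a non-zero bit width");
    return (Bits + 7) / 8;
  }
  ```

  For a 1-bit element the first function returns 0 and the second returns 1; for widths that are multiples of 8 the two agree.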
* [DeLICM/ZoneAlgo] Remove duplicate code. NFC. | Michael Kruse | 2017-08-08 | 2 | -31/+1
  DeLICM and ZoneAlgo both implemented filterKnownValInst. Declare ZoneAlgo's version in the header and let DeLICM use it.
  llvm-svn: 310381
* Use SCEV information for the second level aliasing | Roman Gareev | 2017-08-08 | 1 | -8/+10
  We introduce another level of alias metadata to distinguish the individual non-aliasing accesses that have inter-iteration alias-free base pointers marked with "Inter iteration alias-free" mark nodes. To distinguish two accesses, the comparison of raw pointers representing base pointers is used. In the case of, for example, ublas's prod function that implements GEMM, and DeLICM, we can get accesses to the same location represented by different raw pointers. Consequently, we create different alias sets that can prevent accesses from, for example, being sunk or hoisted. To avoid the issue, we compare the corresponding SCEV information instead of the corresponding raw pointers.
  Reviewed-by: Tobias Grosser <tobias@grosser.es>
  Differential Revision: https://reviews.llvm.org/D35761
  llvm-svn: 310380
* Do not use isl_set_project_out to get all loop prefixes | Roman Gareev | 2017-08-08 | 1 | -3/+4
  Currently, only convex isolation sets can be efficiently processed by isl. Consequently, as a temporary solution, we use a different algorithm for partial tile isolation that helps to build convex isolation sets in some cases.
  Reviewed-by: Tobias Grosser <tobias@grosser.es>
  Differential Revision: https://reviews.llvm.org/D36278
  llvm-svn: 310374
* [RegisterPasses] Run polly-simplify also right after scop modeling | Tobias Grosser | 2017-08-08 | 1 | -0/+2
  This allows us to get rid of stores that are overwritten within the very same basic block, without ever being read beforehand. This simplification is necessary for delicm to run on PolyBench 4's correlation.
  llvm-svn: 310369
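  The kind of store this commit refers to can be shown with a small source-level example. This is only a sketch of the pattern, with hypothetical function names; the actual simplification operates on Polly's scop representation, not on C++ source.

  ```cpp
  #include <cassert>
  #include <cstddef>
  #include <vector>

  // Before: the first store to A[I] is overwritten in the same loop body
  // (a single basic block) without being read in between.
  void fillBefore(std::vector<int> &A) {
    for (std::size_t I = 0; I < A.size(); ++I) {
      A[I] = 0;                       // overwritten below, never read
      A[I] = static_cast<int>(I) * 2; // the only store that is observable
    }
  }

  // After: the overwritten store is gone; the observable result is unchanged.
  void fillAfter(std::vector<int> &A) {
    for (std::size_t I = 0; I < A.size(); ++I)
      A[I] = static_cast<int>(I) * 2;
  }
  ```

  Removing the first store leaves a single write per array element in the loop body, which is the shape later passes are easier to apply to.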
* [ScopInfo] [NFC] Typo fix. | Siddharth Bhat | 2017-08-08 | 1 | -1/+1
  "to conservative" -> "too conservative".
  llvm-svn: 310353
* [Polly] [PPCGCodeGeneration] Handle failing of invariant load hoisting gracefully. | Siddharth Bhat | 2017-08-08 | 2 | -20/+40
  To do this, we replicate what `CodeGeneration` does. We expose `markNodeUnreachable` from `CodeGeneration` to `PPCGCodeGeneration`.
  Differential Revision: https://reviews.llvm.org/D36457
  llvm-svn: 310350
* [DeLICM] Properly handle PHI writes becoming empty partial writes. | Michael Kruse | 2017-08-08 | 1 | -3/+8
  It is possible that partial writes are empty (the write is never executed). This happens when a PHINode's incoming edge is never taken, so that the incoming write becomes an empty partial write (if partial writes are enabled).
  The issue is that when converting the union_map to a map, its space cannot be derived from the union_map itself. Rather, we need to determine its space independently.
  This fixes test-suite's MultiSource/Benchmarks/ASC_Sequoia/CrystalMk.
  llvm-svn: 310348
* [ScheduleOptimizer] Make matmul pattern detection work with delicm output | Tobias Grosser | 2017-08-08 | 2 | -35/+28
  In certain cases delicm might decide to not leave the original array write in the loop body, but to remove it and instead leave a transformed phi node as write access. This commit teaches the matmul pattern detection to order the memory accesses according to when the access actually happens and use this information to detect the new pattern.
  This makes pattern based matmul optimization work for 2mm and 3mm in polybench 4 after polly-position=before-vectorizer has been enabled.
  llvm-svn: 310338
* Change Polly's position to "before-vectorizer" | Tobias Grosser | 2017-08-07 | 1 | -1/+1
  Polly has traditionally always been executed at the beginning of the pass pipeline, as LLVM's inliner and LICM passes introduced plenty of scalar dependences which prevented any kind of useful high-level loop optimizations later in the pass pipeline. With DeLICM now being available, Polly can also run optimizations when folded into the pass pipeline. This has the benefit that Polly should now be more effective on C++ code and, as an additional bonus, no additional early canonicalization phase must be run.
  As a result, Polly touches the code only if it applies a transformation. Code that does not benefit from Polly is not touched and consequently will have the very same execution time as without Polly enabled. Random performance changes, as could sometimes be observed with polly-position=early, are consequently not possible any more. If performance changes, this is because Polly chose to perform a transformation. If this choice is wrong, it can be fixed directly in Polly.
  http://polly.llvm.org/docs/Architecture.html#polly-in-the-llvm-pass-pipeline
  llvm-svn: 310319
* [DeLICM] Enable partial writes | Tobias Grosser | 2017-08-07 | 1 | -1/+1
  This allows us to remove more scalar dependences. While this feature is still rather experimental, we want to give it sufficient test coverage.
  llvm-svn: 310314
* Enable delicm to automatically remove scalar loop carried dependences | Tobias Grosser | 2017-08-07 | 1 | -1/+1
  While this code is still rather experimental, we enable it by default to get better test coverage.
  llvm-svn: 310313
* [ZoneAlgo] Allow two writes that write identical values into same array slot | Tobias Grosser | 2017-08-07 | 1 | -5/+30
  Two write statements which write into the very same array slot generally conflict. However, if the value that is written is identical, this does not cause any problem. Hence, allow such write pairs in this specific situation.
  llvm-svn: 310311
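  The rule stated in this commit can be written down as a tiny predicate. This is a toy model under strong assumptions: the real analysis reasons over isl maps, whereas here a write is reduced to a hypothetical `(Element, ValueId)` pair and `mayCoexist` is an invented name.

  ```cpp
  #include <cassert>

  // Toy model of a write: which array slot it touches and an identifier for
  // the value it stores (equal ids mean "known to be the same value").
  struct Write {
    int Element;
    int ValueId;
  };

  // Two writes are acceptable together if they touch different slots, or if
  // they touch the same slot but store the identical value.
  bool mayCoexist(const Write &A, const Write &B) {
    if (A.Element != B.Element)
      return true;
    return A.ValueId == B.ValueId;
  }
  ```

  Before this change, the second case would have been rejected as well; only writes to the same slot with different values remain a conflict.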
* [Polly] Fully-Indexed static expansion | Andreas Simbuerger | 2017-08-07 | 3 | -0/+404
  This commit implements the initial version of fully-indexed static expansion.
  ```
  for(int i = 0; i<Ni; i++)
    for(int j = 0; j<Ni; j++)
  S:    B[j] = j;
  T:  A[i] = B[i]
  ```
  After the pass, we want this:
  ```
  for(int i = 0; i<Ni; i++)
    for(int j = 0; j<Ni; j++)
  S:    B[i][j] = j;
  T:  A[i] = B[i][i]
  ```
  For now we bail (fail) in the following cases:
  - Scalar access
  - Multiple writes per SAI
  - MayWrite Access
  - Expansion that leads to an access to the original array
  Furthermore: We still miss checks for escaping references to the array base pointers. A future commit will add the missing escape-checks to stay correct in those cases. The expansion is still locked behind a CLI-Option and should not yet be used.
  Patch contributed by: Nicholas Bonfante <bonfante.nicolas@gmail.com>
  Reviewers: simbuerg, Meinersbur, bollu
  Reviewed By: Meinersbur
  Subscribers: mgorny, llvm-commits, pollydev
  Differential Revision: https://reviews.llvm.org/D34982
  llvm-svn: 310304
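  The two loop nests from this commit message can be turned into runnable C++ to check that they compute the same `A`. This is a sketch only: it assumes statement `T` sits in the outer loop after the inner loop, uses `std::vector` for storage, and the function names `original` and `expanded` are invented for the comparison.

  ```cpp
  #include <cassert>
  #include <vector>

  // Original nest: every outer iteration reuses the same one-dimensional B.
  std::vector<int> original(int N) {
    std::vector<int> A(N), B(N);
    for (int i = 0; i < N; i++) {
      for (int j = 0; j < N; j++)
        B[j] = j;  // S
      A[i] = B[i]; // T
    }
    return A;
  }

  // Expanded nest: B gains a dimension indexed by the outer loop, so each
  // outer iteration writes its own row and no storage is reused.
  std::vector<int> expanded(int N) {
    std::vector<int> A(N);
    std::vector<std::vector<int>> B(N, std::vector<int>(N));
    for (int i = 0; i < N; i++) {
      for (int j = 0; j < N; j++)
        B[i][j] = j;  // S
      A[i] = B[i][i]; // T
    }
    return A;
  }
  ```

  The expanded version trades memory (N×N instead of N elements) for the removal of the reuse of `B` across outer iterations.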
* [GPGPU] Remove redundant constructors | Tobias Grosser | 2017-08-07 | 1 | -4/+4
  llvm-svn: 310284
* [ForwardOpTree] Use known array content analysis to forward load instructions. | Michael Kruse | 2017-08-07 | 5 | -57/+605
  This is an addition to the -polly-optree pass that reuses the array content analysis from DeLICM to find array elements that contain the same value as the value loaded when the target statement instance is executed.
  The analysis is now enabled by default.
  The known content analysis could also be used to rematerialize any llvm::Value that was written to some array element, but currently only loads are forwarded.
  Differential Revision: https://reviews.llvm.org/D36380
  llvm-svn: 310279
* [ScopInfo] Make Scop::canAlwaysBeHoisted a member function | Tobias Grosser | 2017-08-07 | 1 | -4/+3
  llvm-svn: 310236
* [ScopInfo] Move Scop::addInvariantLoads to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 1 | -31/+23
  llvm-svn: 310235
* [ScopInfo] Move Scop::getPwAffOnly to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 10 | -28/+28
  llvm-svn: 310231
* [ScopInfo] Move Scop::getDomains to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 6 | -14/+15
  llvm-svn: 310230
* [ScopInfo] Move Scop::getInvalidContext to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 2 | -4/+5
  llvm-svn: 310229
* [ScopInfo] Move Scop::getAssumedContext to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 3 | -6/+7
  llvm-svn: 310228
* [ScopInfo] Move Scop::addNonEmptyDomainConstraints to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 1 | -4/+4
  llvm-svn: 310225