summaryrefslogtreecommitdiffstats
path: root/polly/lib/Transform/ScheduleOptimizer.cpp
Commit message (Collapse)AuthorAgeFilesLines
* Sink all InitializePasses.h includesReid Kleckner2019-11-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of recompilation. I found this fact by looking at this table, which is sorted by the number of times a file was changed over the last 100,000 git commits multiplied by the number of object files that depend on it in the current checkout: recompiles touches affected_files header 342380 95 3604 llvm/include/llvm/ADT/STLExtras.h 314730 234 1345 llvm/include/llvm/InitializePasses.h 307036 118 2602 llvm/include/llvm/ADT/APInt.h 213049 59 3611 llvm/include/llvm/Support/MathExtras.h 170422 47 3626 llvm/include/llvm/Support/Compiler.h 162225 45 3605 llvm/include/llvm/ADT/Optional.h 158319 63 2513 llvm/include/llvm/ADT/Triple.h 140322 39 3598 llvm/include/llvm/ADT/StringRef.h 137647 59 2333 llvm/include/llvm/Support/Error.h 131619 73 1803 llvm/include/llvm/Support/FileSystem.h Before this change, touching InitializePasses.h would cause 1345 files to recompile. After this change, touching it only causes 550 compiles in an incremental rebuild. Reviewers: bkramer, asbirlea, bollu, jdoerfert Differential Revision: https://reviews.llvm.org/D70211
* [Stats] More polly fixes following llvm::Statistic changes in r374490.Volodymyr Sapsai2019-10-111-3/+3
| | | | llvm-svn: 374501
* [Polly] Fix lib/Transform/ScheduleOptimizer.cpp compilation on SolarisRainer Orth2019-09-131-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lib/Transform/ScheduleOptimizer.cpp fails to compile on Solaris, both on the 9.x branch (first noticed when running test-release.sh without -no-polly) and on trunk: /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/tools/polly/lib/Transform/ScheduleOptimizer.cpp: In function ‘MicroKernelParamsTy getMicroKernelParams(const llvm::TargetTransformInfo*, polly::MatMulInfoTy)’: /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/tools/polly/lib/Transform/ScheduleOptimizer.cpp:914:62: error: call of overloaded ‘sqrt(long unsigned int)’ is ambiguous 914 | ceil(sqrt(Nvec * LatencyVectorFma * ThroughputVectorFma) / Nvec) * Nvec; | ^ In file included from /usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/math.h:24, from /usr/gcc/9/include/c++/9.1.0/cmath:45, from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm-c/DataTypes.h:28, from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm/Support/DataTypes.h:16, from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm/ADT/Hashing.h:47, from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm/ADT/ArrayRef.h:12, from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/tools/polly/include/polly/ScheduleOptimizer.h:12, from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/tools/polly/lib/Transform/ScheduleOptimizer.cpp:48: /usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/iso/math_iso.h:220:21: note: candidate: ‘long double std::sqrt(long double)’ 220 | inline long double sqrt(long double __X) { return __sqrtl(__X); } | ^~~~ /usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/iso/math_iso.h:186:15: note: candidate: ‘float std::sqrt(float)’ 186 | inline float sqrt(float __X) { return __sqrtf(__X); } | ^~~~ /usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/iso/math_iso.h:74:15: note: candidate: ‘double std::sqrt(double)’ 74 | extern double sqrt __P((double)); | ^~~~ /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/tools/polly/lib/Transform/ScheduleOptimizer.cpp:915:67: error: call of overloaded ‘ceil(long unsigned int)’ is ambiguous 915 | int Mr = ceil(Nvec * LatencyVectorFma * ThroughputVectorFma / Nr); | ^ In file included from /usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/math.h:24, from /usr/gcc/9/include/c++/9.1.0/cmath:45, from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm-c/DataTypes.h:28, from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm/Support/DataTypes.h:16, from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm/ADT/Hashing.h:47, from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm/ADT/ArrayRef.h:12, from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/tools/polly/include/polly/ScheduleOptimizer.h:12, from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/tools/polly/lib/Transform/ScheduleOptimizer.cpp:48: /usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/iso/math_iso.h:196:21: note: candidate: ‘long double std::ceil(long double)’ 196 | inline long double ceil(long double __X) { return __ceill(__X); } | ^~~~ /usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/iso/math_iso.h:160:15: note: candidate: ‘float std::ceil(float)’ 160 | inline float ceil(float __X) { return __ceilf(__X); } | ^~~~ /usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/iso/math_iso.h:76:15: note: candidate: ‘double std::ceil(double)’ 76 | extern double ceil __P((double)); | ^~~~ Fixed by adding casts to disambiguate, checked that it now compiles on both amd64-pc-solaris2.11 and sparcv9-sun-solaris2.11 and on x86_64-pc-linux-gnu. Differential Revision: https://reviews.llvm.org/D67442 llvm-svn: 371825
* [ScheduleOptimizer] Hoist extension nodes after schedule optimization.Michael Kruse2019-05-311-2/+6
| | | | | | | | | | | | | | | | | | | | | | Extension nodes make schedule trees are less flexible: Many operations, such as rescheduling, do not work on such schedule trees with extension. As such, some functionality such as determining parallel loops in isl's AST are disabled. Currently, only the pattern-matching generalized matrix-matrix multiplication optimization adds extension nodes (to add copy-in statements). This patch removes all extension nodes as the last step of the schedule optimization by hoisting the extension node's added domain up to the root domain node. All following passes can assume that schedule trees work without restrictions, including the parallelism test. Mark the outermost loop of the optimized matrix-matrix multiplication as parallel such that -polly-parallel is able to parallelize that loop. Differential Revision: https://reviews.llvm.org/D58202 llvm-svn: 362257
* Apply include-what-you-use #include removal suggestions. NFC.Michael Kruse2019-03-281-5/+0
| | | | | | | | | | | | This removes unused includes (and forward declarations) as suggested by include-what-you-use. If a transitive include of a removed include is required to compile a file, I added the required header (or forward declaration if suggested by include-what-you-use). This should reduce compilation time and reduce the number of iterative recompilations when a header was changed. llvm-svn: 357209
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* Fix broken formatting caused by test commitTheodoros Theodoridis2018-10-171-1/+1
| | | | llvm-svn: 344694
* Test commitTheodoros Theodoridis2018-10-171-1/+1
| | | | llvm-svn: 344682
* getDependences to new C++ interfaceTobias Grosser2018-06-061-4/+4
| | | | | | | | | | | | | | Reviewers: Meinersbur, grosser, bollu, cs15btech11044, jdoerfert Reviewed By: grosser Subscribers: pollydev, llvm-commits Tags: #polly Differential Revision: https://reviews.llvm.org/D47786 llvm-svn: 334092
* [polly] Update uses of DEBUG macro to LLVM_DEBUG.Nicola Zaghen2018-05-151-7/+7
| | | | | | | | | | | The DEBUG() macro is very generic so it might clash with other projects. The renaming was done as follows: - git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g' - git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM Differential Revision: https://reviews.llvm.org/D44978 llvm-svn: 332352
* Remove the last uses of isl::give and isl::takeTobias Grosser2018-04-291-11/+10
| | | | llvm-svn: 331126
* [DeLICM] Remove uses of isl::giveTobias Grosser2018-04-281-3/+3
| | | | llvm-svn: 331122
* Adjust to clang-format changesTobias Grosser2018-03-201-1/+0
| | | | llvm-svn: 328005
* Use isl::manage_copy to simplify calls to isl::manage(isl_.._copy())Tobias Grosser2018-02-201-9/+8
| | | | | | | | | | | As part of this cleanup a couple of unnecessary isl::manage(obj.copy()) pattern are eliminated as well. We checked for all potential cleanups by scanning for: "grep -R isl::manage\( lib/ | grep copy" llvm-svn: 325558
* Run clang-format after r324003. NFC.Michael Kruse2018-02-021-4/+4
| | | | llvm-svn: 324112
* Update polly for r323999.Benjamin Kramer2018-02-011-5/+5
| | | | llvm-svn: 324003
* Port ScopInfo to the isl cpp bindingsPhilip Pfaffe2017-11-191-3/+3
| | | | | | | | | | | | | | | | | | | | | Summary: Most changes are mechanical, but in one place I changed the program semantics by fixing a likely bug: In `Scop::hasFeasibleRuntimeContext()`, I'm now explicitely handling the error-case. Before, when the call to `addNonEmptyDomainConstraints()` returned a null set, this (probably) accidentally worked because isl_bool_error converts to true. I'm checking for nullptr now. Reviewers: grosser, Meinersbur, bollu Reviewed By: Meinersbur Subscribers: nemanjai, kbarton, pollydev, llvm-commits Differential Revision: https://reviews.llvm.org/D39971 llvm-svn: 318632
* Check whether IslAstInfo and DependenceInfo were computed for the same Scop.Michael Kruse2017-09-211-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | Since -polly-codegen reports itself to preserve DependenceInfo and IslAstInfo, we might get those analysis that were computed by a different ScopInfo for a different Scop structure. This would be unfortunate because DependenceInfo and IslAstInfo hold references to resources allocated by ScopInfo/ScopBuilder/Scop (e.g. isl_id). If -polly-codegen and DependenceInfo/IslAstInfo do not agree on which Scop to use, unpredictable things can happen. When the ScopInfo/Scop object is freed, there is a high probability that the new ScopInfo/Scop object will be created at the same heap position with the same address. Comparing whether the Scop or ScopInfo address is the expected therefore is unreliable. Instead, we compare the address of the isl_ctx object. Both, DependenceInfo and IslAstInfo must hold a reference to the isl_ctx object to ensure it is not freed before the destruction of those analyses which might happen after the destruction of the Scop/ScopInfo they refer to. Hence, the isl_ctx will not be freed and its address not reused as long there is a DependenceInfo or IslAstInfo around. This fixes llvm.org/PR34441 llvm-svn: 313842
* [ScheduleOptimizer] Fix and test schedule tree statistics.Michael Kruse2017-09-201-32/+38
| | | | | | | | | Fix walking over the schedule tree to collect its properties (Number of permutable bands etc.). Also add regression tests for these statistics. llvm-svn: 313750
* Unroll and separate the remaining parts of isolationRoman Gareev2017-09-111-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | The remaining parts produced by the full partial tile isolation can contain hot spots that are worth to be optimized. Currently, we rely on the simple loop unrolling pass, LiCM and the SLP vectorizer to optimize such parts. However, the approach can suffer from the lack of the information about aliasing that Polly provides using additional alias metadata or/and the lack of the information required by simple loop unrolling pass. This patch is the first step to optimize the remaining parts. To do it, we unroll and separate them. In case of, for instance, Intel Kaby Lake, it helps to increase the performance of the generated code from 39.87 GFlop/s to 49.23 GFlop/s. The next possible step is to avoid unrolling performed by Polly in case of isolated and remaining parts and rely only on simple loop unrolling pass and the Loop vectorizer. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D37692 llvm-svn: 312929
* Use the information about the target cache provided by the TargetTransformInfo.Roman Gareev2017-08-311-8/+72
| | | | | | | | Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D37178 llvm-svn: 312255
* [PM] Properly require and preserve OptimizationRemarkEmitter. NFCI.Michael Kruse2017-08-281-0/+2
| | | | | | | | | | | | | | | | | | | | | | Properly require and preserve the OptimizationRemarkEmitter for use in ScopPass. Previously one had to get the ORE from ScopDetection because CodeGeneration did not mark it as preserved. It would need to be recomputed which results in the legacy PM to throw away all previous SCoP analysis. This also changes the implementation of ScopPass::getAnalysisUsage to not unconditionally preserve all passes, but only those needed to be preserved by any SCoP pass (at least when using the legacy PM). This allows invalidating DependenceInfo (and IslAstInfo) in case the pass would cause them to change (e.g. OpTree, DeLICM, MaximalArrayExpansion) JSONImporter should also invalidate the DependenceInfo. In this patch it marks DependenceInfo as preserved anyway because some regression tests depend on it. Differential Revision: https://reviews.llvm.org/D37010 llvm-svn: 311888
* [Polly] Fix some Clang-tidy modernize and Include What You Use warnings; ↵Eugene Zelenko2017-08-241-26/+36
| | | | | | other minor fixes (NFC). llvm-svn: 311704
* Add more statistics.Michael Kruse2017-08-231-3/+97
| | | | | | | | | | | | | | | | Add statistics about - Which optimizations are applied - Number of loops in Scops at various stages - Number of scalar/singleton writes at various stages representative for scalar false dependencies - Number of parallel loops These will be useful to find regressions due to moving Polly further down of LLVM's pass pipeline. Differential Revision: https://reviews.llvm.org/D37049 llvm-svn: 311553
* Disable the Loop Vectorizer in case of GEMMRoman Gareev2017-08-221-4/+14
| | | | | | | | | | | | | | Currently, in case of GEMM and the pattern matching based optimizations, we use only the SLP Vectorizer out of two LLVM vectorizers. Since the Loop Vectorizer can get in the way of optimal code generation, we disable the Loop Vectorizer for the innermost loop using mark nodes and emitting the corresponding metadata. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D36928 llvm-svn: 311473
* [MatMul] Make MatMul detection independent of internal isl representations.Michael Kruse2017-08-201-86/+47
| | | | | | | | | | | | | | | | | The pattern recognition for MatMul is restrictive. The number of "disjuncts" in the isl_map containing constraint information was previously required to be 1 (as per isl_*_coalesce - which should ideally produce a domain map with a single disjunct, but does not under some circumstances). This was changed and made more flexible. Contributed-by: Annanay Agarwal <cs14btech11001@iith.ac.in> Differential Revision: https://reviews.llvm.org/D36460 llvm-svn: 311302
* Do not use isl_set_project_out to get all loop prefixesRoman Gareev2017-08-081-3/+4
| | | | | | | | | | | | Currently, only convex isolation sets can be efficiently processed by isl. Consequently, as a temporary solution, we use a different algorithm for partial tile isolation that helps to build convex isolation sets in some cases. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D36278 llvm-svn: 310374
* [ScheduleOptimizer] Make matmul pattern detection work with delicm outputTobias Grosser2017-08-081-2/+7
| | | | | | | | | | | | In certain cases delicm might decide to not leave the original array write in the loop body, but to remove it and instead leave a transformed phi node as write access. This commit teached the matmul pattern detection to order the memory accesses according to when the access actually happens and use this information to detect the new pattern. This makes pattern based matmul optimization work for 2mm and 3mm in polybench 4 after polly-position=before-vectorizer has been enabled. llvm-svn: 310338
* [ScopInfo] Move Scop::getPwAffOnly to isl++ [NFC]Tobias Grosser2017-08-061-1/+1
| | | | llvm-svn: 310231
* [ScopInfo] Move Scop::getDomains to isl++ [NFC]Tobias Grosser2017-08-061-1/+1
| | | | llvm-svn: 310230
* [ScopInfo] Move ScopStmt::ScopStmt to isl++ [NFC]Tobias Grosser2017-08-061-4/+2
| | | | llvm-svn: 310210
* Move ScopInfo::getDomain(), getDomainSpace(), getDomainId() to isl++Tobias Grosser2017-08-061-5/+3
| | | | llvm-svn: 310209
* [unittests] Add unittest for getPartialTilePrefixesTobias Grosser2017-08-051-17/+1
| | | | | | | | In https://reviews.llvm.org/D36278 it was pointed out that the behavior of getPartialTilePrefixes is not very well understood. To allow for a better understanding, we first provide some basic unittests. llvm-svn: 310175
* Move setNewAccessRelation to isl++Tobias Grosser2017-08-021-2/+2
| | | | llvm-svn: 309871
* [ScheduleOptimizer] Translate to C++ bindingsRoman Gareev2017-07-261-385/+289
| | | | | | | | | | Translate the ScheduleOptimizer to use the new isl C++ bindings. Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D35845 llvm-svn: 309119
* Move MemoryAccess::isStride* to isl++Tobias Grosser2017-07-241-3/+3
| | | | llvm-svn: 308927
* Move MemoryAccess::NewAccessRelation to isl++Tobias Grosser2017-07-231-6/+6
| | | | | | We also move related accessor functions llvm-svn: 308840
* Move ScopArrayInfo to isl++Tobias Grosser2017-07-211-2/+4
| | | | | | This moves the full ScopArrayInfo class to isl++ llvm-svn: 308801
* [ScopInfo] Print instructions in dump().Michael Kruse2017-07-211-1/+1
| | | | | | | | | | | | | | | | | | | Print a statement's instruction on dump() regardless of -polly-print-instructions. dump() is supposed to be used in the debugger only and never in regression tests. While debugging, get all the information we have and we are not bound to break anything. For non-dump purposes of print, forward the setting of -polly-print-instructions as parameters. Some calls to print() had to be changed because the PollyPrintInstructions setting is only available in ScopInfo.cpp. In ScheduleOptimizer.cpp, dump() was used in regression tests. That's not what dump() is for. The print parameter "PrintInstructions" will also be useful for an explicit print SCoP pass in a future patch. llvm-svn: 308746
* Make the pattern matching work with modified memory accessesRoman Gareev2017-07-191-10/+10
| | | | | | | | | | | | | Some optimizations (e.g., DeLICM) can modify memory accesses (e.g., change their MemoryKind). Consequently, the pattern matching should take it into the account. Reviewed-by: Tobias Grosser <tobias@grosser.es>, Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D33138 llvm-svn: 308494
* Introduce a hybrid target to generate code for either the GPU or CPUSingapuram Sanjay Srivallabh2017-06-301-0/+4
| | | | | | | | | | | | | | | | | | | | | Summary: Introduce a "hybrid" `-polly-target` option to optimise code for either the GPU or CPU. When this target is selected, PPCGCodeGeneration will attempt first to optimise a Scop. If the Scop isn't modified, it is then sent to the passes that form the CPU pipeline, i.e. IslScheduleOptimizerPass, IslAstInfoWrapperPass and CodeGeneration. In case the Scop is modified, it is marked to be skipped by the subsequent CPU optimisation passes. Reviewers: grosser, Meinersbur, bollu Reviewed By: grosser Subscribers: kbarton, nemanjai, pollydev Tags: #polly Differential Revision: https://reviews.llvm.org/D34054 llvm-svn: 306863
* [ScheduleOptimizer] Fix minor typo [NFC]Tobias Grosser2017-06-191-1/+1
| | | | llvm-svn: 305709
* [ScheduleOptimizer] Move isolateFullPartialTiles and ↵Tobias Grosser2017-06-191-94/+88
| | | | | | isolateAndUnrollMatMulInnerLoops to C++ llvm-svn: 305676
* Fix a lot of typos. NFC.Michael Kruse2017-06-081-2/+2
| | | | llvm-svn: 304974
* [ScheduleOptimizer] Move schedule construction to isl C++ [NFC]Tobias Grosser2017-05-211-20/+16
| | | | llvm-svn: 303508
* [Polly] Canonicalize arrays according to base-ptr equivalence classTobias Grosser2017-05-101-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: In case two arrays share base pointers in the same invariant load equivalence class, we canonicalize all memory accesses to the first of these arrays (according to their order in the equivalence class). This enables us to optimize kernels such as boost::ublas by ensuring that different references to the C array are interpreted as accesses to the same array. Before this change the runtime alias check for ublas would fail, as it would assume models of the C array with differing (but identically valued) base pointers would reference distinct regions of memory whereas the referenced memory regions were indeed identical. As part of this change we remove most of the MemoryAccess::get*BaseAddr interface. We removed already all references to get*BaseAddr in previous commits to ensure that no code relies on matching base pointers between memory accesses and scop arrays -- except for three remaining uses where we need the original base pointer. We document for these situations that MemoryAccess::getOriginalBaseAddr may return a base pointer that is distinct to the base pointer of the scop array referenced by this memory access. Reviewers: sebpop, Meinersbur, zinob, gareevroman, pollydev, huihuiz, efriedma, jdoerfert Reviewed By: Meinersbur Subscribers: etherzhhb Tags: #polly Differential Revision: https://reviews.llvm.org/D28518 llvm-svn: 302636
* [FIX] Fix ScheduleTreeOptimizer::optimizeMatMulPatternRoman Gareev2017-04-061-1/+1
| | | | | | Use new values of the dimensions during their permutation. llvm-svn: 299663
* Restore the initial ordering of dimensions before applying the pattern matchingRoman Gareev2017-04-061-1/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dimensions of band nodes can be implicitly permuted by the algorithm applied during the schedule generation. For example, in case of the following matrix-matrix multiplication, for (i = 0; i < 1024; i++) for (k = 0; k < 1024; k++) for (j = 0; j < 1024; j++) C[i][j] += A[i][k] * B[k][j]; it can produce the following schedule tree domain: "{ Stmt_for_body6[i0, i1, i2] : 0 <= i0 <= 1023 and 0 <= i1 <= 1023 and 0 <= i2 <= 1023 }" child: schedule: "[{ Stmt_for_body6[i0, i1, i2] -> [(i0)] }, { Stmt_for_body6[i0, i1, i2] -> [(i1)] }, { Stmt_for_body6[i0, i1, i2] -> [(i2)] }]" permutable: 1 coincident: [ 1, 1, 0 ] The current implementation of the pattern matching optimizations relies on the initial ordering of dimensions. Otherwise, it can produce the miscompilation (e.g., [1]). This patch helps to restore the initial ordering of dimensions by recreating the band node when the corresponding conditions are satisfied. Refs.: [1] - https://bugs.llvm.org/show_bug.cgi?id=32500 Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D31741 llvm-svn: 299662
* [Polly] [ScheduleOptimizer] Prevent incorrect tile size computationSiddharth Bhat2017-04-061-0/+11
| | | | | | | | | | | | | | | | | Because Polly exposes parameters that directly influence tile size calculations, one can setup situations like divide-by-zero. Check against a possible divide-by-zero in getMacroKernelParams and return early. Also assert at the end of getMacroKernelParams that the block sizes computed for matrices are positive (>= 1). Tags: #polly Differential Revision: https://reviews.llvm.org/D31708 llvm-svn: 299633
* [Polly] [DependenceInfo] change WAR, WAW generation to correct semanticsSiddharth Bhat2017-04-041-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | = Change of WAR, WAW generation: = - `buildFlow(Sink, MustSource, MaySource, Sink)` treates any flow of the form `sink <- may source <- must source` as a *may* dependence. - we used to call: ```lang=cpp, name=old-flow-call.cpp Flow = buildFlow(MustWrite, MustWrite, Read, Schedule); WAW = isl_union_flow_get_must_dependence(Flow); WAR = isl_union_flow_get_may_dependence(Flow); ``` - This caused some WAW dependences to be treated as WAR dependences. - Incorrect semantics. - Now, we call WAR and WAW correctly. == Correct WAW: == ```lang=cpp, name=new-waw-call.cpp Flow = buildFlow(Write, MustWrite, MayWrite, Schedule); WAW = isl_union_flow_get_may_dependence(Flow); isl_union_flow_free(Flow); ``` == Correct WAR: == ```lang=cpp, name=new-war-call.cpp Flow = buildFlow(Write, Read, MustaWrite, Schedule); WAR = isl_union_flow_get_must_dependence(Flow); isl_union_flow_free(Flow); ``` - We want the "shortest" WAR possible (exact dependences). - We mark all the *must-writes* as may-source, reads as must-souce. - Then, we ask for *must* dependence. - This removes all the reads that flow through a *must-write* before reaching a sink. - Note that we only block ealier writes with *must-writes*. This is intuitively correct, as we do not want may-writes to block must-writes. - Leaves us with direct (R -> W). - This affects reduction generation since RED is built using WAW and WAR. = New StrictWAW for Reductions: = - We used to call: ```lang=cpp,name=old-waw-war-call.cpp Flow = buildFlow(MustWrite, MustWrite, Read, Schedule); WAW = isl_union_flow_get_must_dependence(Flow); WAR = isl_union_flow_get_may_dependence(Flow); ``` - This *is* the right model of WAW we need for reductions, just not in general. - Reductions need to track only *strict* WAW, without any interfering reductions. = Explanation: Why the new WAR dependences in tests are correct: = - We no longer set WAR = WAR - WAW - Hence, we will have WAR dependences that were originally removed. - These may look incorrect, but in fact make sense. == Code: == ```lang=llvm, name=new-war-dependence.ll ; void manyreductions(long *A) { ; for (long i = 0; i < 1024; i++) ; for (long j = 0; j < 1024; j++) ; S0: *A += 42; ; ; for (long i = 0; i < 1024; i++) ; for (long j = 0; j < 1024; j++) ; S1: *A += 42; ; ``` === WAR dependence: === { S0[1023, 1023] -> S1[0, 0] } - Between `S0[1023, 1023]` and `S1[0, 0]`, we will have the dependences: ```lang=cpp, name=dependence-incorrect, counterexample S0[1023, 1023]: *-- tmp = *A (load0)--* WAR 2 add = tmp + 42 | *-> *A = add (store0) | WAR 1 S1[0, 0]: | tmp = *A (load1) | add = tmp + 42 | A = add (store1)<-* ``` - One may assume that WAR2 *hides* WAR1 (since store0 happens before store1). However, within a statement, Polly has no idea about the ordering of loads and stores. - Hence, according to Polly, the code may have looked like this: ```lang=cpp, name=dependence-correct S0[1023, 1023]: A = add (store0) tmp = A (load0) ---* add = A + 42 | WAR 1 S1[0, 0]: | tmp = A (load1) | add = A + 42 | A = add (store1) <-* ``` - So, Polly generates (correct) WAR dependences. It does not make sense to remove these dependences, since they are correct with respect to Polly's model. Reviewers: grosser, Meinersbur tags: #polly Differential revision: https://reviews.llvm.org/D31386 llvm-svn: 299429
OpenPOWER on IntegriCloud