path: root/polly/test
Commit message (Author, Age, Files, Lines)
...
* [lit] Force site configs to be run before source-tree configs (Zachary Turner, 2017-09-14, 2 files, -136/+17)
  This patch simplifies LLVM's lit infrastructure by enforcing an ordering in which a site config is always run before a source-tree config.

  A significant amount of the complexity in lit config files arises from the fact that, inside a source-tree config file, we don't yet know whether the site config has been run. However, it is *always* required to run a site config first, because it passes various variables down through CMake that the main config depends on. As a result, every config file has to do a bunch of magic to try to reverse-engineer the location of the site config file if it detects (heuristically) that the site config file has not yet been run.

  This patch solves the problem by emitting a mapping from source-tree config file to binary-tree site config file in llvm-lit.py. Then, during discovery, when we find a config file, we check whether we have a target mapping for it, and if so we use that instead.

  This mechanism is generic enough that it does not affect external users of lit. They will just not have a config mapping defined, and everything will work as normal.

  On the other hand, it allows us to make many simplifications:

  * We are guaranteed that a site config will be executed first.
  * Inside a main config, we no longer have to assume that attributes might not be present and use getattr everywhere.
  * We no longer have to pass parameters such as --param llvm_site_config=<path> on the command line.
  * It is future-proof, meaning you don't have to edit llvm-lit.in to add support for new projects.
  * All of the duplicated logic of trying various fallback mechanisms for finding a site config from the main config is now gone.

  One potentially noteworthy thing that was required to implement this change is that whereas the ninja check targets previously used the first method to spawn lit, they now use the second. In particular, you can no longer run lit.py against the source tree while specifying the various `foo_site_config=<path>` parameters. Instead, you need to run llvm-lit.py.

  Differential Revision: https://reviews.llvm.org/D37756
  llvm-svn: 313270
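The lookup described in this commit can be pictured with a minimal sketch. This is not lit's actual code: `config_map`, `map_config` and `resolve_config` are hypothetical names, and the paths are made up. It only shows the idea of redirecting a discovered source-tree config to its site config when a mapping exists, and falling through unchanged otherwise.

```python
import os

# Hypothetical table filled in at configure time:
# normalized source-tree config path -> binary-tree site config path.
config_map = {}


def _normalize(path):
    # Normalize so that the same file always produces the same key.
    return os.path.normcase(os.path.realpath(path))


def map_config(source_cfg, site_cfg):
    """Record that `source_cfg` must be entered through `site_cfg`."""
    config_map[_normalize(source_cfg)] = site_cfg


def resolve_config(found_cfg):
    """Return the config to execute for a config found during discovery.

    External users never call map_config, so the table is empty for them
    and every config is returned unchanged.
    """
    return config_map.get(_normalize(found_cfg), found_cfg)


map_config("/src/polly/test/lit.cfg", "/build/polly/test/lit.site.cfg")
print(resolve_config("/src/polly/test/lit.cfg"))
print(resolve_config("/elsewhere/lit.cfg"))
```

Because the redirect happens at discovery time, the source-tree config can assume the site config has already run.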
* Unroll and separate the remaining parts of isolation (Roman Gareev, 2017-09-11, 4 files, -167/+722)
  The remaining parts produced by the full partial tile isolation can contain hot spots that are worth optimizing. Currently, we rely on the simple loop unrolling pass, LICM and the SLP vectorizer to optimize such parts. However, this approach can suffer from the lack of the aliasing information that Polly provides through additional alias metadata, and/or from the lack of information required by the simple loop unrolling pass.

  This patch is the first step towards optimizing the remaining parts. To do so, we unroll and separate them. On Intel Kaby Lake, for instance, this helps to increase the performance of the generated code from 39.87 GFlop/s to 49.23 GFlop/s.

  The next possible step is to avoid the unrolling performed by Polly for isolated and remaining parts and to rely only on the simple loop unrolling pass and the Loop Vectorizer.

  Reviewed-by: Tobias Grosser <tobias@grosser.es>
  Differential Revision: https://reviews.llvm.org/D37692
  llvm-svn: 312929
* [CodeGen] Bitcast scalar writes to actual value. (Michael Kruse, 2017-09-07, 1 file, -0/+37)
  The type of NewValue might change due to ScalarEvolution looking through bitcasts. The synthesized NewValue therefore takes the type before the bitcast.

  llvm-svn: 312718
* Revert "[ScopDetect/Info] Look through PHIs that follow an error block" (Michael Kruse, 2017-09-06, 1 file, -61/+0)
  This reverts commit r312410 - [ScopDetect/Info] Look through PHIs that follow an error block

  The commit caused generation of invalid IR due to accessing a parameter that does not dominate the SCoP.

  llvm-svn: 312663
* [test] Add forgotten REQUIRES: line. (Michael Kruse, 2017-09-06, 1 file, -0/+1)
  llvm-svn: 312632
* [ZoneAlgo] Handle non-StoreInst/LoadInst MemoryAccesses including memset. (Michael Kruse, 2017-09-06, 4 files, -7/+220)
  Up to now, ZoneAlgo considered array elements accessed by anything other than a LoadInst or StoreInst as not analyzable. This patch removes that restriction by using the unknown ValInst to describe the written content, or the element type's null value in the case of memset.

  Differential Revision: https://reviews.llvm.org/D37362
  llvm-svn: 312630
* [Simplify] Actually remove unused instruction from region header. (Michael Kruse, 2017-09-05, 1 file, -0/+62)
  Since r312249, instructions of an entry block of region statements are not marked as root anymore and hence can theoretically be removed if unused. Theoretically, because the instruction list was not changed. Still, MemoryAccesses for unused instructions were removed. This led to a failed assertion in the code generator when the MemoryAccess for the still-listed instruction was not found.

  This should fix the compiler crashes with

    Assertion failed: ArrayAccess && "No array access found for instruction!", file ScopInfo.h, line 1494

  llvm-svn: 312566
* [ForwardOp] Remove read accesses for all instructions that have been moved (Tobias Grosser, 2017-09-03, 2 files, -4/+60)
  Before this patch, OpTree did not consider forwarding an operand tree consisting of only a single LoadInst as useful. The motivation was that, like an access to a read-only variable, it would just replace one MemoryAccess by another. However, in contrast to read-only accesses, this would replace a scalar access by an array access, which is something worth doing.

  In addition, leaving scalar MemoryAccesses is problematic in that VirtualUse prioritizes inter-Stmt use over intra-Stmt. It was possible that the same LLVM value has a MemoryAccess for accessing the remote Stmt's LoadInst as well as having the same LoadInst in its own instruction list (due to being forwarded from another operand tree).

  With this patch we ensure that if a LoadInst is forwarded in any operand tree, the operand tree containing just the LoadInst is forwarded as well, which effectively removes the scalar MemoryAccess such that only the array access remains, not both.

  Thanks Michael for the detailed explanation.

  Reviewers: Meinersbur, bellu, singam-sanjay, gareevroman
  Subscribers: hfinkel, pollydev, llvm-commits
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D37424
  llvm-svn: 312456
* [IslAst] Do not assert in case of empty min/max alias locations (Tobias Grosser, 2017-09-03, 1 file, -0/+72)
  In certain situations, the context in the isl_ast_build could cause the min/max locations of our alias sets to become empty, which would cause an internal error in isl, which is then unable to derive a value for these expressions. Check these conditions before code-generating expressions and instead assume that the alias check succeeded. This is valid, as the corresponding memory accesses will not be executed under any valid context.

  This fixed llvm.org/PR34432. Thanks to Qirun Zhang for reporting.

  llvm-svn: 312455
* [ScopHelper] Do not crash on unreachable blocks (Tobias Grosser, 2017-09-03, 1 file, -0/+36)
  This resolves llvm.org/PR34433. Thanks to Zhendong Su for reporting.

  llvm-svn: 312451
* [ScopDetect/Info] Look through PHIs that follow an error block (Tobias Grosser, 2017-09-02, 1 file, -0/+61)
  In case a PHI node follows an error block, we can assume that the incoming value can only come from the node that is not an error block. As a result, conditions that seemed non-affine before are now in fact affine.

  llvm-svn: 312410
* [ISLNodeBuilder] Materialize Fortran array sizes of arrays without memory accesses. (Siddharth Bhat, 2017-09-01, 1 file, -0/+102)
  In Polly, we specifically add a parameter to represent the outermost dimension size of Fortran arrays. We do this because this information is statically available from the Fortran metadata generated by dragonegg.

  However, we were only materializing these parameters (meaning, creating an llvm::Value to back the isl_id) from *memory accesses*. This is wrong; we should materialize parameters from *scop array info*.

  It is wrong because if we detect two Fortran arrays but only one of them is accessed, we may not materialize the other array's dimensions at all.

  We fix this by looping over all `polly::ScopArrayInfo` in a scop, rather than just all `polly::MemoryAccess`.

  Differential Revision: https://reviews.llvm.org/D37379
  llvm-svn: 312350
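A toy model of the bug may help. This is only a sketch in Python, not Polly's C++ code: `ArrayInfo`, `size_param` and the `_outer_size` naming are hypothetical stand-ins for `polly::ScopArrayInfo` and the real parameter names. It shows how deriving size parameters from the accessed arrays alone can miss an array that was detected but never accessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayInfo:
    """Hypothetical stand-in for one array known to the SCoP."""
    name: str
    is_fortran: bool


def size_param(array):
    # Made-up naming scheme for the outermost-dimension size parameter.
    return f"{array.name}_outer_size"


def fortran_size_params(arrays):
    """Parameters that must be materialized for the given arrays."""
    return {size_param(a) for a in arrays if a.is_fortran}


a = ArrayInfo("A", is_fortran=True)
b = ArrayInfo("B", is_fortran=True)

all_arrays = [a, b]   # everything the SCoP knows about
accessed = [a]        # only A is touched by a memory access

old = fortran_size_params(accessed)    # derived from memory accesses
new = fortran_size_params(all_arrays)  # derived from array infos
missing = new - old
print(sorted(missing))
```

Iterating over the array list instead of the access list closes exactly this gap.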
* Fix Memory Access of failing tests. (Michael Kruse, 2017-09-01, 2 files, -0/+103)
  Mark scalar dependences for different statements belonging to the same BB as 'Inter'.

  Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>
  Differential Revision: https://reviews.llvm.org/D37147
  llvm-svn: 312324
* [ForwardOpTree] Allow forwarding in the presence of region statements (Tobias Grosser, 2017-08-31, 3 files, -11/+166)
  Summary: Now that region statements also have instruction lists, this is a straightforward extension.

  Reviewers: Meinersbur, bollu, singam-sanjay, gareevroman
  Reviewed By: Meinersbur
  Subscribers: hfinkel, pollydev, llvm-commits
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D37298
  llvm-svn: 312249
* [PPCGCodeGen] Convert intrinsics to libdevice functions whenever possible. (Siddharth Bhat, 2017-08-31, 2 files, -6/+14)
  This is useful when we face certain intrinsics such as `llvm.exp.*` which cannot be lowered by the NVPTX backend while other intrinsics can.

  Otherwise, we would need to keep blacklists of intrinsics that cannot be handled by the NVPTX backend. It is much simpler to try to promote all intrinsics to libdevice versions.

  This patch makes the handling of functions and intrinsics uniform, and will always try to use a libdevice version if it exists.

  Differential Revision: https://reviews.llvm.org/D37056
  llvm-svn: 312239
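The "try libdevice first" policy can be sketched as a name lookup. This is an illustration under assumptions, not the code in PPCGCodeGeneration: the table holds only a small illustrative subset, and `to_libdevice` is a hypothetical helper. It relies on the libdevice convention that the double-precision function is `__nv_<op>` and the single-precision one is `__nv_<op>f`.

```python
# Illustrative subset of operations assumed to have a libdevice version.
LIBDEVICE_BASE = {
    "exp": "__nv_exp",
    "log": "__nv_log",
}


def to_libdevice(intrinsic_name):
    """Map e.g. 'llvm.exp.f64' to a libdevice name, or None if no match."""
    parts = intrinsic_name.split(".")
    if len(parts) != 3 or parts[0] != "llvm":
        return None
    _, op, ty = parts
    base = LIBDEVICE_BASE.get(op)
    if base is None:
        return None          # no libdevice version known: leave it alone
    if ty == "f64":
        return base          # double-precision variant
    if ty == "f32":
        return base + "f"    # single-precision variant
    return None


for name in ["llvm.exp.f64", "llvm.log.f32", "llvm.ctpop.i32"]:
    print(name, "->", to_libdevice(name))
```

With a lookup of this shape there is no blacklist to maintain: anything without a libdevice counterpart simply falls through to the backend.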
* [BlockGenerator] Generate entry block of regions from instruction lists (Tobias Grosser, 2017-08-31, 2 files, -4/+58)
  This adds code generation support for the previous commit.

  This patch has been re-applied after the memory issue in the previous patch was fixed.

  llvm-svn: 312211
* [ScopInfo] Use statement lists for entry blocks of region statements (Tobias Grosser, 2017-08-31, 2 files, -0/+75)
  By using statement lists in the entry blocks of region statements, instruction-level analyses also work on region statements.

  We currently only model the entry block of a region statement, as this is sufficient for most transformations the known passes currently execute. Modeling instructions in the presence of control flow (e.g. infinite loops) is left out to not increase code complexity too much. It can be added when good use cases are found.

  This change set is reapplied after a memory corruption issue was fixed.

  llvm-svn: 312210
* Revert "[ScopInfo] Use statement lists for entry blocks of region statements" (Tobias Grosser, 2017-08-31, 2 files, -75/+0)
  This reverts commit r312128. It caused some memory issues.

  llvm-svn: 312209
* Revert "[BlockGenerator] Generate entry block of regions from instruction lists" (Tobias Grosser, 2017-08-31, 2 files, -58/+4)
  This reverts commit r312129. It caused some memory issues.

  llvm-svn: 312208
* Adapt testcase to LLVM change in DIGlobalVariableExpression. (Adrian Prantl, 2017-08-30, 1 file, -1/+1)
  llvm-svn: 312147
* [BlockGenerator] Generate entry block of regions from instruction lists (Tobias Grosser, 2017-08-30, 2 files, -4/+58)
  This adds code generation support for the previous commit.

  llvm-svn: 312129
* [ScopInfo] Use statement lists for entry blocks of region statements (Tobias Grosser, 2017-08-30, 2 files, -0/+75)
  By using statement lists in the entry blocks of region statements, instruction-level analyses also work on region statements.

  We currently only model the entry block of a region statement, as this is sufficient for most transformations the known passes currently execute. Modeling instructions in the presence of control flow (e.g. infinite loops) is left out to not increase code complexity too much. It can be added when good use cases are found.

  llvm-svn: 312128
* [ScopBuilder] Introduce metadata for splitting scop statement. (Michael Kruse, 2017-08-30, 7 files, -14/+397)
  This patch allows IR instructions to be annotated with metadata ("polly_split_after") that specifies where to split a particular scop statement.

  Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>
  Differential Revision: https://reviews.llvm.org/D36402
  llvm-svn: 312107
* [ZoneAlgo] More fine-grained bail-out. (Michael Kruse, 2017-08-28, 2 files, -0/+167)
  ZoneAlgo used to bail out for the complete SCoP if it encountered something violating its assumptions. This meant that neither could OpTree forward any load nor could DeLICM do anything in such cases, even if their transformations were unrelated to the violations.

  This patch adds a list of compatible elements (currently with the granularity of entire arrays) that can be used for analysis. OpTree and DeLICM can then check whether their transformations only concern compatible elements, and skip non-compatible ones.

  This will be useful for e.g. Polybench's benchmarks covariance, correlation, bicg, doitgen, durbin, gramschmidt and adi, which have assumption violations that are not necessarily relevant for all transformations.

  Differential Revision: https://reviews.llvm.org/D37219
  llvm-svn: 311929
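The difference between the two bail-out strategies can be shown with plain sets. This is a sketch, not ZoneAlgo's implementation: the function names and the array names are invented, and compatibility is tracked per whole array, as the message describes.

```python
def analyzable_before(arrays, violating):
    """Old behaviour: one violation anywhere disables the whole SCoP."""
    return set() if arrays & violating else set(arrays)


def analyzable_after(arrays, violating):
    """New behaviour: only the violating arrays are excluded."""
    return set(arrays) - violating


def transformable(candidates, compatible):
    """Keep only candidate transformations that touch compatible arrays.

    `candidates` maps a (hypothetical) transformation name to the set of
    arrays it reads or writes.
    """
    return [name for name, used in candidates.items() if used <= compatible]


arrays = {"A", "B", "C"}
violating = {"C"}  # e.g. an access pattern ZoneAlgo cannot model

candidates = {
    "forward load of A": {"A"},
    "map scalar to B": {"B"},
    "forward load of C": {"C"},
}

print(transformable(candidates, analyzable_before(arrays, violating)))
print(transformable(candidates, analyzable_after(arrays, violating)))
```

In this toy setup the old strategy allows nothing, while the new one still permits the two transformations that never touch the violating array.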
* [IslAst] Do not compare arrays in alias check which are known to be identical (Tobias Grosser, 2017-08-28, 1 file, -0/+33)
  This possibly helps to avoid run-time check failures in the COSMO kernels.

  llvm-svn: 311920
* [Detect] Consider nested loop profitable if entry block is not in loop (Tobias Grosser, 2017-08-27, 1 file, -0/+94)
  In cases where the entry block of a scop was not contained in a loop that was part of the scop region and at the same time there was a loop surrounding the scop, we failed to count the loops in the scop and consequently did not consider the scop profitable. We correct this by only moving to the loop parent if the current loop is contained in the scop.

  This increases the number of loops in COSMO which we assume to be profitable from 3974 to 4981.

  llvm-svn: 311863
* Revert "[polly] Fix ScopDetectionDiagnostic test failure caused by r310940" (Tobias Grosser, 2017-08-24, 1 file, -1/+1)
  This reverts commit 950849ece9bb8fdd2b41e3ec348b9653b4e37df6.

  This commit broke various buildbots.

  llvm-svn: 311692
* [CodeGen] Detect impossible partial write conditions more reliably. (Michael Kruse, 2017-08-24, 3 files, -0/+177)
  Whether a partial write is tautological/unsatisfiable depends not only on the access domain, but also on the domain covered by its node in the AST.

  In the example below, there are two instances of Stmt_cond_false. It may have a partial write access that is not executed in instance Stmt_cond_false(0).

      for (int c0 = 0; c0 < tmp5; c0 += 1) {
        Stmt_for_body344(c0);
        if (tmp5 >= c0 + 2)
          Stmt_cond_false(c0);
        Stmt_cond_end(c0);
      }
      if (tmp5 <= 0) {
        Stmt_for_body344(0);
        Stmt_cond_false(0);
        Stmt_cond_end(0);
      }

  Isl cannot derive a subscript for an array element that is never accessed. This caused an error in that no subscript expression was generated in IslNodeBuilder::createNewAccesses, while BlockGenerator expected one to exist because there is an execution of that write, just not in that AST node.

  Fixed by inspecting whether isl generated a constant "false" AST expression in the current AST node, instead of determining whether the access domain is empty.

  This should fix a compiler crash of the aosp buildbot.

  llvm-svn: 311663
* [Polly][WIP] Scalar fully indexed expansion (Andreas Simbuerger, 2017-08-24, 6 files, -1/+463)
  Summary:
  This patch comes directly after https://reviews.llvm.org/D34982, which allows fully indexed expansion of MemoryKind::Array. This patch allows expansion for MemoryKind::Value and MemoryKind::PHI.

  MemoryKind::Value seems to be working with no major modifications of D34982. A test case has been added. Unfortunately, no "run time" checks can be done for now because, as @Meinersbur explains in a comment on D34982, DependenceInfo needs to be cleared and reset to take expansion into account in the remaining part of the Polly pipeline. There is no way to do that in Polly for now.

  MemoryKind::PHI is not working. A test case is in place, but not working. To expand MemoryKind::Array, we first expand the write and then the reads. For MemoryKind::PHI, the idea of the current implementation is to exchange the "roles" of the read and write: first expand the read according to its domain and then the writes. But with this strategy, I still encounter the problem of a union_map in the new access map. For example, with the following source code (source code of the test case):

  ```
  void mse(double A[Ni], double B[Nj]) {
    int i,j;
    double tmp = 6;
    for (i = 0; i < Ni; i++) {
      for (int j = 0; j<Nj; j++) {
        tmp = tmp + 2;
      }
      B[i] = tmp;
    }
  }
  ```

  Polly gives us the following statements and memory accesses:

  ```
  Statements {
      Stmt_for_body
          Domain := { Stmt_for_body[i0] : 0 <= i0 <= 9999 };
          Schedule := { Stmt_for_body[i0] -> [i0, 0, 0] };
          ReadAccess := [Reduction Type: NONE] [Scalar: 1] { Stmt_for_body[i0] -> MemRef_tmp_04__phi[] };
          MustWriteAccess := [Reduction Type: NONE] [Scalar: 1] { Stmt_for_body[i0] -> MemRef_tmp_11__phi[] };
          Instructions {
              %tmp.04 = phi double [ 6.000000e+00, %entry.split ], [ %add.lcssa, %for.end ]
          }
      Stmt_for_inc
          Domain := { Stmt_for_inc[i0, i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9999 };
          Schedule := { Stmt_for_inc[i0, i1] -> [i0, 1, i1] };
          MustWriteAccess := [Reduction Type: NONE] [Scalar: 1] { Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
          ReadAccess := [Reduction Type: NONE] [Scalar: 1] { Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
          MustWriteAccess := [Reduction Type: NONE] [Scalar: 1] { Stmt_for_inc[i0, i1] -> MemRef_add_lcssa__phi[] };
          Instructions {
              %tmp.11 = phi double [ %tmp.04, %for.body ], [ %add, %for.inc ]
              %add = fadd double %tmp.11, 2.000000e+00
              %exitcond = icmp ne i32 %inc, 10000
          }
      Stmt_for_end
          Domain := { Stmt_for_end[i0] : 0 <= i0 <= 9999 };
          Schedule := { Stmt_for_end[i0] -> [i0, 2, 0] };
          MustWriteAccess := [Reduction Type: NONE] [Scalar: 1] { Stmt_for_end[i0] -> MemRef_tmp_04__phi[] };
          ReadAccess := [Reduction Type: NONE] [Scalar: 1] { Stmt_for_end[i0] -> MemRef_add_lcssa__phi[] };
          MustWriteAccess := [Reduction Type: NONE] [Scalar: 0] { Stmt_for_end[i0] -> MemRef_B[i0] };
          Instructions {
              %add.lcssa = phi double [ %add, %for.inc ]
              store double %add.lcssa, double* %arrayidx, align 8
              %exitcond5 = icmp ne i64 %indvars.iv.next, 10000
          }
  }
  ```

  and the following dependences:

  ```
  { Stmt_for_inc[i0, 9999] -> Stmt_for_end[i0] : 0 <= i0 <= 9999;
    Stmt_for_inc[i0, i1] -> Stmt_for_inc[i0, 1 + i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9998;
    Stmt_for_body[i0] -> Stmt_for_inc[i0, 0] : 0 <= i0 <= 9999;
    Stmt_for_end[i0] -> Stmt_for_body[1 + i0] : 0 <= i0 <= 9998 }
  ```

  When trying to expand this memory access:

  ```
  { Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
  ```

  the new access map would look like this:

  ```
  { Stmt_for_inc[i0, 9999] -> MemRef_tmp_11__phi_exp[i0] : 0 <= i0 <= 9999;
    Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi_exp[i0, 1 + i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9998 }
  ```

  The idea for implementing the expansion of PHI accesses comes from @Meinersbur, and I don't understand why my implementation does not work. I must have missed something in my understanding of the idea.

  Contributed by: Nicolas Bonfante <nicolas.bonfante@gmail.com>
  Reviewers: Meinersbur, simbuerg, bollu
  Reviewed By: Meinersbur
  Subscribers: llvm-commits, pollydev, Meinersbur
  Differential Revision: https://reviews.llvm.org/D36647
  llvm-svn: 311619
* [ScopDetect] Include zero-iteration loops in loop count. (Michael Kruse, 2017-08-23, 1 file, -8/+36)
  Loops with zero iterations are, syntactically, loops. They have been excluded from the loop counter even for the non-profitable counters. This seems to be unintentional, as the sentinel value of '0' minimal iterations does exclude such loops.

  Fix by never considering the iteration count when the sentinel value of 0 is found.

  This makes the recently added NumTotalLoops counter redundant with NumLoopsOverall, which now is equivalent. Hence, NumTotalLoops is removed as well.

  Note: The test case 'ScopDetect/statistics.ll' effectively does not check profitability, because -polly-process-unprofitable is passed to all test cases.

  llvm-svn: 311551
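The counting rule after this fix can be sketched as follows. This is an assumption-laden illustration rather than ScopDetection's code: `count_loops` is a hypothetical helper, and loops are represented only by their trip counts.

```python
def count_loops(trip_counts, min_iterations=0):
    """Count loops, treating min_iterations == 0 as "count everything".

    With the sentinel value 0 the iteration count is never consulted, so
    zero-iteration loops are included. With a positive threshold, only
    loops with at least that many iterations are counted.
    """
    if min_iterations == 0:
        return len(trip_counts)
    return sum(1 for trips in trip_counts if trips >= min_iterations)


trip_counts = [0, 5, 100]  # one loop never iterates

print(count_loops(trip_counts))                     # overall count
print(count_loops(trip_counts, min_iterations=10))  # profitability-style count
```

Checking the sentinel before looking at any trip count is what makes the overall count independent of how often each loop runs.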
* [polly] Fix ScopDetectionDiagnostic test failure caused by r310940 (Jakub Kuderski, 2017-08-22, 1 file, -1/+1)
  Summary:
  ScopDetection used to check whether a loop within a region was infinite and emitted a diagnostic in such cases. After r310940 there's no point checking for that situation, as infinite loops don't appear in regions anymore.

  The test failure was observed on these two polly buildbots:
  http://lab.llvm.org:8011/builders/polly-arm-linux/builds/8368
  http://lab.llvm.org:8011/builders/polly-amd64-linux/builds/10310

  This patch XFAILs `ReportLoopHasNoExit.ll` and turns infinite loop detection into an assert.

  Reviewers: grosser, sanjoy, bollu
  Reviewed By: grosser
  Subscribers: efriedma, aemerson, kristof.beyls, dberlin, llvm-commits
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D36776
  llvm-svn: 311503
* [IRBuilder] Only emit alias scop metadata for arrays, but not scalars (Tobias Grosser, 2017-08-22, 2 files, -14/+13)
  Summary:
  There is no need to emit alias metadata for scalars, as basicaa will easily distinguish them from arrays. This reduces the size of the metadata we generate. This is especially useful after we moved to -polly-position=before-vectorizer, where a lot more scalar dependences are introduced. These increased the size of the alias analysis metadata and made us commonly reach the limit beyond which we do not emit alias metadata, a limit that was introduced to prevent quadratic growth of this metadata.

  This improves 2mm performance from 1.5 seconds to 0.17 seconds.

  Reviewers: Meinersbur, bollu, singam-sanjay
  Reviewed By: Meinersbur
  Subscribers: pollydev, llvm-commits
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D37028
  llvm-svn: 311498
* Disable the Loop Vectorizer in case of GEMM (Roman Gareev, 2017-08-22, 11 files, -0/+19)
  Currently, in case of GEMM and the pattern-matching-based optimizations, we use only the SLP Vectorizer out of the two LLVM vectorizers. Since the Loop Vectorizer can get in the way of optimal code generation, we disable the Loop Vectorizer for the innermost loop using mark nodes and emitting the corresponding metadata.

  Reviewed-by: Tobias Grosser <tobias@grosser.es>
  Differential Revision: https://reviews.llvm.org/D36928
  llvm-svn: 311473
* [test] Do not pipe binary data to FileCheck. (Michael Kruse, 2017-08-22, 1 file, -1/+1)
  llvm-svn: 311470
* [ScopDetection] Add stat for total number of loops. (Michael Kruse, 2017-08-22, 1 file, -0/+1)
  The total number of loops is useful as a baseline for comparing how many loops have been optimized in different configurations.

  llvm-svn: 311469
* test/GPGPU/invalid-kernel-assert-verifymodule.ll also requires assertions (Tobias Grosser, 2017-08-22, 1 file, -1/+1)
  llvm-svn: 311423
* [ScopInfo] Add option to treat all function parameters as dereferenceable. (Siddharth Bhat, 2017-08-21, 1 file, -0/+98)
  Dragonegg generates most function parameters as pointers to the actual parameters. However, it does not mark these parameters with the dereferenceable attribute.

  Polly is conservative when it comes to invariant load hoisting, so we add runtime checks to invariant-load-hoisted pointers when we do not know that the pointers are dereferenceable. This is correct behaviour, but it carries a performance penalty.

  Add a flag that treats all pointer parameters as dereferenceable. That way, Polly can speculatively load-hoist parameters to functions without runtime checks.

  Differential Revision: https://reviews.llvm.org/D36461
  llvm-svn: 311329
* [GPGPU] Add llvm.powi to the libdevice supported functions (Tobias Grosser, 2017-08-21, 1 file, -2/+6)
  These intrinsics are used in COSMO.

  llvm-svn: 311324
* [GPGPU] Add log / logf to the libdevice supported functions (Tobias Grosser, 2017-08-21, 2 files, -4/+11)
  These two functions are used in COSMO.

  llvm-svn: 311322
* [MatMul] Make MatMul detection independent of internal isl representations. (Michael Kruse, 2017-08-20, 3 files, -0/+151)
  The pattern recognition for MatMul is restrictive. The number of "disjuncts" in the isl_map containing constraint information was previously required to be 1 (as per isl_*_coalesce, which should ideally produce a domain map with a single disjunct, but does not under some circumstances). This was changed and made more flexible.

  Contributed-by: Annanay Agarwal <cs14btech11001@iith.ac.in>
  Differential Revision: https://reviews.llvm.org/D36460
  llvm-svn: 311302
* Revert "[GPGPU] Simplify PPCGSCop to reduce compile time [NFC]" (Tobias Grosser, 2017-08-19, 2 files, -8/+8)
  We still see some issues with parameter space mismatches. Revert this to get a clean baseline. We will recommit after these issues have been resolved.

  This reverts commit 0e360a14194f722ded7aa2bc9d4be2ed2efeeb49.

  llvm-svn: 311268
* [GPGPU] Correctly initialize array order and fixed_element information (Tobias Grosser, 2017-08-19, 3 files, -33/+98)
  Summary:
  This information is necessary for PPCG to perform correct live-range reordering. With these changes applied we can live-range reorder some of the important kernels in COSMO.

  We also update and rename one test case, which previously could not be optimized and now is optimized thanks to live-range reordering. To preserve test coverage we add a new test case, scalar-writes-in-scop-requires-abort.ll, which exercises our automatic abort in case of scalar writes in the kernel.

  Reviewers: Meinersbur, bollu, singam-sanjay
  Subscribers: nemanjai, pollydev, llvm-commits, kbarton
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D36929
  llvm-svn: 311259
* [PPCG] Only add Kernel argument sizes for OpenCL, not CUDA runtime (Philipp Schaad, 2017-08-19, 6 files, -20/+12)
  Kernel argument sizes now only get appended to the kernel launch parameter list if the OpenCL runtime is selected, not if the CUDA runtime is chosen.

  Differential revision: D36925
  llvm-svn: 311248
* [GPGPU] Collect parameter dimension used in MemoryAccesses (Tobias Grosser, 2017-08-19, 1 file, -0/+44)
  When using -polly-ignore-integer-wrapping and -polly-acc-codegen-managed-memory, we add parameter dimensions lazily to the domains, which results in PPCG not including parameter dimensions that are only used in memory accesses in the kernel space. To make sure these parameters are still passed to the kernel, we collect these parameter dimensions and align the kernel's parameter space before code-generating it.

  llvm-svn: 311239
* [Polly][Bug fix] Wrong dependences filtering during Fully Indexed expansion (Andreas Simbuerger, 2017-08-18, 6 files, -189/+335)
  Summary:
  When trying to expand memory accesses, the current version of Polly uses statement-level dependences. The current implementation does not work when a statement has multiple dependences. For example, in the following source code:

  ```
  void mse(double A[Ni], double B[Nj], double C[Nj], double D[Nj]) {
    int i,j;
    for (j = 0; j < Ni; j++) {
      for (int i = 0; i<Nj; i++)
  S:    B[i] = i;
      for (int i = 0; i<Nj; i++)
  T:    D[i] = i;
  U:  A[j] = B[j];
      C[j] = D[j];
    }
  }
  ```

  The statement U has two dependences, one on S and one on T. The current version of Polly fails during expansion. This patch aims to fix this bug. For that, we use reference-level dependences to be able to filter dependences according to statement and memory reference. The principle of expansion remains the same as before.

  We also noticed that we need to bail out if a load comes after a store (at the same position) in the same statement. So a check was added to isExpandable.

  Contributed by: Nicholas Bonfante <nicolas.bonfante@insa-lyon.fr>
  Reviewers: Meinersbur, simbuerg, bollu
  Reviewed By: Meinersbur, simbuerg
  Subscribers: pollydev, llvm-commits
  Differential Revision: https://reviews.llvm.org/D36791
  llvm-svn: 311165
* [GPGPU] Simplify PPCGSCop to reduce compile time [NFC] (Tobias Grosser, 2017-08-18, 2 files, -8/+8)
  Summary:
  Drop unused parameter dimensions to reduce the size of the sets we are working with. The computed dependences in particular tend to accumulate a lot of parameters that are present in the input memory accesses, but often not necessary to express the actual dependences. As isl represents maps and sets with dense matrices, reducing the dimensionality of isl sets commonly improves code generation performance.

  This reduces compile time from 17 to 11 seconds for our test case. While this is not impressive, this patch helped me to identify the previous two performance improvements and additionally increases the readability of the isl data structures we use.

  Reviewers: Meinersbur, bollu, singam-sanjay
  Reviewed By: bollu
  Subscribers: nemanjai, pollydev, llvm-commits, kbarton
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D36869
  llvm-svn: 311161
* [ScopInliner] Add a simple Scop-based inliner to polly. (Siddharth Bhat, 2017-08-17, 3 files, -0/+146)
  We add a ScopInliner pass which inlines functions based on a simple heuristic: Let `g` call `f`. If we can model all of `f` as a Scop, we inline `f` into `g`.

  This requires `-polly-detect-full-function` to be enabled. So, the pass asserts that `-polly-detect-full-function` is enabled.

  Differential Revision: https://reviews.llvm.org/D36832
  llvm-svn: 311126
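The heuristic can be written down in a few lines. This is a toy sketch, not the ScopInliner pass: the call graph is a plain dictionary, `fully_modeled` stands in for "Polly could model the whole function as a Scop", and all function names are invented.

```python
def inline_candidates(call_graph, fully_modeled):
    """Return (caller, callee) pairs the heuristic would inline.

    A callee qualifies only if its entire body can be modeled as a Scop,
    which is why full-function detection has to be enabled.
    """
    return [
        (caller, callee)
        for caller, callees in call_graph.items()
        for callee in callees
        if callee in fully_modeled
    ]


# g calls f and h; only f is a Scop from entry to exit.
call_graph = {"g": ["f", "h"], "main": ["g"]}
fully_modeled = {"f"}

print(inline_candidates(call_graph, fully_modeled))
```

Here only the call from `g` to `f` is selected. `h` and `g` are left alone because they are not fully modeled.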
* [ManagedMemoryRewrite] Rewrite malloc, free correctly inside `Constant`s. (Siddharth Bhat, 2017-08-17, 1 file, -0/+95)
  Reuse the machinery built for replacing global arrays to replace malloc/free as well.

  Example replacement that was missed earlier:

  ```
  call void \
    bitcast (void (i8*)* @free to void (%custom_type*)*) (%custom_type* %13)
  ```

  - Since the `bitcast` is a `ConstantExpr`, `replaceAllUsesWith` would miss this. We don't miss this anymore.

  Differential Revision: https://reviews.llvm.org/D36825
  llvm-svn: 311121
* [GPGPU] Make test case independent of LLVM names (Tobias Grosser, 2017-08-17, 1 file, -2/+2)
  In release builds LLVM may not pass along LLVM names consistently. We make the test cases independent of the LLVM-IR names to avoid spurious test case failures.

  llvm-svn: 311118
* [ManagedMemoryRewrite] Learn how to rewrite global arrays, allocas. (Siddharth Bhat, 2017-08-17, 2 files, -0/+134)
  - If we have global arrays, we would like to rewrite them to global pointers which are allocated using `cudaMallocManaged`.
  - If we have allocas in a function, we would like to rewrite them to heap allocations with `cudaMallocManaged` and `cudaFree`.
  - With these rewrite mechanisms, we can offload _any_ function to the GPU with no code rewrite whatsoever.

  Differential Revision: https://reviews.llvm.org/D36516
  llvm-svn: 311080