path: root/polly/test
Commit message  (Author, Date; files changed, -lines removed/+lines added)
...
* [tests] Make sure tests do not end in 'unreachable' - Part II  (Tobias Grosser, 2017-03-07; 10 files changed, -14/+14)
  There is no point in optimizing unreachable code, hence our test cases should always
  return. This commit is part of a series that makes Polly more robust in the presence
  of unreachables.
  llvm-svn: 297150
* [tests] Make sure tests do not end in 'unreachable'  (Tobias Grosser, 2017-03-07; 5 files changed, -5/+5)
  There is no point in optimizing unreachable code, hence our test cases should always
  return. This commit is part of a series that makes Polly more robust in the presence
  of unreachables.
  llvm-svn: 297147
* Adapt to llvm change r296992 to unbreak the bots  (Sanjoy Das, 2017-03-06; 1 file changed, -2/+2)
  r296992 made ScalarEvolution's CompareValueComplexity less aggressive, and that broke
  the polly test being fixed in this change. This change explicitly bumps
  CompareValueComplexity in said test case to make it pass.

  Can someone from the polly team please give me an idea of whether this case is
  important enough to have scalar-evolution-max-value-compare-depth be 3 by default?
  llvm-svn: 296994
* [tests] Specify the dependence on the NVPTX backend for Polly ACC test cases  (Tobias Grosser, 2017-03-03; 13 files changed, -12/+16)
  Some Polly ACC test cases fail without a working NVPTX backend. We explicitly specify
  this dependence in REQUIRES. Alternatively, we could have only marked polly-acc as
  supported in case the NVPTX backend is available, but as we might use other backends
  in the future, this does not seem to be the best choice. For this to work, we also
  need to make the 'targets_to_build' information available.

  Suggested-by: Michael Kruse <llvm@meinersbur.de>
  llvm-svn: 296853
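  For context, lit expresses such a dependence with a REQUIRES line at the top of each
  test file, presumably something like "; REQUIRES: nvptx-registered-target". The exact
  feature name here is an assumption based on common LLVM lit conventions, not taken
  from this log.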
* [test] Do not emit binary data to output  (Tobias Grosser, 2017-03-03; 8 files changed, -8/+8)
  Suggested-by: Michael Kruse <llvm@meinersbur.de>
  llvm-svn: 296852
* Revert "Currently broken by recent LLVM upstream changes"Tobias Grosser2017-03-021-2/+0
| | | | | | | This reverts commit r296579, which is not needed anymore as the relevant changes in trunk have been reverted. llvm-svn: 296817
* [ScopDetection] Do not allow required-invariant loads in non-affine regions  (Tobias Grosser, 2017-03-02; 1 file changed, -0/+66)
  These loads cannot be safely hoisted, as the condition guarding the non-affine region
  cannot be duplicated to also protect the hoisted load later on. Today they are dropped
  in ScopInfo. By checking for this early, we do not even try to model them and can
  possibly still optimize smaller regions not containing this specific required-invariant
  load.
  llvm-svn: 296744
* [ScopInfo] Disable memory folding in case it results in multi-disjunct relations  (Tobias Grosser, 2017-03-01; 4 files changed, -2/+6)
  Multi-disjunct access maps can easily result in inbound assumptions which explode in
  case of many memory accesses and many parameters. This change reduces the compilation
  time of a larger kernel from over 15 minutes to less than 16 seconds.

  Interesting is the test case test/ScopInfo/multidim_param_in_subscript.ll, which has a
  memory access

    [n] -> { Stmt_for_body3[i0, i1] -> MemRef_A[i0, -1 + n - i1] }

  which requires folding, but where only a single disjunct remains. We can still model
  this test case even when only using limited memory folding.

  For people only reading commit messages, here the comment that explains what memory
  folding is:

    To recover memory accesses with array size parameters in the subscript expression
    we post-process the delinearization results. We would normally recover from an
    access A[exp0(i) * N + exp1(i)] into an array A[][N] the 2D access
    A[exp0(i)][exp1(i)]. However, another valid delinearization is
    A[exp0(i) - 1][exp1(i) + N] which - depending on the range of exp1(i) - may be
    preferable. Specifically, for cases where we know exp1(i) is negative, we want to
    choose the latter expression.

    As we commonly do not have any information about the range of exp1(i), we do not
    choose one of the two options, but instead create a piecewise access function that
    adds the (-1, N) offsets as soon as exp1(i) becomes negative. For a 2D array such
    an access function is created by applying the piecewise map:

      [i,j] -> [i, j]     : j >= 0
      [i,j] -> [i-1, j+N] : j <  0

  After this patch we generate only the first case, except for situations where we can
  prove the first case to be invalid and can consequently select the second without
  introducing disjuncts.
  llvm-svn: 296679
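  A minimal C illustration of the folding described above (the function, bounds, and
  access are invented for this sketch and are not taken from the test case): an access
  whose inner subscript can become negative, so that either of the two delinearizations
  is the valid one.

    /* Sketch only: the linearized subscript i*m + (j - 1) has two valid
     * delinearizations: A[i][j - 1] (needs j >= 1) and the folded
     * A[i - 1][(j - 1) + m] (needs j - 1 < 0, i.e. j == 0). */
    void sum_left_neighbor(long n, long m, float A[][m]) {
      for (long i = 1; i < n; i++)
        for (long j = 0; j < m; j++)
          A[i][j] += (j >= 1) ? A[i][j - 1]            /* j - 1 >= 0 */
                              : A[i - 1][(j - 1) + m]; /* j - 1 <  0 */
    }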
* Currently broken by recent LLVM upstream changes  (Tobias Grosser, 2017-03-01; 1 file changed, -0/+2)
  We mark it as XFAIL to get buildbots back to green, until the upstream changes have
  been addressed.
  llvm-svn: 296579
* [ScopInfo] Simplify inbounds assumptions under domain constraints  (Tobias Grosser, 2017-02-28; 5 files changed, -4/+779)
  Without this simplification, for the loop nest:

    void foo(long n1_a, long n1_b, long n1_c, long n1_d,
             long p1_b, long p1_c, long p1_d,
             float A_1[][p1_b][p1_c][p1_d]) {
      for (long i = 0; i < n1_a; i++)
        for (long j = 0; j < n1_b; j++)
          for (long k = 0; k < n1_c; k++)
            for (long l = 0; l < n1_d; l++)
              A_1[i][j][k][l] += i + j + k + l;
    }

  the assumption:

    n1_a <= 0
    or (n1_a > 0 and n1_b <= 0)
    or (n1_a > 0 and n1_b > 0 and n1_c <= 0)
    or (n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d <= 0)
    or (n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d > 0
        and p1_b >= n1_b and p1_c >= n1_c and p1_d >= n1_d)

  is taken rather than the simpler assumption:

    p1_b >= n1_b and p1_c >= n1_c and p1_d >= n1_d

  The former is less strict, as it allows arbitrary values of p1_* in case the loop is
  not executed at all. However, in practice these precise constraints explode when
  combined across different accesses and loops. For now it seems to make more sense to
  take less precise, but more scalable constraints by default. In case we find a
  practical example where more precise constraints are needed, we can think about
  allowing such precise constraints in specific situations where they help.

  This change speeds up the new test case from taking very long (waited at least a
  minute, but it probably takes a lot more) to below a second.
  llvm-svn: 296456
* [Cmake] Optionally use a system isl version.  (Michael Kruse, 2017-02-27; 1 file changed, -17/+27)
  This patch adds an option to build against a version of libisl already installed on
  the system. The installation is autodetected using the pkg-config file shipped with
  isl. The detection of the library is in the FindISL.cmake module that creates an
  imported target.

  Contributed-by: Philip Pfaffe <philip.pfaffe@gmail.com>
  Differential Revision: https://reviews.llvm.org/D30043
  llvm-svn: 296361
* [DeLICM] Add nomap regression tests. NFC.  (Michael Kruse, 2017-02-27; 7 files changed, -0/+547)
  These verify that some scalars are not mapped because it would be incorrect to do so.
  For these checks we verify, using the output of the pass's '-analyze', that no
  transformation has been executed.

  Adding optimization remarks is not useful as it would result in too many messages,
  even repeated ones. I avoided checking the '-debug-only=polly-delicm' output, which
  is an antipattern.
  llvm-svn: 296348
* Make optimizations based on pattern matching be enabled by default  (Roman Gareev, 2017-02-23; 4 files changed, -8/+17)
  Currently, pattern based optimizations of Polly can identify matrix multiplication
  and optimize it according to the BLIS matmul optimization pattern (see
  ScheduleTreeOptimizer for details). This patch enables optimizations based on pattern
  matching by default.

  Reviewed-by: Tobias Grosser <tobias@grosser.es>
  Differential Revision: https://reviews.llvm.org/D30293
  llvm-svn: 295958
* [DeLICM] Regression test for skipping map targets.  (Michael Kruse, 2017-02-23; 3 files changed, -0/+183)
  Add optimization-remarks-missed for when mapping targets have been skipped and add
  regression tests for them.
  llvm-svn: 295953
* [DeLICM] Add regression tests for DeLICM reject cases.  (Michael Kruse, 2017-02-22; 5 files changed, -0/+349)
  These tests were not included in the main DeLICM commit. They check the cases where
  zone analysis cannot be successful because of assumption violations.

  We use the LLVM optimization remark infrastructure as it seems to be the best fit for
  this kind of messages. I tried to make use of the OptimizationRemarkEmitter. However,
  it would insert additional function passes into the pass manager to get the hotness
  information. The pass manager would insert them between the flatten pass and delicm,
  causing the ScopInfo with the flattened schedule to be thrown away.

  Differential Revision: https://reviews.llvm.org/D30253
  llvm-svn: 295846
* [DeLICM] Map values hoisted by LICM back to the array.  (Michael Kruse, 2017-02-21; 1 file changed, -0/+130)
  Implement the -polly-delicm pass. The pass intends to undo the effects of
  LoopInvariantCodeMotion (LICM), which adds additional scalar dependencies into SCoPs.
  DeLICM will try to map those scalars back to the array elements they were promoted
  from, as long as the array element is unused.

  This is the main patch from the DeLICM/DePRE patch series. It does not yet undo GVN
  PRE, for which additional information about known values is needed, and does not
  handle PHI write accesses that have no target. As such its usefulness is limited.
  Patches for these issues, including regression tests for error situations, will
  follow.

  Reviewers: grosser
  Differential Revision: https://reviews.llvm.org/D24716
  llvm-svn: 295713
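  To make the kind of scalar dependency concrete, a minimal C sketch of a loop after
  LICM has promoted an array element to a scalar (names and shapes are invented for
  this illustration and are not from the patch or its tests):

    void row_sums(int n, int m, double A[n], double B[n][m]) {
      for (int i = 0; i < n; i++) {
        /* LICM promoted the loads/stores of A[i] in the inner loop to the
         * scalar 'sum', introducing scalar (and PHI) dependencies that block
         * polyhedral transformations. Because A[i] is otherwise unused inside
         * the loop body, DeLICM can map 'sum' back to the array element A[i]. */
        double sum = A[i];
        for (int j = 0; j < m; j++)
          sum += B[i][j];
        A[i] = sum;
      }
    }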
* [ScopInfo] Count read-only arrays when computing complexity of alias check  (Tobias Grosser, 2017-02-18; 1 file changed, -0/+224)
  Instead of counting the number of read-only accesses, we now count the number of
  distinct read-only array references when checking if a run-time alias check may be
  too complex. The run-time alias check is quadratic in the number of base pointers,
  not the number of accesses. Before this change we accidentally skipped SPEC's lbm
  test case.
  llvm-svn: 295567
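  For intuition about the cost model (the numbers here are illustrative, not
  measurements from lbm): a run-time alias check over n distinct base pointers needs on
  the order of n*(n-1)/2 pairwise comparisons, so 5 read-only arrays cost roughly 10
  comparisons whether each array is referenced by one load or by a hundred. Counting
  individual accesses therefore overestimates the complexity, which is what caused lbm
  to be skipped.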
* [test] Add reduction sequence test case [NFC]  (Tobias Grosser, 2017-02-18; 1 file changed, -0/+539)
  This test case is a mini performance test case that shows the time needed for a
  couple of simple reductions. Today it takes about 325ms on my machine to run this
  test case through 'opt' with scop construction and reduction detection. It can be
  used as a mini-proxy for further tuning of the reduction code.

  Generally we do not commit performance test cases, but as this is very small and also
  very fast it seems OK to keep it in the lit test suite.

  This test case will also help to verify that future changes to the reduction code
  will not affect the ordering of the reduction sets and will consequently not cause
  spurious performance changes that only result from reordering of dependences in the
  reduction set.
  llvm-svn: 295549
* [ScopInfo] Add statistics to count loops after scop modeling  (Tobias Grosser, 2017-02-17; 1 file changed, -0/+241)
  llvm-svn: 295431
* [ScopDetection] Compute the maximal loop depth correctly  (Tobias Grosser, 2017-02-17; 1 file changed, -0/+249)
  Before this change, we obtained loop depth numbers that were deeper than the actual
  loop depth.
  llvm-svn: 295430
* Updated isl to isl-0.18-254-g6bc184d  (Tobias Grosser, 2017-02-17; 11 files changed, -23/+25)
  This update includes a couple more coalescing changes as well as a large number of
  isl-internal code cleanups (dead assignments, ...).
  llvm-svn: 295419
* [ScopInfo] Do not try to fold array dimensions of size zero  (Tobias Grosser, 2017-02-17; 1 file changed, -0/+59)
  Trying to fold dimensions of this kind results in a division by zero, which crashes
  the compiler. As such arrays are likely to invalidate the scop anyhow (but are not
  illegal in LLVM-IR), there is no point in trying to optimize the array layout. Hence,
  we just avoid the folding of constant dimensions of size zero.
  llvm-svn: 295415
* [tests] Fix some misspellings [NFC]  (Tobias Grosser, 2017-02-16; 1 file changed, -1/+1)
  llvm-svn: 295361
* [ScopInfo] Bound the number of disjuncts in context  (Tobias Grosser, 2017-02-16; 1 file changed, -0/+207)
  Before this change, wrapping range metadata resulted in exponential growth of the
  context, which made context construction of large scops very slow. Instead, we now
  just do not model the range information precisely, in case the number of disjuncts in
  the context has already reached a certain limit.
  llvm-svn: 295360
* [ScopInfo] Always derive upper and lower bounds for parameters  (Tobias Grosser, 2017-02-16; 1 file changed, -1/+1)
  Commit r230230 introduced the use of range metadata to derive bounds for parameters,
  instead of just looking at the type of the parameter. As part of this commit, support
  for wrapping ranges was added, where the lower bound of a parameter is larger than
  the upper bound:

    { 255 < p || p < 0 }

  However, at the same time, for wrapping ranges support for adding bounds given by the
  size of the containing type has accidentally been dropped. As a result, the range of
  the parameters was not guaranteed to be bounded any more. This change makes sure we
  always add the bounds given by the size of the type and then additionally add bounds
  based on signed wrapping, if available. For a parameter p with a type size of 32 bit,
  the valid range is then:

    { -2147483648 <= p <= 2147483647 and (255 < p or p < 0) }

  llvm-svn: 295349
* Do not use wrapping ranges to bound non-affine accesses  (Tobias Grosser, 2017-02-12; 10 files changed, -11/+52)
  When deriving the range of valid values of a scalar evolution expression, the result
  might be a range [12, 8), where the upper bound is smaller than the lower bound and
  where the range is expected to possibly wrap around. We theoretically could model
  such a range as a union of two non-wrapping ranges, but do not do this as of yet.
  Instead, we just do not derive any bounds. Before this change, we could have obtained
  bounds where the maximal possible value is strictly smaller than the minimal possible
  value, which is incorrect and also caused assertions during scop modeling.
  llvm-svn: 294891
* Check reduction dependencies in case of the matrix multiplication optimization  (Roman Gareev, 2017-02-11; 1 file changed, -0/+365)
  To determine the parameters of the matrix multiplication, we check RAW dependencies
  that can be expressed using only reduction dependencies. Consequently, we should
  check the reduction dependencies, if this is the case.

  Reviewed-by: Tobias Grosser <tobias@grosser.es>, Sven Verdoolaege
  <skimo-polly@kotnet.org>, Michael Kruse <llvm@meinersbur.de>
  Differential Revision: https://reviews.llvm.org/D29814
  llvm-svn: 294836
* Use the size of the widest type of the matrix multiplication operands  (Roman Gareev, 2017-02-11; 2 files changed, -0/+282)
  The size of the operands' type is one of the parameters required to determine the
  BLIS micro-kernel. We use the size of the widest type of the matrix multiplication
  operands in case there are several different types.

  Reviewed-by: Michael Kruse <llvm@meinersbur.de>
  Differential Revision: https://reviews.llvm.org/D29269
  llvm-svn: 294828
* [IRBuilder] Extract base pointers directly from ScopArray  (Tobias Grosser, 2017-02-09; 1 file changed, -3/+3)
  Instead of iterating over statements and their memory accesses to extract the set of
  available base pointers, just directly iterate over all ScopArray objects. This
  reflects more the actual intent of the code: collect all arrays (and their base
  pointers) to emit alias information that specifies that accesses to different arrays
  cannot alias.

  This change removes unnecessary uses of MemoryAddress::getBaseAddr() in preparation
  for https://reviews.llvm.org/D28518.
  llvm-svn: 294574
* [FIX] Disable the problematic run lines  (Roman Gareev, 2017-02-09; 2 files changed, -19/+19)
  There are problems with using the machine information to derive the precise vector
  size on polly-amd64-linux and polly-arm-linux. We temporarily disable the problematic
  run lines.
  llvm-svn: 294571
* [FIX] Specify the CPU to overwrite the machine info and set a fixed vector size  (Roman Gareev, 2017-02-09; 2 files changed, -2/+3)
  llvm-svn: 294569
* [IslAst] Print the ScopArray name to mark reductions  (Tobias Grosser, 2017-02-09; 5 files changed, -8/+8)
  Before this change we used the name of the base pointer to mark reductions. This is
  imprecise, as the canonical reference is the ScopArray itself and not the base
  pointer of a reduction. Using the base pointer of reductions is problematic in cases
  where a single ScopArray is referenced through two different base pointers.

  This change removes unnecessary uses of MemoryAddress::getBaseAddr() in preparation
  for https://reviews.llvm.org/D28518.
  llvm-svn: 294568
* Isolate a set of partial tile prefixes in case of the matrix multiplication optimization  (Roman Gareev, 2017-02-09; 2 files changed, -0/+355)
  Isolate a set of partial tile prefixes to allow hoisting and sinking out of the
  unrolled innermost loops produced by the optimization of the matrix multiplication.

  In case it cannot be proved that the number of loop iterations can be evenly divided
  by tile sizes and we tile and unroll the point loop, isl generates conditional
  expressions. Subsequently, the conditional expressions can prevent stores and loads
  of the unrolled loops from being sunk and hoisted.

  The patch isolates a set of partial tile prefixes, which have exactly Mr x Nr
  iterations of the two innermost loops produced by the loop tiling performed by the
  matrix multiplication optimization, where Mr and Nr are parameters of the
  micro-kernel. This helps to get rid of the conditional expressions of the unrolled
  innermost loops. Probably this approach can be replaced with padding in the future.

  In case of, for example, the gemm from Polybench/C 3.2 and parametric loop bounds, it
  helps to increase the performance from 7.98 GFlops (27.71% of theoretical peak) to
  21.47 GFlops (74.57% of theoretical peak). Hence, we get the same performance as in
  case of scalar loop bounds.

  It also causes a compile-time regression. The compile time is increased from 0.795
  seconds to 0.837 seconds in case of scalar loop bounds and from 1.222 seconds to
  1.490 seconds in case of parametric loop bounds.

  Reviewed-by: Michael Kruse <llvm@meinersbur.de>
  Differential Revision: https://reviews.llvm.org/D29244
  llvm-svn: 294564
* [NFC] Make ScheduleTreeOptimizer::optimizeBand return a schedule node optimized with optimizeMatMulPattern  (Roman Gareev, 2017-02-08; 2 files changed, -6/+0)
  This patch makes ScheduleTreeOptimizer::optimizeBand return a schedule node optimized
  with optimizeMatMulPattern. Otherwise, it could not use the isolate option, because
  standardBandOpts could try to tile a band node with an anchored subtree and get an
  error, since the use of the isolate option causes any tree containing the node to be
  considered anchored. Furthermore, it is not intended to apply the standard
  optimizations when the matrix multiplication has been detected.
  llvm-svn: 294444
* A new algorithm for identification of a SCoP statement that implements a matrix multiplication  (Roman Gareev, 2017-02-02; 3 files changed, -221/+183)
  The current identification of a SCoP statement that implements a matrix
  multiplication does not help to identify different permutations of loops that contain
  it and check for dependencies, which can prevent it from being optimized. It also
  requires external determination of the operands of the matrix multiplication. This
  patch contains the implementation of a new algorithm that helps to avoid these
  issues. It also modifies the test cases that generate matrix multiplications with
  linearized accesses, because the new algorithm does not support them.

  Reviewed-by: Michael Kruse <llvm@meinersbur.de>, Tobias Grosser <tobias@grosser.es>
  Differential Revision: https://reviews.llvm.org/D28357
  llvm-svn: 293890
* Add forgotten test case for r293169  (Tobias Grosser, 2017-01-28; 1 file changed, -0/+26)
  llvm-svn: 293383
* ScopDetectionDiagnostics: Also emit diagnostics in case no debug info is available  (Tobias Grosser, 2017-01-26; 1 file changed, -0/+26)
  In this case, we just use the start of the scop as the debug location.
  llvm-svn: 293165
* BlockGenerator: Do not redundantly reload from PHI-allocas in non-affine stmts  (Tobias Grosser, 2017-01-19; 2 files changed, -3/+2)
  Before this change we created an additional reload in the copy of the incoming block
  of a PHI node to reload the incoming value, even though the necessary value has
  already been made available by the normally generated scalar loads. In this change,
  we drop the code that generates this redundant reload and instead just reuse the
  scalar value already available.

  Besides making the generated code slightly cleaner, this change also makes sure that
  scalar loads go through the normal logic, which means they can be remapped (e.g. to
  array slots) and corresponding code is generated to load from the remapped location.
  Without this change, the original scalar load at the beginning of the non-affine
  region would have been remapped, but the redundant scalar load would continue to load
  from the old PHI slot location.

  It might be possible to further simplify the code in addOperandToPHI, but this would
  not only mean to pull out getNewValue, but to also change the insertion point update
  logic. As this did not work when trying it the first time, this change is likely not
  trivial. To not introduce bugs last minute, we postpone further simplifications to a
  subsequent commit. We also document the current behavior a little bit better.

  Reviewed By: Meinersbur
  Differential Revision: https://reviews.llvm.org/D28892
  llvm-svn: 292486
* Improve test coverage in test/Isl/CodeGen/loop_partially_in_scop.ll [NFC]  (Tobias Grosser, 2017-01-19; 1 file changed, -20/+37)
  We rename the test case with -metarenamer to make the variable names easier to read
  and add additional check lines that verify the code we currently generate for PHI
  nodes. This code is interesting as it contains a PHI node in a non-affine sub-region,
  where some incoming blocks are within the non-affine sub-region and others are
  outside of the non-affine sub-region.

  As can be seen in the check lines, we currently load the PHI-node value twice. This
  commit documents this behavior. In a subsequent patch we will try to improve this.
  llvm-svn: 292470
* Relax assert when setting access functions with invariant base pointers  (Tobias Grosser, 2017-01-17; 1 file changed, -6/+4)
  Summary: Instead of forbidding such access functions completely, we verify that their
  base pointer has been hoisted and only assert in case the base pointer was not
  hoisted.

  I was trying for a little while to get a test case that ensures the assert is
  correctly fired in case of invariant load hoisting being disabled, but I could not
  find a good way to do so, as llvm-lit immediately aborts if a command yields a
  non-zero return value. As we do not generally test our asserts, not having a test
  case here seems OK.

  This resolves http://llvm.org/PR31494

  Suggested-by: Michael Kruse <llvm@meinersbur.de>
  Reviewers: efriedma, jdoerfert, Meinersbur, gareevroman, sebpop, zinob, huihuiz, pollydev
  Reviewed By: Meinersbur
  Differential Revision: https://reviews.llvm.org/D28798
  llvm-svn: 292213
* test: harden test case to fail even in non-asserts build  (Tobias Grosser, 2017-01-17; 1 file changed, -0/+2)
  The original test case was added in r292147.

  Suggested-by: Michael Kruse <llvm@meinersbur.de>
  llvm-svn: 292202
* Add test showing the update of access functions with in-scop defined base ptrs  (Tobias Grosser, 2017-01-16; 2 files changed, -0/+95)
  This feature is currently not supported and an explicit assert to prevent the
  introduction of such accesses has been added in r282893. This test case allows us to
  reproduce the assert (and, without the assert, the miscompile) added in r282893. It
  will help when adding such support at some point.
  llvm-svn: 292147
* Un-XFAIL test case after half support was added to PTX backend in r291956  (Tobias Grosser, 2017-01-16; 1 file changed, -2/+0)
  llvm-svn: 292124
* ScheduleOptimizer: Allow to set register width in command line  (Tobias Grosser, 2017-01-14; 3 files changed, -4/+35)
  We use this option to set a fixed register width in our test cases to make sure the
  results are identical across platforms.
  llvm-svn: 292002
* Update tests to more precise analysis results in LLVM core  (Tobias Grosser, 2017-01-11; 1 file changed, -2/+2)
  LLVM's range analysis became a little tighter, which means Polly can derive tighter
  bounds as well.
  llvm-svn: 291718
* Update to isl-0.18-43-g0b4256f  (Tobias Grosser, 2016-12-31; 1 file changed, -1/+1)
  Even more isl coalesce changes.
  llvm-svn: 290783
* Specify the default values of the cache parameters  (Roman Gareev, 2016-12-25; 3 files changed, -4/+4)
  If the parameters of the target cache (i.e., cache level sizes, cache level
  associativities) are not specified or have wrong values, we use ones for the
  parameters of the macro-kernel and do not perform data-layout optimizations of the
  matrix multiplication. In this patch we specify the default values of the cache
  parameters to be able to apply the pattern matching optimizations even in this case.
  Since there are no typical values for these parameters, we use the parameters of
  Intel Core i7-3820 SandyBridge, which also help to attain high performance on IBM
  POWER System S822 and IBM Power 730 Express server.

  Reviewed-by: Tobias Grosser <tobias@grosser.es>
  Differential Revision: https://reviews.llvm.org/D28090
  llvm-svn: 290518
* ScheduleOptimizer: Fix spelling of option '-polly-target-throughput-vector-fma'  (Tobias Grosser, 2016-12-23; 3 files changed, -4/+4)
  througput -> throughput
  llvm-svn: 290418
* Change the determination of parameters of macro-kernel  (Roman Gareev, 2016-12-21; 3 files changed, -101/+99)
  Typically processor architectures do not include an L3 cache, which means that Nc,
  the corresponding parameter of the macro-kernel, is, for all practical purposes,
  redundant ([1]). However, its small values can cause the redundant packing of the
  same elements of the matrix A, the first operand of the matrix multiplication. At the
  same time, big values of the parameter Nc can cause segmentation faults in case the
  available stack is exceeded.

  This patch adds an option to specify the parameter Nc as a multiple of the parameter
  of the micro-kernel Nr.

  In case of Intel Core i7-3820 SandyBridge and the following options,

    clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
      -march=native -mllvm -polly
      -mllvm -polly-pattern-matching-based-opts=true
      -DPOLYBENCH_USE_SCALAR_LB
      -mllvm -polly-target-cache-level-associativity=8,8
      -mllvm -polly-target-cache-level-sizes=32768,262144
      -mllvm -polly-target-latency-vector-fma=8

  it helps to improve the performance from 11.303 GFlops/sec (39.247% of theoretical
  peak) to 17.896 GFlops/sec (62.14% of theoretical peak).

  Refs.:
  [1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf

  Reviewed-by: Tobias Grosser <tobias@grosser.es>
  Differential Revision: https://reviews.llvm.org/D28019
  llvm-svn: 290256
* [Polly] Use three-dimensional arrays to store packed operands of the matrix multiplication  (Roman Gareev, 2016-12-21; 1 file changed, -6/+6)
  Previously we had two-dimensional accesses to store packed operands of the matrix
  multiplication for the sake of simplicity of the packed arrays. However, the addition
  of the third dimension helps to simplify the corresponding memory access, reduce the
  execution time of isl operations applied to it, and consequently reduce the compile
  time of Polly.

  For example, in case of Intel Core i7-3820 SandyBridge and the following options,

    clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
      -march=native -mllvm -polly
      -mllvm -polly-pattern-matching-based-opts=true
      -DPOLYBENCH_USE_SCALAR_LB
      -mllvm -polly-target-cache-level-associativity=8,8
      -mllvm -polly-target-cache-level-sizes=32768,262144
      -mllvm -polly-target-latency-vector-fma=7

  it helps to reduce the compile time from about 361.456 seconds to about 0.816 seconds.

  Reviewed-by: Michael Kruse <llvm@meinersbur.de>, Tobias Grosser <tobias@grosser.es>
  Differential Revision: https://reviews.llvm.org/D27878
  llvm-svn: 290251
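  A rough C sketch of why the extra dimension simplifies the access (the array names,
  sizes, and subscripts below are assumptions made for this illustration, not the exact
  arrays Polly generates):

    enum { KC = 256, NC = 1024, NR = 8 };

    /* 2-D packing: the register-block offset is folded into the second
     * subscript, so every access needs the composite expression k*NR + j%NR. */
    static double Packed2[NC / NR][KC * NR];
    static inline double get2(long k, long j) { return Packed2[j / NR][k * NR + j % NR]; }

    /* 3-D packing: the NR-wide dimension is explicit, so each subscript stays a
     * simple expression (j / NR, k, j % NR) and the resulting isl access map is
     * much cheaper to operate on. */
    static double Packed3[NC / NR][KC][NR];
    static inline double get3(long k, long j) { return Packed3[j / NR][k][j % NR]; }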