path: root/polly/lib/CodeGen
Commit message | Author | Age | Files | Lines
...
* Relax assert when setting access functions with invariant base pointers | Tobias Grosser | 2017-01-17 | 1 | -2/+0
| Summary: Instead of forbidding such access functions completely, we verify that their base pointer has been hoisted and only assert in case the base pointer was not hoisted.
| I was trying for a little while to get a test case that ensures the assert is correctly fired in case of invariant load hoisting being disabled, but I could not find a good way to do so, as llvm-lit immediately aborts if a command yields a non-zero return value. As we do not generally test our asserts, not having a test case here seems OK.
| This resolves http://llvm.org/PR31494
| Suggested-by: Michael Kruse <llvm@meinersbur.de>
| Reviewers: efriedma, jdoerfert, Meinersbur, gareevroman, sebpop, zinob, huihuiz, pollydev
| Reviewed By: Meinersbur
| Differential Revision: https://reviews.llvm.org/D28798
| llvm-svn: 292213
* Adjust formatting to commit r292110 [NFC] | Tobias Grosser | 2017-01-16 | 2 | -9/+12
| llvm-svn: 292123
* Use typed enums to model MemoryKind and move MemoryKind out of ScopArrayInfo | Tobias Grosser | 2017-01-14 | 2 | -6/+6
| To benefit from the type-safety guarantees of C++11 typed enums, which would have caught the type mismatch fixed in r291960, we make MemoryKind a typed enum. This change also allows us to drop the 'MK_' prefix and to instead use the more descriptive full name of the enum as prefix. To reduce the amount of typing needed, we use this opportunity to move MemoryKind from ScopArrayInfo to a global scope, which means the ScopArrayInfo:: prefix is not needed.
| This move also makes sense historically. In the beginning of Polly we had different MemoryKind enums in both MemoryAccess and ScopArrayInfo, which were later canonicalized to one. During this canonicalization we just chose the enum in ScopArrayInfo, but did not consider moving this shared enum to global scope.
| Reviewed-by: Michael Kruse <llvm@meinersbur.de>
| Differential Revision: https://reviews.llvm.org/D28090
| llvm-svn: 292030
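A minimal sketch of why a typed (scoped) enum catches this class of mismatch. The enumerator names and the helper `isScalarKind` are illustrative assumptions, not the declarations from the Polly headers:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for illustration; the real Polly enums may differ.
enum class MemoryKind { Array, Value, PHI, ExitPHI };
enum class AccessType { Read, MustWrite, MayWrite };

// A scoped enum does not convert implicitly to int or to another enum, so
// passing the wrong kind of value is rejected at compile time.
static_assert(!std::is_convertible<MemoryKind, int>::value,
              "scoped enums have no implicit integer conversion");
static_assert(!std::is_convertible<AccessType, MemoryKind>::value,
              "distinct scoped enums do not convert into each other");

// Accepts only a MemoryKind; the enum name serves as the prefix that the
// old 'MK_' spelling used to provide.
constexpr bool isScalarKind(MemoryKind Kind) {
  return Kind != MemoryKind::Array;
}

// isScalarKind(AccessType::Read); // would not compile
// isScalarKind(1);                // would not compile either
```

With an unscoped enum both commented-out calls would need at most a cast-free integer conversion on the caller side to slip through, which is the kind of mistake the commit message refers to.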
* Update to recent clang-format changes | Tobias Grosser | 2017-01-12 | 1 | -2/+3
| llvm-svn: 291810
* Align newly created arrays to the first level cache line boundary | Roman Gareev | 2016-12-21 | 1 | -2/+7
| Aligning data to cache-line boundaries helps to avoid overheads related to accessing it ([1]). This patch aligns newly created arrays and adds an option to specify the first-level cache line size. By default we use 64 bytes, which is a typical cache-line size ([2]).
| In case of Intel Core i7-3820 SandyBridge and the following options,
|   clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME -march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true -DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8 -mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm -polly-target-latency-vector-fma=8
| it helps to improve the performance from 11.303 GFlops/sec (39.247% of theoretical peak) to 12.63 GFlops/sec (43.8542% of theoretical peak).
| Refs.:
| [1] - http://www.alexonlinux.com/aligned-vs-unaligned-memory-access
| [2] - http://igoro.com/archive/gallery-of-processor-cache-effects/
| Differential Revision: https://reviews.llvm.org/D28020
| Reviewed-by: Tobias Grosser <tobias@grosser.es>
| llvm-svn: 290253
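As a source-level illustration of the same idea (the commit itself changes the alignment of arrays that Polly creates in LLVM IR, which is not shown here), a buffer can be pinned to a 64-byte boundary with `alignas`. The names `FirstCacheLevelLineSize`, `PackedBuffer` and `isCacheLineAligned` are assumptions made for this sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed default taken from the commit message: 64-byte cache lines.
constexpr std::size_t FirstCacheLevelLineSize = 64;

// Hypothetical buffer type whose storage starts on a cache-line boundary.
struct alignas(FirstCacheLevelLineSize) PackedBuffer {
  double Data[256];
};

// Returns true if Ptr is a multiple of the assumed cache-line size.
inline bool isCacheLineAligned(const void *Ptr) {
  assert(Ptr != nullptr);
  return reinterpret_cast<std::uintptr_t>(Ptr) % FirstCacheLevelLineSize == 0;
}
```

An element that starts on a line boundary never straddles two cache lines as long as its size divides the line size, which is the overhead the first reference discusses.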
* Fix clang-format | Tobias Grosser | 2016-12-19 | 1 | -2/+5
| llvm-svn: 290103
* Add newline at end of debug print | Tobias Grosser | 2016-12-01 | 1 | -5/+5
| In '[DBG] Allow to emit the RTC value at runtime' the diagnostics were printed without a newline at the end of each diagnostic. We add such a newline to improve readability.
| llvm-svn: 288323
* canSynthesize: Remove unused argument LI. NFC. | Michael Kruse | 2016-11-29 | 2 | -3/+2
| The helper function polly::canSynthesize() does not directly use the LoopInfo analysis, hence remove it from its argument list.
| llvm-svn: 288144
* Update for clang-format change in r288119 | Tobias Grosser | 2016-11-29 | 2 | -6/+8
| llvm-svn: 288134
* [CodeGen] Add flag to code-generate most memory access expressions | Tobias Grosser | 2016-11-22 | 1 | -2/+25
| Introduce the new flag -polly-codegen-generate-expressions which forces Polly to code generate AST expressions instead of using our SCEV based access expression generation even for cases where the original memory access relation was not changed and the SCEV based access expression could be code generated without any issue.
| This is an experimental option for better testing the isl ast expression generation. The default behavior of Polly remains unchanged. We also exclude a couple of cases for which the AST expression is not yet working.
| llvm-svn: 287694
* [NFC] Adjust naming scheme of statistic variables | Johannes Doerfert | 2016-11-18 | 1 | -2/+2
| Suggested-by: Tobias Grosser <tobias@grosser.es>
| llvm-svn: 287347
* [DBG] Collect statistics about actually versioned SCoPs | Johannes Doerfert | 2016-11-17 | 1 | -0/+8
| llvm-svn: 287267
* [DBG] Allow to emit the RTC value at runtime | Johannes Doerfert | 2016-11-17 | 1 | -0/+16
| The new command line flag "polly-codegen-emit-rtc-print" can be used to place a "printf" in the generated code that will print the RTC value and the overflow state.
| llvm-svn: 287265
* IslAst: always use the context during ast generation | Tobias Grosser | 2016-11-10 | 1 | -1/+1
| Providing the context to the ast generator allows for additional simplifications and -- more importantly -- allows generating loops with only partially bounded domains, assuming the domains are bounded for all parameter configurations that are valid as defined by the context.
| This change fixes the crash reported in http://llvm.org/PR30956
| The original reason why we did not include the context when generating an AST was that CLooG and later isl used to sometimes transfer some of the constraints that bound the size of parameters from the context into the generated AST. This resulted in operations with very large constants, which sometimes introduced problematic integer overflows. The latest versions of the isl AST generator are careful to not introduce such constants.
| Reported-by: Eli Friedman <efriedma@codeaurora.org>
| llvm-svn: 286442
* IslNodeBuilder: Ensure newly generated memory accesses are well-defined | Tobias Grosser | 2016-11-05 | 1 | -0/+18
| Add some additional asserts that ensure newly code-generated memory accesses are defined on all domain and schedule domain instances.
| llvm-svn: 286050
* [Polly CodeGen] Break critical edge from RTC to original loop. | Eli Friedman | 2016-11-02 | 6 | -35/+51
| This makes polly generate a CFG which is closer to what we want in LLVM IR, with a loop preheader for the original loop. This is just a cleanup, but it exposes some fragile assumptions.
| I'm not completely happy with the changes related to expandCodeFor; RTCBB->getTerminator() is basically a random insertion point which happens to work due to the way we generate runtime checks. I'm not sure what the right answer looks like, though.
| Differential Revision: https://reviews.llvm.org/D26053
| llvm-svn: 285864
* [polly] Fix non-determinism in polly BlockGenerators | Mandeep Singh Grang | 2016-10-19 | 1 | -2/+2
| Summary: Iterating over SeenBlocks, which is a SmallPtrSet, results in non-determinism in codegen.
| Reviewers: jdoerfert, zinob, grosser
| Tags: #polly
| Differential Revision: https://reviews.llvm.org/D25778
| llvm-svn: 284622
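The underlying problem is that the iteration order of a pointer-keyed set depends on the addresses of its elements, which can change from run to run. One common remedy, sketched below in plain C++ and not necessarily what this two-line commit does, is to remember insertion order alongside the membership set. `InsertionOrderedSet` is a hypothetical name for this sketch:

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical container: set semantics for membership tests, but a
// deterministic, insertion-ordered sequence for iteration.
template <typename T> class InsertionOrderedSet {
  std::unordered_set<T> Seen; // fast membership test, unspecified order
  std::vector<T> Order;       // elements in the order they were first added

public:
  // Returns true if V was newly inserted, false if it was already present.
  bool insert(const T &V) {
    if (!Seen.insert(V).second)
      return false;
    Order.push_back(V);
    return true;
  }

  bool contains(const T &V) const { return Seen.count(V) != 0; }

  // Iterate over this, never over Seen, when emitting code.
  const std::vector<T> &ordered() const {
    assert(Order.size() == Seen.size());
    return Order;
  }
};
```

Iterating over `ordered()` visits elements in the same sequence on every run, regardless of where the pointed-to objects happen to be allocated.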
* Handle multi-dimensional invariant load. | Eli Friedman | 2016-10-17 | 1 | -0/+19
| If the address of a load depends on another load, make sure to emit the loads in the right order.
| llvm-svn: 284426
* [ScopInfo/CodeGen] ExitPHI reads are implicit. | Michael Kruse | 2016-10-12 | 1 | -1/+1
| Under some conditions MK_Value read accesses were converted to MK_ExitPHI read accesses. This is unexpected because MK_ExitPHI read accesses are implicit after the scop execution. This behaviour was introduced in r265261, which fixed a failed assertion/crash in CodeGen.
| Instead, we fix this failure in CodeGen itself. createExitPHINodeMerges(), despite its name, also handles accesses of kind MK_Value, only to skip them because they access values that are usually not PHI nodes in the SCoP region's exit block, except in the situation observed in r265261.
| Do not convert value accesses to ExitPHI accesses and do not handle value accesses like ExitPHI accesses in CodeGen anymore.
| llvm-svn: 284023
* Turn cl::values() (for enum) from a vararg function to using C++ variadic template | Mehdi Amini | 2016-10-08 | 1 | -2/+1
| The core of the change is supposed to be NFC, however it also fixes what I believe was undefined behavior when calling:
|   va_start(ValueArgs, Desc);
| with Desc being a StringRef.
| Differential Revision: https://reviews.llvm.org/D25342
| llvm-svn: 283671
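A generic sketch of the pattern, with `EnumOption` and `values` as hypothetical stand-ins rather than the actual `cl::values()` declaration: a variadic template receives every argument with its full static type, so no `va_start`/`va_arg` machinery (and none of its restrictions on the last named parameter) is involved.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical option record, for illustration only.
struct EnumOption {
  std::string Name;
  int Value;
};

// Type-safe replacement for a C-style vararg collector: each argument is
// checked against EnumOption at compile time and forwarded as-is.
template <typename... OptsTy>
std::vector<EnumOption> values(OptsTy &&...Opts) {
  return {EnumOption(std::forward<OptsTy>(Opts))...};
}
```

A call such as `values(EnumOption{"none", 0}, EnumOption{"full", 2})` needs no terminating sentinel, and passing an argument of an unrelated type is a compile error instead of a runtime misread.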
* [CodeGen] Add assertion for indirect array index expression generation. NFC. | Michael Kruse | 2016-09-30 | 1 | -0/+3
| Currently Polly cannot generate code for index expressions if the base pointer is computed within the scop. The base pointer must be generated as well, but there is no code that triggers that. Add an assertion to detect when this would occur and miscompile. The IR verifier should catch it as well.
| llvm-svn: 282893
* [CodeGen] Change 'Scalar' to 'Array' in method names. NFC. | Michael Kruse | 2016-09-30 | 1 | -9/+9
| generateScalarLoad() and generateScalarStore() are used for explicit (MK_Array) memory accesses, therefore the method names were misleading. The names also were similar to generateScalarLoads() and generateScalarStores() (plural forms) which indeed handle scalar accesses. Presumably, they were originally named to contrast VectorBlockGenerator::generateLoad().
| Rename the two methods to generateArrayLoad() and generateArrayStore(), respectively.
| llvm-svn: 282861
* [CodeGen] Add assertion for partial scalar accesses. NFC. | Michael Kruse | 2016-09-30 | 1 | -0/+18
| The code generator always adds unconditional LoadInst and StoreInst, hence the MemoryAccess must be defined over all statement instances.
| llvm-svn: 282853
* GPGPU: Do not run mostly sequential kernels in GPU | Tobias Grosser | 2016-09-18 | 1 | -0/+19
| In case sequential kernels are found deeper in the loop tree than any parallel kernel, the overall scop is probably mostly sequential. Hence, run it on the CPU.
| llvm-svn: 281849
* GPGPU: Dynamically ensure 'sufficient compute' | Tobias Grosser | 2016-09-18 | 1 | -0/+110
| Offloading to a GPU is only beneficial if there is a sufficient amount of compute that can be accelerated. Many kernels just have a very small amount of dynamic compute, which means GPU acceleration is not beneficial. We compute at run-time an approximation of how many dynamic instructions will be executed and fall back to CPU code in case this number is not sufficiently large. To keep the run-time checking code simple, we over-approximate the number of instructions executed in each statement by computing the volume of the rectangular hull of its iteration space.
| llvm-svn: 281848
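The over-approximation can be sketched as follows. The types, the per-instance instruction count and the `MinCompute` threshold are assumptions for illustration; the commit emits an equivalent check as run-time code rather than evaluating it in the compiler, and this sketch ignores integer overflow:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Inclusive bounds of one loop dimension (hypothetical helper type).
struct DimBounds {
  std::int64_t Lower;
  std::int64_t Upper;
};

// One statement: the bounding box of its iteration space plus an assumed
// number of instructions executed per statement instance.
struct StmtEstimate {
  std::vector<DimBounds> Box;
  std::int64_t InstructionsPerInstance;
};

// Volume of the rectangular hull: the product of the extents of all
// dimensions. This never undercounts the real number of instances.
std::int64_t rectangularHullVolume(const std::vector<DimBounds> &Box) {
  std::int64_t Volume = 1;
  for (const DimBounds &D : Box) {
    if (D.Upper < D.Lower)
      return 0; // empty dimension, hence no instances at all
    Volume *= D.Upper - D.Lower + 1;
  }
  return Volume;
}

// Sum of (hull volume * instructions per instance) over all statements.
std::int64_t approxDynamicInstructions(const std::vector<StmtEstimate> &Stmts) {
  std::int64_t Total = 0;
  for (const StmtEstimate &S : Stmts) {
    assert(S.InstructionsPerInstance >= 0);
    Total += rectangularHullVolume(S.Box) * S.InstructionsPerInstance;
  }
  return Total;
}

// Offload only if the estimate reaches the (hypothetical) threshold.
bool shouldOffloadToGPU(const std::vector<StmtEstimate> &Stmts,
                        std::int64_t MinCompute) {
  return approxDynamicInstructions(Stmts) >= MinCompute;
}
```

For a triangular nest with 0 <= i <= 9 and 0 <= j <= i, the exact instance count is 55, while the hull 0 <= i, j <= 9 yields 10 * 10 = 100. The estimate is coarser but needs only one multiplication per dimension at run time.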
* GPGPU: Store back non-read-only scalars | Tobias Grosser | 2016-09-17 | 1 | -2/+55
| We may generate GPU kernels that store into scalars in case we run some sequential code on the GPU because the remaining data is expected to already be on the GPU. For these kernels it is important to not keep the scalar values in thread-local registers, but to store them back to the corresponding device memory objects that back them.
| We currently only store scalars back at the end of a kernel. This is only correct if precisely one thread is executed. In case more than one thread may be run, we currently invalidate the scop. To support such cases correctly, we would need to always load and store back from a corresponding global memory slot instead of a thread-local alloca slot.
| llvm-svn: 281838
* GPGPU: Detect read-only scalar arrays ... | Tobias Grosser | 2016-09-17 | 1 | -14/+32
| and pass these by value rather than by reference.
| llvm-svn: 281837
* GPGPU: Do not assume arrays start at 0 | Tobias Grosser | 2016-09-15 | 1 | -0/+86
| Our alias checks precisely check that the minimal and maximal accessed elements do not overlap in a kernel. Hence, we must ensure that our host <-> device transfers do not touch additional memory locations that are not covered in the alias check. To ensure this, we make sure that the data we copy for a given array is only the data from the smallest element accessed to the largest element accessed.
| We also adjust the size of the array according to the offset at which the array is actually accessed.
| An interesting result of this is: In case arrays are accessed with negative subscripts, e.g., A[-100], we automatically allocate and transfer _more_ data to cover the full array. This is important as such code indeed exists in the wild.
| llvm-svn: 281611
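A minimal sketch of the transfer-range arithmetic described above, for a one-dimensional array. `TransferRange`, `computeTransferRange` and `byteOffset` are hypothetical names, and returning an empty range when no element is accessed is an assumption of this sketch:

```cpp
#include <cassert>
#include <cstdint>

// Which elements to copy between host and device (hypothetical type).
struct TransferRange {
  std::int64_t FirstElement; // smallest accessed index, may be negative
  std::int64_t NumElements;  // number of elements from first to last access
};

// Copy exactly the elements between the smallest and the largest accessed
// index, which is the range the alias checks reason about.
TransferRange computeTransferRange(std::int64_t MinIndex,
                                   std::int64_t MaxIndex) {
  if (MaxIndex < MinIndex)
    return TransferRange{0, 0}; // nothing accessed (sketch assumption)
  return TransferRange{MinIndex, MaxIndex - MinIndex + 1};
}

// Offset in bytes, relative to the array's base pointer, at which the copy
// starts on the host side.
std::int64_t byteOffset(const TransferRange &R, std::int64_t ElementSize) {
  assert(ElementSize > 0);
  return R.FirstElement * ElementSize;
}
```

For accesses ranging from A[-100] to A[99] on an array of doubles, the range starts 800 bytes before the base pointer and covers 200 elements, more than a transfer that assumes the array starts at index 0 would cover.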
* Perform copying to created arrays according to the packing transformation | Roman Gareev | 2016-09-14 | 4 | -10/+34
| This is the fourth patch to apply the BLIS matmul optimization pattern on matmul kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf). BLIS implements gemm as three nested loops around a macro-kernel, plus two packing routines. The macro-kernel is implemented in terms of two additional loops around a micro-kernel. The micro-kernel is a loop around a rank-1 (i.e., outer product) update. In this change we perform copying to created arrays, which is the last step to implement the packing transformation.
| Reviewed-by: Tobias Grosser <tobias@grosser.es>
| Differential Revision: https://reviews.llvm.org/D23260
| llvm-svn: 281441
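To illustrate what a packing copy does, the sketch below copies one block of a row-major matrix into a contiguous buffer. It is a deliberately simplified stand-in: the function name `packBlock` and the plain row-by-row layout are assumptions, and the layout of the packed arrays Polly generates for the macro-kernel is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copies the Rows x Cols block that starts at (Row0, Col0) of a row-major
// matrix M with leading dimension Ld into a contiguous buffer, so that the
// kernel working on the block reads memory with unit stride.
std::vector<double> packBlock(const std::vector<double> &M, std::size_t Ld,
                              std::size_t Row0, std::size_t Col0,
                              std::size_t Rows, std::size_t Cols) {
  assert(Col0 + Cols <= Ld && "block exceeds the row length");
  assert((Row0 + Rows) * Ld <= M.size() && "block exceeds the matrix");
  std::vector<double> Packed;
  Packed.reserve(Rows * Cols);
  for (std::size_t R = 0; R < Rows; ++R)
    for (std::size_t C = 0; C < Cols; ++C)
      Packed.push_back(M[(Row0 + R) * Ld + Col0 + C]);
  return Packed;
}
```

In the BLIS scheme two such routines run inside the outer loops, one per operand matrix, before the macro-kernel consumes the packed data.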
* GPGPU: Use const_cast to avoid compiler warning [NFC] | Tobias Grosser | 2016-09-13 | 1 | -1/+1
| llvm-svn: 281333
* GPGPU: Allow region statements | Tobias Grosser | 2016-09-13 | 1 | -1/+1
| llvm-svn: 281305
* GPGPU: Extend types when array sizes have smaller types | Tobias Grosser | 2016-09-13 | 1 | -0/+2
| This prevents a compiler crash.
| llvm-svn: 281303
* Store the size of the outermost dimension in case of newly created arrays that require memory allocation. | Roman Gareev | 2016-09-12 | 2 | -1/+7
| We do not need the size of the outermost dimension in most cases, but if we allocate memory for newly created arrays, that size is needed.
| Reviewed-by: Michael Kruse <llvm@meinersbur.de>
| Differential Revision: https://reviews.llvm.org/D23991
| llvm-svn: 281234
* GPGPU: Bail out gracefully in case of invalid IR | Tobias Grosser | 2016-09-12 | 1 | -4/+13
| Instead of aborting, we now bail out gracefully in case the kernel IR we generate is invalid. This can currently happen in case the SCoP stores pointer values, which we model as arrays, as data values into other arrays. In this case, the original pointer value is not available on the device and can consequently not be stored. As detecting this ahead of time is not so easy, we detect these situations after the invalid IR has been generated and bail out.
| llvm-svn: 281193
* GPGPU: Do not fail in case of arrays never accessed | Tobias Grosser | 2016-09-11 | 1 | -11/+21
| If these arrays have never been accessed we failed to derive an upper bound of the accesses and consequently a size for the outermost dimension. We now explicitly check for empty access sets and then just use zero as size for the outermost dimension.
| llvm-svn: 281165
* IslNodeBuilder: Add missing __isl_take annotation | Tobias Grosser | 2016-09-09 | 1 | -2/+2
| llvm-svn: 281034
* IslNodeBuilder: Add missing __isl_take annotations | Tobias Grosser | 2016-09-08 | 1 | -2/+3
| llvm-svn: 280936
* Drop '@brief' from doxygen comments | Tobias Grosser | 2016-09-02 | 5 | -23/+23
| LLVM's coding guideline suggests not using @brief for one-sentence doxygen comments to improve readability. Switch this once and for all to ensure people do not copy @brief comments from other parts of Polly when writing new code.
| llvm-svn: 280468
* Allow mapping scalar MemoryAccesses to array elements. | Michael Kruse | 2016-09-01 | 1 | -19/+54
| Change the code around setNewAccessRelation to allow using an existing array element for memory instead of an ad-hoc alloca. This facility will be used for DeLICM/DeGVN to convert scalar dependencies into regular ones.
| The changes necessary include:
| - Make the code generator use the implicit locations instead of the alloca ones.
| - A test case
| - Make the JScop importer accept changes of scalar accesses for that test case.
| - Adapt the MemoryAccess interface to the fact that the MemoryKind can change. They are named (get|is)OriginalXXX() to get the status of the memory access before any change by setNewAccessRelation() (some properties such as getIncoming() do not change even if the kind is changed and are still required). To get the modified properties, there is (get|is)LatestXXX(). The old accessors without Original|Latest become synonyms of the (get|is)OriginalXXX() to not make functional changes in unrelated code.
| Differential Revision: https://reviews.llvm.org/D23962
| llvm-svn: 280408
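The accessor scheme from the last bullet can be sketched as below. This is a toy model under stated assumptions: the class is not Polly's `MemoryAccess`, the enumerator names are illustrative, and `setNewKind` is a hypothetical shortcut that stores the new kind directly, whereas the commit describes the change as happening through setNewAccessRelation().

```cpp
#include <cassert>

// Illustrative kinds; not the declarations from the Polly headers.
enum class MemoryKind { Array, Value, PHI, ExitPHI };

// Toy model of an access whose kind can change after construction.
class ToyMemoryAccess {
  MemoryKind OriginalKind;
  MemoryKind NewKind;
  bool HasNewKind = false;

public:
  explicit ToyMemoryAccess(MemoryKind Kind)
      : OriginalKind(Kind), NewKind(Kind) {}

  // Hypothetical stand-in for remapping the access to another location.
  void setNewKind(MemoryKind Kind) {
    NewKind = Kind;
    HasNewKind = true;
  }

  // Status before any modification.
  MemoryKind getOriginalKind() const { return OriginalKind; }
  bool isOriginalArrayKind() const { return OriginalKind == MemoryKind::Array; }

  // Status after modification, falling back to the original if unchanged.
  MemoryKind getLatestKind() const {
    return HasNewKind ? NewKind : OriginalKind;
  }
  bool isLatestArrayKind() const { return getLatestKind() == MemoryKind::Array; }

  // Accessor without a prefix: a synonym of the Original variant, so that
  // unrelated callers keep their behavior.
  MemoryKind getKind() const { return getOriginalKind(); }
};
```

Code generation would query the Latest variants to decide where to load from or store to, while analyses that reason about the input program keep using the Original ones.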
* [BlockGenerator] Invalidate SCEV values for instructions in scop | Tobias Grosser | 2016-08-18 | 1 | -0/+14
| We already invalidated a couple of critical values earlier on, but we now invalidate all instructions contained in a scop after the scop has been code generated. This is necessary as later scops may otherwise obtain SCEV expressions that reference values in the earlier scop that previously dominated the later scop, but which had been moved into the conditional branch and consequently do not dominate the later scop any more. If these very values are then used during code generation of the later scop, we generate uses that are not dominated by the values they use.
| This fixes: http://llvm.org/PR28984
| llvm-svn: 279047
* [GPGPU] Ensure arrays where only parts are modified are copied to GPU | Tobias Grosser | 2016-08-10 | 1 | -4/+62
| To do so we change the way array extents are computed. Instead of the precise set of memory locations accessed, we now compute the extent as the range between minimal and maximal address in the first dimension and the full extent defined by the sizes of the inner array dimensions.
| We also move the computation of the may_persist region after the construction of the arrays, as it relies on array information. Without arrays being constructed no useful information is computed at all.
| llvm-svn: 278212
* [GPGPU] Support PHI nodes used in GPU kernel | Tobias Grosser | 2016-08-09 | 1 | -10/+14
| Ensure the right scalar allocations are used as the host location of data transfers. For the device code, we clear the allocation cache before device code generation to be able to generate new device-specific allocations, and we need to make sure to add back the old host allocations as soon as the device code generation is finished.
| llvm-svn: 278126
* [GPGPU] Use separate basic block for GPU initialization code | Tobias Grosser | 2016-08-09 | 1 | -0/+6
| This increases the readability of the IR and also clarifies that the GPU initialization is executed _after_ the scalar initialization, which needs to happen before the code of the transformed scop is executed.
| Besides increased readability, the IR should not change. Specifically, I do not expect any changes in program semantics due to this patch.
| llvm-svn: 278125
* [BlockGenerator] Insert initializations at beginning of start block | Tobias Grosser | 2016-08-09 | 1 | -1/+1
| In case some code -- not guarded by control flow -- would be emitted directly in the start block, it may happen that this code would use uninitialized scalar values if the scalar initialization is only emitted at the end of the start block. This is not a problem today in normal Polly, as all statements are emitted in their own basic blocks, but Polly-ACC emits host-to-device copy statements into the start block.
| Additional Polly-ACC test coverage will be added in subsequent changes that improve the handling of PHI nodes in Polly-ACC.
| llvm-svn: 278124
* [BlockGenerator] Also eliminate dead code not originating from BB | Tobias Grosser | 2016-08-09 | 1 | -12/+9
| After having generated the code for a ScopStmt, we run a simple dead-code elimination that drops all instructions that are known to be and remain unused. Until this change, we only considered instructions for dead-code elimination if they had a corresponding instruction in the original BB that belongs to the ScopStmt. However, when generating code we do not only copy code from the BB belonging to a ScopStmt, but also generate code for operands referenced from BB. After this change, we also consider code for dead-code elimination that does not have a corresponding instruction in BB.
| This fixes a bug in Polly-ACC where such dead code referenced CPU code from within a GPU kernel, which is possible as we do not guarantee that all variables that are used in known-dead code are moved to the GPU.
| llvm-svn: 278103
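A toy version of such a dead-code pass, written as a mark phase over a hypothetical instruction list (`Inst` and `computeLive` are names invented for this sketch; the pass in BlockGenerator operates on LLVM instructions and is structured differently). The point it illustrates is that liveness is decided for every generated instruction, not only for those copied from the original block:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction: indices of the instructions it uses, plus a flag
// for effects that must be kept regardless of uses (stores, calls, ...).
struct Inst {
  std::vector<std::size_t> Operands;
  bool HasSideEffects = false;
};

// Marks every instruction that is transitively used by an instruction with
// side effects. Everything left unmarked can be dropped.
std::vector<bool> computeLive(const std::vector<Inst> &Insts) {
  std::vector<bool> Live(Insts.size(), false);
  std::vector<std::size_t> Worklist;
  for (std::size_t I = 0; I < Insts.size(); ++I) {
    if (Insts[I].HasSideEffects) {
      Live[I] = true;
      Worklist.push_back(I);
    }
  }
  while (!Worklist.empty()) {
    std::size_t I = Worklist.back();
    Worklist.pop_back();
    for (std::size_t Op : Insts[I].Operands) {
      assert(Op < Insts.size() && "operand index out of range");
      if (!Live[Op]) {
        Live[Op] = true;
        Worklist.push_back(Op);
      }
    }
  }
  return Live;
}
```

In the test scenario below, instruction 3 only feeds from instruction 1 and neither reaches the store in instruction 2, so both are dead even though instruction 1 has a user.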
* [GPGPU] Pass parameters always by using their own type | Tobias Grosser | 2016-08-09 | 1 | -2/+6
| llvm-svn: 278100
* [GPGPU] Support Values referenced from both isl expr and llvm instructions | Tobias Grosser | 2016-08-08 | 1 | -0/+2
| When adding code that avoids passing values used in both isl expressions and LLVM instructions twice, we forgot to make the single variable passed to the kernel available in the ValueMap that makes it usable for instructions that are not replaced with isl ast expressions. This change adds the variable that is passed to the kernel to the ValueMap to ensure it is available for such use cases as well.
| llvm-svn: 278039
* [GPGPU] Create code to verify run-time conditions | Tobias Grosser | 2016-08-08 | 1 | -1/+9
| llvm-svn: 278026
* Fix compilation in 'asserts' mode | Tobias Grosser | 2016-08-08 | 1 | -1/+1
| llvm-svn: 278025
* [IslNodeBuilder] Move run-time check generation to NodeBuilder [NFC] | Tobias Grosser | 2016-08-08 | 2 | -22/+18
| This improves the structure of the code and allows us to reuse the runtime code generation in the PPCGCodeGeneration.
| llvm-svn: 278017