path: root/polly/lib/CodeGen/PPCGCodeGeneration.cpp
Entries, newest first: commit message (author, date; files changed, -removed/+added lines)
* Update to recent formatting changes (Tobias Grosser, 2017-02-01; 1 file, -6/+4)
  llvm-svn: 293756
* [Polly] [BlockGenerator] Unify ScalarMap and PhiOpsMap (Tobias Grosser, 2017-01-28; 1 file, -4/+1)
  Instead of keeping two separate maps from Value to Allocas, one for
  MemoryType::Value and the other for MemoryType::PHI, we introduce a single
  map from ScopArrayInfo to the corresponding Alloca. This change is intended
  both as a general simplification and cleanup, and as a way to reduce our use
  of MemoryAccess::getBaseAddr(). Moving away from getBaseAddr() makes sure
  there is only a single place where the array (and its base pointer) for
  which we generate code is specified, which means we can more easily
  introduce new access functions that use a different ScopArrayInfo as base.
  We already experiment with modifiable access functions today, so this change
  does not address a specific bug; it just reduces the scope one needs to
  reason about.
  Another motivation for this patch is https://reviews.llvm.org/D28518, where
  memory accesses with different base pointers could possibly be mapped to a
  single ScopArrayInfo object. Such a mapping is currently not possible, as we
  generate alloca instructions according to the base addresses of the memory
  accesses, not according to the ScopArrayInfo object they belong to. By
  making allocas ScopArrayInfo-specific, a mapping to a single ScopArrayInfo
  object automatically means that the same stack slot is used for these
  arrays. For D28518 this is not a problem, as only MemoryType::Array objects
  are mapped, but resolving this inconsistency will hopefully avoid confusion.
  llvm-svn: 293374
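The map unification described above can be sketched as follows. This is an illustrative stand-in, not Polly's actual classes: `ScopArrayInfoStub`, `AllocaSlot`, and `AllocaMap` are hypothetical names, but the key idea matches the commit — one map keyed by the array object, so accesses sharing a ScopArrayInfo automatically share one stack slot.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for Polly's ScopArrayInfo and an alloca slot.
struct ScopArrayInfoStub { std::string Name; };
struct AllocaSlot { int Id; };

// Instead of two maps (one for MemoryType::Value, one for MemoryType::PHI),
// a single map keyed by the array object itself: any two accesses that share
// a ScopArrayInfo automatically resolve to the same slot.
class AllocaMap {
  std::map<const ScopArrayInfoStub *, AllocaSlot> Slots;
  int NextId = 0;

public:
  AllocaSlot &getOrCreate(const ScopArrayInfoStub *SAI) {
    auto It = Slots.find(SAI);
    if (It == Slots.end())
      It = Slots.insert({SAI, AllocaSlot{NextId++}}).first;
    return It->second;
  }
};
```

If D28518 later maps two arrays to one ScopArrayInfo object, both keys collapse to the same entry with no extra bookkeeping, which is exactly the property the commit message argues for.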
* Use typed enums to model MemoryKind and move MemoryKind out of ScopArrayInfo (Tobias Grosser, 2017-01-14; 1 file, -4/+4)
  To benefit from the type-safety guarantees of C++11 typed enums, which would
  have caught the type mismatch fixed in r291960, we make MemoryKind a typed
  enum. This change also allows us to drop the 'MK_' prefix and instead use
  the more descriptive full name of the enum as prefix. To reduce the amount
  of typing needed, we use this opportunity to move MemoryKind from
  ScopArrayInfo to global scope, which means the ScopArrayInfo:: prefix is no
  longer needed. This move also makes sense historically. In the beginning of
  Polly we had different MemoryKind enums in both MemoryAccess and
  ScopArrayInfo, which were later canonicalized to one. During this
  canonicalization we just chose the enum in ScopArrayInfo, but did not
  consider moving this shared enum to global scope.
  Reviewed-by: Michael Kruse <llvm@meinersbur.de>
  Differential Revision: https://reviews.llvm.org/D28090
  llvm-svn: 292030
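The type-safety argument above can be shown in a few lines. This is a sketch, not the exact Polly definition; the enumerator names follow Polly's documented MemoryKind values, and `isScalarKind` is a hypothetical helper for illustration.

```cpp
#include <type_traits>

// Scoped enum at namespace scope, as the commit describes, instead of a
// plain enum nested inside ScopArrayInfo with MK_-prefixed enumerators.
enum class MemoryKind { Array, Value, PHI, ExitPHI };

// A scoped enum does not implicitly convert to int, so a kind/int mix-up
// like the one fixed in r291960 becomes a compile-time error.
static_assert(!std::is_convertible<MemoryKind, int>::value,
              "scoped enums require an explicit cast");

// Hypothetical helper: scalar kinds are everything except Array.
constexpr bool isScalarKind(MemoryKind K) {
  return K == MemoryKind::Value || K == MemoryKind::PHI ||
         K == MemoryKind::ExitPHI;
}
```

With the plain enum, `MemoryKind K = 3;` style conversions compiled silently; with `enum class`, both directions require an explicit `static_cast`, which is the guarantee the commit relies on.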
* Update to recent clang-format changes (Tobias Grosser, 2017-01-12; 1 file, -2/+3)
  llvm-svn: 291810
* Update for clang-format change in r288119 (Tobias Grosser, 2016-11-29; 1 file, -2/+3)
  llvm-svn: 288134
* [Polly CodeGen] Break critical edge from RTC to original loop (Eli Friedman, 2016-11-02; 1 file, -5/+7)
  This makes Polly generate a CFG that is closer to what we want in LLVM IR,
  with a loop preheader for the original loop. This is just a cleanup, but it
  exposes some fragile assumptions. I'm not completely happy with the changes
  related to expandCodeFor; RTCBB->getTerminator() is basically a random
  insertion point which happens to work due to the way we generate runtime
  checks. I'm not sure what the right answer looks like, though.
  Differential Revision: https://reviews.llvm.org/D26053
  llvm-svn: 285864
* GPGPU: Do not run mostly sequential kernels on the GPU (Tobias Grosser, 2016-09-18; 1 file, -0/+19)
  In case sequential kernels are found deeper in the loop tree than any
  parallel kernel, the overall scop is probably mostly sequential. Hence, run
  it on the CPU.
  llvm-svn: 281849
* GPGPU: Dynamically ensure 'sufficient compute' (Tobias Grosser, 2016-09-18; 1 file, -0/+110)
  Offloading to a GPU is only beneficial if there is a sufficient amount of
  compute that can be accelerated. Many kernels just have a very small amount
  of dynamic compute, which means GPU acceleration is not beneficial. We
  compute at run-time an approximation of how many dynamic instructions will
  be executed and fall back to CPU code in case this number is not
  sufficiently large. To keep the run-time checking code simple, we
  over-approximate the number of instructions executed in each statement by
  computing the volume of the rectangular hull of its iteration space.
  llvm-svn: 281848
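The over-approximation described above can be sketched as simple arithmetic. This is an illustrative model, not Polly's run-time check (which is generated as IR over isl expressions): the names and the threshold are made up, and the rectangular hull of an iteration space is taken here as per-dimension inclusive bounds whose trip counts are multiplied.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One dimension of the rectangular hull of an iteration space,
// as inclusive lower/upper bounds.
struct Dim { std::int64_t Lower, Upper; };

// Volume of the rectangular hull: the product of per-dimension trip
// counts. This over-approximates the number of dynamically executed
// statement instances (and hence instructions).
std::int64_t rectangularHullVolume(const std::vector<Dim> &Space) {
  std::int64_t Volume = 1;
  for (const Dim &D : Space)
    Volume *= (D.Upper - D.Lower + 1);
  return Volume;
}

// Offload only if the estimated compute clears a minimum threshold;
// otherwise fall back to the CPU code path.
bool hasSufficientCompute(const std::vector<Dim> &Space,
                          std::int64_t MinCompute) {
  return rectangularHullVolume(Space) >= MinCompute;
}
```

The hull is an over-approximation on purpose: a triangular iteration space still counts as its full bounding box, which keeps the generated run-time check to a handful of multiplications.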
* GPGPU: Store back non-read-only scalars (Tobias Grosser, 2016-09-17; 1 file, -2/+55)
  We may generate GPU kernels that store into scalars when we run some
  sequential code on the GPU because the remaining data is expected to already
  be on the GPU. For these kernels it is important not to keep the scalar
  values in thread-local registers, but to store them back to the
  corresponding device memory objects that back them up. We currently only
  store scalars back at the end of a kernel. This is only correct if precisely
  one thread is executed. In case more than one thread may be run, we
  currently invalidate the scop. To support such cases correctly, we would
  need to always load and store back from a corresponding global memory slot
  instead of a thread-local alloca slot.
  llvm-svn: 281838
* GPGPU: Detect read-only scalar arrays and pass these by value rather than by reference (Tobias Grosser, 2016-09-17; 1 file, -14/+32)
  llvm-svn: 281837
* GPGPU: Do not assume arrays start at 0 (Tobias Grosser, 2016-09-15; 1 file, -0/+86)
  Our alias checks precisely verify that the minimal and maximal accessed
  elements do not overlap in a kernel. Hence, we must ensure that our host <->
  device transfers do not touch additional memory locations that are not
  covered by the alias check. To ensure this, we make sure that the data we
  copy for a given array is only the data from the smallest element accessed
  to the largest element accessed. We also adjust the size of the array
  according to the offset at which the array is actually accessed. An
  interesting result of this is: in case arrays are accessed with negative
  subscripts, e.g., A[-100], we automatically allocate and transfer _more_
  data to cover the full array. This is important, as such code indeed exists
  in the wild.
  llvm-svn: 281611
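The offset adjustment described above amounts to the following arithmetic, sketched here with hypothetical names (Polly computes these bounds symbolically via isl, not with concrete integers): the copied region starts at the smallest accessed element and spans up to the largest, so a negative minimal subscript yields a negative offset and a correspondingly larger transfer.

```cpp
#include <cassert>
#include <cstdint>

// Describes one host <-> device copy for a (one-dimensional) array.
struct Transfer {
  std::int64_t OffsetBytes; // start of the copied region, relative to &A[0]
  std::int64_t SizeBytes;   // bytes to allocate on the device and copy
};

// Copy exactly the range [MinIdx, MaxIdx] of accessed elements, so the
// transfer touches no memory outside what the alias checks cover.
Transfer computeTransfer(std::int64_t MinIdx, std::int64_t MaxIdx,
                         std::int64_t ElementSize) {
  Transfer T;
  T.OffsetBytes = MinIdx * ElementSize;
  T.SizeBytes = (MaxIdx - MinIdx + 1) * ElementSize;
  return T;
}
```

For an access pattern like A[-100] .. A[49] over 8-byte elements this gives an offset of -800 bytes and a 150-element transfer, i.e. strictly more data is moved than an A[0]-based computation would assume.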
* GPGPU: Use const_cast to avoid compiler warning [NFC] (Tobias Grosser, 2016-09-13; 1 file, -1/+1)
  llvm-svn: 281333
* GPGPU: Allow region statements (Tobias Grosser, 2016-09-13; 1 file, -1/+1)
  llvm-svn: 281305
* GPGPU: Extend types when array sizes have smaller types (Tobias Grosser, 2016-09-13; 1 file, -0/+2)
  This prevents a compiler crash.
  llvm-svn: 281303
* Store the size of the outermost dimension in case of newly created arrays that require memory allocation (Roman Gareev, 2016-09-12; 1 file, -0/+2)
  We do not need the size of the outermost dimension in most cases, but if we
  allocate memory for newly created arrays, that size is needed.
  Reviewed-by: Michael Kruse <llvm@meinersbur.de>
  Differential Revision: https://reviews.llvm.org/D23991
  llvm-svn: 281234
* GPGPU: Bail out gracefully in case of invalid IR (Tobias Grosser, 2016-09-12; 1 file, -4/+13)
  Instead of aborting, we now bail out gracefully in case the kernel IR we
  generate is invalid. This can currently happen if the SCoP stores pointer
  values, which we model as arrays, as data values into other arrays. In this
  case, the original pointer value is not available on the device and
  consequently cannot be stored. As detecting this ahead of time is not so
  easy, we detect these situations after the invalid IR has been generated and
  bail out.
  llvm-svn: 281193
* GPGPU: Do not fail in case of arrays never accessed (Tobias Grosser, 2016-09-11; 1 file, -11/+21)
  If these arrays have never been accessed, we failed to derive an upper bound
  of the accesses and consequently a size for the outermost dimension. We now
  explicitly check for empty access sets and then just use zero as the size of
  the outermost dimension.
  llvm-svn: 281165
* [GPGPU] Ensure arrays where only parts are modified are copied to GPU (Tobias Grosser, 2016-08-10; 1 file, -4/+62)
  To do so, we change the way array extents are computed. Instead of the
  precise set of memory locations accessed, we now compute the extent as the
  range between the minimal and maximal address in the first dimension and the
  full extent defined by the sizes of the inner array dimensions. We also move
  the computation of the may_persist region after the construction of the
  arrays, as it relies on array information. Without arrays being constructed,
  no useful information is computed at all.
  llvm-svn: 278212
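The extent rule described above (min..max range in the first dimension, full size in every inner dimension) can be sketched as follows. The function name is hypothetical; Polly expresses this as isl sets rather than integer arithmetic, but the shape of the computation is the same.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of elements in the extent of an array: only the outermost
// dimension is clipped to the [MinOuter, MaxOuter] range of accessed
// indices; each inner dimension contributes its full declared size, so a
// partially modified array is still transferred as one contiguous slab.
std::int64_t extentElements(std::int64_t MinOuter, std::int64_t MaxOuter,
                            const std::vector<std::int64_t> &InnerSizes) {
  std::int64_t Elements = MaxOuter - MinOuter + 1;
  for (std::int64_t Size : InnerSizes)
    Elements *= Size;
  return Elements;
}
```

Using the full inner sizes (rather than the precise access set) is what guarantees that elements the kernel leaves unmodified are copied to the device and back unchanged, instead of being clobbered by a partial write-back.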
* [GPGPU] Support PHI nodes used in GPU kernel (Tobias Grosser, 2016-08-09; 1 file, -10/+14)
  Ensure the right scalar allocations are used as the host location of data
  transfers. For the device code, we clear the allocation cache before device
  code generation to be able to generate new device-specific allocations, and
  we make sure to add back the old host allocations as soon as the device code
  generation is finished.
  llvm-svn: 278126
* [GPGPU] Use separate basic block for GPU initialization code (Tobias Grosser, 2016-08-09; 1 file, -0/+6)
  This increases the readability of the IR and also clarifies that the GPU
  initialization is executed _after_ the scalar initialization, which needs to
  happen before the code of the transformed scop is executed. Besides
  increased readability, the IR should not change. Specifically, I do not
  expect any changes in program semantics due to this patch.
  llvm-svn: 278125
* [GPGPU] Pass parameters always by using their own type (Tobias Grosser, 2016-08-09; 1 file, -2/+6)
  llvm-svn: 278100
* [GPGPU] Support Values referenced from both isl expr and llvm instructions (Tobias Grosser, 2016-08-08; 1 file, -0/+2)
  When adding code that avoids passing values used in isl expressions and LLVM
  instructions twice, we forgot to make a single variable passed to the kernel
  available in the ValueMap that makes it usable for instructions that are not
  replaced with isl ast expressions. This change adds the variable that is
  passed to the kernel to the ValueMap to ensure it is available for such use
  cases as well.
  llvm-svn: 278039
* [GPGPU] Create code to verify run-time conditions (Tobias Grosser, 2016-08-08; 1 file, -1/+9)
  llvm-svn: 278026
* GPGPU: Sort dimension sizes of multi-dimensional shared memory arrays correctly (Tobias Grosser, 2016-08-05; 1 file, -1/+7)
  Before this commit we generated the array type in reverse order and also
  added the outermost dimension size to the new array declaration. This is
  incorrect, as Polly assumes an additional unsized outermost dimension, so we
  had an off-by-one error in the linearization of access expressions.
  llvm-svn: 277802
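The linearization convention with an unsized outermost dimension can be sketched as follows (an illustrative model, not Polly's code): for an N-dimensional access only the N-1 inner dimension sizes enter the address computation, so declaring the outermost size as well shifts every size by one position, which is the off-by-one this commit fixes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-major linearization of A[s0][s1]...[s_{N-1}] where only the inner
// dimensions have declared sizes: the outermost dimension is unsized, so
// InnerSizes holds N-1 entries for an N-subscript access.
std::int64_t linearize(const std::vector<std::int64_t> &Subscripts,
                       const std::vector<std::int64_t> &InnerSizes) {
  assert(Subscripts.size() == InnerSizes.size() + 1);
  std::int64_t Addr = Subscripts[0];
  for (std::size_t I = 0; I < InnerSizes.size(); ++I)
    Addr = Addr * InnerSizes[I] + Subscripts[I + 1];
  return Addr;
}
```

For A[i][j] with an inner size of 10, A[3][4] resolves to element 34; had the outermost size been included in the size list, every subscript would be scaled by the wrong stride.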
* GPGPU: Add cuda annotations to specify maximal number of threads per block (Tobias Grosser, 2016-08-05; 1 file, -3/+40)
  These annotations ensure that the NVIDIA PTX assembler limits the number of
  registers used such that we can be certain the resulting kernel can be
  executed for the number of threads in a thread block that we are planning to
  use.
  llvm-svn: 277799
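The constraint behind the annotation can be captured with back-of-the-envelope arithmetic. This is an illustrative model only (the function name and the register numbers in the examples are made up, not queried from any GPU): a block can launch only if its thread count times the per-thread register usage fits into the block's register budget, which is why telling the assembler the intended maximum thread count lets it cap register usage accordingly.

```cpp
#include <cassert>
#include <cstdint>

// A thread block is launchable only if its total register demand fits
// into the register budget available to one block. If the assembler does
// not know the intended thread count, it may allocate so many registers
// per thread that the planned block size can no longer launch.
bool kernelCanLaunch(std::int64_t ThreadsPerBlock,
                     std::int64_t RegsPerThread,
                     std::int64_t RegsPerBlock) {
  return ThreadsPerBlock * RegsPerThread <= RegsPerBlock;
}
```

With an (assumed) 64K-register budget, 1024 threads at 32 registers each fit, while 1024 threads at 128 registers each do not; the annotation trades registers per thread for the guarantee that the planned block size launches.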
* GPGPU: Support scalars that are mapped to shared memory (Tobias Grosser, 2016-08-04; 1 file, -9/+5)
  llvm-svn: 277726
* GPGPU: Disable verbose debug output (Tobias Grosser, 2016-08-04; 1 file, -0/+1)
  llvm-svn: 277724
* Remove leftover debug output (Tobias Grosser, 2016-08-04; 1 file, -1/+0)
  llvm-svn: 277723
* GPGPU: Add private memory support (Tobias Grosser, 2016-08-04; 1 file, -14/+25)
  llvm-svn: 277722
* GPGPU: Add support for shared memory (Tobias Grosser, 2016-08-04; 1 file, -5/+90)
  llvm-svn: 277721
* GPGPU: Handle scalar array references (Tobias Grosser, 2016-08-04; 1 file, -1/+34)
  Pass the content of scalar array references to the alloca on the kernel side
  and do not additionally pass them as normal LLVM scalar values.
  llvm-svn: 277699
* GPGPU: Pass subtree values correctly to the kernel (Tobias Grosser, 2016-08-04; 1 file, -6/+22)
  llvm-svn: 277697
* GPGPU: Mark kernel functions as polly.skip (Tobias Grosser, 2016-08-03; 1 file, -0/+3)
  Otherwise, we would try to re-optimize them with Polly-ACC and possibly even
  generate kernels that try to offload themselves, which does not work as the
  GPURuntime is not available on the accelerator, and also does not make any
  sense.
  llvm-svn: 277589
* Extend the jscop interface to allow the user to declare new arrays and to reference these arrays from access expressions (Roman Gareev, 2016-07-30; 1 file, -7/+4)
  Extend the jscop interface to allow the user to export arrays. It is
  required that already existing arrays of the list of arrays correspond to
  arrays of the SCoP. Each array that is appended to the list will be newly
  created. Furthermore, we allow the user to modify access expressions to
  reference any array in case it has the same element type.
  Reviewed-by: Tobias Grosser <tobias@grosser.es>
  Differential Revision: https://reviews.llvm.org/D22828
  llvm-svn: 277263
* GPGPU: Pass context parameters to GPU kernel (Tobias Grosser, 2016-07-28; 1 file, -0/+18)
  llvm-svn: 276963
* GPGPU: Pass host iterators to kernel (Tobias Grosser, 2016-07-28; 1 file, -0/+18)
  llvm-svn: 276962
* GPGPU: Use current 'Index' to find slot in parameter array (Tobias Grosser, 2016-07-28; 1 file, -2/+2)
  Before this change we used the array index, which would result in us
  accessing the parameter array out of bounds. This bug was visible for test
  cases where not all arrays in a scop are passed to a given kernel.
  llvm-svn: 276961
* GPGPU: Generate kernel parameter allocation with right size (Tobias Grosser, 2016-07-28; 1 file, -1/+2)
  Before this change we miscounted the number of function parameters.
  llvm-svn: 276960
* GPGPU: Add basic support for kernel launches (Tobias Grosser, 2016-07-27; 1 file, -0/+171)
  llvm-svn: 276863
* GPGPU: Load GPU kernels (Tobias Grosser, 2016-07-25; 1 file, -3/+60)
  We embed the PTX code into the host IR as a global variable and compile it
  at run-time into a GPU kernel.
  llvm-svn: 276645
* GPGPU: Emit data-transfer code (Tobias Grosser, 2016-07-25; 1 file, -25/+139)
  Also factor out getArraySize() to avoid code duplication, and reorder some
  function arguments to indicate the direction in which data is transferred.
  llvm-svn: 276636
* GPGPU: Complete code to allocate and free device arrays (Tobias Grosser, 2016-07-25; 1 file, -4/+45)
  At the beginning of each SCoP, we allocate device arrays for all arrays used
  on the GPU, and we free such arrays after the SCoP has been executed.
  llvm-svn: 276635
* GPGPU: Initialize GPU context and simplify the corresponding GPURuntime interface (Tobias Grosser, 2016-07-25; 1 file, -0/+116)
  There is no need to expose the selected device at the moment. We also pass
  back pointers as return values, as this simplifies the interface.
  llvm-svn: 276623
* IslNodeBuilder: Make finalize() virtual (Tobias Grosser, 2016-07-25; 1 file, -1/+1)
  This allows the finalization routine of the IslNodeBuilder to be overridden
  by derived classes. While here, we also drop the unnecessary 'Scop' postfix
  and the unnecessary 'Scop' parameter.
  llvm-svn: 276622
* GPGPU: Optimize kernel IR before generating assembly code (Tobias Grosser, 2016-07-24; 1 file, -0/+9)
  We optimize the kernel _after_ dumping the IR we generate, to make the
  dumped IR easier to read and independent of possible changes in the
  general-purpose LLVM optimizers.
  llvm-svn: 276551
* GPGPU: Verify kernel IR before generating assembly (Tobias Grosser, 2016-07-24; 1 file, -0/+5)
  llvm-svn: 276550
* GPGPU: Generate PTX assembly code for the kernel modules (Tobias Grosser, 2016-07-22; 1 file, -0/+123)
  Run the NVPTX backend over the GPUModule IR and write the resulting assembly
  code into a string. To work correctly, it is important to invalidate
  analysis results that still reference the IR in the kernel module. Hence,
  this change clears all references to dominators, loop info, and scalar
  evolution. Finally, the NVPTX backend has trouble generating code for
  various special floating-point types (not surprising), but also for uncommon
  integer types. This commit does not resolve these issues, but pulls
  problematic test cases out into separate files to XFAIL them individually
  and resolve them in future (not immediate) changes one by one.
  llvm-svn: 276396
* GPGPU: Generate code for ScopStatements (Tobias Grosser, 2016-07-21; 1 file, -15/+202)
  This change introduces the actual compute code in the GPU kernels. To ensure
  all values referenced from the statements in the GPU kernel are indeed
  available, we scan all ScopStmts in the GPU kernel for references to
  llvm::Values that are not yet covered by already modeled outer loop
  iterators, parameters, or array base pointers, and pass these additional
  llvm::Values to the GPU kernel as well. For arrays used in the GPU kernel we
  introduce a new ScopArrayInfo object, which is referenced by the newly
  generated access functions within the GPU kernel and which is used to help
  with code generation.
  llvm-svn: 276270
* GPGPU: Bail out of scops with hoisted invariant loads (Tobias Grosser, 2016-07-19; 1 file, -0/+4)
  This is currently not supported and will only be added later. Also update
  the test cases to ensure no invariant code hoisting is applied.
  llvm-svn: 275987
* GPGPU: Emit in-kernel synchronization statements (Tobias Grosser, 2016-07-19; 1 file, -0/+49)
  We use this opportunity to further classify the different user statements
  that can arise and add TODOs for the ones not yet implemented.
  llvm-svn: 275957