path: root/polly/lib/CodeGen/PPCGCodeGeneration.cpp
Commit message | Author | Age | Files | Lines
...
* [ScopInfo] Move Scop::getDomains to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 1 | -2/+3
| llvm-svn: 310230
* [ScopInfo] Translate Scop::getParamSpace to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 1 | -9/+7
| llvm-svn: 310224
* [ScopInfo] Translate Scop::getContext to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 1 | -9/+8
| llvm-svn: 310221
* [ScopInfo] Translate Scop::getIdForParam to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 1 | -1/+1
| llvm-svn: 310220
* [ScopInfo] Move get*Writes/getReads/getAccesses to isl++ | Tobias Grosser | 2017-08-06 | 1 | -4/+4
| llvm-svn: 310219
* Move ScopInfo::getDomain(), getDomainSpace(), getDomainId() to isl++ | Tobias Grosser | 2017-08-06 | 1 | -4/+5
| llvm-svn: 310209
* [GPGPU] Make sure managed arrays are prepared at the beginning of the scop | Tobias Grosser | 2017-08-06 | 1 | -32/+41
| Summary:
| This resolves some "instruction does not dominate use" errors, as we used to prepare the arrays at the location of the first kernel, which did not necessarily dominate all other kernel calls.
| Reviewers: Meinersbur, bollu, singam-sanjay
| Subscribers: nemanjai, pollydev, llvm-commits, kbarton
| Differential Revision: https://reviews.llvm.org/D36372
| llvm-svn: 310196
* [GPGPU] Rename all, not only the first libdevice function | Tobias Grosser | 2017-08-06 | 1 | -2/+3
| llvm-svn: 310194
* [Polly] [PPCGCodeGeneration] Deal with loops outside the Scop correctly in PPCGCodeGeneration. | Siddharth Bhat | 2017-08-06 | 1 | -11/+29
| A Scop with a loop outside it is not currently handled by PPCGCodeGeneration. The test case is such that the Scop has only one inner loop that is detected. This currently breaks codegen.
| The fix is to reuse the existing mechanism in `IslNodeBuilder` within `GPUNodeBuilder`.
| Differential Revision: https://reviews.llvm.org/D36290
| llvm-svn: 310193
* [PPCGCodeGeneration] [NFC] Log every location from which PPCGCodegen bails. | Siddharth Bhat | 2017-08-04 | 1 | -5/+23
| This is useful when trying to understand why no GPU code was produced.
| Differential Revision: https://reviews.llvm.org/D36318
| llvm-svn: 310103
* Make sure that all parameter dimensions are set in schedule | Tobias Grosser | 2017-08-03 | 1 | -0/+3
| Summary:
| In case the option -polly-ignore-parameter-bounds is set, not all parameters will be added to context and domains. This is useful to keep the size of the sets and maps we work with small. Unfortunately, for AST generation it is necessary to ensure all parameters are part of the schedule tree. Hence, we modify the GPGPU code generation to make sure this is the case.
| To obtain the necessary information we expose a new function Scop::getFullParamSpace(). We also make a couple of functions const to be able to make Scop::getFullParamSpace() const.
| Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg
| Subscribers: nemanjai, kbarton, pollydev, llvm-commits
| Tags: #polly
| Differential Revision: https://reviews.llvm.org/D36243
| llvm-svn: 309939
* [PPCGCodeGeneration] Construct `isl_multi_pw_aff` of PPCGArray.bounds even when polly-ignore-parameter-bounds is turned on. | Siddharth Bhat | 2017-08-03 | 1 | -16/+75
| When we have `-polly-ignore-parameter-bounds`, `Scop::Context` does not contain all the parameters present in the program.
| The construction of the `isl_multi_pw_aff` requires all the individual `pw_aff` to have the same parameter dimensions. To achieve this, we used to realign every `pw_aff` with `Scop::Context`. However, in conjunction with `-polly-ignore-parameter-bounds`, this is now incorrect, since `Scop::Context` does not contain all parameters.
| We set this up correctly by creating a space that has all the parameters used by all the `isl_pw_aff`. Then, we realign all `isl_pw_aff` to this space.
| llvm-svn: 309934
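Illustrative sketch (not taken from the commit): the two steps described above, building one space that holds every parameter and then realigning each expression to it, can be modelled without isl. The names `ParamList`, `unionOfParams` and `realign` are hypothetical, and parameter spaces are reduced to ordered lists of names; the actual patch works on isl spaces and `isl_pw_aff` objects.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical model: a parameter space is just an ordered list of names.
using ParamList = std::vector<std::string>;

// Step 1: collect every parameter used by any expression, keeping the order
// in which parameters are first seen.
ParamList unionOfParams(const std::vector<ParamList> &PerExprParams) {
  ParamList All;
  for (const ParamList &Params : PerExprParams)
    for (const std::string &P : Params)
      if (std::find(All.begin(), All.end(), P) == All.end())
        All.push_back(P);
  return All;
}

// Step 2: "realign" one expression to the common space. For each parameter
// of the common space, record the index it had in the expression, or -1 if
// the expression did not mention it.
std::vector<int> realign(const ParamList &Params, const ParamList &Common) {
  std::vector<int> Map;
  for (const std::string &P : Common) {
    auto It = std::find(Params.begin(), Params.end(), P);
    Map.push_back(It == Params.end()
                      ? -1
                      : static_cast<int>(It - Params.begin()));
  }
  return Map;
}
```

After realignment every expression is described over the same parameter list, which is the precondition the commit message gives for combining them into one `isl_multi_pw_aff`.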
* [PPCGCodeGeneration] Correct usage of llvm::Value with getLatestValue. | Siddharth Bhat | 2017-08-01 | 1 | -0/+2
| It is possible that the `HostPtr` that corresponds to an array could be invariant load hoisted. Make sure we use the invariant load hoisted value by using `IslNodeBuilder::getLatestValue`.
| Differential Revision: https://reviews.llvm.org/D36001
| llvm-svn: 309681
* [NFC] [PPCGCodeGeneration] Convert GPUNodeBuilder::getGridSizes to isl++. | Siddharth Bhat | 2017-08-01 | 1 | -5/+7
| llvm-svn: 309671
* [NFC] [PPCGCodeGeneration] Convert GPUNodeBuilder::getArrayOffset to isl++. | Siddharth Bhat | 2017-08-01 | 1 | -24/+17
| llvm-svn: 309669
* [GPGPU] Add support for NVIDIA libdevice | Tobias Grosser | 2017-07-31 | 1 | -12/+98
| Summary:
| This allows us to map functions such as exp, expf, expl, for which no LLVM intrinsics exist. Instead, we link to NVIDIA's libdevice which provides high-performance implementations of a wide range of (math) functions. We currently link only a small subset, the exp, cos and copysign functions. Other functions will be enabled as needed.
| Reviewers: bollu, singam-sanjay
| Reviewed By: bollu
| Subscribers: tstellar, tra, nemanjai, pollydev, mgorny, llvm-commits, kbarton
| Tags: #polly
| Differential Revision: https://reviews.llvm.org/D35703
| llvm-svn: 309560
* [PPCGCodeGeneration] Check that invariant load hoisting succeeded. | Siddharth Bhat | 2017-07-28 | 1 | -1/+4
| If we fail, throw an error for now. We can gracefully handle this later.
| llvm-svn: 309387
* [GPGPU] Do not require the Scop::Context to have information about all parameters | Tobias Grosser | 2017-07-28 | 1 | -4/+2
| llvm-svn: 309368
* [GPGPU] Fix compilation issue with latest CUDA upgrade to i128 | Tobias Grosser | 2017-07-28 | 1 | -4/+4
| llvm-svn: 309366
* [PPCGCodeGeneration] Skip arrays with empty extent. | Siddharth Bhat | 2017-07-25 | 1 | -4/+19
| Invariant load hoisted scalars, and arrays whose size we can statically compute to be 0, do not need to be allocated as arrays.
| Invariant load hoisted scalars are sent to the kernel directly as parameters.
| Earlier, we used to allocate `0` bytes of memory for these because our computation of size from `PPCGCodeGeneration::getArraySize` would result in `0`.
| Now, since we no longer allocate invariant loads as arrays in PPCGCodeGeneration, this problem does not occur anymore.
| Differential Revision: https://reviews.llvm.org/D35795
| llvm-svn: 308971
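Illustrative sketch (not taken from the commit): the effect described above amounts to filtering out zero-sized entries before any device allocation is requested. `ArrayInfo` and `arraysToAllocate` are hypothetical names; the real code operates on PPCG array descriptors and isl extents rather than plain byte counts.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an array descriptor: a name and a statically
// computed size in bytes.
struct ArrayInfo {
  std::string Name;
  std::size_t SizeInBytes;
};

// Keep only the arrays that actually need device memory. Entries whose size
// is known to be 0 (for example, hoisted scalars that are passed to the
// kernel as parameters) are skipped rather than allocated with 0 bytes.
std::vector<ArrayInfo> arraysToAllocate(const std::vector<ArrayInfo> &Arrays) {
  std::vector<ArrayInfo> Result;
  for (const ArrayInfo &A : Arrays)
    if (A.SizeInBytes != 0)
      Result.push_back(A);
  return Result;
}
```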
* Move ScopArrayInfo::getFromAccessFunction and getFromId to isl++ | Tobias Grosser | 2017-07-24 | 1 | -6/+9
| llvm-svn: 308892
* Convert GPUNodeBuilder::getArraySize to islcpp. | Siddharth Bhat | 2017-07-24 | 1 | -8/+11
| Note: PPCGCodeGeneration::pollyBuildAstExprForStmt is at https://reviews.llvm.org/D35770
| Differential Revision: https://reviews.llvm.org/D35771
| llvm-svn: 308870
* [NFC] Move PPCGCodeGeneration::pollyBuildAstExprForStmt to isl++. | Siddharth Bhat | 2017-07-24 | 1 | -19/+21
| Differential Revision: https://reviews.llvm.org/D35771
| llvm-svn: 308869
* Move MemoryAccess::getAddressFunction to isl++ | Tobias Grosser | 2017-07-23 | 1 | -1/+1
| llvm-svn: 308841
* Move MemoryAccess::NewAccessRelation to isl++ | Tobias Grosser | 2017-07-23 | 1 | -3/+3
| We also move related accessor functions
| llvm-svn: 308840
* Move MemoryAccess::id to isl++ | Tobias Grosser | 2017-07-23 | 1 | -4/+5
| llvm-svn: 308836
* Move ScopArrayInfo to isl++ | Tobias Grosser | 2017-07-21 | 1 | -16/+18
| This moves the full ScopArrayInfo class to isl++
| llvm-svn: 308801
* [Polly][GPGPU] Added SPIR Code Generation and Corresponding Runtime Support for Intel | Philipp Schaad | 2017-07-21 | 1 | -11/+158
| Summary:
| Added SPIR Code Generation to the PPCG Code Generator. This can be invoked using the polly-gpu-arch flag value 'spir32' or 'spir64' for 32 and 64 bit code respectively.
| In addition to that, runtime support has been added to execute said SPIR code on Intel GPUs, where the system is equipped with Intel's open source driver Beignet (development version). This requires the cmake flag 'USE_INTEL_OCL' to be turned on, and the polly-gpu-runtime flag value to be 'libopencl'.
| The transformation of LLVM IR to SPIR is currently quite a hack, consisting in part of regex string transformations.
| Has been tested (working) with Polybench 3.2 on an Intel i7-5500U (integrated graphics chip).
| Reviewers: bollu, grosser, Meinersbur, singam-sanjay
| Reviewed By: grosser, singam-sanjay
| Subscribers: pollydev, nemanjai, mgorny, Anastasia, kbarton
| Tags: #polly
| Differential Revision: https://reviews.llvm.org/D35185
| llvm-svn: 308751
* [NFC] [PPCGCodeGeneration] Print `verifyModule` failure to debug stream. | Siddharth Bhat | 2017-07-21 | 1 | -0/+2
| If verifyModule fails, it is helpful to know why it failed. Add a log to the debug stream that prints the failure.
| llvm-svn: 308727
* Fix typo in function name Bllock -> Block | Tobias Grosser | 2017-07-21 | 1 | -3/+3
| llvm-svn: 308715
* Support fabs and copysign in Polly-ACC | Tobias Grosser | 2017-07-20 | 1 | -2/+6
| llvm-svn: 308649
* [PPCGCodeGen] [3/3] Update PPCGCodeGen + tests to latest ppcg. | Siddharth Bhat | 2017-07-20 | 1 | -20/+77
| This commit *WILL COMPILE*.
| 1. `PPCG` now uses `isl_multi_pw_aff` instead of an array of `pw_aff`. This needs us to adjust how we index array bounds and how we construct array bounds.
| 2. `PPCG` introduces two new kinds of nodes: `init_device` and `clear_device`. We should investigate what the correct way to handle these is.
| 3. `PPCG` has gotten smarter with its use of live range reordering, so some of the tests have a qualitative improvement.
| 4. `PPCG` changed its output style, so many test cases need to be updated to fit the new style for `polly-acc-dump-code` checks.
| Differential Revision: https://reviews.llvm.org/D35677
| llvm-svn: 308625
* [NFC] [PPCGCodeGeneration] cleanup kills related code. | Siddharth Bhat | 2017-07-18 | 1 | -24/+29
| We extended kills in Polly to handle both `phi` nodes and scalars that are not used within the Scop. Update the comments and choice of variable names to reflect this.
| llvm-svn: 308279
* [PPCGCodeGeneration] Generate invariant loads before trying to generate IR. | Siddharth Bhat | 2017-07-17 | 1 | -0/+1
| - We should call `preloadInvariantLoads` to make sure that code is generated for invariant loads in the kernel.
| Differential Revision: https://reviews.llvm.org/D35410
| llvm-svn: 308187
* [PPCGCodeGeneration] Fix runtime check adjustments since they make assumptions about BB layout. | Siddharth Bhat | 2017-07-14 | 1 | -3/+7
| - There is a conditional branch that is used to switch between the old and new versions of the code.
| - If we detect that the build was unsuccessful, `PPCGCodeGeneration` will change the runtime check to be always set to false.
| - To actually *reach* this runtime check instruction, `PPCGCodeGeneration` was using assumptions about the layout of the BBs.
| - However, invariant load hoisting violates this assumption by inserting an extra basic block in the middle.
| - Fix the assumption on the layout by having `createScopConditionally` return the conditional branch instruction.
| - Use this reference to set to always-false.
| llvm-svn: 308010
* [Invariant Loads] Do not consider invariant loads to have dependences. | Siddharth Bhat | 2017-07-13 | 1 | -1/+12
| We need to relax constraints on invariant loads so that they do not create fake RAW dependences. So, we do not consider invariant loads as scalar dependences in a region.
| During these changes, it turned out that we do not consider `llvm::Value` replacements correctly within `PPCGCodeGeneration` and `ISLNodeBuilder`. The replacements dictated by `ValueMap` were not being followed in all places. This was fixed in this commit. There is no clean way to decouple this change because this bug only seems to arise when the relaxed version of invariant load hoisting is enabled.
| Differential Revision: https://reviews.llvm.org/D35120
| llvm-svn: 307907
* [PPCGCodeGen] Differentiate kernels based on their parent Scop | Singapuram Sanjay Srivallabh | 2017-07-12 | 1 | -2/+2
| Summary:
| Add a sequence number that identifies a ptx_kernel's parent Scop within a function to its name to differentiate it from other kernels produced from the same function, yet different Scops.
| Kernels produced from different Scops can end up having the same name. Consider a function with 2 Scops and each Scop being able to produce just one kernel. Both of these kernels have the name "kernel_0". This can lead to the wrong kernel being launched when the runtime picks a kernel from its cache based on the name alone. This patch supplements D33985, by differentiating kernels across Scops as well.
| Previously (even before D33985) while profiling kernels generated through JIT e.g. Julia, [[ https://groups.google.com/d/msg/polly-dev/J1j587H3-Qw/mR-jfL16BgAJ | kernels associated with different functions, and even different SCoPs within a function, would be grouped together due to the common name ]]. This patch prevents this grouping and the kernels are reported separately.
| Reviewers: grosser, bollu
| Reviewed By: grosser
| Subscribers: mehdi_amini, nemanjai, pollydev, kbarton
| Tags: #polly
| Differential Revision: https://reviews.llvm.org/D35176
| llvm-svn: 307814
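Illustrative sketch (not taken from the commit): a kernel name that combines the host function name (as in D33985, `FUNC_gemm_KERNEL_0`) with a per-function Scop sequence number. The helper name `makeKernelName` and the exact `_SCOP_<n>` infix are assumptions made for illustration; the commit message only states that a sequence number identifying the parent Scop is added to the name.

```cpp
#include <string>

// Hypothetical helper, not Polly's API. Builds a kernel name from the host
// function name, the sequence number of the parent Scop within that
// function, and the index of the kernel within the Scop.
std::string makeKernelName(const std::string &HostFunction, unsigned ScopSeq,
                           unsigned KernelIdx) {
  return "FUNC_" + HostFunction + "_SCOP_" + std::to_string(ScopSeq) +
         "_KERNEL_" + std::to_string(KernelIdx);
}
```

With such a scheme, the first kernel of two different Scops in the same function no longer shares a name, so a runtime cache keyed by name alone cannot return the wrong kernel.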
* [Polly] [PPCGCodeGeneration] Teach `must_kills` to kill scalars that are local to the scop. | Siddharth Bhat | 2017-07-06 | 1 | -5/+25
| - By definition, we can pass something as a `kill` to PPCG if we know that no data can flow across a kill.
| - This is useful for more complex examples where we have scalars that are local to a scop.
| - If the local is only used within a scop, we are free to kill it.
| Differential Revision: https://reviews.llvm.org/D35045
| llvm-svn: 307260
* Prefix the name of the calling host function in the name of callee GPU kernel | Singapuram Sanjay Srivallabh | 2017-07-05 | 1 | -3/+11
| Summary:
| Provide more context to the name of a GPU kernel by prefixing its name with the host function that calls it. E.g. The first kernel called by `gemm` would be `FUNC_gemm_KERNEL_0`.
| Kernels currently follow the "kernel_#" (# = 0,1,2,3,...) nomenclature. This patch makes it easier to map host caller and device callee, especially when there are many kernels produced by Polly-ACC.
| Reviewers: grosser, Meinersbur, bollu, philip.pfaffe, kbarton!
| Reviewed By: grosser
| Subscribers: nemanjai, pollydev
| Tags: #polly
| Differential Revision: https://reviews.llvm.org/D33985
| llvm-svn: 307173
* [PPCGCodeGeneration] Teach Polly to start using live range reordering. | Siddharth Bhat | 2017-07-05 | 1 | -6/+124
| Polly did not use PPCG's live range reordering feature. Teach PPCGCodeGeneration to use this.
| Documentation on this is sparse, so much of the code is conservative. We currently kill all phi nodes in a Scop by appending them to the must_kill map we pass to PPCG. I do not have a proof of correctness, but it seems to be intuitively correct.
| We also do not handle `array_order`, which, quoting PPCG, is:
| PPCG/gpu.h: "Order dependences on non-scalars."
| It seems to consist of RAW dependences between arrays. We need to pass this information for more complex privatization cases.
| Differential Revision: https://reviews.llvm.org/D34941
| llvm-svn: 307163
* Introduce a hybrid target to generate code for either the GPU or CPU | Singapuram Sanjay Srivallabh | 2017-06-30 | 1 | -1/+3
| Summary:
| Introduce a "hybrid" `-polly-target` option to optimise code for either the GPU or CPU.
| When this target is selected, PPCGCodeGeneration will attempt first to optimise a Scop. If the Scop isn't modified, it is then sent to the passes that form the CPU pipeline, i.e. IslScheduleOptimizerPass, IslAstInfoWrapperPass and CodeGeneration.
| In case the Scop is modified, it is marked to be skipped by the subsequent CPU optimisation passes.
| Reviewers: grosser, Meinersbur, bollu
| Reviewed By: grosser
| Subscribers: kbarton, nemanjai, pollydev
| Tags: #polly
| Differential Revision: https://reviews.llvm.org/D34054
| llvm-svn: 306863
* [PPCGCodeGeneration] Add flag to allow polly to fail if GPU kernel fails. | Siddharth Bhat | 2017-06-26 | 1 | -0/+15
| - This is useful for debugging GPU code.
| llvm-svn: 306290
* [PPCGCodeGeneration] Allow intrinsics within kernels. | Siddharth Bhat | 2017-06-26 | 1 | -20/+125
| - In D33414, if any function call was found within a kernel, we would bail out.
| - This is an over-approximation. This patch changes this by allowing the `llvm.sqrt.*` family of intrinsics.
| - This introduces an additional step when creating a separate llvm::Module for a kernel (GPUModule). We now copy function declarations from the original module to the new module.
| - We also populate IslNodeBuilder::ValueMap so it replaces the function references to the old module with the ones in the new module (GPUModule).
| Differential Revision: https://reviews.llvm.org/D34145
| llvm-svn: 306284
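Illustrative sketch (not taken from the commit): a name-based whitelist that accepts only the `llvm.sqrt.*` family and rejects every other callee. `isWhitelistedIntrinsic` and `canOffload` are hypothetical names, and callees are reduced to plain strings; the real check inspects LLVM function objects rather than names in a vector.

```cpp
#include <string>
#include <vector>

// Hypothetical check: returns true only for callees whose name starts with
// one of the whitelisted intrinsic prefixes.
bool isWhitelistedIntrinsic(const std::string &Name) {
  static const std::vector<std::string> AllowedPrefixes = {"llvm.sqrt."};
  for (const std::string &Prefix : AllowedPrefixes)
    if (Name.compare(0, Prefix.size(), Prefix) == 0)
      return true;
  return false;
}

// A Scop is considered offloadable in this model only if every function it
// references is whitelisted; a single unknown callee forces a bail-out.
bool canOffload(const std::vector<std::string> &CalleeNames) {
  for (const std::string &Name : CalleeNames)
    if (!isWhitelistedIntrinsic(Name))
      return false;
  return true;
}
```

Keeping the prefixes in one list means additional intrinsics can be whitelisted later by appending an entry rather than changing the check itself.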
* [NFC] Return both polly.start and polly.exiting from executeScopConditionally. | Andreas Simbuerger | 2017-06-26 | 1 | -1/+2
| This commit returns both the start and the exit block that are created by executeScopConditionally. In a future commit we will make use of the exit block. Before, we would have to use the implicit property that there won't be any code generated between polly.start and polly.exiting at the time of use to find the correct block ('polly.exiting').
| All usage locations are semantically unchanged.
| llvm-svn: 306283
* [PPCGCodeGeneration] Enable GPU code generation with invariant loads. | Siddharth Bhat | 2017-06-25 | 1 | -4/+0
| The condition that disallowed code generation in PPCGCodeGeneration with invariant loads is not required. I haven't been able to construct a counterexample where this generates invalid code.
| Differential Revision: https://reviews.llvm.org/D34604
| llvm-svn: 306245
* [Polly] [PPCGCodeGeneration] Skip Scops which contain function pointers. | Siddharth Bhat | 2017-06-12 | 1 | -0/+38
| In `PPCGCodeGeneration`, we try to take the references of every `Value` that is used within a Scop to offload to the kernel. This occurs in `GPUNodeBuilder::createLaunchParameters`.
| This breaks if one of the values is a function pointer, since one of these cases will trigger:
| 1. We try to take the reference of an intrinsic function, and this breaks at `verifyModule`, since it is illegal to take the reference of an intrinsic.
| 2. We manage to take the reference to a function, but this fails at `verifyModule` since the function will not be present in the module that is created in the kernel.
| 3. Even if `verifyModule` succeeds (which should not occur), we would then try to call a *host function* from the *device*, which is illegal runtime behaviour.
| So, we disable this entire range of possibilities by simply not allowing function references within a `Scop` which corresponds to a kernel.
| However, note that this is too conservative. We *can* allow intrinsics within kernels if the backend can lower the intrinsic correctly. For example, an intrinsic like `llvm.powi.*` can actually be lowered by the `NVPTX` backend.
| We will now gradually whitelist intrinsics which are known to be safe.
| Differential Revision: https://reviews.llvm.org/D33414
| llvm-svn: 305185
* Fix a lot of typos. NFC. | Michael Kruse | 2017-06-08 | 1 | -5/+5
| llvm-svn: 304974
* [Polly][NewPM] Port IslAst to the new ScopPassManager | Philip Pfaffe | 2017-05-23 | 1 | -1/+1
| Summary:
| This patch ports IslAst to the new PM. The change is mostly straightforward. The only major modification required is making IslAst move-only, to correctly manage the isl resources it owns.
| Reviewers: grosser, Meinersbur
| Reviewed By: grosser
| Subscribers: nemanjai, pollydev, llvm-commits
| Tags: #polly
| Differential Revision: https://reviews.llvm.org/D33422
| llvm-svn: 303622
* [Fortran Support] Materialize outermost dimension for Fortran array. | Siddharth Bhat | 2017-05-19 | 1 | -1/+9
| - We use the outermost dimension of arrays since we need this information to generate GPU transfers.
| - In general, if we do not know the outermost dimension of the array (because the indexing expression is non-affine, for example) then we simply cannot generate transfer code.
| - However, for Fortran arrays, we can use the Fortran array representation which stores the dimensions of all arrays.
| - This patch uses the Fortran array representation to generate code that computes the outermost dimension size.
| Differential Revision: https://reviews.llvm.org/D32967
| llvm-svn: 303429
* [Polly][NewPM] Port ScopDetection to the new PassManager | Philip Pfaffe | 2017-05-12 | 1 | -3/+3
| Summary:
| This is a proof of concept of how to port polly-passes to the new PassManager architecture. This approach works out of the box for Function-Passes, but might not be directly applicable to Scop/Region-Passes. While we could just run the Analyses/Transforms over functions instead, we'd surrender the nice pipelining behaviour we have now.
| Reviewers: Meinersbur, grosser
| Reviewed By: grosser
| Subscribers: pollydev, sanjoy, nemanjai, llvm-commits
| Tags: #polly
| Differential Revision: https://reviews.llvm.org/D31459
| llvm-svn: 302902