path: root/polly/lib/CodeGen
Commit message | Author | Age | Files | Lines
...
* [PPCG] Only add Kernel argument sizes for OpenCL, not CUDA runtime | Philipp Schaad | 2017-08-19 | 1 | -14/+28
  Kernel argument sizes are now appended to the kernel launch parameter list only if the OpenCL runtime is selected, not if the CUDA runtime is chosen.
  Differential revision: D36925
  llvm-svn: 311248
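A minimal sketch of that rule, assuming launch parameters are passed as a flat array of pointers with the OpenCL argument sizes appended after all argument pointers. `Runtime`, `KernelArg` and `buildLaunchParams` are hypothetical names for illustration, not Polly's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the runtimes the GPU code generator can target.
enum class Runtime { CUDA, OpenCL };

// Hypothetical kernel argument: address of the host value and its size.
struct KernelArg {
  void *Ptr;
  std::size_t Size;
};

// Builds a flat launch parameter list. Both runtimes receive one pointer per
// argument; only OpenCL additionally receives pointers to the argument sizes,
// appended after all argument pointers. `SizeStorage` owns the size values
// so that the returned pointers stay valid.
std::vector<void *> buildLaunchParams(const std::vector<KernelArg> &Args,
                                      Runtime RT,
                                      std::vector<std::size_t> &SizeStorage) {
  std::vector<void *> Params;
  for (const KernelArg &A : Args)
    Params.push_back(A.Ptr);

  if (RT == Runtime::OpenCL) {
    SizeStorage.clear();
    for (const KernelArg &A : Args)
      SizeStorage.push_back(A.Size);
    // Take addresses only after SizeStorage has stopped growing.
    for (std::size_t &S : SizeStorage)
      Params.push_back(&S);
  }

  assert(Params.size() == (RT == Runtime::OpenCL ? 2 : 1) * Args.size());
  return Params;
}
```

With two arguments, the CUDA list has two entries and the OpenCL list has four, the last two pointing at the sizes.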
* Clarify the intent of the run-time check | Tobias Grosser | 2017-08-19 | 1 | -2/+6
  llvm-svn: 311243
* [GPGPU] Collect parameter dimensions used in MemoryAccesses | Tobias Grosser | 2017-08-19 | 2 | -6/+24
  When using -polly-ignore-integer-wrapping and -polly-acc-codegen-managed-memory, we add parameter dimensions lazily to the domains. As a result, PPCG does not include in the kernel space those parameter dimensions that are only used in memory accesses. To make sure these parameters are still passed to the kernel, we collect these parameter dimensions and align the kernel's parameter space before code-generating it.
  llvm-svn: 311239
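The collection step can be pictured with plain strings standing in for isl parameter identifiers. This is a hypothetical sketch: `ParamSpace`, `MemoryAccess` and `alignKernelParams` are illustrative names, and on real isl objects the final step corresponds roughly to aligning parameters (for example with `isl_set_align_params`).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an isl parameter space: ordered parameter names.
using ParamSpace = std::vector<std::string>;

// Hypothetical stand-in for a MemoryAccess: the parameters its access
// relation refers to.
struct MemoryAccess {
  ParamSpace UsedParams;
};

// Extends the kernel's parameter space with every parameter that appears only
// in memory accesses. Existing parameters keep their positions; new ones are
// appended in the order they are first seen.
ParamSpace alignKernelParams(ParamSpace Kernel,
                             const std::vector<MemoryAccess> &Accesses) {
  for (const MemoryAccess &MA : Accesses)
    for (const std::string &P : MA.UsedParams)
      if (std::find(Kernel.begin(), Kernel.end(), P) == Kernel.end())
        Kernel.push_back(P);
  return Kernel;
}
```

A kernel space `[n]` with accesses using `n, m` and `stride` becomes `[n, m, stride]`, so `m` and `stride` are passed to the kernel even though no domain mentions them.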
* [GPGPU] Simplify PPCGSCop to reduce compile time [NFC] | Tobias Grosser | 2017-08-18 | 1 | -3/+79
  Summary:
  Drop unused parameter dimensions to reduce the size of the sets we are working with. In particular, the computed dependences tend to accumulate many parameters that are present in the input memory accesses but are often not necessary to express the actual dependences. As isl represents maps and sets with dense matrices, reducing the dimensionality of isl sets commonly improves code generation performance. This reduces compile time from 17 to 11 seconds for our test case. While this is not impressive, this patch helped me identify the previous two performance improvements and also increases the readability of the isl data structures we use.
  Reviewers: Meinersbur, bollu, singam-sanjay
  Reviewed By: bollu
  Subscribers: nemanjai, pollydev, llvm-commits, kbarton
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D36869
  llvm-svn: 311161
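To make the dense-matrix argument concrete, here is a small hypothetical model (not isl's actual data layout or API): constraints are stored as rows of coefficients, one column per parameter plus a constant. A parameter whose column is zero everywhere contributes nothing but still widens every row, so dropping it shrinks the matrix without changing the constraints.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical dense encoding of affine constraints over parameters, in the
// spirit of isl's dense matrices: Rows[i][j] is the coefficient of Params[j]
// in constraint i, and the last entry of each row is the constant term.
struct ParamSet {
  std::vector<std::string> Params;
  std::vector<std::vector<long>> Rows;
};

// Drops every parameter whose column is zero in all constraints. Such a
// parameter does not constrain the set, so the result expresses the same
// constraints with fewer columns per row.
ParamSet dropUnusedParams(const ParamSet &S) {
  std::vector<std::size_t> Keep;
  for (std::size_t J = 0; J < S.Params.size(); ++J) {
    bool Used = false;
    for (const std::vector<long> &Row : S.Rows) {
      assert(Row.size() == S.Params.size() + 1 && "one constant per row");
      Used = Used || Row[J] != 0;
    }
    if (Used)
      Keep.push_back(J);
  }

  ParamSet Result;
  for (std::size_t J : Keep)
    Result.Params.push_back(S.Params[J]);
  for (const std::vector<long> &Row : S.Rows) {
    std::vector<long> NewRow;
    for (std::size_t J : Keep)
      NewRow.push_back(Row[J]);
    NewRow.push_back(Row.back()); // constant term
    Result.Rows.push_back(NewRow);
  }
  return Result;
}
```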
* [Polly] [PPCGCodeGeneration] Print current Scop and loop depth in PPCGCodeGen. [NFC] | Siddharth Bhat | 2017-08-18 | 1 | -0/+3
  Differential Revision: https://reviews.llvm.org/D36871
  llvm-svn: 311158
* [GPGPU] Do not create copy statements when targeting managed memory | Tobias Grosser | 2017-08-18 | 1 | -1/+2
  Summary:
  Copy statements are not used and consequently do not even need to be computed. This reduces the overall compile time for our kernel from 1m33s to 17s.
  Reviewers: Meinersbur, bollu, singam-sanjay
  Reviewed By: bollu
  Subscribers: nemanjai, pollydev, llvm-commits, kbarton
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D36868
  llvm-svn: 311157
* [GPGPU] Synchronize after each kernel, not each copy out | Tobias Grosser | 2017-08-18 | 1 | -1/+2
  Summary:
  This change reduces the overall number of synchronize calls for kernels with a lot of output data, at the cost of additional synchronize calls for kernels launched in sequence without any device-to-host transfers in between. As the latter pattern is a lot less frequent, this seems a better tradeoff. Although this motivation alone would justify the change, it is also a step towards enabling PPCG to not compute to-device and from-device copy calls at all, which would be incorrect if we still relied on these calls to place our synchronization statements.
  Reviewers: Meinersbur, bollu, singam-sanjay
  Reviewed By: bollu
  Subscribers: nemanjai, kbarton, pollydev, llvm-commits
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D36867
  llvm-svn: 311155
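The tradeoff can be checked by counting synchronize calls under both placements. This is a hypothetical model (`Op` and both counting functions are illustrative), not the code generator itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side actions in a sequence of GPU operations.
enum class Op { LaunchKernel, CopyToHost };

// Previous placement: one synchronize call after every device-to-host copy.
std::size_t syncsAfterEachCopyOut(const std::vector<Op> &Ops) {
  std::size_t Syncs = 0;
  for (Op O : Ops)
    if (O == Op::CopyToHost)
      ++Syncs;
  return Syncs;
}

// New placement: one synchronize call after every kernel launch, no matter
// how many copies follow it.
std::size_t syncsAfterEachKernel(const std::vector<Op> &Ops) {
  std::size_t Syncs = 0;
  for (Op O : Ops)
    if (O == Op::LaunchKernel)
      ++Syncs;
  return Syncs;
}
```

A kernel followed by three copy-outs needs three synchronize calls under the old placement and one under the new one. Three chained kernels followed by a single copy-out show the reverse, which is the less frequent pattern.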
* [GPGPU] Only collect the accesses that belong to an array [NFC] | Tobias Grosser | 2017-08-17 | 1 | -6/+5
  This avoids the construction of very large sets and in many cases also keeps the number of parameters low. As a result, we see a compile-time reduction from 5 minutes to only slightly above 1 minute for one of our larger test cases.
  llvm-svn: 311127
* [GPGPU] Move getExtend to C++ [NFC] | Tobias Grosser | 2017-08-17 | 1 | -54/+35
  llvm-svn: 311123
* [ManagedMemoryRewrite] Rewrite malloc, free correctly inside `Constant`s. | Siddharth Bhat | 2017-08-17 | 1 | -2/+33
  Reuse the machinery built for replacing global arrays to replace malloc/free as well.
  Example replacement that was missed earlier:
  ```
  call void \
    bitcast (void (i8*)* @free to void (%custom_type*)*) (%custom_type* %13)
  ```
  - Since the `bitcast` is a `ConstantExpr`, `replaceAllUsesWith` would miss this. We don't miss this anymore.
  Differential Revision: https://reviews.llvm.org/D36825
  llvm-svn: 311121
* [ManagedMemoryRewrite] Learn how to rewrite global arrays, allocas. | Siddharth Bhat | 2017-08-17 | 1 | -16/+263
  - If we have global arrays, we would like to rewrite them to global pointers which are allocated using `cudaMallocManaged`.
  - If we have allocas in a function, we would like to rewrite them to heap allocations with `cudaMallocManaged` and `cudaFree`.
  - With these rewrite mechanisms, we can offload _any_ function to the GPU with no code rewrite whatsoever.
  Differential Revision: https://reviews.llvm.org/D36516
  llvm-svn: 311080
* [GPGPU] Also record invariant loads as kernel subtree values | Tobias Grosser | 2017-08-16 | 1 | -3/+9
  Before this change, kernels that used invariant loads would have resulted in invalid PTX code.
  llvm-svn: 311042
* [GPGPU] Make the ast_build available to block generator | Tobias Grosser | 2017-08-10 | 1 | -0/+2
  This is necessary for partial writes (as used by delicm) to work.
  llvm-svn: 310553
* [Polly][PM] Improve invalidation in the Scop-Pipeline | Philip Pfaffe | 2017-08-10 | 1 | -1/+3
  Summary:
  During code generation for a Scop we modify the IR of a function. While this shouldn't affect a Scop in the formal sense, the implementation caches various information about the IR, such as SCEV expressions for bounds or parameters. This cached information needs to be updated or invalidated. To this end, SPMUpdater allows passes to report to the PassManager when they have invalidated a Scop; the PassManager then flushes and recomputes all Scops. This in turn invalidates all iterators, so references to Scops should not be held.
  Reviewers: grosser, Meinersbur, bollu
  Reviewed By: grosser
  Subscribers: llvm-commits, pollydev
  Differential Revision: https://reviews.llvm.org/D36524
  llvm-svn: 310551
* [ManagedMemoryRewrite] [Polly] Erase original malloc and free. [NFC] | Siddharth Bhat | 2017-08-09 | 1 | -0/+2
  We do not need to keep `malloc` and `free` around since they are replaced by `polly_{malloc,free}Managed`.
  llvm-svn: 310504
* [ManagedMemoryRewrite] Introduce a new pass to rewrite modules to use managed memory. | Siddharth Bhat | 2017-08-09 | 2 | -30/+181
  This pass is useful to automatically convert a codebase that uses malloc/free to use their managed memory counterparts. Currently, it rewrites malloc and free to the `polly_{malloc,free}Managed` variants. A future patch will teach ManagedMemoryRewrite to rewrite global arrays as pointers to globally allocated managed memory.
  Differential Revision: https://reviews.llvm.org/D36513
  llvm-svn: 310471
* [CodeGen] Use isLatestArrayKind(). | Michael Kruse | 2017-08-09 | 1 | -1/+1
  Codegen with -polly-parallel queried the unmapped MemoryAccess, but only the MemoryKind after mapping is relevant for codegen. This should fix various failures of the perf-x86_64-penryn-O3-polly-parallel-fast buildbot.
  llvm-svn: 310466
* [PPCGCodeGeneration] Compute element size in bytes for arrays correctly. | Siddharth Bhat | 2017-08-09 | 1 | -1/+14
  Previously, we computed this as `elementSizeInBits / 8`, which yields an element size of 0 when the array's element size is less than 8 bits. To fix this, we ask the data layout what the size in bytes should be.
  Differential Revision: https://reviews.llvm.org/D36459
  llvm-svn: 310448
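The arithmetic behind the bug, as a small sketch: truncating division maps any sub-byte element (such as an `i1`) to 0 bytes, while rounding up never does. Rounding up is shown only as an illustration. The commit itself queries LLVM's data layout, which may additionally account for alignment padding, so both functions below are hypothetical and not the patched code.

```cpp
#include <cassert>
#include <cstdint>

// The old computation: integer division truncates, so any element narrower
// than 8 bits is reported as occupying 0 bytes.
std::uint64_t truncatedSizeInBytes(std::uint64_t SizeInBits) {
  return SizeInBits / 8;
}

// Rounding up to whole bytes yields at least 1 byte for any non-zero width.
// Illustration only: a data layout query may report more because of padding.
std::uint64_t roundedUpSizeInBytes(std::uint64_t SizeInBits) {
  return (SizeInBits + 7) / 8;
}
```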
* Use SCEV information for the second level aliasing | Roman Gareev | 2017-08-08 | 1 | -8/+10
  We introduce another level of alias metadata to distinguish the individual non-aliasing accesses that have inter-iteration alias-free base pointers marked with "Inter iteration alias-free" mark nodes. To distinguish two accesses, the raw pointers representing their base pointers are compared. In the case of, for example, ublas's prod function that implements GEMM, and DeLiCM, we can get accesses to the same location represented by different raw pointers. Consequently, we create different alias sets, which can prevent accesses from, for example, being sunk or hoisted. To avoid the issue, we compare the corresponding SCEV information instead of the corresponding raw pointers.
  Reviewed-by: Tobias Grosser <tobias@grosser.es>
  Differential Revision: https://reviews.llvm.org/D35761
  llvm-svn: 310380
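A minimal sketch of why the comparison key matters, assuming strings can stand in for SCEV expressions (`Access` and `assignAliasSets` are hypothetical). Two raw pointers that denote the same base land in separate alias sets when keyed by pointer, but share one set when keyed by the symbolic expression.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical access: the raw pointer used in the IR and a canonical
// symbolic description of its base address (a stand-in for a SCEV).
struct Access {
  std::string RawPointer;
  std::string SymbolicBase;
};

// Assigns an alias-set id to each access. Keyed by raw pointer, two accesses
// share a set only if they use the same pointer value; keyed by the symbolic
// base, accesses with equal base expressions share a set even if their raw
// pointers differ.
std::vector<std::size_t> assignAliasSets(const std::vector<Access> &Accesses,
                                         bool UseSymbolic) {
  std::map<std::string, std::size_t> SetOfKey;
  std::vector<std::size_t> Result;
  for (const Access &A : Accesses) {
    const std::string &Key = UseSymbolic ? A.SymbolicBase : A.RawPointer;
    // A new key receives the next unused id; an existing key keeps its id.
    auto It = SetOfKey.emplace(Key, SetOfKey.size()).first;
    Result.push_back(It->second);
  }
  return Result;
}
```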
* [Polly] [PPCGCodeGeneration] Handle failing of invariant load hoisting gracefully. | Siddharth Bhat | 2017-08-08 | 2 | -20/+40
  To do this, we replicate what `CodeGeneration` does. We expose `markNodeUnreachable` from `CodeGeneration` to `PPCGCodeGeneration`.
  Differential Revision: https://reviews.llvm.org/D36457
  llvm-svn: 310350
* [GPGPU] Remove redundant constructors | Tobias Grosser | 2017-08-07 | 1 | -4/+4
  llvm-svn: 310284
* [ScopInfo] Move Scop::getPwAffOnly to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 3 | -5/+5
  llvm-svn: 310231
* [ScopInfo] Move Scop::getDomains to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 1 | -2/+3
  llvm-svn: 310230
* [ScopInfo] Move Scop::getInvalidContext to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 1 | -1/+2
  llvm-svn: 310229
* [ScopInfo] Move Scop::getAssumedContext to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 1 | -1/+2
  llvm-svn: 310228
* [ScopInfo] Translate Scop::getParamSpace to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 3 | -14/+15
  llvm-svn: 310224
* [ScopInfo] Translate Scop::getContext to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 5 | -15/+14
  llvm-svn: 310221
* [ScopInfo] Translate Scop::getIdForParam to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 2 | -3/+3
  llvm-svn: 310220
* [ScopInfo] Move get*Writes/getReads/getAccesses to isl++ | Tobias Grosser | 2017-08-06 | 1 | -4/+4
  llvm-svn: 310219
* [ScopInfo] Move ScopStmt::setAstBuild/getAstBuild to isl++ | Tobias Grosser | 2017-08-06 | 2 | -2/+2
  llvm-svn: 310216
* Move ScopInfo::getDomain(), getDomainSpace(), getDomainId() to isl++ | Tobias Grosser | 2017-08-06 | 3 | -10/+11
  llvm-svn: 310209
* [GPGPU] Make sure managed arrays are prepared at the beginning of the scop | Tobias Grosser | 2017-08-06 | 1 | -32/+41
  Summary:
  This resolves some "instruction does not dominate use" errors, as we used to prepare the arrays at the location of the first kernel, which did not necessarily dominate all other kernel calls.
  Reviewers: Meinersbur, bollu, singam-sanjay
  Subscribers: nemanjai, pollydev, llvm-commits, kbarton
  Differential Revision: https://reviews.llvm.org/D36372
  llvm-svn: 310196
* [GPGPU] Rename all, not only the first libdevice function | Tobias Grosser | 2017-08-06 | 1 | -2/+3
  llvm-svn: 310194
* [Polly] [PPCGCodeGeneration] Deal with loops outside the Scop correctly in PPCGCodeGeneration. | Siddharth Bhat | 2017-08-06 | 2 | -11/+30
  A Scop with a loop outside it is currently not handled by PPCGCodeGeneration. In the test case, only the inner loop is detected as part of the Scop, which currently breaks codegen. The fix is to reuse the existing mechanism in `IslNodeBuilder` within `GPUNodeBuilder`.
  Differential Revision: https://reviews.llvm.org/D36290
  llvm-svn: 310193
* [IslNodeBuilder] [NFC] Refactor creation of loop induction variables of loops outside scops. | Siddharth Bhat | 2017-08-06 | 1 | -11/+23
  This logic is duplicated, so we refactor it into a separate function. This will be used in a later patch to teach PPCGCodeGen code generation for loops that are outside the scop.
  Differential Revision: https://reviews.llvm.org/D36310
  llvm-svn: 310192
* [PPCGCodeGeneration] [NFC] Log every location from which PPCGCodegen bails. | Siddharth Bhat | 2017-08-04 | 1 | -5/+23
  This is useful when trying to understand why no GPU code was produced.
  Differential Revision: https://reviews.llvm.org/D36318
  llvm-svn: 310103
* Make sure that all parameter dimensions are set in schedule | Tobias Grosser | 2017-08-03 | 1 | -0/+3
  Summary:
  If the option -polly-ignore-parameter-bounds is set, not all parameters will be added to the context and domains. This is useful to keep the sets and maps we work with small. Unfortunately, for AST generation it is necessary to ensure that all parameters are part of the schedule tree. Hence, we modify the GPGPU code generation to make sure this is the case. To obtain the necessary information, we expose a new function Scop::getFullParamSpace(). We also make a couple of functions const to be able to make Scop::getFullParamSpace() const.
  Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg
  Subscribers: nemanjai, kbarton, pollydev, llvm-commits
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D36243
  llvm-svn: 309939
* [PPCGCodeGeneration] Construct `isl_multi_pw_aff` of PPCGArray.bounds even when polly-ignore-parameter-bounds is turned on. | Siddharth Bhat | 2017-08-03 | 1 | -16/+75
  When we have `-polly-ignore-parameter-bounds`, `Scop::Context` does not contain all the parameters present in the program. The construction of the `isl_multi_pw_aff` requires all the individual `pw_aff`s to have the same parameter dimensions. To achieve this, we used to realign every `pw_aff` with `Scop::Context`. However, in conjunction with `-polly-ignore-parameter-bounds`, this is now incorrect, since `Scop::Context` does not contain all parameters. We now set this up correctly by creating a space that has all the parameters used by all the `isl_pw_aff`s and then realigning every `isl_pw_aff` to this space.
  llvm-svn: 309934
* Fix code format on r309826 | Singapuram Sanjay Srivallabh | 2017-08-02 | 1 | -2/+1
  Summary:
  Fix code format on r309826 / D35458
  Reviewers: grosser, bollu
  Reviewed By: grosser
  Subscribers: pollydev
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D36232
  llvm-svn: 309845
* Remove debug metadata from copied instruction to prevent GPUModule verification failure | Singapuram Sanjay Srivallabh | 2017-08-02 | 1 | -0/+10
  Summary:
  **Remove debug metadata from the instruction to be copied, to prevent the source file's debug metadata from being copied into GPUModule and eventually failing Module verification and ASM string code generation.**
  When copying the instruction into the Module meant for the GPU, debug metadata attached to the instruction causes all related metadata to be pulled into the Module, including the DICompileUnit, which is not listed in llvm.dbg.cu of the Module. This makes both the verification of the Module and the generation of the ASM string fail. The only debug metadata of the instruction, the DebugLoc, is unset by this patch.
  This patch reattempts https://reviews.llvm.org/D35630 by targeting only those instructions that are to end up in a Module meant for the GPU.
  Reviewers: grosser, bollu
  Reviewed By: grosser
  Subscribers: pollydev
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D36161
  llvm-svn: 309822
* [PPCGCodeGeneration] Correct usage of llvm::Value with getLatestValue. | Siddharth Bhat | 2017-08-01 | 1 | -0/+2
  It is possible that the `HostPtr` that corresponds to an array has been hoisted by invariant load hoisting. Make sure we use the hoisted value by calling `IslNodeBuilder::getLatestValue`.
  Differential Revision: https://reviews.llvm.org/D36001
  llvm-svn: 309681
* [NFC] [IslNodeBuilder, GPUNodeBuilder] Unify mechanism for looking up replacement Values. | Siddharth Bhat | 2017-08-01 | 1 | -5/+8
  We populate `IslNodeBuilder::ValueMap`, which contains replacements for `llvm::Value`s. There was no simple method to pick up a replacement if one exists and otherwise fall back to the original. Create a method `IslNodeBuilder::getLatestValue` which provides this functionality. This will be used in a later patch to fix bugs in `PPCGCodeGeneration` where the latest value is not being used.
  Differential Revision: https://reviews.llvm.org/D36000
  llvm-svn: 309674
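The lookup-with-fallback behavior can be sketched with a plain hash map. `NodeBuilder` and the string-typed `Value` are hypothetical simplifications of `IslNodeBuilder` and `llvm::Value *`; only the method name `getLatestValue` comes from the commit.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for llvm::Value: values are identified by name.
using Value = std::string;

// Hypothetical builder holding a replacement map, in the spirit of
// IslNodeBuilder::ValueMap.
class NodeBuilder {
public:
  void setReplacement(const Value &Old, const Value &New) {
    ValueMap[Old] = New;
  }

  // Returns the recorded replacement if there is one, otherwise the
  // original value unchanged.
  Value getLatestValue(const Value &Original) const {
    auto It = ValueMap.find(Original);
    return It == ValueMap.end() ? Original : It->second;
  }

private:
  std::unordered_map<Value, Value> ValueMap;
};
```

Callers such as the host-pointer lookup can then ask for the latest value unconditionally instead of checking the map themselves.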
* [NFC] [PPCGCodeGeneration] Convert GPUNodeBuilder::getGridSizes to isl++. | Siddharth Bhat | 2017-08-01 | 1 | -5/+7
  llvm-svn: 309671
* [NFC] [PPCGCodeGeneration] Convert GPUNodeBuilder::getArrayOffset to isl++. | Siddharth Bhat | 2017-08-01 | 1 | -24/+17
  llvm-svn: 309669
* [GPGPU] Add support for NVIDIA libdevice | Tobias Grosser | 2017-07-31 | 1 | -12/+98
  Summary:
  This allows us to map functions such as exp, expf and expl, for which no LLVM intrinsics exist. Instead, we link to NVIDIA's libdevice, which provides high-performance implementations of a wide range of (math) functions. We currently link only a small subset: the exp, cos and copysign functions. Other functions will be enabled as needed.
  Reviewers: bollu, singam-sanjay
  Reviewed By: bollu
  Subscribers: tstellar, tra, nemanjai, pollydev, mgorny, llvm-commits, kbarton
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D35703
  llvm-svn: 309560
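libdevice exposes its functions under an `__nv_` prefix (for example `__nv_expf`). The sketch below maps a few C math names onto that scheme. The supported set mirrors the exp/cos/copysign subset named in the commit but is otherwise an assumption, and `getLibDeviceName` is a hypothetical helper, not Polly's function.

```cpp
#include <cassert>
#include <set>
#include <string>

// Returns the libdevice counterpart of a supported math function, or an
// empty string if the function is not mapped. The list of supported names
// is an assumption of this sketch.
std::string getLibDeviceName(const std::string &Name) {
  static const std::set<std::string> Supported = {
      "exp", "expf", "cos", "cosf", "copysign", "copysignf"};
  if (Supported.count(Name) == 0)
    return "";
  return "__nv_" + Name;
}
```

Keeping an explicit allow-list means unsupported calls are left untouched rather than renamed to functions libdevice may not provide.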
* Revert "Remove Debug metadata from copied instruction to prevent Module verification failure" | Tobias Grosser | 2017-07-31 | 1 | -8/+0
  This reverts commit r309490, as it triggers error messages of the following form on our AOSP buildbot:
  inlinable function call in a function with debug info must have a !dbg location
  llvm-svn: 309556
* [IslNodeBuilder] Remove unused instruction | Tobias Grosser | 2017-07-31 | 1 | -1/+0
  Suggested-by: Maximilian Falkenstein <falkensm@student.ethz.ch>
  llvm-svn: 309533
* Remove Debug metadata from copied instruction to prevent Module verification failure | Singapuram Sanjay Srivallabh | 2017-07-29 | 1 | -0/+8
  Summary:
  **Remove debug metadata from the instruction to be copied, to prevent the source file's debug metadata from being copied into GPUModule and eventually failing Module verification and ASM string code generation.**
  When copying the instruction into the Module meant for the GPU, debug metadata attached to the instruction causes all related metadata to be pulled into the Module, including the DICompileUnit, which is not listed in llvm.dbg.cu of the Module. This makes both the verification of the Module and the generation of the ASM string fail. The only debug metadata of the instruction, the DebugLoc, is unset by this patch.
  Reviewers: grosser, bollu, Meinersbur
  Reviewed By: grosser, bollu
  Subscribers: pollydev
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D35630
  llvm-svn: 309490
* [PPCGCodeGeneration] Check that invariant load hoisting succeeded. | Siddharth Bhat | 2017-07-28 | 1 | -1/+4
  If it fails, throw an error for now. We can handle this gracefully later.
  llvm-svn: 309387
* [GPGPU] Do not require the Scop::Context to have information about all parameters | Tobias Grosser | 2017-07-28 | 1 | -4/+2
  llvm-svn: 309368