path: root/polly/lib/CodeGen/PPCGCodeGeneration.cpp
Commit message | Author | Age | Files | Lines
* Run polly-update-format. NFC. | Michael Kruse | 2017-11-21 | 1 | -2/+2
  polly-check-format has been failing since at least r318517, due to more than one cause.
  llvm-svn: 318795
* Port ScopInfo to the isl cpp bindings | Philip Pfaffe | 2017-11-19 | 1 | -20/+21
  Summary: Most changes are mechanical, but in one place I changed the program semantics by fixing a likely bug: In `Scop::hasFeasibleRuntimeContext()`, I'm now explicitly handling the error case. Before, when the call to `addNonEmptyDomainConstraints()` returned a null set, this (probably) accidentally worked because isl_bool_error converts to true. I'm checking for nullptr now.
  Reviewers: grosser, Meinersbur, bollu
  Reviewed By: Meinersbur
  Subscribers: nemanjai, kbarton, pollydev, llvm-commits
  Differential Revision: https://reviews.llvm.org/D39971
  llvm-svn: 318632
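The pitfall described in this commit message can be illustrated with a minimal sketch. `TriBool` is a hypothetical stand-in whose enumerator values mirror isl's `isl_bool` (error = -1, false = 0, true = 1); the function names are illustrative and are not taken from the Polly sources, and the explicit error check is an assumption standing in for the nullptr check the commit mentions.

```cpp
// Hypothetical stand-in for isl's three-valued boolean. The values mirror
// isl_bool: error = -1, false = 0, true = 1.
enum TriBool { TriError = -1, TriFalse = 0, TriTrue = 1 };

// Buggy pattern: the unscoped enum converts implicitly to bool, so every
// non-zero value, including the error value -1, is treated as true.
bool feasibleImplicit(TriBool IsNonEmpty) {
  return IsNonEmpty;
}

// Fixed pattern (an assumption, not the actual patch): the error case is
// handled explicitly and only TriTrue counts as feasible.
bool feasibleExplicit(TriBool IsNonEmpty) {
  if (IsNonEmpty == TriError)
    return false; // a failed computation is reported as "not feasible"
  return IsNonEmpty == TriTrue;
}
```

With these definitions, `feasibleImplicit(TriError)` yields true, which is the accidental behaviour the message refers to, while `feasibleExplicit(TriError)` yields false.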
* [polly] Remove redundant return [NFC] | Mandeep Singh Grang | 2017-11-10 | 1 | -1/+0
  Reviewers: grosser, bollu
  Reviewed By: grosser
  Subscribers: nemanjai, kbarton, llvm-commits
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D39916
  llvm-svn: 317922
* [Acc] Do not statically dispatch into IslNodeBuilder's createFor | Philip Pfaffe | 2017-10-29 | 1 | -0/+7
  Summary: When GPUNodeBuilder creates loops inside the kernel, it dispatches to IslNodeBuilder. This however is surprisingly dangerous, since it accesses the AST Node's user through the wrong type. This patch fixes this problem by overriding createFor correctly. This fixes PR35010.
  Reviewers: grosser, bollu, Meinersbur
  Reviewed By: Meinersbur
  Subscribers: Meinersbur, nemanjai, pollydev, llvm-commits, kbarton
  Differential Revision: https://reviews.llvm.org/D39364
  llvm-svn: 316872
* [GPGPU] Make sure escaping invariant load hoisted scalars are preserved | Tobias Grosser | 2017-10-04 | 1 | -1/+3
  We make sure that the final reload of an invariant scalar memory access uses the same stack slot into which the invariant memory access was stored originally. Earlier, this was broken as we introduced a new stack slot alongside the preload stack slot, which remained uninitialized and caused our escaping loads to contain garbage. This happened due to us clearing the pre-populated values in EscapeMap after kernel code generation. We address this issue by preserving the original host values and restoring them after kernel code generation. EscapeMap is not expected to be used during kernel code generation, hence we clear it during kernel generation to make sure that any unintended uses are noticed.
  llvm-svn: 314894
* [GPGPU] Set Polly's RTC to false in case invariant load hoisting fails | Tobias Grosser | 2017-10-01 | 1 | -0/+6
  This matches the behavior we already have in lib/Codegen/CodeGeneration.cpp and makes sure that we fall back to the original code. It seems that when invariant load hoisting was introduced to the GPGPU backend we forgot to reset the RTC flag, such that kernels where invariant load hoisting failed executed the 'optimized' SCoP, which however is set to a simple 'unreachable'. Unsurprisingly, this results in hard-to-debug issues that are a lot of fun to debug.
  llvm-svn: 314624
* [PPCGCodeGen] Document pre-composition with Zero in getExtent. [NFC] | Siddharth Bhat | 2017-09-07 | 1 | -0/+26
  It's weird at first glance that we do this, so I wrote up some documentation on why we need to perform this process.
  llvm-svn: 312715
* [PPCGCodeGen] Convert intrinsics to libdevice functions whenever possible. | Siddharth Bhat | 2017-08-31 | 1 | -7/+41
  This is useful when we face certain intrinsics such as `llvm.exp.*` which cannot be lowered by the NVPTX backend while other intrinsics can. Otherwise, we would need to keep blacklists of intrinsics that cannot be handled by the NVPTX backend. It is much simpler to try and promote all intrinsics to libdevice versions. This patch makes function/intrinsic handling very uniform, and will always try to use a libdevice version if it exists.
  Differential Revision: https://reviews.llvm.org/D37056
  llvm-svn: 312239
* [PM] Properly require and preserve OptimizationRemarkEmitter. NFCI. | Michael Kruse | 2017-08-28 | 1 | -11/+2
  Properly require and preserve the OptimizationRemarkEmitter for use in ScopPass. Previously one had to get the ORE from ScopDetection because CodeGeneration did not mark it as preserved. It would need to be recomputed, which causes the legacy PM to throw away all previous SCoP analysis. This also changes the implementation of ScopPass::getAnalysisUsage to not unconditionally preserve all passes, but only those needed to be preserved by any SCoP pass (at least when using the legacy PM). This allows invalidating DependenceInfo (and IslAstInfo) in case the pass would cause them to change (e.g. OpTree, DeLICM, MaximalArrayExpansion). JSONImporter should also invalidate the DependenceInfo. In this patch it marks DependenceInfo as preserved anyway because some regression tests depend on it.
  Differential Revision: https://reviews.llvm.org/D37010
  llvm-svn: 311888
* [Polly] [PPCGCodeGeneration] Mild refactoring of checking validity of functions in a kernel. | Siddharth Bhat | 2017-08-24 | 1 | -9/+10
  This is a stylistic change to make the function a little more readable. Also add a debug print to show what instruction contains a use of a function we don't understand in the kernel.
  Differential Revision: https://reviews.llvm.org/D37058
  llvm-svn: 311648
* [PPCGCodeGen] Fix compiler warning: '<': signed/unsigned mismatch. NFC. | Michael Kruse | 2017-08-23 | 1 | -6/+6
  MSVC warns about comparison between a signed and unsigned integer. The rules of C(++) define that an unsigned comparison has to be carried out in this case. This is unlikely to be intended. Fix by assigning the loop's upper bound to a signed integer first. This also avoids repeated evaluation of the invariant upper bound.
  llvm-svn: 311548
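The comparison rule this commit message refers to can be sketched as follows. The function names are hypothetical and do not come from the patch; `lessMixed` spells out the conversion that the usual arithmetic conversions apply to `I < Bound` when `I` is an `int` and `Bound` is an `unsigned`, and `countIterations` assumes the bound fits into an `int`.

```cpp
// What a mixed `int < unsigned` comparison means: the signed operand is
// converted to unsigned first, so a negative value becomes a huge positive one.
bool lessMixed(int I, unsigned Bound) {
  return static_cast<unsigned>(I) < Bound;
}

// Signed comparison: the bound is converted to a wider signed type once.
// Assumes long long is wider than unsigned, as on typical platforms.
bool lessSigned(int I, unsigned Bound) {
  const long long SignedBound = Bound;
  return I < SignedBound;
}

// Loop pattern in the spirit of the fix: the invariant upper bound is stored
// in a signed variable once, so the loop condition compares signed values and
// the bound is not re-evaluated on every iteration.
// Assumes N fits into an int.
int countIterations(unsigned N) {
  int Count = 0;
  for (int I = 0, E = static_cast<int>(N); I < E; ++I)
    ++Count;
  return Count;
}
```

Here `lessMixed(-1, 1u)` is false although -1 is mathematically smaller than 1, whereas `lessSigned(-1, 1u)` is true.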
* [PPCGCodeGeneration] Enable `polly-codegen-perf-monitoring` for PPCGCodegen. | Siddharth Bhat | 2017-08-21 | 1 | -0/+19
  This feature was not enabled for `PPCGCodeGeneration`. Now that this is enabled, we can benchmark Scops that have been optimised with `-polly-codegen-ppcg` with the `-polly-codegen-perf-monitoring` option.
  Differential Revision: https://reviews.llvm.org/D36934
  llvm-svn: 311328
* [GPGPU] Add llvm.powi to the libdevice supported functions | Tobias Grosser | 2017-08-21 | 1 | -1/+1
  These intrinsics are used in COSMO.
  llvm-svn: 311324
* [GPGPU] Add log / logf to the libdevice supported functions | Tobias Grosser | 2017-08-21 | 1 | -2/+2
  These two functions are used in COSMO.
  llvm-svn: 311322
* Revert "[GPGPU] Simplify PPCGSCop to reduce compile time [NFC]" | Tobias Grosser | 2017-08-19 | 1 | -79/+3
  We still see some issues with parameter space mismatches. Revert this to get a clean baseline. We will recommit after these issues have been resolved.
  This reverts commit 0e360a14194f722ded7aa2bc9d4be2ed2efeeb49.
  llvm-svn: 311268
* [GPGPU] Correctly initialize array order and fixed_element information | Tobias Grosser | 2017-08-19 | 1 | -7/+7
  Summary: This information is necessary for PPCG to perform correct live-range reordering. With these changes applied we can live-range reorder some of the important kernels in COSMO. We also update and rename one test case, which previously could not be optimized and now is optimized thanks to live-range reordering. To preserve test coverage we add a new test case scalar-writes-in-scop-requires-abort.ll, which exercises our automatic abort in case of scalar writes in the kernel.
  Reviewers: Meinersbur, bollu, singam-sanjay
  Subscribers: nemanjai, pollydev, llvm-commits, kbarton
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D36929
  llvm-svn: 311259
* [PPCG] Only add Kernel argument sizes for OpenCL, not CUDA runtime | Philipp Schaad | 2017-08-19 | 1 | -14/+28
  Kernel argument sizes now only get appended to the kernel launch parameter list if the OpenCL runtime is selected, not if CUDA runtime is chosen.
  Differential revision: D36925
  llvm-svn: 311248
* [GPGPU] Collect parameter dimension used in MemoryAccesses | Tobias Grosser | 2017-08-19 | 1 | -5/+17
  When using -polly-ignore-integer-wrapping and -polly-acc-codegen-managed-memory we add parameter dimensions lazily to the domains, which results in PPCG not including parameter dimensions that are only used in memory accesses in the kernel space. To make sure these parameters are still passed to the kernel, we collect these parameter dimensions and align the kernel's parameter space before code-generating it.
  llvm-svn: 311239
* [GPGPU] Simplify PPCGSCop to reduce compile time [NFC] | Tobias Grosser | 2017-08-18 | 1 | -3/+79
  Summary: Drop unused parameter dimensions to reduce the size of the sets we are working with. Especially the computed dependences tend to accumulate a lot of parameters that are present in the input memory accesses, but often not necessary to express the actual dependences. As isl represents maps and sets with dense matrices, reducing the dimensionality of isl sets commonly reduces code generation time. This reduces compile time from 17 to 11 seconds for our test case. While this is not impressive, this patch helped me to identify the previous two performance improvements and additionally also increases readability of the isl data structures we use.
  Reviewers: Meinersbur, bollu, singam-sanjay
  Reviewed By: bollu
  Subscribers: nemanjai, pollydev, llvm-commits, kbarton
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D36869
  llvm-svn: 311161
* [Polly] [PPCGCodeGeneration] Print current Scop and loop depth in PPCGCodeGen. [NFC] | Siddharth Bhat | 2017-08-18 | 1 | -0/+3
  Differential Revision: https://reviews.llvm.org/D36871
  llvm-svn: 311158
* [GPGPU] Do not create copy statements when targeting managed memory | Tobias Grosser | 2017-08-18 | 1 | -1/+2
  Summary: They are not used and consequently do not even need to be computed. This reduces the overall compile time for our kernel from 1m33s to 17s.
  Reviewers: Meinersbur, bollu, singam-sanjay
  Reviewed By: bollu
  Subscribers: nemanjai, pollydev, llvm-commits, kbarton
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D36868
  llvm-svn: 311157
* [GPGPU] Synchronize after each kernel, not each copy out | Tobias Grosser | 2017-08-18 | 1 | -1/+2
  Summary: This change reduces the overall number of synchronize calls for kernels with a lot of output data at the cost of additional synchronize calls for kernels launched in sequence without any device to host transfers in between. As the latter pattern is a lot less frequent, this seems a better tradeoff. Even though the above motivation would be motivation enough, this is just a step towards enabling ppcg to not compute to and from device copy calls at all, which would be incorrect in case we still relied on these calls to place our synchronization statements.
  Reviewers: Meinersbur, bollu, singam-sanjay
  Reviewed By: bollu
  Subscribers: nemanjai, kbarton, pollydev, llvm-commits
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D36867
  llvm-svn: 311155
* [GPGPU] Only collect the accesses that belong to an array [NFC] | Tobias Grosser | 2017-08-17 | 1 | -6/+5
  This avoids the construction of very large sets and in many cases also keeps the number of parameters low. As a result, we see a compile time reduction from 5 minutes to only slightly above 1 minute for one of our larger test cases.
  llvm-svn: 311127
* [GPGPU] Move getExtend to C++ [NFC] | Tobias Grosser | 2017-08-17 | 1 | -54/+35
  llvm-svn: 311123
* [GPGPU] Make the ast_build available to block generator | Tobias Grosser | 2017-08-10 | 1 | -0/+2
  This is necessary for partial writes (as used by delicm) to work.
  llvm-svn: 310553
* [ManagedMemoryRewrite] Introduce a new pass to rewrite modules to use managed memory. | Siddharth Bhat | 2017-08-09 | 1 | -30/+37
  This pass is useful to automatically convert a codebase that uses malloc/free to use their managed memory counterparts. Currently, it rewrites malloc and free to the `polly_{malloc,free}Managed` variants. A future patch will teach ManagedMemoryRewrite to rewrite global arrays as pointers to globally allocated managed memory.
  Differential Revision: https://reviews.llvm.org/D36513
  llvm-svn: 310471
* [PPCGCodeGeneration] Compute element size in bytes for arrays correctly. | Siddharth Bhat | 2017-08-09 | 1 | -1/+14
  Previously, we used to compute this with `elementSizeInBits / 8`. This would yield an element size of 0 when the array had element size < 8 in bits. To fix this, ask data layout what the size in bytes should be.
  Differential Revision: https://reviews.llvm.org/D36459
  llvm-svn: 310448
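The arithmetic behind this fix can be shown with a small sketch. Both function names are hypothetical. The rounded-up division is only an assumption standing in for the data layout query that the commit describes; a real data layout may report a larger size, for example because of alignment padding.

```cpp
#include <cstdint>

// Old computation: integer division truncates, so any element narrower
// than 8 bits ends up with a size of 0 bytes.
std::uint64_t bytesTruncating(std::uint64_t ElementSizeInBits) {
  return ElementSizeInBits / 8;
}

// Stand-in for asking the data layout (an assumption, not the actual call):
// round up to the next whole byte so that sub-byte elements occupy one byte.
std::uint64_t bytesRoundedUp(std::uint64_t ElementSizeInBits) {
  return (ElementSizeInBits + 7) / 8;
}
```

For a 1-bit element, `bytesTruncating(1)` gives 0 while `bytesRoundedUp(1)` gives 1; for sizes that are multiples of 8 bits the two agree.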
* [Polly] [PPCGCodeGeneration] Handle failing of invariant load hoisting gracefully. | Siddharth Bhat | 2017-08-08 | 1 | -9/+26
  To do this, we replicate what `CodeGeneration` does. We expose `markNodeUnreachable` from `CodeGeneration` to `PPCGCodeGeneration`.
  Differential Revision: https://reviews.llvm.org/D36457
  llvm-svn: 310350
* [GPGPU] Remove redundant constructors | Tobias Grosser | 2017-08-07 | 1 | -4/+4
  llvm-svn: 310284
* [ScopInfo] Move Scop::getPwAffOnly to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 1 | -1/+1
  llvm-svn: 310231
* [ScopInfo] Move Scop::getDomains to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 1 | -2/+3
  llvm-svn: 310230
* [ScopInfo] Translate Scop::getParamSpace to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 1 | -9/+7
  llvm-svn: 310224
* [ScopInfo] Translate Scop::getContext to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 1 | -9/+8
  llvm-svn: 310221
* [ScopInfo] Translate Scop::getIdForParam to isl++ [NFC] | Tobias Grosser | 2017-08-06 | 1 | -1/+1
  llvm-svn: 310220
* [ScopInfo] Move get*Writes/getReads/getAccesses to isl++ | Tobias Grosser | 2017-08-06 | 1 | -4/+4
  llvm-svn: 310219
* Move ScopInfo::getDomain(), getDomainSpace(), getDomainId() to isl++ | Tobias Grosser | 2017-08-06 | 1 | -4/+5
  llvm-svn: 310209
* [GPGPU] Make sure managed arrays are prepared at the beginning of the scop | Tobias Grosser | 2017-08-06 | 1 | -32/+41
  Summary: This resolves some "instruction does not dominate use" errors, as we used to prepare the arrays at the location of the first kernel, which did not necessarily dominate all other kernel calls.
  Reviewers: Meinersbur, bollu, singam-sanjay
  Subscribers: nemanjai, pollydev, llvm-commits, kbarton
  Differential Revision: https://reviews.llvm.org/D36372
  llvm-svn: 310196
* [GPGPU] Rename all, not only the first libdevice function | Tobias Grosser | 2017-08-06 | 1 | -2/+3
  llvm-svn: 310194
* [Polly] [PPCGCodeGeneration] Deal with loops outside the Scop correctly in PPCGCodeGeneration. | Siddharth Bhat | 2017-08-06 | 1 | -11/+29
  A Scop with a loop outside it is not handled currently by PPCGCodeGeneration. The test case is such that the Scop has only one inner loop that is detected. This currently breaks codegen. The fix is to reuse the existing mechanism in `IslNodeBuilder` within `GPUNodeBuilder`.
  Differential Revision: https://reviews.llvm.org/D36290
  llvm-svn: 310193
* [PPCGCodeGeneration] [NFC] Log every location from which PPCGCodegen bails. | Siddharth Bhat | 2017-08-04 | 1 | -5/+23
  This is useful when trying to understand why no GPU code was produced.
  Differential Revision: https://reviews.llvm.org/D36318
  llvm-svn: 310103
* Make sure that all parameter dimensions are set in schedule | Tobias Grosser | 2017-08-03 | 1 | -0/+3
  Summary: In case the option -polly-ignore-parameter-bounds is set, not all parameters will be added to context and domains. This is useful to keep the size of the sets and maps we work with small. Unfortunately, for AST generation it is necessary to ensure all parameters are part of the schedule tree. Hence, we modify the GPGPU code generation to make sure this is the case. To obtain the necessary information we expose a new function Scop::getFullParamSpace(). We also make a couple of functions const to be able to make Scop::getFullParamSpace() const.
  Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg
  Subscribers: nemanjai, kbarton, pollydev, llvm-commits
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D36243
  llvm-svn: 309939
* [PPCGCodeGeneration] Construct `isl_multi_pw_aff` of PPCGArray.bounds even when polly-ignore-parameter-bounds is turned on. | Siddharth Bhat | 2017-08-03 | 1 | -16/+75
  When we have `-polly-ignore-parameter-bounds`, `Scop::Context` does not contain all the parameters present in the program. The construction of the `isl_multi_pw_aff` requires all the individual `pw_aff` to have the same parameter dimensions. To achieve this, we used to realign every `pw_aff` with `Scop::Context`. However, in conjunction with `-polly-ignore-parameter-bounds`, this is now incorrect, since `Scop::Context` does not contain all parameters. We set this up correctly by creating a space that has all the parameters used by all the `isl_pw_aff`. Then, we realign all `isl_pw_aff` to this space.
  llvm-svn: 309934
* [PPCGCodeGeneration] Correct usage of llvm::Value with getLatestValue. | Siddharth Bhat | 2017-08-01 | 1 | -0/+2
  It is possible that the `HostPtr` that corresponds to an array could be invariant load hoisted. Make sure we use the invariant load hoisted value by using `IslNodeBuilder::getLatestValue`.
  Differential Revision: https://reviews.llvm.org/D36001
  llvm-svn: 309681
* [NFC] [PPCGCodeGeneration] Convert GPUNodeBuilder::getGridSizes to isl++. | Siddharth Bhat | 2017-08-01 | 1 | -5/+7
  llvm-svn: 309671
* [NFC] [PPCGCodeGeneration] Convert GPUNodeBuilder::getArrayOffset to isl++. | Siddharth Bhat | 2017-08-01 | 1 | -24/+17
  llvm-svn: 309669
* [GPGPU] Add support for NVIDIA libdevice | Tobias Grosser | 2017-07-31 | 1 | -12/+98
  Summary: This allows us to map functions such as exp, expf, expl, for which no LLVM intrinsics exist. Instead, we link to NVIDIA's libdevice which provides high-performance implementations of a wide range of (math) functions. We currently link only a small subset, the exp, cos and copysign functions. Other functions will be enabled as needed.
  Reviewers: bollu, singam-sanjay
  Reviewed By: bollu
  Subscribers: tstellar, tra, nemanjai, pollydev, mgorny, llvm-commits, kbarton
  Tags: #polly
  Differential Revision: https://reviews.llvm.org/D35703
  llvm-svn: 309560
* [PPCGCodeGeneration] Check that invariant load hoisting succeeded. | Siddharth Bhat | 2017-07-28 | 1 | -1/+4
  If we fail, throw an error for now. We can gracefully handle this later.
  llvm-svn: 309387
* [GPGPU] Do not require the Scop::Context to have information about all parameters | Tobias Grosser | 2017-07-28 | 1 | -4/+2
  llvm-svn: 309368
* [GPGPU] Fix compilation issue with latest CUDA upgrade to i128 | Tobias Grosser | 2017-07-28 | 1 | -4/+4
  llvm-svn: 309366
* [PPCGCodeGeneration] Skip arrays with empty extent. | Siddharth Bhat | 2017-07-25 | 1 | -4/+19
  Invariant load hoisted scalars, and arrays whose size we can statically compute to be 0, do not need to be allocated as arrays. Invariant load hoisted scalars are sent to the kernel directly as parameters. Earlier, we used to allocate `0` bytes of memory for these because our computation of size from `PPCGCodeGeneration::getArraySize` would result in `0`. Now, since we don't treat invariant loads as arrays in PPCGCodeGeneration, this problem does not occur anymore.
  Differential Revision: https://reviews.llvm.org/D35795
  llvm-svn: 308971