path: root/polly
Commit message / Author / Age / Files / Lines
...
* Make sure that all parameter dimensions are set in scheduleTobias Grosser2017-08-034-45/+123
| | | | | | | | | | | | | | | | | | | | | | | Summary: In case the option -polly-ignore-parameter-bounds is set, not all parameters will be added to context and domains. This is useful to keep the size of the sets and maps we work with small. Unfortunately, for AST generation it is necessary to ensure all parameters are part of the schedule tree. Hence, we modify the GPGPU code generation to make sure this is the case. To obtain the necessary information we expose a new function Scop::getFullParamSpace(). We also make a couple of functions const to be able to make Scop::getFullParamSpace() const. Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg Subscribers: nemanjai, kbarton, pollydev, llvm-commits Tags: #polly Differential Revision: https://reviews.llvm.org/D36243 llvm-svn: 309939
* [test] Fix test case without Polly-ACC.Michael Kruse2017-08-031-0/+2
| | | | llvm-svn: 309938
* [PPCGCodeGeneration] Construct `isl_multi_pw_aff` of PPCGArray.bounds even ↵Siddharth Bhat2017-08-032-16/+128
| | | | | | | | | | | | | | | | | | when polly-ignore-parameter-bounds is turned on. When we have `-polly-ignore-parameter-bounds`, `Scop::Context` does not contain all the parameters present in the program. The construction of the `isl_multi_pw_aff` requires all the individual `pw_aff` to have the same parameter dimensions. To achieve this, we used to realign every `pw_aff` with `Scop::Context`. However, in conjunction with `-polly-ignore-parameter-bounds`, this is now incorrect, since `Scop::Context` does not contain all parameters. We set this up correctly by creating a space that has all the parameters used by all the `isl_pw_aff`. Then, we realign all `isl_pw_aff` to this space. llvm-svn: 309934
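The realignment idea in this commit can be illustrated with a toy model. The sketch below does not use isl: `ParamExpr`, `unionParams`, and `realign` are hypothetical names standing in for an `isl_pw_aff`, the construction of a space containing every used parameter, and the realignment step. It only shows why building the union of all parameters first makes every expression end up with the same parameter dimensions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an isl_pw_aff (assumption, not the isl type):
// an affine expression over named parameters.
struct ParamExpr {
  std::vector<std::string> Params; // parameter dimensions, in order
  std::vector<long> Coeffs;        // one coefficient per parameter
};

// Build a space that holds every parameter used by any expression,
// keeping the order in which parameters are first seen.
std::vector<std::string> unionParams(const std::vector<ParamExpr> &Exprs) {
  std::vector<std::string> Space;
  for (const ParamExpr &E : Exprs)
    for (const std::string &P : E.Params)
      if (std::find(Space.begin(), Space.end(), P) == Space.end())
        Space.push_back(P);
  return Space;
}

// Re-express E over Space; parameters E does not use get coefficient 0.
ParamExpr realign(const ParamExpr &E, const std::vector<std::string> &Space) {
  ParamExpr R{Space, std::vector<long>(Space.size(), 0)};
  for (std::size_t I = 0; I < E.Params.size(); ++I) {
    auto It = std::find(Space.begin(), Space.end(), E.Params[I]);
    assert(It != Space.end() && "target space must contain all parameters");
    R.Coeffs[It - Space.begin()] = E.Coeffs[I];
  }
  return R;
}
```

Realigning against a space that lacks one of the parameters (the situation the commit describes for `Scop::Context`) would trip the assertion in this model, whereas realigning against `unionParams(...)` always succeeds.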
* Enable simplify and forward-op-tree by defaultTobias Grosser2017-08-021-2/+2
| | | | | | | | | These passes have been tested over the last month and should generally help to remove scalar data dependences in Polly. We enable them to give them even wider test coverage. Large performance regressions and any kind of correctness regressions are not expected. llvm-svn: 309878
* Move setNewAccessRelation to isl++Tobias Grosser2017-08-026-32/+27
| | | | llvm-svn: 309871
* Move ScopStmt::setAccessRelation to isl++Tobias Grosser2017-08-022-8/+8
| | | | llvm-svn: 309870
* Replace asserts with llvm_unreachable to clarify intentTobias Grosser2017-08-021-2/+2
| | | | llvm-svn: 309856
* Fix r309826: Appease clang-format check.Philip Pfaffe2017-08-021-1/+1
| | | | llvm-svn: 309853
* Fix code format on r309826Singapuram Sanjay Srivallabh2017-08-022-3/+2
| | | | | | | | | | | | | | | | | Summary: Fix code format on r309826 / D35458 Reviewers: grosser, bollu Reviewed By: grosser Subscribers: pollydev Tags: #polly Differential Revision: https://reviews.llvm.org/D36232 llvm-svn: 309845
* Fix r309826: Move instantiation and specialization of ↵Philip Pfaffe2017-08-021-6/+11
| | | | | | | | | | OwningScopAnalysisManagerFunctionProxy to the polly namespace. When compiling with clang, explicit instantiation of the OwningScopAnalysisManagerFunctionProxy needs to happen within the polly namespace. Same goes with the specialization of its run method. llvm-svn: 309835
* [Polly][PM][WIP] Polly pass registrationPhilip Pfaffe2017-08-028-2/+343
| | | | | | | | | | | | | | | | | | | | | Summary: This patch is a first attempt at registering Polly passes with the LLVM tools. Tool plugins are still unsupported, but this registration is usable from the tools if Polly is linked into them (albeit requiring minimal patches to those tools). Registration requires a small amount of machinery (the owning analysis proxies), necessary for injecting ScopAnalysisManager objects into the calling tools. This patch is marked WIP because the registration is incomplete. Parsing manual pipelines is fully supported, but default pass injection into the O3 pipeline is lacking, mostly because there is opportunity for some redesign here, I believe. The first point of order would be insertion points. I think it makes sense to run before the vectorizers. Running Polly Early, however, is weird. Mostly because it actually is the default (which to me is unexpected), and because Polly runs its own O1 pipeline. Why not instead insert it at an appropriate place somewhere after simplification happened? Running after the loop optimizers seems intuitive, but it also seems wasteful, since multiple consecutive loops might well be a single scop, and we don't need to run for all of them. My second request for comments would be regarding all those smallish helper passes we have, like PollyViewer, PollyPrinter, PollyImportJScop. Right now these are controlled by command line options, deciding whether they should be part of the Polly pipeline. What is your opinion on treating them like real passes, and have the user write an appropriate pipeline if they want to use any of them? Reviewers: grosser, Meinersbur, bollu Reviewed By: grosser Subscribers: llvm-commits, pollydev Tags: #polly Differential Revision: https://reviews.llvm.org/D35458 llvm-svn: 309826
* Remove debug metadata from copied instruction to prevent GPUModule ↵Singapuram Sanjay Srivallabh2017-08-022-0/+114
| | | | | | | | | | | | | | | | | | | | | | | | | verification failure Summary: **Remove debug metadata from instruction to be copied to prevent the source file's debug metadata being copied into GPUModule and eventually failing Module verification and ASM string codegeneration.** When copying the instruction onto the Module meant for the GPU, debug metadata attached to an instruction causes all related metadata to be pulled into the Module, including the DICompileUnit, which is not listed in llvm.dbg.cu of the Module. This fails the verification of the Module and generation of the ASM string. The only debug metadata of the instruction, the DebugLoc, is unset by this patch. This patch reattempts https://reviews.llvm.org/D35630 by targeting only those instructions that are to end up in a Module meant for the GPU. Reviewers: grosser, bollu Reviewed By: grosser Subscribers: pollydev Tags: #polly Differential Revision: https://reviews.llvm.org/D36161 llvm-svn: 309822
* [PM] Fix proxy invalidationPhilip Pfaffe2017-08-022-7/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: I made a mistake in handling transitive invalidation of analysis results. I've updated the list of preserved analyses as well as the correct result dependences. The Invalidator passed through the invalidate() path can be used to transitively invalidate analyses. It frequently happens that analysis results depend on other analyses, and thus store references to their results. When the dependee now gets invalidated, the depender needs to be invalidated as well. This is the purpose of the Invalidator object, which can be used to check whether some dependee analysis is in the process of being invalidated. I originally was checking the wrong dependee analyses, which is an actual error, you can only check analysis results that are in the cache (which they are if you've captured their reference). The invalidation I'm handling inside the proxy deals with the standard analyses the proxy passes into the Scop pipeline, since I'm capturing their reference. This checking allows us to actually preserve a couple of results outside of the proxy, since the Scop pipeline shouldn't break those, or otherwise should update them accordingly. Reviewers: grosser, Meinersbur, bollu Reviewed By: grosser Subscribers: pollydev, llvm-commits Differential Revision: https://reviews.llvm.org/D36216 llvm-svn: 309811
* [GPUJIT] Add GPUJIT APIs for allocating and freeing managed memory.Siddharth Bhat2017-08-022-0/+66
| | | | | | | | | | | | | | | | | We introduce `polly_mallocManaged` and `polly_freeManaged` as proxies for `cudaMallocManaged` / `cudaFree`. This is currently not used by Polly. It is auxiliary code that is used in `COSMO`. This is useful because `polly_mallocManaged` matches the signature of `malloc`, while `cudaMallocManaged` does not. We introduce `polly_freeManaged` for symmetry. We use this in COSMO to use the unified memory feature of the newer CUDA APIs (>= 6). Differential Revision: https://reviews.llvm.org/D35991 llvm-svn: 309808
* [SI][NewPM] Collect loop count statisticsPhilip Pfaffe2017-08-021-0/+3
| | | | llvm-svn: 309807
* [SD] Set PollyUseRuntimeAliasChecks correctlyPhilip Pfaffe2017-08-022-0/+6
| | | | llvm-svn: 309805
* [GPUJIT] Teach GPUJIT to use a pre-existing CUDA context if available.Siddharth Bhat2017-08-021-1/+33
| | | | | | | | | | | | | | | | | | | On mixing the driver and runtime APIs, it is quite possible that a context already exists due to runtime API usage. In this case, Polly should try to use the same context. This patch teaches GPUJIT to detect that a context exists and how to pick up this context. Without this, calling `cudaMallocManaged`, for example, before a polly-generated kernel launch causes P100 to *hang*. This is a part of (https://reviews.llvm.org/D35991) that was extracted out. Differential Revision: https://reviews.llvm.org/D36162 llvm-svn: 309802
* [ForwardOpTree] Execute canForwardTree also in release builds.Michael Kruse2017-08-011-7/+9
| | | | | | | | | | | | Commit r309730 moved the call to canForwardTree into an assert(), even though this function has side-effects if its DoIt parameter is true. To avoid a warning in release builds, do an (void)Execution of its result instead. To avoid such confusion in the future, rename canForwardTree() to forwardTree(). llvm-svn: 309753
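The pitfall described in this commit is a general C++ one: an expression inside `assert()` is not evaluated when `NDEBUG` is defined, so a call with side effects must happen outside the assertion. A minimal sketch, assuming a hypothetical `forwardTree` over a toy worklist (not Polly's actual function or signature):

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in: checks whether forwarding is possible and,
// if DoIt is true, also performs it (the side effect).
bool forwardTree(std::vector<int> &Worklist, bool DoIt) {
  if (Worklist.empty())
    return false;
  if (DoIt)
    Worklist.pop_back(); // side effect that must also happen in release builds
  return true;
}

void runForwarding(std::vector<int> &Worklist) {
  // Problematic: assert(forwardTree(Worklist, true));
  // With NDEBUG defined, the call above would vanish entirely.
  bool Success = forwardTree(Worklist, /*DoIt=*/true);
  (void)Success; // avoids an unused-variable warning when assert is compiled out
  assert(Success && "forwarding was expected to succeed");
}
```

The call is executed unconditionally and only its result is asserted, so debug and release builds perform the same transformation.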
* [Simplify] Rewrite redundant write detection algorithm.Michael Kruse2017-08-0116-141/+675
| | | | | | | | | | | | | | | | | | | | | | The previous algorithm was to search for a write and the source of its value operand, and see whether the write just stores the same read value back, which includes a search whether there is another write access between them. This is O(n^2) in the max number of accesses in a statement (+ the complexity of isl comparing the access functions). The new algorithm is more similar to the one used for searching for overwrites and coalescable writes. It scans over all accesses in order of execution while tracking which array elements still have the same value since it was read. This is O(n), not counting the complexity within isl. It should be more reliable than trying to catch all non-conforming cases in the previous approach. It is also less code. We now also support the case where the write is a partial write of the read's domain, and to some extent non-affine subregions. Differential Revision: https://reviews.llvm.org/D36137 llvm-svn: 309734
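The single-pass scan described above can be sketched on a toy model. This is not the Polly implementation: the real pass tracks isl maps over statement instances, while the hypothetical `Access` record and `findRedundantWrites` below use plain integer element indices and value ids. The sketch only shows the bookkeeping: remember what each element is known to hold since it was read, and flag a write that stores exactly that value back.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Toy access record (assumption): one array, integer element index,
// and an id naming the value that is read or written.
struct Access {
  bool IsWrite;
  int Element;
  int ValueId;
};

// Returns the positions of writes that store back a value the element
// is still known to hold since it was read. One pass over the accesses.
std::vector<std::size_t>
findRedundantWrites(const std::vector<Access> &Accesses) {
  std::unordered_map<int, int> Known; // element -> value held since its read
  std::vector<std::size_t> Redundant;
  for (std::size_t I = 0; I < Accesses.size(); ++I) {
    const Access &A = Accesses[I];
    if (!A.IsWrite) {
      Known[A.Element] = A.ValueId; // reading tells us what the element holds
      continue;
    }
    auto It = Known.find(A.Element);
    if (It != Known.end() && It->second == A.ValueId) {
      Redundant.push_back(I); // stores the same value back: no effect
      continue;
    }
    Known.erase(A.Element); // a different write ends the "unchanged" state
  }
  return Redundant;
}
```

Each access is visited once and every map operation takes expected constant time, which mirrors the O(n) bound stated in the commit message when the cost inside isl is not counted.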
* Silence -Wunused-variable warning in NDEBUG buildsReid Kleckner2017-08-011-3/+2
| | | | llvm-svn: 309730
* [Simplify] Improve scalability.Michael Kruse2017-08-013-11/+372
| | | | | | | | | | | | | | | | | | | | | | With a lot of reads and writes to the same array in a statement, some isl sets that capture the state between accesses can become complex such that isl takes considerable time and memory for operations on them. The problems identified were: - is_subset() takes considerable time with many disjoints in the arguments. We limit the number of disjoints to 4, any additional information is thrown away. - subtract() can lead to many disjoints. We instead assume that any array element is possibly accessed, which removes all disjoints. - subtract_domain() may lead to considerable processing, even if all elements are to be removed. Instead, we determine and remove the affected spaces manually. No behaviour is changed. llvm-svn: 309728
* Update to isl-0.18-809-gd5b4535Tobias Grosser2017-08-019-98/+228
| | | | | | This fixes some undefined behavior in the isl schedule tree code. llvm-svn: 309727
* [NFC] Add 'REQUIRES: pollyacc' on ↵Siddharth Bhat2017-08-011-0/+2
| | | | | | | | 'test/GPGPU/invariant-load-hoisting-of-array.ll' - Should fix broken build due to `r309681`. llvm-svn: 309686
* [GPUJIT] Call `cuProfilerStop` before destroying the context to flush ↵Siddharth Bhat2017-08-011-0/+7
| | | | | | | | | | | profiler cache. This is necessary to get accurate traces from `nvprof` / `nvcc`. Otherwise, we lose some profiling information. Differential Revision: https://reviews.llvm.org/D35940 llvm-svn: 309682
* [PPCGCodeGeneration] Correct usage of llvm::Value with getLatestValue.Siddharth Bhat2017-08-012-0/+102
| | | | | | | | It is possible that the `HostPtr` that corresponds to an array could be invariant load hoisted. Make sure we use the invariant load hoisted value by using `IslNodeBuilder::getLatestValue`. Differential Revision: https://reviews.llvm.org/D36001 llvm-svn: 309681
* [NFC] [IslNodeBuilder, GPUNodeBuilder] Unify mechanism for looking up ↵Siddharth Bhat2017-08-012-5/+16
| | | | | | | | | | | | | | | | | | replacement Values. We populate `IslNodeBuilder::ValueMap` which contains replacements for `llvm::Value`s. There was no simple method to pick up a replacement if it exists, otherwise fall back to the original. Create a method `IslNodeBuilder::getLatestValue` which provides this functionality. This will be used in a later patch to fix bugs in `PPCGCodeGeneration` where the latest value is not being used. Differential Revision: https://reviews.llvm.org/D36000 llvm-svn: 309674
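The lookup-with-fallback behaviour described for `IslNodeBuilder::getLatestValue` can be modeled without LLVM. In the sketch below, `Value` is a stand-in alias and `NodeBuilderModel` is a hypothetical class; the real method operates on `llvm::Value` and Polly's own map type, whose exact definition is not shown in this log.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-in for llvm::Value (assumption for illustration only).
using Value = std::string;

struct NodeBuilderModel {
  // Maps an original value to its replacement, if one was recorded.
  std::unordered_map<const Value *, Value *> ValueMap;

  // Return the replacement if it exists, otherwise the original value.
  Value *getLatestValue(Value *Original) const {
    auto It = ValueMap.find(Original);
    return It == ValueMap.end() ? Original : It->second;
  }
};
```

Callers can then always go through `getLatestValue` instead of repeating the find-or-fallback logic at each use site.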
* [NFC] [PPCGCodeGeneration] Convert GPUNodeBuilder::getGridSizes to isl++.Siddharth Bhat2017-08-011-5/+7
| | | | llvm-svn: 309671
* [NFC] [PPCGCodeGeneration] Convert GPUNodeBuilder::getArrayOffset to isl++.Siddharth Bhat2017-08-011-24/+17
| | | | llvm-svn: 309669
* [ForwardOpTree] Support synthesizable values.Michael Kruse2017-07-316-53/+302
| | | | | | | | | | | | | | | | | | | | | | | | | This allows -polly-optree to move instructions that depend on synthesizable values. The difficulty for synthesizable values is that their value depends on the location. When it is moved over a loop header, and the SCEV expression depends on the loop induction variable (SCEVAddRecExpr), it would use the current induction variable instead of the last one. At the moment we cannot forward PHI nodes such that crossing the header of loops referenced by SCEVAddRecExpr is not possible (assuming the loop header has at least two incoming blocks: for entering the loop and the backedge, such that any instruction to be forwarded must have a phi between use and definition). A remaining issue is when the forwarded value is used after the loop, but is only synthesizable inside the loop. This happens e.g. if ScalarEvolution is unable to determine the number of loop iterations or the initial loop value. We do not forward in this situation. Differential Revision: https://reviews.llvm.org/D36102 llvm-svn: 309609
* [Simplify] Remove all kinds of redundant scalar writes.Michael Kruse2017-07-315-8/+138
| | | | | | | | In addition to array and PHI writes, also allow scalar value writes. The only kind of write not allowed are writes by functions (including memcpy/memmove/memset). llvm-svn: 309582
* [GPGPU] Add support for NVIDIA libdeviceTobias Grosser2017-07-315-12/+182
| | | | | | | | | | | | | | | | | | | | | Summary: This allows us to map functions such as exp, expf, expl, for which no LLVM intrinsics exist. Instead, we link to NVIDIA's libdevice which provides high-performance implementations of a wide range of (math) functions. We currently link only a small subset, the exp, cos and copysign functions. Other functions will be enabled as needed. Reviewers: bollu, singam-sanjay Reviewed By: bollu Subscribers: tstellar, tra, nemanjai, pollydev, mgorny, llvm-commits, kbarton Tags: #polly Differential Revision: https://reviews.llvm.org/D35703 llvm-svn: 309560
* Revert "Remove Debug metadata from copied instruction to prevent Module ↵Tobias Grosser2017-07-312-112/+0
| | | | | | | | | verification failure" This reverts commit r309490 as it triggers on our AOSP buildbot error messages of the form: inlinable function call in a function with debug info must have a !dbg location llvm-svn: 309556
* [IslNodeBuilder] Remove unused instructionTobias Grosser2017-07-311-1/+0
| | | | | Suggested-by: Maximilian Falkenstein <falkensm@student.ethz.ch> llvm-svn: 309533
* Remove Debug metadata from copied instruction to prevent Module verification ↵Singapuram Sanjay Srivallabh2017-07-292-0/+112
| | | | | | | | | | | | | | | | | | | | | | | failure Summary: **Remove debug metadata from instruction to be copied to prevent the source file's debug metadata being copied into GPUModule and eventually failing Module verification and ASM string codegeneration.** When copying the instruction onto the Module meant for the GPU, debug metadata attached to an instruction causes all related metadata to be pulled into the Module, including the DICompileUnit, which is not listed in llvm.dbg.cu of the Module. This fails the verification of the Module and generation of the ASM string. The only debug metadata of the instruction, the DebugLoc, is unset by this patch. Reviewers: grosser, bollu, Meinersbur Reviewed By: grosser, bollu Subscribers: pollydev Tags: #polly Differential Revision: https://reviews.llvm.org/D35630 llvm-svn: 309490
* [Simplify] Implement write accesses coalescing.Michael Kruse2017-07-2926-13/+1037
| | | | | | | | | | | | | | | | | Write coalescing combines write accesses that - Write the same llvm::Value. - Write to the same array. - Unless they do not write anything in a statement instance (partial writes), write to the same element. - There is no other access between them that accesses the same element. This is particularly useful after DeLICM, which leaves partial writes to disjoint domains. Differential Revision: https://reviews.llvm.org/D36010 llvm-svn: 309489
* [test] Add test case for -polly-simplify. NFC.Michael Kruse2017-07-291-0/+44
| | | | llvm-svn: 309458
* [Simplify] Do not remove dependencies of phis within region stmts.Michael Kruse2017-07-282-1/+52
| | | | | | | These were wrongly assumed to be phi nodes that require MemoryKind::PHI accesses. llvm-svn: 309454
* [VirtualInstruction] Do not iterate over a region statement's instruction ↵Michael Kruse2017-07-281-0/+1
| | | | | | | list. NFC. It should be empty anyways. In this case it would even be redundant because we just add all instructions in region statements. llvm-svn: 309453
* Remove offset parameter from llvm.dbg.value intrinsics in testcaseAdrian Prantl2017-07-281-5/+5
| | | | llvm-svn: 309433
* [VirtualInstruction] Remove assertion. NFC.Michael Kruse2017-07-281-2/+0
| | | | | | | | | | | | ScopStmt::contains is currently implemented on the basis of BasicBlock and does not take the instruction list into account. Therefore any instruction copied by -polly-optree into another statement currently triggers that assertion. Remove that assertion for now. We might re-enable it when the implementation of ScopStmt::contains changes. llvm-svn: 309421
* [test] Fix typo in filename. NFC.Michael Kruse2017-07-281-0/+0
| | | | llvm-svn: 309403
* [Simplify] Fix typo in statistics output. NFC.Michael Kruse2017-07-282-2/+2
| | | | llvm-svn: 309402
* [Simplify] Remove empty partial accesses first. NFC.Michael Kruse2017-07-282-4/+4
| | | So follow-up cleanups do not need special handling for such accesses. llvm-svn: 309401
* [PPCGCodeGeneration] Check that invariant load hoisting succeeded.Siddharth Bhat2017-07-281-1/+4
| | | | | | If we fail, throw an error for now. We can gracefully handle this later. llvm-svn: 309387
* [ScopDetect] add `-polly-ignore-func` flag to ignore functions by name.Siddharth Bhat2017-07-282-6/+145
| | | | | | | | Ignore all functions whose names match a regex. Useful because creating a regex that does *not* match a string is somewhat hard. Example: https://stackoverflow.com/questions/1240275/how-to-negate-specific-word-in-regex llvm-svn: 309377
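A name-based ignore list of this kind can be sketched with the C++ standard library. The helper `isIgnoredFunction` is hypothetical, and the sketch assumes that a pattern counts as a match when it is found anywhere in the function name; it uses `std::regex` rather than whatever regex facility Polly itself uses.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: true if Name matches any of the ignore patterns.
// Assumption: a pattern matches if it is found anywhere in the name.
bool isIgnoredFunction(const std::string &Name,
                       const std::vector<std::string> &Patterns) {
  for (const std::string &P : Patterns)
    if (std::regex_search(Name, std::regex(P)))
      return true;
  return false;
}
```

Keeping a separate ignore list avoids having to write a single "only these" pattern that negates names, which is the difficulty the commit message points to.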
* Add missing namespace commentTobias Grosser2017-07-281-1/+1
| | | | llvm-svn: 309373
* [GPGPU] Do not require the Scop::Context to have information about all ↵Tobias Grosser2017-07-282-4/+43
| | | | | | parameters llvm-svn: 309368
* [GPGPU] Fix compilation issue with latest CUDA upgrade to i128Tobias Grosser2017-07-283-7/+7
| | | | llvm-svn: 309366
* Tiny docs fixHans Wennborg2017-07-271-1/+1
| | | | llvm-svn: 309300
* Update isl to isl-0.18-800-g4018f45Tobias Grosser2017-07-277-17/+114
| | | | | | | | | This fixes a bug in isl_flow where triggering the compute out could result in undefined or unexpected behavior. This fixes some recent regressions we saw in the android buildbots. Thanks Eli Friedman for reducing the corresponding test cases. llvm-svn: 309274