summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/PowerPC/PPCLoopPreIncPrep.cpp
Commit message (Collapse)AuthorAgeFilesLines
* PowerPC: Remove implicit ilist iterator conversions, NFCDuncan P. N. Exon Smith2015-10-201-2/+2
| | | | llvm-svn: 250787
* [PM] Port ScalarEvolution to the new pass manager.Chandler Carruth2015-08-171-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change makes ScalarEvolution a stand-alone object and just produces one from a pass as needed. Making this work well requires making the object movable, using references instead of overwritten pointers in a number of places, and other refactorings. I've also wired it up to the new pass manager and added a RUN line to a test to exercise it under the new pass manager. This includes basic printing support much like with other analyses. But there is a big and somewhat scary change here. Prior to this patch ScalarEvolution was never *actually* invalidated!!! Re-running the pass just re-wired up the various other analyses and didn't remove any of the existing entries in the SCEV caches or clear out anything at all. This might seem OK as everything in SCEV that can uses ValueHandles to track updates to the values that serve as SCEV keys. However, this still means that as we ran SCEV over each function in the module, we kept accumulating more and more SCEVs into the cache. At the end, we would have a SCEV cache with every value that we ever needed a SCEV for in the entire module!!! Yowzers. The releaseMemory routine would dump all of this, but that isn't realy called during normal runs of the pipeline as far as I can see. To make matters worse, there *is* actually a key that we don't update with value handles -- there is a map keyed off of Loop*s. Because LoopInfo *does* release its memory from run to run, it is entirely possible to run SCEV over one function, then over another function, and then lookup a Loop* from the second function but find an entry inserted for the first function! Ouch. To make matters still worse, there are plenty of updates that *don't* trip a value handle. It seems incredibly unlikely that today GVN or another pass that invalidates SCEV can update values in *just* such a way that a subsequent run of SCEV will incorrectly find lookups in a cache, but it is theoretically possible and would be a nightmare to debug. With this refactoring, I've fixed all this by actually destroying and recreating the ScalarEvolution object from run to run. Technically, this could increase the amount of malloc traffic we see, but then again it is also technically correct. ;] I don't actually think we're suffering from tons of malloc traffic from SCEV because if we were, the fact that we never clear the memory would seem more likely to have come up as an actual problem before now. So, I've made the simple fix here. If in fact there are serious issues with too much allocation and deallocation, I can work on a clever fix that preserves the allocations (while clearing the data) between each run, but I'd prefer to do that kind of optimization with a test case / benchmark that shows why we need such cleverness (and that can test that we actually make it faster). It's possible that this will make some things faster by making the SCEV caches have higher locality (due to being significantly smaller) so until there is a clear benchmark, I think the simple change is best. Differential Revision: http://reviews.llvm.org/D12063 llvm-svn: 245193
* Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)Alexander Kornienko2015-06-231-2/+2
| | | | | | Apparently, the style needs to be agreed upon first. llvm-svn: 240390
* Fixed/added namespace ending comments using clang-tidy. NFCAlexander Kornienko2015-06-191-2/+2
| | | | | | | | | | | | | The patch is generated using this command: tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \ -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \ llvm/lib/ Thanks to Eugene Kosov for the original patch! llvm-svn: 240137
* [PowerPC] Really iterate over all loops in PPCLoopDataPrefetch/PPCLoopPreIncPrepHal Finkel2015-04-121-7/+3
| | | | | | | | When I fixed these a couple of days ago to iterate over all loops, not just depth == 1 loops, I inadvertently made it such that we'd only look at the first top-level loop. Make sure that we really look at all of them. llvm-svn: 234705
* [PowerPC] Fix PPCLoopPreIncPrep for depth > 1 loopsHal Finkel2015-04-111-10/+27
| | | | | | | | | This pass had the same problem as the data-prefetching pass: it was only checking for depth == 1 loops in practice. Fix that, add some debugging statements, and make sure that, when we grab an AddRec, it is for the loop we expect. llvm-svn: 234670
* [opaque pointer type] gep API migrationDavid Blaikie2015-03-141-5/+6
| | | | llvm-svn: 232279
* DataLayout is mandatory, update the API to reflect it with references.Mehdi Amini2015-03-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Now that the DataLayout is a mandatory part of the module, let's start cleaning the codebase. This patch is a first attempt at doing that. This patch is not exactly NFC as for instance some places were passing a nullptr instead of the DataLayout, possibly just because there was a default value on the DataLayout argument to many functions in the API. Even though it is not purely NFC, there is no change in the validation. I turned as many pointer to DataLayout to references, this helped figuring out all the places where a nullptr could come up. I had initially a local version of this patch broken into over 30 independant, commits but some later commit were cleaning the API and touching part of the code modified in the previous commits, so it seemed cleaner without the intermediate state. Test Plan: Reviewers: echristo Subscribers: llvm-commits From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 231740
* Make DataLayout Non-Optional in the ModuleMehdi Amini2015-03-041-7/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: DataLayout keeps the string used for its creation. As a side effect it is no longer needed in the Module. This is "almost" NFC, the string is no longer canonicalized, you can't rely on two "equals" DataLayout having the same string returned by getStringRepresentation(). Get rid of DataLayoutPass: the DataLayout is in the Module The DataLayout is "per-module", let's enforce this by not duplicating it more than necessary. One more step toward non-optionality of the DataLayout in the module. Make DataLayout Non-Optional in the Module Module->getDataLayout() will never returns nullptr anymore. Reviewers: echristo Subscribers: resistor, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D7992 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 231270
* Re-sort #include lines using my handy dandy ./utils/sort_includes.pyChandler Carruth2015-02-131-2/+2
| | | | | | script. This is in preparation for changes to lots of include lines. llvm-svn: 229088
* [PowerPC] Handle loop predecessor invokesHal Finkel2015-02-071-4/+12
| | | | | | | | | | If a loop predecessor has an invoke as its terminator, and the return value from that invoke is used to determine the loop iteration space, then we can't insert a computation based on that value in the loop predecessor prior to the terminator (oops). If there's such an invoke, or just no predecessor for that matter, insert a new loop preheader. llvm-svn: 228488
* Make helper functions/classes/globals static. NFC.Benjamin Kramer2015-02-061-1/+1
| | | | llvm-svn: 228410
* [PowerPC] Prepare loops for pre-increment loads/storesHal Finkel2015-02-051-0/+374
PowerPC supports pre-increment load/store instructions (except for Altivec/VSX vector load/stores). Using these on embedded cores can be very important, but most loops are not naturally set up to use them. We can often change that, however, by placing loops into a non-canonical form. Generically, this means transforming loops like this: for (int i = 0; i < n; ++i) array[i] = c; to look like this: T *p = array[-1]; for (int i = 0; i < n; ++i) *++p = c; the key point is that addresses accessed are pulled into dedicated PHIs and "pre-decremented" in the loop preheader. This allows the use of pre-increment load/store instructions without loop peeling. A target-specific late IR-level pass (running post-LSR), PPCLoopPreIncPrep, is introduced to perform this transformation. I've used this code out-of-tree for generating code for the PPC A2 for over a year. Somewhat to my surprise, running the test suite + externals on a P7 with this transformation enabled showed no performance regressions, and one speedup: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk -2.32514% +/- 1.03736% So I'm going to enable it on everything for now. I was surprised by this because, on the POWER cores, these pre-increment load/store instructions are cracked (and, thus, harder to schedule effectively). But seeing no regressions, and feeling that it is generally easier to split instructions apart late than it is to combine them late, this might be the better approach regardless. In the future, we might want to integrate this functionality into LSR (but currently LSR does not create new PHI nodes, so (for that and other reasons) significant work would need to be done). llvm-svn: 228328
OpenPOWER on IntegriCloud