summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms/Scalar/SROA.cpp
Commit message (Collapse)AuthorAgeFilesLines
* Teach the AllocaPromoter which is wrapped around the SSAUpdaterChandler Carruth2013-07-291-15/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | infrastructure to do promotion without a domtree the same smarts about looking through GEPs, bitcasts, etc., that I just taught mem2reg about. This way, if SROA chooses to promote an alloca which still has some noisy instructions this code can cope with them. I've not used as principled of an approach here for two reasons: 1) This code doesn't really need it as we were already set up to zip through the instructions used by the alloca. 2) I view the code here as more of a hack, and hopefully a temporary one. The SSAUpdater path in SROA is a real sore point for me. It doesn't make a lot of architectural sense for many reasons: - We're likely to end up needing the domtree anyways in a subsequent pass, so why not compute it earlier and use it. - In the future we'll likely end up needing the domtree for parts of the inliner itself. - If we need to we could teach the inliner to preserve the domtree. Part of the re-work of the pass manager will allow this to be very powerful even in large SCCs with many functions. - Ultimately, computing a domtree has gotten significantly faster since the original SSAUpdater-using code went into ScalarRepl. We no longer use domfrontiers, and much of domtree is lazily done based on queries rather than eagerly. - At this point keeping the SSAUpdater-based promotion saves a total of 0.7% on a build of the 'opt' tool for me. That's not a lot of performance given the complexity! So I'm leaving this a bit ugly in the hope that eventually we just remove all of this nonsense. I can't even readily test this because this code isn't reachable except through SROA. When I re-instate the patch that fast-tracks allocas already suitable for promotion, I'll add a testcase there that failed before this change. Before that, SROA will fix any test case I give it. llvm-svn: 187347
* Temporarily revert r187323 until I update SSAUpdater to match mem2reg.Chandler Carruth2013-07-281-81/+12
| | | | | | I forgot that we had two totally independent things here. :: sigh :: llvm-svn: 187327
* Now that mem2reg understands how to cope with a slightly wider set ofChandler Carruth2013-07-281-12/+81
| | | | | | | | | | | | uses of an alloca, we can pre-compute promotability while analyzing an alloca for splitting in SROA. That lets us short-circuit the common case of a bunch of trivially promotable allocas. This cuts 20% to 30% off the run time of SROA for typical frontend-generated IR sequneces I'm seeing. It gets the new SROA to within 20% of ScalarRepl for such code. My current benchmark for these numbers is PR15412, but it fits the general pattern of IR emitted by Clang so it should be widely applicable. llvm-svn: 187323
* Thread DataLayout through the callers and into mem2reg. This will beChandler Carruth2013-07-281-1/+1
| | | | | | | useful in a subsequent patch, but causes an unfortunate amount of noise, so I pulled it out into a separate patch. llvm-svn: 187322
* Don't use all the #ifdefs to hide the stats counters and instead rely onChandler Carruth2013-07-271-18/+0
| | | | | | | | | their being optimized out in debug mode. Realistically, this just isn't going to be the slow part anyways. This also fixes unused variable warnings that are breaking LLD build bots. =/ I didn't see these at first, and kept losing track of the fact that they were broken. llvm-svn: 187297
* Fix a problem I introduced in r187029 where we would over-eagerlyChandler Carruth2013-07-241-3/+9
| | | | | | | | schedule an alloca for another iteration in SROA. This only showed up with a mixture of promotable and unpromotable selects and phis. Added a test case for this. llvm-svn: 187031
* Fix PR16687 where we were incorrectly promoting an alloca that hadChandler Carruth2013-07-241-12/+29
| | | | | | | | | | | | | | | | | | | | | | pending speculation for a phi node. The problem here is that we were using growth of the specluation set as an indicator of whether speculation would occur, and if the phi node is already in the set we don't see it grow. This is a symptom of the fact that this signal is a total hack. Unfortunately, I couldn't really come up with a non-hacky way of signaling that promotion remains valid *after* speculation occurs, such that we only speculate when all else looks good for promotion. In the end, I went with at least a much more explicit approach of doing the work of queuing inside the phi and select processing and setting a preposterously named flag to convey that we're in the special state of requiring speculating before promotion. Thanks to Richard Trieu and Nick Lewycky for the excellent work reducing a testcase for this from a pretty giant, nasty assert in a big application. =] The testcase was excellent. llvm-svn: 187029
* Remove extraneous null statement. No functionality change!Nick Lewycky2013-07-221-1/+1
| | | | llvm-svn: 186893
* OldPtr is llvm::Instruction. Remove unneeded cast<>.Jakub Staszak2013-07-221-1/+1
| | | | llvm-svn: 186880
* SROA: Microoptimization: Remove dead entries first, then sort.Benjamin Kramer2013-07-201-9/+4
| | | | | | While there replace an explicit struct with std::mem_fun. llvm-svn: 186761
* Cleanup the stats counters for the new implementation. These actuallyChandler Carruth2013-07-191-12/+36
| | | | | | count the right things and have the right names. llvm-svn: 186667
* Fix another assert failure very similar to PR16651's test case. ThisChandler Carruth2013-07-191-0/+2
| | | | | | | test case came from Benjamin and found the parallel bug in the vector promotion code. llvm-svn: 186666
* Try to move to a more reasonable set of naming conventions given the newChandler Carruth2013-07-191-322/+305
| | | | | | | | | | | | | | | | implementation of the SROA algorithm. We were using the term 'partition' in many places that no longer ever represented an actual partition, but rather just an arbitrary slice of an alloca. No functionality change intended here. Mostly just renaming of types, functions, variables, and rewording of comments. Several comments were rewritten to make a lot more sense in the new structure of things. The stats are still weird and not reflective of how this really works. I'll fix those up in a separate patch as it is a touch more semantic of a change... llvm-svn: 186659
* A long overdue cleanup in SROA to use 'DL' instead of 'TD' for theChandler Carruth2013-07-191-123/+123
| | | | | | DataLayout variables. llvm-svn: 186656
* Fix PR16651, an assert introduced in my recent re-work of the innards ofChandler Carruth2013-07-191-9/+3
| | | | | | | | | | | | | | SROA. The crux of the issue is that now we track uses of a partition of the alloca in two places: the iterators over the partitioning uses and the previously collected split uses vector. We weren't accounting for the fact that the split uses might invalidate integer widening in ways other than due to their width (in this case due to being volatile). Further reduced testcase added to the tests. llvm-svn: 186655
* Reapply r186316 with a fix for one bug where the code could walk off theChandler Carruth2013-07-181-1255/+976
| | | | | | | | | | | | end of a vector. This was found with ASan. I've had one other report of a crasher, but thus far been unable to reproduce the crash. It may well be fixed with this version, and if not I'd like to get more information from the build bots about what is happening. See r186316 for the full commit log for the new implementation of the SROA algorithm. llvm-svn: 186565
* Revert r186316 while I track down an ASan failure and an assert fromChandler Carruth2013-07-151-972/+1255
| | | | | | | | | | | a bot. This reverts the commit which introduced a new implementation of the fancy SROA pass designed to reduce its overhead. I'll skip the huge commit log here, refer to r186316 if you're looking for how this all works and why it works that way. llvm-svn: 186332
* Reimplement SROA yet again. Same fundamental principle, but a totallyChandler Carruth2013-07-151-1255/+972
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | different core implementation strategy. Previously, SROA would build a relatively elaborate partitioning of an alloca, associate uses with each partition, and then rewrite the uses of each partition in an attempt to break apart the alloca into chunks that could be promoted. This was very wasteful in terms of memory and compile time because regardless of how complex the alloca or how much we're able to do in breaking it up, all of the datastructure work to analyze the partitioning was done up front. The new implementation attempts to form partitions of the alloca lazily and on the fly, rewriting the uses that make up that partition as it goes. This has a few significant effects: 1) Much simpler data structures are used throughout. 2) No more double walk of the recursive use graph of the alloca, only walk it once. 3) No more complex algorithms for associating a particular use with a particular partition. 4) PHI and Select speculation is simplified and happens lazily. 5) More precise information is available about a specific use of the alloca, removing the need for some side datastructures. Ultimately, I think this is a much better implementation. It removes about 300 lines of code, but arguably removes more like 500 considering that some code grew in the process of being factored apart and cleaned up for this all to work. I've re-used as much of the old implementation as possible, which includes the lion's share of code in the form of the rewriting logic. The interesting new logic centers around how the uses of a partition are sorted, and split into actual partitions. Each instruction using a pointer derived from the alloca gets a 'Partition' entry. This name is totally wrong, but I'll do a rename in a follow-up commit as there is already enough churn here. The entry describes the offset range accessed and the nature of the access. Once we have all of these entries we sort them in a very specific way: increasing order of begin offset, followed by whether they are splittable uses (memcpy, etc), followed by the end offset or whatever. Sorting by splittability is important as it simplifies the collection of uses into a partition. Once we have these uses sorted, we walk from the beginning to the end building up a range of uses that form a partition of the alloca. Overlapping unsplittable uses are merged into a single partition while splittable uses are broken apart and carried from one partition to the next. A partition is also introduced to bridge splittable uses between the unsplittable regions when necessary. I've looked at the performance PRs fairly closely. PR15471 no longer will even load (the module is invalid). Not sure what is up there. PR15412 improves by between 5% and 10%, however it is nearly impossible to know what is holding it up as SROA (the entire pass) takes less time than reading the IR for that test case. The analysis takes the same time as running mem2reg on the final allocas. I suspect (without much evidence) that the new implementation will scale much better however, and it is just the small nature of the test cases that makes the changes small and noisy. Either way, it is still simpler and cleaner I think. llvm-svn: 186316
* Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid ↵Craig Topper2013-07-031-2/+2
| | | | | | specifying the vector size. llvm-svn: 185540
* Fix SROA to avoid unnecessary scalar conversions for 1-element vectors.Bob Wilson2013-06-251-15/+16
| | | | | | | | | | | When a 1-element vector alloca is promoted, a store instruction can often be rewritten without converting the value to a scalar and using an insertelement instruction to stuff it into the new alloca. This patch just adds a check to skip that conversion when it is unnecessary. This turns out to be really important for some ARM Neon operations where <1 x i64> is used to get around the fact that i64 is not a legal type. llvm-svn: 184870
* SROA: Generate selects instead of shuffles when blending values because this ↵Nadav Rotem2013-05-011-8/+6
| | | | | | | | is the cannonical form. Shuffles are more difficult to lower and we usually don't touch them, while we do optimize selects more often. llvm-svn: 180875
* SROA: Don't crash on a select with two identical operands.Benjamin Kramer2013-04-211-8/+8
| | | | | | | This is an edge case that can happen if we modify a chain of multiple selects. Update all operands in that case and remove the assert. PR15805. llvm-svn: 179982
* Fix PR15674 (and PR15603): a SROA think-o.Chandler Carruth2013-04-071-0/+1
| | | | | | | | | | | | | | The fix for PR14972 in r177055 introduced a real think-o in the *store* side, likely because I was much more focused on the load side. While we can arbitrarily widen (or narrow) a loaded value, we can't arbitrarily widen a value to be stored, as that changes the width of memory access! Lock down the code path in the store rewriting which would do this to only handle the intended circumstance. All of the existing tests continue to pass, and I've added a test from the PR. llvm-svn: 178974
* Minor cleanups. No functionality change.Jakub Staszak2013-03-241-6/+7
| | | | llvm-svn: 177837
* [SROA] Prefix names using a custom IRBuilder inserter.Chandler Carruth2013-03-211-88/+108
| | | | | | | | | | | | | | | | | | | The key part of this is ensuring that name prefixes remain in a Twine form until we get to a point where we can nuke them under NDEBUG. This is tricky using the old APIs as they played fast and loose with Twine, which is prone to serious error. The inserter is much cleaner as it is actually in the call stack leading to the setName call, and so has a good opportunity to prepend the prefix. This matters more than you might imagine because most runs over an alloca find a single partition, and rewrite 3 or 4 instructions referring to it. As a consequence doing this lazily and exclusively with Twine allows the optimizer to delete more of it and shaves another 2% to 3% off of the release build's SROA run time for PR15412. I also think the APIs are cleaner, and the use of Twine is more reliable, so I consider it a win-win despite the churn required to reach this state. llvm-svn: 177631
* Fix a silly search-and-replace goof with r177495 that only brokeChandler Carruth2013-03-201-1/+1
| | | | | | non-release builds. llvm-svn: 177498
* [SROA] Don't preserve the IR names in release builds.Chandler Carruth2013-03-201-28/+37
| | | | | | | | | | | This is espcially important because the new SROA pass goes to great lengths to provide helpful names for debugging, and as a consequence they can become very slow to render. Good for between 5% and 15% of the SROA runtime on some slow test cases such as the one in PR15412. llvm-svn: 177495
* Move the endif to the correct line so we don't have warnings aboutChandler Carruth2013-03-201-1/+1
| | | | | | unused statistics variables. llvm-svn: 177494
* Introduce some new statistics to help track the exact behavior of theChandler Carruth2013-03-201-4/+20
| | | | | | new SROA pass. llvm-svn: 177493
* Mark internal classes as POD-like to get better behavior out ofChandler Carruth2013-03-181-102/+109
| | | | | | | | SmallVector and DenseMap. This speeds up SROA by 25% on PR15412. llvm-svn: 177259
* PR14972: SROA vs. GVN exposed a really bad bug in SROA.Chandler Carruth2013-03-141-117/+124
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fundamental problem is that SROA didn't allow for overly wide loads where the bits past the end of the alloca were masked away and the load was sufficiently aligned to ensure there is no risk of page fault, or other trapping behavior. With such widened loads, SROA would delete the load entirely rather than clamping it to the size of the alloca in order to allow mem2reg to fire. This was exposed by a test case that neatly arranged for GVN to run first, widening certain loads, followed by an inline step, and then SROA which miscompiles the code. However, I see no reason why this hasn't been plaguing us in other contexts. It seems deeply broken. Diagnosing all of the above took all of 10 minutes of debugging. The really annoying aspect is that fixing this completely breaks the pass. ;] There was an implicit reliance on the fact that no loads or stores extended past the alloca once we decided to rewrite them in the final stage of SROA. This was used to encode information about whether the loads and stores had been split across multiple partitions of the original alloca. That required threading explicit tracking of whether a *use* of a partition is split across multiple partitions. Once that was done, another problem arose: we allowed splitting of integer loads and stores iff they were loads and stores to the entire alloca. This is a really arbitrary limitation, and splitting at least some integer loads and stores is crucial to maximize promotion opportunities. My first attempt was to start removing the restriction entirely, but currently that does Very Bad Things by causing *many* common alloca patterns to be fully decomposed into i8 operations and lots of or-ing together to produce larger integers on demand. The code bloat is terrifying. That is still the right end-goal, but substantial work must be done to either merge partitions or ensure that small i8 values are eagerly merged in some other pass. Sadly, figuring all this out took essentially all the time and effort here. So the end result is that we allow splitting only when the load or store at least covers the alloca. That ensures widened loads and stores don't hurt SROA, and that we don't rampantly decompose operations more than we have previously. All of this was already fairly well tested, and so I've just updated the tests to cover the wide load behavior. I can add a test that crafts the pass ordering magic which caused the original PR, but that seems really brittle and to provide little benefit. The fundamental problem is that widened loads should Just Work. llvm-svn: 177055
* Keep coding stanard.Jakub Staszak2013-03-071-4/+3
| | | | llvm-svn: 176661
* Don't create IRBuilder if we can return from the method earlier.Jakub Staszak2013-03-071-2/+2
| | | | llvm-svn: 176660
* Remove unused variable.Jakub Staszak2013-02-191-2/+1
| | | | llvm-svn: 175568
* Minor cleanups. No functionality change.Jakub Staszak2013-02-191-10/+7
| | | | llvm-svn: 175567
* Remove unneeded #includes.Jakub Staszak2013-02-191-2/+0
| | | | llvm-svn: 175565
* Fix typos.Jakub Staszak2013-02-191-10/+10
| | | | llvm-svn: 175562
* Fixing warnings revealed by gcc release buildEdwin Vane2013-01-291-0/+1
| | | | | | | Fixed set-but-not-used warnings. Reviewer: gribozavr llvm-svn: 173810
* Move all of the header files which are involved in modelling the LLVM IRChandler Carruth2013-01-021-10/+10
| | | | | | | | | | | | | | | | | | | | | into their new header subdirectory: include/llvm/IR. This matches the directory structure of lib, and begins to correct a long standing point of file layout clutter in LLVM. There are still more header files to move here, but I wanted to handle them in separate commits to make tracking what files make sense at each layer easier. The only really questionable files here are the target intrinsic tablegen files. But that's a battle I'd rather not fight today. I've updated both CMake and Makefile build systems (I think, and my tests think, but I may have missed something). I've also re-sorted the includes throughout the project. I'll be committing updates to Clang, DragonEgg, and Polly momentarily. llvm-svn: 171366
* Add IRBuilder::CreateVectorSplat and use it to simplify code.Benjamin Kramer2013-01-011-12/+1
| | | | llvm-svn: 171349
* SROA: Clean up unused assignment warnings from clang's analyzer.Benjamin Kramer2013-01-011-5/+4
| | | | | | No functionality change. llvm-svn: 171348
* convert a bunch of callers from DataLayout::getIndexedOffset() to ↵Nuno Lopes2012-12-301-39/+1
| | | | | | | | | GEP::accumulateConstantOffset(). The later API is nicer than the former, and is correct regarding wrap-around offsets (if anyone cares). There are a few more places left with duplicated code, which I'll remove soon. llvm-svn: 171259
* SROA: Replace calls to getScalarSizeInBits to DataLayout's API becauseNadav Rotem2012-12-181-6/+6
| | | | | | getScalarSizeInBits could not handle vectors of pointers. llvm-svn: 170412
* Fix another SROA crasher, PR14601.Chandler Carruth2012-12-171-1/+1
| | | | | | | | This was a silly oversight, we weren't pruning allocas which were used by variable-length memory intrinsics from the set that could be widened and promoted as integers. Fix that. llvm-svn: 170353
* Teach the rewriting of memcpy calls to support subvector copies.Chandler Carruth2012-12-171-40/+41
| | | | | | | | | | | | | | | | | | This also cleans up a bit of the memcpy call rewriting by sinking some irrelevant code further down and making the call-emitting code a bit more concrete. Previously, memcpy of a subvector would actually miscompile (!!!) the copy into a single vector element copy. I have no idea how this ever worked. =/ This is the memcpy half of PR14478 which we probably weren't noticing previously because it didn't actually assert. The rewrite relies on the newly refactored insert- and extractVector functions to do the heavy lifting, and those are the same as used for loads and stores which makes the test coverage a bit more meaningful here. llvm-svn: 170338
* Fix a secondary bug I introduced while fixing the first part of PR14478.Chandler Carruth2012-12-171-6/+2
| | | | | | | | | | | | The first half of fixing this bug was actually in r170328, but was entirely coincidental. It did however get me to realize the nature of the bug, and adapt the test case to test more interesting behavior. In turn, that uncovered the rest of the bug which I've fixed here. This should fix two new asserts that showed up in the vectorize nightly tester. llvm-svn: 170333
* Hoist a convertValue call to the two paths where it is needed.Chandler Carruth2012-12-171-3/+4
| | | | | | | | I noticed this while looking at r170328. We only ever do a vector rewrite when the alloca *is* the vector type, so it's good to not paper over bugs here by doing a convertValue that isn't needed. llvm-svn: 170331
* Hoist the insertVector helper to be a static helper.Chandler Carruth2012-12-171-49/+62
| | | | | | | | | | | | | | | This will allow its use inside of memcpy rewriting as well. This routine is more complex than extractVector, and some of its uses are not 100% where I want them to be so there is still some work to do here. While this can technically change the output in some cases, it shouldn't be a change that matters -- IE, it can leave some dead code lying around that prior versions did not, etc. Yet another step in the refactorings leading up to the solution to the last component of PR14478. llvm-svn: 170328
* Lift the extractVector helper all the way out to a static helper function.Chandler Carruth2012-12-171-30/+32
| | | | | | | | | | | | | | | | | The method helpers all implicitly act upon the alloca, and what we really want is a fully generic helper. Doing memcpy rewrites is more special than all other rewrites because we are at times rewriting instructions which touch pointers *other* than the alloca. As a consequence all of the helpers needed by memcpy rewriting of sub-vector copies will need to be generalized fully. Note that all of these helpers ({insert,extract}{Integer,Vector}) are woefully uncommented. I'm going to go back through and document them once I get the factoring correct. No functionality changed. llvm-svn: 170325
* Factor the vector load rewriting into a more generic form.Chandler Carruth2012-12-171-16/+27
| | | | | | | | | This makes it suitable for use in rewriting memcpy in the presence of subvector memcpy intrinsics. No functionality changed. llvm-svn: 170324
OpenPOWER on IntegriCloud