path: root/llvm/test/Transforms/SROA
Commit message | Author | Age | Files | Lines
...
* [tests] Cleanup initialization of test suffixes.Daniel Dunbar2013-08-161-1/+0
| - Instead of setting the suffixes in a bunch of places, just set one master list in the top-level config. We now only modify the suffix list in a few suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).
| - Aside from removing the need for a bunch of lit.local.cfg files, this enables 4 tests that were inadvertently being skipped (one in Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been XFAILED).
| - This commit also fixes a bunch of config files to use config.root instead of older copy-pasted code.
| llvm-svn: 188513
* Fix a problem I introduced in r187029 where we would over-eagerlyChandler Carruth2013-07-241-0/+37
| | | | | | | | schedule an alloca for another iteration in SROA. This only showed up with a mixture of promotable and unpromotable selects and phis. Added a test case for this. llvm-svn: 187031
* Fix PR16687 where we were incorrectly promoting an alloca that hadChandler Carruth2013-07-241-0/+37
| | | | | | | | | | | | | | | | | | | | | | pending speculation for a phi node. The problem here is that we were using growth of the speculation set as an indicator of whether speculation would occur, and if the phi node is already in the set we don't see it grow. This is a symptom of the fact that this signal is a total hack. Unfortunately, I couldn't really come up with a non-hacky way of signaling that promotion remains valid *after* speculation occurs, such that we only speculate when all else looks good for promotion. In the end, I went with at least a much more explicit approach of doing the work of queuing inside the phi and select processing and setting a preposterously named flag to convey that we're in the special state of requiring speculating before promotion. Thanks to Richard Trieu and Nick Lewycky for the excellent work reducing a testcase for this from a pretty giant, nasty assert in a big application. =] The testcase was excellent. llvm-svn: 187029
* Fix another assert failure very similar to PR16651's test case. ThisChandler Carruth2013-07-191-2/+22
| | | | | | | test case came from Benjamin and found the parallel bug in the vector promotion code. llvm-svn: 186666
* Fix PR16651, an assert introduced in my recent re-work of the innards ofChandler Carruth2013-07-191-0/+20
| | | | | | | | | | | | | | SROA. The crux of the issue is that now we track uses of a partition of the alloca in two places: the iterators over the partitioning uses and the previously collected split uses vector. We weren't accounting for the fact that the split uses might invalidate integer widening in ways other than due to their width (in this case due to being volatile). Further reduced testcase added to the tests. llvm-svn: 186655
* Reapply r186316 with a fix for one bug where the code could walk off theChandler Carruth2013-07-181-1/+1
| | | | | | | | | | | | end of a vector. This was found with ASan. I've had one other report of a crasher, but thus far been unable to reproduce the crash. It may well be fixed with this version, and if not I'd like to get more information from the build bots about what is happening. See r186316 for the full commit log for the new implementation of the SROA algorithm. llvm-svn: 186565
* Revert r186316 while I track down an ASan failure and an assert fromChandler Carruth2013-07-151-1/+1
| | | | | | | | | | | a bot. This reverts the commit which introduced a new implementation of the fancy SROA pass designed to reduce its overhead. I'll skip the huge commit log here, refer to r186316 if you're looking for how this all works and why it works that way. llvm-svn: 186332
* Reimplement SROA yet again. Same fundamental principle, but a totallyChandler Carruth2013-07-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | different core implementation strategy. Previously, SROA would build a relatively elaborate partitioning of an alloca, associate uses with each partition, and then rewrite the uses of each partition in an attempt to break apart the alloca into chunks that could be promoted. This was very wasteful in terms of memory and compile time because regardless of how complex the alloca or how much we're able to do in breaking it up, all of the datastructure work to analyze the partitioning was done up front. The new implementation attempts to form partitions of the alloca lazily and on the fly, rewriting the uses that make up that partition as it goes. This has a few significant effects: 1) Much simpler data structures are used throughout. 2) No more double walk of the recursive use graph of the alloca, only walk it once. 3) No more complex algorithms for associating a particular use with a particular partition. 4) PHI and Select speculation is simplified and happens lazily. 5) More precise information is available about a specific use of the alloca, removing the need for some side datastructures. Ultimately, I think this is a much better implementation. It removes about 300 lines of code, but arguably removes more like 500 considering that some code grew in the process of being factored apart and cleaned up for this all to work. I've re-used as much of the old implementation as possible, which includes the lion's share of code in the form of the rewriting logic. The interesting new logic centers around how the uses of a partition are sorted, and split into actual partitions. Each instruction using a pointer derived from the alloca gets a 'Partition' entry. This name is totally wrong, but I'll do a rename in a follow-up commit as there is already enough churn here. The entry describes the offset range accessed and the nature of the access. 
Once we have all of these entries we sort them in a very specific way: increasing order of begin offset, followed by whether they are splittable uses (memcpy, etc), followed by the end offset or whatever. Sorting by splittability is important as it simplifies the collection of uses into a partition. Once we have these uses sorted, we walk from the beginning to the end building up a range of uses that form a partition of the alloca. Overlapping unsplittable uses are merged into a single partition while splittable uses are broken apart and carried from one partition to the next. A partition is also introduced to bridge splittable uses between the unsplittable regions when necessary. I've looked at the performance PRs fairly closely. PR15471 no longer will even load (the module is invalid). Not sure what is up there. PR15412 improves by between 5% and 10%, however it is nearly impossible to know what is holding it up as SROA (the entire pass) takes less time than reading the IR for that test case. The analysis takes the same time as running mem2reg on the final allocas. I suspect (without much evidence) that the new implementation will scale much better however, and it is just the small nature of the test cases that makes the changes small and noisy. Either way, it is still simpler and cleaner I think. llvm-svn: 186316
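The sorting and partition-forming steps described above can be sketched in a few lines. This is a minimal Python model, not the C++ code in the pass: the `Use` record, the tie-break directions (unsplittable before splittable, larger end first), and the `"bridge"` label are all assumptions made for illustration, and the model only reports the resulting byte ranges rather than rewriting anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Use:
    """Hypothetical stand-in for one use entry: a byte range plus a flag."""
    begin: int
    end: int          # exclusive
    splittable: bool  # e.g. memcpy/memset-style uses


def sort_key(u):
    # Begin offset first, then splittability, then end offset.
    # The directions of the last two criteria are assumptions.
    return (u.begin, u.splittable, -u.end)


def merge(ranges):
    """Merge strictly overlapping [begin, end) ranges."""
    out = []
    for b, e in sorted(ranges):
        if out and b < out[-1][1]:
            out[-1][1] = max(out[-1][1], e)
        else:
            out.append([b, e])
    return out


def form_partitions(uses):
    ordered = sorted(uses, key=sort_key)
    # Overlapping unsplittable uses collapse into one partition each.
    solid = merge((u.begin, u.end) for u in ordered if not u.splittable)
    loose = merge((u.begin, u.end) for u in ordered if u.splittable)
    parts = [(b, e, "unsplittable") for b, e in solid]
    # Splittable uses are cut at unsplittable regions; what is left over
    # becomes a bridging partition between them.
    for b, e in loose:
        cur = b
        for sb, se in solid:
            if se <= cur or sb >= e:
                continue
            if sb > cur:
                parts.append((cur, sb, "bridge"))
            cur = max(cur, se)
        if cur < e:
            parts.append((cur, e, "bridge"))
    return sorted(parts)


uses = [
    Use(12, 16, False),  # store to the tail
    Use(0, 16, True),    # memcpy over the whole alloca
    Use(2, 8, False),    # overlapping load
    Use(0, 4, False),    # load from the head
]
partitions = form_partitions(uses)
```

With these four uses, the two overlapping loads merge into `[0, 8)`, the store keeps `[12, 16)`, and the memcpy contributes only the bridging range `[8, 12)`.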
* Update Transforms tests to use CHECK-LABEL for easier debugging. No ↵Stephen Lin2013-07-146-70/+70
| functionality change. This update was done with the following bash script:
|
|   find test/Transforms -name "*.ll" | \
|   while read NAME; do
|     echo "$NAME"
|     if ! grep -q "^; *RUN: *llc" $NAME; then
|       TEMP=`mktemp -t temp`
|       cp $NAME $TEMP
|       sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
|       while read FUNC; do
|         sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3@$FUNC(/g" $TEMP
|       done
|       mv $TEMP $NAME
|     fi
|   done
|
| llvm-svn: 186268
* SROA: Generate selects instead of shuffles when blending values because this ↵Nadav Rotem2013-05-011-14/+14
| | | | | | is the canonical form. Shuffles are more difficult to lower and we usually don't touch them, while we do optimize selects more often. llvm-svn: 180875
* SROA: Don't crash on a select with two identical operands.Benjamin Kramer2013-04-211-0/+11
| | | | | | | This is an edge case that can happen if we modify a chain of multiple selects. Update all operands in that case and remove the assert. PR15805. llvm-svn: 179982
* Fix PR15674 (and PR15603): a SROA think-o.Chandler Carruth2013-04-071-0/+63
| | | | | | | | | | | | | | The fix for PR14972 in r177055 introduced a real think-o in the *store* side, likely because I was much more focused on the load side. While we can arbitrarily widen (or narrow) a loaded value, we can't arbitrarily widen a value to be stored, as that changes the width of memory access! Lock down the code path in the store rewriting which would do this to only handle the intended circumstance. All of the existing tests continue to pass, and I've added a test from the PR. llvm-svn: 178974
* PR14972: SROA vs. GVN exposed a really bad bug in SROA.Chandler Carruth2013-03-142-16/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fundamental problem is that SROA didn't allow for overly wide loads where the bits past the end of the alloca were masked away and the load was sufficiently aligned to ensure there is no risk of page fault, or other trapping behavior. With such widened loads, SROA would delete the load entirely rather than clamping it to the size of the alloca in order to allow mem2reg to fire. This was exposed by a test case that neatly arranged for GVN to run first, widening certain loads, followed by an inline step, and then SROA which miscompiles the code. However, I see no reason why this hasn't been plaguing us in other contexts. It seems deeply broken. Diagnosing all of the above took all of 10 minutes of debugging. The really annoying aspect is that fixing this completely breaks the pass. ;] There was an implicit reliance on the fact that no loads or stores extended past the alloca once we decided to rewrite them in the final stage of SROA. This was used to encode information about whether the loads and stores had been split across multiple partitions of the original alloca. That required threading explicit tracking of whether a *use* of a partition is split across multiple partitions. Once that was done, another problem arose: we allowed splitting of integer loads and stores iff they were loads and stores to the entire alloca. This is a really arbitrary limitation, and splitting at least some integer loads and stores is crucial to maximize promotion opportunities. My first attempt was to start removing the restriction entirely, but currently that does Very Bad Things by causing *many* common alloca patterns to be fully decomposed into i8 operations and lots of or-ing together to produce larger integers on demand. The code bloat is terrifying. 
That is still the right end-goal, but substantial work must be done to either merge partitions or ensure that small i8 values are eagerly merged in some other pass. Sadly, figuring all this out took essentially all the time and effort here. So the end result is that we allow splitting only when the load or store at least covers the alloca. That ensures widened loads and stores don't hurt SROA, and that we don't rampantly decompose operations more than we have previously. All of this was already fairly well tested, and so I've just updated the tests to cover the wide load behavior. I can add a test that crafts the pass ordering magic which caused the original PR, but that seems really brittle and to provide little benefit. The fundamental problem is that widened loads should Just Work. llvm-svn: 177055
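The rule this commit settles on can be written as two small predicates. The sketch below is a Python model with hypothetical function names; offsets are in bytes relative to the start of the alloca, and nothing here is taken from the actual C++ rewriter.

```python
def clamp_to_alloca(begin, end, alloca_size):
    """Clamp an overly wide access [begin, end) to the alloca's bytes,
    rather than dropping the access entirely."""
    return max(begin, 0), min(end, alloca_size)


def may_split_integer_access(begin, end, alloca_size):
    """Splitting is allowed only when the load or store covers at least
    the whole alloca (it may extend past the end if it was widened)."""
    return begin <= 0 and end >= alloca_size


# An i64 load widened by an earlier pass over a 4-byte alloca.
widened = clamp_to_alloca(0, 8, 4)
```

Under this model a widened 8-byte load of a 4-byte alloca is clamped to `(0, 4)` and is still eligible for splitting, while a 2-byte load of the same alloca is not.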
* Rename the test so that we can add additional vectors-of-pointers testsNadav Rotem2012-12-181-0/+0
| | | | | | into the same file in the future. llvm-svn: 170414
* SROA: Replace calls to getScalarSizeInBits to DataLayout's API becauseNadav Rotem2012-12-181-0/+25
| | | | | | getScalarSizeInBits could not handle vectors of pointers. llvm-svn: 170412
* Fix another SROA crasher, PR14601.Chandler Carruth2012-12-171-0/+15
| | | | | | | | This was a silly oversight, we weren't pruning allocas which were used by variable-length memory intrinsics from the set that could be widened and promoted as integers. Fix that. llvm-svn: 170353
* Teach the rewriting of memcpy calls to support subvector copies.Chandler Carruth2012-12-171-0/+48
| | | | | | | | | | | | | | | | | | This also cleans up a bit of the memcpy call rewriting by sinking some irrelevant code further down and making the call-emitting code a bit more concrete. Previously, memcpy of a subvector would actually miscompile (!!!) the copy into a single vector element copy. I have no idea how this ever worked. =/ This is the memcpy half of PR14478 which we probably weren't noticing previously because it didn't actually assert. The rewrite relies on the newly refactored insert- and extractVector functions to do the heavy lifting, and those are the same as used for loads and stores which makes the test coverage a bit more meaningful here. llvm-svn: 170338
* Fix a secondary bug I introduced while fixing the first part of PR14478.Chandler Carruth2012-12-171-17/+17
| | | | | | | | | | | | The first half of fixing this bug was actually in r170328, but was entirely coincidental. It did however get me to realize the nature of the bug, and adapt the test case to test more interesting behavior. In turn, that uncovered the rest of the bug which I've fixed here. This should fix two new asserts that showed up in the vectorize nightly tester. llvm-svn: 170333
* Fix the first part of PR14478: memset now works.Chandler Carruth2012-12-171-0/+35
| | | | | | | | | | | | | | | | | | | PR14478 highlights a serious problem in SROA that simply wasn't being exercised due to a lack of vector input code mixed with C-library function calls. Part of SROA was written carefully to handle subvector accesses via memset and memcpy, but the rewriter never grew support for this. Fixing it required refactoring the subvector access code in other parts of SROA so it could be shared, and then fixing the splat formation logic and using subvector insertion (this patch). The PR isn't quite fixed yet, as memcpy is still broken in the same way. I'm starting on that series of patches now. Hopefully this will be enough to bring the bullet benchmark back to life with the bb-vectorizer enabled, but that may require fixing memcpy as well. llvm-svn: 170301
* Add a corollary test for PR14572. We got this code path correct already.Chandler Carruth2012-12-151-2/+18
| | | | llvm-svn: 170271
* Relax an overly aggressive assert to fix PR14572.Chandler Carruth2012-12-151-0/+16
| | | | | | The alloca width is based on the alloc size, not the type size. llvm-svn: 170270
* Fix typo in test-case.Jakub Staszak2012-12-121-8/+8
| | | | llvm-svn: 170015
* Fix typo.Jakub Staszak2012-12-121-4/+4
| | | | llvm-svn: 170006
* Fix PR14548: SROA was crashing on a mixture of i1 and i8 loads and stores.Chandler Carruth2012-12-102-7/+31
| | | | | | | | | | | | | | | | | | | When SROA was evaluating a mixture of i1 and i8 loads and stores, in just a particular case, it would tickle a latent bug where we compared bits to bytes rather than bits to bits. As a consequence of the latent bug, we would allow integers through which were not byte-size multiples, a situation the later rewriting code was never intended to handle. In release builds this could trigger all manner of oddities, but the reported issue in PR14548 was forming invalid bitcast instructions. The only downside of this fix is that it makes it more clear that SROA in its current form is not capable of handling mixed i1 and i8 loads and stores. Sometimes with the previous code this would work by luck, but usually it would crash, so I'm not terribly worried. I'll watch the LNT numbers just to be sure. llvm-svn: 169719
* Fix typos in CHECK lines.Dmitri Gribenko2012-12-061-2/+2
| | | | | | Patch by Alexander Zinenko. llvm-svn: 169547
* SROA: Avoid struct and array types early to avoid creating an overly large ↵Benjamin Kramer2012-12-011-0/+13
| | | | | | | | | | integer type. Fixes PR14465. Differential Revision: http://llvm-reviews.chandlerc.com/D148 llvm-svn: 169084
* PR14055: Implement support for sub-vector operations in SROA.Chandler Carruth2012-11-211-2/+75
| | | | | | | | | | Now if we can transform an alloca into a single vector value, but it has subvector, non-element accesses, we form the appropriate shufflevectors to allow SROA to proceed. This fixes PR14055 which pointed out a very common pattern that SROA couldn't handle -- mixed vec3 and vec4 operations on a single alloca. llvm-svn: 168418
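The mixed vec3/vec4 pattern can be illustrated with a tiny model of `shufflevector`. This is a Python sketch under stated assumptions: lists stand in for vector values, `None` stands in for undefined lanes, and the helper names are hypothetical rather than the functions used in the pass.

```python
def shufflevector(a, b, mask):
    """Model of the IR instruction: pick lanes from the concatenation a ++ b."""
    both = list(a) + list(b)
    return [both[i] for i in mask]


def extract_subvector(vec, begin, end):
    """Read lanes [begin, end) of a wider vector, e.g. a vec3 out of a vec4."""
    return shufflevector(vec, vec, list(range(begin, end)))


def insert_subvector(vec, sub, begin):
    """Write a narrower vector into lanes [begin, begin + len(sub))."""
    n = len(vec)
    # The IR instruction needs two operands of equal width, so the
    # subvector is first padded out with undefined lanes.
    padded = list(sub) + [None] * (n - len(sub))
    mask = [n + (i - begin) if begin <= i < begin + len(sub) else i
            for i in range(n)]
    return shufflevector(vec, padded, mask)


vec4 = [1.0, 2.0, 3.0, 4.0]
vec3 = extract_subvector(vec4, 0, 3)
updated = insert_subvector(vec4, [9.0, 8.0, 7.0], 0)
```

Here `vec3` holds the first three lanes and `updated` keeps the untouched fourth lane, which is the behavior a vec3 store into a vec4 alloca needs.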
* Fix PR14132 and handle OOB loads speculated through PHI nodes.Chandler Carruth2012-11-201-0/+35
| | | | | | | | | | | | The issue is that we may end up with newly OOB loads when speculating a load into the predecessors of a PHI node, and this confuses the new integer splitting logic in some cases, triggering an assertion failure. In fact, the branch in question must be dead code as it loads from a too-narrow alloca. Add code to handle this gracefully and leave the requisite FIXMEs for both optimizing more aggressively and doing more to aid sanitizing invalid code which triggers these patterns. llvm-svn: 168361
* Rework the rewriting of loads and stores for vector and integer allocasChandler Carruth2012-11-203-26/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | to properly handle the combinations of these with split integer loads and stores. This essentially replaces Evan's r168227 by refactoring the code in a different way, and trying to mirror that refactoring in both the load and store sides of the rewriting. Generally speaking there was some really problematic duplicated code here that led to poorly founded assumptions and then subtle bugs. Now much of the code actually flows through and follows a more consistent style and logical path. There is still a tiny bit of duplication on the store side of things, but it is much less bad. This also changes the logic to never re-use a load or store instruction as that was simply too error prone in practice. I've added a few tests (one a reduction of the one in Evan's original patch, which happened to be the same as the report in PR14349). I'm going to look at adding a few more tests for things I found and fixed in passing (such as the volatile tests in the vectorizable predicate). This patch has survived bootstrap, and modulo one bugfix survived Duncan's test suite, but let me know if anything else explodes. llvm-svn: 168346
* Teach SROA rewriteVectorizedStoreInst to handle cases when the loaded value ↵Evan Cheng2012-11-171-0/+25
| | | | | | is narrower than the stored value. rdar://12713675 llvm-svn: 168227
* Fix PR14212: For some strange reason I treated vectors differently fromChandler Carruth2012-10-301-0/+15
| | | | | | | | | integers in that the code to handle split alloca-wide integer loads or stores doesn't come first. It should, for the same reasons as with integers, and the PR attests to that. Also had to fix a busted assert that this test case also covers. llvm-svn: 167051
* Teach SROA how to split whole-alloca integer loads and stores intoChandler Carruth2012-10-252-23/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | smaller integer loads and stores. The high-level motivation is that the frontend sometimes generates a single whole-alloca integer load or store during ABI lowering of splittable allocas. We need to be able to break this apart in order to see the underlying elements and properly promote them to SSA values. The hope is that this fixes some performance regressions on x86-32 with the new SROA pass. Unfortunately, this causes quite a bit of churn in the test cases, and bloats some IR that comes out. When we see an alloca that consists solely of bits and bytes being extracted and re-inserted, we now do some splitting first, before building widened integer "bucket of bits" representations. These are always well folded by instcombine however, so this shouldn't actually result in missed opportunities. If this splitting of all-integer allocas does cause problems (perhaps due to smaller SSA values going into the RA), we could potentially go to some extreme measures to only do this integer splitting trick when there are non-integer component accesses of an alloca, but discovering this is quite expensive: it adds yet another complete walk of the recursive use tree of the alloca. Either way, I will be watching build bots and LNT bots to see what fallout there is here. If anyone gets x86-32 numbers before & after this change, I would be very interested. llvm-svn: 166662
* This just in, it is a *bad idea* to use 'udiv' on an offset ofChandler Carruth2012-10-171-0/+20
| | | | | | | | | | | a pointer. A very bad idea. Let's not do that. Fixes PR14105. Note that this wasn't *that* glaring of an oversight. Originally, these routines were only called on offsets within an alloca, which are intrinsically positive. But over the evolution of the pass, they ended up being called for arbitrary offsets, and things went downhill... llvm-svn: 166095
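A short illustration of why unsigned division goes wrong once offsets can be negative. The sketch models 64-bit `udiv` and `sdiv` in Python; the helper names are hypothetical and the 64-bit width is an assumption about the pointer size.

```python
MASK64 = (1 << 64) - 1


def udiv64(a, b):
    """Unsigned 64-bit division: operands are reinterpreted as unsigned."""
    return (a & MASK64) // (b & MASK64)


def sdiv64(a, b):
    """Signed division that truncates toward zero, as 'sdiv' does."""
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q


offset = -8        # an offset *before* the base pointer
element_size = 4

signed_index = sdiv64(offset, element_size)
unsigned_index = udiv64(offset, element_size)
```

`signed_index` is the expected element index of -2, while `unsigned_index` is a huge positive number, because -8 is first reinterpreted as 2^64 - 8. For offsets inside an alloca, which are never negative, both operations agree, which is why the problem stayed hidden.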
* Update the memcpy rewriting to fully support widened int rewriting. ThisChandler Carruth2012-10-151-1/+5
| | | | | | | | includes extracting ints for copying elsewhere and inserting ints when copying into the alloca. This should fix the CanSROA assertion coming out of Clang's regression test suite. llvm-svn: 165931
* Follow-up fix to r165928: handle memset rewriting for widened integers,Chandler Carruth2012-10-151-0/+13
| | | | | | | and generally clean up the memset handling. It had rotted a bit as the other rewriting logic got polished more. llvm-svn: 165930
* First major step toward addressing PR14059. This teaches SROA to handleChandler Carruth2012-10-152-23/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cases where we have partial integer loads and stores to an otherwise promotable alloca to widen[1] those loads and stores to cover the entire alloca and bitcast them into the appropriate type such that promotion can proceed. These partial loads and stores stem from an annoying confluence of ARM's calling convention and ABI lowering and the FCA pre-splitting which takes place in SROA. Clang lowers a { double, double } in-register function argument as a [4 x i32] function argument to ensure it is placed into integer 32-bit registers (a really unnerving implicit contract between Clang and the ARM backend I would add). This results in a FCA load of [4 x i32]* from the { double, double } alloca, and SROA decomposes this into a sequence of i32 loads and stores. Inlining proceeds, code gets folded, but at the end of the day, we still have i32 stores to the low and high halves of a double alloca. Widening these to be i64 operations, and bitcasting them to double prior to loading or storing allows promotion to proceed for these allocas. I looked quite a bit at changing the IR which Clang produces for this case to be more friendly, but small changes seem unlikely to help. I think the best representation we could use currently would be to pass 4 i32 arguments thereby avoiding any FCAs, but that would still require this fix. It seems like it might eventually be nice to somehow encode the ABI register selection choices outside of the parameter type system so that the parameter can be a { double, double }, but the CC register annotations indicate that this should be passed via 4 integer registers. This patch does not address the second problem in PR14059, which is the reverse: when a struct alloca is loaded as a *larger* single integer. This patch also does not address some of the code quality issues with the FCA-splitting. 
Those don't actually impede any optimizations really, but they're on my list to clean up. [1]: Pedantic footnote: for those concerned about memory model issues here, this is safe. For the alloca to be promotable, it cannot escape or have any use of its address that could allow these loads or stores to be racing. Thus, widening is always safe. llvm-svn: 165928
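The widening described in this commit can be modeled as a read-modify-write on a 64-bit integer followed by a bitcast. The Python sketch below assumes a little-endian target and uses hypothetical helper names; it shows two i32 stores into the halves of a `double` slot becoming one i64 value that is then reinterpreted as a double.

```python
import struct

MASK64 = (1 << 64) - 1


def store_i32_into_i64(wide, value, byte_offset):
    """Widened form of an i32 store at byte_offset into an 8-byte slot.
    Little-endian layout is assumed."""
    shift = 8 * byte_offset
    mask = 0xFFFFFFFF << shift
    return (wide & ~mask & MASK64) | ((value & 0xFFFFFFFF) << shift)


def bitcast_i64_to_double(bits):
    """Reinterpret the 64 bits as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


slot = 0
slot = store_i32_into_i64(slot, 0x00000000, 0)  # low half
slot = store_i32_into_i64(slot, 0x3FF00000, 4)  # high half
value = bitcast_i64_to_double(slot)
```

After both stores the slot holds `0x3FF0000000000000`, which reinterprets as `1.0`, so the alloca can be treated as a single promotable double value.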
* Teach SROA to cope with wrapper aggregates. These show up a lot in ABIChandler Carruth2012-10-132-31/+44
| | | | | | | | | | | | | | | | | | | | | | | type coercion code, especially when targeting ARM. Things like [1 x i32] instead of i32 are very common there. The goal of this logic is to ensure that when we are picking an alloca type, we look through such wrapper aggregates and across any zero-length aggregate elements to find the simplest type possible to form a type partition. This logic should (generally speaking) rarely fire. It only ends up kicking in when an alloca is accessed using two different types (for instance, i32 and float), and the underlying alloca type has wrapper aggregates around it. I noticed a significant amount of this occurring looking at stepanov_abstraction generated code for arm, and suspect it happens elsewhere as well. Note that this doesn't yet address truly heinous IR productions such as PR14059 is concerning. Those result in mismatched *sizes* of types in addition to mismatched access and alloca types. llvm-svn: 165870
* Add the testcase from pr13254 (the old scalarrepl pass handles this wrong;Duncan Sands2012-10-101-0/+16
| | | | | | the new sroa pass handles it right). llvm-svn: 165644
* Fix PR14034, an infloop / heap corruption / crash bug in the new SROA.Chandler Carruth2012-10-091-0/+20
| | | | | | | Thanks to Benjamin for the raw test case. This one took about 50 times longer to reduce than to fix. =/ llvm-svn: 165476
* Teach the new SROA a new trick. Now we zap any memcpy or memmoves whichChandler Carruth2012-10-051-4/+2
| | | | | | | | | | | | are in fact identity operations. We detect these and kill their partitions so that even splitting is unaffected by them. This is particularly important because Clang relies on emitting identity memcpy operations for struct copies, and these fold away to constants very often after inlining. Fixes the last big performance FIXME I have on my plate. llvm-svn: 165285
* Fix PR13969, a mini-phase-ordering issue with the new SROA pass.Chandler Carruth2012-10-041-0/+24
| | | | | | | | | | | | | | | | | | | | | Currently, we re-visit allocas when something changes about the way they might be *split* to allow better scalarization to take place. However, we weren't handling the case when the *promotion* is what would change the behavior of SROA. When an address derived from an alloca is stored into another alloca, we consider the first to have escaped. If the second is ever promoted to an SSA value, we will suddenly be able to run the SROA pass on the first alloca. This patch adds explicit support for this form of iteration. When we detect a store of a pointer derived from an alloca, we flag the underlying alloca for reprocessing after promotion. The logic works hard to only do this when there is definitely going to be promotion and it might remove impediments to the analysis of the alloca. Thanks to Nick for the great test case and Benjamin for some sanity check review. llvm-svn: 165223
* Teach the integer-promotion rewrite strategy to be endianness aware.Chandler Carruth2012-10-046-5/+113
| | | | | | | | | | | | | | | | | | | | | | | Sorry for this being broken so long. =/ As part of this, switch all of the existing tests to be Little Endian, which is the behavior I was asserting in them anyways! Add in a new big-endian test that checks the interesting behavior there. Another part of this is to tighten the rules about when we perform the full-integer promotion. This logic now rejects cases where the fully promoted integer is a non-multiple-of-8 bitwidth or cases where the loads or stores touch bits which are in the allocated space of the alloca but are not loaded or stored when accessing the integer. Sadly, these aren't really observable today as the rest of the pass will already ensure the invariants hold. However, the latter situation is likely to become a potential concern in the future. Thanks to Benjamin and Duncan for early review of this patch. I'm still looking into whether there are further endianness issues, please let me know if anyone sees BE failures persisting past this. llvm-svn: 165219
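What "endianness aware" means for integer promotion can be shown with the shift amount used to pull a narrow integer out of a wide one. This Python sketch uses hypothetical names and checks the shift-based extraction against an explicit byte-level model of memory; it is an illustration of the idea, not the pass's code.

```python
def extract_integer(wide, wide_bytes, offset, narrow_bytes, big_endian):
    """Extract the narrow integer stored at byte 'offset' inside 'wide'
    using a shift and a mask on the promoted integer value."""
    if big_endian:
        # Byte 0 holds the most significant bits on big-endian targets.
        shift = 8 * (wide_bytes - narrow_bytes - offset)
    else:
        shift = 8 * offset
    return (wide >> shift) & ((1 << (8 * narrow_bytes)) - 1)


def extract_via_memory(wide, wide_bytes, offset, narrow_bytes, big_endian):
    """Reference model: lay the value out in memory and read the bytes."""
    order = "big" if big_endian else "little"
    raw = wide.to_bytes(wide_bytes, order)
    return int.from_bytes(raw[offset:offset + narrow_bytes], order)


little = extract_integer(0x11223344, 4, 1, 1, big_endian=False)
big = extract_integer(0x11223344, 4, 1, 1, big_endian=True)
```

The same byte offset selects `0x33` on a little-endian target and `0x22` on a big-endian one, which is why tests that pin a data layout need to state the endianness they assert.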
* Fix an issue where we failed to adjust the alignment constraint onChandler Carruth2012-10-031-0/+31
| | | | | | | | a memcpy to reflect that '0' has a different meaning when applied to a load or store. Now we correctly use underaligned loads and stores for the test case added. llvm-svn: 165101
* Try to use a better set of abstractions for computing the alignmentChandler Carruth2012-10-031-4/+59
| | | | | | | | | | | | | | | | | | | | necessary during rewriting. As part of this, fix a real think-o here where we might have left off an alignment specification when the address is in fact underaligned. I haven't come up with any way to trigger this, as there is always some other factor that reduces the alignment, but it certainly might have been an observable bug in some way I can't think of. This also slightly changes the strategy for placing explicit alignments on loads and stores to only do so when the alignment does not match that required by the ABI. This causes a few redundant alignments to go away from test cases. I've also added a couple of tests that really push on the alignment that we end up with on loads and stores. More to come here as I try to fix an underlying bug I have conjectured and produced test cases for, although it's not clear if this bug is the one currently hitting dragonegg's gcc47 bootstrap. llvm-svn: 165100
* Teach the new SROA to handle cases where an alloca that has already beenChandler Carruth2012-10-021-0/+29
| | | | | | | | | | | | | | | | scheduled for processing on the worklist eventually gets deleted while we are processing another alloca, fixing the original test case in PR13990. To facilitate this, add a remove_if helper to the SetVector abstraction. It's not easy to use the standard abstractions for this because of the specifics of SetVectors types and implementation. Finally, a nice small test case is included. Thanks to Benjamin for the fantastic reduced test case here! All I had to do was delete some empty basic blocks! llvm-svn: 165065
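The role of the `remove_if` helper can be sketched with a small insertion-ordered set. This Python class is a hypothetical model of the idea only; it does not mirror the C++ `SetVector` interface beyond the names `insert` and `remove_if` mentioned or implied by the commit message.

```python
class SetVector:
    """Insertion-ordered collection of unique items (illustrative model)."""

    def __init__(self):
        self._items = []
        self._seen = set()

    def insert(self, item):
        """Append the item if it is new; report whether it was added."""
        if item in self._seen:
            return False
        self._seen.add(item)
        self._items.append(item)
        return True

    def remove_if(self, pred):
        """Drop every item matching pred, keeping both the ordered list
        and the membership set consistent. Reports whether anything went."""
        kept = [x for x in self._items if not pred(x)]
        removed = len(kept) != len(self._items)
        self._items = kept
        self._seen = set(kept)
        return removed

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


worklist = SetVector()
for alloca in ("a1", "a2", "a3"):
    worklist.insert(alloca)

deleted = {"a2"}  # allocas erased while processing another one
changed = worklist.remove_if(lambda a: a in deleted)
```

Removing by predicate has to update the ordered storage and the membership set together, which is the part that makes a generic remove-erase idiom awkward to apply from outside the container.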
* Fix more misspellings found by Duncan during review.Chandler Carruth2012-10-011-2/+2
| | | | llvm-svn: 164940
* Fix several issues with alignment. We weren't always accounting for typeChandler Carruth2012-10-011-0/+31
| | | | | | | | alignment requirements of the new alloca. As one consequence which was reported as a bug by Duncan, we overaligned memcpy calls to ranges of allocas after they were rewritten to types with lower alignment requirements. Other consequences are possible, but I don't have any test cases for them. llvm-svn: 164937
* Refactor the PartitionUse structure to actually use the Use* instead ofChandler Carruth2012-10-011-3/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | a pair of instructions, one for the used pointer and the second for the user. This simplifies the representation and also makes it more dense. This was noticed because of the miscompile in PR13926. In that case, we were running up against a fundamental "bad idea" in the speculation of PHI and select instructions: the speculation and rewriting are interleaved, which requires phi speculation to also perform load rewriting! This is bad, and causes us to miss opportunities to do (for example) vector rewriting only exposed after PHI speculation, etc etc. It also, in the old system, required us to insert *new* load uses into the current partition's use list, which would then be ignored during rewriting because we had already extracted an end iterator for the use list. The appending behavior (and much of the other oddities) stem from the strange de-duplication strategy in the PartitionUse builder. Amusingly, all this went without notice for so long because it could only be triggered by having *different* GEPs into the same partition of the same alloca, where both different GEPs were operands of a single PHI, and where the GEP which was not encountered first also had multiple uses within that same PHI node... Hence the insane steps required to reproduce. So, step one in fixing this fundamental bad idea is to make the PartitionUse actually contain a Use*, and to make the builder do proper deduplication instead of funky de-duplication. This is enough to remove the appending behavior, and fix the miscompile in PR13926, but there is more work to be done here. Subsequent commits will lift the speculation into its own visitor. It'll be a useful step toward potentially extracting all of the speculation logic into a generic utility transform. The existing PHI test case for repeated operands has been made more extreme to catch even these issues. 
This test case, run through the old pass, will exactly reproduce the miscompile from PR13926. ;] We were so close here! llvm-svn: 164925
* Fix a somewhat surprising miscompile where code relying on an ABIChandler Carruth2012-09-291-1/+22
| | | | | | | | | | | | | | | alignment could lose it due to the alloca type moving down to a much smaller alignment guarantee. Now SROA will actively compute a proper alignment, factoring the target data, any explicit alignment, and the offset within the struct. This will in some cases lower the alignment requirements, but when we lower them below those of the type, we drop the alignment entirely to give freedom to the code generator to align it however is convenient. Thanks to Duncan for the lovely test case that pinned this down. =] llvm-svn: 164891
* When rewriting the pointer operand to a load or store which hasChandler Carruth2012-09-261-0/+18
| | | | | | | alignment guarantees attached, re-compute the alignment so that we consider offsets which impact alignment. llvm-svn: 164690
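Re-computing an alignment for a pointer at some offset from an aligned base comes down to finding the largest power of two that divides both the base alignment and the offset. A minimal Python sketch, with a hypothetical function name and assuming alignments are powers of two measured in bytes:

```python
def min_align(base_align, offset):
    """Largest power of two dividing both base_align and offset.
    With offset == 0 the base alignment is preserved."""
    combined = base_align | offset
    # Isolate the lowest set bit of the combined value.
    return combined & -combined


aligned_field = min_align(16, 0)    # start of a 16-byte aligned alloca
offset_field = min_align(16, 4)     # field 4 bytes into the same alloca
```

A load at offset 4 from a 16-byte aligned alloca can only be assumed 4-byte aligned, so carrying the original alignment over unchanged would overstate the guarantee.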