summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms/Scalar
Commit message (Collapse)AuthorAgeFilesLines
* Fix PR16687 where we were incorrectly promoting an alloca that hadChandler Carruth2013-07-241-12/+29
| | | | | | | | | | | | | | | | | | | | | | pending speculation for a phi node. The problem here is that we were using growth of the specluation set as an indicator of whether speculation would occur, and if the phi node is already in the set we don't see it grow. This is a symptom of the fact that this signal is a total hack. Unfortunately, I couldn't really come up with a non-hacky way of signaling that promotion remains valid *after* speculation occurs, such that we only speculate when all else looks good for promotion. In the end, I went with at least a much more explicit approach of doing the work of queuing inside the phi and select processing and setting a preposterously named flag to convey that we're in the special state of requiring speculating before promotion. Thanks to Richard Trieu and Nick Lewycky for the excellent work reducing a testcase for this from a pretty giant, nasty assert in a big application. =] The testcase was excellent. llvm-svn: 187029
* Remove extraneous null statement. No functionality change!Nick Lewycky2013-07-221-1/+1
| | | | llvm-svn: 186893
* Use switch instead of if. No functionality change.Jakub Staszak2013-07-221-14/+17
| | | | llvm-svn: 186892
* OldPtr is llvm::Instruction. Remove unneeded cast<>.Jakub Staszak2013-07-221-1/+1
| | | | llvm-svn: 186880
* Change tabs to spaces.Jakub Staszak2013-07-221-2/+2
| | | | llvm-svn: 186877
* Fix spelling and grammarMatt Arsenault2013-07-221-12/+12
| | | | llvm-svn: 186858
* SROA: Microoptimization: Remove dead entries first, then sort.Benjamin Kramer2013-07-201-9/+4
| | | | | | While there replace an explicit struct with std::mem_fun. llvm-svn: 186761
* Cleanup the stats counters for the new implementation. These actuallyChandler Carruth2013-07-191-12/+36
| | | | | | count the right things and have the right names. llvm-svn: 186667
* Fix another assert failure very similar to PR16651's test case. ThisChandler Carruth2013-07-191-0/+2
| | | | | | | test case came from Benjamin and found the parallel bug in the vector promotion code. llvm-svn: 186666
* Try to move to a more reasonable set of naming conventions given the newChandler Carruth2013-07-191-322/+305
| | | | | | | | | | | | | | | | implementation of the SROA algorithm. We were using the term 'partition' in many places that no longer ever represented an actual partition, but rather just an arbitrary slice of an alloca. No functionality change intended here. Mostly just renaming of types, functions, variables, and rewording of comments. Several comments were rewritten to make a lot more sense in the new structure of things. The stats are still weird and not reflective of how this really works. I'll fix those up in a separate patch as it is a touch more semantic of a change... llvm-svn: 186659
* A long overdue cleanup in SROA to use 'DL' instead of 'TD' for theChandler Carruth2013-07-191-123/+123
| | | | | | DataLayout variables. llvm-svn: 186656
* Fix PR16651, an assert introduced in my recent re-work of the innards ofChandler Carruth2013-07-191-9/+3
| | | | | | | | | | | | | | SROA. The crux of the issue is that now we track uses of a partition of the alloca in two places: the iterators over the partitioning uses and the previously collected split uses vector. We weren't accounting for the fact that the split uses might invalidate integer widening in ways other than due to their width (in this case due to being volatile). Further reduced testcase added to the tests. llvm-svn: 186655
* Reapply r186316 with a fix for one bug where the code could walk off theChandler Carruth2013-07-181-1255/+976
| | | | | | | | | | | | end of a vector. This was found with ASan. I've had one other report of a crasher, but thus far been unable to reproduce the crash. It may well be fixed with this version, and if not I'd like to get more information from the build bots about what is happening. See r186316 for the full commit log for the new implementation of the SROA algorithm. llvm-svn: 186565
* Add 'const' qualifiers to static const char* variables.Craig Topper2013-07-161-1/+1
| | | | llvm-svn: 186371
* Remove trailing whitespaceStephen Lin2013-07-151-36/+36
| | | | llvm-svn: 186333
* Revert r186316 while I track down an ASan failure and an assert fromChandler Carruth2013-07-151-972/+1255
| | | | | | | | | | | a bot. This reverts the commit which introduced a new implementation of the fancy SROA pass designed to reduce its overhead. I'll skip the huge commit log here, refer to r186316 if you're looking for how this all works and why it works that way. llvm-svn: 186332
* Reimplement SROA yet again. Same fundamental principle, but a totallyChandler Carruth2013-07-151-1255/+972
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | different core implementation strategy. Previously, SROA would build a relatively elaborate partitioning of an alloca, associate uses with each partition, and then rewrite the uses of each partition in an attempt to break apart the alloca into chunks that could be promoted. This was very wasteful in terms of memory and compile time because regardless of how complex the alloca or how much we're able to do in breaking it up, all of the datastructure work to analyze the partitioning was done up front. The new implementation attempts to form partitions of the alloca lazily and on the fly, rewriting the uses that make up that partition as it goes. This has a few significant effects: 1) Much simpler data structures are used throughout. 2) No more double walk of the recursive use graph of the alloca, only walk it once. 3) No more complex algorithms for associating a particular use with a particular partition. 4) PHI and Select speculation is simplified and happens lazily. 5) More precise information is available about a specific use of the alloca, removing the need for some side datastructures. Ultimately, I think this is a much better implementation. It removes about 300 lines of code, but arguably removes more like 500 considering that some code grew in the process of being factored apart and cleaned up for this all to work. I've re-used as much of the old implementation as possible, which includes the lion's share of code in the form of the rewriting logic. The interesting new logic centers around how the uses of a partition are sorted, and split into actual partitions. Each instruction using a pointer derived from the alloca gets a 'Partition' entry. This name is totally wrong, but I'll do a rename in a follow-up commit as there is already enough churn here. The entry describes the offset range accessed and the nature of the access. Once we have all of these entries we sort them in a very specific way: increasing order of begin offset, followed by whether they are splittable uses (memcpy, etc), followed by the end offset or whatever. Sorting by splittability is important as it simplifies the collection of uses into a partition. Once we have these uses sorted, we walk from the beginning to the end building up a range of uses that form a partition of the alloca. Overlapping unsplittable uses are merged into a single partition while splittable uses are broken apart and carried from one partition to the next. A partition is also introduced to bridge splittable uses between the unsplittable regions when necessary. I've looked at the performance PRs fairly closely. PR15471 no longer will even load (the module is invalid). Not sure what is up there. PR15412 improves by between 5% and 10%, however it is nearly impossible to know what is holding it up as SROA (the entire pass) takes less time than reading the IR for that test case. The analysis takes the same time as running mem2reg on the final allocas. I suspect (without much evidence) that the new implementation will scale much better however, and it is just the small nature of the test cases that makes the changes small and noisy. Either way, it is still simpler and cleaner I think. llvm-svn: 186316
* Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector ↵Craig Topper2013-07-145-35/+39
| | | | | | size. llvm-svn: 186274
* LFTR improvement to avoid truncation.Andrew Trick2013-07-121-6/+32
| | | | | | This is a reimplemntation of the patch originally in r186107. llvm-svn: 186215
* Cleanup LFTR logic.Andrew Trick2013-07-121-28/+9
| | | | llvm-svn: 186214
* Cleanup: rename a variable to make the logic easier to follow.Andrew Trick2013-07-121-7/+7
| | | | llvm-svn: 186213
* Revert "indvars: Improve LFTR by eliminating truncation when comparingChandler Carruth2013-07-121-23/+4
| | | | | | | | | | | | | | | | | | | against a constant." This reverts commit r186107. It didn't handle wrapping arithmetic in the loop correctly and thus caused the following C program to count from 0 to UINT64_MAX instead of from 0 to 255 as intended: #include <stdio.h> int main() { unsigned char first = 0, last = 255; do { printf("%d\n", first); } while (first++ != last); } Full test case and instructions to reproduce with just the -indvars pass sent to the original review thread rather than to r186107's commit. llvm-svn: 186152
* indvars: Improve LFTR by eliminating truncation when comparing against a ↵Andrew Trick2013-07-111-4/+23
| | | | | | | | | | | | | | | | | constant. Patch by Michele Scandale! Adds a special handling of the case where, during the loop exit condition rewriting, the exit value is a constant of bitwidth lower than the type of the induction variable: instead of introducing a trunc operation in order to match correctly the operand types, it allows to convert the constant value to an equivalent constant, depending on the initial value of the induction variable and the trip count, in order have an equivalent comparison between the induction variable and the new constant. llvm-svn: 186107
* Teach TailRecursionElimination to handle certain cases of nocapture escaping ↵Michael Gottesman2013-07-111-64/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | allocas. Without the changes introduced into this patch, if TRE saw any allocas at all, TRE would not perform TRE *or* mark callsites with the tail marker. Because TRE runs after mem2reg, this inadequacy is not a death sentence. But given a callsite A without escaping alloca argument, A may not be able to have the tail marker placed on it due to a separate callsite B having a write-back parameter passed in via an argument with the nocapture attribute. Assume that B is the only other callsite besides A and B only has nocapture escaping alloca arguments (*NOTE* B may have other arguments that are not passed allocas). In this case not marking A with the tail marker is unnecessarily conservative since: 1. By assumption A has no escaping alloca arguments itself so it can not access the caller's stack via its arguments. 2. Since all of B's escaping alloca arguments are passed as parameters with the nocapture attribute, we know that B does not stash said escaping allocas in a manner that outlives B itself and thus could be accessed indirectly by A. With the changes introduced by this patch: 1. If we see any escaping allocas passed as a capturing argument, we do nothing and bail early. 2. If we do not see any escaping allocas passed as captured arguments but we do see escaping allocas passed as nocapture arguments: i. We do not perform TRE to avoid PR962 since the code generator produces significantly worse code for the dynamic allocas that would be created by the TRE algorithm. ii. If we do not return twice, mark call sites without escaping allocas with the tail marker. *NOTE* This excludes functions with escaping nocapture allocas. 3. If we do not see any escaping allocas at all (whether captured or not): i. If we do not have usage of setjmp, mark all callsites with the tail marker. ii. If there are no dynamic/variable sized allocas in the function, attempt to perform TRE on all callsites in the function. Based off of a patch by Nick Lewycky. rdar://14324281. llvm-svn: 186057
* Reassociate: Remove unnecessary default operator=.Benjamin Kramer2013-07-061-10/+0
| | | | llvm-svn: 185757
* Remove a useless declarations (found by scan-build)Sylvestre Ledru2013-07-051-1/+0
| | | | llvm-svn: 185709
* Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid ↵Craig Topper2013-07-043-4/+4
| | | | | | specifying the vector size. llvm-svn: 185606
* Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid ↵Craig Topper2013-07-033-8/+8
| | | | | | specifying the vector size. llvm-svn: 185540
* dbgs() << Instruction doesn't print a newline on the end any more. Update theseNick Lewycky2013-06-261-5/+5
| | | | | | | debug statements to add a missing newline. Also canonicalize to '\n' instead of "\n"; the latter calls a function with a loop the former does not. llvm-svn: 184897
* Fix SROA to avoid unnecessary scalar conversions for 1-element vectors.Bob Wilson2013-06-251-15/+16
| | | | | | | | | | | When a 1-element vector alloca is promoted, a store instruction can often be rewritten without converting the value to a scalar and using an insertelement instruction to stuff it into the new alloca. This patch just adds a check to skip that conversion when it is unnecessary. This turns out to be really important for some ARM Neon operations where <1 x i64> is used to get around the fact that i64 is not a legal type. llvm-svn: 184870
* Remove the simplify-libcalls pass (finally)Meador Inge2013-06-203-250/+1
| | | | | | | | | | | This commit completely removes what is left of the simplify-libcalls pass. All of the functionality has now been migrated to the instcombine and functionattrs passes. The following C API functions are now NOPs: 1. LLVMAddSimplifyLibCallsPass 2. LLVMPassManagerBuilderSetDisableSimplifyLibCalls llvm-svn: 184459
* Access the TargetLoweringInfo from the TargetMachine object instead of ↵Bill Wendling2013-06-192-11/+13
| | | | | | caching it. The TLI may change between functions. No functionality change. llvm-svn: 184352
* Move StructurizeCFG out of R600 to generic Transforms.Matt Arsenault2013-06-193-0/+881
| | | | | | Register it with PassManager llvm-svn: 184343
* LSR: Fix the parameters used to compute the scaling factor cost.Quentin Colombet2013-06-191-5/+13
| | | | | | | | | | | | | Prior to this change, the considered addressing modes may be invalid since the maximum and minimum offsets were not taking into account. This was causing an assertion failure. The added test case exercices that behavior. <rdar://problem/14199725> Assertion failed: (CurScaleCost >= 0 && "Legal addressing mode has an illegal cost!") llvm-svn: 184341
* Use 0 instead of NULL.Jakub Staszak2013-06-151-5/+5
| | | | llvm-svn: 184044
* Fix a potential bug in r183584.Shuxin Yang2013-06-081-4/+8
| | | | | | | | | | | | | r183584 tries to derive some info from the code *AFTER* a call and apply these derived info to the code *BEFORE* the call, which is not always safe as the call in question may never return, and in this case, the derived info is invalid. Thank Duncan for pointing out this potential bug. rdar://14073661 llvm-svn: 183606
* Fix an assertion in MemCpyOpt pass.Shuxin Yang2013-06-071-2/+4
| | | | | | | | | | | | | | | | | | | | | The MemCpyOpt pass is capable of optimizing: callee(&S); copy N bytes from S to D. into: callee(&D); subject to some legality constraints. Assertion is triggered when the compiler tries to evalute "sizeof(typeof(D))", while D is an opaque-typed, 'sret' formal argument of function being compiled. i.e. the signature of the func being compiled is something like this: T caller(...,%opaque* noalias nocapture sret %D, ...) The fix is that when come across such situation, instead of calling some utility functions to get the size of D's type (which will crash), we simply assume D has at least N bytes as implified by the copy-instruction. rdar://14073661 llvm-svn: 183584
* IndVarSimplify: check if loop invariant expansion can trapDavid Majnemer2013-06-041-1/+1
| | | | | | | | | | | | | | | IndVarSimplify is willing to move divide instructions outside of their loop bodies if they are invariant of the loop. However, it may not be safe to expand them if we do not know if they can trap. Instead, check to see if it is not safe to expand the instruction and skip the expansion. This fixes PR16041. Testcase by Rafael Ávila de Espíndola. llvm-svn: 183239
* Loop Strength Reduce: Scaling factor cost.Quentin Colombet2013-05-311-3/+43
| | | | | | | | | | | | Account for the cost of scaling factor in Loop Strength Reduce when rating the formulae. This uses a target hook. The default implementation of the hook is: if the addressing mode is legal, the scaling factor is free. <rdar://problem/13806271> llvm-svn: 183045
* Modify how the formulae are rated in Loop Strength Reduce.Quentin Colombet2013-05-311-6/+45
| | | | | | | | | | | | | | Namely, check if the target allows to fold more that one register in the addressing mode and if yes, adjust the cost accordingly. Prior to this commit, reg1 + scale * reg2 accesses were artificially preferred to reg1 + reg2 accesses. Indeed, the cost model wrongly assumed that reg1 + reg2 needs a temporary register for the computation, whereas it was correctly estimated for reg1 + scale * reg2. <rdar://problem/13973908> llvm-svn: 183021
* Replace Count{Leading,Trailing}Zeros_{32,64} with count{Leading,Trailing}Zeros.Michael J. Spencer2013-05-241-1/+1
| | | | llvm-svn: 182680
* [GVN] Split critical-edge on the fly, instead of postpone edge-splitting to nextShuxin Yang2013-05-091-13/+39
| | | | | | | | | | iteration. This on step toward non-iterative GVN. My local hack suggests that getting rid of iteration will speedup GVN by 30%+ on a medium sized input (2k LOC, C++). I cannot explain why not 2x or more at this moment. llvm-svn: 181532
* Fix a bug in codegenprep where it was losing track of values OptimizeMemoryInstNick Lewycky2013-05-081-5/+2
| | | | | | by switching to a ValueMap. Patch by Andrea DiBiagio! llvm-svn: 181397
* Rotate multi-exit loops even if the latch was simplified.Andrew Trick2013-05-061-14/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Test case by Michele Scandale! Fixes PR10293: Load not hoisted out of loop with multiple exits. There are few regressions with this patch, now tracked by rdar:13817079, and a roughly equal number of improvements. The regressions are almost certainly back luck because LoopRotate has very little idea of whether rotation is profitable. Doing better requires a more comprehensive solution. This checkin is a quick fix that lacks generality (PR10293 has a counter-example). But it trivially fixes the case in PR10293 without interfering with other cases, and it does satify the criteria that LoopRotate is a loop canonicalization pass that should avoid heuristics and special cases. I can think of two approaches that would probably be better in the long run. Ultimately they may both make sense. (1) LoopRotate should check that the current header would make a good loop guard, and that the loop does not already has a sufficient guard. The artifical SimplifiedLoopLatch check would be unnecessary, and the design would be more general and canonical. Two difficulties: - We need a strong guarantee that we won't endlessly rotate, so the analysis would need to be precise in order to avoid the SimplifiedLoopLatch precondition. - Analysis like this are usually based on SCEV, which we don't want to rely on. (2) Rotate on-demand in late loop passes. This could even be done by shoving the loop back on the queue after the optimization that needs it. This could work well when we find LICM opportunities in multi-branch loops. This requires some work, and it doesn't really solve the problem of SCEV wanting a loop guard before the analysis. llvm-svn: 181230
* Decompose GVN::processNonLocalLoad() (about 400 LOC) into smaller helper ↵Shuxin Yang2013-05-031-169/+194
| | | | | | | | | | | | | | functions. No function change. This function consists of following steps: 1. Collect dependent memory accesses. 2. Analyze availability. 3. Perform fully redundancy elimination, or 4. Perform PRE, depending on the availability Step 2, 3 and 4 are now moved to three helper routines. llvm-svn: 181047
* [GV] Remove dead code which is really difficult to decipher.Shuxin Yang2013-05-021-26/+2
| | | | | | | | | | | Actually it took me couple of hours trying to make sense of them and only to find they are dead code. I guess the original author used "allSingleSucc" to indicate if there are any critial edge emanating from some blocks, and tried to perform code motion (actually speculation) in the presence of these critical edges; but later on he/she changed mind and decided to perform edge-splitting first. llvm-svn: 180951
* This patch breaks up Wrap.h so that it does not have to include all of Filip Pizlo2013-05-011-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | the things, and renames it to CBindingWrapping.h. I also moved CBindingWrapping.h into Support/. This new file just contains the macros for defining different wrap/unwrap methods. The calls to those macros, as well as any custom wrap/unwrap definitions (like for array of Values for example), are put into corresponding C++ headers. Doing this required some #include surgery, since some .cpp files relied on the fact that including Wrap.h implicitly caused the inclusion of a bunch of other things. This also now means that the C++ headers will include their corresponding C API headers; for example Value.h must include llvm-c/Core.h. I think this is harmless, since the C API headers contain just external function declarations and some C types, so I don't believe there should be any nasty dependency issues here. llvm-svn: 180881
* SROA: Generate selects instead of shuffles when blending values because this ↵Nadav Rotem2013-05-011-8/+6
| | | | | | | | is the cannonical form. Shuffles are more difficult to lower and we usually don't touch them, while we do optimize selects more often. llvm-svn: 180875
* Fix a XOR reassociation bug. Shuxin Yang2013-04-271-3/+6
| | | | | | | | | | When Reassociator optimize "(x | C1)" ^ "(X & C2)", it may swap the two subexpressions, however, it forgot to swap cached constants (of C1 and C2) accordingly. rdar://13739160 llvm-svn: 180676
* Move C++ code out of the C headers and into either C++ headersEric Christopher2013-04-221-0/+1
| | | | | | | or the C++ files themselves. This enables people to use just a C compiler to interoperate with LLVM. llvm-svn: 180063
OpenPOWER on IntegriCloud