summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
* Reapply r186316 with a fix for one bug where the code could walk off theChandler Carruth2013-07-181-1255/+976
| | | | | | | | | | | | end of a vector. This was found with ASan. I've had one other report of a crasher, but thus far been unable to reproduce the crash. It may well be fixed with this version, and if not I'd like to get more information from the build bots about what is happening. See r186316 for the full commit log for the new implementation of the SROA algorithm. llvm-svn: 186565
* SLPVectorizer: Speedup isConsecutive (that checks if two addresses are ↵Nadav Rotem2013-07-181-12/+31
| | | | | | consecutive in memory) by checking for additional patterns that don't need to go through SCEV. llvm-svn: 186563
* Add comparison operators for DIDescriptors to fix c++98 falloutEric Christopher2013-07-171-1/+1
| | | | | | | | of operator bool change. Also convert a variable in DebugIR. llvm-svn: 186544
* Fix a comment.Nadav Rotem2013-07-171-1/+1
| | | | llvm-svn: 186541
* Restore r181216, which was partially reverted in r182499.Stephen Lin2013-07-172-43/+29
| | | | llvm-svn: 186533
* Add a micro optimization to catch cases where the PtrA equals PtrB.Nadav Rotem2013-07-171-1/+1
| | | | llvm-svn: 186531
* Fix comparisons of alloca alignment in inliner mergingHal Finkel2013-07-171-3/+12
| | | | | | | | Duncan pointed out a mistake in my fix in r186425 when only one of the allocas being compared had the target-default alignment. This is essentially his suggested solution. Thanks! llvm-svn: 186510
* Mark a method 'const' and another 'static'.Craig Topper2013-07-171-2/+2
| | | | llvm-svn: 186485
* Make a few more static string pointers constant.Craig Topper2013-07-171-8/+8
| | | | llvm-svn: 186484
* SLPVectorizer: Accelerate the isConsecutive check by replacing the ↵Nadav Rotem2013-07-171-10/+5
| | | | | | subtraction of the two values with a simple SCEV expression that adds the offset to one of the pointers that we compare. llvm-svn: 186479
* flip the scev minus direction to simplify the code.Nadav Rotem2013-07-161-3/+3
| | | | llvm-svn: 186466
* SLPVectorizer: Improve the compile time of isConsecutive by adding a simple ↵Nadav Rotem2013-07-161-0/+18
| | | | | | | | constant-gep check before using SCEV. This check does not always work because not all of the GEPs use a constant offset, but it happens often enough to reduce the number of times we use SCEV. llvm-svn: 186465
* Add a wrapper for open.Rafael Espindola2013-07-161-1/+1
| | | | | | | This centralizes the handling of O_BINARY and opens the way for hiding more differences (like how open behaves with directories). llvm-svn: 186447
* Make SpecialCaseList match full strings, as documented, using anchors.Peter Collingbourne2013-07-161-1/+1
| | | | | | Differential Revision: http://llvm-reviews.chandlerc.com/D1149 llvm-svn: 186431
* When the inliner merges allocas, it must keep the larger alignmentHal Finkel2013-07-161-2/+16
| | | | | | | | | | | | For safety, the inliner cannot decrease the allignment on an alloca when merging it with another. I've included two variants of the test case for this: one with DataLayout available, and one without. When DataLayout is not available, if only one of the allocas uses the default alignment (getAlignment() == 0), then they cannot be safely merged. llvm-svn: 186425
* SLPVectorizer: Reduce the compile time of the consecutive store lookup.Nadav Rotem2013-07-161-5/+13
| | | | | | Process groups of stores in chunks of 16. llvm-svn: 186420
* Add 'const' qualifiers to static const char* variables.Craig Topper2013-07-162-20/+21
| | | | llvm-svn: 186371
* PR16628: Fix a bug in the code that merges compares.Nadav Rotem2013-07-151-1/+3
| | | | | | Compares return i1 but they compare different types. llvm-svn: 186359
* Remove trailing whitespaceStephen Lin2013-07-151-36/+36
| | | | llvm-svn: 186333
* Revert r186316 while I track down an ASan failure and an assert fromChandler Carruth2013-07-151-972/+1255
| | | | | | | | | | | a bot. This reverts the commit which introduced a new implementation of the fancy SROA pass designed to reduce its overhead. I'll skip the huge commit log here, refer to r186316 if you're looking for how this all works and why it works that way. llvm-svn: 186332
* Reimplement SROA yet again. Same fundamental principle, but a totallyChandler Carruth2013-07-151-1255/+972
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | different core implementation strategy. Previously, SROA would build a relatively elaborate partitioning of an alloca, associate uses with each partition, and then rewrite the uses of each partition in an attempt to break apart the alloca into chunks that could be promoted. This was very wasteful in terms of memory and compile time because regardless of how complex the alloca or how much we're able to do in breaking it up, all of the datastructure work to analyze the partitioning was done up front. The new implementation attempts to form partitions of the alloca lazily and on the fly, rewriting the uses that make up that partition as it goes. This has a few significant effects: 1) Much simpler data structures are used throughout. 2) No more double walk of the recursive use graph of the alloca, only walk it once. 3) No more complex algorithms for associating a particular use with a particular partition. 4) PHI and Select speculation is simplified and happens lazily. 5) More precise information is available about a specific use of the alloca, removing the need for some side datastructures. Ultimately, I think this is a much better implementation. It removes about 300 lines of code, but arguably removes more like 500 considering that some code grew in the process of being factored apart and cleaned up for this all to work. I've re-used as much of the old implementation as possible, which includes the lion's share of code in the form of the rewriting logic. The interesting new logic centers around how the uses of a partition are sorted, and split into actual partitions. Each instruction using a pointer derived from the alloca gets a 'Partition' entry. This name is totally wrong, but I'll do a rename in a follow-up commit as there is already enough churn here. The entry describes the offset range accessed and the nature of the access. Once we have all of these entries we sort them in a very specific way: increasing order of begin offset, followed by whether they are splittable uses (memcpy, etc), followed by the end offset or whatever. Sorting by splittability is important as it simplifies the collection of uses into a partition. Once we have these uses sorted, we walk from the beginning to the end building up a range of uses that form a partition of the alloca. Overlapping unsplittable uses are merged into a single partition while splittable uses are broken apart and carried from one partition to the next. A partition is also introduced to bridge splittable uses between the unsplittable regions when necessary. I've looked at the performance PRs fairly closely. PR15471 no longer will even load (the module is invalid). Not sure what is up there. PR15412 improves by between 5% and 10%, however it is nearly impossible to know what is holding it up as SROA (the entire pass) takes less time than reading the IR for that test case. The analysis takes the same time as running mem2reg on the final allocas. I suspect (without much evidence) that the new implementation will scale much better however, and it is just the small nature of the test cases that makes the changes small and noisy. Either way, it is still simpler and cleaner I think. llvm-svn: 186316
* Add 'const' qualifier to some arrays.Craig Topper2013-07-151-1/+1
| | | | llvm-svn: 186312
* Use llvm::array_lengthof to replace sizeof(array)/sizeof(array[0]).Craig Topper2013-07-151-1/+2
| | | | llvm-svn: 186301
* SLPVectorizer: change the order in which we search for vectorization ↵Nadav Rotem2013-07-141-4/+4
| | | | | | candidates. Do stores first and PHIs second. llvm-svn: 186277
* Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector ↵Craig Topper2013-07-149-52/+57
| | | | | | size. llvm-svn: 186274
* LoopVectorizer: Disallow reductions whose header phi is used outside the loopArnold Schwaighofer2013-07-131-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an outside loop user of the reduction value uses the header phi node we cannot just reduce the vectorized phi value in the vector code epilog because we would loose VF-1 reductions. lp: p = phi (0, lv) lv = lv + 1 ... brcond , lp, outside outside: usr = add 0, p (Say the loop iterates two times, the value of p coming out of the loop is one). We cannot just transform this to: vlp: p = phi (<0,0>, lv) lv = lv + <1,1> .. brcond , lp, outside outside: p_reduced = p[0] + [1]; usr = add 0, p_reduced (Because the original loop iterated two times the vectorized loop would iterate one time, but p_reduced ends up being zero instead of one). We would have to execute VF-1 iterations in the scalar remainder loop in such cases. For now, just disable vectorization. PR16522 llvm-svn: 186256
* LoopVectorize fix: LoopInfo must be valid when invoking utils like SCEVExpander.Andrew Trick2013-07-131-18/+18
| | | | | | | | | | | In general, one should always complete CFG modifications first, update CFG-based analyses, like Dominatores and LoopInfo, then generate instruction sequences. LoopVectorizer was creating a new loop, calling SCEVExpander to generate checks, then updating LoopInfo. I just changed the order. llvm-svn: 186241
* Add a microoptimization for urem.Nick Lewycky2013-07-131-0/+7
| | | | llvm-svn: 186235
* Fix a crash in EvaluateInDifferentElementOrder where it would generate anJoey Gouly2013-07-121-1/+3
| | | | | | | | undef vector of the wrong type. LGTM'd by Nick Lewycky on IRC. llvm-svn: 186224
* LFTR improvement to avoid truncation.Andrew Trick2013-07-121-6/+32
| | | | | | This is a reimplemntation of the patch originally in r186107. llvm-svn: 186215
* Cleanup LFTR logic.Andrew Trick2013-07-121-28/+9
| | | | llvm-svn: 186214
* Cleanup: rename a variable to make the logic easier to follow.Andrew Trick2013-07-121-7/+7
| | | | llvm-svn: 186213
* TargetTransformInfo: address calculation parameter for gather/scatherArnold Schwaighofer2013-07-121-1/+56
| | | | | | | | | | | Address calculation for gather/scather in vectorized code can incur a significant cost making vectorization unbeneficial. Add infrastructure to add cost. Tests and cost model for targets will be in follow-up commits. radar://14351991 llvm-svn: 186187
* Revert "indvars: Improve LFTR by eliminating truncation when comparingChandler Carruth2013-07-121-23/+4
| | | | | | | | | | | | | | | | | | | against a constant." This reverts commit r186107. It didn't handle wrapping arithmetic in the loop correctly and thus caused the following C program to count from 0 to UINT64_MAX instead of from 0 to 255 as intended: #include <stdio.h> int main() { unsigned char first = 0, last = 255; do { printf("%d\n", first); } while (first++ != last); } Full test case and instructions to reproduce with just the -indvars pass sent to the original review thread rather than to r186107's commit. llvm-svn: 186152
* SLPVectorizer: Sink and enable CSE for ExtractElements.Nadav Rotem2013-07-121-11/+25
| | | | llvm-svn: 186145
* SLPVectorize: Replace the code that checks for vectorization candidates in ↵Nadav Rotem2013-07-121-25/+22
| | | | | | | | successor blocks with code that scans PHINodes. Before we could vectorize PHINodes scanning successors was a good way of finding candidates. Now we can vectorize the phinodes which is simpler. llvm-svn: 186139
* Remove an argument that we dont use anymore.Nadav Rotem2013-07-111-15/+12
| | | | llvm-svn: 186116
* indvars: Improve LFTR by eliminating truncation when comparing against a ↵Andrew Trick2013-07-111-4/+23
| | | | | | | | | | | | | | | | | constant. Patch by Michele Scandale! Adds a special handling of the case where, during the loop exit condition rewriting, the exit value is a constant of bitwidth lower than the type of the induction variable: instead of introducing a trunc operation in order to match correctly the operand types, it allows to convert the constant value to an equivalent constant, depending on the initial value of the induction variable and the trip count, in order have an equivalent comparison between the induction variable and the new constant. llvm-svn: 186107
* Don't use a potentially expensive shift if all we want is one set bit.Benjamin Kramer2013-07-112-2/+2
| | | | | | No functionality change. llvm-svn: 186095
* LoopVectorize: Vectorize all accesses in address space zero with unit strideArnold Schwaighofer2013-07-111-8/+16
| | | | | | | | | | | We can vectorize them because in the case where we wrap in the address space the unvectorized code would have had to access a pointer value of zero which is undefined behavior in address space zero according to the LLVM IR semantics. (Thank you Duncan, for pointing this out to me). Fixes PR16592. llvm-svn: 186088
* TryToSimplifyUncondBranchFromEmptyBlock was checking that any commonDuncan Sands2013-07-111-23/+147
| | | | | | | | | | predecessors of the two blocks it is attempting to merge supply the same incoming values to any phi in the successor block. This change allows merging in the case where there is one or more incoming values that are undef. The undef values are rewritten to match the non-undef value that flows from the other edge. Patch by Mark Lacey. llvm-svn: 186069
* Fix a warning.Nadav Rotem2013-07-111-2/+1
| | | | llvm-svn: 186064
* SLPVectorizer: refactor the code that places extracts. Place the code that ↵Nadav Rotem2013-07-111-41/+131
| | | | | | decides where to put extracts in the build-tree phase. This allows us to take the cost of the extracts into account. llvm-svn: 186058
* Teach TailRecursionElimination to handle certain cases of nocapture escaping ↵Michael Gottesman2013-07-111-64/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | allocas. Without the changes introduced into this patch, if TRE saw any allocas at all, TRE would not perform TRE *or* mark callsites with the tail marker. Because TRE runs after mem2reg, this inadequacy is not a death sentence. But given a callsite A without escaping alloca argument, A may not be able to have the tail marker placed on it due to a separate callsite B having a write-back parameter passed in via an argument with the nocapture attribute. Assume that B is the only other callsite besides A and B only has nocapture escaping alloca arguments (*NOTE* B may have other arguments that are not passed allocas). In this case not marking A with the tail marker is unnecessarily conservative since: 1. By assumption A has no escaping alloca arguments itself so it can not access the caller's stack via its arguments. 2. Since all of B's escaping alloca arguments are passed as parameters with the nocapture attribute, we know that B does not stash said escaping allocas in a manner that outlives B itself and thus could be accessed indirectly by A. With the changes introduced by this patch: 1. If we see any escaping allocas passed as a capturing argument, we do nothing and bail early. 2. If we do not see any escaping allocas passed as captured arguments but we do see escaping allocas passed as nocapture arguments: i. We do not perform TRE to avoid PR962 since the code generator produces significantly worse code for the dynamic allocas that would be created by the TRE algorithm. ii. If we do not return twice, mark call sites without escaping allocas with the tail marker. *NOTE* This excludes functions with escaping nocapture allocas. 3. If we do not see any escaping allocas at all (whether captured or not): i. If we do not have usage of setjmp, mark all callsites with the tail marker. ii. If there are no dynamic/variable sized allocas in the function, attempt to perform TRE on all callsites in the function. Based off of a patch by Nick Lewycky. rdar://14324281. llvm-svn: 186057
* [objc-arc] Changed 'mode: c++' => 'C++' at Nick Lewycky's suggestion. Also ↵Michael Gottesman2013-07-107-7/+7
| | | | | | removed unnecessary mode: c++ lines from .cpp files. llvm-svn: 186026
* Implement categories for special case lists.Peter Collingbourne2013-07-092-27/+93
| | | | | | | | | | | | | | | | | | | | | | | | A special case list can now specify categories for specific globals, which can be used to instruct an instrumentation pass to treat certain functions or global variables in a specific way, such as by omitting certain aspects of instrumentation while keeping others, or informing the instrumentation pass that a specific uninstrumentable function has certain semantics, thus allowing the pass to instrument callers according to those semantics. For example, AddressSanitizer now uses the "init" category instead of global-init prefixes for globals whose initializers should not be instrumented, but which in all other respects should be instrumented. The motivating use case is DataFlowSanitizer, which will have a number of different categories for uninstrumentable functions, such as "functional" which specifies that a function has pure functional semantics, or "discard" which indicates that a function's return value should not be labelled. Differential Revision: http://llvm-reviews.chandlerc.com/D1092 llvm-svn: 185978
* Introduce a SpecialCaseList ctor which takes a MemoryBuffer to makePeter Collingbourne2013-07-091-1/+9
| | | | | | | | it more unit testable, and fix memory leak in the other ctor. Differential Revision: http://llvm-reviews.chandlerc.com/D1090 llvm-svn: 185976
* Rename BlackList class to SpecialCaseList and move it to Transforms/Utils.Peter Collingbourne2013-07-096-21/+21
| | | | | | Differential Revision: http://llvm-reviews.chandlerc.com/D1089 llvm-svn: 185975
* Fix PR16571, which is a bug in the code that checks that all of the types in ↵Nadav Rotem2013-07-091-1/+3
| | | | | | the bundle are uniform. llvm-svn: 185970
* Set the default insert point to the first instruction, and not to end()Nadav Rotem2013-07-091-1/+1
| | | | llvm-svn: 185953
OpenPOWER on IntegriCloud