path: root/llvm/lib/Transforms/Scalar
Commit message / Author / Date / Files changed / Lines (-removed/+added)
* [EarlyCSE] less else, more auto; NFC (Sanjay Patel, 2017-01-03; 1 file changed, -2/+2)
    llvm-svn: 290848
* NewGVN: Clean up after removing possibility of null expressions. (Daniel Berlin, 2017-01-02; 1 file changed, -17/+14)
    llvm-svn: 290828
* [NewGVN] Fold single-use variable inside the assertion. (Davide Italiano, 2017-01-02; 1 file changed, -5/+3)
    This placates some bots, which complain because they compile the assertion out and then
    think the variable is unused.

    llvm-svn: 290825
* [NewGVN] Restore old code to placate buildbots. (Davide Italiano, 2017-01-02; 1 file changed, -2/+6)
    Apparently my suggestion of using a ternary doesn't really work, as clang complains about
    incompatible types on the LHS and RHS. Some GCC versions happen to accept the code, but
    clang's behaviour is correct here.

    llvm-svn: 290822
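For context, a stand-alone illustration of the C++ issue described above (hypothetical types, not the NewGVN code): the two arms of the conditional operator are pointers to unrelated sibling classes, so there is no common operand type even though both convert to the shared base, and clang rejects the ternary.

```cpp
// Hypothetical stand-ins for NewGVN's expression classes.
struct Expression { virtual ~Expression() = default; };
struct LoadExpression : Expression {};
struct StoreExpression : Expression {};

Expression *pickExpression(bool IsLoad, LoadExpression *L, StoreExpression *S) {
  // Expression *E = IsLoad ? L : S;  // clang: incompatible operand types
  Expression *E = nullptr;
  if (IsLoad)
    E = L;
  else
    E = S;
  return E;
}
```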
* NewGVN: Fix some formatting and comment issues (Daniel Berlin, 2017-01-02; 1 file changed, -18/+8)
    llvm-svn: 290820
* NewGVN: Add UnknownExpression and create them for things we can't symbolize. Kill fragile machinery for handling null expressions. (Daniel Berlin, 2017-01-02; 1 file changed, -19/+19)
    Summary: This avoids the very fragile code for null expressions. We could also use a
    DenseSet that tracks which things have null expressions instead, but that seems fragile
    and a premature optimization. This resolves a number of infinite-loop cases; test
    reductions are coming.

    Reviewers: davide
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D28193

    llvm-svn: 290816
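A simplified sketch of the idea (hypothetical classes, not LLVM's actual GVNExpression hierarchy): an instruction we cannot symbolize gets its own expression object that compares equal only to itself, instead of being represented by a null Expression* that every caller must remember to check.

```cpp
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Hypothetical, heavily simplified expression classes for illustration only.
struct Expression {
  enum Kind { EK_Unknown, EK_Other };
  Kind K;
  explicit Expression(Kind K) : K(K) {}
  virtual ~Expression() = default;
  virtual bool equals(const Expression &Other) const { return K == Other.K; }
};

struct UnknownExpression : Expression {
  Instruction *Inst; // the instruction we failed to symbolize
  explicit UnknownExpression(Instruction *I) : Expression(EK_Unknown), Inst(I) {}
  bool equals(const Expression &Other) const override {
    // An unknown is congruent only to an unknown wrapping the very same
    // instruction, so unsymbolized instructions never merge classes by accident.
    if (Other.K != EK_Unknown)
      return false;
    return static_cast<const UnknownExpression &>(Other).Inst == Inst;
  }
};
```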
* NewGVN: Fix PR31480, PR31483, PR31499 by rewriting how memory congruence handling works. (Daniel Berlin, 2017-01-02; 1 file changed, -20/+144)
    Summary: Previously, we tried to fix up the equivalences during symbolic evaluation.
    This does not work. Now we change the equivalences during congruence finding, where it
    belongs. We also initialize the equivalence table to give a maximal answer.

    Reviewers: davide
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D28192

    llvm-svn: 290815
* [CVP] Adjust iteration order to reduce the amount of work required (Philip Reames, 2016-12-30; 1 file changed, -3/+8)
    CVP doesn't care about the order in which blocks are visited, but by using a pre-order
    traversal over the graph we can a) avoid visiting unreachable blocks and b) optimize as
    we go, so that the analysis of later blocks produces slightly more precise results. I
    noticed this via inspection and don't have a concrete example which points to the issue.

    llvm-svn: 290760
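As a rough illustration of the traversal (a sketch, not the actual CVP change), LLVM's depth-first iterator starting at the entry block yields a pre-order walk that never enumerates unreachable blocks:

```cpp
#include "llvm/ADT/DepthFirstIterator.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Function.h"
using namespace llvm;

// Visit blocks in depth-first pre-order from the entry block. Blocks with no
// path from the entry are never reached, and a block's results are usually
// available before the blocks it feeds are processed.
static void visitBlocksInPreorder(Function &F) {
  for (BasicBlock *BB : depth_first(&F.getEntryBlock())) {
    // processBlock(*BB);  // hypothetical per-block work
    (void)BB;
  }
}
```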
* [NewGVN] Remove unneeded newline from assertion message. (Davide Italiano, 2016-12-30; 1 file changed, -1/+1)
    llvm-svn: 290755
* [LICM] When promoting scalars, allow inserting stores to thread-local allocas. (Michael Kuperstein, 2016-12-30; 1 file changed, -1/+2)
    This is similar to the allocation-function case: if an alloca is not captured, then it
    is necessarily thread-local.

    Differential Revision: https://reviews.llvm.org/D28170

    llvm-svn: 290738
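A minimal sketch of the thread-locality test this relies on (the shape and the helper name are assumptions, not the exact LICM code, though PointerMayBeCaptured is the capture-tracking entry point of this era): an alloca whose address is never captured cannot be reached from another thread, so stores to it may be inserted during promotion.

```cpp
#include "llvm/Analysis/CaptureTracking.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Value.h"
using namespace llvm;

// Hypothetical helper: if the underlying object is an alloca whose address never
// escapes (not even via stores or returns), no other thread can observe it, so it
// is safe to treat it as thread-local when promoting scalars.
static bool isThreadLocalAlloca(const Value *UnderlyingObj) {
  if (!isa<AllocaInst>(UnderlyingObj))
    return false;
  return !PointerMayBeCaptured(UnderlyingObj, /*ReturnCaptures=*/false,
                               /*StoreCaptures=*/true);
}
```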
* Use continuous boosting factor for complete unroll. (Dehao Chen, 2016-12-30; 1 file changed, -75/+32)
    Summary: The current complete loop unroll algorithm checks whether complete unrolling
    will reduce the runtime by a certain percentage. If yes, it applies a fixed boosting
    factor to the threshold (by discounting cost). The problem with this approach is that
    the threshold changes abruptly. This patch makes the boosting factor a function of the
    runtime reduction percentage, capped by a fixed threshold. In this way, the threshold
    changes continuously. The patch also simplifies the code by removing one parameter from
    UP. The patch only affects code generation of two SPEC CPU2006 benchmarks: 445.gobmk
    binary size decreases 0.08% with no performance change; 464.h264ref binary size
    increases 0.24% with no performance change.

    Reviewers: mzolotukhin, chandlerc
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D26989

    llvm-svn: 290737
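A toy sketch of the continuous-boost idea (the shape and names are assumptions; this is not the actual cost model): the boost grows with the estimated runtime reduction and is capped, so the effective threshold varies smoothly rather than jumping at a single cutoff.

```cpp
#include <algorithm>

// Hypothetical helper: BaseThreshold is the normal unroll threshold,
// PercentRuntimeSaved the estimated runtime reduction from complete unrolling,
// and MaxBoostPercent the cap on the extra threshold we are willing to grant.
static unsigned boostedUnrollThreshold(unsigned BaseThreshold,
                                       unsigned PercentRuntimeSaved,
                                       unsigned MaxBoostPercent) {
  unsigned Boost = std::min(PercentRuntimeSaved, MaxBoostPercent);
  // 0% saved gives no boost; the boost then grows linearly up to the cap, so
  // the threshold is a continuous function of the savings estimate.
  return BaseThreshold + (BaseThreshold * Boost) / 100;
}
```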
* [LICM] Remove unneeded tracking of whether changes were made. NFC. (Michael Kuperstein, 2016-12-30; 1 file changed, -9/+7)
    "Changed" doesn't actually change within the loop, so there's no reason to keep track of
    it: we always return false during analysis and true after the transformation is made.

    llvm-svn: 290735
* [LICM] Make logic in promoteLoopAccessesToScalars easier to follow. NFC. (Michael Kuperstein, 2016-12-30; 1 file changed, -40/+47)
    llvm-svn: 290734
* [LICM] Compute exit blocks for promotion eagerly. NFC. (Michael Kuperstein, 2016-12-29; 1 file changed, -35/+36)
    This moves the exit block and insertion point computation to be eager, instead of doing
    it after seeing the first scalar we can promote. The cost is relatively small (the
    computation happens anyway; see the discussion on D28147), the code is easier to follow,
    and we can bail out earlier if there's a catchswitch present.

    llvm-svn: 290729
* [LICM] Don't try to promote in loops where we have no chance to promote. NFC. (Michael Kuperstein, 2016-12-29; 1 file changed, -10/+6)
    We would check whether we have a preheader *or* dedicated exit blocks, and go into the
    promotion loop. Then, for each alias set, we'd check whether we have a preheader *and*
    dedicated exit blocks, and bail if not. Instead, bail immediately if we don't have both.

    llvm-svn: 290728
* [LICM] Only recompute LCSSA when we actually promoted something. (Michael Kuperstein, 2016-12-29; 1 file changed, -3/+6)
    We want to recompute LCSSA only when we actually promoted a value. This means we only
    need to look at changes made by promotion when deciding whether to recompute it, not at
    regular sinking/hoisting. (This is what the code was documented as doing, just not what
    it did.) Hopefully NFC.

    llvm-svn: 290726
* NewGVN: Fix PR31491 by ensuring that we touch the right instructions. Change to one-based numbering so we can assert that we don't cause the same bug again. (Daniel Berlin, 2016-12-29; 1 file changed, -11/+21)
    llvm-svn: 290724
* NewGVN: Sort Dominator Tree in RPO order, and use that for generating order. (Daniel Berlin, 2016-12-29; 1 file changed, -4/+24)
    Summary: The optimal iteration order for this problem is RPO order. We want to process
    as many preds of a backedge as we can before we process the backedge. At the same time,
    as we add predicate handling, we want to be able to touch instructions that are
    dominated by a given block by ranges (because a change in value numbering a predicate
    possibly affects all users we dominate that are using that predicate). If we don't do
    it this way, we can't do value inference over backedges (the paper covers this in depth).

    The newgvn branch currently overshoots the last part, and guarantees that it will touch
    *at least* the right set of instructions, but it does touch more. This is because the
    bitvector instruction ranges are currently generated in RPO order (so we take the max
    and the min of the ranges of dominated blocks, which means there are some in the middle
    we didn't have to touch that we did). We can do better by sorting the dominator tree,
    and then just using dominator tree order.

    As a preliminary, the dominator tree has some RPO guarantees, but not enough. It
    guarantees that for a given node, your idom must come before you in the RPO ordering.
    It guarantees no relative RPO ordering for siblings; we add siblings in whatever order
    they appear in the module. So that is what we fix: we sort the children array of the
    domtree into RPO order, and then use the dominator tree for ordering, instead of RPO,
    since the dominator tree is now a valid RPO ordering.

    Note: this would help any other pass that iterates a forward problem in dominator tree
    order. Most of them are single pass; it will still maximize whatever result they
    compute. We could also build the dominator tree in this order, but our incremental
    updates would still put it out of sort order, and recomputing the sort order is almost
    as hard as general incremental updates of the domtree. Also note that the sorting does
    not affect any tests, etc. Nothing depends on domtree order, including the verifier and
    the equals functions for domtree nodes.

    How much could this matter, you ask? Here are the current numbers, generated by running
    NewGVN over all files in LLVM. Note that once we propagate equalities, the differences
    go up by an order of magnitude or two (i.e. instead of 29, the max ends up in the
    thousands, since in the worst case we add a factor of N, where N is the number of
    branch predicates). So while it doesn't look that stark for the default ordering, it
    gets much, much worse. There are also programs in the wild where the difference is
    already pretty stark (2 iterations vs. hundreds).

    RPO ordering:
        759040  Number of iterations is 1
        112908  Number of iterations is 2

    Default dominator tree ordering:
        755081  Number of iterations is 1
        116234  Number of iterations is 2
           603  Number of iterations is 3
            27  Number of iterations is 4
             2  Number of iterations is 5
             1  Number of iterations is 7

    Dominator tree sorted:
        759040  Number of iterations is 1
        112908  Number of iterations is 2   <yay!>

    Really bad ordering (sort domtree siblings in postorder; not quite the worst possible, but yeah):
        754008  Number of iterations is 1
         96642  Number of iterations is 2
         17266  Number of iterations is 3
          2598  Number of iterations is 4
           798  Number of iterations is 5
           273  Number of iterations is 6
           186  Number of iterations is 7
            80  Number of iterations is 8
            42  Number of iterations is 9
            21  Number of iterations is 10
             8  Number of iterations is 11
             6  Number of iterations is 12
             5  Number of iterations is 13
             2  Number of iterations is 14
             2  Number of iterations is 15
             3  Number of iterations is 16
             1  Number of iterations is 17
             2  Number of iterations is 18
             1  Number of iterations is 20
             2  Number of iterations is 21
             1  Number of iterations is 22
             1  Number of iterations is 29

    Reviewers: chandlerc, davide
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D28129

    llvm-svn: 290699
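A rough reconstruction of the sorting step (illustrative only, not the committed code; the helper name is made up): number blocks by their reverse-post-order position, then reorder each dominator-tree node's children by that number so that a plain walk of the dominator tree yields a valid RPO.

```cpp
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"
#include <algorithm>
using namespace llvm;

static void sortDomTreeChildrenByRPO(Function &F, DominatorTree &DT) {
  // Assign each block its index in a reverse-post-order traversal.
  DenseMap<const BasicBlock *, unsigned> RPONumber;
  ReversePostOrderTraversal<Function *> RPOT(&F);
  unsigned Index = 0;
  for (BasicBlock *BB : RPOT)
    RPONumber[BB] = Index++;

  // Reorder every node's children by RPO number. This assumes the child order
  // of a DomTreeNode carries no semantic meaning and may be permuted freely.
  for (BasicBlock &BB : F) {
    if (DomTreeNode *Node = DT.getNode(&BB))
      std::sort(Node->begin(), Node->end(),
                [&](const DomTreeNode *A, const DomTreeNode *B) {
                  return RPONumber[A->getBlock()] < RPONumber[B->getBlock()];
                });
  }
}
```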
* Update equalsStoreHelper for the fact that only one branch can be true (Daniel Berlin, 2016-12-29; 1 file changed, -4/+5)
    llvm-svn: 290697
* Revert "[NewGVN] replace emplace_back with push_back" (Piotr Padlewski, 2016-12-28; 1 file changed, -7/+7)
    llvm-svn: 290692
* [NewGVN] replace emplace_back with push_back (Piotr Padlewski, 2016-12-28; 1 file changed, -7/+7)
    emplace_back is not faster when it is equivalent to push_back. In these cases the
    emplaced value had the same type as the one stored in the container. It is uglier, and
    it might even be slower (see Scott Meyers' presentation about emplacement).

    llvm-svn: 290685
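A small stand-alone illustration of the point (not taken from the patch): when the argument already has the container's element type, emplace_back does exactly the same copy construction as push_back, so the plainer spelling wins; emplacement only pays off when forwarding constructor arguments.

```cpp
#include <string>
#include <vector>

void appendExamples(std::vector<std::string> &Names, const std::string &S) {
  Names.push_back(S);          // copy-constructs the element; clear and sufficient
  Names.emplace_back(S);       // the same copy construction, just less obvious
  Names.emplace_back(5, 'x');  // genuine emplacement: constructs "xxxxx" in place
}
```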
* [NewGVN] Simplify loop. NFC (Piotr Padlewski, 2016-12-28; 1 file changed, -4/+1)
    llvm-svn: 290683
* [NewGVN] replace typedefs with usings (Piotr Padlewski, 2016-12-28; 1 file changed, -2/+2)
    llvm-svn: 290680
* [NewGVN] NFC fixes (Piotr Padlewski, 2016-12-28; 1 file changed, -40/+36)
    llvm-svn: 290679
* [NewGVN] Global sweep replacing NULL with nullptr. NFCI. (Davide Italiano, 2016-12-28; 1 file changed, -10/+10)
    llvm-svn: 290670
* [NewGVN] Remove redundant code. NFCI. (Davide Italiano, 2016-12-28; 1 file changed, -2/+0)
    llvm-svn: 290669
* [NewGVN] equals() for loads/stores is the same. Unify. (Davide Italiano, 2016-12-28; 1 file changed, -23/+13)
    Differential Revision: https://reviews.llvm.org/D28116

    llvm-svn: 290667
* [NewGVN] Simplify a bit by removing else after return. NFCI. (Davide Italiano, 2016-12-27; 1 file changed, -3/+3)
    llvm-svn: 290615
* [MemCpyOpt] Don't sink LoadInst below possible clobber. (Bryant Wong, 2016-12-27; 1 file changed, -5/+11)
    Differential Revision: https://reviews.llvm.org/D26811

    llvm-svn: 290611
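For context, a tiny hypothetical example (not from the patch) of the hazard named in the title: sinking a load past a store that may alias its source changes which value gets copied.

```cpp
// If the load of *Src were sunk below the store to *MayAlias and the two
// pointers alias, the copy would observe the overwritten value instead of the
// original one; the transform must prove there is no intervening clobber.
void copyOneByte(char *Dst, const char *Src, char *MayAlias) {
  char V = *Src;   // load: must stay above any potentially aliasing store
  *MayAlias = 0;   // possible clobber of *Src
  *Dst = V;        // uses the value read before the store
}
```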
* Change a std::vector to SmallVector in NewGVN (Daniel Berlin, 2016-12-27; 1 file changed, -1/+1)
    llvm-svn: 290596
* clang-format NewGVN files (Daniel Berlin, 2016-12-26; 1 file changed, -67/+60)
    llvm-svn: 290551
* Misc cleanups and simplifications for NewGVN. (Daniel Berlin, 2016-12-26; 1 file changed, -48/+54)
    Mostly use a bit more idiomatic C++ where we can, so we can combine some things later.

    Reviewers: davide
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D28111

    llvm-svn: 290550
* Don't use our own incorrect version of isTriviallyDeadInstruction in NewGVN. Fixes PR31472. (Daniel Berlin, 2016-12-26; 1 file changed, -3/+2)
    llvm-svn: 290549
* [NewGVN] Fold lookupOperandLeader() when there's only one use. NFCI. (Davide Italiano, 2016-12-26; 1 file changed, -4/+2)
    llvm-svn: 290543
* Value number stores and memory states so we can detect when memory states are equivalent (i.e. a store of the same value to memory). (Daniel Berlin, 2016-12-25; 1 file changed, -20/+139)
    Reviewers: davide
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D28084

    llvm-svn: 290525
* Rename GVNExpression *ops_ members to *op_* to match conventions in the rest of LLVM (Daniel Berlin, 2016-12-25; 1 file changed, -11/+11)
    llvm-svn: 290524
* [NewGVN] Prefer `auto` to explicit type when the latter is obvious. (Davide Italiano, 2016-12-24; 1 file changed, -1/+1)
    llvm-svn: 290499
* Mark isOnlyReachableViaThisEdge as const (Daniel Berlin, 2016-12-24; 1 file changed, -2/+2)
    llvm-svn: 290468
* [LICM] Plug a leak by freeing the ASTs before clearing the map. (Davide Italiano, 2016-12-23; 1 file changed, -0/+2)
    llvm-svn: 290433
* [LICM] Work around LICM's need to maintain state across loops. (Davide Italiano, 2016-12-23; 1 file changed, -1/+6)
    The pass creates some state which expects to be cleaned up by a later instance of the
    same pass. opt-bisect happens to expose this less-than-ideal design, because calling
    skipLoop() can result in this state not being cleaned up and an assertion firing in
    `doFinalization()`. Chandler tells me the new pass manager will give us options to avoid
    these design traps, but until it's ready we need a workaround for the current pass
    infrastructure. Fix provided by Andy Kaylor; see the review for a complete discussion.

    Differential Revision: https://reviews.llvm.org/D25848

    llvm-svn: 290427
* [NewGVN] Remove (for now) unused code. NFCI. (Davide Italiano, 2016-12-23; 1 file changed, -12/+0)
    llvm-svn: 290420
* Enable '-Wstring-conversion' and fix some bad asserts that it helped find. (Chandler Carruth, 2016-12-23; 1 file changed, -1/+1)
    Notable is the assert in NewGVN which had no effect because of the bug.

    llvm-svn: 290400
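A generic illustration of the bug class -Wstring-conversion catches (a made-up example, not the NewGVN assertion itself): a string literal used as a boolean operand converts to a non-null pointer, i.e. true, which can make an assertion vacuous.

```cpp
#include <cassert>

void checkPositive(int Value) {
  // Bug pattern: the literal is always truthy, so this assertion can never fire;
  // -Wstring-conversion warns about the implicit string-to-bool conversion.
  // assert(Value > 0 || "Value must be positive");

  // Intended form: the message is attached with &&, so the condition still matters.
  assert(Value > 0 && "Value must be positive");
}
```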
* [GVN] Initial check-in of a new global value numbering algorithm. (Davide Italiano, 2016-12-22; 3 files changed, -0/+1859)
    The code has been developed by Daniel Berlin over the years, and the goal of the new
    implementation is to address shortcomings of the current GVN infrastructure: long
    compile times for large test cases, lack of phi predication, no load/store value
    numbering, etc. The current code implements just the "core" GVN algorithm, although
    other pieces (load coercion, phi handling, predicate system) are already implemented in
    a branch out of tree. Once the core is stable, we'll start adding pieces on top of the
    base framework. The tests currently living in test/Transforms/NewGVN are a copy of the
    ones in GVN, with proper `XFAIL` annotations for features missing in NewGVN. A flag will
    be added in a future commit to enable NewGVN, so that interested parties can exercise
    this code easily.

    Differential Revision: https://reviews.llvm.org/D26224

    llvm-svn: 290346
* [PM] Introduce a reasonable port of the main per-module pass pipeline from the old pass manager in the new one (Chandler Carruth, 2016-12-22; 1 file changed, -2/+4)
    I'm not trying to support (initially) the numerous options that are currently available
    to customize the pass pipeline. If we end up really wanting them, we can add them later,
    but I suspect many are no longer interesting. The simplicity of omitting them will help
    a lot as we sort out what the pipeline should look like in the new PM.

    I've also documented to the best of my ability *why* each pass or group of passes is
    used so that reading the pipeline is more helpful. In many cases I think we have some
    questionable choices of ordering and I've left FIXME comments in place so we know what
    to come back and revisit going forward. But for now, I've left it as similar to the
    current pipeline as I could.

    Lastly, I've had to comment out several places where passes are not ported to the new
    pass manager or where the loop pass infrastructure is not yet ready. I did at least fix
    a few bugs in the loop pass infrastructure uncovered by running the full pipeline, but I
    didn't want to go too far in this patch -- I'll come back and re-enable these as the
    infrastructure comes online. But I'd like to keep the comments in place because I don't
    want to lose track of which passes need to be enabled and where they go.

    One thing that seemed like a significant API improvement was to require that we don't
    build pipelines for O0. It seems to have no real benefit.

    I've also switched back to returning pass managers by value as at this API layer it
    feels much more natural to me for composition. But if others disagree, I'm happy to go
    back to an output parameter.

    I'm not 100% happy with the testing strategy currently, but it seems at least OK. I may
    come back and try to refactor or otherwise improve this in subsequent patches but I
    wanted to at least get a good starting point in place.

    Differential Revision: https://reviews.llvm.org/D28042

    llvm-svn: 290325
* Refactor the DIExpression fragment query interface (NFC) (Adrian Prantl, 2016-12-22; 1 file changed, -4/+4)
    ... so it becomes available to DIExpressionCursor.

    llvm-svn: 290322
* [LDist] Match behavior between invoking via the optimization pipeline or opt -loop-distribute (Adam Nemet, 2016-12-21; 1 file changed, -31/+8)
    In r267672, where the loop distribution pragma was introduced, I tried hard to keep the
    old behavior for opt: when opt is invoked with -loop-distribute, it should distribute
    the loop (it's off by default when run via the optimization pipeline). As MichaelZ
    discovered, this has the unintended consequence of breaking a very common developer
    workflow for reproducing compilations using opt: first you print clang's pass pipeline
    with -debug-pass=Arguments, and then you invoke opt with the returned arguments.
    clang -debug-pass will include -loop-distribute, but the pass is invoked with
    default=off, so nothing happens unless the loop carries the pragma, whereas through opt
    (default=on) we would try to distribute all loops. This changes opt's default to off as
    well, to match clang. The tests are modified to explicitly enable the transformation.

    llvm-svn: 290235
* [LoopVersioning] Require loop-simplify form for loop versioning. (Florian Hahn, 2016-12-19; 3 files changed, -11/+14)
    Summary: Requiring loop-simplify form for loop versioning ensures that the runtime check
    block always dominates the exit block. This patch closes #30958
    (https://llvm.org/bugs/show_bug.cgi?id=30958).

    Reviewers: silviu.baranga, hfinkel, anemet, ashutosh.nema
    Subscribers: ashutosh.nema, mzolotukhin, efriedma, hfinkel, llvm-commits
    Differential Revision: https://reviews.llvm.org/D27469

    llvm-svn: 290116
* Revert @llvm.assume with operand bundles (r289755-r289757) (Daniel Jasper, 2016-12-19; 17 files changed, -75/+170)
    This creates non-linear behavior in the inliner (see more details in r289755's commit
    thread).

    llvm-svn: 290086
* Remove the AssumptionCache (Hal Finkel, 2016-12-15; 17 files changed, -167/+69)
    After r289755, the AssumptionCache is no longer needed. Variables affected by
    assumptions are now found by using the new operand-bundle-based scheme. This new scheme
    is more computationally efficient, and it also needs much less code.

    llvm-svn: 289756
* Make processing @llvm.assume more efficient by using operand bundles (Hal Finkel, 2016-12-15; 1 file changed, -3/+6)
    There was an efficiency problem with how we processed @llvm.assume in ValueTracking (and
    other places). The AssumptionCache tracked all of the assumptions in a given function.
    In order to find assumptions relevant to computing known bits, etc. we searched every
    assumption in the function. For ValueTracking, that means that we did
    O(#assumes * #values) work in InstCombine and other passes (with a constant factor that
    can be quite large because we'd repeat this search at every level of recursion of the
    analysis).

    Several of us discussed this situation at the last developers' meeting, and this
    implements the discussed solution: Make the values that an assume might affect operands
    of the assume itself. To avoid exposing this detail to frontends and passes that need
    not worry about it, I've used the new operand-bundle feature to add these extra call
    "operands" in a way that does not affect the intrinsic's signature. I think this
    solution is relatively clean.

    InstCombine adds these extra operands based on what ValueTracking, LVI, etc. will need,
    and then those passes need only search the users of the values under consideration.
    This should fix the computational-complexity problem.

    At this point, no passes depend on the AssumptionCache, and so I'll remove that as a
    follow-up change.

    Differential Revision: https://reviews.llvm.org/D27259

    llvm-svn: 289755
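A rough sketch of the lookup pattern this enables (simplified and assumed; not the actual ValueTracking code): because an affected value now appears as an operand of the assume call via an operand bundle, the call shows up in that value's use list, so a scan of the users replaces a function-wide search over all assumptions.

```cpp
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Value.h"
using namespace llvm;

// Walk only the users of V and check whether any of them is an @llvm.assume
// call; no per-function cache of every assumption is required for this query.
static bool hasAssumeAmongUsers(const Value *V) {
  for (const User *U : V->users())
    if (const auto *II = dyn_cast<IntrinsicInst>(U))
      if (II->getIntrinsicID() == Intrinsic::assume)
        return true;
  return false;
}
```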