summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/MachineBlockPlacement.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* [MBP] Use range based for-loops throughout this code. Several hadChandler Carruth2015-03-051-140/+108
| | | | | | | already been added and the inconsistency made choosing names and changing code more annoying. Plus, wow are they better for this code! llvm-svn: 231347
* [MBP] NFC, run clang-format over this code and tweak things to make theChandler Carruth2015-03-051-71/+62
| | | | | | | | | | | result reasonable. This code predated clang-format and so there was a reasonable amount of crufty formatting that had accumulated. This should ensure that neither myself nor others end up with formatting-only changes sneaking into other fixes. llvm-svn: 231341
* [MBP] This is no longer 'block-placement2'. ;] The old variants are longChandler Carruth2015-03-051-3/+3
| | | | | | gone, update this code to reflect that. llvm-svn: 231340
* [MBP] Revert r231238 which attempted to fix a nasty bug where MBP isChandler Carruth2015-03-051-26/+0
| | | | | | | | | | | | | just arbitrarily interleaving unrelated control flows once they get moved "out-of-line" (both outside of natural CFG ordering and with diamonds that cannot be fully laid out by chaining fallthrough edges). This easy solution doesn't work in practice, and it isn't just a small bug. It looks like a very different strategy will be required. I'm working on that now, and it'll again go behind some flag so that everyone can experiment and make sure it is working well for them. llvm-svn: 231332
* [MBP] Fix a really horrible bug in MachineBlockPlacement, but behindChandler Carruth2015-03-041-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | a flag for now. First off, thanks to Daniel Jasper for really pointing out the issue here. It's been here forever (at least, I think it was there when I first wrote this code) without getting really noticed or fixed. The key problem is what happens when two reasonably common patterns happen at the same time: we outline multiple cold regions of code, and those regions in turn have diamonds or other CFGs for which we can't just topologically lay them out. Consider some C code that looks like: if (a1()) { if (b1()) c1(); else d1(); f1(); } if (a2()) { if (b2()) c2(); else d2(); f2(); } done(); Now consider the case where a1() and a2() are unlikely to be true. In that case, we might lay out the first part of the function like: a1, a2, done; And then we will be out of successors in which to build the chain. We go to find the best block to continue the chain with, which is perfectly reasonable here, and find "b1" let's say. Laying out successors gets us to: a1, a2, done; b1, c1; At this point, we will refuse to lay out the successor to c1 (f1) because there are still un-placed predecessors of f1 and we want to try to preserve the CFG structure. So we go get the next best block, d1. ... wait for it ... Except that the next best block *isn't* d1. It is b2! d1 is waaay down inside these conditionals. It is much less important than b2. Except that this is exactly what we didn't want. If we keep going we get the entire set of the rest of the CFG *interleaved*!!! a1, a2, done; b1, c1; b2, c2; d1, f1; d2, f2; So we clearly need a better strategy here. =] My current favorite strategy is to actually try to place the block whose predecessor is closest. This very simply ensures that we unwind these kinds of CFGs the way that is natural and fitting, and should minimize the number of cache lines instructions are spread across. It also happens to be *dead simple*. It's like the datastructure was specifically set up for this use case or something. We only push blocks onto the work list when the last predecessor for them is placed into the chain. So the back of the worklist *is* the nearest next block. Unfortunately, a change like this is going to cause *soooo* many benchmarks to swing wildly. So for now I'm adding this under a flag so that we and others can validate that this is fixing the problems described, that it seems possible to enable, and hopefully that it fixes more of our problems long term. llvm-svn: 231238
* Add a flag to experiment with outlining optional branches.Daniel Jasper2015-03-041-2/+46
| | | | | | | | | | | | | In a CFG with the edges A->B->C and A->C, B is an optional branch. LLVM's default behavior is to lay the blocks out naturally, i.e. A, B, C, in order to improve code locality and fallthroughs. However, if a function contains many of those optional branches only a few of which are taken, this leads to a lot of unnecessary icache misses. Moving B out of line can work around this. Review: http://reviews.llvm.org/D7719 llvm-svn: 231230
* NFC: Use range-based for loops and more consistent naming.Daniel Jasper2015-02-181-19/+15
| | | | | | | | | No functional changes intended. (I plan on doing some modifications to this function and would like to have as few unrelated changes as possible in the patch) llvm-svn: 229649
* Remove experimental options to control machine block placement.Daniel Jasper2015-02-181-35/+20
| | | | | | | This reverts r226034. Benchmarking with those flags has not revealed anything interesting. llvm-svn: 229648
* CodeGen: Canonicalize access to function attributes, NFCDuncan P. N. Exon Smith2015-02-141-2/+1
| | | | | | | | | | | | | | | | | Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) Also, add `Function::getFnStackAlignment()`, and canonicalize: getAttributes().getStackAlignment(AttributeSet::FunctionIndex) => getFnStackAlignment() llvm-svn: 229208
* [MBP] Add flags to disable the BadCFGConflict check in MachineBlockPlacement.Chandler Carruth2015-01-141-20/+35
| | | | | | | | | | | | | | | | | | | | | | | | Some benchmarks have shown that this could lead to a potential performance benefit, and so adding some flags to try to help measure the difference. A possible explanation. In diamond-shaped CFGs (A followed by either B or C both followed by D), putting B and C both in between A and D leads to the code being less dense than it could be. Always either B or C have to be skipped increasing the chance of cache misses etc. Moving either B or C to after D might be beneficial on average. In the long run, but we should probably do a better job of analyzing the basic block and branch probabilities to move the correct one of B or C to after D. But even if we don't use this in the long run, it is a good baseline for benchmarking. Original patch authored by Daniel Jasper with test tweaks and a second flag added by me. Differential Revision: http://reviews.llvm.org/D6969 llvm-svn: 226034
* [PowerPC/BlockPlacement] Allow target to provide a per-loop alignment preferenceHal Finkel2015-01-031-3/+4
| | | | | | | | | | | | | | | | | The existing code provided for specifying a global loop alignment preference. However, the preferred loop alignment might depend on the loop itself. For recent POWER cores, loops between 5 and 8 instructions should have 32-byte alignment (while the others are better with 16-byte alignment) so that the entire loop will fit in one i-cache line. To support this, getPrefLoopAlignment has been made virtual, and can be provided with an optional MachineLoop* so the target can inspect the loop before answering the query. The default behavior, as before, is to return the value set with setPrefLoopAlignment. MachineBlockPlacement now queries the target for each loop instead of only once per function. There should be no functional change for other targets. llvm-svn: 225117
* Update SetVector to rely on the underlying set's insert to return a ↵David Blaikie2014-11-191-2/+2
| | | | | | | | | | | | | pair<iterator, bool> This is to be consistent with StringSet and ultimately with the standard library's associative container insert function. This lead to updating SmallSet::insert to return pair<iterator, bool>, and then to update SmallPtrSet::insert to return pair<iterator, bool>, and then to update all the existing users of those functions... llvm-svn: 222334
* Have MachineFunction cache a pointer to the subtarget to make lookupsEric Christopher2014-08-051-2/+2
| | | | | | | | | | | shorter/easier and have the DAG use that to do the same lookup. This can be used in the future for TargetMachine based caching lookups from the MachineFunction easily. Update the MIPS subtarget switching machinery to update this pointer at the same time it runs. llvm-svn: 214838
* Remove the TargetMachine forwards for TargetSubtargetInfo basedEric Christopher2014-08-041-2/+3
| | | | | | information and update all callers. No functional change. llvm-svn: 214781
* Revert "Introduce a string_ostream string builder facilty"Alp Toker2014-06-261-4/+8
| | | | | | Temporarily back out commits r211749, r211752 and r211754. llvm-svn: 211814
* Introduce a string_ostream string builder faciltyAlp Toker2014-06-261-8/+4
| | | | | | | | | | | | | | | | | | | | string_ostream is a safe and efficient string builder that combines opaque stack storage with a built-in ostream interface. small_string_ostream<bytes> additionally permits an explicit stack storage size other than the default 128 bytes to be provided. Beyond that, storage is transferred to the heap. This convenient class can be used in most places an std::string+raw_string_ostream pair or SmallString<>+raw_svector_ostream pair would previously have been used, in order to guarantee consistent access without byte truncation. The patch also converts much of LLVM to use the new facility. These changes include several probable bug fixes for truncated output, a programming error that's no longer possible with the new interface. llvm-svn: 211749
* [Modules] Remove potential ODR violations by sinking the DEBUG_TYPEChandler Carruth2014-04-221-1/+2
| | | | | | | | | | | | define below all header includes in the lib/CodeGen/... tree. While the current modules implementation doesn't check for this kind of ODR violation yet, it is likely to grow support for it in the future. It also removes one layer of macro pollution across all the included headers. Other sub-trees will follow. llvm-svn: 206837
* [C++11] More 'nullptr' conversion. In some cases just using a boolean check ↵Craig Topper2014-04-141-17/+17
| | | | | | instead of comparing to nullptr. llvm-svn: 206142
* Disable each MachineFunctionPass for 'optnone' functions, unless thatPaul Robinson2014-03-311-0/+3
| | | | | | | pass normally runs at optimization level None, or is part of the register allocation pipeline. llvm-svn: 205228
* [C++11] Add 'override' keyword to virtual methods that override their base ↵Craig Topper2014-03-071-4/+4
| | | | | | class. llvm-svn: 203220
* [C++11] Replace llvm::next and llvm::prior with std::next and std::prev.Benjamin Kramer2014-03-021-13/+13
| | | | | | Remove the old functions. llvm-svn: 202636
* Now that we have C++11, turn simple functors into lambdas and remove a ton ↵Benjamin Kramer2014-03-011-18/+3
| | | | | | | | of boilerplate. No intended functionality change. llvm-svn: 202588
* Add a LLVM_DUMP_METHOD macro.Nico Weber2014-01-031-1/+1
| | | | | | | | | | | | | | The motivation is to mark dump methods as used in debug builds so that they can be called from lldb, but to not do so in release builds so that they can be dead-stripped. There's lots of potential follow-up work suggested in the thread "Should dump methods be LLVM_ATTRIBUTE_USED only in debug builds?" on cfe-dev, but everyone seems to agreen on this subset. Macro name chosen by fair coin toss. llvm-svn: 198456
* [block-freq] Update MachineBlockPlacement and RegAllocGreedy to use the new ↵Michael Gottesman2013-12-141-5/+6
| | | | | | MachineBlockFrequencyInfo methods. llvm-svn: 197290
* Fix gcc warnings.Matt Arsenault2013-12-101-0/+2
| | | | | | Unused variable and unused typedef in release build. llvm-svn: 196947
* Revert part of GCC warning fix to fix debug build.Matt Arsenault2013-12-051-0/+1
| | | | | | | The typedef is used inside the DEBUG(), and apparently can't be moved inside of it. llvm-svn: 196528
* Fix minor GCC warnings.Matt Arsenault2013-12-051-1/+0
| | | | | | Unused typedefs and unused variables. llvm-svn: 196526
* Output a bit more information in the debug printing for MBP. This wasChandler Carruth2013-11-251-3/+4
| | | | | | useful when analyzing parts of zlib's behavior here. llvm-svn: 195588
* MachineBlockPlacement: Strengthen the source order bias when picking an exit ↵Benjamin Kramer2013-11-201-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | | block. We now only allow breaking source order if the exit block frequency is significantly higher than the other exit block. The actual bias is currently under a flag so the best cut-off can be found; the flag defaults to the old behavior. The idea is to get some benchmark coverage over different values for the flag and pick the best one. When we require the new frequency to be at least 20% higher than the old frequency I see a 5% speedup on zlib's deflate when compressing a random file on x86_64/westmere. Hal reported a small speedup on Fhourstones on a BG/Q and no regressions in the test suite. The test case is the full long_match function from zlib's deflate. I was reluctant to add it for previous tweaks to branch probabilities because it's large and potentially fragile, but changed my mind since it's an important use case and more likely to break with all the current work going into the PGO infrastructure. Differential Revision: http://llvm-reviews.chandlerc.com/D2202 llvm-svn: 195265
* Fix a defect in code-layout pass, improving Benchmarks/Olden/em3d/em3d by ↵Shuxin Yang2013-06-041-1/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | about 30% (4.58s vs 3.2s on an oldish Mac Tower). The corresponding src is excerpted bellow. The lopp accounts for about 90% of execution time. -------------------- cat -n test-suite/MultiSource/Benchmarks/Olden/em3d/make_graph.c 90 91 for (k=0; k<j; k++) 92 if (other_node == cur_node->to_nodes[k]) break; The defective layout is sketched bellow, where the two branches need to swap. ------------------------------------------------------------------------ L: ... if (cond) goto out-of-loop goto L While this code sequence is defective, I don't understand why it incurs 1/3 of execution time. CPU-event-profiling indicates the poor laoyout dose not increase in br-misprediction; it dosen't increase stall cycle at all, and it dosen't prevent the CPU detect the loop (i.e. Loop-Stream-Detector seems to be working fine as well)... The root cause of the problem is that the layout pass calls AnalyzeBranch() with basic-block which is not updated to reflect its current layout. rdar://13966341 llvm-svn: 183174
* Don't disable block layout when forcing block alignment.Nadav Rotem2013-04-121-8/+6
| | | | llvm-svn: 179355
* Add a flag to align all basic blocks in the function.Nadav Rotem2013-04-121-0/+14
| | | | | | | | | | When debugging performance regressions we often ask ourselves if the regression that we see is due to poor isel/sched/ra or due to some micro-architetural problem. When comparing two code sequences one good way to rule out front-end bottlenecks (and other the issues) is to force code alignment. This pass adds a flag that forces the alignment of all of the basic blocks in the program. llvm-svn: 179353
* Fix a typoNadav Rotem2013-03-291-1/+1
| | | | llvm-svn: 178346
* Split TargetLowering into a CodeGen and a SelectionDAG part.Benjamin Kramer2013-01-111-1/+1
| | | | | | | | | This fixes some of the cycles between libCodeGen and libSelectionDAG. It's still a complete mess but as long as the edges consist of virtual call it doesn't cause breakage. BasicTTI did static calls and thus broke some build configurations. llvm-svn: 172246
* Remove the Function::getFnAttributes method in favor of using the AttributeSetBill Wendling2012-12-301-2/+2
| | | | | | | | | directly. This is in preparation for removing the use of the 'Attribute' class as a collection of attributes. That will shift to the AttributeSet class instead. llvm-svn: 171253
* Rename the 'Attributes' class to 'Attribute'. It's going to represent a ↵Bill Wendling2012-12-191-1/+1
| | | | | | single attribute in the future. llvm-svn: 170502
* Use the new script to sort the includes of every file under lib.Chandler Carruth2012-12-031-5/+5
| | | | | | | | | | | | | | | | | Sooooo many of these had incorrect or strange main module includes. I have manually inspected all of these, and fixed the main module include to be the nearest plausible thing I could find. If you own or care about any of these source files, I encourage you to take some time and check that these edits were sensible. I can't have broken anything (I strictly added headers, and reordered them, never removed), but they may not be the headers you'd really like to identify as containing the API being implemented. Many forward declarations and missing includes were added to a header files to allow them to parse cleanly when included first. The main module rule does in fact have its merits. =] llvm-svn: 169131
* Create enums for the different attributes.Bill Wendling2012-10-091-1/+2
| | | | | | | We use the enums to query whether an Attributes object has that attribute. The opaque layer is responsible for knowing where that specific attribute is stored. llvm-svn: 165488
* Remove the `hasFnAttr' method from Function.Bill Wendling2012-09-261-1/+1
| | | | | | | The hasFnAttr method has been replaced by querying the Attributes explicitly. No intended functionality change. llvm-svn: 164725
* Remove silly dead store. Patch by Ettl Martin.Duncan Sands2012-09-141-2/+1
| | | | llvm-svn: 163882
* Add a much more conservative strategy for aligning branch targets.Chandler Carruth2012-08-071-15/+49
| | | | | | | | | | | | | Previously, MBP essentially aligned every branch target it could. This bloats code quite a bit, especially non-looping code which has no real reason to prefer aligned branch targets so heavily. As Andy said in review, it's still a bit odd to do this without a real cost model, but this at least has much more plausible heuristics. Fixes PR13265. llvm-svn: 161409
* Reverse order of the two branches at end of a basic block if it is profitable.Manman Ren2012-07-311-1/+15
| | | | | | | | | | | | | | | We branch to the successor with higher edge weight first. Convert from je LBB4_8 --> to outer loop jmp LBB4_14 --> to inner loop to jne LBB4_14 jmp LBB4_8 PR12750 rdar: 11393714 llvm-svn: 161018
* Update a bunch of stale comments that dated from when this folled theChandler Carruth2012-06-261-14/+11
| | | | | | | very first (and worst) placement algorithm. These should now more accurately reflect the reality of the pass. llvm-svn: 159185
* Fix typos found by http://github.com/lyda/misspell-checkBenjamin Kramer2012-06-021-4/+4
| | | | llvm-svn: 157885
* Add a somewhat hacky heuristic to do something different from whole-loopChandler Carruth2012-04-161-3/+78
| | | | | | | | | | | | | rotation. When there is a loop backedge which is an unconditional branch, we will end up with a branch somewhere no matter what. Try placing this backedge in a fallthrough position above the loop header as that will definitely remove at least one branch from the loop iteration, where whole loop rotation may not. I haven't seen any benchmarks where this is important but loop-blocks.ll tests for it, and so this will be covered when I flip the default. llvm-svn: 154812
* Tweak the loop rotation logic to check whether the loop is naturallyChandler Carruth2012-04-161-11/+51
| | | | | | | | | | | laid out in a form with a fallthrough into the header and a fallthrough out of the bottom. In that case, leave the loop alone because any rotation will introduce unnecessary branches. If either side looks like it will require an explicit branch, then the rotation won't add any, do it to ensure the branch occurs outside of the loop (if possible) and maximize the benefit of the fallthrough in the bottom. llvm-svn: 154806
* Rewrite how machine block placement handles loop rotation.Chandler Carruth2012-04-161-66/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a complex change that resulted from a great deal of experimentation with several different benchmarks. The one which proved the most useful is included as a test case, but I don't know that it captures all of the relevant changes, as I didn't have specific regression tests for each, they were more the result of reasoning about what the old algorithm would possibly do wrong. I'm also failing at the moment to craft more targeted regression tests for these changes, if anyone has ideas, it would be welcome. The first big thing broken with the old algorithm is the idea that we can take a basic block which has a loop-exiting successor and a looping successor and use the looping successor as the layout top in order to get that particular block to be the bottom of the loop after layout. This happens to work in many cases, but not in all. The second big thing broken was that we didn't try to select the exit which fell into the nearest enclosing loop (to which we exit at all). As a consequence, even if the rotation worked perfectly, it would result in one of two bad layouts. Either the bottom of the loop would get fallthrough, skipping across a nearer enclosing loop and thereby making it discontiguous, or it would be forced to take an explicit jump over the nearest enclosing loop to earch its successor. The point of the rotation is to get fallthrough, so we need it to fallthrough to the nearest loop it can. The fix to the first issue is to actually layout the loop from the loop header, and then rotate the loop such that the correct exiting edge can be a fallthrough edge. This is actually much easier than I anticipated because we can handle all the hard parts of finding a viable rotation before we do the layout. We just store that, and then rotate after layout is finished. No inner loops get split across the post-rotation backedge because we check for them when selecting the rotation. That fix exposed a latent problem with our exitting block selection -- we should allow the backedge to point into the middle of some inner-loop chain as there is no real penalty to it, the whole point is that it *won't* be a fallthrough edge. This may have blocked the rotation at all in some cases, I have no idea and no test case as I've never seen it in practice, it was just noticed by inspection. Finally, all of these fixes, and studying the loops they produce, highlighted another problem: in rotating loops like this, we sometimes fail to align the destination of these backwards jumping edges. Fix this by actually walking the backwards edges rather than relying on loopinfo. This fixes regressions on heapsort if block placement is enabled as well as lots of other cases where the previous logic would introduce an abundance of unnecessary branches into the execution. llvm-svn: 154783
* Make a somewhat subtle change in the logic of block placement. SometimesChandler Carruth2012-04-101-0/+12
| | | | | | | | | | | | | | | | the loop header has a non-loop predecessor which has been pre-fused into its chain due to unanalyzable branches. In this case, rotating the header into the body of the loop in order to place a loop exit at the bottom of the loop is a Very Bad Idea as it makes the loop non-contiguous. I'm working on a good test case for this, but it's a bit annoynig to craft. I should get one shortly, but I'm submitting this now so I can begin the (lengthy) performance analysis process. An initial run of LNT looks really, really good, but there is too much noise there for me to trust it much. llvm-svn: 154395
* Remove an over zealous assert. The assert was trying to catch placesChandler Carruth2012-04-081-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | where a chain outside of the loop block-set ended up in the worklist for scheduling as part of the contiguous loop. However, asserting the first block in the chain is in the loop-set isn't a valid check -- we may be forced to drag a chain into the worklist due to one block in the chain being part of the loop even though the first block is *not* in the loop. This occurs when we have been forced to form a chain early due to un-analyzable branches. No test case here as I have no idea how to even begin reducing one, and it will be hopelessly fragile. We have to somehow end up with a loop header of an inner loop which is a successor of a basic block with an unanalyzable pair of branch instructions. Ow. Self-host triggers it so it is unlikely it will regress. This at least gets block placement back to passing selfhost and the test suite. There are still a lot of slowdown that I don't like coming out of block placement, although there are now also a lot of speedups. =[ I'm seeing swings in both directions up to 10%. I'm going to try to find time to dig into this and see if we can turn this on for 3.1 as it does a really good job of cleaning up after some loops that degraded with the inliner changes. llvm-svn: 154287
* Add a debug-only 'dump' method to the BlockChain structure to easeChandler Carruth2012-04-081-0/+8
| | | | | | debugging. llvm-svn: 154286
OpenPOWER on IntegriCloud