summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/MachineBlockPlacement.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* Revert r254348: "Replace all weight-based interfaces in MBB with ↵Hans Wennborg2015-12-011-30/+41
| | | | | | | | | | probability-based interfaces, and update all uses of old interfaces." and the follow-up r254356: "Fix a bug in MachineBlockPlacement that may cause assertion failure during BranchProbability construction." Asserts were firing in Chromium builds. See PR25687. llvm-svn: 254366
* Fix a bug in MachineBlockPlacement that may cause assertion failure during ↵Cong Hou2015-12-011-3/+7
| | | | | | | | BranchProbability construction. The root cause is the rounding behavior in BranchProbability construction. We may consider to use truncation instead in the future. llvm-svn: 254356
* Replace all weight-based interfaces in MBB with probability-based ↵Cong Hou2015-12-011-41/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | interfaces, and update all uses of old interfaces. The patch in http://reviews.llvm.org/D13745 is broken into four parts: 1. New interfaces without functional changes (http://reviews.llvm.org/D13908). 2. Use new interfaces in SelectionDAG, while in other passes treat probabilities as weights (http://reviews.llvm.org/D14361). 3. Use new interfaces in all other passes. 4. Remove old interfaces. This patch is 3+4 above. In this patch, MBB won't provide weight-based interfaces any more, which are totally replaced by probability-based ones. The interface addSuccessor() is redesigned so that the default probability is unknown. We allow unknown probabilities but don't allow using it together with known probabilities in successor list. That is to say, we either have a list of successors with all known probabilities, or all unknown probabilities. In the latter case, we assume each successor has 1/N probability where N is the number of successors. An assertion checks if the user is attempting to add a successor with the disallowed mixed use as stated above. This can help us catch many misuses. All uses of weight-based interfaces are now updated to use probability-based ones. Differential revision: http://reviews.llvm.org/D14973 llvm-svn: 254348
* Improving edge probabilities computation when choosing the best successor in ↵Cong Hou2015-11-181-13/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | machine block placement. When looking for the best successor from the outer loop for a block belonging to an inner loop, the edge probability computation can be improved so that edges in the inner loop are ignored. For example, suppose we are building chains for the non-loop part of the following code, and looking for B1's best successor. Assume the true body is very hot, then B3 should be the best candidate. However, because of the existence of the back edge from B1 to B0, the probability from B1 to B3 can be very small, preventing B3 to be its successor. In this patch, when computing the probability of the edge from B1 to B3, the weight on the back edge B1->B0 is ignored, so that B1->B3 will have 100% probability. if (...) do { B0; ... // some branches B1; } while(...); else B2; B3; Differential revision: http://reviews.llvm.org/D10825 llvm-svn: 253414
* In MachineBlockPlacement, filter cold blocks off the loop chain when profile ↵Cong Hou2015-11-021-2/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | data is available. In the current BB placement algorithm, a loop chain always contains all loop blocks. This has a drawback that cold blocks in the loop may be inserted on a hot function path, hence increasing branch cost and also reducing icache locality. Consider a simple example shown below: A | B⇆C | D When B->C is quite cold, the best BB-layout should be A,B,D,C. But the current implementation produces A,C,B,D. This patch filters those cold blocks off from the loop chain by comparing the ratio: LoopBBFreq / LoopFreq to 20%: if it is less than 20%, we don't include this BB to the loop chain. Here LoopFreq is the frequency of the loop when we reduce the loop into a single node. In general we have more cold blocks when the loop has few iterations. And vice versa. Differential revision: http://reviews.llvm.org/D11662 llvm-svn: 251833
* Enhance loop rotation with existence of profile data in ↵Cong Hou2015-10-191-3/+184
| | | | | | | | | | | | | | | | MachineBlockPlacement pass. Currently, in MachineBlockPlacement pass the loop is rotated to let the best exit to be the last BB in the loop chain, to maximize the fall-through from the loop to outside. With profile data, we can determine the cost in terms of missed fall through opportunities when rotating a loop chain and select the best rotation. Basically, there are three kinds of cost to consider for each rotation: 1. The possibly missed fall through edge (if it exists) from BB out of the loop to the loop header. 2. The possibly missed fall through edges (if they exist) from the loop exits to BB out of the loop. 3. The missed fall through edge (if it exists) from the last BB to the first BB in the loop chain. Therefore, the cost for a given rotation is the sum of costs listed above. We select the best rotation with the smallest cost. This is only for PGO mode when we have more precise edge frequencies. Differential revision: http://reviews.llvm.org/D10717 llvm-svn: 250754
* CodeGen: Remove implicit iterator conversions from MBB.cppDuncan P. N. Exon Smith2015-10-091-8/+8
| | | | | | | | | | Remove implicit ilist iterator conversions from MachineBasicBlock.cpp. I've also added an overload of `splice()` that takes a pointer, since it's a natural API. This is similar to the overloads I added for `remove()` and `erase()` in r249867. llvm-svn: 249883
* Fix a spelling error in the description of a statistic. NFCCraig Topper2015-09-161-1/+1
| | | | llvm-svn: 247771
* [WinEH] Add some support for code generating catchpadReid Kleckner2015-08-271-1/+1
| | | | | | | We can now run 32-bit programs with empty catch bodies. The next step is to change PEI so that we get funclet prologues and epilogues. llvm-svn: 246235
* Revert r244154 which causes some build failure. See ↵Cong Hou2015-08-061-4/+6
| | | | | | https://llvm.org/bugs/show_bug.cgi?id=24377. llvm-svn: 244239
* Record whether the weights on out-edges from a MBB are normalized.Cong Hou2015-08-051-6/+4
| | | | | | | | | | | | 1. Create a utility function normalizeEdgeWeights() in MachineBranchProbabilityInfo that normalizes a list of edge weights so that the sum of then can fit in uint32_t. 2. Provide an interface in MachineBasicBlock to normalize its successors' weights. 3. Add a flag in MachineBasicBlock that tracks whether its successors' weights are normalized. 4. Provide an overload of getSumForBlock that accepts a non-const pointer to a MBB so that it can force normalizing this MBB's successors' weights. 5. Update several uses of getSumForBlock() by eliminating the once needed weight scale. Differential Revision: http://reviews.llvm.org/D11442 llvm-svn: 244154
* wrap OptSize and MinSize attributes for easier and consistent access (NFCI)Sanjay Patel2015-08-041-0/+1
| | | | | | | | | | | | | | | | | Create wrapper methods in the Function class for the OptimizeForSize and MinSize attributes. We want to hide the logic of "or'ing" them together when optimizing just for size (-Os). Currently, we are not consistent about this and rely on a front-end to always set OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here that should be added as follow-on patches with regression tests. This patch is NFC-intended: it just replaces existing direct accesses of the attributes by the equivalent wrapper call. Differential Revision: http://reviews.llvm.org/D11734 llvm-svn: 243994
* Test commit.Cong Hou2015-07-151-1/+0
| | | | | | This is a test commit (one blank line deleted). llvm-svn: 242308
* Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)Alexander Kornienko2015-06-231-3/+3
| | | | | | Apparently, the style needs to be agreed upon first. llvm-svn: 240390
* Fixed/added namespace ending comments using clang-tidy. NFCAlexander Kornienko2015-06-191-3/+3
| | | | | | | | | | | | | The patch is generated using this command: tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \ -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \ llvm/lib/ Thanks to Eugene Kosov for the original patch! llvm-svn: 240137
* [MBP] Spell the conditions the same way through out this if statement.Chandler Carruth2015-04-151-1/+1
| | | | | | NFC. llvm-svn: 235009
* [MBP] Sink a comment into the if block to which it pertains. This makesChandler Carruth2015-04-151-1/+1
| | | | | | the content of the comment make much more sense. llvm-svn: 235007
* [MBP] Fix a really misleading typo in a comment.Chandler Carruth2015-04-151-1/+1
| | | | llvm-svn: 235006
* Re-sort includes with sort-includes.py and insert raw_ostream.h where it's used.Benjamin Kramer2015-03-231-0/+1
| | | | llvm-svn: 232998
* [MBP] Don't outline short optional branchesDaniel Jasper2015-03-201-2/+25
| | | | | | | | | | | | | | | | | | With the option -outline-optional-branches, LLVM will place optional branches out of line (more details on r231230). With this patch, this is not done for short optional branches. A short optional branch is a branch containing a single block with an instruction count below a certain threshold (defaulting to 3). Still everything is guarded under -outline-optional-branches). Outlining a short branch can't significantly improve code locality. It can however decrease performance because of the additional jmp and in cases where the optional branch is hot. This fixes a compile time regression I have observed in a benchmark. Review: http://reviews.llvm.org/D8108 llvm-svn: 232802
* [MBP] Use range based for-loops throughout this code. Several hadChandler Carruth2015-03-051-140/+108
| | | | | | | already been added and the inconsistency made choosing names and changing code more annoying. Plus, wow are they better for this code! llvm-svn: 231347
* [MBP] NFC, run clang-format over this code and tweak things to make theChandler Carruth2015-03-051-71/+62
| | | | | | | | | | | result reasonable. This code predated clang-format and so there was a reasonable amount of crufty formatting that had accumulated. This should ensure that neither myself nor others end up with formatting-only changes sneaking into other fixes. llvm-svn: 231341
* [MBP] This is no longer 'block-placement2'. ;] The old variants are longChandler Carruth2015-03-051-3/+3
| | | | | | gone, update this code to reflect that. llvm-svn: 231340
* [MBP] Revert r231238 which attempted to fix a nasty bug where MBP isChandler Carruth2015-03-051-26/+0
| | | | | | | | | | | | | just arbitrarily interleaving unrelated control flows once they get moved "out-of-line" (both outside of natural CFG ordering and with diamonds that cannot be fully laid out by chaining fallthrough edges). This easy solution doesn't work in practice, and it isn't just a small bug. It looks like a very different strategy will be required. I'm working on that now, and it'll again go behind some flag so that everyone can experiment and make sure it is working well for them. llvm-svn: 231332
* [MBP] Fix a really horrible bug in MachineBlockPlacement, but behindChandler Carruth2015-03-041-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | a flag for now. First off, thanks to Daniel Jasper for really pointing out the issue here. It's been here forever (at least, I think it was there when I first wrote this code) without getting really noticed or fixed. The key problem is what happens when two reasonably common patterns happen at the same time: we outline multiple cold regions of code, and those regions in turn have diamonds or other CFGs for which we can't just topologically lay them out. Consider some C code that looks like: if (a1()) { if (b1()) c1(); else d1(); f1(); } if (a2()) { if (b2()) c2(); else d2(); f2(); } done(); Now consider the case where a1() and a2() are unlikely to be true. In that case, we might lay out the first part of the function like: a1, a2, done; And then we will be out of successors in which to build the chain. We go to find the best block to continue the chain with, which is perfectly reasonable here, and find "b1" let's say. Laying out successors gets us to: a1, a2, done; b1, c1; At this point, we will refuse to lay out the successor to c1 (f1) because there are still un-placed predecessors of f1 and we want to try to preserve the CFG structure. So we go get the next best block, d1. ... wait for it ... Except that the next best block *isn't* d1. It is b2! d1 is waaay down inside these conditionals. It is much less important than b2. Except that this is exactly what we didn't want. If we keep going we get the entire set of the rest of the CFG *interleaved*!!! a1, a2, done; b1, c1; b2, c2; d1, f1; d2, f2; So we clearly need a better strategy here. =] My current favorite strategy is to actually try to place the block whose predecessor is closest. This very simply ensures that we unwind these kinds of CFGs the way that is natural and fitting, and should minimize the number of cache lines instructions are spread across. It also happens to be *dead simple*. It's like the datastructure was specifically set up for this use case or something. We only push blocks onto the work list when the last predecessor for them is placed into the chain. So the back of the worklist *is* the nearest next block. Unfortunately, a change like this is going to cause *soooo* many benchmarks to swing wildly. So for now I'm adding this under a flag so that we and others can validate that this is fixing the problems described, that it seems possible to enable, and hopefully that it fixes more of our problems long term. llvm-svn: 231238
* Add a flag to experiment with outlining optional branches.Daniel Jasper2015-03-041-2/+46
| | | | | | | | | | | | | In a CFG with the edges A->B->C and A->C, B is an optional branch. LLVM's default behavior is to lay the blocks out naturally, i.e. A, B, C, in order to improve code locality and fallthroughs. However, if a function contains many of those optional branches only a few of which are taken, this leads to a lot of unnecessary icache misses. Moving B out of line can work around this. Review: http://reviews.llvm.org/D7719 llvm-svn: 231230
* NFC: Use range-based for loops and more consistent naming.Daniel Jasper2015-02-181-19/+15
| | | | | | | | | No functional changes intended. (I plan on doing some modifications to this function and would like to have as few unrelated changes as possible in the patch) llvm-svn: 229649
* Remove experimental options to control machine block placement.Daniel Jasper2015-02-181-35/+20
| | | | | | | This reverts r226034. Benchmarking with those flags has not revealed anything interesting. llvm-svn: 229648
* CodeGen: Canonicalize access to function attributes, NFCDuncan P. N. Exon Smith2015-02-141-2/+1
| | | | | | | | | | | | | | | | | Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) Also, add `Function::getFnStackAlignment()`, and canonicalize: getAttributes().getStackAlignment(AttributeSet::FunctionIndex) => getFnStackAlignment() llvm-svn: 229208
* [MBP] Add flags to disable the BadCFGConflict check in MachineBlockPlacement.Chandler Carruth2015-01-141-20/+35
| | | | | | | | | | | | | | | | | | | | | | | | Some benchmarks have shown that this could lead to a potential performance benefit, and so adding some flags to try to help measure the difference. A possible explanation. In diamond-shaped CFGs (A followed by either B or C both followed by D), putting B and C both in between A and D leads to the code being less dense than it could be. Always either B or C have to be skipped increasing the chance of cache misses etc. Moving either B or C to after D might be beneficial on average. In the long run, but we should probably do a better job of analyzing the basic block and branch probabilities to move the correct one of B or C to after D. But even if we don't use this in the long run, it is a good baseline for benchmarking. Original patch authored by Daniel Jasper with test tweaks and a second flag added by me. Differential Revision: http://reviews.llvm.org/D6969 llvm-svn: 226034
* [PowerPC/BlockPlacement] Allow target to provide a per-loop alignment preferenceHal Finkel2015-01-031-3/+4
| | | | | | | | | | | | | | | | | The existing code provided for specifying a global loop alignment preference. However, the preferred loop alignment might depend on the loop itself. For recent POWER cores, loops between 5 and 8 instructions should have 32-byte alignment (while the others are better with 16-byte alignment) so that the entire loop will fit in one i-cache line. To support this, getPrefLoopAlignment has been made virtual, and can be provided with an optional MachineLoop* so the target can inspect the loop before answering the query. The default behavior, as before, is to return the value set with setPrefLoopAlignment. MachineBlockPlacement now queries the target for each loop instead of only once per function. There should be no functional change for other targets. llvm-svn: 225117
* Update SetVector to rely on the underlying set's insert to return a ↵David Blaikie2014-11-191-2/+2
| | | | | | | | | | | | | pair<iterator, bool> This is to be consistent with StringSet and ultimately with the standard library's associative container insert function. This lead to updating SmallSet::insert to return pair<iterator, bool>, and then to update SmallPtrSet::insert to return pair<iterator, bool>, and then to update all the existing users of those functions... llvm-svn: 222334
* Have MachineFunction cache a pointer to the subtarget to make lookupsEric Christopher2014-08-051-2/+2
| | | | | | | | | | | shorter/easier and have the DAG use that to do the same lookup. This can be used in the future for TargetMachine based caching lookups from the MachineFunction easily. Update the MIPS subtarget switching machinery to update this pointer at the same time it runs. llvm-svn: 214838
* Remove the TargetMachine forwards for TargetSubtargetInfo basedEric Christopher2014-08-041-2/+3
| | | | | | information and update all callers. No functional change. llvm-svn: 214781
* Revert "Introduce a string_ostream string builder facilty"Alp Toker2014-06-261-4/+8
| | | | | | Temporarily back out commits r211749, r211752 and r211754. llvm-svn: 211814
* Introduce a string_ostream string builder faciltyAlp Toker2014-06-261-8/+4
| | | | | | | | | | | | | | | | | | | | string_ostream is a safe and efficient string builder that combines opaque stack storage with a built-in ostream interface. small_string_ostream<bytes> additionally permits an explicit stack storage size other than the default 128 bytes to be provided. Beyond that, storage is transferred to the heap. This convenient class can be used in most places an std::string+raw_string_ostream pair or SmallString<>+raw_svector_ostream pair would previously have been used, in order to guarantee consistent access without byte truncation. The patch also converts much of LLVM to use the new facility. These changes include several probable bug fixes for truncated output, a programming error that's no longer possible with the new interface. llvm-svn: 211749
* [Modules] Remove potential ODR violations by sinking the DEBUG_TYPEChandler Carruth2014-04-221-1/+2
| | | | | | | | | | | | define below all header includes in the lib/CodeGen/... tree. While the current modules implementation doesn't check for this kind of ODR violation yet, it is likely to grow support for it in the future. It also removes one layer of macro pollution across all the included headers. Other sub-trees will follow. llvm-svn: 206837
* [C++11] More 'nullptr' conversion. In some cases just using a boolean check ↵Craig Topper2014-04-141-17/+17
| | | | | | instead of comparing to nullptr. llvm-svn: 206142
* Disable each MachineFunctionPass for 'optnone' functions, unless thatPaul Robinson2014-03-311-0/+3
| | | | | | | pass normally runs at optimization level None, or is part of the register allocation pipeline. llvm-svn: 205228
* [C++11] Add 'override' keyword to virtual methods that override their base ↵Craig Topper2014-03-071-4/+4
| | | | | | class. llvm-svn: 203220
* [C++11] Replace llvm::next and llvm::prior with std::next and std::prev.Benjamin Kramer2014-03-021-13/+13
| | | | | | Remove the old functions. llvm-svn: 202636
* Now that we have C++11, turn simple functors into lambdas and remove a ton ↵Benjamin Kramer2014-03-011-18/+3
| | | | | | | | of boilerplate. No intended functionality change. llvm-svn: 202588
* Add a LLVM_DUMP_METHOD macro.Nico Weber2014-01-031-1/+1
| | | | | | | | | | | | | | The motivation is to mark dump methods as used in debug builds so that they can be called from lldb, but to not do so in release builds so that they can be dead-stripped. There's lots of potential follow-up work suggested in the thread "Should dump methods be LLVM_ATTRIBUTE_USED only in debug builds?" on cfe-dev, but everyone seems to agreen on this subset. Macro name chosen by fair coin toss. llvm-svn: 198456
* [block-freq] Update MachineBlockPlacement and RegAllocGreedy to use the new ↵Michael Gottesman2013-12-141-5/+6
| | | | | | MachineBlockFrequencyInfo methods. llvm-svn: 197290
* Fix gcc warnings.Matt Arsenault2013-12-101-0/+2
| | | | | | Unused variable and unused typedef in release build. llvm-svn: 196947
* Revert part of GCC warning fix to fix debug build.Matt Arsenault2013-12-051-0/+1
| | | | | | | The typedef is used inside the DEBUG(), and apparently can't be moved inside of it. llvm-svn: 196528
* Fix minor GCC warnings.Matt Arsenault2013-12-051-1/+0
| | | | | | Unused typedefs and unused variables. llvm-svn: 196526
* Output a bit more information in the debug printing for MBP. This wasChandler Carruth2013-11-251-3/+4
| | | | | | useful when analyzing parts of zlib's behavior here. llvm-svn: 195588
* MachineBlockPlacement: Strengthen the source order bias when picking an exit ↵Benjamin Kramer2013-11-201-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | | block. We now only allow breaking source order if the exit block frequency is significantly higher than the other exit block. The actual bias is currently under a flag so the best cut-off can be found; the flag defaults to the old behavior. The idea is to get some benchmark coverage over different values for the flag and pick the best one. When we require the new frequency to be at least 20% higher than the old frequency I see a 5% speedup on zlib's deflate when compressing a random file on x86_64/westmere. Hal reported a small speedup on Fhourstones on a BG/Q and no regressions in the test suite. The test case is the full long_match function from zlib's deflate. I was reluctant to add it for previous tweaks to branch probabilities because it's large and potentially fragile, but changed my mind since it's an important use case and more likely to break with all the current work going into the PGO infrastructure. Differential Revision: http://llvm-reviews.chandlerc.com/D2202 llvm-svn: 195265
* Fix a defect in code-layout pass, improving Benchmarks/Olden/em3d/em3d by ↵Shuxin Yang2013-06-041-1/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | about 30% (4.58s vs 3.2s on an oldish Mac Tower). The corresponding src is excerpted bellow. The lopp accounts for about 90% of execution time. -------------------- cat -n test-suite/MultiSource/Benchmarks/Olden/em3d/make_graph.c 90 91 for (k=0; k<j; k++) 92 if (other_node == cur_node->to_nodes[k]) break; The defective layout is sketched bellow, where the two branches need to swap. ------------------------------------------------------------------------ L: ... if (cond) goto out-of-loop goto L While this code sequence is defective, I don't understand why it incurs 1/3 of execution time. CPU-event-profiling indicates the poor laoyout dose not increase in br-misprediction; it dosen't increase stall cycle at all, and it dosen't prevent the CPU detect the loop (i.e. Loop-Stream-Detector seems to be working fine as well)... The root cause of the problem is that the layout pass calls AnalyzeBranch() with basic-block which is not updated to reflect its current layout. rdar://13966341 llvm-svn: 183174
OpenPOWER on IntegriCloud