path: root/llvm/lib/CodeGen/RegAllocPBQP.cpp
Commit message (Author, Age; Files, Lines)
* [PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible with the new pass manager, and no longer relying on analysis groups. (Chandler Carruth, 2015-09-09; 1 file, -2/+2)

  This builds essentially a ground-up new AA infrastructure stack for LLVM. The core ideas are the same that are used throughout the new pass manager: type erased polymorphism and direct composition. The design is as follows:

  - FunctionAAResults is a type-erasing alias analysis results aggregation interface to walk a single query across a range of results from different alias analyses. Currently this is function-specific as we always assume that aliasing queries are *within* a function.
  - AAResultBase is a CRTP utility providing stub implementations of various parts of the alias analysis result concept, notably in several cases in terms of other more general parts of the interface. This can be used to implement only a narrow part of the interface rather than the entire interface. This isn't really ideal, this logic should be hoisted into FunctionAAResults as currently it will cause a significant amount of redundant work, but it faithfully models the behavior of the prior infrastructure.
  - All the alias analysis passes are ported to be wrapper passes for the legacy PM and new-style analysis passes for the new PM with a shared result object. In some cases (most notably CFL), this is an extremely naive approach that we should revisit when we can specialize for the new pass manager.
  - BasicAA has been restructured to reflect that it is much more fundamentally a function analysis because it uses dominator trees and loop info that need to be constructed for each function.

  All of the references to getting alias analysis results have been updated to use the new aggregation interface. All the preservation and other pass management code has been updated accordingly.

  The way the FunctionAAResultsWrapperPass works is to detect the available alias analyses when run, and add them to the results object. This means that we should be able to continue to respect when various passes are added to the pipeline, for example adding CFL or adding TBAA passes should just cause their results to be available and to get folded into this. The exception to this rule is BasicAA which really needs to be a function pass due to using dominator trees and loop info. As a consequence, the FunctionAAResultsWrapperPass directly depends on BasicAA and always includes it in the aggregation.

  This has significant implications for preserving analyses. Generally, most passes shouldn't bother preserving FunctionAAResultsWrapperPass because rebuilding the results just updates the set of known AA passes. The exceptions to this rule are LoopPass instances, which need to preserve all the function analyses that the loop pass manager will end up needing. This means preserving both BasicAAWrapperPass and the aggregating FunctionAAResultsWrapperPass.

  Now, when preserving an alias analysis, you do so by directly preserving that analysis. This is only necessary for non-immutable-pass-provided alias analyses though, and there are only three of interest: BasicAA, GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is preserved when needed because it (like DominatorTree and LoopInfo) is marked as a CFG-only pass. I've expanded GlobalsAA into the preserved set everywhere we previously were preserving all of AliasAnalysis, and I've added SCEVAA in the intersection of that with where we preserve SCEV itself.

  One significant challenge to all of this is that the CGSCC passes were actually using the alias analysis implementations by taking advantage of a pretty amazing set of loopholes in the old pass manager's analysis management code which allowed analysis groups to slide through in many cases. Moving away from analysis groups makes this problem much more obvious. To fix it, I've leveraged the flexibility the design of the new PM components provides to just directly construct the relevant alias analyses for the relevant functions in the IPO passes that need them. This is a bit hacky, but should go away with the new pass manager, and is already in many ways cleaner than the prior state.

  Another significant challenge is that various facilities of the old alias analysis infrastructure just don't fit any more. The most significant of these is the alias analysis 'counter' pass. That pass relied on the ability to snoop on AA queries at different points in the analysis group chain. Instead, I'm planning to build printing functionality directly into the aggregation layer. I've not included that in this patch merely to keep it smaller.

  Note that all of this needs a nearly complete rewrite of the AA documentation. I'm planning to do that, but I'd like to make sure the new design settles, and to flesh out a bit more of what it looks like in the new pass manager first.

  Differential Revision: http://reviews.llvm.org/D12080
  llvm-svn: 247167
* Trace copies when checking for rematerializability in spill weight calculation (Robert Lougher, 2015-08-10; 1 file, -3/+3)

  PR24139 contains an analysis of poor register allocation. One of the findings was that when calculating the spill weight, a rematerializable interval once split is no longer rematerializable. This is because the isRematerializable check in CalcSpillWeights.cpp does not follow the copies introduced by live range splitting (after splitting, the live interval register definition is a copy which is not rematerializable).

  Reviewers: qcolombet
  Differential Revision: http://reviews.llvm.org/D11686
  llvm-svn: 244439
* [PBQP] Use a local bit-matrix to speed up searching for an edge in the graph. (Arnaud A. de Grandmaison, 2015-03-05; 1 file, -3/+10)

  Build time (user time) for building llvm+clang+lldb in release mode:
  - default allocator: 9086 seconds
  - with PBQP: 9126 seconds
  - with PBQP + local bit matrix cache: 9097 seconds

  llvm-svn: 231360
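The idea of a local bit matrix for edge lookup can be sketched as follows. This is only an illustration, not the code from the commit: the class name `EdgeBitMatrix` and its methods are hypothetical, and node ids are assumed to be dense indices in `[0, N)`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a dense N x N bit matrix recording which node pairs
// already have an edge, so an edge lookup is a single bit test instead of a
// scan over a node's adjacency list.
class EdgeBitMatrix {
public:
  explicit EdgeBitMatrix(std::size_t NumNodes)
      : N(NumNodes), Bits(NumNodes * NumNodes, false) {}

  // Record an undirected edge between nodes A and B.
  void addEdge(std::size_t A, std::size_t B) {
    assert(A < N && B < N && "node id out of range");
    Bits[A * N + B] = true;
    Bits[B * N + A] = true;
  }

  // Constant-time membership test.
  bool hasEdge(std::size_t A, std::size_t B) const {
    assert(A < N && B < N && "node id out of range");
    return Bits[A * N + B];
  }

private:
  std::size_t N;
  std::vector<bool> Bits; // packed bits, N * N entries
};
```

The trade-off in this sketch is quadratic memory in the number of nodes in exchange for constant-time lookups, which is presumably why the commit keeps the matrix local to graph construction.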
* [PBQP] Address post-commit style comment for r230904. NFC. (Arnaud A. de Grandmaison, 2015-03-01; 1 file, -2/+2)

  Thanks David!

  llvm-svn: 230908
* [PBQP] Do not add an edge between nodes with totally disjoint allowed registers (Arnaud A. de Grandmaison, 2015-03-01; 1 file, -9/+61)

  Such edges carry a zero matrix, and they bring no additional info to the allocation problem, apart from contributing to the nodes' degree. Removing those edges is expected to improve allocation time.

  Tune the spill cost comparison, as this gives better average performance now that the nodes' degrees have changed.

  llvm-svn: 230904
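A disjointness test of this kind might look like the sketch below. It is an assumption-laden illustration: the function name `haveDisjointAllowedRegs` is hypothetical, and registers are treated as plain ids with no aliasing between distinct ids, which real targets do not guarantee.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns true when the two allowed-register sets have no register in common.
// Assumption: distinct ids never alias each other (no sub/super registers).
bool haveDisjointAllowedRegs(std::vector<unsigned> A, std::vector<unsigned> B) {
  std::sort(A.begin(), A.end());
  std::sort(B.begin(), B.end());
  auto I = A.begin();
  auto J = B.begin();
  // Walk both sorted ranges in lockstep, looking for a shared id.
  while (I != A.end() && J != B.end()) {
    if (*I == *J)
      return false; // shared register: the interference edge is needed
    if (*I < *J)
      ++I;
    else
      ++J;
  }
  return true; // no overlap: the edge would be a zero matrix and can be skipped
}
```

A graph builder following the commit's reasoning would call such a check before creating an interference edge and simply skip the edge when it returns true.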
* Prefer SmallVector::append/insert over push_back loops. (Benjamin Kramer, 2015-02-17; 1 file, -2/+1)

  Same functionality, but hoists the vector growth out of the loop.

  llvm-svn: 229500
* [PBQP] Conservatively allocatable nodes can be spilled and give a better solution (Arnaud A. de Grandmaison, 2015-02-13; 1 file, -2/+0)

  Although such nodes are allocatable, the cost of spilling may be less than allocating to register, so spilling the node may provide a better solution. The assert does not account for this case, so remove it for now.

  llvm-svn: 229103
* [PBQP] Cautiously update edge costs in the solver (Arnaud A. de Grandmaison, 2015-02-11; 1 file, -1/+3)

  The NodeMetadata are maintained in an incremental way. When an edge between 2 nodes has its cost updated, in the course of graph reduction for example, the NodeMetadata first need to have the old edge cost removed, then the new edge cost added. Only once the NodeMetadata have been fully updated does it become safe to consider promoting the nodes to the ConservativelyAllocatable or OptimallyReducible sets.

  Previously, this promotion was occurring right after removing the old cost, and this was breaking the assumption that a ConservativelyAllocatable node should not be spilled.

  This patch also adds asserts to enforce that:
  - a node's reduction state cannot be downgraded,
  - only nodes that are not provably allocatable or optimally reducible can be spilled.

  llvm-svn: 228816
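The ordering problem described above can be illustrated with a toy model. Everything here is hypothetical (the `NodeMetadata` fields, `updateEdgeCost`, and the promotion rule are simplified stand-ins, not the solver's real data structures); the point is only that the promotion check runs after both halves of the update.

```cpp
#include <cassert>

// Toy node metadata: counts how many register options neighbours may deny.
struct NodeMetadata {
  explicit NodeMetadata(unsigned NumOpts) : NumOpts(NumOpts) {}

  unsigned NumOpts;                       // register options for this node
  unsigned DeniedOpts = 0;                // worst-case options denied by edges
  bool ConservativelyAllocatable = false; // once set, never downgraded

  void removeEdgeCost(unsigned Denied) {
    assert(DeniedOpts >= Denied && "removing more than was added");
    DeniedOpts -= Denied;
  }
  void addEdgeCost(unsigned Denied) { DeniedOpts += Denied; }

  // Simplified promotion rule: some option always survives.
  void tryPromote() {
    if (DeniedOpts < NumOpts)
      ConservativelyAllocatable = true;
  }
};

// Update an edge's contribution, then (and only then) consider promotion.
// Calling tryPromote() between the two steps would observe a transiently
// low DeniedOpts and could promote a node that is not actually safe.
void updateEdgeCost(NodeMetadata &N, unsigned OldDenied, unsigned NewDenied) {
  N.removeEdgeCost(OldDenied);
  N.addEdgeCost(NewDenied);
  N.tryPromote();
}
```

In this sketch a node with two options and three denied options stays unpromoted when the edge is updated to deny two options, even though the intermediate state (zero denied) would have passed the check.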
* [PBQP] Fix comment wording. NFC (Arnaud A. de Grandmaison, 2015-02-06; 1 file, -1/+1)

  llvm-svn: 228390
* [PBQP] Provide more information in the debug prints (Arnaud A. de Grandmaison, 2015-02-03; 1 file, -1/+74)

  Based on a patch by Jonas Paulsson

  llvm-svn: 228068
* [PBQP Regalloc] Pre-spill vregs that have no legal physregs. (Lang Hames, 2015-02-03; 1 file, -26/+57)

  The PBQP::RegAlloc::MatrixMetadata class assumes that matrices have at least two rows/columns (for the spill option plus at least one physreg). This patch ensures that that invariant is met by pre-spilling vregs that have no physreg options so that no node (and no corresponding edges) need be added to the PBQP graph.

  This fixes a bug in an out-of-tree target that was identified by Jonas Paulsson. Thanks for tracking this down Jonas!

  llvm-svn: 227942
* Have the PBQP register allocator use the subtarget on the MachineFunction. (Eric Christopher, 2015-01-27; 1 file, -8/+5)

  (and remove an extraneous private).

  llvm-svn: 227181
* [PBQP] Callee saved regs should have a higher cost than scratch regs (Arnaud A. de Grandmaison, 2014-11-04; 1 file, -0/+16)

  Registers are not all equal. Some are not allocatable (infinite cost), some have to be preserved but can be used, and some others are just free to use. Ensure there is a cost hierarchy reflecting this fact, so that the allocator will favor scratch registers over callee-saved registers.

  llvm-svn: 221293
* [PBQP] Tweak spill costs and coalescing benefits (Arnaud A. de Grandmaison, 2014-11-04; 1 file, -6/+17)

  This patch improves how the different costs (register, interference, spill and coalescing) relate to one another. The assumption is now that:
  - coalescing (or any other "side effect" of reg alloc) is negative, and instead of being derived from a spill cost, it uses the block frequency info.
  - spill costs are in the [MinSpillCost, +inf) range
  - register or interference costs are in [0.0, MinSpillCost) or +inf

  The current MinSpillCost is set to 10.0, which is an arbitrary value high enough that the current constraint builders do not need to worry about it when setting costs. It would however be worth adding a normalization step for register and interference costs as the last step in the constraint builder chain to ensure they are not greater than MinSpillCost (unless this makes sense for some architectures). This would work well with the current builder pipeline, where all costs are tweaked relative to each other, but could grow above MinSpillCost if the pipeline is deep enough.

  The current heuristic is tuned to depend on the number of uses of a live interval rather than on a density of uses, as used by the greedy allocator. This heuristic provides a few percent improvement on a number of benchmarks (eembc, spec, ...) and will definitely need to change once spill placement is implemented: the current spill placement is really inefficient, so making the cost proportional to the number of uses is a clear win.

  llvm-svn: 221292
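The cost ranges described above can be written down as a pair of small checks. This is a sketch under stated assumptions: only the value 10.0 for `MinSpillCost` comes from the commit message; the helper names `toSpillCost` and `isValidRegisterCost`, and the choice to add the weight to `MinSpillCost`, are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Value quoted in the commit message.
constexpr double MinSpillCost = 10.0;

// Hypothetical helper: shift a non-negative spill weight so that the
// resulting spill cost always lies in [MinSpillCost, +inf).
double toSpillCost(double SpillWeight) {
  assert(SpillWeight >= 0.0 && "spill weights are assumed non-negative");
  return SpillWeight + MinSpillCost;
}

// Register and interference costs must lie in [0.0, MinSpillCost) or be
// +inf (meaning "this option is not allowed").
bool isValidRegisterCost(double Cost) {
  if (std::isinf(Cost))
    return Cost > 0.0;
  return Cost >= 0.0 && Cost < MinSpillCost;
}
```

With these ranges, any finite register or interference cost compares below any spill cost, which is the hierarchy the commit message describes; coalescing benefits, being negative, fall outside both checks by design.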
* [PBQP] Unique allowed-sets for nodes in the PBQP graph and use pairs of these sets as keys into a cache of interference matrix values in the Interference constraint adder. (Lang Hames, 2014-10-27; 1 file, -29/+50)

  Creating interference matrices was one of the large remaining time-sinks in PBQP. Caching them reduces the total compile time (when using PBQP) on the nightly test suite by ~10%.

  llvm-svn: 220688
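A minimal sketch of the caching scheme follows, assuming allowed sets are plain vectors of register ids and that two registers interfere only when their ids are equal. The class `InterferenceCache` and its methods are hypothetical names for illustration, not the allocator's actual interface.

```cpp
#include <cassert>
#include <limits>
#include <map>
#include <memory>
#include <utility>
#include <vector>

using AllowedSet = std::vector<unsigned>;        // physical register ids
using Matrix = std::vector<std::vector<double>>; // row/column 0 = spill option

class InterferenceCache {
public:
  // Unique the set: equal sets always map to the same small id.
  unsigned intern(const AllowedSet &S) {
    auto It = Ids.find(S);
    if (It != Ids.end())
      return It->second;
    unsigned Id = static_cast<unsigned>(Sets.size());
    Sets.push_back(S);
    Ids.emplace(S, Id);
    return Id;
  }

  // Build the interference matrix for a pair of set ids once, then reuse it.
  std::shared_ptr<const Matrix> get(unsigned A, unsigned B) {
    assert(A < Sets.size() && B < Sets.size() && "unknown set id");
    auto Key = std::make_pair(A, B);
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second;
    const AllowedSet &RA = Sets[A];
    const AllowedSet &RB = Sets[B];
    auto M = std::make_shared<Matrix>(
        RA.size() + 1, std::vector<double>(RB.size() + 1, 0.0));
    for (std::size_t I = 0; I < RA.size(); ++I)
      for (std::size_t J = 0; J < RB.size(); ++J)
        if (RA[I] == RB[J]) // same register chosen by both nodes: forbidden
          (*M)[I + 1][J + 1] = std::numeric_limits<double>::infinity();
    Cache[Key] = M;
    return M;
  }

private:
  std::map<AllowedSet, unsigned> Ids;
  std::vector<AllowedSet> Sets;
  std::map<std::pair<unsigned, unsigned>, std::shared_ptr<const Matrix>> Cache;
};
```

Because many virtual registers share a register class, the number of distinct set pairs is typically far smaller than the number of interfering node pairs, which is where a cache like this would save work.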
* [PBQP] Fix coalescing benefits (Arnaud A. de Grandmaison, 2014-10-21; 1 file, -2/+2)

  As coalescing registers is a benefit, the cost should be improved (i.e. made smaller) when coalescing is possible.

  llvm-svn: 220302
* [PBQP] Replace the interference-constraints algorithm with a faster version loosely based on linear scan. (Lang Hames, 2014-10-18; 1 file, -16/+115)

  On x86-64 this is good for a ~2% drop in compile time on the nightly test suite.

  llvm-svn: 220143
* Remove unnecessary TargetMachine.h includes. (Eric Christopher, 2014-10-14; 1 file, -1/+0)

  llvm-svn: 219672
* [PBQP] Replace PBQPBuilder with composable constraints (PBQPRAConstraint). (Lang Hames, 2014-10-09; 1 file, -355/+307)

  This patch removes the PBQPBuilder class and its subclasses and replaces them with a composable constraints class: PBQPRAConstraint. This allows constraints that are only required for optimisation (e.g. coalescing, soft pairing) to be mixed and matched.

  This patch also introduces support for target writers to supply custom constraints for their targets by overriding a TargetSubtargetInfo method:

    std::unique_ptr<PBQPRAConstraints> getCustomPBQPConstraints() const;

  This patch should have no effect on allocations.

  llvm-svn: 219421
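The "composable constraints" idea can be sketched as a small composite pattern. The types below (`Graph`, `Constraint`, `ConstraintList`, `SpillCosts`, `Coalescing`) are simplified stand-ins chosen for illustration and are not the real PBQPRAConstraint interface.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the PBQP graph: it only records which constraints touched it.
struct Graph {
  std::vector<std::string> Applied;
};

// A constraint adds costs or edges to the graph.
class Constraint {
public:
  virtual ~Constraint() = default;
  virtual void apply(Graph &G) = 0;
};

// A list of constraints is itself a constraint, so lists can be nested and
// optional (optimisation-only or target-supplied) constraints can be added.
class ConstraintList : public Constraint {
public:
  void add(std::unique_ptr<Constraint> C) {
    if (C) // a target may supply no custom constraint at all
      Items.push_back(std::move(C));
  }
  void apply(Graph &G) override {
    for (auto &C : Items)
      C->apply(G);
  }

private:
  std::vector<std::unique_ptr<Constraint>> Items;
};

struct SpillCosts : Constraint {
  void apply(Graph &G) override { G.Applied.push_back("spill-costs"); }
};

struct Coalescing : Constraint {
  void apply(Graph &G) override { G.Applied.push_back("coalescing"); }
};
```

In this sketch an allocator would build one `ConstraintList`, add the mandatory constraints, then add whatever optional or target-specific constraints are available, and finally call `apply` once on the graph.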
* unique_ptrify PBQPBuilder::build (David Blaikie, 2014-09-02; 1 file, -13/+13)

  llvm-svn: 216918
* [PBQP] Only output debug information when requested (Arnaud A. de Grandmaison, 2014-08-28; 1 file, -2/+2)

  llvm-svn: 216660
* Modernize raw_fd_ostream's constructor a bit. (Rafael Espindola, 2014-08-25; 1 file, -2/+2)

  Take a StringRef instead of a "const char *". Take a "std::error_code &" instead of a "std::string &" for error.

  A create static method would be even better, but this patch is already a bit too big.

  llvm-svn: 216393
* Have MachineFunction cache a pointer to the subtarget to make lookups shorter/easier and have the DAG use that to do the same lookup. (Eric Christopher, 2014-08-05; 1 file, -2/+1)

  This can be used in the future for TargetMachine based caching lookups from the MachineFunction easily. Update the MIPS subtarget switching machinery to update this pointer at the same time it runs.

  llvm-svn: 214838
* Remove the TargetMachine forwards for TargetSubtargetInfo based information and update all callers. (Eric Christopher, 2014-08-04; 1 file, -4/+6)

  No functional change.

  llvm-svn: 214781
* Shore up ownership passing of the PBQPBuilder by passing unique_ptrs by value rather than lvalue reference. (David Blaikie, 2014-07-19; 1 file, -7/+7)

  Also removes an unnecessary '.release()' that should've been a std::move anyway.

  (I'm on a hunt for '.release()' calls)

  llvm-svn: 213464
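Passing a `std::unique_ptr` by value makes the ownership transfer visible at the call site. The sketch below shows the general C++ idiom only; `Builder` and `Allocator` are hypothetical stand-ins, not the classes touched by the commit.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical payload type.
struct Builder {
  int Id;
};

class Allocator {
public:
  // Taking the unique_ptr by value means the caller must hand ownership over
  // explicitly with std::move; no '.release()' and no raw pointer is needed.
  explicit Allocator(std::unique_ptr<Builder> B) : Owned(std::move(B)) {}

  int builderId() const { return Owned->Id; }

private:
  std::unique_ptr<Builder> Owned;
};
```

A caller writes `Allocator A(std::move(P));`, after which `P` is guaranteed to be null, so accidental reuse of the moved-from pointer is easy to spot in review.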
* Convert more loops to range-based equivalents (Alexey Samsonov, 2014-04-30; 1 file, -12/+4)

  llvm-svn: 207714
* raw_ostream: Forward declare OpenFlags and include FileSystem.h only where necessary. (Benjamin Kramer, 2014-04-29; 1 file, -0/+1)

  llvm-svn: 207593
* [Modules] Remove potential ODR violations by sinking the DEBUG_TYPE define below all header includes in the lib/CodeGen/... tree. (Chandler Carruth, 2014-04-22; 1 file, -2/+2)

  While the current modules implementation doesn't check for this kind of ODR violation yet, it is likely to grow support for it in the future. It also removes one layer of macro pollution across all the included headers.

  Other sub-trees will follow.

  llvm-svn: 206837
* [C++11] More 'nullptr' conversion. In some cases just using a boolean check instead of comparing to nullptr. (Craig Topper, 2014-04-14; 1 file, -1/+1)

  llvm-svn: 206142
* Make consistent use of MCPhysReg instead of uint16_t throughout the tree. (Craig Topper, 2014-04-04; 1 file, -1/+1)

  llvm-svn: 205610
* [C++11] Add 'override' keyword to virtual methods that override their base class. (Craig Topper, 2014-03-07; 1 file, -3/+3)

  llvm-svn: 203220
* Replace OwningPtr<T> with std::unique_ptr<T>. (Ahmed Charles, 2014-03-06; 1 file, -13/+11)

  This compiles with no changes to clang/lld/lldb with MSVC and includes overloads to various functions which are used by those projects and llvm which have OwningPtr's as parameters. This should allow out of tree projects some time to move. There are also no changes to libs/Target, which should help out of tree targets have time to move, if necessary.

  llvm-svn: 203083
* [C++11] Replace OwningPtr::take() with OwningPtr::release(). (Ahmed Charles, 2014-03-05; 1 file, -3/+3)

  llvm-svn: 202957
* Re-apply r202551, which introduced new PBQP solver. (Lang Hames, 2014-03-03; 1 file, -38/+38)

  llvm-svn: 202735
* [C++11] Replace llvm::next and llvm::prior with std::next and std::prev. (Benjamin Kramer, 2014-03-02; 1 file, -2/+2)

  Remove the old functions.

  llvm-svn: 202636
* Jumped the gun with r202551 and broke some bots that weren't yet C++11ified. (Lang Hames, 2014-02-28; 1 file, -38/+38)

  Reverting until the C++11 switch is complete.

  llvm-svn: 202554
* New PBQP solver, and updates to the PBQP graph. (Lang Hames, 2014-02-28; 1 file, -38/+38)

  The previous PBQP solver was very robust but consumed a lot of memory, performed a lot of redundant computation, and contained some unnecessarily tight coupling that prevented experimentation with novel solution techniques. This new solver is an attempt to address these shortcomings.

  Important/interesting changes:

  1) The domain-independent PBQP solver class, HeuristicSolverImpl, is gone. It is replaced by a register allocation specific solver, PBQP::RegAlloc::Solver (see RegAllocSolver.h). The optimal reduction rules and the backpropagation algorithm have been extracted into stand-alone functions (see ReductionRules.h), which can be used to build domain specific PBQP solvers. This provides many more opportunities for domain-specific knowledge to inform the PBQP solvers' decisions. In theory this should allow us to generate better solutions. In practice, we can at least test out ideas now. As a side benefit, I believe the new solver is more readable than the old one.

  2) The solver type is now a template parameter of the PBQP graph. This allows the graph to notify the solver of any modifications made (e.g. by domain independent rules) without the overhead of a virtual call. It also allows the solver to supply policy information to the graph (see below).

  3) Significantly reduced memory overhead. Memory management policy is now an explicit property of the PBQP graph (via the CostAllocator typedef on the graph's solver template argument). Because PBQP graphs for register allocation tend to contain many redundant instances of single values (E.g. the value representing an interference constraint between GPRs), the new RASolver class uses a uniquing scheme. This massively reduces memory consumption for large register allocation problems. For example, looking at the largest interference graph in each of the SPEC2006 benchmarks (the largest graph will always set the memory consumption high-water mark for PBQP), the average memory reduction for the PBQP costs was 400x. That's times, not percent. The highest was 1400x. Yikes. So - this is fixed.

  "PBQP: No longer feasting upon every last byte of your RAM".

  Minor details:
  - Fully C++11'd. Never copy-construct another vector/matrix!
  - Cute tricks with cost metadata: Metadata that is derived solely from cost matrices/vectors is attached directly to the cost instances themselves. That way if you unique the costs you never have to recompute the metadata. 400x less memory means 400x less cost metadata (re)computation.

  Special thanks to Arnaud de Grandmaison, who has been the source of much encouragement, and of many very useful test cases.

  This new solver forms the basis for future work, of which there's plenty to do. I will be adding TODO notes shortly.

  - Lang.

  llvm-svn: 202551
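The uniquing scheme described in point 3 can be sketched as a value pool that hands out shared handles to identical cost vectors. This is an illustrative sketch, not the commit's CostAllocator: the name `ValuePool`, the use of `std::map`, and the `std::weak_ptr` bookkeeping are all assumptions made for the example.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

using CostVector = std::vector<double>;

// Hypothetical pool: equal cost vectors are stored once and shared.
class ValuePool {
public:
  std::shared_ptr<const CostVector> get(const CostVector &V) {
    auto It = Pool.find(V);
    if (It != Pool.end())
      if (auto Existing = It->second.lock())
        return Existing; // an identical value is still alive: reuse it
    auto Fresh = std::make_shared<const CostVector>(V);
    Pool[V] = Fresh; // weak reference, so the pool does not keep values alive
    return Fresh;
  }

private:
  std::map<CostVector, std::weak_ptr<const CostVector>> Pool;
};
```

Any metadata derived purely from a cost value could be stored next to the pooled value and computed once per distinct value, which is the "cute trick" the commit message mentions. Note that this simple sketch never erases expired map entries; a production pool would need to.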
* Replace the F_Binary flag with a F_Text one. (Rafael Espindola, 2014-02-24; 1 file, -1/+1)

  After this I will set the default back to F_None. The advantage is that before this patch forgetting to set F_Binary would corrupt a file on windows. Forgetting to set F_Text produces one that cannot be read in notepad, which is a better failure mode :-)

  llvm-svn: 202052
* Don't make F_None the default. (Rafael Espindola, 2014-02-24; 1 file, -1/+1)

  This will make it easier to switch the default to being binary files.

  llvm-svn: 202042
* [block-freq] Refactor LiveIntervals::getSpillWeight to use the new MachineBlockFrequencyInfo methods. (Michael Gottesman, 2013-12-14; 1 file, -2/+1)

  This is slightly more interesting than the previous batch of changes. Specifically:

  1. We refactor getSpillWeight to take a MachineBlockFrequencyInfo (MBFI) object. This enables us to completely encapsulate the actual manner we use the MachineBlockFrequencyInfo to get our spill weights. This yields cleaner code since one does not need to fetch the actual block frequency before getting the spill weight if all one wants is the spill weight. It also gives us access to entry frequency which we need for our computation.

  2. Instead of having getSpillWeight take a MachineBasicBlock (as one might think) to look up the block frequency via the MBFI object, we instead take in a MachineInstr object. The reason for this is that the method is supposed to return the spill weight for an instruction according to the comments around the function.

  llvm-svn: 197296
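The commit message does not spell out the formula, so the following is only a sketch of how a spill weight might combine a block frequency with the entry frequency. The function name `spillWeight`, its parameters, and the scaling by `1 / EntryFreq` are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical computation: each def and each use of a register in an
// instruction contributes the instruction's block frequency, expressed
// relative to the function's entry block frequency.
float spillWeight(bool IsDef, bool IsUse, std::uint64_t BlockFreq,
                  std::uint64_t EntryFreq) {
  assert(EntryFreq != 0 && "entry frequency must be non-zero");
  const float Scale = 1.0f / static_cast<float>(EntryFreq);
  return static_cast<float>(IsDef + IsUse) *
         (static_cast<float>(BlockFreq) * Scale);
}
```

Under these assumptions, an instruction that both defines and uses a register in a block executed twice as often as the entry block would get a weight of 4, while one in the entry block that only uses the register would get 1.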
* CalcSpillWeights: give a better describing name to calculateSpillWeights (Arnaud A. de Grandmaison, 2013-11-11; 1 file, -1/+2)

  Besides, this relates it more obviously to the VirtRegAuxInfo::calculateSpillWeightAndHint.

  No functional change.

  llvm-svn: 194404
* CalculateSpillWeights does not need to be a pass (Arnaud A. de Grandmaison, 2013-11-10; 1 file, -2/+2)

  Based on discussions with Lang Hames and Jakob Stoklund Olesen at the hacker's lab, and in the light of upcoming work on the PBQP register allocator, it was thought that CalcSpillWeights does not need to be a pass. This change will make it possible to customize / tune the spill weight computation depending on the allocator.

  Update the documentation style while there.

  No functional change.

  llvm-svn: 194356
* Re-apply r194300 with fixes for warnings. (Lang Hames, 2013-11-09; 1 file, -14/+14)

  llvm-svn: 194311
* Revert r194300 which broke the build. (Nick Lewycky, 2013-11-09; 1 file, -14/+14)

  llvm-svn: 194308
* Rewrite the PBQP graph data structure. (Lang Hames, 2013-11-09; 1 file, -14/+14)

  The new graph structure replaces the node and edge linked lists with vectors. Free lists (well, free vectors) are used for fast insertion/deletion.

  The ultimate aim is to make PBQP graphs cheap to clone. The motivation is that the PBQP solver destructively consumes input graphs while computing a solution, forcing the graph to be fully reconstructed for each round of PBQP. This imposes a high cost on large functions, which often require several rounds of solving/spilling to find a final register allocation. If we can cheaply clone the PBQP graph and incrementally update it between rounds then hopefully we can reduce this cost. Further, once we begin pooling matrix/vector values (future work), we can cache some PBQP solver metadata and share it between cloned graphs, allowing the PBQP solver to re-use some of the computation done in earlier rounds.

  For now this is just a data structure update. The allocator and solver still use the graph the same way as before, fully reconstructing it between each round. I expect no material change from this update, although it may change the iteration order of the nodes, causing ties in the solver to break in different directions, and this could perturb the generated allocations (hopefully in a completely benign way).

  Thanks very much to Arnaud Allard de Grandmaison for encouraging me to get back to work on this, and for a lot of discussion and many useful PBQP test cases.

  llvm-svn: 194300
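The "vectors plus free lists" layout can be sketched with a node store that recycles ids. The class `NodeStore` below is a hypothetical, much-reduced illustration (nodes carry a single cost value) rather than the graph class from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node storage: nodes live in a vector and are addressed by
// index; removed slots are remembered in a free vector and reused, so
// insertion and deletion need no per-node heap allocation.
class NodeStore {
public:
  std::size_t addNode(double Cost) {
    if (!FreeIds.empty()) {
      std::size_t Id = FreeIds.back(); // recycle the most recently freed slot
      FreeIds.pop_back();
      Costs[Id] = Cost;
      return Id;
    }
    Costs.push_back(Cost);
    return Costs.size() - 1;
  }

  void removeNode(std::size_t Id) {
    assert(Id < Costs.size() && "unknown node id");
    FreeIds.push_back(Id); // the slot stays allocated until it is reused
  }

  double cost(std::size_t Id) const {
    assert(Id < Costs.size() && "unknown node id");
    return Costs[Id];
  }

  std::size_t slots() const { return Costs.size(); }

private:
  std::vector<double> Costs;
  std::vector<std::size_t> FreeIds;
};
```

Because the whole structure is a pair of vectors of plain values, copying it is a flat copy, which fits the commit's stated aim of making graphs cheap to clone.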
* Revert "CalculateSpillWeights does not need to be a pass" (Arnaud A. de Grandmaison, 2013-11-08; 1 file, -2/+2)

  Temporarily revert my previous commit until I understand why it breaks 3 target tests.

  llvm-svn: 194272
* CalculateSpillWeights does not need to be a pass (Arnaud A. de Grandmaison, 2013-11-08; 1 file, -2/+2)

  Based on discussions with Lang Hames and Jakob Stoklund Olesen at the hacker's lab, and in the light of upcoming work on the PBQP register allocator, it was thought that CalcSpillWeights does not need to be a pass. This change will make it possible to customize / tune the spill weight computation depending on the allocator.

  Update the documentation style while there.

  No functional change.

  llvm-svn: 194269
* Track new virtual registers by register number. (Mark Lacey, 2013-08-14; 1 file, -4/+5)

  Track new virtual registers by register number, rather than by the live interval created for them. This is the first step in separating the creation of new virtual registers and new live intervals. Eventually live intervals will be created and populated on demand after the virtual registers have been created and used in instructions.

  llvm-svn: 188434
* Make PBQP require/preserve MachineLoopInfo - the spiller requires it. (Lang Hames, 2013-07-01; 1 file, -0/+3)

  llvm-svn: 185378
* Switch spill weights from a basic loop depth estimation to BlockFrequencyInfo. (Benjamin Kramer, 2013-06-17; 1 file, -11/+10)

  The main advantages here are way better heuristics, taking into account not just loop depth but also __builtin_expect and other static heuristics and will eventually learn how to use profile info. Most of the work in this patch is pushing the MachineBlockFrequencyInfo analysis into the right places.

  This is good for a 5% speedup on zlib's deflate (x86_64), there were some very unfortunate spilling decisions in its hottest loop in longest_match(). Other benchmarks I tried were mostly neutral.

  This changes register allocation in subtle ways, update the tests for it. 2012-02-20-MachineCPBug.ll was deleted as it's very fragile and the instruction it looked for was gone already (but the FileCheck pattern picked up unrelated stuff).

  llvm-svn: 184105