summaryrefslogtreecommitdiffstats
path: root/llvm/tools/llvm-mca/Views/BottleneckAnalysis.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [MC] Add parameter `Address` to MCInstPrinter::printInstFangrui Song2020-01-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | printInst prints a branch/call instruction as `b offset` (there are many variants on various targets) instead of `b address`. It is a convention to use address instead of offset in most external symbolizers/disassemblers. This difference makes `llvm-objdump -d` output unsatisfactory. Add `uint64_t Address` to printInst(), so that it can pass the argument to printInstruction(). `raw_ostream &OS` is moved to the last to be consistent with other print* methods. The next step is to pass `Address` to printInstruction() (generated by tablegen from the instruction set description). We can gradually migrate targets to print addresses instead of offsets. In any case, downstream projects which don't know `Address` can pass 0 as the argument. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D72172
* [MCA] Improved cost computation for loop carried dependencies in the ↵Andrea Di Biagio2019-09-191-6/+34
| | | | | | | | | | | | | | | | bottleneck analysis. This patch introduces a cut-off threshold for dependency edge frequences with the goal of simplifying the critical sequence computation. This patch also removes the cost normalization for loop carried dependencies. We didn't really need to artificially amplify the cost of loop-carried dependencies since it is already computed as the integral over time of the delay (in cycle). In the absence of backend stalls there is no need for computing a critical sequence. With this patch we early exit from the critical sequence computation if no bottleneck was reported during the simulation. llvm-svn: 372337
* [MCA][Bottleneck Analysis] Teach how to compute a critical sequence of ↵Andrea Di Biagio2019-06-211-26/+236
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | instructions based on the simulation. This patch teaches the bottleneck analysis how to identify and print the most expensive sequence of instructions according to the simulation. Fixes PR37494. The goal is to help users identify the sequence of instruction which is most critical for performance. A dependency graph is internally used by the bottleneck analysis to describe data dependencies and processor resource interferences between instructions. There is one node in the graph for every instruction in the input assembly sequence. The number of nodes in the graph is independent from the number of iterations simulated by the tool. It means that a single node of the graph represents all the possible instances of a same instruction contributed by the simulated iterations. Edges are dynamically "discovered" by the bottleneck analysis by observing instruction state transitions and "backend pressure increase" events generated by the Execute stage. Information from the events is used to identify critical dependencies, and materialize edges in the graph. A dependency edge is uniquely identified by a pair of node identifiers plus an instance of struct DependencyEdge::Dependency (which provides more details about the actual dependency kind). The bottleneck analysis internally ranks dependency edges based on their impact on the runtime (see field DependencyEdge::Dependency::Cost). To this end, each edge of the graph has an associated cost. By default, the cost of an edge is a function of its latency (in cycles). In practice, the cost of an edge is also a function of the number of cycles where the dependency has been seen as 'contributing to backend pressure increases'. The idea is that the higher the cost of an edge, the higher is the impact of the dependency on performance. To put it in another way, the cost of an edge is a measure of criticality for performance. Note how a same edge may be found in multiple iteration of the simulated loop. The logic that adds new edges to the graph checks if an equivalent dependency already exists (duplicate edges are not allowed). If an equivalent dependency edge is found, field DependencyEdge::Frequency of that edge is incremented by one, and the new cost is cumulatively added to the existing edge cost. At the end of simulation, costs are propagated to nodes through the edges of the graph. The goal is to identify a critical sequence from a node of the root-set (composed by node of the graph with no predecessors) to a 'sink node' with no successors. Note that the graph is intentionally kept acyclic to minimize the complexity of the critical sequence computation algorithm (complexity is currently linear in the number of nodes in the graph). The critical path is finally computed as a sequence of dependency edges. For edges describing processor resource interferences, the view also prints a so-called "interference probability" value (by dividing field DependencyEdge::Frequency by the total number of iterations). Examples of critical sequence computations can be found in tests added/modified by this patch. On output streams that support colored output, instructions from the critical sequence are rendered with a different color. Strictly speaking the analysis conducted by the bottleneck analysis view is not a critical path analysis. The cost of an edge doesn't only depend on the dependency latency. More importantly, the cost of a same edge may be computed differently by different iterations. The number of dependencies is discovered dynamically based on the events generated by the simulator. However, their number is not fixed. This is especially true for edges that model processor resource interferences; an interference may not occur in every iteration. For that reason, it makes sense to also print out a "probability of interference". By construction, the accuracy of this analysis (as always) is strongly dependent on the simulation (and therefore the quality of the information available in the scheduling model). That being said, the critical sequence effectively identifies a performance criticality. Instructions from that sequence are expected to have a very big impact on performance. So, users can take advantage of this information to focus their attention on specific interactions between instructions. In my experience, it works quite well in practice, and produces useful output (in a reasonable amount time). Differential Revision: https://reviews.llvm.org/D63543 llvm-svn: 364045
* [MCA] Slightly refactor the bottleneck analysis view. NFCIAndrea Di Biagio2019-06-181-50/+54
| | | | | | | | | | | This patch slightly refactors data structures internally used by the bottleneck analysis to track data and resource dependencies. This patch also updates methods used to print out information about dependency edges when in debug mode. This is the last of a sequence of commits done in preparation for an upcoming patch that fixes PR37494. No functional change intended. llvm-svn: 363677
* [MCA] Fix -Wunused-private-field warning after r362933. NFCAndrea Di Biagio2019-06-101-1/+1
| | | | | | This should unbreak the buildbots. llvm-svn: 362935
* [MCA] Further refactor the bottleneck analysis view. NFCI.Andrea Di Biagio2019-06-101-83/+133
| | | | llvm-svn: 362933
* [MCA] Remove unused fields from BottleneckAnalysis. NFCAndrea Di Biagio2019-05-311-7/+4
| | | | | | This should appease the buildbots. llvm-svn: 362251
* [MCA] Refactor class BottleneckAnalysis. NFCIAndrea Di Biagio2019-05-311-41/+262
| | | | | | | | | | | | | | | | | | | | The resource pressure distribution computation is now delegated by class BottleneckAnalysis to an instance of class PressureTracker. Class PressureTracker is also responsible for: - tracking users of processor resource units. - tracking the number of delay cycles caused by increases in backpressure. BottleneckAnalysis internally initializes a dependency graph. Each nodes represents an instruction in the input code sequence. Edges of the dependency graph are critical register/memory/resource dependencies. Dependencies are only added to the graph if they are seen as critical by backend pressure events. The DependencyGraph is currently unused. It is possible to print the dependency graph (see method DependencyGraph::dump()) for debugging purposes. The long term goal is to use the information stored by the dependency graph in order to do critical path computation. llvm-svn: 362246
* [MCA] Moved the bottleneck analysis to its own file. NFCIAndrea Di Biagio2019-04-171-0/+142
llvm-svn: 358554
OpenPOWER on IntegriCloud