summaryrefslogtreecommitdiffstats
path: root/llvm/tools/llvm-mca/Views
Commit message (Collapse)AuthorAgeFilesLines
* [MC] Add parameter `Address` to MCInstPrinter::printInstFangrui Song2020-01-064-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | printInst prints a branch/call instruction as `b offset` (there are many variants on various targets) instead of `b address`. It is a convention to use address instead of offset in most external symbolizers/disassemblers. This difference makes `llvm-objdump -d` output unsatisfactory. Add `uint64_t Address` to printInst(), so that it can pass the argument to printInstruction(). `raw_ostream &OS` is moved to the last to be consistent with other print* methods. The next step is to pass `Address` to printInstruction() (generated by tablegen from the instruction set description). We can gradually migrate targets to print addresses instead of offsets. In any case, downstream projects which don't know `Address` can pass 0 as the argument. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D72172
* [Tools] Fixes -Wrange-loop-analysis warningsMark de Wever2019-12-224-4/+5
| | | | | | This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall. Differential Revision: https://reviews.llvm.org/D71808
* [MCA] Show aggregate over Average Wait times for the whole snippet (PR43219)Roman Lebedev2019-10-102-9/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: As disscused in https://bugs.llvm.org/show_bug.cgi?id=43219, i believe it may be somewhat useful to show //some// aggregates over all the sea of statistics provided. Example: ``` Average Wait times (based on the timeline view): [0]: Executions [1]: Average time spent waiting in a scheduler's queue [2]: Average time spent waiting in a scheduler's queue while ready [3]: Average time elapsed from WB until retire stage [0] [1] [2] [3] 0. 3 1.0 1.0 4.7 vmulps %xmm0, %xmm1, %xmm2 1. 3 2.7 0.0 2.3 vhaddps %xmm2, %xmm2, %xmm3 2. 3 6.0 0.0 0.0 vhaddps %xmm3, %xmm3, %xmm4 3 3.2 0.3 2.3 <total> ``` I.e. we average the averages. Reviewers: andreadb, mattd, RKSimon Reviewed By: andreadb Subscribers: gbedwell, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68714 llvm-svn: 374361
* [MCA] Improved cost computation for loop carried dependencies in the ↵Andrea Di Biagio2019-09-192-9/+39
| | | | | | | | | | | | | | | | bottleneck analysis. This patch introduces a cut-off threshold for dependency edge frequences with the goal of simplifying the critical sequence computation. This patch also removes the cost normalization for loop carried dependencies. We didn't really need to artificially amplify the cost of loop-carried dependencies since it is already computed as the integral over time of the delay (in cycle). In the absence of backend stalls there is no need for computing a critical sequence. With this patch we early exit from the critical sequence computation if no bottleneck was reported during the simulation. llvm-svn: 372337
* [MCA] Add flag -show-encoding to llvm-mca.Andrea Di Biagio2019-08-092-9/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Flag -show-encoding enables the printing of instruction encodings as part of the the instruction info view. Example (with flags -mtriple=x86_64-- -mcpu=btver2): Instruction Info: [1]: #uOps [2]: Latency [3]: RThroughput [4]: MayLoad [5]: MayStore [6]: HasSideEffects (U) [7]: Encoding Size [1] [2] [3] [4] [5] [6] [7] Encodings: Instructions: 1 2 1.00 4 c5 f0 59 d0 vmulps %xmm0, %xmm1, %xmm2 1 4 1.00 4 c5 eb 7c da vhaddps %xmm2, %xmm2, %xmm3 1 4 1.00 4 c5 e3 7c e3 vhaddps %xmm3, %xmm3, %xmm4 In this example, column Encoding Size is the size in bytes of the instruction encoding. Column Encodings reports the actual instruction encodings as byte sequences in hex (objdump style). The computation of encodings is done by a utility class named mca::CodeEmitter. In future, I plan to expose the CodeEmitter to the instruction builder, so that information about instruction encoding sizes can be used by the simulator. That would be a first step towards simulating the throughput from the decoders in the hardware frontend. Differential Revision: https://reviews.llvm.org/D65948 llvm-svn: 368432
* Revert r367649: Improve raw_ostream so that you can "write" colors using ↵Rui Ueyama2019-08-021-3/+3
| | | | | | | | operator<< This reverts commit r367649 in an attempt to unbreak Windows bots. llvm-svn: 367658
* Improve raw_ostream so that you can "write" colors using operator<<Rui Ueyama2019-08-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. raw_ostream supports ANSI colors so that you can write messages to the termina with colors. Previously, in order to change and reset color, you had to call `changeColor` and `resetColor` functions, respectively. So, if you print out "error: " in red, for example, you had to do something like this: OS.changeColor(raw_ostream::RED); OS << "error: "; OS.resetColor(); With this patch, you can write the same code as follows: OS << raw_ostream::RED << "error: " << raw_ostream::RESET; 2. Add a boolean flag to raw_ostream so that you can disable colored output. If you disable colors, changeColor, operator<<(Color), resetColor and other color-related functions have no effect. Most LLVM tools automatically prints out messages using colors, and you can disable it by passing a flag such as `--disable-colors`. This new flag makes it easy to write code that works that way. Differential Revision: https://reviews.llvm.org/D65564 llvm-svn: 367649
* [MCA][Bottleneck Analysis] Teach how to compute a critical sequence of ↵Andrea Di Biagio2019-06-212-39/+355
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | instructions based on the simulation. This patch teaches the bottleneck analysis how to identify and print the most expensive sequence of instructions according to the simulation. Fixes PR37494. The goal is to help users identify the sequence of instruction which is most critical for performance. A dependency graph is internally used by the bottleneck analysis to describe data dependencies and processor resource interferences between instructions. There is one node in the graph for every instruction in the input assembly sequence. The number of nodes in the graph is independent from the number of iterations simulated by the tool. It means that a single node of the graph represents all the possible instances of a same instruction contributed by the simulated iterations. Edges are dynamically "discovered" by the bottleneck analysis by observing instruction state transitions and "backend pressure increase" events generated by the Execute stage. Information from the events is used to identify critical dependencies, and materialize edges in the graph. A dependency edge is uniquely identified by a pair of node identifiers plus an instance of struct DependencyEdge::Dependency (which provides more details about the actual dependency kind). The bottleneck analysis internally ranks dependency edges based on their impact on the runtime (see field DependencyEdge::Dependency::Cost). To this end, each edge of the graph has an associated cost. By default, the cost of an edge is a function of its latency (in cycles). In practice, the cost of an edge is also a function of the number of cycles where the dependency has been seen as 'contributing to backend pressure increases'. The idea is that the higher the cost of an edge, the higher is the impact of the dependency on performance. To put it in another way, the cost of an edge is a measure of criticality for performance. Note how a same edge may be found in multiple iteration of the simulated loop. The logic that adds new edges to the graph checks if an equivalent dependency already exists (duplicate edges are not allowed). If an equivalent dependency edge is found, field DependencyEdge::Frequency of that edge is incremented by one, and the new cost is cumulatively added to the existing edge cost. At the end of simulation, costs are propagated to nodes through the edges of the graph. The goal is to identify a critical sequence from a node of the root-set (composed by node of the graph with no predecessors) to a 'sink node' with no successors. Note that the graph is intentionally kept acyclic to minimize the complexity of the critical sequence computation algorithm (complexity is currently linear in the number of nodes in the graph). The critical path is finally computed as a sequence of dependency edges. For edges describing processor resource interferences, the view also prints a so-called "interference probability" value (by dividing field DependencyEdge::Frequency by the total number of iterations). Examples of critical sequence computations can be found in tests added/modified by this patch. On output streams that support colored output, instructions from the critical sequence are rendered with a different color. Strictly speaking the analysis conducted by the bottleneck analysis view is not a critical path analysis. The cost of an edge doesn't only depend on the dependency latency. More importantly, the cost of a same edge may be computed differently by different iterations. The number of dependencies is discovered dynamically based on the events generated by the simulator. However, their number is not fixed. This is especially true for edges that model processor resource interferences; an interference may not occur in every iteration. For that reason, it makes sense to also print out a "probability of interference". By construction, the accuracy of this analysis (as always) is strongly dependent on the simulation (and therefore the quality of the information available in the scheduling model). That being said, the critical sequence effectively identifies a performance criticality. Instructions from that sequence are expected to have a very big impact on performance. So, users can take advantage of this information to focus their attention on specific interactions between instructions. In my experience, it works quite well in practice, and produces useful output (in a reasonable amount time). Differential Revision: https://reviews.llvm.org/D63543 llvm-svn: 364045
* [MCA] Slightly refactor the bottleneck analysis view. NFCIAndrea Di Biagio2019-06-182-69/+88
| | | | | | | | | | | This patch slightly refactors data structures internally used by the bottleneck analysis to track data and resource dependencies. This patch also updates methods used to print out information about dependency edges when in debug mode. This is the last of a sequence of commits done in preparation for an upcoming patch that fixes PR37494. No functional change intended. llvm-svn: 363677
* [MCA] Fix -Wunused-private-field warning after r362933. NFCAndrea Di Biagio2019-06-102-3/+2
| | | | | | This should unbreak the buildbots. llvm-svn: 362935
* [MCA] Further refactor the bottleneck analysis view. NFCI.Andrea Di Biagio2019-06-102-110/+171
| | | | llvm-svn: 362933
* [MCA] Remove unused fields from BottleneckAnalysis. NFCAndrea Di Biagio2019-05-312-13/+11
| | | | | | This should appease the buildbots. llvm-svn: 362251
* [MCA] Refactor class BottleneckAnalysis. NFCIAndrea Di Biagio2019-05-312-79/+404
| | | | | | | | | | | | | | | | | | | | The resource pressure distribution computation is now delegated by class BottleneckAnalysis to an instance of class PressureTracker. Class PressureTracker is also responsible for: - tracking users of processor resource units. - tracking the number of delay cycles caused by increases in backpressure. BottleneckAnalysis internally initializes a dependency graph. Each nodes represents an instruction in the input code sequence. Edges of the dependency graph are critical register/memory/resource dependencies. Dependencies are only added to the graph if they are seen as critical by backend pressure events. The DependencyGraph is currently unused. It is possible to print the dependency graph (see method DependencyGraph::dump()) for debugging purposes. The long term goal is to use the information stored by the dependency graph in order to do critical path computation. llvm-svn: 362246
* [MCA] Moved the bottleneck analysis to its own file. NFCIAndrea Di Biagio2019-04-174-153/+254
| | | | llvm-svn: 358554
* [llvm-mca][scheduler-stats] Print issued micro opcodes per cycle. NFCIAndrea Di Biagio2019-04-082-18/+16
| | | | | | | | | It makes more sense to print out the number of micro opcodes that are issued every cycle rather than the number of instructions issued per cycle. This behavior is also consistent with the dispatch-stats: numbers from the two views can now be easily compared. llvm-svn: 357919
* [MCA] Correctly update the UsedResourceGroups mask in the InstrBuilder.Andrea Di Biagio2019-03-261-1/+2
| | | | | | | | Found by inspection when looking at the debug output of MCA. This problem was latent, and none of the upstream models were affected by it. No functional change intended. llvm-svn: 357000
* [llvm-mca] Emit a message when no bottlenecks are identified.Matt Davis2019-03-072-6/+12
| | | | | | | | | | | | | | | | | | | | Summary: Since bottleneck hints are enabled via user request, it can be confusing if no bottleneck information is presented. Such is the case when no bottlenecks are identified. This patch emits a message in that case. Reviewers: andreadb Reviewed By: andreadb Subscribers: tschuett, gbedwell, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59098 llvm-svn: 355628
* [MCA] Correctly initialize struct SummaryView::BackPressureInfo.Andrea Di Biagio2019-03-041-1/+1
| | | | | | This should appease the buildbots. llvm-svn: 355309
* [MCA] Highlight kernel bottlenecks in the summary view.Andrea Di Biagio2019-03-042-3/+145
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new flag named -bottleneck-analysis to print out information about throughput bottlenecks. MCA knows how to identify and classify dynamic dispatch stalls. However, it doesn't know how to analyze and highlight kernel bottlenecks. The goal of this patch is to teach MCA how to correlate increases in backend pressure to backend stalls (and therefore, the loss of throughput). From a Scheduler point of view, backend pressure is a function of the scheduler buffer usage (i.e. how the number of uOps in the scheduler buffers changes over time). Backend pressure increases (or decreases) when there is a mismatch between the number of opcodes dispatched, and the number of opcodes issued in the same cycle. Since buffer resources are limited, continuous increases in backend pressure would eventually leads to dispatch stalls. So, there is a strong correlation between dispatch stalls, and how backpressure changed over time. This patch teaches how to identify situations where backend pressure increases due to: - unavailable pipeline resources. - data dependencies. Data dependencies may delay execution of instructions and therefore increase the time that uOps have to spend in the scheduler buffers. That often translates to an increase in backend pressure which may eventually lead to a bottleneck. Contention on pipeline resources may also delay execution of instructions, and lead to a temporary increase in backend pressure. Internally, the Scheduler classifies instructions based on whether register / memory operands are available or not. An instruction is marked as "ready to execute" only if data dependencies are fully resolved. Every cycle, the Scheduler attempts to execute all instructions that are ready to execute. If an instruction cannot execute because of unavailable pipeline resources, then the Scheduler internally updates a BusyResourceUnits mask with the ID of each unavailable resource. ExecuteStage is responsible for tracking changes in backend pressure. If backend pressure increases during a cycle because of contention on pipeline resources, then ExecuteStage sends a "backend pressure" event to the listeners. That event would contain information about instructions delayed by resource pressure, as well as the BusyResourceUnits mask. Note that ExecuteStage also knows how to identify situations where backpressure increased because of delays introduced by data dependencies. The SummaryView observes "backend pressure" events and prints out a "bottleneck report". Example of bottleneck report: ``` Cycles with backend pressure increase [ 99.89% ] Throughput Bottlenecks: Resource Pressure [ 0.00% ] Data Dependencies: [ 99.89% ] - Register Dependencies [ 0.00% ] - Memory Dependencies [ 99.89% ] ``` A bottleneck report is printed out only if increases in backend pressure eventually caused backend stalls. About the time complexity: Time complexity is linear in the number of instructions in the Scheduler::PendingSet. The average slowdown tends to be in the range of ~5-6%. For memory intensive kernels, the slowdown can be significant if flag -noalias=false is specified. In the worst case scenario I have observed a slowdown of ~30% when flag -noalias=false was specified. We can definitely recover part of that slowdown if we optimize class LSUnit (by doing extra bookkeeping to speedup queries). For now, this new analysis is disabled by default, and it can be enabled via flag -bottleneck-analysis. Users of MCA as a library can enable the generation of pressure events through the constructor of ExecuteStage. This patch partially addresses https://bugs.llvm.org/show_bug.cgi?id=37494 Differential Revision: https://reviews.llvm.org/D58728 llvm-svn: 355308
* [MCA][ResourceManager] Add a table that maps processor resource indices to ↵Andrea Di Biagio2019-02-202-6/+11
| | | | | | | | | | | | processor resource identifiers. This patch adds a lookup table to speed up resource queries in the ResourceManager. This patch also moves helper function 'getResourceStateIndex()' from ResourceManager.cpp to Support.h, so that we can reuse that logic in the SummaryView (and potentially other views in llvm-mca). No functional change intended. llvm-svn: 354470
* [MC][X86] Correctly model additional operand latency caused by transfer ↵Andrea Di Biagio2019-01-231-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | delays from the integer to the floating point unit. This patch adds a new ReadAdvance definition named ReadInt2Fpu. ReadInt2Fpu allows x86 scheduling models to accurately describe delays caused by data transfers from the integer unit to the floating point unit. ReadInt2Fpu currently defaults to a delay of zero cycles (i.e. no delay) for all x86 models excluding BtVer2. That means, this patch is only a functional change for the Jaguar cpu model only. Tablegen definitions for instructions (V)PINSR* have been updated to account for the new ReadInt2Fpu. That read is mapped to the the GPR input operand. On Jaguar, int-to-fpu transfers are modeled as a +6cy delay. Before this patch, that extra delay was added to the opcode latency. In practice, the insert opcode only executes for 1cy. Most of the actual latency is actually contributed by the so-called operand-latency. According to the AMD SOG for family 16h, (V)PINSR* latency is defined by expression f+1, where f is defined as a forwarding delay from the integer unit to the fpu. When printing instruction latency from MCA (see InstructionInfoView.cpp) and LLC (only when flag -print-schedule is speified), we now need to account for any extra forwarding delays. We do this by checking if scheduling classes declare any negative ReadAdvance entries. Quoting a code comment in TargetSchedule.td: "A negative advance effectively increases latency, which may be used for cross-domain stalls". When computing the instruction latency for the purpose of our scheduling tests, we now add any extra delay to the formula. This avoids regressing existing codegen and mca schedule tests. It comes with the cost of an extra (but very simple) hook in MCSchedModel. Differential Revision: https://reviews.llvm.org/D57056 llvm-svn: 351965
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-1918-72/+54
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* [MCA] Fix wrong definition of ResourceUnitMask in DefaultResourceStrategy.Andrea Di Biagio2019-01-101-1/+2
| | | | | | | | | | | | | | Field ResourceUnitMask was incorrectly defined as a 'const unsigned' mask. It should have been a 64 bit quantity instead. That means, ResourceUnitMask was always implicitly truncated to a 32 bit quantity. This issue has been found by inspection. Surprisingly, that bug was latent, and it never negatively affected any existing upstream targets. This patch fixes the wrong definition of ResourceUnitMask, and adds a bunch of extra debug prints to help debugging potential issues related to invalid processor resource masks. llvm-svn: 350820
* [llvm-mca] Move llvm-mca library to llvm/lib/MCA.Clement Courbet2018-12-172-2/+2
| | | | | | | | | | | | Summary: See PR38731. Reviewers: andreadb Subscribers: mgorny, javed.absar, tschuett, gbedwell, andreadb, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D55557 llvm-svn: 349332
* [llvm-mca][MC] Add the ability to declare which processor resources model ↵Andrea Di Biagio2018-11-292-10/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | load/store queues (PR36666). This patch adds the ability to specify via tablegen which processor resources are load/store queue resources. A new tablegen class named MemoryQueue can be optionally used to mark resources that model load/store queues. Information about the load/store queue is collected at 'CodeGenSchedule' stage, and analyzed by the 'SubtargetEmitter' to initialize two new fields in struct MCExtraProcessorInfo named `LoadQueueID` and `StoreQueueID`. Those two fields are identifiers for buffered resources used to describe the load queue and the store queue. Field `BufferSize` is interpreted as the number of entries in the queue, while the number of units is a throughput indicator (i.e. number of available pickers for loads/stores). At construction time, LSUnit in llvm-mca checks for the presence of extra processor information (i.e. MCExtraProcessorInfo) in the scheduling model. If that information is available, and fields LoadQueueID and StoreQueueID are set to a value different than zero (i.e. the invalid processor resource index), then LSUnit initializes its LoadQueue/StoreQueue based on the BufferSize value declared by the two processor resources. With this patch, we more accurately track dynamic dispatch stalls caused by the lack of LS tokens (i.e. load/store queue full). This is also shown by the differences in two BdVer2 tests. Stalls that were previously classified as generic SCHEDULER FULL stalls, are not correctly classified either as "load queue full" or "store queue full". About the differences in the -scheduler-stats view: those differences are expected, because entries in the load/store queue are not released at instruction issue stage. Instead, those are released at instruction executed stage. This is the main reason why for the modified tests, the load/store queues gets full before PdEx is full. Differential Revision: https://reviews.llvm.org/D54957 llvm-svn: 347857
* [llvm-mca][View] Improved Retire Control Unit Statistics.Andrea Di Biagio2018-11-232-15/+58
| | | | | | | | | | | | | | | | | | | | | RetireControlUnitStatistics now reports extra information about the ROB and the avg/maximum number of entries consumed over the entire simulation. Example: Retire Control Unit - number of cycles where we saw N instructions retired: [# retired], [# cycles] 0, 109 (17.9%) 1, 102 (16.7%) 2, 399 (65.4%) Total ROB Entries: 64 Max Used ROB Entries: 35 ( 54.7% ) Average Used ROB Entries per cy: 32 ( 50.0% ) Documentation in llvm/docs/CommandGuide/llvmn-mca.rst has been updated to reflect this change. llvm-svn: 347493
* [llvm-mca] Add extra counters for move elimination in view ↵Andrea Di Biagio2018-11-012-18/+95
| | | | | | | | | | | | | | | | | | RegisterFileStatistics. This patch teaches view RegisterFileStatistics how to report events for optimizable register moves. For each processor register file, view RegisterFileStatistics reports the following extra information: - Number of optimizable register moves - Number of register moves eliminated - Number of zero moves (i.e. register moves that propagate a zero) - Max Number of moves eliminated per cycle. Differential Revision: https://reviews.llvm.org/D53976 llvm-svn: 345865
* [llvm-mca] Move namespace mca inside llvm::Fangrui Song2018-10-3018-16/+36
| | | | | | | | | | | | | | | | Summary: This allows to remove `using namespace llvm;` in those *.cpp files When we want to revisit the decision (everything resides in llvm::mca::*) in the future, we can move things to a nested namespace of llvm::mca::, to conceptually make them separate from the rest of llvm::mca::* Reviewers: andreadb, mattd Reviewed By: andreadb Subscribers: javed.absar, tschuett, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D53407 llvm-svn: 345612
* [llvm-mca] Fix -wreorder and -Wunused-private-field after r345376. NFCSam McCall2018-10-262-4/+2
| | | | llvm-svn: 345378
* [llvm-mca] Removed dependency on mca::SourcMgr in some Views. NFCAndrea Di Biagio2018-10-267-35/+48
| | | | llvm-svn: 345376
* [llvm-mca] Removed a couple of redundant method declarations, and simplified ↵Andrea Di Biagio2018-10-255-24/+13
| | | | | | code in ResourcePressureView. NFC llvm-svn: 345259
* [llvm-mca] Remove dependency from InstrBuilder in class InstructionTables.Andrea Di Biagio2018-10-241-2/+1
| | | | | | | | | Also, removed the initialization of vectors used for processor resource masks. Support function 'computeProcResourceMasks()' already calls method resize on those vectors. No functional change intended. llvm-svn: 345161
* [llvm-mca] Refactor class SourceMgr. NFCIAndrea Di Biagio2018-10-243-29/+36
| | | | | | | | Added begin()/end() methods to allow the usage of SourceMgr in foreach loops. With this change, method getMCInstFromIndex() (as well as a couple of other methods) are now redundant, and can be removed from the public interface. llvm-svn: 345147
* [llvm-mca] Remove a couple of using directives and a bunch of redundant ↵Andrea Di Biagio2018-10-225-5/+5
| | | | | | namespace llvm prefixes. NFC llvm-svn: 344916
* [llvm-mca] Remove method ↵Andrea Di Biagio2018-10-122-7/+3
| | | | | | RegisterFileStatistics::initializeRegisterFileInfo(). NFC llvm-svn: 344339
* [llvm-mca] Delay calculation of Cycles per Resources, separate the cycles ↵Matt Davis2018-09-112-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | and resource quantities. Summary: This patch removes the storing of accumulated floating point data within the llvm-mca library. This patch splits-up the two quantities: cycles and number of resource units. By splitting-up these two quantities, we delay the calculation of "cycles per resource unit" until that value is read, reducing the chance of accumulating floating point error. I considered using the APFloat, but after measuring performance, for a large (many iteration) sample, I decided to go with this faster solution. Reviewers: andreadb, courbet, RKSimon Reviewed By: andreadb Subscribers: llvm-commits, javed.absar, tschuett, gbedwell Differential Revision: https://reviews.llvm.org/D51903 llvm-svn: 341980
* [llvm-mca] correctly initialize field 'CycleRetired' in the TimelineView.Andrea Di Biagio2018-08-301-1/+1
| | | | | | | This fixes a [-Wmissing-field-initializers] warning reported by buildbot lld-x86_64-darwin13, build #25152. llvm-svn: 341056
* [llvm-mca] Report the number of dispatched micro opcodes in the ↵Andrea Di Biagio2018-08-305-37/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DispatchStatistics view. This patch introduces the following changes to the DispatchStatistics view: * DispatchStatistics now reports the number of dispatched opcodes instead of the number of dispatched instructions. * The "Dynamic Dispatch Stall Cycles" table now also reports the percentage of stall cycles against the total simulated cycles. This change allows users to easily compare dispatch group sizes with the processor DispatchWidth. Before this change, it was difficult to correlate the two numbers, since DispatchStatistics view reported numbers of instructions (instead of opcodes). DispatchWidth defines the maximum size of a dispatch group in terms of number of micro opcodes. The other change introduced by this patch is related to how DispatchStage generates "instruction dispatch" events. In particular: * There can be multiple dispatch events associated with a same instruction * Each dispatch event now encapsulates the number of dispatched micro opcodes. The number of micro opcodes declared by an instruction may exceed the processor DispatchWidth. Therefore, we cannot assume that instructions are always fully dispatched in a single cycle. DispatchStage knows already how to handle instructions declaring a number of opcodes bigger that DispatchWidth. However, DispatchStage always emitted a single instruction dispatch event (during the first simulated dispatch cycle) for instructions dispatched. With this patch, DispatchStage now correctly notifies multiple dispatch events for instructions that cannot be dispatched in a single cycle. A few views had to be modified. Views can no longer assume that there can only be one dispatch event per instruction. Tests (and docs) have been updated. Differential Revision: https://reviews.llvm.org/D51430 llvm-svn: 341055
* [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report ↵Andrea Di Biagio2018-08-291-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | generated by the SummaryView. This patch adds two new fields to the perf report generated by the SummaryView. Fields are now logically organized into two small groups; only the second group contains throughput indicators. Example: ``` Iterations: 100 Instructions: 300 Total Cycles: 414 Total uOps: 700 Dispatch Width: 4 uOps Per Cycle: 1.69 IPC: 0.72 Block RThroughput: 4.0 ``` This patch also updates the docs for llvm-mca. Due to the nature of this change, several tests in the tools/llvm-mca directory were affected, and had to be updated using script `update_mca_test_checks.py`. llvm-svn: 340946
* [llvm-mca] Initialize each element in vector TimelineView::UsedBuffers to a ↵Andrea Di Biagio2018-08-282-14/+18
| | | | | | | | | default invalid buffer descriptor. NFCI Also change the default buffer size for UsedBuffer entries to -1 (i.e. "unknown size"). No functional change intended. llvm-svn: 340830
* [llvm-mca][TimelineView] Force the same number of executions for every entry ↵Andrea Di Biagio2018-08-282-70/+106
| | | | | | | | | | | | | | | | | | | in the 'wait-times' table. This patch also uses colors to highlight problematic wait-time entries. A problematic entry is an entry with an high wait time that tends to match (or exceed) the size of the scheduler's buffer. Color RED is used if an instruction had to wait an average number of cycles which is bigger than (or equal to) the size of the underlying scheduler's buffer. Color YELLOW is used if the time (in cycles) spend waiting for the operands or pipeline resources is bigger than half the size of the underlying scheduler's buffer. Color MAGENTA is used if an instruction does not consume buffer resources according to the scheduling model. llvm-svn: 340825
* [llvm-mca] Pass an instruction reference when notifying event listeners ↵Andrea Di Biagio2018-08-282-4/+8
| | | | | | about reserved/released buffer resources. NFC llvm-svn: 340821
* [llvm-mca] Remove unused include. NFCAndrea Di Biagio2018-08-271-1/+0
| | | | llvm-svn: 340768
* [llvm-mca] Improved report generated by the SchedulerStatistics view.Andrea Di Biagio2018-08-272-67/+99
| | | | | | | | | | | | | Before this patch, the SchedulerStatistics only printed the maximum number of buffer entries consumed in each scheduler's queue at a given point of the simulation. This patch restructures the reported table, and adds an extra field named "Average number of used buffer entries" to it. This patch also uses different colors to help identifying bottlenecks caused by high scheduler's buffer pressure. llvm-svn: 340746
* [llvm-mca] Move views and stats into a Views subdir. NFC.Matt Davis2018-08-2418-0/+1702
llvm-svn: 340645
OpenPOWER on IntegriCloud