summaryrefslogtreecommitdiffstats
path: root/llvm/tools/llvm-mca/llvm-mca.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [MCA] Moved the bottleneck analysis to its own file. NFCIAndrea Di Biagio2019-04-171-1/+5
| | | | llvm-svn: 358554
* [MCA] Add an experimental MicroOpQueue stage.Andrea Di Biagio2019-03-291-3/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds an experimental stage named MicroOpQueueStage. MicroOpQueueStage can be used to simulate a hardware micro-op queue (basically, a decoupling queue between 'decode' and 'dispatch'). Users can specify a queue size, as well as a optional MaxIPC (which - in the absence of a "Decoders" stage - can be used to simulate a different throughput from the decoders). This stage is added to the default pipeline between the EntryStage and the DispatchStage only if PipelineOption::MicroOpQueue is different than zero. By default, llvm-mca sets PipelineOption::MicroOpQueue to the value of hidden flag -micro-op-queue-size. Throughput from the decoder can be simulated via another hidden flag named -decoder-throughput. That flag allows us to quickly experiment with different frontend throughputs. For targets that declare a loop buffer, flag -decoder-throughput allows users to do multiple runs, each time simulating a different throughput from the decoders. This stage can/will be extended in future. For example, we could add a "buffer full" event to notify bottlenecks caused by backpressure. flag -decoder-throughput would probably go away if in future we delegate to another stage (DecoderStage?) the simulation of a (potentially variable) throughput from the decoders. For now, flag -decoder-throughput is "good enough" to run some simple experiments. Differential Revision: https://reviews.llvm.org/D59928 llvm-svn: 357248
* [MCA] Correctly update the UsedResourceGroups mask in the InstrBuilder.Andrea Di Biagio2019-03-261-6/+2
| | | | | | | | Found by inspection when looking at the debug output of MCA. This problem was latent, and none of the upstream models were affected by it. No functional change intended. llvm-svn: 357000
* [llvm-mca] Emit a message when no bottlenecks are identified.Matt Davis2019-03-071-1/+2
| | | | | | | | | | | | | | | | | | | | Summary: Since bottleneck hints are enabled via user request, it can be confusing if no bottleneck information is presented. Such is the case when no bottlenecks are identified. This patch emits a message in that case. Reviewers: andreadb Reviewed By: andreadb Subscribers: tschuett, gbedwell, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59098 llvm-svn: 355628
* [MCA] Highlight kernel bottlenecks in the summary view.Andrea Di Biagio2019-03-041-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new flag named -bottleneck-analysis to print out information about throughput bottlenecks. MCA knows how to identify and classify dynamic dispatch stalls. However, it doesn't know how to analyze and highlight kernel bottlenecks. The goal of this patch is to teach MCA how to correlate increases in backend pressure to backend stalls (and therefore, the loss of throughput). From a Scheduler point of view, backend pressure is a function of the scheduler buffer usage (i.e. how the number of uOps in the scheduler buffers changes over time). Backend pressure increases (or decreases) when there is a mismatch between the number of opcodes dispatched, and the number of opcodes issued in the same cycle. Since buffer resources are limited, continuous increases in backend pressure would eventually leads to dispatch stalls. So, there is a strong correlation between dispatch stalls, and how backpressure changed over time. This patch teaches how to identify situations where backend pressure increases due to: - unavailable pipeline resources. - data dependencies. Data dependencies may delay execution of instructions and therefore increase the time that uOps have to spend in the scheduler buffers. That often translates to an increase in backend pressure which may eventually lead to a bottleneck. Contention on pipeline resources may also delay execution of instructions, and lead to a temporary increase in backend pressure. Internally, the Scheduler classifies instructions based on whether register / memory operands are available or not. An instruction is marked as "ready to execute" only if data dependencies are fully resolved. Every cycle, the Scheduler attempts to execute all instructions that are ready to execute. If an instruction cannot execute because of unavailable pipeline resources, then the Scheduler internally updates a BusyResourceUnits mask with the ID of each unavailable resource. ExecuteStage is responsible for tracking changes in backend pressure. If backend pressure increases during a cycle because of contention on pipeline resources, then ExecuteStage sends a "backend pressure" event to the listeners. That event would contain information about instructions delayed by resource pressure, as well as the BusyResourceUnits mask. Note that ExecuteStage also knows how to identify situations where backpressure increased because of delays introduced by data dependencies. The SummaryView observes "backend pressure" events and prints out a "bottleneck report". Example of bottleneck report: ``` Cycles with backend pressure increase [ 99.89% ] Throughput Bottlenecks: Resource Pressure [ 0.00% ] Data Dependencies: [ 99.89% ] - Register Dependencies [ 0.00% ] - Memory Dependencies [ 99.89% ] ``` A bottleneck report is printed out only if increases in backend pressure eventually caused backend stalls. About the time complexity: Time complexity is linear in the number of instructions in the Scheduler::PendingSet. The average slowdown tends to be in the range of ~5-6%. For memory intensive kernels, the slowdown can be significant if flag -noalias=false is specified. In the worst case scenario I have observed a slowdown of ~30% when flag -noalias=false was specified. We can definitely recover part of that slowdown if we optimize class LSUnit (by doing extra bookkeeping to speedup queries). For now, this new analysis is disabled by default, and it can be enabled via flag -bottleneck-analysis. Users of MCA as a library can enable the generation of pressure events through the constructor of ExecuteStage. This patch partially addresses https://bugs.llvm.org/show_bug.cgi?id=37494 Differential Revision: https://reviews.llvm.org/D58728 llvm-svn: 355308
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* [llvm-mca] Rename an error variable.Matt Davis2018-12-191-2/+2
| | | | llvm-svn: 349662
* [llvm-mca] Add an error handler for error from parseCodeRegionsMatt Davis2018-12-191-10/+14
| | | | | | | | | | | | | | | | | | | Summary: It's a bit tricky to add a test for the failing path right now, binary support will have an easier path to exercise the path here. * Ran clang-format. Reviewers: andreadb Reviewed By: andreadb Subscribers: tschuett, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D55803 llvm-svn: 349659
* [MCA] Don't assume that createMCInstrAnalysis() always returns a valid pointer.Andrea Di Biagio2018-12-171-1/+1
| | | | | | | | | | Class InstrBuilder wrongly assumed that llvm targets were always able to return a non-null pointer when createMCInstrAnalysis() was called on them. This was causing crashes when simulating executions for targets that don't provide an MCInstrAnalysis object. This patch fixes the issue by making MCInstrAnalysis optional. llvm-svn: 349352
* [llvm-mca] Move llvm-mca library to llvm/lib/MCA.Clement Courbet2018-12-171-5/+5
| | | | | | | | | | | | Summary: See PR38731. Reviewers: andreadb Subscribers: mgorny, javed.absar, tschuett, gbedwell, andreadb, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D55557 llvm-svn: 349332
* [llvm-mca][MC] Add the ability to declare which processor resources model ↵Andrea Di Biagio2018-11-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | load/store queues (PR36666). This patch adds the ability to specify via tablegen which processor resources are load/store queue resources. A new tablegen class named MemoryQueue can be optionally used to mark resources that model load/store queues. Information about the load/store queue is collected at 'CodeGenSchedule' stage, and analyzed by the 'SubtargetEmitter' to initialize two new fields in struct MCExtraProcessorInfo named `LoadQueueID` and `StoreQueueID`. Those two fields are identifiers for buffered resources used to describe the load queue and the store queue. Field `BufferSize` is interpreted as the number of entries in the queue, while the number of units is a throughput indicator (i.e. number of available pickers for loads/stores). At construction time, LSUnit in llvm-mca checks for the presence of extra processor information (i.e. MCExtraProcessorInfo) in the scheduling model. If that information is available, and fields LoadQueueID and StoreQueueID are set to a value different than zero (i.e. the invalid processor resource index), then LSUnit initializes its LoadQueue/StoreQueue based on the BufferSize value declared by the two processor resources. With this patch, we more accurately track dynamic dispatch stalls caused by the lack of LS tokens (i.e. load/store queue full). This is also shown by the differences in two BdVer2 tests. Stalls that were previously classified as generic SCHEDULER FULL stalls, are not correctly classified either as "load queue full" or "store queue full". About the differences in the -scheduler-stats view: those differences are expected, because entries in the load/store queue are not released at instruction issue stage. Instead, those are released at instruction executed stage. This is the main reason why for the modified tests, the load/store queues gets full before PdEx is full. Differential Revision: https://reviews.llvm.org/D54957 llvm-svn: 347857
* Reapply "[llvm-mca] Return the total number of cycles from method ↵Andrea Di Biagio2018-11-281-2/+3
| | | | | | | | | | | | | | | | | | | | | | Pipeline::run()." This reapplies r347767 (originally reviewed at: https://reviews.llvm.org/D55000) with a fix for the missing std::move of the Error returned by the call to Pipeline::runCycle(). Below is the original commit message from r347767. If a user only cares about the overall latency, then the best/quickest way is to change method Pipeline::run() so that it returns the total number of cycles to the caller. When the simulation pipeline is run, the number of cycles (or an error) is returned from method Pipeline::run(). The advantage is that no hardware event listener is needed for computing that latency. So, the whole process should be faster (and simpler - at least for that particular use case). llvm-svn: 347795
* Revert [llvm-mca] Return the total number of cycles from method Pipeline::run().Andrea Di Biagio2018-11-281-3/+2
| | | | | | This reverts commits 347767. llvm-svn: 347775
* [llvm-mca] Return the total number of cycles from method Pipeline::run().Andrea Di Biagio2018-11-281-2/+3
| | | | | | | | | | | | | | If a user only cares about the overall latency, then the best/quickest way is to change method Pipeline::run() so that it returns the total number of cycles to the caller. When the simulation pipeline is run, the number of cycles (or an error) is returned from method Pipeline::run(). The advantage is that no hardware event listener is needed for computing that latency. So, the whole process should be faster (and simpler - at least for that particular use case). llvm-svn: 347767
* [llvm-mca][View] Improved Retire Control Unit Statistics.Andrea Di Biagio2018-11-231-1/+1
| | | | | | | | | | | | | | | | | | | | | RetireControlUnitStatistics now reports extra information about the ROB and the avg/maximum number of entries consumed over the entire simulation. Example: Retire Control Unit - number of cycles where we saw N instructions retired: [# retired], [# cycles] 0, 109 (17.9%) 1, 102 (16.7%) 2, 399 (65.4%) Total ROB Entries: 64 Max Used ROB Entries: 35 ( 54.7% ) Average Used ROB Entries per cy: 32 ( 50.0% ) Documentation in llvm/docs/CommandGuide/llvmn-mca.rst has been updated to reflect this change. llvm-svn: 347493
* [llvm-mca] Partially revert r346417.Matt Davis2018-11-081-16/+19
| | | | | | | Restored the llvm:: namespace qualifier on make_unique. This removes the ambiguity with make_unique. llvm-svn: 346424
* [llvm-mca] PR39261: Rename FetchStage to EntryStage.Andrea Di Biagio2018-11-081-2/+2
| | | | | | | | | | | | This fixes PR39261. FetchStage is a misnomer. It causes confusion with the frontend fetch stage, which we don't currently simulate. I decided to rename it into EntryStage mainly because this is meant to be a "source" stage for all pipelines. Differential Revision: https://reviews.llvm.org/D54268 llvm-svn: 346419
* [llvm-mca] Remove unneeded namespace qualifier. NFC.Matt Davis2018-11-081-24/+21
| | | | llvm-svn: 346417
* [llvm-mca] Move the AssembleInput logic into its own class.Matt Davis2018-11-071-103/+9
| | | | | | | | | | | | | | | | | Summary: This patch introduces a CodeRegionGenerator class which is responsible for parsing some type of input and creating a 'CodeRegions' instance for use by llvm-mca. In the future, we will also have a CodeRegionGenerator subclass for converting an input object file into CodeRegions. For now, we only have the subclass for converting input assembly into CodeRegions. This is mostly a NFC patch, as the logic remains close to the original, but now encapsulated in its own class and moved outside of llvm-mca.cpp. Reviewers: andreadb, courbet, RKSimon Reviewed By: andreadb Subscribers: mgorny, tschuett, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D54179 llvm-svn: 346344
* [llvm-mca] Remove the verb 'assemble' from a few options in help. NFC.Matt Davis2018-10-311-17/+17
| | | | | | | * MCA does not assemble anything. * Ran clang-format. llvm-svn: 345750
* [llvm-mca] Lower to mca::Instructon before the pipeline is run.Andrea Di Biagio2018-10-291-20/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | Before this change, the lowering of instructions from llvm::MCInst to mca::Instruction was done as part of the first stage of the pipeline (i.e. the FetchStage). In particular, FetchStage was responsible for picking the next instruction from the source sequence, and lower it to an mca::Instruction with the help of an object of class InstrBuilder. The dependency on InstrBuilder was problematic for a number of reasons. Class InstrBuilder only knows how to lower from llvm::MCInst to mca::Instruction. That means, it is hard to support a different scenario where instructions in input are not instances of class llvm::MCInst. Even if we managed to specialize InstrBuilder, and generalize most of its internal logic, the dependency on InstrBuilder in FetchStage would have caused more troubles (other than complicating the pipeline logic). With this patch, the lowering step is done before the pipeline is run. The pipeline is no longer responsible for lowering from MCInst to mca::Instruction. As a consequence of this, the FetchStage no longer needs to interact with an InstrBuilder. The mca::SourceMgr class now simply wraps a reference to a sequence of mca::Instruction objects. This simplifies the logic of FetchStage, and increases the usability of it. As a result, on a debug build, we see a 7-9% speedup; on a release build, the speedup is around 3-4%. llvm-svn: 345500
* [llvm-mca] Removed dependency on mca::SourcMgr in some Views. NFCAndrea Di Biagio2018-10-261-7/+12
| | | | llvm-svn: 345376
* [llvm-mca] Remove dependency from InstrBuilder in class InstructionTables.Andrea Di Biagio2018-10-241-1/+1
| | | | | | | | | Also, removed the initialization of vectors used for processor resource masks. Support function 'computeProcResourceMasks()' already calls method resize on those vectors. No functional change intended. llvm-svn: 345161
* [llvm-mca] [llvm-mca] Improved error handling and error reporting from class ↵Andrea Di Biagio2018-10-241-7/+33
| | | | | | | | | | | | | | | | | | | | | | | | | InstrBuilder. A new class named InstructionError has been added to Support.h in order to improve the error reporting from class InstrBuilder. The llvm-mca driver is responsible for handling InstructionError objects, and printing them out to stderr. The goal of this patch is to remove all the remaining error handling logic from the library code. In particular, this allows us to: - Simplify the logic in InstrBuilder by removing a needless dependency from MCInstrPrinter. - Centralize all the error halding logic in a new function named 'runPipeline' (see llvm-mca.cpp). This is also a first step towards generalizing class InstrBuilder, so that in future, we will be able to reuse its logic to also "lower" MachineInstr to mca::Instruction objects. Differential Revision: https://reviews.llvm.org/D53585 llvm-svn: 345129
* [llvm-mca] Use llvm::ArrayRef in class SourceMgr. NFCIAndrea Di Biagio2018-10-221-8/+10
| | | | | | | Class SourceMgr now uses type ArrayRef<MCInst> to reference the sequence of code from a "CodeRegion". llvm-svn: 344911
* Test commit. NFC.Owen Rodley2018-09-281-1/+1
| | | | llvm-svn: 343296
* [llvm-mca] Don't disable the SummaryView if flag `-all-stats` is false.Andrea Di Biagio2018-08-291-1/+0
| | | | llvm-svn: 340945
* [llvm-mca] Introduce the llvm-mca library and organize the directory ↵Matt Davis2018-08-271-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | accordingly. NFC. Summary: This patch introduces llvm-mca as a library. The driver (llvm-mca.cpp), views, and stats, are not part of the library. Those are separate components that are not required for the functioning of llvm-mca. The directory has been organized as follows: All library source files now reside in: - `lib/HardwareUnits/` - All subclasses of HardwareUnit (these represent the simulated hardware components of a backend). (LSUnit does not inherit from HardwareUnit, but Scheduler does which uses LSUnit). - `lib/Stages/` - All subclasses of the pipeline stages. - `lib/` - This is the root of the library and contains library code that does not fit into the Stages or HardwareUnit subdirs. All library header files now reside in the `include` directory and mimic the same layout as the `lib` directory mentioned above. In the (near) future we would like to move the library (include and lib) contents from tools and into the core of llvm somewhere. That change would allow various analysis and optimization passes to make use of MCA functionality for things like cost modeling. I left all of the non-library code just where it has always been, in the root of the llvm-mca directory. The include directives for the non-library source file have been updated to refer to the llvm-mca library headers. I updated the llvm-mca/CMakeLists.txt file to include the library headers, but I made the non-library code explicitly reference the library's 'include' directory. Once we eventually (hopefully) migrate the MCA library components into llvm the include directives used by the non-library source files will be updated to point to the proper location in llvm. Reviewers: andreadb, courbet, RKSimon Reviewed By: andreadb Subscribers: mgorny, javed.absar, tschuett, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D50929 llvm-svn: 340755
* [llvm-mca] Move views and stats into a Views subdir. NFC.Matt Davis2018-08-241-8/+8
| | | | llvm-svn: 340645
* [llvm-mca] Remove unused formal parameter. NFC.Matt Davis2018-08-201-4/+4
| | | | llvm-svn: 340227
* [llvm-mca] Propagate fatal llvm-mca errors from library classes to driver.Matt Davis2018-08-131-2/+7
| | | | | | | | | | | | | | | | | Summary: This patch introduces error handling to propagate the errors from llvm-mca library classes (or what will become library classes) up to the driver. This patch also introduces an enum to make clearer the intention of the return value for Stage::execute. This supports PR38101. Reviewers: andreadb, courbet, RKSimon Reviewed By: andreadb Subscribers: llvm-commits, tschuett, gbedwell Differential Revision: https://reviews.llvm.org/D50561 llvm-svn: 339594
* [ADT] Normalize empty triple componentsPetr Hosek2018-08-081-1/+0
| | | | | | | | | | | | | | | | | LLVM triple normalization is handling "unknown" and empty components differently; for example given "x86_64-unknown-linux-gnu" and "x86_64-linux-gnu" which should be equivalent, triple normalization returns "x86_64-unknown-linux-gnu" and "x86_64--linux-gnu". autoconf's config.sub returns "x86_64-unknown-linux-gnu" for both "x86_64-linux-gnu" and "x86_64-unknown-linux-gnu". This changes the triple normalization to behave the same way, replacing empty triple components with "unknown". This addresses PR37129. Differential Revision: https://reviews.llvm.org/D50219 llvm-svn: 339294
* [llvm-mca] Update the help text to reflect "physical" registers. NFC.Matt Davis2018-07-311-1/+1
| | | | llvm-svn: 338430
* [llvm-mca] Turn InstructionTables into a Stage.Matt Davis2018-07-141-6/+12
| | | | | | | | | | | | | | | | | Summary: This patch converts the InstructionTables class into a subclass of mca::Stage. This change allows us to use the Stage's inherited Listeners for event notifications. This also allows us to create a simple pipeline for viewing the InstructionTables report. I have been working on a follow on patch that should cleanup addView in InstructionTables. Right now, addView adds the view to both the Listener list and Views list. The follow-on patch addresses the fact that we don't really need two lists in this case. That change is not specific to just InstructionTables, so it will be a separate patch. Reviewers: andreadb, courbet, RKSimon Reviewed By: andreadb Subscribers: tschuett, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D49329 llvm-svn: 337113
* [llvm-mca] report an error if the assembly sequence contains an unsupported ↵Andrea Di Biagio2018-07-091-1/+1
| | | | | | | | | | | | | instruction. This is a short-term fix for PR38093. For now, we llvm::report_fatal_error if the instruction builder finds an unsupported instruction in the instruction stream. We need to revisit this fix once we start addressing PR38101. Essentially, we need a better framework for error handling. llvm-svn: 336543
* [llvm-mca] Add HardwareUnit and Context classes.Matt Davis2018-07-061-20/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch moves the construction of the default backend from llvm-mca.cpp and into mca::Context. The Context class is responsible for holding ownership of the simulated hardware components. These components are subclasses of HardwareUnit. Right now the HardwareUnit is pretty bare-bones, but eventually we might want to add some common functionality across all hardware components, such as isReady() or something similar. I have a feeling this patch will probably need some updates, but it's a start. One thing I am not particularly fond of is the rather large interface for createDefaultPipeline. That convenience routine takes a rather large set of inputs from the llvm-mca driver, where many of those inputs are generated via command line options. One item I think we might want to change is the separating of ownership of hardware components (owned by the context) and the pipeline (which owns Stages). In short, a Pipeline owns Stages, a Context (currently) owns hardware. The Pipeline's Stages make use of the components, and thus there is a lifetime dependency generated. The components must outlive the pipeline. We could solve this by having the Context also own the Pipeline, and not return a unique_ptr<Pipeline>. Now that I think about it, I like that idea more. Differential Revision: https://reviews.llvm.org/D48691 llvm-svn: 336456
* [llvm-mca] Clear the content of map VariantDescriptors in InstrBuilder ↵Andrea Di Biagio2018-07-021-0/+3
| | | | | | | | | | | | | | before we start analyzing a new CodeBlock. NFCI. Different CodeBlocks don't overlap. The same MCInst cannot appear in more than one code block because all blocks are instantiated before the simulation is run. We should always clear the content of map VariantDescriptors before every simulation, since VariantDescriptors cannot possibly store useful information for the next blocks. It is also "safer" to clear its content because `MCInst*` is used as the key type for map VariantDescriptors. llvm-svn: 336142
* [MC] Error on a .zerofill directive in a non-virtual sectionFrancis Visoiu Mistrih2018-07-021-1/+2
| | | | | | | | | | | | | | | On darwin, all virtual sections have zerofill type, and having a .zerofill directive in a non-virtual section is not allowed. Instead of asserting, show a nicer error. In order to use the equivalent of .zerofill in a non-virtual section, the usage of .zero of .space is required. This patch replaces the assert with an error. Differential Revision: https://reviews.llvm.org/D48517 llvm-svn: 336127
* [llvm-mca] Register listeners with stages; remove Pipeline dependency from ↵Matt Davis2018-06-271-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | Stage. Summary: This patch removes a few callbacks from Pipeline. It comes at the cost of registering Listeners with all Stages. Not all stages need listeners or issue callbacks, this registration is a bit redundant. However, as we build-out the API, this redundancy can disappear. The main purpose here is to move callback code from the Pipeline and into the stages that actually issue those callbacks. This removes the back-pointer to the Pipeline that was put into a few Stage subclasses. Reviewers: andreadb, courbet, RKSimon Reviewed By: andreadb, courbet Subscribers: tschuett, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D48576 llvm-svn: 335748
* [llvm-mca] Rename Backend to Pipeline. NFC.Matt Davis2018-06-251-9/+10
| | | | | | | | | | | | | | | | | | Summary: This change renames the Backend and BackendPrinter to Pipeline and PipelinePrinter respectively. Variables and comments have also been updated to reflect this change. The reason for this rename, is to be slightly more correct about what MCA is modeling. MCA models a Pipeline, which implies some logical sequence of stages. Reviewers: andreadb, courbet, RKSimon Reviewed By: andreadb, courbet Subscribers: mgorny, javed.absar, tschuett, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D48496 llvm-svn: 335496
* [llvm-mca] Introduce a sequential container of StagesMatt Davis2018-06-221-16/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Remove explicit stages and introduce a list of stages. A pipeline should be composed of an arbitrary list of stages, and not any predefined list of stages in the Backend. The Backend should not know of any particular stage, rather it should only be concerned that it has a list of stages, and that those stages will fulfill the contract of what it means to be a Stage (namely pre/post/execute a given instruction). For now, we leave the original set of stages defined in the Backend ctor; however, I imagine these will be moved out at a later time. This patch makes an adjustment to the semantics of Stage::isReady. Specifically, what the Backend really needs to know is if a Stage has unfinished work. With that said, it is more appropriately renamed Stage::hasWorkToComplete(). This change will clean up the check in Backend::run(), allowing us to query each stage to see if there is unfinished work, regardless of what subclass a stage might be. I feel that this change simplifies the semantics too, but that's a subjective statement. Given how RetireStage and ExecuteStage handle data in their preExecute(), I've had to change the order of Retire and Execute in our stage list. Retire must complete any of its preExecute actions before ExecuteStage's preExecute can take control. This is mainly because both stages utilize the RCU. In the meantime, I want to see if I can adjust that or remove that coupling. Reviewers: andreadb, RKSimon, courbet Reviewed By: andreadb Subscribers: tschuett, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D46907 llvm-svn: 335361
* [llvm-mca][X86] Teach how to identify register writes that implicitly clear ↵Andrea Di Biagio2018-06-201-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | the upper portion of a super-register. This patch teaches llvm-mca how to identify register writes that implicitly zero the upper portion of a super-register. On X86-64, a general purpose register is implemented in hardware as a 64-bit register. Quoting the Intel 64 Software Developer's Manual: "an update to the lower 32 bits of a 64 bit integer register is architecturally defined to zero extend the upper 32 bits". Also, a write to an XMM register performed by an AVX instruction implicitly zeroes the upper 128 bits of the aliasing YMM register. This patch adds a new method named clearsSuperRegisters to the MCInstrAnalysis interface to help identify instructions that implicitly clear the upper portion of a super-register. The rest of the patch teaches llvm-mca how to use that new method to obtain the information, and update the register dependencies accordingly. I compared the kernels from tests clear-super-register-1.s and clear-super-register-2.s against the output from perf on btver2. Previously there was a large discrepancy between the estimated IPC and the measured IPC. Now the differences are mostly in the noise. Differential Revision: https://reviews.llvm.org/D48225 llvm-svn: 335113
* [MCA] Add -summary-view optionRoman Lebedev2018-06-151-1/+10
| | | | | | | | | | | | | | | | | | | Summary: While that is indeed a quite interesting summary stat, there are cases where it does not really add anything other than consuming extra lines. Declutters the output of D48190. Reviewers: RKSimon, andreadb, courbet, craig.topper Reviewed By: andreadb Subscribers: javed.absar, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D48209 llvm-svn: 334833
* Revert: [llvm-mca] Flush the output stream before we start the analysis of a ↵Andrea Di Biagio2018-06-131-1/+0
| | | | | | | | | new code region. NFC Not sure why, but it breaks buildbot clang-cmake-armv8-full. It causes a failure in TEST 'Xray-armhf-linux :: TestCases/Posix/profiling-single-threaded.cc'. llvm-svn: 334617
* [llvm-mca] Flush the output stream before we start the analysis of a new ↵Andrea Di Biagio2018-06-131-0/+1
| | | | | | code region. NFC llvm-svn: 334610
* [llvm-mca] Print the "Block RThroughput" in the SummaryView.Andrea Di Biagio2018-05-231-1/+1
| | | | | | | | | | | | | | | | | | | This patch implements the "block reciprocal throughput" computation in the SummaryView. The block reciprocal throughput is computed as the MAX of: - NumMicroOps / DispatchWidth - Resource Cycles / #Units (for every resource consumed). The block throughput is bounded from above by the hardware dispatch throughput. That is because the DispatchWidth is an upper bound on how many opcodes can be part of a single dispatch group. The block throughput is also limited by the amount of hardware parallelism. The number of available resource units affects how the resource pressure is distributed, and also how many blocks can be delivered every cycle. llvm-svn: 333095
* [llvm-mca] Hide unrelated flags from the -help output.Andrea Di Biagio2018-05-171-23/+35
| | | | llvm-svn: 332615
* [llvm-mca] add flag -all-views and flag -all-stats.Andrea Di Biagio2018-05-171-0/+38
| | | | | | | Flag -all-views enables all the views. Flag -all-stats enables all the views that print hardware statistics. llvm-svn: 332602
* [llvm-mca] Introduce a pipeline Stage class and FetchStage.Matt Davis2018-05-151-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is just an idea, really two ideas. I expect some push-back, but I realize that posting a diff is the most comprehensive way to express these concepts. This patch introduces a Stage class which represents the various stages of an instruction pipeline. As a start, I have created a simple FetchStage that is based on existing logic for how MCA produces instructions, but now encapsulated in a Stage. The idea should become more concrete once we introduce additional stages. The idea being, that when a stage completes, the next stage in the pipeline will be executed. Stages are chained together as a singly linked list to closely model a real pipeline. For now there is only one stage, so the stage-to-stage flow of instructions isn't immediately obvious. Eventually, Stage will also handle event notifications, but that functionality is not complete, and not destined for this patch. Ideally, an interested party can register for notifications from a particular stage. Callbacks will be issued to these listeners at various points in the execution of the stage. For now, eventing functionality remains similar to what it has been in mca::Backend. We will be building-up the Stage class as we move on, such as adding debug output. This patch also removes the unique_ptr<Instruction> return value from InstrBuilder::createInstruction. An Instruction pointer is still produced, but now it's up to the caller to decide how that item should be managed post-allocation (e.g., smart pointer). This allows the Fetch stage to create instructions and manage the lifetime of those instructions as it wishes, and not have to be bound to any specific managed pointer type. Other callers of createInstruction might have different requirements, and thus can manage the pointer to fit their needs. Another idea would be to push the ownership to the RCU. Currently, the FetchStage will wrap the Instruction pointer in a shared_ptr. This allows us to remove the Instruction container in Backend, which was probably going to disappear, or move, at some point anyways. Note that I did run these changes through valgrind, to make sure we are not leaking memory. While the shared_ptr comes with some additional overhead it relieves us from having to manage a list of generated instructions, and/or make lookup calls to remove the instructions. I realize that both the Stage class and the Instruction pointer management (mentioned directly above) are separate but related ideas, and probably should land as separate patches; I am happy to do that if either idea is decent. The main reason these two ideas are together is that Stage::execute() can mutate an InstRef. For the fetch stage, the InstRef is populated as the primary action of that stage (execute()). I didn't want to change the Stage interface to support the idea of generating an instruction. Ideally, instructions are to be pushed through the pipeline. I didn't want to draw too much of a specialization just for the fetch stage. Excuse the word-salad. Reviewers: andreadb, courbet, RKSimon Reviewed By: andreadb Subscribers: llvm-commits, mgorny, javed.absar, tschuett, gbedwell Differential Revision: https://reviews.llvm.org/D46741 llvm-svn: 332390
* [llvm-mca] removes flag -instruction-tables from the "View Options" category.Andrea Di Biagio2018-05-051-25/+15
| | | | | | | | This patch also improves the description of a couple of flags in the view options. With this change, the -help now specifies which views are enabled by default. llvm-svn: 331594
OpenPOWER on IntegriCloud