bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[MCA] Add an experimental MicroOpQueue stage.	Andrea Di Biagio	2019-03-29	1	-3/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds an experimental stage named MicroOpQueueStage. MicroOpQueueStage can be used to simulate a hardware micro-op queue (basically, a decoupling queue between 'decode' and 'dispatch'). Users can specify a queue size, as well as a optional MaxIPC (which - in the absence of a "Decoders" stage - can be used to simulate a different throughput from the decoders). This stage is added to the default pipeline between the EntryStage and the DispatchStage only if PipelineOption::MicroOpQueue is different than zero. By default, llvm-mca sets PipelineOption::MicroOpQueue to the value of hidden flag -micro-op-queue-size. Throughput from the decoder can be simulated via another hidden flag named -decoder-throughput. That flag allows us to quickly experiment with different frontend throughputs. For targets that declare a loop buffer, flag -decoder-throughput allows users to do multiple runs, each time simulating a different throughput from the decoders. This stage can/will be extended in future. For example, we could add a "buffer full" event to notify bottlenecks caused by backpressure. flag -decoder-throughput would probably go away if in future we delegate to another stage (DecoderStage?) the simulation of a (potentially variable) throughput from the decoders. For now, flag -decoder-throughput is "good enough" to run some simple experiments. Differential Revision: https://reviews.llvm.org/D59928 llvm-svn: 357248
*	[MCA] Correctly update the UsedResourceGroups mask in the InstrBuilder.	Andrea Di Biagio	2019-03-26	2	-7/+4
\| \| \| \| \| \| \| \|	Found by inspection when looking at the debug output of MCA. This problem was latent, and none of the upstream models were affected by it. No functional change intended. llvm-svn: 357000
*	[llvm-mca] Emit a message when no bottlenecks are identified.	Matt Davis	2019-03-07	3	-7/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Since bottleneck hints are enabled via user request, it can be confusing if no bottleneck information is presented. Such is the case when no bottlenecks are identified. This patch emits a message in that case. Reviewers: andreadb Reviewed By: andreadb Subscribers: tschuett, gbedwell, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59098 llvm-svn: 355628
*	[MCA] Correctly initialize struct SummaryView::BackPressureInfo.	Andrea Di Biagio	2019-03-04	1	-1/+1
\| \| \| \| \| \|	This should appease the buildbots. llvm-svn: 355309
*	[MCA] Highlight kernel bottlenecks in the summary view.	Andrea Di Biagio	2019-03-04	3	-4/+152
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a new flag named -bottleneck-analysis to print out information about throughput bottlenecks. MCA knows how to identify and classify dynamic dispatch stalls. However, it doesn't know how to analyze and highlight kernel bottlenecks. The goal of this patch is to teach MCA how to correlate increases in backend pressure to backend stalls (and therefore, the loss of throughput). From a Scheduler point of view, backend pressure is a function of the scheduler buffer usage (i.e. how the number of uOps in the scheduler buffers changes over time). Backend pressure increases (or decreases) when there is a mismatch between the number of opcodes dispatched, and the number of opcodes issued in the same cycle. Since buffer resources are limited, continuous increases in backend pressure would eventually leads to dispatch stalls. So, there is a strong correlation between dispatch stalls, and how backpressure changed over time. This patch teaches how to identify situations where backend pressure increases due to: - unavailable pipeline resources. - data dependencies. Data dependencies may delay execution of instructions and therefore increase the time that uOps have to spend in the scheduler buffers. That often translates to an increase in backend pressure which may eventually lead to a bottleneck. Contention on pipeline resources may also delay execution of instructions, and lead to a temporary increase in backend pressure. Internally, the Scheduler classifies instructions based on whether register / memory operands are available or not. An instruction is marked as "ready to execute" only if data dependencies are fully resolved. Every cycle, the Scheduler attempts to execute all instructions that are ready to execute. If an instruction cannot execute because of unavailable pipeline resources, then the Scheduler internally updates a BusyResourceUnits mask with the ID of each unavailable resource. ExecuteStage is responsible for tracking changes in backend pressure. If backend pressure increases during a cycle because of contention on pipeline resources, then ExecuteStage sends a "backend pressure" event to the listeners. That event would contain information about instructions delayed by resource pressure, as well as the BusyResourceUnits mask. Note that ExecuteStage also knows how to identify situations where backpressure increased because of delays introduced by data dependencies. The SummaryView observes "backend pressure" events and prints out a "bottleneck report". Example of bottleneck report: ``` Cycles with backend pressure increase [ 99.89% ] Throughput Bottlenecks: Resource Pressure [ 0.00% ] Data Dependencies: [ 99.89% ] - Register Dependencies [ 0.00% ] - Memory Dependencies [ 99.89% ] ``` A bottleneck report is printed out only if increases in backend pressure eventually caused backend stalls. About the time complexity: Time complexity is linear in the number of instructions in the Scheduler::PendingSet. The average slowdown tends to be in the range of ~5-6%. For memory intensive kernels, the slowdown can be significant if flag -noalias=false is specified. In the worst case scenario I have observed a slowdown of ~30% when flag -noalias=false was specified. We can definitely recover part of that slowdown if we optimize class LSUnit (by doing extra bookkeeping to speedup queries). For now, this new analysis is disabled by default, and it can be enabled via flag -bottleneck-analysis. Users of MCA as a library can enable the generation of pressure events through the constructor of ExecuteStage. This patch partially addresses https://bugs.llvm.org/show_bug.cgi?id=37494 Differential Revision: https://reviews.llvm.org/D58728 llvm-svn: 355308
*	[MCA][ResourceManager] Add a table that maps processor resource indices to ↵	Andrea Di Biagio	2019-02-20	2	-6/+11
\| \| \| \| \| \| \| \| \| \| \| \|	processor resource identifiers. This patch adds a lookup table to speed up resource queries in the ResourceManager. This patch also moves helper function 'getResourceStateIndex()' from ResourceManager.cpp to Support.h, so that we can reuse that logic in the SummaryView (and potentially other views in llvm-mca). No functional change intended. llvm-svn: 354470
*	[AsmPrinter] Remove hidden flag -print-schedule.	Andrea Di Biagio	2019-02-04	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch removes hidden codegen flag -print-schedule effectively reverting the logic originally committed as r300311 (https://llvm.org/viewvc/llvm-project?view=revision&revision=300311). Flag -print-schedule was originally introduced by r300311 to address PR32216 (https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better testing of schedule model instruction latencies/throughputs". These days, we can use llvm-mca to test scheduling models. So there is no longer a need for flag -print-schedule in LLVM. The main use case for PR32216 is now addressed by llvm-mca. Flag -print-schedule is mainly used for debugging purposes, and it is only actually used by x86 specific tests. We already have extensive (latency and throughput) tests under "test/tools/llvm-mca" for X86 processor models. That means, most (if not all) existing -print-schedule tests for X86 are redundant. When flag -print-schedule was first added to LLVM, several files had to be modified; a few APIs gained new arguments (see for example method MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained a couple of getSchedInfoStr() methods. Method getSchedInfoStr() had to originally work for both MCInst and MachineInstr. The original implmentation of getSchedInfoStr() introduced a subtle layering violation (reported as PR37160 and then fixed/worked-around by r330615). In retrospect, that new API could have been designed more optimally. We can always query MCSchedModel to get the latency and throughput. More importantly, the "sched-info" string should not have been generated by the subtarget. Note, r317782 fixed an issue where "print-schedule" didn't work very well in the presence of inline assembly. That commit is also reverted by this change. Differential Revision: https://reviews.llvm.org/D57244 llvm-svn: 353043
*	[MC][X86] Correctly model additional operand latency caused by transfer ↵	Andrea Di Biagio	2019-01-23	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	delays from the integer to the floating point unit. This patch adds a new ReadAdvance definition named ReadInt2Fpu. ReadInt2Fpu allows x86 scheduling models to accurately describe delays caused by data transfers from the integer unit to the floating point unit. ReadInt2Fpu currently defaults to a delay of zero cycles (i.e. no delay) for all x86 models excluding BtVer2. That means, this patch is only a functional change for the Jaguar cpu model only. Tablegen definitions for instructions (V)PINSR* have been updated to account for the new ReadInt2Fpu. That read is mapped to the the GPR input operand. On Jaguar, int-to-fpu transfers are modeled as a +6cy delay. Before this patch, that extra delay was added to the opcode latency. In practice, the insert opcode only executes for 1cy. Most of the actual latency is actually contributed by the so-called operand-latency. According to the AMD SOG for family 16h, (V)PINSR* latency is defined by expression f+1, where f is defined as a forwarding delay from the integer unit to the fpu. When printing instruction latency from MCA (see InstructionInfoView.cpp) and LLC (only when flag -print-schedule is speified), we now need to account for any extra forwarding delays. We do this by checking if scheduling classes declare any negative ReadAdvance entries. Quoting a code comment in TargetSchedule.td: "A negative advance effectively increases latency, which may be used for cross-domain stalls". When computing the instruction latency for the purpose of our scheduling tests, we now add any extra delay to the formula. This avoids regressing existing codegen and mca schedule tests. It comes with the cost of an extra (but very simple) hook in MCSchedModel. Differential Revision: https://reviews.llvm.org/D57056 llvm-svn: 351965
*	Update the file headers across all of the LLVM projects in the monorepo	Chandler Carruth	2019-01-19	26	-104/+78
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
*	[MCA] Fix wrong definition of ResourceUnitMask in DefaultResourceStrategy.	Andrea Di Biagio	2019-01-10	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Field ResourceUnitMask was incorrectly defined as a 'const unsigned' mask. It should have been a 64 bit quantity instead. That means, ResourceUnitMask was always implicitly truncated to a 32 bit quantity. This issue has been found by inspection. Surprisingly, that bug was latent, and it never negatively affected any existing upstream targets. This patch fixes the wrong definition of ResourceUnitMask, and adds a bunch of extra debug prints to help debugging potential issues related to invalid processor resource masks. llvm-svn: 350820
*	Revert "Work around a linker error caused by https://reviews.llvm.org/D56084."	Adrian Prantl	2019-01-08	1	-2/+0
\| \| \| \| \| \|	This reverts commit r350650 llvm-svn: 350653
*	Work around a linker error caused by https://reviews.llvm.org/D56084.	Adrian Prantl	2019-01-08	1	-0/+2
\| \| \| \| \| \|	This unbreaks all bots that build with -DLLVM_ENABLE_MODULES=1 llvm-svn: 350650
*	[llvm-mca] Rename an error variable.	Matt Davis	2018-12-19	1	-2/+2
\| \| \| \|	llvm-svn: 349662
*	[llvm-mca] Add an error handler for error from parseCodeRegions	Matt Davis	2018-12-19	1	-10/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: It's a bit tricky to add a test for the failing path right now, binary support will have an easier path to exercise the path here. * Ran clang-format. Reviewers: andreadb Reviewed By: andreadb Subscribers: tschuett, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D55803 llvm-svn: 349659
*	[MCA] Don't assume that createMCInstrAnalysis() always returns a valid pointer.	Andrea Di Biagio	2018-12-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Class InstrBuilder wrongly assumed that llvm targets were always able to return a non-null pointer when createMCInstrAnalysis() was called on them. This was causing crashes when simulating executions for targets that don't provide an MCInstrAnalysis object. This patch fixes the issue by making MCInstrAnalysis optional. llvm-svn: 349352
*	[llvm-mca] Move llvm-mca library to llvm/lib/MCA.	Clement Courbet	2018-12-17	45	-5875/+10
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: See PR38731. Reviewers: andreadb Subscribers: mgorny, javed.absar, tschuett, gbedwell, andreadb, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D55557 llvm-svn: 349332
*	[llvm-mca] Speedup the default resource selection strategy.	Andrea Di Biagio	2018-11-30	2	-18/+31
\| \| \| \| \| \| \| \| \|	This patch removes a (potentially) slow while loop in DefaultResourceStrategy::select(). A better (and faster) approach is to do some bit manipulation in order to shrink the range of candidate resources. On a release build, this change gives an average speedup of ~10%. llvm-svn: 348007
*	[llvm-mca] Simplify code in class Scheduler. NFCI	Andrea Di Biagio	2018-11-30	4	-18/+18
\| \| \| \|	llvm-svn: 347985
*	[llvm-mca][MC] Add the ability to declare which processor resources model ↵	Andrea Di Biagio	2018-11-29	6	-16/+137
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	load/store queues (PR36666). This patch adds the ability to specify via tablegen which processor resources are load/store queue resources. A new tablegen class named MemoryQueue can be optionally used to mark resources that model load/store queues. Information about the load/store queue is collected at 'CodeGenSchedule' stage, and analyzed by the 'SubtargetEmitter' to initialize two new fields in struct MCExtraProcessorInfo named `LoadQueueID` and `StoreQueueID`. Those two fields are identifiers for buffered resources used to describe the load queue and the store queue. Field `BufferSize` is interpreted as the number of entries in the queue, while the number of units is a throughput indicator (i.e. number of available pickers for loads/stores). At construction time, LSUnit in llvm-mca checks for the presence of extra processor information (i.e. MCExtraProcessorInfo) in the scheduling model. If that information is available, and fields LoadQueueID and StoreQueueID are set to a value different than zero (i.e. the invalid processor resource index), then LSUnit initializes its LoadQueue/StoreQueue based on the BufferSize value declared by the two processor resources. With this patch, we more accurately track dynamic dispatch stalls caused by the lack of LS tokens (i.e. load/store queue full). This is also shown by the differences in two BdVer2 tests. Stalls that were previously classified as generic SCHEDULER FULL stalls, are not correctly classified either as "load queue full" or "store queue full". About the differences in the -scheduler-stats view: those differences are expected, because entries in the load/store queue are not released at instruction issue stage. Instead, those are released at instruction executed stage. This is the main reason why for the modified tests, the load/store queues gets full before PdEx is full. Differential Revision: https://reviews.llvm.org/D54957 llvm-svn: 347857
*	Reapply "[llvm-mca] Return the total number of cycles from method ↵	Andrea Di Biagio	2018-11-28	3	-6/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pipeline::run()." This reapplies r347767 (originally reviewed at: https://reviews.llvm.org/D55000) with a fix for the missing std::move of the Error returned by the call to Pipeline::runCycle(). Below is the original commit message from r347767. If a user only cares about the overall latency, then the best/quickest way is to change method Pipeline::run() so that it returns the total number of cycles to the caller. When the simulation pipeline is run, the number of cycles (or an error) is returned from method Pipeline::run(). The advantage is that no hardware event listener is needed for computing that latency. So, the whole process should be faster (and simpler - at least for that particular use case). llvm-svn: 347795
*	Revert [llvm-mca] Return the total number of cycles from method Pipeline::run().	Andrea Di Biagio	2018-11-28	3	-9/+5
\| \| \| \| \| \|	This reverts commits 347767. llvm-svn: 347775
*	[llvm-mca] Return the total number of cycles from method Pipeline::run().	Andrea Di Biagio	2018-11-28	3	-5/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a user only cares about the overall latency, then the best/quickest way is to change method Pipeline::run() so that it returns the total number of cycles to the caller. When the simulation pipeline is run, the number of cycles (or an error) is returned from method Pipeline::run(). The advantage is that no hardware event listener is needed for computing that latency. So, the whole process should be faster (and simpler - at least for that particular use case). llvm-svn: 347767
*	[llvm-mca] Add support for instructions with a variadic number of operands.	Andrea Di Biagio	2018-11-25	1	-18/+81
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	By default, llvm-mca conservatively assumes that a register operand from the variadic sequence is both a register read and a register write. That is because MCInstrDesc doesn't describe extra variadic operands; we don't have enough dataflow information to tell which register operands from the variadic sequence is a definition, and which is a use instead. However, if a variadic instruction is flagged 'mayStore' (but not 'mayLoad'), and it has no 'unmodeledSideEffects', then llvm-mca (very) optimistically assumes that any register operand in the variadic sequence is a register read only. Conversely, if a variadic instruction is marked as 'mayLoad' (but not 'mayStore'), and it has no 'unmodeledSideEffects', then llvm-mca optimistically assumes that any extra register operand is a register definition only. These assumptions work quite well for variadic load/store multiple instructions defined by the ARM backend. llvm-svn: 347522
*	[llvm-mca] InstrBuilder: warnings for call/ret instructions are only ↵	Andrea Di Biagio	2018-11-24	2	-4/+14
\| \| \| \| \| \|	reported once. llvm-svn: 347514
*	[llvm-mca] Refactor some of the logic in InstrBuilder, and add a ↵	Andrea Di Biagio	2018-11-23	2	-77/+118
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	verifyOperands method. With this change, InstrBuilder emits an error if the MCInst sequence contains an instruction with a variadic opcode, and a non-zero number of variadic operands. Currently we don't know how to correctly analyze variadic opcodes. The problem with variadic operands is that there is no information for them in the opcode descriptor (i.e. MCInstrDesc). That means, we don't know which variadic operands are defs, and which are uses. In future, we could try to conservatively assume that any extra register operands is both a register use and a register definition. This patch fixes a subtle bug in the evaluation of read/write operands for ARM VLD1 with implicit index update. Added test vld1-index-update.s llvm-svn: 347503
*	[llvm-mca][View] Improved Retire Control Unit Statistics.	Andrea Di Biagio	2018-11-23	3	-16/+59
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RetireControlUnitStatistics now reports extra information about the ROB and the avg/maximum number of entries consumed over the entire simulation. Example: Retire Control Unit - number of cycles where we saw N instructions retired: [# retired], [# cycles] 0, 109 (17.9%) 1, 102 (16.7%) 2, 399 (65.4%) Total ROB Entries: 64 Max Used ROB Entries: 35 ( 54.7% ) Average Used ROB Entries per cy: 32 ( 50.0% ) Documentation in llvm/docs/CommandGuide/llvmn-mca.rst has been updated to reflect this change. llvm-svn: 347493
*	[llvm-mca] LSUnit: use a SmallSet to model load/store queues. NFCI	Andrea Di Biagio	2018-11-22	2	-25/+32
\| \| \| \| \| \| \| \| \| \|	Also, try to minimize the number of queries to the memory queues to speedup the analysis. On average, this change gives a small 2% speedup. For memcpy-like kernels, the speedup is up to 5.5%. llvm-svn: 347469
*	[llvm-mca] Use a SmallVector instead of std::vector to track register ↵	Andrea Di Biagio	2018-11-22	2	-9/+11
\| \| \| \| \| \| \| \| \| \|	reads/writes. NFCI This avoids a heap allocation most of the times. This patch gives a small but consistent 3% speedup on a release build (up to ~5% on a debug build). llvm-svn: 347464
*	[llvm-mca] Fix an invalid memory read introduced by r346487.	Andrea Di Biagio	2018-11-22	3	-16/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes an invalid memory read introduced by r346487. Before this patch, partial register write had to query the latency of the dependent full register write by calling a method on the full write descriptor. However, if the full write is from an already retired instruction, chances are that the EntryStage already reclaimed its memory. In some parial register write tests, valgrind was reporting an invalid memory read. This change fixes the invalid memory access problem. Writes are now responsible for tracking dependent partial register writes, and notify them in the event of instruction issued. That means, partial register writes no longer need to query their associated full write to check when they are ready to execute. Added test X86/BtVer2/partial-reg-update-7.s llvm-svn: 347459
*	[llvm-mca] Correctly update the resource strategy for processor resources ↵	Andrea Di Biagio	2018-11-12	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	with multiple units. When looking at the tests committed by Roman at r346587, I noticed that numbers reported by the resource pressure for PdAGU01 were wrong. In particular, according to the aut-generated CHECK lines in tests memcpy-like-test.s and store-throughput.s, resource pressure for PdAGU01 was not uniformly distributed among the two AGEN pipes. It turns out that the reason why pressure was not correctly distributed, was because the "resource selection strategy" object associated with PdAGU01 was not correctly updated on the event of AGEN pipe used. As a result, llvm-mca was not simulating a round-robin pipeline allocation for PdAGU01. Instead, PdAGU1 was always prioritized over PdAGU0. This patch fixes the issue; now processor resource strategy objects for resources declaring multiple units, are correctly notified in the event of "resource used". llvm-svn: 346650
*	[llvm-mca] Account for buffered resources when analyzing "Super" resources.	Andrea Di Biagio	2018-11-09	1	-1/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This was noticed when working on PR3946. By construction, a group cannot be used as a "Super" resource. That constraint is enforced by method `SubtargetEmitter::ExpandProcResource()`. A Super resource S can be part of a group G. However, method `SubtargetEmitter::ExpandProcResource()` would not update the number of consumed resource cycles in G based on S. In practice, this is perfectly fine because the resource usage is correctly computed for processor resource units. However, llvm-mca should still check if G is a buffered resource. Before this patch, llvm-mca didn't correctly check if S was part of a group that defines a buffer. So, the instruction descriptor was not correctly set. For now, the semantic change introduced by this patch doesn't affect any of the upstream scheduling models. However, it will allow to make some progress on PR3946. llvm-svn: 346545
*	[llvm-mca] Use a small vector for instructions in the EntryStage.	Andrea Di Biagio	2018-11-09	3	-11/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use a simple SmallVector to track the lifetime of simulated instructions. An ordered map was not needed because instructions are already picked in program order. It is also much faster if we avoid searching for already retired instructions at the end of every cycle. The new policy only triggers a "garbage collection" when the number of retired instructions becomes significantly big when compared with the total size of the vector. While working on this, I noticed that instructions were correctly retired, but their internal state was not updated (i.e. there was no transition from the EXECUTED state, to the RETIRED state). While this was not a problem for the views, it prevented the EntryStage from correctly garbage collecting already retired instructions. That was a bad oversight, and this patch fixes it. The observed speedup on a debug build of llvm-mca after this patch is ~6%. On a release build of llvm-mca, the observed speedup is ~%15%. llvm-svn: 346487
*	[llvm-mca] Partially revert r346417.	Matt Davis	2018-11-08	1	-16/+19
\| \| \| \| \| \| \|	Restored the llvm:: namespace qualifier on make_unique. This removes the ambiguity with make_unique. llvm-svn: 346424
*	[llvm-mca] PR39261: Rename FetchStage to EntryStage.	Andrea Di Biagio	2018-11-08	5	-22/+23
\| \| \| \| \| \| \| \| \| \| \| \|	This fixes PR39261. FetchStage is a misnomer. It causes confusion with the frontend fetch stage, which we don't currently simulate. I decided to rename it into EntryStage mainly because this is meant to be a "source" stage for all pipelines. Differential Revision: https://reviews.llvm.org/D54268 llvm-svn: 346419
*	[llvm-mca] Remove unneeded namespace qualifier. NFC.	Matt Davis	2018-11-08	1	-24/+21
\| \| \| \|	llvm-svn: 346417
*	[llvm-mca] Move the AssembleInput logic into its own class.	Matt Davis	2018-11-07	5	-103/+218
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch introduces a CodeRegionGenerator class which is responsible for parsing some type of input and creating a 'CodeRegions' instance for use by llvm-mca. In the future, we will also have a CodeRegionGenerator subclass for converting an input object file into CodeRegions. For now, we only have the subclass for converting input assembly into CodeRegions. This is mostly a NFC patch, as the logic remains close to the original, but now encapsulated in its own class and moved outside of llvm-mca.cpp. Reviewers: andreadb, courbet, RKSimon Reviewed By: andreadb Subscribers: mgorny, tschuett, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D54179 llvm-svn: 346344
*	[llvm-mca] Add extra counters for move elimination in view ↵	Andrea Di Biagio	2018-11-01	7	-47/+163
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RegisterFileStatistics. This patch teaches view RegisterFileStatistics how to report events for optimizable register moves. For each processor register file, view RegisterFileStatistics reports the following extra information: - Number of optimizable register moves - Number of register moves eliminated - Number of zero moves (i.e. register moves that propagate a zero) - Max Number of moves eliminated per cycle. Differential Revision: https://reviews.llvm.org/D53976 llvm-svn: 345865
*	[llvm-mca] Remove the verb 'assemble' from a few options in help. NFC.	Matt Davis	2018-10-31	1	-17/+17
\| \| \| \| \| \| \|	* MCA does not assemble anything. * Ran clang-format. llvm-svn: 345750
*	[llvm-mca] Remove namespace prefixes made redundant by r345612. NFC	Andrea Di Biagio	2018-10-31	18	-149/+129
\| \| \| \|	llvm-svn: 345730
*	[llvm-mca] Move namespace mca inside llvm::	Fangrui Song	2018-10-30	59	-42/+118
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This allows to remove `using namespace llvm;` in those .cpp files When we want to revisit the decision (everything resides in llvm::mca::) in the future, we can move things to a nested namespace of llvm::mca::, to conceptually make them separate from the rest of llvm::mca::* Reviewers: andreadb, mattd Reviewed By: andreadb Subscribers: javed.absar, tschuett, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D53407 llvm-svn: 345612
*	[llvm-mca] Lower to mca::Instructon before the pipeline is run.	Andrea Di Biagio	2018-10-29	7	-59/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before this change, the lowering of instructions from llvm::MCInst to mca::Instruction was done as part of the first stage of the pipeline (i.e. the FetchStage). In particular, FetchStage was responsible for picking the next instruction from the source sequence, and lower it to an mca::Instruction with the help of an object of class InstrBuilder. The dependency on InstrBuilder was problematic for a number of reasons. Class InstrBuilder only knows how to lower from llvm::MCInst to mca::Instruction. That means, it is hard to support a different scenario where instructions in input are not instances of class llvm::MCInst. Even if we managed to specialize InstrBuilder, and generalize most of its internal logic, the dependency on InstrBuilder in FetchStage would have caused more troubles (other than complicating the pipeline logic). With this patch, the lowering step is done before the pipeline is run. The pipeline is no longer responsible for lowering from MCInst to mca::Instruction. As a consequence of this, the FetchStage no longer needs to interact with an InstrBuilder. The mca::SourceMgr class now simply wraps a reference to a sequence of mca::Instruction objects. This simplifies the logic of FetchStage, and increases the usability of it. As a result, on a debug build, we see a 7-9% speedup; on a release build, the speedup is around 3-4%. llvm-svn: 345500
*	[llvm-mca] Fix -wreorder and -Wunused-private-field after r345376. NFC	Sam McCall	2018-10-26	2	-4/+2
\| \| \| \|	llvm-svn: 345378
*	[llvm-mca] Removed dependency on mca::SourcMgr in some Views. NFC	Andrea Di Biagio	2018-10-26	9	-44/+61
\| \| \| \|	llvm-svn: 345376
*	[llvm-mca] Introduce a new base class for mca::Instruction, and change how ↵	Andrea Di Biagio	2018-10-25	5	-79/+84
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	read/write information is stored. This patch introduces a new base class for Instruction named InstructionBase. Class InstructionBase is responsible for tracking data dependencies with the help of ReadState and WriteState objects. Class Instruction now derives from InstructionBase, and adds extra information related to the `InstrStage` as well as the `RCUTokenID`. ReadState and WriteState objects are no longer unique pointers. This avoids extra heap allocation and pointer checks that weren't really needed. Now, those objects are simply stored into SmallVectors. We use a SmallVector instead of a std::vector because we expect most instructions to only have a very small number of reads and writes. By using a simple SmallVector we also avoid extra heap allocations most of the time. In a debug build, this improves the performance of llvm-mca by roughly 10% (I still have to verify the impact in performance on a release build). llvm-svn: 345280
*	[llvm-mca] Removed a couple of redundant method declarations, and simplified ↵	Andrea Di Biagio	2018-10-25	7	-38/+23
\| \| \| \| \| \|	code in ResourcePressureView. NFC llvm-svn: 345259
*	[llvm-mca] Replace InstRef::isValid with operator bool. NFC.	Matt Davis	2018-10-24	6	-17/+12
\| \| \| \|	llvm-svn: 345190
*	[llvm-mca] Simplify the logic in FetchStage. NFCI	Andrea Di Biagio	2018-10-24	3	-22/+18
\| \| \| \| \| \|	Only method 'getNextInstruction()' needs to interact with the SourceMgr. llvm-svn: 345185
*	[llvm-mca] Remove dependency from InstrBuilder in class InstructionTables.	Andrea Di Biagio	2018-10-24	6	-12/+9
\| \| \| \| \| \| \| \| \|	Also, removed the initialization of vectors used for processor resource masks. Support function 'computeProcResourceMasks()' already calls method resize on those vectors. No functional change intended. llvm-svn: 345161
*	[llvm-mca] Refactor class SourceMgr. NFCI	Andrea Di Biagio	2018-10-24	4	-44/+43
\| \| \| \| \| \| \| \|	Added begin()/end() methods to allow the usage of SourceMgr in foreach loops. With this change, method getMCInstFromIndex() (as well as a couple of other methods) are now redundant, and can be removed from the public interface. llvm-svn: 345147
*	[llvm-mca] [llvm-mca] Improved error handling and error reporting from class ↵	Andrea Di Biagio	2018-10-24	4	-47/+78
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	InstrBuilder. A new class named InstructionError has been added to Support.h in order to improve the error reporting from class InstrBuilder. The llvm-mca driver is responsible for handling InstructionError objects, and printing them out to stderr. The goal of this patch is to remove all the remaining error handling logic from the library code. In particular, this allows us to: - Simplify the logic in InstrBuilder by removing a needless dependency from MCInstrPrinter. - Centralize all the error halding logic in a new function named 'runPipeline' (see llvm-mca.cpp). This is also a first step towards generalizing class InstrBuilder, so that in future, we will be able to reuse its logic to also "lower" MachineInstr to mca::Instruction objects. Differential Revision: https://reviews.llvm.org/D53585 llvm-svn: 345129