bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[MCA][Bottleneck Analysis] Teach how to compute a critical sequence of ↵	Andrea Di Biagio	2019-06-21	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	instructions based on the simulation. This patch teaches the bottleneck analysis how to identify and print the most expensive sequence of instructions according to the simulation. Fixes PR37494. The goal is to help users identify the sequence of instruction which is most critical for performance. A dependency graph is internally used by the bottleneck analysis to describe data dependencies and processor resource interferences between instructions. There is one node in the graph for every instruction in the input assembly sequence. The number of nodes in the graph is independent from the number of iterations simulated by the tool. It means that a single node of the graph represents all the possible instances of a same instruction contributed by the simulated iterations. Edges are dynamically "discovered" by the bottleneck analysis by observing instruction state transitions and "backend pressure increase" events generated by the Execute stage. Information from the events is used to identify critical dependencies, and materialize edges in the graph. A dependency edge is uniquely identified by a pair of node identifiers plus an instance of struct DependencyEdge::Dependency (which provides more details about the actual dependency kind). The bottleneck analysis internally ranks dependency edges based on their impact on the runtime (see field DependencyEdge::Dependency::Cost). To this end, each edge of the graph has an associated cost. By default, the cost of an edge is a function of its latency (in cycles). In practice, the cost of an edge is also a function of the number of cycles where the dependency has been seen as 'contributing to backend pressure increases'. The idea is that the higher the cost of an edge, the higher is the impact of the dependency on performance. To put it in another way, the cost of an edge is a measure of criticality for performance. Note how a same edge may be found in multiple iteration of the simulated loop. The logic that adds new edges to the graph checks if an equivalent dependency already exists (duplicate edges are not allowed). If an equivalent dependency edge is found, field DependencyEdge::Frequency of that edge is incremented by one, and the new cost is cumulatively added to the existing edge cost. At the end of simulation, costs are propagated to nodes through the edges of the graph. The goal is to identify a critical sequence from a node of the root-set (composed by node of the graph with no predecessors) to a 'sink node' with no successors. Note that the graph is intentionally kept acyclic to minimize the complexity of the critical sequence computation algorithm (complexity is currently linear in the number of nodes in the graph). The critical path is finally computed as a sequence of dependency edges. For edges describing processor resource interferences, the view also prints a so-called "interference probability" value (by dividing field DependencyEdge::Frequency by the total number of iterations). Examples of critical sequence computations can be found in tests added/modified by this patch. On output streams that support colored output, instructions from the critical sequence are rendered with a different color. Strictly speaking the analysis conducted by the bottleneck analysis view is not a critical path analysis. The cost of an edge doesn't only depend on the dependency latency. More importantly, the cost of a same edge may be computed differently by different iterations. The number of dependencies is discovered dynamically based on the events generated by the simulator. However, their number is not fixed. This is especially true for edges that model processor resource interferences; an interference may not occur in every iteration. For that reason, it makes sense to also print out a "probability of interference". By construction, the accuracy of this analysis (as always) is strongly dependent on the simulation (and therefore the quality of the information available in the scheduling model). That being said, the critical sequence effectively identifies a performance criticality. Instructions from that sequence are expected to have a very big impact on performance. So, users can take advantage of this information to focus their attention on specific interactions between instructions. In my experience, it works quite well in practice, and produces useful output (in a reasonable amount time). Differential Revision: https://reviews.llvm.org/D63543 llvm-svn: 364045
*	[llvm-mca] Enable bottleneck analysis when flag -all-views is specified.	Andrea Di Biagio	2019-06-10	1	-0/+7
\| \| \| \| \| \| \|	Bottleneck Analysis is one of the many views available in llvm-mca. Therefore, it should be enabled when flag -all-views is passed in input to the tool. llvm-svn: 362964
*	[llvm-mca][scheduler-stats] Print issued micro opcodes per cycle. NFCI	Andrea Di Biagio	2019-04-08	1	-1/+1
\| \| \| \| \| \| \| \| \|	It makes more sense to print out the number of micro opcodes that are issued every cycle rather than the number of instructions issued per cycle. This behavior is also consistent with the dispatch-stats: numbers from the two views can now be easily compared. llvm-svn: 357919
*	[llvm-mca][View] Improved Retire Control Unit Statistics.	Andrea Di Biagio	2018-11-23	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RetireControlUnitStatistics now reports extra information about the ROB and the avg/maximum number of entries consumed over the entire simulation. Example: Retire Control Unit - number of cycles where we saw N instructions retired: [# retired], [# cycles] 0, 109 (17.9%) 1, 102 (16.7%) 2, 399 (65.4%) Total ROB Entries: 64 Max Used ROB Entries: 35 ( 54.7% ) Average Used ROB Entries per cy: 32 ( 50.0% ) Documentation in llvm/docs/CommandGuide/llvmn-mca.rst has been updated to reflect this change. llvm-svn: 347493
*	[utils] Stricter checking from update_mca_test_checks.py	Greg Bedwell	2018-09-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If any prefixes have been specified on the RUN lines that do not end up ever actually getting printed, raise an Error. This is either an indication that the run lines just need cleaning up, or that something is more fundamentally wrong with the test. Also raise an Error if there are any blocks which cannot be checked because they are not uniquely covered by a prefix. Fixed up a couple of tests where the extra checking flagged up issues. Differential Revision: https://reviews.llvm.org/D48276 llvm-svn: 343332
*	[utils] Allow better identification of matching blocks in ↵	Greg Bedwell	2018-09-28	1	-47/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	update_mca_test_checks.py Insert empty blocks to cause the positions of matching blocks to match across lists where possible so that later stages of the algorithm can actually identify them as being identical. Regenerated all tests with this change. Differential Revision: https://reviews.llvm.org/D52560 llvm-svn: 343331
*	[llvm-mca] Report the number of dispatched micro opcodes in the ↵	Andrea Di Biagio	2018-08-30	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	DispatchStatistics view. This patch introduces the following changes to the DispatchStatistics view: * DispatchStatistics now reports the number of dispatched opcodes instead of the number of dispatched instructions. * The "Dynamic Dispatch Stall Cycles" table now also reports the percentage of stall cycles against the total simulated cycles. This change allows users to easily compare dispatch group sizes with the processor DispatchWidth. Before this change, it was difficult to correlate the two numbers, since DispatchStatistics view reported numbers of instructions (instead of opcodes). DispatchWidth defines the maximum size of a dispatch group in terms of number of micro opcodes. The other change introduced by this patch is related to how DispatchStage generates "instruction dispatch" events. In particular: * There can be multiple dispatch events associated with a same instruction * Each dispatch event now encapsulates the number of dispatched micro opcodes. The number of micro opcodes declared by an instruction may exceed the processor DispatchWidth. Therefore, we cannot assume that instructions are always fully dispatched in a single cycle. DispatchStage knows already how to handle instructions declaring a number of opcodes bigger that DispatchWidth. However, DispatchStage always emitted a single instruction dispatch event (during the first simulated dispatch cycle) for instructions dispatched. With this patch, DispatchStage now correctly notifies multiple dispatch events for instructions that cannot be dispatched in a single cycle. A few views had to be modified. Views can no longer assume that there can only be one dispatch event per instruction. Tests (and docs) have been updated. Differential Revision: https://reviews.llvm.org/D51430 llvm-svn: 341055
*	[llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report ↵	Andrea Di Biagio	2018-08-29	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	generated by the SummaryView. This patch adds two new fields to the perf report generated by the SummaryView. Fields are now logically organized into two small groups; only the second group contains throughput indicators. Example: ``` Iterations: 100 Instructions: 300 Total Cycles: 414 Total uOps: 700 Dispatch Width: 4 uOps Per Cycle: 1.69 IPC: 0.72 Block RThroughput: 4.0 ``` This patch also updates the docs for llvm-mca. Due to the nature of this change, several tests in the tools/llvm-mca directory were affected, and had to be updated using script `update_mca_test_checks.py`. llvm-svn: 340946
*	[llvm-mca] Improved report generated by the SchedulerStatistics view.	Andrea Di Biagio	2018-08-27	1	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \|	Before this patch, the SchedulerStatistics only printed the maximum number of buffer entries consumed in each scheduler's queue at a given point of the simulation. This patch restructures the reported table, and adds an extra field named "Average number of used buffer entries" to it. This patch also uses different colors to help identifying bottlenecks caused by high scheduler's buffer pressure. llvm-svn: 340746
*	[llvm-mca] Regenerate X86 specific tests. NFC	Andrea Di Biagio	2018-07-15	1	-1/+1
\| \| \| \| \| \|	Not all tests were correctly updated by the update script after r336797. llvm-svn: 337124
*	[llvm-mca] Use an ordered map to collect hardware statistics. NFC.	Andrea Di Biagio	2018-06-18	1	-1/+1
\| \| \| \| \| \| \|	Histogram entries are now ordered by key. This should improves their readability when statistics are printed. llvm-svn: 334961
*	[llvm-mca] Make sure not to end the test files with an empty line.	Roman Lebedev	2018-06-04	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: It's super irritating. [properly configured] git client then complains about that double-newline, and you have to use `--force` to ignore the warning, since even if you fix it manually, it will be reintroduced the very next runtime :/ Reviewers: RKSimon, andreadb, courbet, craig.topper, javed.absar, gbedwell Reviewed By: gbedwell Subscribers: javed.absar, tschuett, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D47697 llvm-svn: 333887
*	[UpdateTestChecks] Improved update_mca_test_checks block analysis	Greg Bedwell	2018-05-24	1	-80/+81
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously update_mca_test_checks worked entirely at "block" level where a block is some sequence of lines delimited by at least one empty line. This generally worked well, but could sometimes lead to excessive repetition of check lines for various prefixes if some block was almost identical between prefixes, but not quite (for example, due to a different dispatch width in the otherwise identical summary views). This new analyis attempts to split blocks further in the case where the following conditions are met: a) There is some prefix common to every RUN line (typically 'ALL'). b) The first line of the block is common to the output with every prefix. c) The block has the same number of lines for the output with every prefix. Also, regenerated all llvm-mca test files with the following command: update_mca_test_checks.py "../test/tools/llvm-mca//.s" "../test/tools/llvm-mca///*.s" The new analysis showed a "multiple lines not disambiguated by prefixes" warning for test "AArch64/Exynos/scheduler-queue-usage.s" so I've also added some explicit prefixes to each of the RUN lines in that test. Differential Revision: https://reviews.llvm.org/D47321 llvm-svn: 333204
*	[X86][BtVer2] Add a 'J' prefix to the PRF/RCU defs. NFC	Andrea Di Biagio	2018-05-21	1	-2/+2
\| \| \| \| \| \| \|	This is to keep the Jaguar model's naming convention. Processor resources all have a 'J' prefix in the BtVer2 scheduling model. llvm-svn: 332851
*	[llvm-mca] add flag -all-views and flag -all-stats.	Andrea Di Biagio	2018-05-17	1	-0/+141
	Flag -all-views enables all the views. Flag -all-stats enables all the views that print hardware statistics. llvm-svn: 332602