summaryrefslogtreecommitdiffstats
path: root/llvm/tools/llvm-exegesis/lib/BenchmarkResult.h
Commit message (Collapse)AuthorAgeFilesLines
* [llvm-exegesis][NFC] Remove extra `llvm::` qualifications.Clement Courbet2019-10-091-10/+9
| | | | | | | | | | | | | | Summary: Second patch: in the lib. Reviewers: gchatelet Subscribers: nemanjai, tschuett, MaskRay, mgrang, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68692 llvm-svn: 374158
* [llvm-exegesis] Finish plumbing the `Config` field.Clement Courbet2019-10-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | Summary: Right now there are no snippet generators that emit the `Config` Field, but I plan to add it to investigate LEA operands for PR32326. What was broken was: - `Config` Was not propagated up until the BenchmarkResult::Key. - Clustering should really consider different configs as measuring different things, so we should stabilize on (Opcode, Config) instead of just Opcode. Reviewers: gchatelet Subscribers: tschuett, llvm-commits, lebedev.ri Tags: #llvm Differential Revision: https://reviews.llvm.org/D68629 llvm-svn: 374031
* [llvm-exegesis] Add loop mode for repeating the snippet.Clement Courbet2019-09-271-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Before this change the Executable function was made by duplicating the snippet. This change adds a --repetion-mode={loop|duplicate} flag that allows choosing between this behaviour and wrapping the snippet instructions in a loop. The new mode can help measurements when the snippet fits in the DSB by short-cirtcuiting decoding. The loop adds a dec + jmp to the measurements, but since these are not part of the critical path, they execute in parallel with the measured code and do not impact measurements in practice. Overview of the change: - New SnippetRepetitor abstraction that handles repeating the snippet. The assembler delegates repeating the instructions to this class. - ExegesisTarget learns how to decrement loop counter and jump. - Some refactoring of the assembler into FunctionFiller/BasicBlockFiller. Reviewers: gchatelet Subscribers: mgorny, tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68125 llvm-svn: 373083
* [llvm-exegesis] Fix error propagation from yaml writing (from serialization)Roman Lebedev2019-04-101-2/+3
| | | | | | Investigating https://bugs.llvm.org/show_bug.cgi?id=41448 llvm-svn: 358076
* [llvm-exegesis] Opcode stabilization / reclusterization (PR40715)Roman Lebedev2019-02-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Given an instruction `Opcode`, we can make benchmarks (measurements) of the instruction characteristics/performance. Then, to facilitate further analysis we group the benchmarks with *similar* characteristics into clusters. Now, this is all not entirely deterministic. Some instructions have variable characteristics, depending on their arguments. And thus, if we do several benchmarks of the same instruction `Opcode`, we may end up with *different* performance characteristics measurements. And when we then do clustering, these several benchmarks of the same instruction `Opcode` may end up being clustered into *different* clusters. This is not great for further analysis. We shall find every `Opcode` with benchmarks not in just one cluster, and move *all* the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`. I have solved this by making `ClusterId` a bit field, adding a `IsUnstable` bit, and introducing `-analysis-display-unstable-clusters` switch to toggle between displaying stable-only clusters and unstable-only clusters. The reclusterization is deterministically stable, produces identical reports between runs. (Or at least that is what i'm seeing, maybe it isn't) Timings/comparisons: old (current trunk/head) {F8303582} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs): 6624.73 msec task-clock # 0.999 CPUs utilized ( +- 0.53% ) 172 context-switches # 25.965 M/sec ( +- 29.89% ) 0 cpu-migrations # 0.042 M/sec ( +- 56.54% ) 31073 page-faults # 4690.754 M/sec ( +- 0.08% ) 26538711696 cycles # 4006230.292 GHz ( +- 0.53% ) (83.31%) 2017496807 stalled-cycles-frontend # 7.60% frontend cycles idle ( +- 0.93% ) (83.32%) 13403650062 stalled-cycles-backend # 50.51% backend cycles idle ( +- 0.33% ) (33.37%) 19770706799 instructions # 0.74 insn per cycle # 0.68 stalled cycles per insn ( +- 0.04% ) (50.04%) 4419821812 branches # 667207369.714 M/sec ( +- 0.03% ) (66.69%) 121741669 branch-misses # 2.75% of all branches ( +- 0.28% ) (83.34%) 6.6283 +- 0.0358 seconds time elapsed ( +- 0.54% ) ``` patch, with reclustering but without filtering (i.e. outputting all the stable *and* unstable clusters) {F8303586} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs): 6475.29 msec task-clock # 0.999 CPUs utilized ( +- 0.31% ) 213 context-switches # 32.952 M/sec ( +- 23.81% ) 1 cpu-migrations # 0.130 M/sec ( +- 43.84% ) 31287 page-faults # 4832.057 M/sec ( +- 0.08% ) 25939086577 cycles # 4006160.279 GHz ( +- 0.31% ) (83.31%) 1958812858 stalled-cycles-frontend # 7.55% frontend cycles idle ( +- 0.68% ) (83.32%) 13218961512 stalled-cycles-backend # 50.96% backend cycles idle ( +- 0.29% ) (33.37%) 19752995402 instructions # 0.76 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.04%) 4417079244 branches # 682195472.305 M/sec ( +- 0.03% ) (66.70%) 121510065 branch-misses # 2.75% of all branches ( +- 0.19% ) (83.34%) 6.4832 +- 0.0229 seconds time elapsed ( +- 0.35% ) ``` Funnily, *this* measurement shows that said reclustering actually improved performance. patch, with reclustering, only the stable clusters {F8303594} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs): 6387.71 msec task-clock # 0.999 CPUs utilized ( +- 0.13% ) 133 context-switches # 20.792 M/sec ( +- 23.39% ) 0 cpu-migrations # 0.063 M/sec ( +- 61.24% ) 31318 page-faults # 4903.256 M/sec ( +- 0.08% ) 25591984967 cycles # 4006786.266 GHz ( +- 0.13% ) (83.31%) 1881234904 stalled-cycles-frontend # 7.35% frontend cycles idle ( +- 0.25% ) (83.33%) 13209749965 stalled-cycles-backend # 51.62% backend cycles idle ( +- 0.16% ) (33.36%) 19767554347 instructions # 0.77 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.03%) 4417480305 branches # 691618858.046 M/sec ( +- 0.03% ) (66.68%) 118676358 branch-misses # 2.69% of all branches ( +- 0.07% ) (83.33%) 6.3954 +- 0.0118 seconds time elapsed ( +- 0.18% ) ``` Performance improved even further?! Makes sense i guess, less clusters to print. patch, with reclustering, only the unstable clusters {F8303601} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs): 6124.96 msec task-clock # 1.000 CPUs utilized ( +- 0.20% ) 194 context-switches # 31.709 M/sec ( +- 20.46% ) 0 cpu-migrations # 0.039 M/sec ( +- 49.77% ) 31413 page-faults # 5129.261 M/sec ( +- 0.06% ) 24536794267 cycles # 4006425.858 GHz ( +- 0.19% ) (83.31%) 1676085087 stalled-cycles-frontend # 6.83% frontend cycles idle ( +- 0.46% ) (83.32%) 13035595603 stalled-cycles-backend # 53.13% backend cycles idle ( +- 0.16% ) (33.36%) 18260877653 instructions # 0.74 insn per cycle # 0.71 stalled cycles per insn ( +- 0.05% ) (50.03%) 4112411983 branches # 671484364.603 M/sec ( +- 0.03% ) (66.68%) 114066929 branch-misses # 2.77% of all branches ( +- 0.11% ) (83.32%) 6.1278 +- 0.0121 seconds time elapsed ( +- 0.20% ) ``` This tells us that the actual `-analysis-inconsistencies-output-file=` outputting only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps) (Also, wow this is fast, it used to take several minutes originally) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40715 | PR40715 ]]. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon Tags: #llvm Differential Revision: https://reviews.llvm.org/D58355 llvm-svn: 354441
* [llvm-exegesis] Add throughput mode.Clement Courbet2019-01-301-1/+1
| | | | | | | | | | | | | | | | Summary: This just uses the latency benchmark runner on the parallel uops snippet generator. Fixes PR37698. Reviewers: gchatelet Subscribers: tschuett, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D57000 llvm-svn: 352632
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* Revert "[llvm-exegesis] Add a snippet generator to generate snippets to ↵Clement Courbet2018-11-081-1/+1
| | | | | | | | compute ROB sizes." This reverts accidental commit rL346394. llvm-svn: 346398
* [llvm-exegesis] Add a snippet generator to generate snippets to compute ROB ↵Clement Courbet2018-11-081-1/+1
| | | | | | sizes. llvm-svn: 346394
* [llvm-exegesis] Move namespace exegesis inside llvm::Fangrui Song2018-10-221-0/+2
| | | | | | | | | | | | | | | | Summary: This allows simplifying references of llvm::foo with foo when the needs come in the future. Reviewers: courbet, gchatelet Reviewed By: gchatelet Subscribers: javed.absar, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53455 llvm-svn: 344922
* [llvm-exegesis] Get rid of debug_string.Clement Courbet2018-09-261-3/+1
| | | | | | | | | | | | | | | | Summary: THis is a backwards-compatible change (existing files will work as expected). See PR39082. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52546 llvm-svn: 343108
* [llvm-exegesis] Add support for measuring NumMicroOps.Clement Courbet2018-09-261-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Example output for vzeroall: --- mode: uops key: instructions: - 'VZEROALL' config: '' register_initial_values: cpu_name: haswell llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { debug_string: HWPort0, value: 0.0006, per_snippet_value: 0.0006, key: '3' } - { debug_string: HWPort1, value: 0.0011, per_snippet_value: 0.0011, key: '4' } - { debug_string: HWPort2, value: 0.0004, per_snippet_value: 0.0004, key: '5' } - { debug_string: HWPort3, value: 0.0018, per_snippet_value: 0.0018, key: '6' } - { debug_string: HWPort4, value: 0.0002, per_snippet_value: 0.0002, key: '7' } - { debug_string: HWPort5, value: 1.0019, per_snippet_value: 1.0019, key: '8' } - { debug_string: HWPort6, value: 1.0033, per_snippet_value: 1.0033, key: '9' } - { debug_string: HWPort7, value: 0.0001, per_snippet_value: 0.0001, key: '10' } - { debug_string: NumMicroOps, value: 20.0069, per_snippet_value: 20.0069, key: NumMicroOps } error: '' info: '' assembled_snippet: C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C3 ... Reviewers: gchatelet Subscribers: tschuett, RKSimon, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D52539 llvm-svn: 343094
* [llvm-exegesis] Output the unscaled value as well as the scaled one.Clement Courbet2018-09-261-2/+7
| | | | | | | | | | | | Summary: See PR38936 for context. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52500 llvm-svn: 343081
* [llvm-exegesis] Serializes registers initial values.Guillaume Chatelet2018-09-251-0/+3
| | | | | | | | | | | | Summary: Adds the registers initial values to the YAML output of llvm-exegesis. Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52460 llvm-svn: 342982
* [llvm-exegesis][NFC] Rewrite of the YAML serialization.Guillaume Chatelet2018-09-251-41/+6
| | | | | | | | | | | | | | Summary: This is a NFC in preparation of exporting the initial registers as part of the YAML dump Reviewers: courbet Reviewed By: courbet Subscribers: mgorny, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52427 llvm-svn: 342967
* [llvm-exegesis][NFC] Add more comments.Clement Courbet2018-06-151-0/+3
| | | | llvm-svn: 334811
* [llvm-exegesis] Print the whole snippet in analysis.Clement Courbet2018-06-151-0/+1
| | | | | | | | | | | | | | | | | Summary: On hover, the whole asm snippet is displayed, including operands. This requires the actual assembly output instead of just the MCInsts: This is because some pseudo-instructions get lowered to actual target instructions during codegen (e.g. ABS_Fp32 -> SSE or X87). Reviewers: gchatelet Subscribers: mgorny, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D48164 llvm-svn: 334805
* [llvm-exegesis] Use BenchmarkResult::Instructions instead of OpcodeNameClement Courbet2018-06-141-1/+0
| | | | | | | | | | | | | | | | | | Summary: Get rid of OpcodeName. To remove the opcode name from an old file: ``` cat old_file | sed '/opcode_name.*/d' ``` Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D48121 llvm-svn: 334691
* [llvm-exegesis] Improve error reporting.Guillaume Chatelet2018-06-071-8/+6
| | | | | | | | | | | | Summary: BenchmarkResult IO functions now return an Error or Expected so caller can deal take proper action. Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D47868 llvm-svn: 334167
* [llvm-exegesis] move Mode from Key to BenchmarResult.Clement Courbet2018-06-061-2/+2
| | | | | | | | | | | | | | | | | | Moves the Mode field out of the Key. The existing yaml benchmark results can be fixed with the following script: ``` readonly FILE=$1 readonly MODE=latency # Change to uops to fix a uops benchmark. cat $FILE | \ sed "/^\ \+mode:\ \+$MODE$/d" | \ sed "/^cpu_name.*$/i mode: $MODE" ``` Differential Revision: https://reviews.llvm.org/D47813 Authored by: Guillaume Chatelet llvm-svn: 334079
* [llvm-exegesis] Add instructions to BenchmarkResult Key.Clement Courbet2018-06-051-7/+53
| | | | | | | | | | We want llvm-exegesis to explore instructions (effect of initial register values, effect of operand selection). To enable this a BenchmarkResult muststore all the relevant data in its key. This patch starts adding such data. Here we simply allow to store the generated instructions, following patches will add operands and initial values for registers. https://reviews.llvm.org/D47764 Authored by: Guilluame Chatelet llvm-svn: 334008
* [llvm-exegesis][NFC] Use an enum instead of a string for benchmark mode.Clement Courbet2018-06-041-2/+2
| | | | | | | | | | | | Summary: YAML encoding is backwards-compatible. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D47705 llvm-svn: 333886
* [llvm-exegesis] Analysis: Show inconsistencies between checked-in and ↵Clement Courbet2018-06-041-0/+2
| | | | | | | | | | | | | | | | | | measured data. Summary: We now highlight any sched classes whose measurements do not match the LLVM SchedModel. "bad" clusters are marked in red. Screenshot in phabricator diff. Reviewers: gchatelet Subscribers: tschuett, mgrang, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D47639 llvm-svn: 333884
* [llvm-exegesis] Analysis: Show value extents.Clement Courbet2018-05-241-0/+24
| | | | | | | | | | | | Summary: Screenshot attached in phabricator. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D47318 llvm-svn: 333181
* reland r332579: [llvm-exegesis] Update to cover latency through another opcode.Clement Courbet2018-05-171-2/+7
| | | | | | | | | | | | | | Restructuring the code to measure latency and uops. The end goal is to have this program spawn another process to deal with SIGILL and other malformed programs. It is not yet the case in this redesign, it is still the main program that runs the code (and may crash). It now uses BitVector instead of Graph for performance reasons. https://reviews.llvm.org/D46821 (with fixed ARM tests) Authored by Guillaume Chatelet llvm-svn: 332592
* Revert r332579 "[llvm-exegesis] Update to cover latency through another opcode."Clement Courbet2018-05-171-7/+2
| | | | | | The revision failed to update the ARM tests. llvm-svn: 332580
* [llvm-exegesis] Update to cover latency through another opcode.Clement Courbet2018-05-171-2/+7
| | | | | | | | | | | | Restructuring the code to measure latency and uops. The end goal is to have this program spawn another process to deal with SIGILL and other malformed programs. It is not yet the case in this redesign, it is still the main program that runs the code (and may crash). It now uses BitVector instead of Graph for performance reasons. https://reviews.llvm.org/D46821 Authored by Guillaume Chatelet llvm-svn: 332579
* [llvm-exegesis] Split AsmTemplate.Name into components.Clement Courbet2018-05-151-3/+10
| | | | | | | | | | | | | | Summary: AsmTemplate becomes IntructionBenchmarkKey, which has three components. This allows retreiving the opcode for analysis. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D46873 llvm-svn: 332348
* [llvm-exegesis] Allow lists of BenchmarkResults to be parsed as ↵Clement Courbet2018-05-141-0/+1
| | | | | | std::vector<BenchmarkResult>. llvm-svn: 332221
* [llvm-exegesis] Fix compilation on lld-x86_64-darwin13Clement Courbet2018-04-041-1/+1
| | | | | | | YAMLTraits does not know how to serialize `size_t` portably. Use `int` instead. llvm-svn: 329176
* Re-land r329156 "Add llvm-exegesis tool."Clement Courbet2018-04-041-0/+53
| | | | | | Fixed to depend on and initialize the native target instead of X86. llvm-svn: 329169
* Revert r329156 "Add llvm-exegesis tool."Clement Courbet2018-04-041-53/+0
| | | | | | Breaks a bunch of bots. llvm-svn: 329157
* Add llvm-exegesis tool.Clement Courbet2018-04-041-0/+53
Summary: [llvm-exegesis][RFC] Automatic Measurement of Instruction Latency/Uops This is the code corresponding to the RFC "llvm-exegesis Automatic Measurement of Instruction Latency/Uops". The RFC is available on the LLVM mailing lists as well as the following document for easier reading: https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?usp=sharing Subscribers: mgorny, gchatelet, orwant, llvm-commits Differential Revision: https://reviews.llvm.org/D44519 llvm-svn: 329156
OpenPOWER on IntegriCloud