summaryrefslogtreecommitdiffstats
path: root/llvm/tools/llvm-exegesis/lib/Analysis.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [Disassembler] Delete the VStream parameter of MCDisassembler::getInstruction()Fangrui Song2020-01-111-1/+1
| | | | | | | | | | The argument is llvm::null() everywhere except llvm::errs() in llvm-objdump in -DLLVM_ENABLE_ASSERTIONS=On builds. It is used by no target but X86 in -DLLVM_ENABLE_ASSERTIONS=On builds. If we ever have the needs to add verbose log to disassemblers, we can record log with a member function, instead of passing it around as an argument.
* [MC] Add parameter `Address` to MCInstPrinter::printInstFangrui Song2020-01-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | printInst prints a branch/call instruction as `b offset` (there are many variants on various targets) instead of `b address`. It is a convention to use address instead of offset in most external symbolizers/disassemblers. This difference makes `llvm-objdump -d` output unsatisfactory. Add `uint64_t Address` to printInst(), so that it can pass the argument to printInstruction(). `raw_ostream &OS` is moved to the last to be consistent with other print* methods. The next step is to pass `Address` to printInstruction() (generated by tablegen from the instruction set description). We can gradually migrate targets to print addresses instead of offsets. In any case, downstream projects which don't know `Address` can pass 0 as the argument. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D72172
* [Mips] Use appropriate private label prefix based on Mips ABIMirko Brkusanin2019-10-231-1/+4
| | | | | | | | | | MipsMCAsmInfo was using '$' prefix for Mips32 and '.L' for Mips64 regardless of -target-abi option. By passing MCTargetOptions to MCAsmInfo we can find out Mips ABI and pick appropriate prefix. Tags: #llvm, #clang, #lldb Differential Revision: https://reviews.llvm.org/D66795
* [llvm-exegesis] Show noise cluster in analysis output.Clement Courbet2019-10-111-19/+62
| | | | | | | | | | | | Reviewers: gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68780 llvm-svn: 374533
* [llvm-exegesis][NFC] Remove extra `llvm::` qualifications.Clement Courbet2019-10-091-56/+46
| | | | | | | | | | | | | | Summary: Second patch: in the lib. Reviewers: gchatelet Subscribers: nemanjai, tschuett, MaskRay, mgrang, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68692 llvm-svn: 374158
* [llvm] Migrate llvm::make_unique to std::make_uniqueJonas Devlieghere2019-08-151-1/+1
| | | | | | | | Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo. llvm-svn: 369013
* [NFC][llvm-exegesis] Also promote getSchedClassPoint() into ResolvedSchedClass.Roman Lebedev2019-03-291-79/+1
| | | | | | | | | | | | | | | | | | Summary: It doesn't need anything from Analysis::SchedClassCluster class, and takes ResolvedSchedClass as param, so this seems rather fitting. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59994 llvm-svn: 357263
* [NFC][llvm-exegesis] Refactor ResolvedSchedClass & friendsRoman Lebedev2019-03-291-216/+8
| | | | | | | | | | | | | | | | | | | | | Summary: `ResolvedSchedClass` will need to be used outside of `Analysis` (before `InstructionBenchmarkClustering` even), therefore promote it into a non-private top-level class, and while there also move all of the functions that are only called by `ResolvedSchedClass` into that same new file. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: mgorny, tschuett, mgrang, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59993 llvm-svn: 357259
* [NFC][llvm-exegesis] Refactor Analysis::SchedClassCluster::measurementsMatch()Roman Lebedev2019-03-291-31/+51
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: The diff looks scary but it really isn't: 1. I moved the check for the number of measurements into `SchedClassClusterCentroid::validate()` 2. While there, added a check that we can only have a single inverse throughput measurement. I missed that when adding it initially. 3. In `Analysis::SchedClassCluster::measurementsMatch()` is called with the current LLVM values from schedule class and the values from Centroid. 3.1. The values from centroid we can already get from `SchedClassClusterCentroid::getAsPoint()`. This isn't 100% a NFC, because previously for inverse throughput we used `min()`. I have asked whether i have done that correctly in https://reviews.llvm.org/D57647?id=184939#inline-510384 but did not hear back. I think `avg()` should be used too, thus it is a fix. 3.2. Finally, refactor the computation of the LLVM-specified values into `Analysis::SchedClassCluster::getSchedClassPoint()` I will need that function for [[ https://bugs.llvm.org/show_bug.cgi?id=41275 | PR41275 ]] Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59951 llvm-svn: 357245
* [llvm-exegesis] Introduce a 'naive' clustering algorithm (PR40880)Roman Lebedev2019-03-281-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is an alternative to D59539. Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`. Let's suppose we are using `-analysis-clustering-epsilon=0.5`. By default now we will start processing the `0.5` point, find that `1.0` is it's neighbor, add them to a new cluster. Then we will notice that `1.5` is a neighbor of `1.0` and add it to that same cluster. Then we will notice that `2.0` is a neighbor of `1.5` and add it to that same cluster. So all these points ended up in the same cluster. This may or may not be a correct implementation of dbscan clustering algorithm. But this is rather horribly broken for the reasons of comparing the clusters with the LLVM sched data. Let's suppose all those opcodes are currently in the same sched cluster. If i specify `-analysis-inconsistency-epsilon=0.5`, then no matter the LLVM values this cluster will **never** match the LLVM values, and thus this cluster will **always** be displayed as inconsistent. The solution is obviously to split off some of these opcodes into different sched cluster. But how do i do that? Out of 4 opcodes displayed in the inconsistency report, which ones are the "bad ones"? Which ones are the most different from the checked-in data? I'd need to go in to the `.yaml` and look it up manually. The trivial solution is to, when creating clusters, don't use the full dbscan algorithm, but instead "pick some unclustered point, pick all unclustered points that are it's neighbor, put them all into a new cluster, repeat". And just so as it happens, we can arrive at that algorithm by not performing the "add neighbors of a neighbor to the cluster" step. But that won't work well once we teach analyze mode to operate in on-1D mode (i.e. on more than a single measurement type at a time), because the clustering would depend on the order of the measurements. Instead, let's just create a single cluster per opcode, and put all the points of that opcode into said cluster. And simultaneously check that every point in that cluster is a neighbor of every other point in the cluster, and if they are not, the cluster (==opcode) is unstable. This is //yet another// step to bring me closer to being able to continue cleanup of bdver2 sched model.. Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40880 | PR40880 ]]. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, jdoerfert, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59820 llvm-svn: 357152
* [llvm-exegesis] Split Epsilon param into two (PR40787)Roman Lebedev2019-02-251-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This eps param is used for two distinct things: * initial point clusterization * checking clusters against the llvm values What if one wants to only look at highly different clusters, without changing the clustering itself? In particular, this helps to weed out noisy measurements (since the clusterization epsilon is still small, so there is a better chance that noisy measurements from the same opcode will go into different clusters) By splitting it into two params it is now possible. This is nearly-free performance-wise: Old: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs): 390.01 msec task-clock # 0.998 CPUs utilized ( +- 0.25% ) 12 context-switches # 31.735 M/sec ( +- 27.38% ) 0 cpu-migrations # 0.000 K/sec 4745 page-faults # 12183.732 M/sec ( +- 0.54% ) 1562711900 cycles # 4012303.327 GHz ( +- 0.24% ) (82.90%) 185567822 stalled-cycles-frontend # 11.87% frontend cycles idle ( +- 0.52% ) (83.30%) 392106234 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.31% ) (33.79%) 1839236666 instructions # 1.18 insn per cycle # 0.21 stalled cycles per insn ( +- 0.15% ) (50.37%) 407035764 branches # 1045074878.710 M/sec ( +- 0.12% ) (66.80%) 10896459 branch-misses # 2.68% of all branches ( +- 0.17% ) (83.20%) 0.390629 +- 0.000972 seconds time elapsed ( +- 0.25% ) ``` ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs): 6803.36 msec task-clock # 0.999 CPUs utilized ( +- 0.96% ) 262 context-switches # 38.546 M/sec ( +- 23.06% ) 0 cpu-migrations # 0.065 M/sec ( +- 76.03% ) 13287 page-faults # 1953.206 M/sec ( +- 0.32% ) 27252537904 cycles # 4006024.257 GHz ( +- 0.95% ) (83.31%) 1496314935 stalled-cycles-frontend # 5.49% frontend cycles idle ( +- 0.97% ) (83.32%) 16128404524 stalled-cycles-backend # 59.18% backend cycles idle ( +- 0.30% ) (33.37%) 17611143370 instructions # 0.65 insn per cycle # 0.92 stalled cycles per insn ( +- 0.05% ) (50.04%) 3894906599 branches # 572537147.437 M/sec ( +- 0.03% ) (66.69%) 116314514 branch-misses # 2.99% of all branches ( +- 0.20% ) (83.35%) 6.8118 +- 0.0689 seconds time elapsed ( +- 1.01%) ``` New: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs): 400.14 msec task-clock # 0.998 CPUs utilized ( +- 0.66% ) 12 context-switches # 29.429 M/sec ( +- 25.95% ) 0 cpu-migrations # 0.100 M/sec ( +-100.00% ) 4714 page-faults # 11796.496 M/sec ( +- 0.55% ) 1603131306 cycles # 4011840.105 GHz ( +- 0.66% ) (82.85%) 199538509 stalled-cycles-frontend # 12.45% frontend cycles idle ( +- 2.40% ) (83.10%) 402249109 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.19% ) (34.05%) 1847783963 instructions # 1.15 insn per cycle # 0.22 stalled cycles per insn ( +- 0.18% ) (50.64%) 407162722 branches # 1018925730.631 M/sec ( +- 0.12% ) (67.02%) 10932779 branch-misses # 2.69% of all branches ( +- 0.51% ) (83.28%) 0.40077 +- 0.00267 seconds time elapsed ( +- 0.67% ) lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs): 6947.79 msec task-clock # 1.000 CPUs utilized ( +- 0.90% ) 217 context-switches # 31.236 M/sec ( +- 36.16% ) 1 cpu-migrations # 0.096 M/sec ( +- 50.00% ) 13258 page-faults # 1908.389 M/sec ( +- 0.34% ) 27830796523 cycles # 4006032.286 GHz ( +- 0.89% ) (83.30%) 1504554006 stalled-cycles-frontend # 5.41% frontend cycles idle ( +- 2.10% ) (83.32%) 16716574843 stalled-cycles-backend # 60.07% backend cycles idle ( +- 0.65% ) (33.38%) 17755545931 instructions # 0.64 insn per cycle # 0.94 stalled cycles per insn ( +- 0.09% ) (50.04%) 3897255686 branches # 560980426.597 M/sec ( +- 0.06% ) (66.70%) 117045395 branch-misses # 3.00% of all branches ( +- 0.47% ) (83.34%) 6.9507 +- 0.0627 seconds time elapsed ( +- 0.90% ) ``` I.e. it's +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps. Within noise i'd say. Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 | PR40787 ]]. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58476 llvm-svn: 354767
* [llvm-exegesis] Opcode stabilization / reclusterization (PR40715)Roman Lebedev2019-02-201-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Given an instruction `Opcode`, we can make benchmarks (measurements) of the instruction characteristics/performance. Then, to facilitate further analysis we group the benchmarks with *similar* characteristics into clusters. Now, this is all not entirely deterministic. Some instructions have variable characteristics, depending on their arguments. And thus, if we do several benchmarks of the same instruction `Opcode`, we may end up with *different* performance characteristics measurements. And when we then do clustering, these several benchmarks of the same instruction `Opcode` may end up being clustered into *different* clusters. This is not great for further analysis. We shall find every `Opcode` with benchmarks not in just one cluster, and move *all* the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`. I have solved this by making `ClusterId` a bit field, adding a `IsUnstable` bit, and introducing `-analysis-display-unstable-clusters` switch to toggle between displaying stable-only clusters and unstable-only clusters. The reclusterization is deterministically stable, produces identical reports between runs. (Or at least that is what i'm seeing, maybe it isn't) Timings/comparisons: old (current trunk/head) {F8303582} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs): 6624.73 msec task-clock # 0.999 CPUs utilized ( +- 0.53% ) 172 context-switches # 25.965 M/sec ( +- 29.89% ) 0 cpu-migrations # 0.042 M/sec ( +- 56.54% ) 31073 page-faults # 4690.754 M/sec ( +- 0.08% ) 26538711696 cycles # 4006230.292 GHz ( +- 0.53% ) (83.31%) 2017496807 stalled-cycles-frontend # 7.60% frontend cycles idle ( +- 0.93% ) (83.32%) 13403650062 stalled-cycles-backend # 50.51% backend cycles idle ( +- 0.33% ) (33.37%) 19770706799 instructions # 0.74 insn per cycle # 0.68 stalled cycles per insn ( +- 0.04% ) (50.04%) 4419821812 branches # 667207369.714 M/sec ( +- 0.03% ) (66.69%) 121741669 branch-misses # 2.75% of all branches ( +- 0.28% ) (83.34%) 6.6283 +- 0.0358 seconds time elapsed ( +- 0.54% ) ``` patch, with reclustering but without filtering (i.e. outputting all the stable *and* unstable clusters) {F8303586} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs): 6475.29 msec task-clock # 0.999 CPUs utilized ( +- 0.31% ) 213 context-switches # 32.952 M/sec ( +- 23.81% ) 1 cpu-migrations # 0.130 M/sec ( +- 43.84% ) 31287 page-faults # 4832.057 M/sec ( +- 0.08% ) 25939086577 cycles # 4006160.279 GHz ( +- 0.31% ) (83.31%) 1958812858 stalled-cycles-frontend # 7.55% frontend cycles idle ( +- 0.68% ) (83.32%) 13218961512 stalled-cycles-backend # 50.96% backend cycles idle ( +- 0.29% ) (33.37%) 19752995402 instructions # 0.76 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.04%) 4417079244 branches # 682195472.305 M/sec ( +- 0.03% ) (66.70%) 121510065 branch-misses # 2.75% of all branches ( +- 0.19% ) (83.34%) 6.4832 +- 0.0229 seconds time elapsed ( +- 0.35% ) ``` Funnily, *this* measurement shows that said reclustering actually improved performance. patch, with reclustering, only the stable clusters {F8303594} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs): 6387.71 msec task-clock # 0.999 CPUs utilized ( +- 0.13% ) 133 context-switches # 20.792 M/sec ( +- 23.39% ) 0 cpu-migrations # 0.063 M/sec ( +- 61.24% ) 31318 page-faults # 4903.256 M/sec ( +- 0.08% ) 25591984967 cycles # 4006786.266 GHz ( +- 0.13% ) (83.31%) 1881234904 stalled-cycles-frontend # 7.35% frontend cycles idle ( +- 0.25% ) (83.33%) 13209749965 stalled-cycles-backend # 51.62% backend cycles idle ( +- 0.16% ) (33.36%) 19767554347 instructions # 0.77 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.03%) 4417480305 branches # 691618858.046 M/sec ( +- 0.03% ) (66.68%) 118676358 branch-misses # 2.69% of all branches ( +- 0.07% ) (83.33%) 6.3954 +- 0.0118 seconds time elapsed ( +- 0.18% ) ``` Performance improved even further?! Makes sense i guess, less clusters to print. patch, with reclustering, only the unstable clusters {F8303601} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs): 6124.96 msec task-clock # 1.000 CPUs utilized ( +- 0.20% ) 194 context-switches # 31.709 M/sec ( +- 20.46% ) 0 cpu-migrations # 0.039 M/sec ( +- 49.77% ) 31413 page-faults # 5129.261 M/sec ( +- 0.06% ) 24536794267 cycles # 4006425.858 GHz ( +- 0.19% ) (83.31%) 1676085087 stalled-cycles-frontend # 6.83% frontend cycles idle ( +- 0.46% ) (83.32%) 13035595603 stalled-cycles-backend # 53.13% backend cycles idle ( +- 0.16% ) (33.36%) 18260877653 instructions # 0.74 insn per cycle # 0.71 stalled cycles per insn ( +- 0.05% ) (50.03%) 4112411983 branches # 671484364.603 M/sec ( +- 0.03% ) (66.68%) 114066929 branch-misses # 2.77% of all branches ( +- 0.11% ) (83.32%) 6.1278 +- 0.0121 seconds time elapsed ( +- 0.20% ) ``` This tells us that the actual `-analysis-inconsistencies-output-file=` outputting only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps) (Also, wow this is fast, it used to take several minutes originally) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40715 | PR40715 ]]. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon Tags: #llvm Differential Revision: https://reviews.llvm.org/D58355 llvm-svn: 354441
* [llvm-exegesis] Throughput support in analysis modeRoman Lebedev2019-02-041-5/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: D57000 / [[ https://bugs.llvm.org/show_bug.cgi?id=37698 | PR37698 ]] added support for measuring of the inverse throughput. But the support for the analysis was not added. This attempts to fix that. (analysis done o bdver2 / piledriver) First, small-scale experiment: ``` $ ./bin/llvm-exegesis -num-repetitions=10000 -mode=inverse_throughput -opcode-name=BSF64rr Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-d0acdd.o --- mode: inverse_throughput key: instructions: - 'BSF64rr RAX RDX' config: '' register_initial_values: - 'RDX=0x0' cpu_name: bdver2 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: inverse_throughput, value: 3.0278, per_snippet_value: 3.0278 } error: '' info: instruction has no tied variables picking Uses different from defs assembled_snippet: 48BA0000000000000000480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2C3 ... ``` If we plug `bsfq %r12, %r10` into llvm-mca: https://godbolt.org/z/ZtOyhJ ``` Dispatch Width: 4 uOps Per Cycle: 3.00 IPC: 0.50 Block RThroughput: 2.0 ``` So RThroughput mismatch exists. Now, let's upscale and analyse: {F8207148} `$ ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html`: {F8207172} {F8207197} And if we now look at https://www.agner.org/optimize/instruction_tables.pdf, `Reciprocal throughput` for `BSF r,r` is listed as `3`. Yay? Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57647 llvm-svn: 353023
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* [llvm-exegesis] Analysis: writeMeasurementValue(): don't alloc string for ↵Roman Lebedev2018-11-191-1/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | double each time. Summary: Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!) Old time: (D54382) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 9024.354355 task-clock (msec) # 1.000 CPUs utilized ( +- 0.18% ) ... 9.0262 +- 0.0161 seconds time elapsed ( +- 0.18% ) ``` New time: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 8996.541057 task-clock (msec) # 0.999 CPUs utilized ( +- 0.19% ) ... 9.0045 +- 0.0172 seconds time elapsed ( +- 0.19% ) ``` -~0.3%, not that much. But this isn't the important part. Old: * calls to allocation functions: 2109712 * temporary allocations: 33112 * bytes allocated in total (ignoring deallocations): 4.43 GB New: * calls to allocation functions: 2095345 (-0.68%) * temporary allocations: 18745 (-43.39% !!!) * bytes allocated in total (ignoring deallocations): 4.31 GB (-2.71%) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54383 llvm-svn: 347199
* [llvm-exegesis] Analysis::writeSnippet(): be smarter about memory allocations.Roman Lebedev2018-11-191-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!) Old time: (D54381) ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m10.487s user 0m9.745s sys 0m0.740s ``` New time: ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m9.599s user 0m8.824s sys 0m0.772s ``` Not that much, around -9%. But that is not the good part yet, again. Old: * calls to allocation functions: 3347676 * temporary allocations: 277818 * bytes allocated in total (ignoring deallocations): 10.52 GB New: * calls to allocation functions: 2109712 (-36%) * temporary allocations: 33112 (-88%) * bytes allocated in total (ignoring deallocations): 4.43 GB (-58% *sic*) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54382 llvm-svn: 347198
* [llvm-exegesis] Move namespace exegesis inside llvm::Fangrui Song2018-10-221-0/+2
| | | | | | | | | | | | | | | | Summary: This allows simplifying references of llvm::foo with foo when the needs come in the future. Reviewers: courbet, gchatelet Reviewed By: gchatelet Subscribers: javed.absar, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53455 llvm-svn: 344922
* Use llvm::{all,any,none}_of instead std::{all,any,none}_of. NFCFangrui Song2018-10-191-5/+5
| | | | llvm-svn: 344774
* [llvm-exegesis][NFC] Revert rL343682 "Fix unused variable warning".Clement Courbet2018-10-031-1/+1
| | | | | | That was not the proper fix: the variable is used in debug mode. llvm-svn: 343685
* [llvm-exegesis] Fix rL343680 in release mode.Clement Courbet2018-10-031-2/+2
| | | | llvm-svn: 343684
* [llvm-exegesis][NFC] Fix unused variable warning.Clement Courbet2018-10-031-1/+0
| | | | llvm-svn: 343682
* [llvm-exegesis] Resolve variant classes in analysis.Clement Courbet2018-10-031-52/+82
| | | | | | | | | | | | Summary: See PR38884. Reviewers: gchatelet Subscribers: tschuett, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D52825 llvm-svn: 343680
* [llvm-exegesis] Fix PR39096.Clement Courbet2018-09-271-16/+37
| | | | | | | | | | | | Summary: The key is now the resource name, not the resource id. Reviewers: gchatelet Subscribers: tschuett, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D52607 llvm-svn: 343208
* llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...)Fangrui Song2018-09-271-9/+7
| | | | | | | | | | | | Summary: The convenience wrapper in STLExtras is available since rL342102. Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D52573 llvm-svn: 343163
* [llvm-exegesis] Get rid of debug_string.Clement Courbet2018-09-261-4/+1
| | | | | | | | | | | | | | | | Summary: THis is a backwards-compatible change (existing files will work as expected). See PR39082. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52546 llvm-svn: 343108
* [llvm-exegesis] Output the unscaled value as well as the scaled one.Clement Courbet2018-09-261-7/+7
| | | | | | | | | | | | Summary: See PR38936 for context. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52500 llvm-svn: 343081
* [llvm-exegesis] Print the whole snippet in analysis.Clement Courbet2018-06-151-34/+63
| | | | | | | | | | | | | | | | | Summary: On hover, the whole asm snippet is displayed, including operands. This requires the actual assembly output instead of just the MCInsts: This is because some pseudo-instructions get lowered to actual target instructions during codegen (e.g. ABS_Fp32 -> SSE or X87). Reviewers: gchatelet Subscribers: mgorny, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D48164 llvm-svn: 334805
* [llvm-exegesis] Use BenchmarkResult::Instructions instead of OpcodeNameClement Courbet2018-06-141-26/+79
| | | | | | | | | | | | | | | | | | Summary: Get rid of OpcodeName. To remove the opcode name from an old file: ``` cat old_file | sed '/opcode_name.*/d' ``` Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D48121 llvm-svn: 334691
* [llvm-exegesis] move Mode from Key to BenchmarResult.Clement Courbet2018-06-061-4/+3
| | | | | | | | | | | | | | | | | | Moves the Mode field out of the Key. The existing yaml benchmark results can be fixed with the following script: ``` readonly FILE=$1 readonly MODE=latency # Change to uops to fix a uops benchmark. cat $FILE | \ sed "/^\ \+mode:\ \+$MODE$/d" | \ sed "/^cpu_name.*$/i mode: $MODE" ``` Differential Revision: https://reviews.llvm.org/D47813 Authored by: Guillaume Chatelet llvm-svn: 334079
* [llvm-exegesis][NFC] Use an enum instead of a string for benchmark mode.Clement Courbet2018-06-041-5/+6
| | | | | | | | | | | | Summary: YAML encoding is backwards-compatible. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D47705 llvm-svn: 333886
* [llvm-exegesis] Analysis: Show inconsistencies between checked-in and ↵Clement Courbet2018-06-041-60/+153
| | | | | | | | | | | | | | | | | | measured data. Summary: We now highlight any sched classes whose measurements do not match the LLVM SchedModel. "bad" clusters are marked in red. Screenshot in phabricator diff. Reviewers: gchatelet Subscribers: tschuett, mgrang, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D47639 llvm-svn: 333884
* [llvm-exegesis] Analysis: Display idealized sched class port pressure.Clement Courbet2018-06-011-10/+139
| | | | | | | | | | | | Summary: Screenshot in phabricator diff. Reviewers: gchatelet Subscribers: mgorny, tschuett, mgrang, llvm-commits Differential Revision: https://reviews.llvm.org/D47329 llvm-svn: 333753
* [llvm-exegesis] Analysis: Show value extents.Clement Courbet2018-05-241-9/+23
| | | | | | | | | | | | Summary: Screenshot attached in phabricator. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D47318 llvm-svn: 333181
* [llvm-exegesis] Analysis: show debug string instead of raw key if provided.Clement Courbet2018-05-241-1/+4
| | | | | | | | | | Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D47315 llvm-svn: 333175
* [llvm-exegesis] Show sched class details in analysis.Clement Courbet2018-05-241-14/+138
| | | | | | | | | | | | Summary: And update docs. Reviewers: gchatelet Subscribers: tschuett, craig.topper, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D47254 llvm-svn: 333169
* [llvm-exegesis] Analysis output uses HTML.Clement Courbet2018-05-221-35/+170
| | | | | | | | | | | | Summary: This makes the report much more readable. Reviewers: gchatelet Subscribers: tschuett, mgrang, craig.topper, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D47189 llvm-svn: 332979
* [llvm-exegesis] Remove redudant explicit template instantiations.Clement Courbet2018-05-171-5/+0
| | | | llvm-svn: 332611
* [llvm-exegesis] Write out inconsistencies to a file.Clement Courbet2018-05-171-3/+11
| | | | | | | | | | Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D47013 llvm-svn: 332608
* [llvm-exegesis] Analysis: detect clustering inconsistencies.Clement Courbet2018-05-171-16/+79
| | | | | | | | | | | | | | | | | Summary: Warn on instructions that should have the same performance characteristics according to the sched model but actually differ in their benchmarks. Next step: Make the display nicer to browse, I was thinking maybe html. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D46945 llvm-svn: 332601
* [llvm-exegesis] Fix unused variable warning in release mode.Clement Courbet2018-05-161-1/+1
| | | | llvm-svn: 332455
* Fix unused variable warning in r332437.Clement Courbet2018-05-161-2/+2
| | | | llvm-svn: 332441
* [llvm-exegesis] Analysis: Display sched class for instructions.Clement Courbet2018-05-161-20/+39
| | | | | | | | | | Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D46883 llvm-svn: 332437
* [llvm-exegesis] Split AsmTemplate.Name into components.Clement Courbet2018-05-151-9/+13
| | | | | | | | | | | | | | Summary: AsmTemplate becomes IntructionBenchmarkKey, which has three components. This allows retreiving the opcode for analysis. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D46873 llvm-svn: 332348
* [llvm-exegesis] Add an analysis mode.Clement Courbet2018-05-151-0/+84
| | | | | | | | | | | | | | | | Summary: The analysis mode gives the user a clustered view of the measurement results. Next steps are (requires the split ok AsmTemplate.Name into {mnemonic, mode}): - Show the sched class. - Highlight any inconsistencies with the checked-in data. Reviewers: gchatelet Subscribers: mgorny, llvm-commits, tschuett Differential Revision: https://reviews.llvm.org/D46865 llvm-svn: 332344
* [llvm-exegesis] Revert accidentally commited code.Clement Courbet2018-05-141-48/+0
| | | | llvm-svn: 332231
* [llvm-exegesis] Fix a warning in r332221Clement Courbet2018-05-141-16/+9
| | | | | | | | | comparison of integers of different signs: 'const unsigned long' and 'const int' [-Werror,-Wsign-compare] unittests/tools/llvm-exegesis/BenchmarkResultTest.cpp:60:5: note: in instantiation of function template specialization 'testing::internal::EqHelper<false>::Compare<unsigned long, int>' requested here ASSERT_EQ(FromDiskVector.size(), 1); llvm-svn: 332230
* [llvm-exegesis] Add an analysis mode.Clement Courbet2018-05-141-0/+55
The analysis mode gives the user a clustered view of the measurement results and highlights any inconsistencies with the checked-in data. llvm-svn: 332229
OpenPOWER on IntegriCloud