| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a new flag named -bottleneck-analysis to print out information
about throughput bottlenecks.
MCA knows how to identify and classify dynamic dispatch stalls. However, it
doesn't know how to analyze and highlight kernel bottlenecks. The goal of this
patch is to teach MCA how to correlate increases in backend pressure to backend
stalls (and therefore, the loss of throughput).
From a Scheduler point of view, backend pressure is a function of the scheduler
buffer usage (i.e. how the number of uOps in the scheduler buffers changes over
time). Backend pressure increases (or decreases) when there is a mismatch
between the number of opcodes dispatched, and the number of opcodes issued in
the same cycle. Since buffer resources are limited, continuous increases in
backend pressure would eventually leads to dispatch stalls. So, there is a
strong correlation between dispatch stalls, and how backpressure changed over
time.
This patch teaches how to identify situations where backend pressure increases
due to:
- unavailable pipeline resources.
- data dependencies.
Data dependencies may delay execution of instructions and therefore increase the
time that uOps have to spend in the scheduler buffers. That often translates to
an increase in backend pressure which may eventually lead to a bottleneck.
Contention on pipeline resources may also delay execution of instructions, and
lead to a temporary increase in backend pressure.
Internally, the Scheduler classifies instructions based on whether register /
memory operands are available or not.
An instruction is marked as "ready to execute" only if data dependencies are
fully resolved.
Every cycle, the Scheduler attempts to execute all instructions that are ready
to execute. If an instruction cannot execute because of unavailable pipeline
resources, then the Scheduler internally updates a BusyResourceUnits mask with
the ID of each unavailable resource.
ExecuteStage is responsible for tracking changes in backend pressure. If backend
pressure increases during a cycle because of contention on pipeline resources,
then ExecuteStage sends a "backend pressure" event to the listeners.
That event would contain information about instructions delayed by resource
pressure, as well as the BusyResourceUnits mask.
Note that ExecuteStage also knows how to identify situations where backpressure
increased because of delays introduced by data dependencies.
The SummaryView observes "backend pressure" events and prints out a "bottleneck
report".
Example of bottleneck report:
```
Cycles with backend pressure increase [ 99.89% ]
Throughput Bottlenecks:
Resource Pressure [ 0.00% ]
Data Dependencies: [ 99.89% ]
- Register Dependencies [ 0.00% ]
- Memory Dependencies [ 99.89% ]
```
A bottleneck report is printed out only if increases in backend pressure
eventually caused backend stalls.
About the time complexity:
Time complexity is linear in the number of instructions in the
Scheduler::PendingSet.
The average slowdown tends to be in the range of ~5-6%.
For memory intensive kernels, the slowdown can be significant if flag
-noalias=false is specified. In the worst case scenario I have observed a
slowdown of ~30% when flag -noalias=false was specified.
We can definitely recover part of that slowdown if we optimize class LSUnit (by
doing extra bookkeeping to speedup queries). For now, this new analysis is
disabled by default, and it can be enabled via flag -bottleneck-analysis. Users
of MCA as a library can enable the generation of pressure events through the
constructor of ExecuteStage.
This patch partially addresses https://bugs.llvm.org/show_bug.cgi?id=37494
Differential Revision: https://reviews.llvm.org/D58728
llvm-svn: 355308
|
|
|
|
|
|
|
|
|
|
|
| |
Part 2 of CSPGO changes (mostly related to ProfileSummary).
Note that I use a default parameter in setProfileSummary() and getSummary().
This is to break the dependency in clang. I will make the parameter explicit
after changing clang in a separated patch.
Differential Revision: https://reviews.llvm.org/D54175
llvm-svn: 355131
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This eps param is used for two distinct things:
* initial point clusterization
* checking clusters against the llvm values
What if one wants to only look at highly different clusters, without changing
the clustering itself? In particular, this helps to weed out noisy measurements
(since the clusterization epsilon is still small, so there is a better chance
that noisy measurements from the same opcode will go into different clusters)
By splitting it into two params it is now possible.
This is nearly-free performance-wise:
Old:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):
390.01 msec task-clock # 0.998 CPUs utilized ( +- 0.25% )
12 context-switches # 31.735 M/sec ( +- 27.38% )
0 cpu-migrations # 0.000 K/sec
4745 page-faults # 12183.732 M/sec ( +- 0.54% )
1562711900 cycles # 4012303.327 GHz ( +- 0.24% ) (82.90%)
185567822 stalled-cycles-frontend # 11.87% frontend cycles idle ( +- 0.52% ) (83.30%)
392106234 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.31% ) (33.79%)
1839236666 instructions # 1.18 insn per cycle
# 0.21 stalled cycles per insn ( +- 0.15% ) (50.37%)
407035764 branches # 1045074878.710 M/sec ( +- 0.12% ) (66.80%)
10896459 branch-misses # 2.68% of all branches ( +- 0.17% ) (83.20%)
0.390629 +- 0.000972 seconds time elapsed ( +- 0.25% )
```
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs):
6803.36 msec task-clock # 0.999 CPUs utilized ( +- 0.96% )
262 context-switches # 38.546 M/sec ( +- 23.06% )
0 cpu-migrations # 0.065 M/sec ( +- 76.03% )
13287 page-faults # 1953.206 M/sec ( +- 0.32% )
27252537904 cycles # 4006024.257 GHz ( +- 0.95% ) (83.31%)
1496314935 stalled-cycles-frontend # 5.49% frontend cycles idle ( +- 0.97% ) (83.32%)
16128404524 stalled-cycles-backend # 59.18% backend cycles idle ( +- 0.30% ) (33.37%)
17611143370 instructions # 0.65 insn per cycle
# 0.92 stalled cycles per insn ( +- 0.05% ) (50.04%)
3894906599 branches # 572537147.437 M/sec ( +- 0.03% ) (66.69%)
116314514 branch-misses # 2.99% of all branches ( +- 0.20% ) (83.35%)
6.8118 +- 0.0689 seconds time elapsed ( +- 1.01%)
```
New:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs):
400.14 msec task-clock # 0.998 CPUs utilized ( +- 0.66% )
12 context-switches # 29.429 M/sec ( +- 25.95% )
0 cpu-migrations # 0.100 M/sec ( +-100.00% )
4714 page-faults # 11796.496 M/sec ( +- 0.55% )
1603131306 cycles # 4011840.105 GHz ( +- 0.66% ) (82.85%)
199538509 stalled-cycles-frontend # 12.45% frontend cycles idle ( +- 2.40% ) (83.10%)
402249109 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.19% ) (34.05%)
1847783963 instructions # 1.15 insn per cycle
# 0.22 stalled cycles per insn ( +- 0.18% ) (50.64%)
407162722 branches # 1018925730.631 M/sec ( +- 0.12% ) (67.02%)
10932779 branch-misses # 2.69% of all branches ( +- 0.51% ) (83.28%)
0.40077 +- 0.00267 seconds time elapsed ( +- 0.67% )
lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs):
6947.79 msec task-clock # 1.000 CPUs utilized ( +- 0.90% )
217 context-switches # 31.236 M/sec ( +- 36.16% )
1 cpu-migrations # 0.096 M/sec ( +- 50.00% )
13258 page-faults # 1908.389 M/sec ( +- 0.34% )
27830796523 cycles # 4006032.286 GHz ( +- 0.89% ) (83.30%)
1504554006 stalled-cycles-frontend # 5.41% frontend cycles idle ( +- 2.10% ) (83.32%)
16716574843 stalled-cycles-backend # 60.07% backend cycles idle ( +- 0.65% ) (33.38%)
17755545931 instructions # 0.64 insn per cycle
# 0.94 stalled cycles per insn ( +- 0.09% ) (50.04%)
3897255686 branches # 560980426.597 M/sec ( +- 0.06% ) (66.70%)
117045395 branch-misses # 3.00% of all branches ( +- 0.47% ) (83.34%)
6.9507 +- 0.0627 seconds time elapsed ( +- 0.90% )
```
I.e. it's +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps.
Within noise i'd say.
Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 | PR40787 ]].
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58476
llvm-svn: 354767
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Given an instruction `Opcode`, we can make benchmarks (measurements) of the
instruction characteristics/performance. Then, to facilitate further analysis
we group the benchmarks with *similar* characteristics into clusters.
Now, this is all not entirely deterministic. Some instructions have variable
characteristics, depending on their arguments. And thus, if we do several
benchmarks of the same instruction `Opcode`, we may end up with *different*
performance characteristics measurements. And when we then do clustering,
these several benchmarks of the same instruction `Opcode` may end up being
clustered into *different* clusters. This is not great for further analysis.
We shall find every `Opcode` with benchmarks not in just one cluster, and move
*all* the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`.
I have solved this by making `ClusterId` a bit field, adding a `IsUnstable` bit,
and introducing `-analysis-display-unstable-clusters` switch to toggle between
displaying stable-only clusters and unstable-only clusters.
The reclusterization is deterministically stable, produces identical reports
between runs. (Or at least that is what i'm seeing, maybe it isn't)
Timings/comparisons:
old (current trunk/head) {F8303582}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):
6624.73 msec task-clock # 0.999 CPUs utilized ( +- 0.53% )
172 context-switches # 25.965 M/sec ( +- 29.89% )
0 cpu-migrations # 0.042 M/sec ( +- 56.54% )
31073 page-faults # 4690.754 M/sec ( +- 0.08% )
26538711696 cycles # 4006230.292 GHz ( +- 0.53% ) (83.31%)
2017496807 stalled-cycles-frontend # 7.60% frontend cycles idle ( +- 0.93% ) (83.32%)
13403650062 stalled-cycles-backend # 50.51% backend cycles idle ( +- 0.33% ) (33.37%)
19770706799 instructions # 0.74 insn per cycle
# 0.68 stalled cycles per insn ( +- 0.04% ) (50.04%)
4419821812 branches # 667207369.714 M/sec ( +- 0.03% ) (66.69%)
121741669 branch-misses # 2.75% of all branches ( +- 0.28% ) (83.34%)
6.6283 +- 0.0358 seconds time elapsed ( +- 0.54% )
```
patch, with reclustering but without filtering (i.e. outputting all the stable *and* unstable clusters) {F8303586}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs):
6475.29 msec task-clock # 0.999 CPUs utilized ( +- 0.31% )
213 context-switches # 32.952 M/sec ( +- 23.81% )
1 cpu-migrations # 0.130 M/sec ( +- 43.84% )
31287 page-faults # 4832.057 M/sec ( +- 0.08% )
25939086577 cycles # 4006160.279 GHz ( +- 0.31% ) (83.31%)
1958812858 stalled-cycles-frontend # 7.55% frontend cycles idle ( +- 0.68% ) (83.32%)
13218961512 stalled-cycles-backend # 50.96% backend cycles idle ( +- 0.29% ) (33.37%)
19752995402 instructions # 0.76 insn per cycle
# 0.67 stalled cycles per insn ( +- 0.04% ) (50.04%)
4417079244 branches # 682195472.305 M/sec ( +- 0.03% ) (66.70%)
121510065 branch-misses # 2.75% of all branches ( +- 0.19% ) (83.34%)
6.4832 +- 0.0229 seconds time elapsed ( +- 0.35% )
```
Funnily, *this* measurement shows that said reclustering actually improved performance.
patch, with reclustering, only the stable clusters {F8303594}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs):
6387.71 msec task-clock # 0.999 CPUs utilized ( +- 0.13% )
133 context-switches # 20.792 M/sec ( +- 23.39% )
0 cpu-migrations # 0.063 M/sec ( +- 61.24% )
31318 page-faults # 4903.256 M/sec ( +- 0.08% )
25591984967 cycles # 4006786.266 GHz ( +- 0.13% ) (83.31%)
1881234904 stalled-cycles-frontend # 7.35% frontend cycles idle ( +- 0.25% ) (83.33%)
13209749965 stalled-cycles-backend # 51.62% backend cycles idle ( +- 0.16% ) (33.36%)
19767554347 instructions # 0.77 insn per cycle
# 0.67 stalled cycles per insn ( +- 0.04% ) (50.03%)
4417480305 branches # 691618858.046 M/sec ( +- 0.03% ) (66.68%)
118676358 branch-misses # 2.69% of all branches ( +- 0.07% ) (83.33%)
6.3954 +- 0.0118 seconds time elapsed ( +- 0.18% )
```
Performance improved even further?! Makes sense i guess, less clusters to print.
patch, with reclustering, only the unstable clusters {F8303601}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs):
6124.96 msec task-clock # 1.000 CPUs utilized ( +- 0.20% )
194 context-switches # 31.709 M/sec ( +- 20.46% )
0 cpu-migrations # 0.039 M/sec ( +- 49.77% )
31413 page-faults # 5129.261 M/sec ( +- 0.06% )
24536794267 cycles # 4006425.858 GHz ( +- 0.19% ) (83.31%)
1676085087 stalled-cycles-frontend # 6.83% frontend cycles idle ( +- 0.46% ) (83.32%)
13035595603 stalled-cycles-backend # 53.13% backend cycles idle ( +- 0.16% ) (33.36%)
18260877653 instructions # 0.74 insn per cycle
# 0.71 stalled cycles per insn ( +- 0.05% ) (50.03%)
4112411983 branches # 671484364.603 M/sec ( +- 0.03% ) (66.68%)
114066929 branch-misses # 2.77% of all branches ( +- 0.11% ) (83.32%)
6.1278 +- 0.0121 seconds time elapsed ( +- 0.20% )
```
This tells us that the actual `-analysis-inconsistencies-output-file=` outputting only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps)
(Also, wow this is fast, it used to take several minutes originally)
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40715 | PR40715 ]].
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58355
llvm-svn: 354441
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The patch adds support for --hash-filenames to llvm-cov. This option adds md5
hash of the source path to the name of the generated .gcov file. The option is
crucial for cases where you have multiple files with the same name but can't
use --preserve-paths as resulting filenames exceed the limit.
from gcov(1):
```
-x
--hash-filenames
By default, gcov uses the full pathname of the source files to to
create an output filename. This can lead to long filenames that
can overflow filesystem limits. This option creates names of the
form source-file##md5.gcov, where the source-file component is
the final filename part and the md5 component is calculated from
the full mangled name that would have been used otherwise.
```
Patch by Igor Ignatev!
Differential Revision: https://reviews.llvm.org/D58370
llvm-svn: 354379
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: courbet, gchatelet
Reviewed By: courbet, gchatelet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54895
llvm-svn: 354250
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Up until the point i have looked in the source, i didn't even understood that
i can disable 'cluster' output. I have always silenced it via ` &> /dev/null`.
(And hoped it wasn't contributing much of the run time.)
While i expect that it has it's use-cases i never once needed it so far.
If i forget to silence it, console is completely flooded with that output.
How about not expecting users to opt-out of analyses,
but to explicitly specify the analyses that should be performed?
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57648
llvm-svn: 353021
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This just uses the latency benchmark runner on the parallel uops snippet
generator.
Fixes PR37698.
Reviewers: gchatelet
Subscribers: tschuett, RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D57000
llvm-svn: 352632
|
|
|
|
|
|
| |
The address isn't dynamically relocated. The object is.
llvm-svn: 352477
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a stack trace or similar has a list of addresses from an executable
or DSO loaded at a variable address (e.g. due to ASLR), the addresses
will not directly correspond to the addresses stored in the object file.
If a user wishes to use llvm-symbolizer, they have to subtract the load
address from every address. This is somewhat inconvenient, especially as
the output of --print-address will result in the adjusted address being
listed, rather than the address coming from the stack trace, making it
harder to map results between the two.
This change adds a new switch to llvm-symbolizer --adjust-vma which
takes an offset, which is then used to automatically do this
calculation. The printed address remains the input address (allowing for
easy mapping), whilst the specified offset is applied to the addresses
when performing the lookup.
The switch is conceptually similar to llvm-objdump's new switch of the
same name (see D57051), which in turn mirrors a GNU switch. There is no
equivalent switch in addr2line.
Reviewed by: grimar
Differential Revision: https://reviews.llvm.org/D57151
llvm-svn: 352195
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change adds two options, -i and -inlines as aliases for the -inlining option to llvm-symbolizer to improve compatibility with the GNU addr2line utility which accepts these options.
It also modifies existing tests that use -inlining to exercise these new aliases as well.
This fixes PR40073.
Reviewed by: jhenderson, Quolyk, ruiu
Differential Revision: https://reviews.llvm.org/D57083
llvm-svn: 351999
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes https://bugs.llvm.org/show_bug.cgi?id=40072.
GNU addr2line's --functions switch is off by default, has a short alias
of -f, and does not take an argument. This patch changes llvm-symbolizer
to allow the second and third point (changing the default behaviour may
have negative impacts on users). If the option is missing a value, it
now treats it as "linkage".
This change does cause one previously valid command-line to behave
differently. Before --functions <value> was accepted, but now only
--functions=<value> is allowed (as well as --functions). The old
behaviour will result in the value being treated as a positional
argument.
The previous testing for --functions=short has been pulled out into a
new test that also tests the other accepted values and option formats.
Reviewed by: ruiu
Differential Revision: https://reviews.llvm.org/D57049
llvm-svn: 351968
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The old diagnostic form of the trace produced by -v and -vv looks
like:
```
check1:1:8: remark: CHECK: expected string found in input
CHECK: abc
^
<stdin>:1:3: note: found here
; abc def
^~~
```
When dumping annotated input is requested (via -dump-input), I find
that this old trace is not useful and is sometimes harmful:
1. The old trace is mostly redundant because the same basic
information also appears in the input dump's annotations.
2. The old trace buries any error diagnostic between it and the input
dump, but I find it useful to see any error diagnostic up front.
3. FILECHECK_OPTS=-dump-input=fail requests annotated input dumps only
for failed FileCheck calls. However, I have to also add -v or -vv
to get a full set of annotations, and that can produce massive
output from all FileCheck calls in all tests. That's a real
problem when I run this in the IDE I use, which grinds to a halt as
it tries to capture all that output.
When -dump-input=fail|always, this patch suppresses the old trace from
-v or -vv. Error diagnostics still print as usual. If you want the
old trace, perhaps to see variable expansions, you can set
-dump-input=none (the default).
Reviewed By: probinson
Differential Revision: https://reviews.llvm.org/D55825
llvm-svn: 351881
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes https://bugs.llvm.org/show_bug.cgi?id=40068.
--basenames is a GNU addr2line switch which strips the directory names
from the file path in the output.
Reviewed by: ruiu
Differential Revision: https://reviews.llvm.org/D56919
llvm-svn: 351795
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Provides -no-demangle as alias for -demangle=false. Motivation: https://bugs.llvm.org/show_bug.cgi?id=40075
Reviewers: jhenderson, ruiu
Reviewed By: jhenderson
Subscribers: erik.pilkington, rupprecht, llvm-commits
Differential Revision: https://reviews.llvm.org/D56773
llvm-svn: 351735
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change adds demangling support to the ELF side of llvm-readobj,
under the switch --demangle/-C.
The following places are demangled: symbol table dumps (static and
dynamic), relocation dumps (static and dynamic), addrsig dumps, call
graph profile dumps, and group section signature symbols.
Although GNU readelf doesn't support demangling, it is still a useful
feature to have, and brings it on a par with llvm-objdump's
capabilities.
This fixes https://bugs.llvm.org/show_bug.cgi?id=40054.
Reviewed by: grimar, rupprecht
Differential Revision: https://reviews.llvm.org/D56791
llvm-svn: 351450
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Provides -C as alias to -demangle. Motivation: https://bugs.llvm.org/show_bug.cgi?id=40069.
Reviewers: jhenderson, ruiu, rnk, fjricci
Reviewed By: jhenderson, ruiu
Subscribers: rupprecht, erik.pilkington, llvm-commits
Differential Revision: https://reviews.llvm.org/D56591
llvm-svn: 351300
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When running llvm-objdump with the -macho option objdump will by default
disassemble only the __TEXT,__text section (or __TEXT_EXEC,__text when
disassembling MH_KEXT_BUNDLE files). The -disassemble-all option is
treated no diferently than -disassemble.
This change upates llvm-objdump's MachO parsing code to disassemble all
__text sections found in a file when -disassemble-all is specified. This
is useful for disassembling files with more than one __text section, or
when disassembling files whose __text section is not present in __TEXT.
I added a lit test case that verifies "llvm-objdump -m -d" and
"llvm-objdump -m -D" produce the expected results on a reference binary.
I also updated the CommandGuide documentation for llvm-objdump.rst and
verified it renders correctly as man and html.
rdar://42899338
Reviewers: ab, pete, lhames
Reviewed By: lhames
Subscribers: rupprecht, llvm-commits
Differential Revision: https://reviews.llvm.org/D56649
llvm-svn: 351238
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Part of the effort to refactoring frame pointer code generation. We used
to use two function attributes "no-frame-pointer-elim" and
"no-frame-pointer-elim-non-leaf" to represent three kinds of frame
pointer usage: (all) frames use frame pointer, (non-leaf) frames use
frame pointer, (none) frame use frame pointer. This CL makes the idea
explicit by using only one enum function attribute "frame-pointer"
Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as
llc.
"no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still
supported for easy migration to "frame-pointer".
tests are mostly updated with
// replace command line args ‘-disable-fp-elim=false’ with ‘-frame-pointer=none’
grep -iIrnl '\-disable-fp-elim=false' * | xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g"
// replace command line args ‘-disable-fp-elim’ with ‘-frame-pointer=all’
grep -iIrnl '\-disable-fp-elim' * | xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g"
Patch by Yuanfang Chen (tabloid.adroit)!
Differential Revision: https://reviews.llvm.org/D56351
llvm-svn: 351049
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Provides -addresses, -a as aliases for -print-address. Motivation: https://bugs.llvm.org/show_bug.cgi?id=40067.
Reviewers: jhenderson, ruiu, rnk, fjricci
Reviewed By: jhenderson
Subscribers: rupprecht, llvm-commits
Differential Revision: https://reviews.llvm.org/D56635
llvm-svn: 351043
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Provides -exe, -e as aliases to -obj. Motivation: https://bugs.llvm.org/show_bug.cgi?id=40071
Reviewers: ruiu, rnk, fjricci, jhenderson
Reviewed By: jhenderson
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56580
llvm-svn: 350925
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Provides -p as a short alias for -pretty-print. Motivation: https://bugs.llvm.org/show_bug.cgi?id=40076
Reviewers: samsonov, khemant, ruiu, rnk, fjricci, jhenderson
Reviewed By: jhenderson
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56542
llvm-svn: 350832
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch improves llvm-profdata show command:
(1) add -value-cutoff=<N> option: Show only those functions whose max count
values are greater or equal to N.
(2) add -list-below-cutoff option: Only output names of functions whose max
count value are below the cutoff.
(3) formats value-profile counts and prints out percentage.
Differential Revision: https://reviews.llvm.org/D56342
llvm-svn: 350673
|
|
|
|
|
|
| |
Will re-commit it using the correct commit message.
llvm-svn: 350670
|
|
|
|
|
|
|
|
|
|
| |
In LTO or Thin-lto mode (though linker plugin), the module
names are of temp file names which are different for
different compilations. Using SourceFileName avoids the issue.
This should not change any functionality for current PGO as
all the current callers of getPGOFuncName() is before LTO.
llvm-svn: 350579
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Extend FileCheck to dump its input annotated with FileCheck's
diagnostics: errors, good matches if -v, and additional information if
-vv. The goal is to make it easier to visualize FileCheck's matching
behavior when debugging.
Each patch in this series implements input annotations for a
particular category of FileCheck diagnostics. While the first few
patches alone are somewhat useful, the annotations become much more
useful as later patches implement annotations for -v and -vv
diagnostics, which show the matching behavior leading up to the error.
This first patch implements boilerplate plus input annotations for
error diagnostics reporting that no matches were found for a
directive. These annotations mark the search ranges of the failed
directives. Instead of using the usual `^~~`, which is used by later
patches for good matches, these annotations use `X~~` so that this
category of errors is visually distinct.
For example:
```
$ FileCheck -dump-input=help
The following description was requested by -dump-input=help to
explain the input annotations printed by -dump-input=always and
-dump-input=fail:
- L: labels line number L of the input file
- T:L labels the match result for a pattern of type T from line L of
the check file
- X~~ marks search range when no match is found
- colors error
If you are not seeing color above or in input dumps, try: -color
$ FileCheck -v -dump-input=always check1 < input1 |& sed -n '/^Input file/,$p'
Input file: <stdin>
Check file: check1
-dump-input=help describes the format of the following dump.
Full input was:
<<<<<<
1: ; abc def
2: ; ghI jkl
next:3 X~~~~~~~~ error: no match found
>>>>>>
$ cat check1
CHECK: abc
CHECK-SAME: def
CHECK-NEXT: ghi
CHECK-SAME: jkl
$ cat input1
; abc def
; ghI jkl
```
Some additional details related to the boilerplate:
* Enabling: The annotated input dump is enabled by `-dump-input`,
which can also be set via the `FILECHECK_OPTS` environment variable.
Accepted values are `help`, `always`, `fail`, or `never`. As shown
above, `help` describes the format of the dump. `always` is helpful
when you want to investigate a successful FileCheck run, perhaps for
an unexpected pass. `-dump-input-on-failure` and
`FILECHECK_DUMP_INPUT_ON_FAILURE` remain as a deprecated alias for
`-dump-input=fail`.
* Diagnostics: The usual diagnostics are not suppressed in this mode
and are printed first. For brevity in the example above, I've
omitted them using a sed command. Sometimes they're perfectly
sufficient, and then they make debugging quicker than if you were
forced to hunt through a dump of long input looking for the error.
If you think they'll get in the way sometimes, keep in mind that
it's pretty easy to grep for the start of the input dump, which is
`<<<`.
* Colored Annotations: The annotated input is colored if colors are
enabled (enabling colors can be forced using -color). For example,
errors are red. However, as in the above example, colors are not
vital to reading the annotations.
I don't know how to test color in the output, so any hints here would
be appreciated.
Reviewed By: george.karpenkov, zturner, probinson
Differential Revision: https://reviews.llvm.org/D52999
llvm-svn: 349418
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RetireControlUnitStatistics now reports extra information about the ROB and the
avg/maximum number of entries consumed over the entire simulation.
Example:
Retire Control Unit - number of cycles where we saw N instructions retired:
[# retired], [# cycles]
0, 109 (17.9%)
1, 102 (16.7%)
2, 399 (65.4%)
Total ROB Entries: 64
Max Used ROB Entries: 35 ( 54.7% )
Average Used ROB Entries per cy: 32 ( 50.0% )
Documentation in llvm/docs/CommandGuide/llvmn-mca.rst has been updated to
reflect this change.
llvm-svn: 347493
|
|
|
|
| |
llvm-svn: 346740
|
|
|
|
| |
llvm-svn: 346725
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In some cases it is desirable to match the same pattern repeatedly
many times. Currently the only way to do it is to copy the same
check pattern as many times as needed. And that gets pretty unwieldy
when its more than count is big.
Introducing CHECK-COUNT-<num> directive which acts like a plain CHECK
directive yet matches the same pattern exactly <num> times.
Extended FileCheckType to a struct to add Count there.
Changed some parsing routines to handle non-fixed length of directive
(all currently existing directives were fixed-length).
The code is generic enough to allow future support for COUNT in more
than just PlainCheck directives.
See motivating example for this feature in reviews.llvm.org/D54223.
Reviewed By: chandlerc, dblaikie
Differential Revision: https://reviews.llvm.org/D54336
llvm-svn: 346722
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
lcov tracefiles are used by various coverage reporting tools and build
systems (e.g., Bazel). It is a simple text-based format to parse and
more convenient to use than the JSON export format, which needs
additional processing to map regions/segments back to line numbers.
It's a little unfortunate that "text" format is now overloaded to refer
specifically to JSON for export, but I wanted to avoid making any
breaking changes to the UI of the llvm-cov tool at this time.
Patch by Tony Allevato (@allevato).
Reviewers: Dor1s, vsk
Reviewed By: Dor1s, vsk
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D54266
llvm-svn: 346506
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This feature makes it easy to tune FileCheck diagnostic output when
running the test suite via ninja, a bot, or an IDE. For example:
```
$ FILECHECK_OPTS='-color -v -dump-input-on-failure' \
LIT_FILTER='OpenMP/for_codegen.cpp' ninja check-clang \
| less -R
```
Reviewed By: probinson
Differential Revision: https://reviews.llvm.org/D53517
llvm-svn: 346272
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The pfm counters are now in the ExegesisTarget rather than the
MCSchedModel (PR39165).
This also compresses the pfm counter tables (PR37068).
Reviewers: RKSimon, gchatelet
Subscribers: mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D52932
llvm-svn: 345243
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(Relands r344930, reverted in r344935, and now hopefully fixed for
Windows.)
While this change specifically targets FileCheck, it affects any tool
using the same SourceMgr facilities.
Previously, -color was documented in FileCheck's -help output, but
-color had no effect. Now, -color obeys its documentation: it forces
colors to be used in FileCheck diagnostics even when stderr is not a
terminal.
-color is especially helpful when combined with FileCheck's -v, which
can produce a long series of diagnostics that you might wish to pipe
to a pager, such as less -R. The WithColor extensions here will also
help to clean up color usage in FileCheck's annotated dump of input,
which is proposed in D52999.
Reviewed By: JDevlieghere, zturner
Differential Revision: https://reviews.llvm.org/D53419
llvm-svn: 345202
|
|
|
|
|
|
| |
http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/739
llvm-svn: 344935
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While this change specifically targets FileCheck, it affects any tool
using the same SourceMgr facilities.
Previously, -color was documented in FileCheck's -help output, but
-color had no effect. Now, -color obeys its documentation: it forces
colors to be used in FileCheck diagnostics even when stderr is not a
terminal.
-color is especially helpful when combined with FileCheck's -v, which
can produce a long series of diagnostics that you might wish to pipe
to a pager, such as less -R. The WithColor extensions here will also
help to clean up color usage in FileCheck's annotated dump of input,
which is proposed in D52999.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D53419
llvm-svn: 344930
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We try to recover gracefully on instructions that would crash the
program.
This includes some refactoring of runMeasurement() implementations.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D53371
llvm-svn: 344695
|
|
|
|
| |
llvm-svn: 343215
|
|
|
|
|
|
| |
llvm-exegesis.rst was using invalid indentation for bullet points.
llvm-svn: 342948
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is a step towards fixing PR38048.
Note that right now the measurements are given per instruction. We'll
need to give measurements a per code snippet and update the analysis (PR38731).
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D52041
llvm-svn: 342947
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
add a tool to generate symbol remapping files.
Summary:
The new tool llvm-cxxmap builds a symbol mapping table from a file containing
a description of partial equivalences to apply to mangled names and files
containing old and new symbol tables.
Reviewers: davidxl
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D51470
llvm-svn: 342168
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DispatchStatistics view.
This patch introduces the following changes to the DispatchStatistics view:
* DispatchStatistics now reports the number of dispatched opcodes instead of
the number of dispatched instructions.
* The "Dynamic Dispatch Stall Cycles" table now also reports the percentage of
stall cycles against the total simulated cycles.
This change allows users to easily compare dispatch group sizes with the
processor DispatchWidth.
Before this change, it was difficult to correlate the two numbers, since
DispatchStatistics view reported numbers of instructions (instead of opcodes).
DispatchWidth defines the maximum size of a dispatch group in terms of number of
micro opcodes.
The other change introduced by this patch is related to how DispatchStage
generates "instruction dispatch" events.
In particular:
* There can be multiple dispatch events associated with a same instruction
* Each dispatch event now encapsulates the number of dispatched micro opcodes.
The number of micro opcodes declared by an instruction may exceed the processor
DispatchWidth. Therefore, we cannot assume that instructions are always fully
dispatched in a single cycle.
DispatchStage knows already how to handle instructions declaring a number of
opcodes bigger that DispatchWidth. However, DispatchStage always emitted a
single instruction dispatch event (during the first simulated dispatch cycle)
for instructions dispatched.
With this patch, DispatchStage now correctly notifies multiple dispatch events
for instructions that cannot be dispatched in a single cycle.
A few views had to be modified. Views can no longer assume that there can only
be one dispatch event per instruction.
Tests (and docs) have been updated.
Differential Revision: https://reviews.llvm.org/D51430
llvm-svn: 341055
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
generated by the SummaryView.
This patch adds two new fields to the perf report generated by the SummaryView.
Fields are now logically organized into two small groups; only the second group
contains throughput indicators.
Example:
```
Iterations: 100
Instructions: 300
Total Cycles: 414
Total uOps: 700
Dispatch Width: 4
uOps Per Cycle: 1.69
IPC: 0.72
Block RThroughput: 4.0
```
This patch also updates the docs for llvm-mca.
Due to the nature of this change, several tests in the tools/llvm-mca directory
were affected, and had to be updated using script `update_mca_test_checks.py`.
llvm-svn: 340946
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this patch, the SchedulerStatistics only printed the maximum number of
buffer entries consumed in each scheduler's queue at a given point of the
simulation.
This patch restructures the reported table, and adds an extra field named
"Average number of used buffer entries" to it.
This patch also uses different colors to help identifying bottlenecks caused by
high scheduler's buffer pressure.
llvm-svn: 340746
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D48842
llvm-svn: 340677
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds the plumbing for the Tiny code model for the AArch64 backend. This,
instead of loading addresses through the normal ADRP;ADD pair used in the Small
model, uses a single ADR. The 21 bit range of an ADR means that the code and
its statically defined symbols need to be within 1MB of each other.
This makes it mostly interesting for embedded applications where we want to fit
as much as we can in as small a space as possible.
Differential Revision: https://reviews.llvm.org/D49673
llvm-svn: 340397
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add a CommandGuide for llvm-objdump summarizing its usage along with some
general context.
Reviewers: beanz
Reviewed By: beanz
Subscribers: Eugene.Zelenko, llvm-commits
Differential Revision: https://reviews.llvm.org/D50034
llvm-svn: 339250
|
|
|
|
|
|
| |
Hopefully fixes an issue on the docs build bot.
llvm-svn: 338980
|
|
|
|
|
|
|
| |
Also fixed a few undecorated 'llvm-mca' references to be highlighted
with the 'program' emphasis.
llvm-svn: 338900
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Scheduler.
This patch is a follow-up to r338702.
We don't need to use a map to model the wait/ready/issued sets. It is much more
efficient to use a vector instead.
This patch gives us an average 7.5% speedup (on top of the ~12% speedup obtained
after r338702).
llvm-svn: 338883
|