bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[llvm-objdump] Add `Version Definitions` dumper	Xing GUO	2019-02-26	1	-3/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: `llvm-objdump` needs a `Version Definitions` dumper. Reviewers: grimar, jhenderson Reviewed By: grimar, jhenderson Subscribers: rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58615 llvm-svn: 354871
*	[llvm-objdump] Implement -Mreg-names-raw/-std options.	Igor Kudrin	2019-02-26	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The --disassembler-options, or -M, are used to customize the disassembler and affect its output. The two implemented options allow selecting register names on ARM: * With -Mreg-names-raw, the disassembler uses rNN for all registers. * With -Mreg-names-std it prints sp, lr and pc for r13, r14 and r15, which is the default behavior of llvm-objdump. Differential Revision: https://reviews.llvm.org/D57680 llvm-svn: 354870
*	[llvm-exegesis] Teach llvm-exegesis to handle instructions with multiple ↵	Clement Courbet	2019-02-26	1	-17/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	tied variables. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58285 llvm-svn: 354862
*	[llvm-objcopy] Add --set-start, --change-start and --adjust-start	Eugene Leviant	2019-02-26	5	-1/+48
\| \| \| \| \| \|	Differential revision: https://reviews.llvm.org/D58173 llvm-svn: 354854
*	Revert "Improve "llvm-nm -f sysv" output for Elf files"	Vlad Tsyrklevich	2019-02-26	2	-44/+14
\| \| \| \| \| \| \|	This reverts commit r354833, it was causing ASan test failures on sanitizer-x86_64-linux-fast. llvm-svn: 354849
*	Improve "llvm-nm -f sysv" output for Elf files	Sunil Srivastava	2019-02-26	2	-14/+44
\| \| \| \| \| \| \| \|	Specifically, compute and Print Type and Section columns. Differential Revision: https://reviews.llvm.org/D58263 llvm-svn: 354833
*	[llvm-objcopy] Add --add-symbol	Eugene Leviant	2019-02-25	6	-2/+106
\| \| \| \| \| \|	Differential revision: https://reviews.llvm.org/D58234 llvm-svn: 354787
*	[llvm-objdump] Add `Version References` dumper	Xing GUO	2019-02-25	3	-1/+74
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add symbol version dumper for [#30241](https://bugs.llvm.org/show_bug.cgi?id=30241) Reviewers: jhenderson, MaskRay, kristina, emaste, grimar Reviewed By: jhenderson, grimar Subscribers: grimar, rupprecht, jakehehrlich, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54697 llvm-svn: 354782
*	[yaml2obj]Re-allow dynamic sections to have raw content	James Henderson	2019-02-25	1	-4/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recently, support was added to yaml2obj to allow dynamic sections to have a list of entries, to make it easier to write tests with dynamic sections. However, this change also removed the ability to provide custom contents to the dynamic section, making it hard to test malformed contents (e.g. because the section is not a valid size to contain an array of entries). This change reinstates this. An error is emitted if raw content and dynamic entries are both specified. Reviewed by: grimar, ruiu Differential Review: https://reviews.llvm.org/D58543 llvm-svn: 354770
*	[llvm-exegesis] Split Epsilon param into two (PR40787)	Roman Lebedev	2019-02-25	5	-19/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This eps param is used for two distinct things: * initial point clusterization * checking clusters against the llvm values What if one wants to only look at highly different clusters, without changing the clustering itself? In particular, this helps to weed out noisy measurements (since the clusterization epsilon is still small, so there is a better chance that noisy measurements from the same opcode will go into different clusters) By splitting it into two params it is now possible. This is nearly-free performance-wise: Old: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs): 390.01 msec task-clock # 0.998 CPUs utilized ( +- 0.25% ) 12 context-switches # 31.735 M/sec ( +- 27.38% ) 0 cpu-migrations # 0.000 K/sec 4745 page-faults # 12183.732 M/sec ( +- 0.54% ) 1562711900 cycles # 4012303.327 GHz ( +- 0.24% ) (82.90%) 185567822 stalled-cycles-frontend # 11.87% frontend cycles idle ( +- 0.52% ) (83.30%) 392106234 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.31% ) (33.79%) 1839236666 instructions # 1.18 insn per cycle # 0.21 stalled cycles per insn ( +- 0.15% ) (50.37%) 407035764 branches # 1045074878.710 M/sec ( +- 0.12% ) (66.80%) 10896459 branch-misses # 2.68% of all branches ( +- 0.17% ) (83.20%) 0.390629 +- 0.000972 seconds time elapsed ( +- 0.25% ) ``` ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs): 6803.36 msec task-clock # 0.999 CPUs utilized ( +- 0.96% ) 262 context-switches # 38.546 M/sec ( +- 23.06% ) 0 cpu-migrations # 0.065 M/sec ( +- 76.03% ) 13287 page-faults # 1953.206 M/sec ( +- 0.32% ) 27252537904 cycles # 4006024.257 GHz ( +- 0.95% ) (83.31%) 1496314935 stalled-cycles-frontend # 5.49% frontend cycles idle ( +- 0.97% ) (83.32%) 16128404524 stalled-cycles-backend # 59.18% backend cycles idle ( +- 0.30% ) (33.37%) 17611143370 instructions # 0.65 insn per cycle # 0.92 stalled cycles per insn ( +- 0.05% ) (50.04%) 3894906599 branches # 572537147.437 M/sec ( +- 0.03% ) (66.69%) 116314514 branch-misses # 2.99% of all branches ( +- 0.20% ) (83.35%) 6.8118 +- 0.0689 seconds time elapsed ( +- 1.01%) ``` New: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs): 400.14 msec task-clock # 0.998 CPUs utilized ( +- 0.66% ) 12 context-switches # 29.429 M/sec ( +- 25.95% ) 0 cpu-migrations # 0.100 M/sec ( +-100.00% ) 4714 page-faults # 11796.496 M/sec ( +- 0.55% ) 1603131306 cycles # 4011840.105 GHz ( +- 0.66% ) (82.85%) 199538509 stalled-cycles-frontend # 12.45% frontend cycles idle ( +- 2.40% ) (83.10%) 402249109 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.19% ) (34.05%) 1847783963 instructions # 1.15 insn per cycle # 0.22 stalled cycles per insn ( +- 0.18% ) (50.64%) 407162722 branches # 1018925730.631 M/sec ( +- 0.12% ) (67.02%) 10932779 branch-misses # 2.69% of all branches ( +- 0.51% ) (83.28%) 0.40077 +- 0.00267 seconds time elapsed ( +- 0.67% ) lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs): 6947.79 msec task-clock # 1.000 CPUs utilized ( +- 0.90% ) 217 context-switches # 31.236 M/sec ( +- 36.16% ) 1 cpu-migrations # 0.096 M/sec ( +- 50.00% ) 13258 page-faults # 1908.389 M/sec ( +- 0.34% ) 27830796523 cycles # 4006032.286 GHz ( +- 0.89% ) (83.30%) 1504554006 stalled-cycles-frontend # 5.41% frontend cycles idle ( +- 2.10% ) (83.32%) 16716574843 stalled-cycles-backend # 60.07% backend cycles idle ( +- 0.65% ) (33.38%) 17755545931 instructions # 0.64 insn per cycle # 0.94 stalled cycles per insn ( +- 0.09% ) (50.04%) 3897255686 branches # 560980426.597 M/sec ( +- 0.06% ) (66.70%) 117045395 branch-misses # 3.00% of all branches ( +- 0.47% ) (83.34%) 6.9507 +- 0.0627 seconds time elapsed ( +- 0.90% ) ``` I.e. it's +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps. Within noise i'd say. Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 \| PR40787 ]]. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58476 llvm-svn: 354767
*	[XRay][tools] Revert "Use Support/JSON.h in llvm-xray convert"	Roman Lebedev	2019-02-25	1	-48/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This reverts D50129 / rL338834: [XRay][tools] Use Support/JSON.h in llvm-xray convert Abstractions are great. Readable code is great. JSON support library is a good idea. However unfortunately, there is an internal detail that one needs to be aware of in `llvm::json::Object` - it uses `llvm::DenseMap`. So for every `llvm::json::Object`, even if you only store a single `int` entry there, you pay the whole price of `llvm::DenseMap`. Unfortunately, it matters for `llvm-xray`. I was trying to analyse the `llvm-exegesis` analysis mode performance, and for that i wanted to view the LLVM X-Ray log visualization in Chrome trace viewer. And the `llvm-xray convert` is sluggish, and sometimes even ended up being killed by OOM. `xray-log.llvm-exegesis.lwZ0sT` was acquired from `llvm-exegesis` (compiled with ` -fxray-instruction-threshold=128`) analysis mode over `-benchmarks-file` with 10099 points (one full latency measurement set), with normal runtime of 0.387s. Timings: Old: (copied from D58580) ``` $ perf stat -r 5 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (5 runs): 21346.24 msec task-clock # 1.000 CPUs utilized ( +- 0.28% ) 314 context-switches # 14.701 M/sec ( +- 59.13% ) 1 cpu-migrations # 0.037 M/sec ( +-100.00% ) 2181354 page-faults # 102191.251 M/sec ( +- 0.02% ) 85477442102 cycles # 4004415.019 GHz ( +- 0.28% ) (83.33%) 14526427066 stalled-cycles-frontend # 16.99% frontend cycles idle ( +- 0.70% ) (83.33%) 32371533721 stalled-cycles-backend # 37.87% backend cycles idle ( +- 0.27% ) (33.34%) 67896890228 instructions # 0.79 insn per cycle # 0.48 stalled cycles per insn ( +- 0.03% ) (50.00%) 14592654840 branches # 683631198.653 M/sec ( +- 0.02% ) (66.67%) 212207534 branch-misses # 1.45% of all branches ( +- 0.94% ) (83.34%) 21.3502 +- 0.0585 seconds time elapsed ( +- 0.27% ) ``` New: ``` $ perf stat -r 9 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (9 runs): 7178.38 msec task-clock # 1.000 CPUs utilized ( +- 0.26% ) 182 context-switches # 25.402 M/sec ( +- 28.84% ) 0 cpu-migrations # 0.046 M/sec ( +- 70.71% ) 33701 page-faults # 4694.994 M/sec ( +- 0.88% ) 28761053971 cycles # 4006833.933 GHz ( +- 0.26% ) (83.32%) 2028297997 stalled-cycles-frontend # 7.05% frontend cycles idle ( +- 1.61% ) (83.32%) 10773154901 stalled-cycles-backend # 37.46% backend cycles idle ( +- 0.38% ) (33.36%) 36199132874 instructions # 1.26 insn per cycle # 0.30 stalled cycles per insn ( +- 0.03% ) (50.02%) 6434504227 branches # 896420204.421 M/sec ( +- 0.03% ) (66.68%) 73355176 branch-misses # 1.14% of all branches ( +- 1.46% ) (83.33%) 7.1807 +- 0.0190 seconds time elapsed ( +- 0.26% ) ``` So using `llvm::json` nearly triples run-time on that test case. (+3x is times, not percent.) Memory: Old: ``` total runtime: 39.88s. bytes allocated in total (ignoring deallocations): 79.07GB (1.98GB/s) calls to allocation functions: 33267816 (834135/s) temporary memory allocations: 5832298 (146235/s) peak heap memory consumption: 9.21GB peak RSS (including heaptrack overhead): 147.98GB total memory leaked: 1.09MB ``` New: ``` total runtime: 17.42s. bytes allocated in total (ignoring deallocations): 5.12GB (293.86MB/s) calls to allocation functions: 21382982 (1227284/s) temporary memory allocations: 232858 (13364/s) peak heap memory consumption: 350.69MB peak RSS (including heaptrack overhead): 2.55GB total memory leaked: 79.95KB ``` Diff: ``` total runtime: -22.46s. bytes allocated in total (ignoring deallocations): -73.95GB (3.29GB/s) calls to allocation functions: -11884834 (529155/s) temporary memory allocations: -5599440 (249307/s) peak heap memory consumption: -8.86GB peak RSS (including heaptrack overhead): 0B total memory leaked: -1.01MB ``` So using `llvm::json` increases peak memory consumption on this testcase ~+27x. And total allocation count +15x. Both of these numbers are times, not percent. And note that memory usage is clearly unbound with `llvm::json`, it directly depends on the length of the log, so peak memory consumption is always increasing. This isn't so with the dumb code, there is no accumulating memory consumption, peak memory consumption is fixed. Naturally, that means it will handle much larger logs without OOM'ing. Readability is good, but the price is simply unacceptable here. Too bad none of this analysis was done as part of the development/review D50129 itself. Reviewers: dberris, kpw, sammccall Reviewed By: dberris Subscribers: riccibruno, hans, courbet, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58584 llvm-svn: 354764
*	[obj2yaml] - Do not miss section index for special symbols.	George Rimar	2019-02-22	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes https://bugs.llvm.org/show_bug.cgi?id=40786 ("obj2yaml symbol output missing section index for SHN_ABS and SHN_COMMON symbols") Since SHN_ABS and SHN_COMMON symbols are special, we should preserve the st_shndx for them. The patch does this for them and the other special symbols. The test case is based on the test provided by James Henderson at the bug page! Differential revision: https://reviews.llvm.org/D58498 llvm-svn: 354661
*	[llvm-objcopy][NFC] Add std::move() to fix older BB	Jordan Rupprecht	2019-02-21	1	-2/+2
\| \| \| \|	llvm-svn: 354603
*	[llvm-objcopy][NFC] More error cleanup	Jordan Rupprecht	2019-02-21	3	-89/+177
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This removes calls to `error()`/`reportError()` in the main driver (llvm-objcopy.cpp) as well as the associated argv-parsing (CopyConfig.cpp). `logAllUnhandledErrors()` is now the main way to print errors. There are still a few uses from within the per-arch drivers, so we can't delete them yet... but almost! Reviewers: jhenderson, alexshap, espindola Reviewed By: jhenderson Subscribers: emaste, arichardson, jakehehrlich, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58316 llvm-svn: 354600
*	[llvm-objcopy] Make removeSectionReferences batched	Jordan Rupprecht	2019-02-21	2	-17/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Removing a large number of sections from a file with a lot of symbols can have abysmal (i.e. O(n^2)) performance, e.g. when running `--only-section` to extract one section out of a large file. This comes from iterating over all symbols in the symbol table each time we remove a section, to remove references to the section we just removed. Instead, do just one pass of symbol removal by passing a hash set of all the sections we'd like to remove references to. This fixes a regression when running llvm-objcopy -j <one section> on an object file with many sections and symbols -- on my machine, running `objcopy -j .keep_me huge-input.o /tmp/foo.o` takes .3s with GNU objcopy, 1.3s with an updated llvm-objcopy, and 7+ minutes with llvm-objcopy prior to this patch. Reviewers: MaskRay, jhenderson, jakehehrlich, alexshap, espindola Reviewed By: MaskRay, jhenderson Subscribers: echristo, emaste, arichardson, mgrang, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58296 llvm-svn: 354597
*	[yaml2obj][obj2yaml] - Support SHT_GNU_verdef (.gnu.version_d) section.	George Rimar	2019-02-21	2	-9/+120
\| \| \| \| \| \| \| \| \| \|	This patch adds support for parsing/dumping the .gnu.version section. Description of the section is: https://refspecs.linuxfoundation.org/LSB_1.3.0/gLSB/gLSB/symverdefs.html Differential revision: https://reviews.llvm.org/D58437 llvm-svn: 354574
*	[llvm-readobj] Change "SHT_MIPS_DWARF" to "MIPS_DWARF"	Fangrui Song	2019-02-21	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is to be consistent with the display of other MIPS section types. This string is also used by binutils-gdb/binutils/readelf.c:get_mips_section_type_name Since we are here, reorder the two enum constatns because SHT_MIPS_DWARF < SHT_MIPS_ABIFLAGS. Reviewers: jhenderson, atanasyan Reviewed By: jhenderson Subscribers: aprantl, sdardis, arichardson, rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58496 llvm-svn: 354571
*	Fix file header issues in fuzzers. NFC	Fangrui Song	2019-02-21	7	-7/+7
\| \| \| \|	llvm-svn: 354551
*	Fix some include order and file headers issues. NFC	Fangrui Song	2019-02-21	7	-8/+8
\| \| \| \|	llvm-svn: 354550
*	[MCA][ResourceManager] Add a table that maps processor resource indices to ↵	Andrea Di Biagio	2019-02-20	2	-6/+11
\| \| \| \| \| \| \| \| \| \| \| \|	processor resource identifiers. This patch adds a lookup table to speed up resource queries in the ResourceManager. This patch also moves helper function 'getResourceStateIndex()' from ResourceManager.cpp to Support.h, so that we can reuse that logic in the SummaryView (and potentially other views in llvm-mca). No functional change intended. llvm-svn: 354470
*	Fix the build with gcc/libstdc++ 4.8.2 after r354441	Hans Wennborg	2019-02-20	1	-2/+2
\| \| \| \|	llvm-svn: 354469
*	[yaml2elf] - Rename a variable. NFC.	George Rimar	2019-02-20	1	-2/+2
\| \| \| \| \| \|	Was suggested during review of D58441. llvm-svn: 354463
*	[yaml2obj] - Simplify implementation. NFCI.	George Rimar	2019-02-20	1	-21/+15
\| \| \| \| \| \| \| \| \| \| \| \|	Knowing about how types are declared for 32/64 bit platforms: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/BinaryFormat/ELF.h#L28 it is possible to simplify code that writes a binary a bit. The patch does that. Differential revision: https://reviews.llvm.org/D58441 llvm-svn: 354462
*	[llvm-exegesis] Opcode stabilization / reclusterization (PR40715)	Roman Lebedev	2019-02-20	6	-16/+152
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Given an instruction `Opcode`, we can make benchmarks (measurements) of the instruction characteristics/performance. Then, to facilitate further analysis we group the benchmarks with similar characteristics into clusters. Now, this is all not entirely deterministic. Some instructions have variable characteristics, depending on their arguments. And thus, if we do several benchmarks of the same instruction `Opcode`, we may end up with different performance characteristics measurements. And when we then do clustering, these several benchmarks of the same instruction `Opcode` may end up being clustered into different clusters. This is not great for further analysis. We shall find every `Opcode` with benchmarks not in just one cluster, and move all the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`. I have solved this by making `ClusterId` a bit field, adding a `IsUnstable` bit, and introducing `-analysis-display-unstable-clusters` switch to toggle between displaying stable-only clusters and unstable-only clusters. The reclusterization is deterministically stable, produces identical reports between runs. (Or at least that is what i'm seeing, maybe it isn't) Timings/comparisons: old (current trunk/head) {F8303582} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs): 6624.73 msec task-clock # 0.999 CPUs utilized ( +- 0.53% ) 172 context-switches # 25.965 M/sec ( +- 29.89% ) 0 cpu-migrations # 0.042 M/sec ( +- 56.54% ) 31073 page-faults # 4690.754 M/sec ( +- 0.08% ) 26538711696 cycles # 4006230.292 GHz ( +- 0.53% ) (83.31%) 2017496807 stalled-cycles-frontend # 7.60% frontend cycles idle ( +- 0.93% ) (83.32%) 13403650062 stalled-cycles-backend # 50.51% backend cycles idle ( +- 0.33% ) (33.37%) 19770706799 instructions # 0.74 insn per cycle # 0.68 stalled cycles per insn ( +- 0.04% ) (50.04%) 4419821812 branches # 667207369.714 M/sec ( +- 0.03% ) (66.69%) 121741669 branch-misses # 2.75% of all branches ( +- 0.28% ) (83.34%) 6.6283 +- 0.0358 seconds time elapsed ( +- 0.54% ) ``` patch, with reclustering but without filtering (i.e. outputting all the stable and unstable clusters) {F8303586} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs): 6475.29 msec task-clock # 0.999 CPUs utilized ( +- 0.31% ) 213 context-switches # 32.952 M/sec ( +- 23.81% ) 1 cpu-migrations # 0.130 M/sec ( +- 43.84% ) 31287 page-faults # 4832.057 M/sec ( +- 0.08% ) 25939086577 cycles # 4006160.279 GHz ( +- 0.31% ) (83.31%) 1958812858 stalled-cycles-frontend # 7.55% frontend cycles idle ( +- 0.68% ) (83.32%) 13218961512 stalled-cycles-backend # 50.96% backend cycles idle ( +- 0.29% ) (33.37%) 19752995402 instructions # 0.76 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.04%) 4417079244 branches # 682195472.305 M/sec ( +- 0.03% ) (66.70%) 121510065 branch-misses # 2.75% of all branches ( +- 0.19% ) (83.34%) 6.4832 +- 0.0229 seconds time elapsed ( +- 0.35% ) ``` Funnily, this measurement shows that said reclustering actually improved performance. patch, with reclustering, only the stable clusters {F8303594} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs): 6387.71 msec task-clock # 0.999 CPUs utilized ( +- 0.13% ) 133 context-switches # 20.792 M/sec ( +- 23.39% ) 0 cpu-migrations # 0.063 M/sec ( +- 61.24% ) 31318 page-faults # 4903.256 M/sec ( +- 0.08% ) 25591984967 cycles # 4006786.266 GHz ( +- 0.13% ) (83.31%) 1881234904 stalled-cycles-frontend # 7.35% frontend cycles idle ( +- 0.25% ) (83.33%) 13209749965 stalled-cycles-backend # 51.62% backend cycles idle ( +- 0.16% ) (33.36%) 19767554347 instructions # 0.77 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.03%) 4417480305 branches # 691618858.046 M/sec ( +- 0.03% ) (66.68%) 118676358 branch-misses # 2.69% of all branches ( +- 0.07% ) (83.33%) 6.3954 +- 0.0118 seconds time elapsed ( +- 0.18% ) ``` Performance improved even further?! Makes sense i guess, less clusters to print. patch, with reclustering, only the unstable clusters {F8303601} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs): 6124.96 msec task-clock # 1.000 CPUs utilized ( +- 0.20% ) 194 context-switches # 31.709 M/sec ( +- 20.46% ) 0 cpu-migrations # 0.039 M/sec ( +- 49.77% ) 31413 page-faults # 5129.261 M/sec ( +- 0.06% ) 24536794267 cycles # 4006425.858 GHz ( +- 0.19% ) (83.31%) 1676085087 stalled-cycles-frontend # 6.83% frontend cycles idle ( +- 0.46% ) (83.32%) 13035595603 stalled-cycles-backend # 53.13% backend cycles idle ( +- 0.16% ) (33.36%) 18260877653 instructions # 0.74 insn per cycle # 0.71 stalled cycles per insn ( +- 0.05% ) (50.03%) 4112411983 branches # 671484364.603 M/sec ( +- 0.03% ) (66.68%) 114066929 branch-misses # 2.77% of all branches ( +- 0.11% ) (83.32%) 6.1278 +- 0.0121 seconds time elapsed ( +- 0.20% ) ``` This tells us that the actual `-analysis-inconsistencies-output-file=` outputting only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps) (Also, wow this is fast, it used to take several minutes originally) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40715 \| PR40715 ]]. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon Tags: #llvm Differential Revision: https://reviews.llvm.org/D58355 llvm-svn: 354441
*	[WebAssembly] Update MC for bulk memory	Thomas Lively	2019-02-19	2	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Rename MemoryIndex to InitFlags and implement logic for determining data segment layout in ObjectYAML and MC. Also adds a "passive" flag for the .section assembler directive although this cannot be assembled yet because the assembler does not support data sections. Reviewers: sbc100, aardappel, aheejin, dschuff Subscribers: jgravelle-google, hiraditya, sunfish, rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57938 llvm-svn: 354397
*	[llvm-cov] Add support for gcov --hash-filenames option	Vedant Kumar	2019-02-19	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The patch adds support for --hash-filenames to llvm-cov. This option adds md5 hash of the source path to the name of the generated .gcov file. The option is crucial for cases where you have multiple files with the same name but can't use --preserve-paths as resulting filenames exceed the limit. from gcov(1): ``` -x --hash-filenames By default, gcov uses the full pathname of the source files to to create an output filename. This can lead to long filenames that can overflow filesystem limits. This option creates names of the form source-file##md5.gcov, where the source-file component is the final filename part and the md5 component is calculated from the full mangled name that would have been used otherwise. ``` Patch by Igor Ignatev! Differential Revision: https://reviews.llvm.org/D58370 llvm-svn: 354379
*	Revert "Revert "[llvm-objdump] Allow short options without arguments to be ↵	Matthew Voss	2019-02-19	2	-19/+27
\| \| \| \| \| \| \| \| \| \| \| \|	grouped"" - Tests that use multiple short switches now test them grouped and ungrouped. - Ensure the output of ungrouped and grouped variants is identical Differential Revision: https://reviews.llvm.org/D57904 llvm-svn: 354375
*	[yaml2obj][obj2yaml] - Support SHT_GNU_versym (.gnu.version) section.	George Rimar	2019-02-19	2	-0/+49
\| \| \| \| \| \| \| \| \|	This patch adds support for parsing dumping the .gnu.version section. Description of the section is: https://refspecs.linuxfoundation.org/LSB_1.3.0/gLSB/gLSB/symversion.html#SYMVERTBL Differential revision: https://reviews.llvm.org/D58280 llvm-svn: 354338
*	Recommit r354328, r354329 "[obj2yaml][yaml2obj] - Add support of ↵	George Rimar	2019-02-19	2	-9/+164
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	parsing/dumping of the .gnu.version_r section." Fix: Replace assert(!IO.getContext() && "The IO context is initialized already"); with assert(IO.getContext() && "The IO context is not initialized"); (this was introduced in r354329, where I tried to quickfix the darwin BB and seems copypasted the assert from the wrong place). Original commit message: The section is described here: https://refspecs.linuxfoundation.org/LSB_1.3.0/gLSB/gLSB/symverrqmts.html Patch just teaches obj2yaml/yaml2obj to dump and parse such sections. We did the finalization of string tables very late, and I had to move the logic to make it a bit earlier. That was needed in this patch since .gnu.version_r adds strings to .dynstr. This might also be useful for implementing other special sections. Everything else changed in this patch seems to be straightforward. Differential revision: https://reviews.llvm.org/D58119 llvm-svn: 354335
*	Revert r354328, r354329 "[obj2yaml][yaml2obj] - Add support of ↵	George Rimar	2019-02-19	2	-164/+9
\| \| \| \| \| \| \| \| \|	parsing/dumping of the .gnu.version_r section." Something went wrong. Bots are unhappy: http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/44113/steps/test/logs/stdio llvm-svn: 354332
*	[obj2yaml][yaml2obj] - Add support of parsing/dumping of the .gnu.version_r ↵	George Rimar	2019-02-19	2	-9/+164
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	section. The section is described here: https://refspecs.linuxfoundation.org/LSB_1.3.0/gLSB/gLSB/symverrqmts.html Patch just teaches obj2yaml/yaml2obj to dump and parse such sections. We did the finalization of string tables very late, and I had to move the logic to make it a bit earlier. That was needed in this patch since .gnu.version_r adds strings to .dynstr. This might also be useful for implementing other special sections. Everything else changed in this patch seems to be straightforward. Differential revision: https://reviews.llvm.org/D58119 llvm-svn: 354328
*	[yaml2obj] - Do not skip zeroes blocks if there are relocations against them.	George Rimar	2019-02-19	1	-10/+16
\| \| \| \| \| \| \| \| \| \| \|	This is for -D -reloc combination. With this patch, we do not skip the zero bytes that have a relocation against them when -reloc is used. If -reloc is not used, then the behavior will be the same. Differential revision: https://reviews.llvm.org/D58174 llvm-svn: 354319
*	[yaml2obj] - Do not ignore explicit addresses for .dynsym and .dynstr	George Rimar	2019-02-19	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \|	This fixes https://bugs.llvm.org/show_bug.cgi?id=40339 Previously if the addresses were set in YAML they were ignored for .dynsym and .dynstr sections. The patch fixes that. Differential revision: https://reviews.llvm.org/D58168 llvm-svn: 354318
*	[llvm-readobj] - Simplify .gnu.version_d dumping.	George Rimar	2019-02-18	1	-10/+1
\| \| \| \| \| \| \| \| \| \| \| \|	This is similar to D58048. Instead of scanning the dynamic table to read the DT_VERDEFNUM, we could take it from the sh_info field. (https://docs.oracle.com/cd/E19683-01/816-1386/chapter6-94076/index.html) The patch does this. llvm-svn: 354270
*	llvm-nm: Observe -no-llvm-bc for archive members	Dave Lee	2019-02-16	1	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This change fixes the `-no-llvm-bc` flag to work with object files within archives. Currently the `-no-llvm-bc` flag works for regular object files, but not static libraries, where it continues to show bitcode symbol info. Original support was added in D4371. Reviewers: compnerd, smeenai, pcc Reviewed By: compnerd Subscribers: rupprecht, keith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D48798 llvm-svn: 354196
*	[llvm-cxxfilt] Fix a comment typo. NFC.	Matt Davis	2019-02-15	1	-1/+1
\| \| \| \|	llvm-svn: 354094
*	[llvm-ar] Implement the P modifier.	Jordan Rupprecht	2019-02-14	1	-17/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: GNU ar has a `P` modifier that changes filename comparisons to use full paths instead of the basename. As noted in the GNU docs, regular archives are not created with full path names, so P is used to deal with archives created by other archive programs (e.g. see the updated `absolute-paths.test` test case). Since thin archives use full path names -- paths are relative to the archive -- it seems very error prone to not imply P when dealing with thin archives, so P is implied in those cases. (I think this is a deviation from GNU ar that makes sense). This fixes PR37436 via https://github.com/ClangBuiltLinux/linux/issues/33. Reviewers: mstorsjo, pcc, ruiu, davide, david2050, rnk Subscribers: tpimh, llvm-commits, nickdesaulniers Tags: #llvm Differential Revision: https://reviews.llvm.org/D57927 llvm-svn: 354044
*	Revert "[llvm-objdump] Allow short options without arguments to be grouped"	Matthew Voss	2019-02-14	2	-27/+19
\| \| \| \| \| \| \| \|	Reverted due to failures on the llvm-hexagon-elf. This reverts commit 77e1f27476c89f65eeb496d131065177e6417f23. llvm-svn: 354002
*	[llvm-objdump] Allow short options without arguments to be grouped	Matthew Voss	2019-02-14	2	-19/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: https://bugs.llvm.org/show_bug.cgi?id=31679 Reviewers: kristina, jhenderson, grimar, jakehehrlich, rupprecht Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57904 llvm-svn: 353998
*	[llvm-ar][libObject] Fix relative paths when nesting thin archives.	Jordan Rupprecht	2019-02-13	1	-6/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: When adding one thin archive to another, we currently chop off the relative path to the flattened members. For instance, when adding `foo/child.a` (which contains `x.txt`) to `parent.a`, when flattening it we should add it as `foo/x.txt` (which exists) instead of `x.txt` (which does not exist). As a note, this also undoes the `IsNew` parameter of handling relative paths in r288280. The unit test there still passes. This was reported as part of testing the kernel build with llvm-ar: https://patchwork.kernel.org/patch/10767545/ (see the second point). Reviewers: mstorsjo, pcc, ruiu, davide, david2050, inglorion Reviewed By: ruiu Subscribers: void, jdoerfert, tpimh, mgorny, hans, nickdesaulniers, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57842 llvm-svn: 353995
*	[llvm-readobj] Dump GNU_PROPERTY_X86_ISA_1_{NEEDED,USED} notes in ↵	Fangrui Song	2019-02-13	1	-0/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	.note.gnu.property Reviewers: grimar, rupprecht Reviewed By: rupprecht Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58175 llvm-svn: 353991
*	[llvm-readobj] Rename pr_data to PrData	Fangrui Song	2019-02-13	1	-12/+12
\| \| \| \| \| \|	As requested by grimar in D58112. llvm-svn: 353951
*	[llvm-objcopy] Add --strip-unneeded-symbol(s)	Eugene Leviant	2019-02-13	5	-8/+32
\| \| \| \| \| \|	Differential revision: https://reviews.llvm.org/D58027 llvm-svn: 353919
*	[llvm-readobj] Dump GNU_PROPERTY_X86_FEATURE_2_{NEEDED,USED} notes in ↵	Fangrui Song	2019-02-13	1	-17/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	.note.gnu.property Summary: And change the output ("X86 features" -> "x86 feature") a bit. Reviewers: grimar, xiangzhangllvm, hjl.tools, rupprecht Reviewed By: rupprecht Subscribers: rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58112 llvm-svn: 353908
*	[dsymutil] Improve readability of cloneAllCompileUnits (NFC)	Jonas Devlieghere	2019-02-13	1	-5/+13
\| \| \| \| \| \|	Add some newlines and improve consistency between the two loops. llvm-svn: 353904
*	[dsymutil] Don't clone empty CUs	Jonas Devlieghere	2019-02-13	2	-9/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The DWARF standard says that an empty compile unit is not valid: > Each such contribution consists of a compilation unit header (see > Section 7.5.1.1 on page 200) followed by a single DW_TAG_compile_unit or > DW_TAG_partial_unit debugging information entry, together with its > children. Therefore we shouldn't clone them in dsymutil. Differential revision: https://reviews.llvm.org/D57979 llvm-svn: 353903
*	[llvm-dwp] Use color-formatted error reporting	Jordan Rupprecht	2019-02-12	1	-2/+3
\| \| \| \|	llvm-svn: 353876
*	[llvm-dwp] Avoid writing the output dwp file when there is an error	Jordan Rupprecht	2019-02-12	1	-6/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Use ToolOutputFile to clean up the output file unless dwp actually finishes successfully. Reviewers: dblaikie Reviewed By: dblaikie Subscribers: jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58130 llvm-svn: 353873
*	[llvm-dwp] Abort when dwo_id is unset	Jordan Rupprecht	2019-02-12	1	-3/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: An empty dwo_id indicates a degenerate .dwo file that should not have been generated in the first place. Instead of discovering this error later when merging with another degenerate .dwo file, print an error immediately when noticing an unset dwo_id, including the filename of the offending file. Test case created by compiling a trivial file w/ `-fno-split-dwarf-inlining -gmlt -gsplit-dwarf -c` prior to r353771 Reviewers: dblaikie Reviewed By: dblaikie Subscribers: jdoerfert, aprantl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58085 llvm-svn: 353846
*	[llvm-readobj] Only allow 4-byte pr_data	Fangrui Song	2019-02-12	1	-6/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: AMD64 psABI says: "The pr_data field of each property contains a 4-byte unsigned integer." Thus we don't need to handle 8-byte pr_data. Reviewers: mike.dvoretsky, grimar, craig.topper, xiangzhangllvm, hjl.tools Reviewed By: grimar Subscribers: rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58103 llvm-svn: 353815