summaryrefslogtreecommitdiffstats
path: root/llvm/tools
Commit message (Collapse)AuthorAgeFilesLines
...
* [llvm-readobj] Display section names for STT_SECTION symbols.Matt Davis2019-03-011-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch will obtain the section name for symbols that refer to a section. Prior to this patch the Name field for STT_SECTIONs was blank, now it is populated. Before: ``` Symbol table '.symtab' contains 6 entries: Num: Value Size Type Bind Vis Ndx Name 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 0000000000000000 0 SECTION LOCAL DEFAULT 1 2: 0000000000000000 0 SECTION LOCAL DEFAULT 3 3: 0000000000000000 0 SECTION LOCAL DEFAULT 4 4: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _GLOBAL_OFFSET_TABLE_ 5: 0000000000000000 0 TLS GLOBAL DEFAULT UND sym ``` With this patch: ``` Symbol table '.symtab' contains 6 entries: Num: Value Size Type Bind Vis Ndx Name 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .text 2: 0000000000000000 0 SECTION LOCAL DEFAULT 3 .data 3: 0000000000000000 0 SECTION LOCAL DEFAULT 4 .bss 4: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _GLOBAL_OFFSET_TABLE_ 5: 0000000000000000 0 TLS GLOBAL DEFAULT UND sym ``` This fixes PR40788 Reviewers: jhenderson, rupprecht, espindola Reviewed By: rupprecht Subscribers: emaste, javed.absar, arichardson, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58796 llvm-svn: 355207
* [yaml2obj] - Allow setting custom sh_info for RawContentSection sections.George Rimar2019-03-012-7/+12
| | | | | | | | | | | This is for tweaking SHT_SYMTAB sections. Their sh_info contains the (number of symbols + 1) usually. But for creating invalid inputs for test cases it would be convenient to allow explicitly override this field from YAML. Differential revision: https://reviews.llvm.org/D58779 llvm-svn: 355193
* Attempt to fix buildbot after r354972 [#2]. NFCI.Alexey Lapshin2019-03-011-1/+14
| | | | llvm-svn: 355192
* [CommandLine] Allow grouping options which can have values.Igor Kudrin2019-03-011-1/+1
| | | | | | | | | | | | | | | | This patch allows all forms of values for options to be used at the end of a group. With the fix, it is possible to follow the way GNU binutils tools handle grouping options better. For example, the -j option can be used with objdump in any of the following ways: $ objdump -d -j .text a.o $ objdump -d -j.text a.o $ objdump -dj .text a.o $ objdump -dj.text a.o Differential Revision: https://reviews.llvm.org/D58711 llvm-svn: 355185
* llvm-readobj: Try the DWARF CFI dumper on all machines.Peter Collingbourne2019-02-281-8/+5
| | | | | | | | | | | | There's no reason to limit the DWARF CFI dumper to EM_386 and EM_X86_64; ELF files could contain DWARF CFI on almost any platform (even 32-bit ARM; NetBSD uses DWARF CFI on that platform). So start using the DWARF CFI dumper unconditionally so that we can dump .eh_frame sections on the remaining ELF platforms as well as in NetBSD binaries. Differential Revision: https://reviews.llvm.org/D58761 llvm-svn: 355151
* dsymutil support for DW_OP_convertAdrian Prantl2019-02-284-48/+160
| | | | | | | | | | | Add support for cloning DWARF expressions that contain base type DIE references in dsymutil. <rdar://problem/48167812> Differential Revision: https://reviews.llvm.org/D58534 llvm-svn: 355148
* [PGO] Context sensitive PGO (part 2)Rong Xu2019-02-281-6/+15
| | | | | | | | | | | Part 2 of CSPGO changes (mostly related to ProfileSummary). Note that I use a default parameter in setProfileSummary() and getSummary(). This is to break the dependency in clang. I will make the parameter explicit after changing clang in a separated patch. Differential Revision: https://reviews.llvm.org/D54175 llvm-svn: 355131
* [dsymutil] Use rfind for paths with parenthesesJonas Devlieghere2019-02-281-1/+1
| | | | | | | | Dsymutil gets library member information is through the ambiguous /path/to/archive.a(member.o). The current logic we use would get confused by additional parentheses. Using rfind mitigates this issue. llvm-svn: 355114
* llvm-config: Include -stdlib= in --cxxflagsTom Stellard2019-02-281-1/+2
| | | | | | | | | | | | | | | | | | Summary: This was removed in r349068, but it is needed when llvm is compiled using the non-default c++ standard library on a platform. Reviewers: sylvestre.ledru, infinity0, mgorny, cuviper Reviewed By: sylvestre.ledru Subscribers: jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57859 llvm-svn: 355107
* [llvm-objdump] - Improve the error message for "removing a section that is ↵George Rimar2019-02-281-6/+6
| | | | | | | | | | used by relocation" case. This refines/improves the error message introduced in D58625 Differential revision: https://reviews.llvm.org/D58709 llvm-svn: 355074
* [llvm-readobj] - Fix the invalid dumping of the dynamic sections without ↵George Rimar2019-02-281-19/+13
| | | | | | | | | | | | | | | | terminating DT_NULL entry. This is https://bugs.llvm.org/show_bug.cgi?id=40861, Previously llvm-readobj would print the DT_NULL sometimes for the dynamic section that has no terminator entry. The logic of printDynamicTable was a bit overcomplicated. I rewrote it slightly to fix the issue and commented. Differential revision: https://reviews.llvm.org/D58716 llvm-svn: 355073
* [llvm-cxxfilt] Re-enable split and demangle stdin input on certain ↵Matt Davis2019-02-271-7/+38
| | | | | | | | | | | | | | | | non-alphanumerics. This restores the patch that splits demangled stdin input on non-alphanumerics. I had reverted this patch earlier because it broke Windows build-bots. I have updated the test so that it passes on Windows. I was running the test from powershell and never saw the issue until I switched to the mingw shell. This reverts commit 628ab5c6820bdf3bb5a8e494b0fd9e7312ce7150. llvm-svn: 355031
* Revert "[llvm-cxxfilt] Split and demangle stdin input on certain ↵Matt Davis2019-02-271-38/+7
| | | | | | | | | | | | | non-alphanumerics." This reverts commit 5cd5f8f2563395f8767f94604eb4c4bea8dcbea0. The test passes on linux, but fails on the windows build-bots. This test failure seems to be a quoting issue between my test and FileCheck on Windows. I'm reverting this patch until I can replicate and fix in my Windows environment. llvm-svn: 355021
* [llvm-readobj] Print section type values for unknown sections.Matt Davis2019-02-271-2/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch displays a hexadecimal section value (Elf_Shdr::sh_type) or section-relative offset when printing unknown sections. Here is a subset of the output (ignoring the fields following "Type" when dumping an ELF's GNU `--section-headers` table). Section Headers: ``` [Nr] Name Type [16] android_rel LOOS+0x1 [17] android_rela LOOS+0x2 [27] unknown 0x1000: <unknown> [28] loos LOOS+0 [30] hios VERSYM [31] loproc LOPROC+0 [33] hiproc LOPROC+0xFFFFFFF [34] louser LOUSER+0 [36] hiuser LOUSER+0x7FFFFFFF ``` As a comparison, the previous output looked something like the above, but with a blank "Type" field: ``` [Nr] Name Type [27] unknown [28] loos [30] hios VERSYM [31] loproc [33] hiproc [34] louser [36] hiuser ``` This fixes PR40773 Reviewers: jhenderson, rupprecht, Bigcheese Reviewed By: jhenderson, rupprecht, Bigcheese Subscribers: MaskRay, Bigcheese, srhines, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58701 llvm-svn: 355014
* Recommit r354930 "[PGO] Context sensitive PGO (part 1)"Rong Xu2019-02-272-1/+24
| | | | | | Fixed UBSan failures. llvm-svn: 355005
* [llvm-objdump] Should print strings when dumping DT_RPATH, DT_RUNPATH, ↵Xing GUO2019-02-271-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DT_SONAME, DT_AUXILIARY and DT_FILTER tags in dynamic section. Summary: Before: ``` Dynamic Section: NEEDED libpthread.so.0 ... NEEDED ld-linux-x86-64.so.2 RPATH 0x00000000001c2e61 ``` After: ``` Dynamic Section: NEEDED libpthread.so.0 ... NEEDED ld-linux-x86-64.so.2 RPATH $ORIGIN/../lib ``` Only a small problem here, I have no idea on choosing test case. I see there's a test file(test/tools/llvm-objdump/private-headers-dynamic-section.test). But it has no DT_RPATH and DT_RUNPATH tags. Shall I replace the ELF file in the Inputs dir by a new one? Reviewers: jhenderson, grimar Reviewed By: jhenderson Subscribers: srhines, rupprecht, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58707 llvm-svn: 355001
* [llvm-cxxfilt] Split and demangle stdin input on certain non-alphanumerics.Matt Davis2019-02-271-7/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch attempts to replicate GNU c++-filt behavior when splitting stdin input for demangling. Previously, cxx-filt would split input only on spaces. Each delimited item is then demangled. From what I have tested, GNU c++filt also splits input on any character that does not make up the mangled name (notably commas, but also a large set of non-alphanumeric characters). This patch splits stdin input on any character that does not belong to the Itanium mangling format (since Itanium is currently the only supported format in llvm-cxxfilt). This is an update to PR39990 Reviewers: jhenderson, tejohnson, compnerd Reviewed By: compnerd Subscribers: erik.pilkington, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58416 llvm-svn: 354998
* [DebugInfo] add SectionedAddress to DebugInfo interfaces.Alexey Lapshin2019-02-2715-79/+163
| | | | | | | | | | | | | | | | | That patch is the fix for https://bugs.llvm.org/show_bug.cgi?id=40703 "wrong line number info for obj file compiled with -ffunction-sections" bug. The problem happened with only .o files. If object file contains several .text sections then line number information showed incorrectly. The reason for this is that DwarfLineTable could not detect section which corresponds to specified address(because address is the local to the section). And as the result it could not select proper sequence in the line table. The fix is to pass SectionIndex with the address. So that it would be possible to differentiate addresses from various sections. With this fix llvm-objdump shows correct line numbers for disassembled code. Differential review: https://reviews.llvm.org/D58194 llvm-svn: 354972
* [llvm-objcopy] - Check for invalidated relocations when removing a section.George Rimar2019-02-272-6/+26
| | | | | | | | | | | This is https://bugs.llvm.org/show_bug.cgi?id=40818 Removing a section that is used by relocation is an error we did not report. The patch fixes that. Differential revision: https://reviews.llvm.org/D58625 llvm-svn: 354962
* [llvm-readobj]Fix error messages for bad archive members and add testing for ↵James Henderson2019-02-271-1/+1
| | | | | | | | | | | | | | archive handling llvm-readobj's error messages were broken for bad archive members. This patch fixes them, and also adds testing for archive and thin archive handling within llvm-readobj. Reviewed by: rupprecht, grimar, higuoxing Differential Revision: https://reviews.llvm.org/D58681 llvm-svn: 354960
* Fix Wenum-compare gcc7 warning. NFCI.Simon Pilgrim2019-02-271-3/+3
| | | | llvm-svn: 354958
* [llvm-readobj] Print DF_1_DISPRELPNDFangrui Song2019-02-271-0/+1
| | | | | | The test will be added by D58677. llvm-svn: 354955
* Revert "[PGO] Context sensitive PGO (part 1)"Vlad Tsyrklevich2019-02-272-24/+1
| | | | | | This reverts commit r354930, it was causing UBSan failures. llvm-svn: 354953
* [PGO] Context sensitive PGO (part 1)Rong Xu2019-02-262-1/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current PGO profile counts are not context sensitive. The branch probabilities for the inlined functions are kept the same for all call-sites, and they might be very different from the actual branch probabilities. These suboptimal profiles can greatly affect some downstream optimizations, in particular for the machine basic block placement optimization. In this patch, we propose to have a post-inline PGO instrumentation/use pass, which we called Context Sensitive PGO (CSPGO). For the users who want the best possible performance, they can perform a second round of PGO instrument/use on the top of the regular PGO. They will have two sets of profile counts. The first pass profile will be manly for inline, indirect-call promotion, and CGSCC simplification pass optimizations. The second pass profile is for post-inline optimizations and code-gen optimizations. A typical usage: // Regular PGO instrumentation and generate pass1 profile. > clang -O2 -fprofile-generate source.c -o gen > ./gen > llvm-profdata merge default.*profraw -o pass1.profdata // CSPGO instrumentation. > clang -O2 -fprofile-use=pass1.profdata -fcs-profile-generate -o gen2 > ./gen2 // Merge two sets of profiles > llvm-profdata merge default.*profraw pass1.profdata -o profile.profdata // Use the combined profile. Pass manager will invoke two PGO use passes. > clang -O2 -fprofile-use=profile.profdata -o use This change touches many components in the compiler. The reviewed patch (D54175) will committed in phrases. Differential Revision: https://reviews.llvm.org/D54175 llvm-svn: 354930
* [llvm-objdump] Add `Version Definitions` dumperXing GUO2019-02-261-3/+44
| | | | | | | | | | | | | | | | Summary: `llvm-objdump` needs a `Version Definitions` dumper. Reviewers: grimar, jhenderson Reviewed By: grimar, jhenderson Subscribers: rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58615 llvm-svn: 354871
* [llvm-objdump] Implement -Mreg-names-raw/-std options.Igor Kudrin2019-02-261-0/+13
| | | | | | | | | | | | | | The --disassembler-options, or -M, are used to customize the disassembler and affect its output. The two implemented options allow selecting register names on ARM: * With -Mreg-names-raw, the disassembler uses rNN for all registers. * With -Mreg-names-std it prints sp, lr and pc for r13, r14 and r15, which is the default behavior of llvm-objdump. Differential Revision: https://reviews.llvm.org/D57680 llvm-svn: 354870
* [llvm-exegesis] Teach llvm-exegesis to handle instructions with multiple ↵Clement Courbet2019-02-261-17/+47
| | | | | | | | | | | | | | tied variables. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58285 llvm-svn: 354862
* [llvm-objcopy] Add --set-start, --change-start and --adjust-startEugene Leviant2019-02-265-1/+48
| | | | | | Differential revision: https://reviews.llvm.org/D58173 llvm-svn: 354854
* Revert "Improve "llvm-nm -f sysv" output for Elf files"Vlad Tsyrklevich2019-02-262-44/+14
| | | | | | | This reverts commit r354833, it was causing ASan test failures on sanitizer-x86_64-linux-fast. llvm-svn: 354849
* Improve "llvm-nm -f sysv" output for Elf filesSunil Srivastava2019-02-262-14/+44
| | | | | | | | Specifically, compute and Print Type and Section columns. Differential Revision: https://reviews.llvm.org/D58263 llvm-svn: 354833
* [llvm-objcopy] Add --add-symbolEugene Leviant2019-02-256-2/+106
| | | | | | Differential revision: https://reviews.llvm.org/D58234 llvm-svn: 354787
* [llvm-objdump] Add `Version References` dumperXing GUO2019-02-253-1/+74
| | | | | | | | | | | | | | | | Summary: Add symbol version dumper for [#30241](https://bugs.llvm.org/show_bug.cgi?id=30241) Reviewers: jhenderson, MaskRay, kristina, emaste, grimar Reviewed By: jhenderson, grimar Subscribers: grimar, rupprecht, jakehehrlich, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54697 llvm-svn: 354782
* [yaml2obj]Re-allow dynamic sections to have raw contentJames Henderson2019-02-251-4/+20
| | | | | | | | | | | | | | | | Recently, support was added to yaml2obj to allow dynamic sections to have a list of entries, to make it easier to write tests with dynamic sections. However, this change also removed the ability to provide custom contents to the dynamic section, making it hard to test malformed contents (e.g. because the section is not a valid size to contain an array of entries). This change reinstates this. An error is emitted if raw content and dynamic entries are both specified. Reviewed by: grimar, ruiu Differential Review: https://reviews.llvm.org/D58543 llvm-svn: 354770
* [llvm-exegesis] Split Epsilon param into two (PR40787)Roman Lebedev2019-02-255-19/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This eps param is used for two distinct things: * initial point clusterization * checking clusters against the llvm values What if one wants to only look at highly different clusters, without changing the clustering itself? In particular, this helps to weed out noisy measurements (since the clusterization epsilon is still small, so there is a better chance that noisy measurements from the same opcode will go into different clusters) By splitting it into two params it is now possible. This is nearly-free performance-wise: Old: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs): 390.01 msec task-clock # 0.998 CPUs utilized ( +- 0.25% ) 12 context-switches # 31.735 M/sec ( +- 27.38% ) 0 cpu-migrations # 0.000 K/sec 4745 page-faults # 12183.732 M/sec ( +- 0.54% ) 1562711900 cycles # 4012303.327 GHz ( +- 0.24% ) (82.90%) 185567822 stalled-cycles-frontend # 11.87% frontend cycles idle ( +- 0.52% ) (83.30%) 392106234 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.31% ) (33.79%) 1839236666 instructions # 1.18 insn per cycle # 0.21 stalled cycles per insn ( +- 0.15% ) (50.37%) 407035764 branches # 1045074878.710 M/sec ( +- 0.12% ) (66.80%) 10896459 branch-misses # 2.68% of all branches ( +- 0.17% ) (83.20%) 0.390629 +- 0.000972 seconds time elapsed ( +- 0.25% ) ``` ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs): 6803.36 msec task-clock # 0.999 CPUs utilized ( +- 0.96% ) 262 context-switches # 38.546 M/sec ( +- 23.06% ) 0 cpu-migrations # 0.065 M/sec ( +- 76.03% ) 13287 page-faults # 1953.206 M/sec ( +- 0.32% ) 27252537904 cycles # 4006024.257 GHz ( +- 0.95% ) (83.31%) 1496314935 stalled-cycles-frontend # 5.49% frontend cycles idle ( +- 0.97% ) (83.32%) 16128404524 stalled-cycles-backend # 59.18% backend cycles idle ( +- 0.30% ) (33.37%) 17611143370 instructions # 0.65 insn per cycle # 0.92 stalled cycles per insn ( +- 0.05% ) (50.04%) 3894906599 branches # 572537147.437 M/sec ( +- 0.03% ) (66.69%) 116314514 branch-misses # 2.99% of all branches ( +- 0.20% ) (83.35%) 6.8118 +- 0.0689 seconds time elapsed ( +- 1.01%) ``` New: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs): 400.14 msec task-clock # 0.998 CPUs utilized ( +- 0.66% ) 12 context-switches # 29.429 M/sec ( +- 25.95% ) 0 cpu-migrations # 0.100 M/sec ( +-100.00% ) 4714 page-faults # 11796.496 M/sec ( +- 0.55% ) 1603131306 cycles # 4011840.105 GHz ( +- 0.66% ) (82.85%) 199538509 stalled-cycles-frontend # 12.45% frontend cycles idle ( +- 2.40% ) (83.10%) 402249109 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.19% ) (34.05%) 1847783963 instructions # 1.15 insn per cycle # 0.22 stalled cycles per insn ( +- 0.18% ) (50.64%) 407162722 branches # 1018925730.631 M/sec ( +- 0.12% ) (67.02%) 10932779 branch-misses # 2.69% of all branches ( +- 0.51% ) (83.28%) 0.40077 +- 0.00267 seconds time elapsed ( +- 0.67% ) lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs): 6947.79 msec task-clock # 1.000 CPUs utilized ( +- 0.90% ) 217 context-switches # 31.236 M/sec ( +- 36.16% ) 1 cpu-migrations # 0.096 M/sec ( +- 50.00% ) 13258 page-faults # 1908.389 M/sec ( +- 0.34% ) 27830796523 cycles # 4006032.286 GHz ( +- 0.89% ) (83.30%) 1504554006 stalled-cycles-frontend # 5.41% frontend cycles idle ( +- 2.10% ) (83.32%) 16716574843 stalled-cycles-backend # 60.07% backend cycles idle ( +- 0.65% ) (33.38%) 17755545931 instructions # 0.64 insn per cycle # 0.94 stalled cycles per insn ( +- 0.09% ) (50.04%) 3897255686 branches # 560980426.597 M/sec ( +- 0.06% ) (66.70%) 117045395 branch-misses # 3.00% of all branches ( +- 0.47% ) (83.34%) 6.9507 +- 0.0627 seconds time elapsed ( +- 0.90% ) ``` I.e. it's +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps. Within noise i'd say. Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 | PR40787 ]]. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58476 llvm-svn: 354767
* [XRay][tools] Revert "Use Support/JSON.h in llvm-xray convert"Roman Lebedev2019-02-251-48/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This reverts D50129 / rL338834: [XRay][tools] Use Support/JSON.h in llvm-xray convert Abstractions are great. Readable code is great. JSON support library is a *good* idea. However unfortunately, there is an internal detail that one needs to be aware of in `llvm::json::Object` - it uses `llvm::DenseMap`. So for **every** `llvm::json::Object`, even if you only store a single `int` entry there, you pay the whole price of `llvm::DenseMap`. Unfortunately, it matters for `llvm-xray`. I was trying to analyse the `llvm-exegesis` analysis mode performance, and for that i wanted to view the LLVM X-Ray log visualization in Chrome trace viewer. And the `llvm-xray convert` is sluggish, and sometimes even ended up being killed by OOM. `xray-log.llvm-exegesis.lwZ0sT` was acquired from `llvm-exegesis` (compiled with ` -fxray-instruction-threshold=128`) analysis mode over `-benchmarks-file` with 10099 points (one full latency measurement set), with normal runtime of 0.387s. Timings: Old: (copied from D58580) ``` $ perf stat -r 5 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (5 runs): 21346.24 msec task-clock # 1.000 CPUs utilized ( +- 0.28% ) 314 context-switches # 14.701 M/sec ( +- 59.13% ) 1 cpu-migrations # 0.037 M/sec ( +-100.00% ) 2181354 page-faults # 102191.251 M/sec ( +- 0.02% ) 85477442102 cycles # 4004415.019 GHz ( +- 0.28% ) (83.33%) 14526427066 stalled-cycles-frontend # 16.99% frontend cycles idle ( +- 0.70% ) (83.33%) 32371533721 stalled-cycles-backend # 37.87% backend cycles idle ( +- 0.27% ) (33.34%) 67896890228 instructions # 0.79 insn per cycle # 0.48 stalled cycles per insn ( +- 0.03% ) (50.00%) 14592654840 branches # 683631198.653 M/sec ( +- 0.02% ) (66.67%) 212207534 branch-misses # 1.45% of all branches ( +- 0.94% ) (83.34%) 21.3502 +- 0.0585 seconds time elapsed ( +- 0.27% ) ``` New: ``` $ perf stat -r 9 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (9 runs): 7178.38 msec task-clock # 1.000 CPUs utilized ( +- 0.26% ) 182 context-switches # 25.402 M/sec ( +- 28.84% ) 0 cpu-migrations # 0.046 M/sec ( +- 70.71% ) 33701 page-faults # 4694.994 M/sec ( +- 0.88% ) 28761053971 cycles # 4006833.933 GHz ( +- 0.26% ) (83.32%) 2028297997 stalled-cycles-frontend # 7.05% frontend cycles idle ( +- 1.61% ) (83.32%) 10773154901 stalled-cycles-backend # 37.46% backend cycles idle ( +- 0.38% ) (33.36%) 36199132874 instructions # 1.26 insn per cycle # 0.30 stalled cycles per insn ( +- 0.03% ) (50.02%) 6434504227 branches # 896420204.421 M/sec ( +- 0.03% ) (66.68%) 73355176 branch-misses # 1.14% of all branches ( +- 1.46% ) (83.33%) 7.1807 +- 0.0190 seconds time elapsed ( +- 0.26% ) ``` So using `llvm::json` nearly triples run-time on that test case. (+3x is times, not percent.) Memory: Old: ``` total runtime: 39.88s. bytes allocated in total (ignoring deallocations): 79.07GB (1.98GB/s) calls to allocation functions: 33267816 (834135/s) temporary memory allocations: 5832298 (146235/s) peak heap memory consumption: 9.21GB peak RSS (including heaptrack overhead): 147.98GB total memory leaked: 1.09MB ``` New: ``` total runtime: 17.42s. bytes allocated in total (ignoring deallocations): 5.12GB (293.86MB/s) calls to allocation functions: 21382982 (1227284/s) temporary memory allocations: 232858 (13364/s) peak heap memory consumption: 350.69MB peak RSS (including heaptrack overhead): 2.55GB total memory leaked: 79.95KB ``` Diff: ``` total runtime: -22.46s. bytes allocated in total (ignoring deallocations): -73.95GB (3.29GB/s) calls to allocation functions: -11884834 (529155/s) temporary memory allocations: -5599440 (249307/s) peak heap memory consumption: -8.86GB peak RSS (including heaptrack overhead): 0B total memory leaked: -1.01MB ``` So using `llvm::json` increases *peak* memory consumption on *this* testcase ~+27x. And total allocation count +15x. Both of these numbers are times, *not* percent. And note that memory usage is clearly unbound with `llvm::json`, it directly depends on the length of the log, so peak memory consumption is always increasing. This isn't so with the dumb code, there is no accumulating memory consumption, peak memory consumption is fixed. Naturally, that means it will handle *much* larger logs without OOM'ing. Readability is good, but the price is simply unacceptable here. Too bad none of this analysis was done as part of the development/review D50129 itself. Reviewers: dberris, kpw, sammccall Reviewed By: dberris Subscribers: riccibruno, hans, courbet, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58584 llvm-svn: 354764
* [obj2yaml] - Do not miss section index for special symbols.George Rimar2019-02-221-0/+7
| | | | | | | | | | | | | | This fixes https://bugs.llvm.org/show_bug.cgi?id=40786 ("obj2yaml symbol output missing section index for SHN_ABS and SHN_COMMON symbols") Since SHN_ABS and SHN_COMMON symbols are special, we should preserve the st_shndx for them. The patch does this for them and the other special symbols. The test case is based on the test provided by James Henderson at the bug page! Differential revision: https://reviews.llvm.org/D58498 llvm-svn: 354661
* [llvm-objcopy][NFC] Add std::move() to fix older BBJordan Rupprecht2019-02-211-2/+2
| | | | llvm-svn: 354603
* [llvm-objcopy][NFC] More error cleanupJordan Rupprecht2019-02-213-89/+177
| | | | | | | | | | | | | | | | | | | Summary: This removes calls to `error()`/`reportError()` in the main driver (llvm-objcopy.cpp) as well as the associated argv-parsing (CopyConfig.cpp). `logAllUnhandledErrors()` is now the main way to print errors. There are still a few uses from within the per-arch drivers, so we can't delete them yet... but almost! Reviewers: jhenderson, alexshap, espindola Reviewed By: jhenderson Subscribers: emaste, arichardson, jakehehrlich, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58316 llvm-svn: 354600
* [llvm-objcopy] Make removeSectionReferences batchedJordan Rupprecht2019-02-212-17/+31
| | | | | | | | | | | | | | | | | | | | | | Summary: Removing a large number of sections from a file with a lot of symbols can have abysmal (i.e. O(n^2)) performance, e.g. when running `--only-section` to extract one section out of a large file. This comes from iterating over all symbols in the symbol table each time we remove a section, to remove references to the section we just removed. Instead, do just one pass of symbol removal by passing a hash set of all the sections we'd like to remove references to. This fixes a regression when running llvm-objcopy -j <one section> on an object file with many sections and symbols -- on my machine, running `objcopy -j .keep_me huge-input.o /tmp/foo.o` takes .3s with GNU objcopy, 1.3s with an updated llvm-objcopy, and 7+ minutes with llvm-objcopy prior to this patch. Reviewers: MaskRay, jhenderson, jakehehrlich, alexshap, espindola Reviewed By: MaskRay, jhenderson Subscribers: echristo, emaste, arichardson, mgrang, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58296 llvm-svn: 354597
* [yaml2obj][obj2yaml] - Support SHT_GNU_verdef (.gnu.version_d) section.George Rimar2019-02-212-9/+120
| | | | | | | | | | This patch adds support for parsing/dumping the .gnu.version section. Description of the section is: https://refspecs.linuxfoundation.org/LSB_1.3.0/gLSB/gLSB/symverdefs.html Differential revision: https://reviews.llvm.org/D58437 llvm-svn: 354574
* [llvm-readobj] Change "SHT_MIPS_DWARF" to "MIPS_DWARF"Fangrui Song2019-02-211-2/+2
| | | | | | | | | | | | | | | | | | | | Summary: This is to be consistent with the display of other MIPS section types. This string is also used by binutils-gdb/binutils/readelf.c:get_mips_section_type_name Since we are here, reorder the two enum constatns because SHT_MIPS_DWARF < SHT_MIPS_ABIFLAGS. Reviewers: jhenderson, atanasyan Reviewed By: jhenderson Subscribers: aprantl, sdardis, arichardson, rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58496 llvm-svn: 354571
* Fix file header issues in fuzzers. NFCFangrui Song2019-02-217-7/+7
| | | | llvm-svn: 354551
* Fix some include order and file headers issues. NFCFangrui Song2019-02-217-8/+8
| | | | llvm-svn: 354550
* [MCA][ResourceManager] Add a table that maps processor resource indices to ↵Andrea Di Biagio2019-02-202-6/+11
| | | | | | | | | | | | processor resource identifiers. This patch adds a lookup table to speed up resource queries in the ResourceManager. This patch also moves helper function 'getResourceStateIndex()' from ResourceManager.cpp to Support.h, so that we can reuse that logic in the SummaryView (and potentially other views in llvm-mca). No functional change intended. llvm-svn: 354470
* Fix the build with gcc/libstdc++ 4.8.2 after r354441Hans Wennborg2019-02-201-2/+2
| | | | llvm-svn: 354469
* [yaml2elf] - Rename a variable. NFC.George Rimar2019-02-201-2/+2
| | | | | | Was suggested during review of D58441. llvm-svn: 354463
* [yaml2obj] - Simplify implementation. NFCI.George Rimar2019-02-201-21/+15
| | | | | | | | | | | | Knowing about how types are declared for 32/64 bit platforms: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/BinaryFormat/ELF.h#L28 it is possible to simplify code that writes a binary a bit. The patch does that. Differential revision: https://reviews.llvm.org/D58441 llvm-svn: 354462
* [llvm-exegesis] Opcode stabilization / reclusterization (PR40715)Roman Lebedev2019-02-206-16/+152
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Given an instruction `Opcode`, we can make benchmarks (measurements) of the instruction characteristics/performance. Then, to facilitate further analysis we group the benchmarks with *similar* characteristics into clusters. Now, this is all not entirely deterministic. Some instructions have variable characteristics, depending on their arguments. And thus, if we do several benchmarks of the same instruction `Opcode`, we may end up with *different* performance characteristics measurements. And when we then do clustering, these several benchmarks of the same instruction `Opcode` may end up being clustered into *different* clusters. This is not great for further analysis. We shall find every `Opcode` with benchmarks not in just one cluster, and move *all* the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`. I have solved this by making `ClusterId` a bit field, adding a `IsUnstable` bit, and introducing `-analysis-display-unstable-clusters` switch to toggle between displaying stable-only clusters and unstable-only clusters. The reclusterization is deterministically stable, produces identical reports between runs. (Or at least that is what i'm seeing, maybe it isn't) Timings/comparisons: old (current trunk/head) {F8303582} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs): 6624.73 msec task-clock # 0.999 CPUs utilized ( +- 0.53% ) 172 context-switches # 25.965 M/sec ( +- 29.89% ) 0 cpu-migrations # 0.042 M/sec ( +- 56.54% ) 31073 page-faults # 4690.754 M/sec ( +- 0.08% ) 26538711696 cycles # 4006230.292 GHz ( +- 0.53% ) (83.31%) 2017496807 stalled-cycles-frontend # 7.60% frontend cycles idle ( +- 0.93% ) (83.32%) 13403650062 stalled-cycles-backend # 50.51% backend cycles idle ( +- 0.33% ) (33.37%) 19770706799 instructions # 0.74 insn per cycle # 0.68 stalled cycles per insn ( +- 0.04% ) (50.04%) 4419821812 branches # 667207369.714 M/sec ( +- 0.03% ) (66.69%) 121741669 branch-misses # 2.75% of all branches ( +- 0.28% ) (83.34%) 6.6283 +- 0.0358 seconds time elapsed ( +- 0.54% ) ``` patch, with reclustering but without filtering (i.e. outputting all the stable *and* unstable clusters) {F8303586} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs): 6475.29 msec task-clock # 0.999 CPUs utilized ( +- 0.31% ) 213 context-switches # 32.952 M/sec ( +- 23.81% ) 1 cpu-migrations # 0.130 M/sec ( +- 43.84% ) 31287 page-faults # 4832.057 M/sec ( +- 0.08% ) 25939086577 cycles # 4006160.279 GHz ( +- 0.31% ) (83.31%) 1958812858 stalled-cycles-frontend # 7.55% frontend cycles idle ( +- 0.68% ) (83.32%) 13218961512 stalled-cycles-backend # 50.96% backend cycles idle ( +- 0.29% ) (33.37%) 19752995402 instructions # 0.76 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.04%) 4417079244 branches # 682195472.305 M/sec ( +- 0.03% ) (66.70%) 121510065 branch-misses # 2.75% of all branches ( +- 0.19% ) (83.34%) 6.4832 +- 0.0229 seconds time elapsed ( +- 0.35% ) ``` Funnily, *this* measurement shows that said reclustering actually improved performance. patch, with reclustering, only the stable clusters {F8303594} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs): 6387.71 msec task-clock # 0.999 CPUs utilized ( +- 0.13% ) 133 context-switches # 20.792 M/sec ( +- 23.39% ) 0 cpu-migrations # 0.063 M/sec ( +- 61.24% ) 31318 page-faults # 4903.256 M/sec ( +- 0.08% ) 25591984967 cycles # 4006786.266 GHz ( +- 0.13% ) (83.31%) 1881234904 stalled-cycles-frontend # 7.35% frontend cycles idle ( +- 0.25% ) (83.33%) 13209749965 stalled-cycles-backend # 51.62% backend cycles idle ( +- 0.16% ) (33.36%) 19767554347 instructions # 0.77 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.03%) 4417480305 branches # 691618858.046 M/sec ( +- 0.03% ) (66.68%) 118676358 branch-misses # 2.69% of all branches ( +- 0.07% ) (83.33%) 6.3954 +- 0.0118 seconds time elapsed ( +- 0.18% ) ``` Performance improved even further?! Makes sense i guess, less clusters to print. patch, with reclustering, only the unstable clusters {F8303601} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs): 6124.96 msec task-clock # 1.000 CPUs utilized ( +- 0.20% ) 194 context-switches # 31.709 M/sec ( +- 20.46% ) 0 cpu-migrations # 0.039 M/sec ( +- 49.77% ) 31413 page-faults # 5129.261 M/sec ( +- 0.06% ) 24536794267 cycles # 4006425.858 GHz ( +- 0.19% ) (83.31%) 1676085087 stalled-cycles-frontend # 6.83% frontend cycles idle ( +- 0.46% ) (83.32%) 13035595603 stalled-cycles-backend # 53.13% backend cycles idle ( +- 0.16% ) (33.36%) 18260877653 instructions # 0.74 insn per cycle # 0.71 stalled cycles per insn ( +- 0.05% ) (50.03%) 4112411983 branches # 671484364.603 M/sec ( +- 0.03% ) (66.68%) 114066929 branch-misses # 2.77% of all branches ( +- 0.11% ) (83.32%) 6.1278 +- 0.0121 seconds time elapsed ( +- 0.20% ) ``` This tells us that the actual `-analysis-inconsistencies-output-file=` outputting only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps) (Also, wow this is fast, it used to take several minutes originally) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40715 | PR40715 ]]. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon Tags: #llvm Differential Revision: https://reviews.llvm.org/D58355 llvm-svn: 354441
* [WebAssembly] Update MC for bulk memoryThomas Lively2019-02-192-3/+7
| | | | | | | | | | | | | | | | | | Summary: Rename MemoryIndex to InitFlags and implement logic for determining data segment layout in ObjectYAML and MC. Also adds a "passive" flag for the .section assembler directive although this cannot be assembled yet because the assembler does not support data sections. Reviewers: sbc100, aardappel, aheejin, dschuff Subscribers: jgravelle-google, hiraditya, sunfish, rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57938 llvm-svn: 354397
* [llvm-cov] Add support for gcov --hash-filenames optionVedant Kumar2019-02-191-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | The patch adds support for --hash-filenames to llvm-cov. This option adds md5 hash of the source path to the name of the generated .gcov file. The option is crucial for cases where you have multiple files with the same name but can't use --preserve-paths as resulting filenames exceed the limit. from gcov(1): ``` -x --hash-filenames By default, gcov uses the full pathname of the source files to to create an output filename. This can lead to long filenames that can overflow filesystem limits. This option creates names of the form source-file##md5.gcov, where the source-file component is the final filename part and the md5 component is calculated from the full mangled name that would have been used otherwise. ``` Patch by Igor Ignatev! Differential Revision: https://reviews.llvm.org/D58370 llvm-svn: 354379
OpenPOWER on IntegriCloud