summaryrefslogtreecommitdiffstats
path: root/llvm/tools/llvm-pdbutil/llvm-pdbutil.h
Commit message (Collapse)AuthorAgeFilesLines
* [pdb] Add -type-stats and sort stats by descending sizeReid Kleckner2019-03-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: It prints this on chromium browser_tests.exe.pdb: Types Total: 5647475 entries ( 371,897,512 bytes, 65.85 avg) -------------------------------------------------------------------------- LF_CLASS: 397894 entries ( 119,537,780 bytes, 300.43 avg) LF_STRUCTURE: 236351 entries ( 83,208,084 bytes, 352.05 avg) LF_FIELDLIST: 291003 entries ( 66,087,920 bytes, 227.10 avg) LF_MFUNCTION: 1884176 entries ( 52,756,928 bytes, 28.00 avg) LF_POINTER: 1149030 entries ( 13,877,344 bytes, 12.08 avg) LF_ARGLIST: 789980 entries ( 12,436,752 bytes, 15.74 avg) LF_METHODLIST: 361498 entries ( 8,351,008 bytes, 23.10 avg) LF_ENUM: 16069 entries ( 6,108,340 bytes, 380.13 avg) LF_PROCEDURE: 269374 entries ( 4,309,984 bytes, 16.00 avg) LF_MODIFIER: 235602 entries ( 2,827,224 bytes, 12.00 avg) LF_UNION: 9131 entries ( 2,072,168 bytes, 226.94 avg) LF_VFTABLE: 323 entries ( 207,784 bytes, 643.29 avg) LF_ARRAY: 6639 entries ( 106,380 bytes, 16.02 avg) LF_VTSHAPE: 126 entries ( 6,472 bytes, 51.37 avg) LF_BITFIELD: 278 entries ( 3,336 bytes, 12.00 avg) LF_LABEL: 1 entries ( 8 bytes, 8.00 avg) The PDB is overall 1.9GB, so the LF_CLASS and LF_STRUCTURE declarations account for about 10% of the overall file size. I was surprised to find that on average LF_FIELDLIST records are short. Maybe this is because there are many more types with short member lists than there are instantiations with lots of members, like std::vector. Reviewers: aganea, zturner Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59672 llvm-svn: 356813
* [llvm-pdbutil] Add -type-ref-stats to help find unused type infoReid Kleckner2019-03-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This considers module symbol streams and the global symbol stream to be roots. Most types that this considers "unreferenced" are referenced by LF_UDT_MOD_SRC_LINE id records, which VC seems to always include. Essentially, they are types that the user can only find in the debugger if they call them by name, they cannot be found by traversing a symbol. In practice, around 80% of type information in a PDB is referenced by a symbol. That seems like a reasonable number. I don't really plan to do anything with this tool. It mostly just exists for informational purposes, and to confirm that we probably don't need to implement type reference tracking in LLD. We can continue to merge all types as we do today without wasting space. Reviewers: zturner, aganea Subscribers: mgorny, hiraditya, arphaman, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59620 llvm-svn: 356692
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* Dump public symbol records in pdb2yaml modeZachary Turner2018-10-261-0/+1
| | | | llvm-svn: 345348
* [PDB] Add the ability to lookup global symbols by name.Zachary Turner2018-10-081-0/+1
| | | | | | | | | | The Globals table is a hash table keyed on symbol name, so it's possible to lookup symbols by name in O(1) time. Add a function to the globals stream to do this, and add an option to llvm-pdbutil to exercise this, then use it to write some tests to verify correctness. llvm-svn: 343951
* [PDB] Add support for parsing VFTable Shape records.Zachary Turner2018-10-011-0/+1
| | | | | | This allows them to be returned from the native API. llvm-svn: 343506
* [PDB] Add native support for dumping array types.Zachary Turner2018-09-301-0/+1
| | | | llvm-svn: 343412
* [PDB] Better native API support for pointers.Zachary Turner2018-09-291-0/+1
| | | | | | | | | | | | We didn't properly detect when a pointer was a member pointer, and when that was the case we were not properly returning class parent info. This caused member pointers to render incorrectly in pretty mode. However, we didn't even have pretty tests for pointers in native mode, so those are also added now to ensure this. llvm-svn: 343393
* [NativePDB] Add support for reading function signatures.Zachary Turner2018-09-211-0/+1
| | | | | | | This adds support for parsing function signature records and returning them through the native DIA interface. llvm-svn: 342780
* [PDB] Add the ability to map forward references to full decls.Zachary Turner2018-09-201-0/+1
| | | | | | | | | | | | | | | | | | | | | Some records point to an LF_CLASS, LF_UNION, LF_STRUCTURE, or LF_ENUM which is a forward reference and doesn't contain complete debug information. In these cases, we'd like to be able to quickly locate the full record. The TPI stream stores an array of pre-computed record hash values, one for each type record. If we pre-process this on startup, we can build a mapping from hash value -> {list of possible matching type indices}. Since hashes of full records are only based on the name and or unique name and not the full record contents, we can then use forward ref record to compute the hash of what *would* be the full record by just hashing the name, use this to get the list of possible matches, and iterate those looking for a match on name or unique name. llvm-pdbutil is updated to resolve forward references for the purposes of testing (plus it's just useful). Differential Revision: https://reviews.llvm.org/D52283 llvm-svn: 342656
* [PDB] Write FPO Data to the PDB.Zachary Turner2018-09-111-0/+1
| | | | llvm-svn: 342003
* [PDB] One more fix for hasing GSI records.Zachary Turner2018-07-061-0/+2
| | | | | | | | | | | | | | | | The reference implementation uses a case-insensitive string comparison for strings of equal length. This will cause the string "tEo" to compare less than "VUo". However we were using a case sensitive comparison, which would generate the opposite outcome. Switch to a case insensitive comparison. Also, when one of the strings contains non-ascii characters, fallback to a straight memcmp. The only way to really test this is with a DIA test. Before this patch, the test will fail (but succeed if link.exe is used instead of lld-link). After the patch, it succeeds even with lld-link. llvm-svn: 336464
* [llvm-pdbutil] Dump more info about globals.Zachary Turner2018-07-061-0/+1
| | | | | | | | | | | | | | | We add an option to dump the entire global / public symbol record stream. Previously we would dump globals or publics, but not both. And when we did dump them, we would always dump them in the order they were referenced by the corresponding hash streams, not in the order they were serialized in. This patch adds a lower level mode that just dumps the whole stream in serialization order. Additionally, when dumping global-extras, we now dump the hash bitmap as well as the record offset instead of dumping all zeros for the offsets. llvm-svn: 336407
* [llvm-pdbutil] Add the ability to explain binary files.Zachary Turner2018-04-041-0/+3
| | | | | | | | | | Using this, you can use llvm-pdbutil to export the contents of a stream to a binary file, then run explain on the binary file so that it treats the offset as an offset into the stream instead of an offset into a file. This makes it easy to compare the contents of the same stream from two different files. llvm-svn: 329207
* [llvm-pdbutil] Add an export subcommand.Zachary Turner2018-04-021-0/+6
| | | | | | | | | | | | | | | | | This command can dump the binary contents of a stream to a file. This is useful when you want to do side-by-side comparisons of a specific stream from two PDBs to examine the differences between them. You can export both of them to a file, then open them up side by side in a hex editor (for example), so as to eliminate any differences that might arise from the contents being on different blocks in the PDB. In subsequent patches I plan to improve the "explain" subcommand so that you can explain the contents of a binary file that isn't necessarily a full PDB, but one of these dumped streams, by telling the subcommand how to interpret the contents. llvm-svn: 329002
* [llvm-pdbutil] Dig deeper into the PDB and DBI streams when explaining.Zachary Turner2018-03-301-1/+1
| | | | | | | | | This will show more detail when using `llvm-pdbutil explain` on an offset in the DBI or PDB streams. Specifically, it will dig into individual header fields and substreams to give a more precise description of what the byte represents. llvm-svn: 328878
* [PDB] Add an explain subcommand.Zachary Turner2018-03-291-0/+4
| | | | | | | | | | | | | | | | | | | When investigating various things, we often have a file offset and what to know what's in the PDB at that address. For example we may be doing a binary comparison of two LLD-generated PDBs to look for sources of non-determinism, or we may wish to compare an LLD-generated PDB with a Microsoft generated PDB for sources of byte-for-byte incompatibility. In these cases, we can do a binary diff of the two files, and once we find a mismatched byte we can use explain to figure out what that byte is, immediately honining in on the problem. This patch implements this by trying to narrow the meaning of a particular file offset down as much as possible. Differential Revision: https://reviews.llvm.org/D44959 llvm-svn: 328799
* Delete pdbutil diff mode.Zachary Turner2018-03-261-7/+0
| | | | | | | | This has been made obsolete by the fact that almost all of the things it previously checked for are no longer relevant since we can just compare bytes in a lot of places. llvm-svn: 328562
* [PDB] Make our PDBs look more like MS PDBs.Zachary Turner2018-03-231-0/+2
| | | | | | | | | | | | | | | | | | When investigating bugs in PDB generation, the first step is often to do the same link with link.exe and then compare PDBs. But comparing PDBs is hard because two completely different byte sequences can both be correct, so it hampers the investigation when you also have to spend time figuring out not just which bytes are different, but also if the difference is meaningful. This patch fixes a couple of cases related to string table emission, hash table emission, and the order in which we emit strings that makes more of our bytes the same as the bytes generated by MS PDBs. Differential Revision: https://reviews.llvm.org/D44810 llvm-svn: 328348
* Revert "Resubmit "Support embedding natvis files in PDBs.""Zachary Turner2018-03-201-1/+0
| | | | | | | | This is still failing on a different bot this time due to some issue related to hashing absolute paths. Reverting until I can figure it out. llvm-svn: 328014
* Resubmit "Support embedding natvis files in PDBs."Zachary Turner2018-03-201-0/+1
| | | | | | | | | | | The issue causing this to fail in certain configurations should be fixed. It was due to the fact that DIA apparently expects there to be a null string at ID 1 in the string table. I'm not sure why this is important but it seems to make a difference, so set it. llvm-svn: 328002
* Revert "Support embedding natvis files in PDBs."Zachary Turner2018-03-191-1/+0
| | | | | | | This is causing a test failure on a certain bot, so I'm removing this temporarily until we can figure out the source of the error. llvm-svn: 327903
* Support embedding natvis files in PDBs.Zachary Turner2018-03-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Natvis is a debug language supported by Visual Studio for specifying custom visualizers. The /NATVIS option is an undocumented link.exe flag which will take a .natvis file and "inject" it into the PDB. This way, you can ship the debug visualizers for a program along with the PDB, which is very useful for postmortem debugging. This is implemented by adding a new "named stream" to the PDB with a special name of /src/files/<natvis file name> and simply copying the contents of the xml into this file. Additionally, we need to emit a single stream named /src/headerblock which contains a hash table of embedded files to records describing them. This patch adds this functionality, including the /NATVIS option to lld-link. Differential Revision: https://reviews.llvm.org/D44328 llvm-svn: 327895
* [llvm-pdbutil] Support dumping CodeView from object files.Zachary Turner2017-09-011-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | We have llvm-readobj for dumping CodeView from object files, and llvm-pdbutil has always been more focused on PDB. However, llvm-pdbutil has a lot of useful options for summarizing debug information in aggregate and presenting high level statistical views. Furthermore, it's arguably better as a testing tool since we don't have to write tests to conform to a state-machine like structure where you match multiple lines in succession, each depending on a previous match. llvm-pdbutil dumps much more concisely, so it's possible to use single-line matches in many cases where as with readobj tests you have to use multi-line matches with an implicit state machine. Because of this, I'm adding object file support to llvm-pdbutil. In fact, this mirrors the cvdump tool from Microsoft, which also supports both object files and pdb files. In the future we could perhaps rename this tool llvm-cvutil. In the meantime, this allows us to deep dive into object files the same way we already can with PDB files. llvm-svn: 312358
* [llvm-pdbutil] Print detailed S_UDT stats.Zachary Turner2017-08-311-1/+2
| | | | | | | | | | | | | | This adds a new command line option, -udt-stats, which breaks down the stats of S_UDT records. These are one of the biggest contributors to the size of /DEBUG:FASTLINK PDBs, so they need some additional tools to be able to analyze their usage. This option will dig into each S_UDT record and determine what kind of record it points to, and then break down the statistics by the target type. The goal here is to identify how our object files differ from MSVC object files in S_UDT records, so that we can output fewer of them and reach size parity. llvm-svn: 312276
* [llvm-pdbutil] Add support for dumping detailed module stats.Zachary Turner2017-08-211-0/+2
| | | | | | | | | | | | | | | This adds support for dumping a summary of module symbols and CodeView debug chunks. This option prints a table for each module of all of the symbols that occurred in the module and the number of times it occurred and total byte size. Then at the end it prints the totals for the entire file. Additionally, this patch adds the -jmc (just my code) option, which suppresses modules which are from external libraries or linker imports, so that you can focus only on the object files and libraries that originate from your own source code. llvm-svn: 311338
* [llvm-pdbutil] Dump image section headers.Zachary Turner2017-08-041-0/+1
| | | | | | | | | | Image section headers are stored in the DBI stream, but we had no way to dump them. This patch adds dumping support, along with some tests that LLD actually dumps them correctly. Differential Revision: https://reviews.llvm.org/D36332 llvm-svn: 310107
* [llvm-pdbutil] Add an option to only dump specific module indices.Zachary Turner2017-08-031-0/+1
| | | | | | | | | Often something interesting (like a symbol) is in a particular module, and you don't want to dump symbols from all other 300 modules to see the one you want. This adds a -modi option so that we only dump the specified module. llvm-svn: 310000
* [llvm-pdbutil] Allow diff to force module equivalencies.Zachary Turner2017-08-031-0/+2
| | | | | | | | | | | Sometimes the normal module equivalence detection algorithm doesn't quite work. For example, you might build the same program with MSVC and clang-cl, outputting to different object files, exes, and PDBs, then compare them. If the object files have different names though, then they won't be treated as equivalent. This way we can force specific module indices to be treated as equivalent. llvm-svn: 309983
* [pdbutil] Add a command to dump the FPM.Zachary Turner2017-08-021-0/+2
| | | | | | | | | | | | | | Recently problems have been discovered in the way we write the FPM (free page map). In order to fix this, we first need to establish a baseline about what a correct FPM looks like using an MSVC generated PDB, so that we can then make our own generated PDBs match. And in order to do this, the dumper needs a mode where it can dump an FPM so that we can write tests for it. This patch adds a command to dump the FPM, as well as a test against a known-good PDB. llvm-svn: 309894
* [PDB] Improve GSI hash table dumping for publics and globalsReid Kleckner2017-07-261-0/+2
| | | | | | | | | | | | | | | The PDB "symbol stream" actually contains symbol records for the publics and the globals stream. The globals and publics streams are essentially hash tables that point into a single stream of records. In order to match cvdump's behavior, we need to only dump symbol records referenced from the hash table. This patch implements that, and then implements global stream dumping, since it's just a subset of public stream dumping. Now we shouldn't see S_PROCREF or S_GDATA32 records when dumping publics, and instead we should see those record in the globals stream. llvm-svn: 309066
* [PDB] Dump extra info about the publics streamReid Kleckner2017-07-211-0/+1
| | | | | | | | | | This includes the hash table, the address map, and the thunk table and section offset table. The last two are only used for incremental linking, which LLD doesn't support, so they are less interesting. The hash table is particularly important to get right, since this is the one of the streams that debuggers use to translate addresses to symbols. llvm-svn: 308764
* Resubmit "Add pdb-diff test."Zachary Turner2017-07-101-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | This was originally reverted because of two issues. 1) Printing ANSI color escape codes even when outputting to a file 2) Module name comparisons were failing when comparing a PDB generated on one machine to a PDB generated on another machine. I attempted to fix #2 by adding command line options which let you specify prefixes to strip from the beginning of embedded paths, which effectively lets us specify a path to "base" each PDB from and only compare the parts under the base. But this is tricky because PDB paths always use Windows path syntax, even when they are created on non-Windows hosts. A problem still existed when constructing the prefix to strip, where we were accidentally using a host-specific path separator instead of a Windows path separator. This resubmission fixes the issue on Linux (and I have verified that the test now passes on Linux). llvm-svn: 307571
* Revert "Build fixes for pdb-diff test."Zachary Turner2017-07-101-2/+0
| | | | | | | | This reverts commit 180af3fdbdb17ec35b45ec1f925fd743b28d37e1. This is still breaking due to linux-specific path differences. llvm-svn: 307559
* Fix pdb-diff test.Zachary Turner2017-07-101-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | A test was checked in on Friday that worked by checking in an object file and PDB generated locally by MSVC, and then having the test run lld-link on the object file and diffing LLD's PDB against the checked in PDB. This failed because part of the diffing algorithm involves determining if two modules are the same, and if so drilling into the module and diffing individual fields of the module. The only thing we can use to make this determination though is the "name" of the module, which is a path to where the module (obj file) was read from on the machine where it was linked. This fails for obvious reasons when comparing a PDB generated on one machine to a PDB on another machine. The fix employed here is to add two command line options to the diff subcommand, which allow the user to specify a "binary root path". The bin root path, if specified, is stripped from the beginning of any embedded PDB paths. The test is updated to specify the user's local test output directory for the left PDB, and is hardcoded to the location where the original PDB was created for the right PDB. This way all the equivalence comparisons should succeed. llvm-svn: 307555
* Fix some differences between lld and MSVC generated PDBs.Zachary Turner2017-07-071-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A couple of things were different about our generated PDBs. 1) We were outputting the wrong Version on the PDB Stream. The version we were setting was newer than what MSVC is setting. It's not clear what the implications are, but we change LLD to use PdbImplVC70, as MSVC does. 2) For the optional debug stream indices in the DBI Stream, we were outputting 0 to mean "the stream is not present". MSVC outputs uint16_t(-1), which is the "correct" way to specify that a stream is not present. So we fix that as well. 3) We were setting the PDB Stream signature to 0. This is supposed to be the result of calling time(nullptr). Although this leads to non-deterministic builds, a better way to solve that is by having a command line option explicitly for generating a reproducible build, and have the default behavior of lld-link match the default behavior of link. To test this, I'm making use of the new and improved `pdb diff` sub command. To make it suitable for writing tests against, I had to modify the diff subcommand slightly to print less verbose output. Previously it would always print | <column> | <value1> | <value2> | which is quite verbose, and the values are fragile. All we really want to know is "did we produce the same value as link?" So I added command line options to print a single character representing the result status (different, identical, equivalent), and another to hide the value display. Note that just inspecting the diff output used to write the test, you can see some things that are obviously wrong. That is just reflective of the fact that this is the state of affairs today, not that we're asserting that this is "correct". We can use this as a starting point to discover differences, fix them, and update the test. Differential Revision: https://reviews.llvm.org/D35086 llvm-svn: 307422
* [llvm-pdbutil] Add the ability to dump the dependency tree for a typeZachary Turner2017-06-301-0/+1
| | | | | | | | | | | | | | | Previously we had the -type-index option which would dump the record of a single, but we had no way to follow the dependency graph backwards and also dump all dependent types. Having this option makes test-writing better, because we can limit the test to only those records that are of importance for the thing we're trying to test, which allows us to use things like CHECK-NEXT to reduce fragility. Differential Revision: https://reviews.llvm.org/D34899 llvm-svn: 306852
* [llvm-pdbutil] Add a mode to `bytes` for dumping split debug chunks.Zachary Turner2017-06-261-1/+1
| | | | llvm-svn: 306309
* [llvm-pdbutil] Dump raw bytes of module symbols and debug chunks.Zachary Turner2017-06-231-0/+5
| | | | llvm-svn: 306179
* [llvm-pdbutil] Dump raw bytes of type and id records.Zachary Turner2017-06-231-0/+3
| | | | llvm-svn: 306167
* [llvm-pdbutil] Dump raw bytes of various DBI stream subsections.Zachary Turner2017-06-231-0/+8
| | | | llvm-svn: 306160
* [llvm-pdbutil] Show what blocks a stream occupies.Zachary Turner2017-06-231-0/+1
| | | | | | | | This is useful when you want to look at a specific chunk of a stream or look for discontinuities, and you need to know the list of blocks occupied by a stream. llvm-svn: 306150
* [llvm-pdbutil] Dump raw bytes of pdb name map.Zachary Turner2017-06-231-0/+1
| | | | | | | | This patch dumps the raw bytes of the pdb name map which contains the mapping of stream name to stream index for the string table and other reserved streams. llvm-svn: 306148
* [llvm-pdbutil] Add the ability to dump raw bytes from the file.Zachary Turner2017-06-231-4/+6
| | | | | | | | | | Normally we can only make sense of the content of a PDB in terms of streams and blocks, but in some cases it may be useful to dump bytes at a specific absolute file offset. For example, if you know that some interesting data is at a particular location and you want to see some surrounding data. llvm-svn: 306146
* [llvm-pdbutil] Create a "bytes" subcommand.Zachary Turner2017-06-221-3/+6
| | | | | | | | | | | | | | | | | | | This idea originally came about when I was doing some deep investigation of why certain bytes in a PDB that we round-tripped differed from their original bytes in the source PDB. I found myself having to hack up the code in many places to dump the bytes of this substream, or that record. It would be nice if we could just do this for every possible stream, substream, debug chunk type, etc. It doesn't make sense to put this under dump because there's just so many options that would detract from the more common use case of just dumping deserialized records. So making a new subcommand seems like the most logical course of action. In doing so, we already have two command line options that are suitable for this new subcommand, so start out by moving them there. llvm-svn: 306056
* [llvm-pdbutil] Rename "raw" to "dump".Zachary Turner2017-06-221-1/+1
| | | | | | | | | Now you run llvm-pdbutil dump <options>. This is a followup after having renamed the tool, whereas before raw was obviously just the style of dumping, whereas now "dump" is the action to perform with the "util". llvm-svn: 306055
* Remove diff pedantic mode.Zachary Turner2017-06-201-4/+0
| | | | llvm-svn: 305818
* Remove some dead code / includes.Zachary Turner2017-06-161-0/+3
| | | | | | | I'm trying to get rid of the TypeDatabase class, so the first step is to minimize its footprint. llvm-svn: 305611
* [llvm-pdbutil] Add support for dumping cross module imports/exports.Zachary Turner2017-06-161-0/+2
| | | | llvm-svn: 305532
* [llvm-pdbutil] Add support for dumping lines and inlinee lines.Zachary Turner2017-06-151-0/+3
| | | | llvm-svn: 305529
OpenPOWER on IntegriCloud