summaryrefslogtreecommitdiffstats
path: root/llvm/tools/llvm-pdbutil/llvm-pdbutil.cpp
Commit message (Collapse)AuthorAgeFilesLines
* Dump public symbol records in pdb2yaml modeZachary Turner2018-10-261-0/+5
| | | | llvm-svn: 345348
* [PDB] Add the ability to lookup global symbols by name.Zachary Turner2018-10-081-0/+5
| | | | | | | | | | The Globals table is a hash table keyed on symbol name, so it's possible to lookup symbols by name in O(1) time. Add a function to the globals stream to do this, and add an option to llvm-pdbutil to exercise this, then use it to write some tests to verify correctness. llvm-svn: 343951
* [PDB] Add support for dumping Typedef records.Zachary Turner2018-10-011-0/+4
| | | | | | | | | | These work a little differently because they are actually in the globals stream and are treated as symbol records, even though DIA presents them as types. So this also adds the necessary infrastructure to cache records that live somewhere other than the TPI stream as well. llvm-svn: 343507
* [PDB] Add support for parsing VFTable Shape records.Zachary Turner2018-10-011-1/+8
| | | | | | This allows them to be returned from the native API. llvm-svn: 343506
* [PDB] Add native support for dumping array types.Zachary Turner2018-09-301-2/+9
| | | | llvm-svn: 343412
* [PDB] Better native API support for pointers.Zachary Turner2018-09-291-1/+6
| | | | | | | | | | | | We didn't properly detect when a pointer was a member pointer, and when that was the case we were not properly returning class parent info. This caused member pointers to render incorrectly in pretty mode. However, we didn't even have pretty tests for pointers in native mode, so those are also added now to ensure this. llvm-svn: 343393
* llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...)Fangrui Song2018-09-271-4/+2
| | | | | | | | | | | | Summary: The convenience wrapper in STLExtras is available since rL342102. Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D52573 llvm-svn: 343163
* [NativePDB] Add support for reading function signatures.Zachary Turner2018-09-211-11/+28
| | | | | | | This adds support for parsing function signature records and returning them through the native DIA interface. llvm-svn: 342780
* [PDB] Add native reading support for UDT / class types.Zachary Turner2018-09-211-0/+4
| | | | | | | | | | | | This allows the native reader to find records of class/struct/ union type and dump them. This behavior is tested by using the diadump subcommand against golden output produced by actual DIA SDK on the same PDB file, and again using pretty -native to confirm that we actually dump the classes. We don't find class members or anything like that yet, for now it's just the class itself. llvm-svn: 342779
* [PDB] Add the ability to map forward references to full decls.Zachary Turner2018-09-201-0/+8
| | | | | | | | | | | | | | | | | | | | | Some records point to an LF_CLASS, LF_UNION, LF_STRUCTURE, or LF_ENUM which is a forward reference and doesn't contain complete debug information. In these cases, we'd like to be able to quickly locate the full record. The TPI stream stores an array of pre-computed record hash values, one for each type record. If we pre-process this on startup, we can build a mapping from hash value -> {list of possible matching type indices}. Since hashes of full records are only based on the name and or unique name and not the full record contents, we can then use forward ref record to compute the hash of what *would* be the full record by just hashing the name, use this to get the list of possible matches, and iterate those looking for a match on name or unique name. llvm-pdbutil is updated to resolve forward references for the purposes of testing (plus it's just useful). Differential Revision: https://reviews.llvm.org/D52283 llvm-svn: 342656
* [PDB] Better support for enumerating pointer types.Zachary Turner2018-09-181-2/+24
| | | | | | | | | | | | | | | | | | | There were several issues with the previous implementation. 1) There were no tests. 2) We didn't support creating PDBSymbolTypePointer records for builtin types since those aren't described by LF_POINTER records. 3) We didn't support a wide enough variety of builtin types even ignoring pointers. This patch fixes all of these issues. In order to add tests, it's helpful to be able to ignore the symbol index id hierarchy because it makes the golden output from the DIA version not match our output, so I've extended the dumper to disable dumping of id fields. llvm-svn: 342493
* [PDB] Make the native reader support enumerators.Zachary Turner2018-09-171-0/+8
| | | | | | | | | | | Previously we would dump the names of enum types, but not their enumerator values. This adds support for enumerator values. In doing so, we have to introduce a general purpose mechanism for caching symbol indices of field list members. Unlike global types, FieldList members do not have a TypeIndex. So instead, we identify them by the pair {TypeIndexOfFieldList, IndexInFieldList}. llvm-svn: 342415
* Give InfoStreamBuilder an opt-in method to write a hash of the PDB as GUID.Nico Weber2018-09-151-2/+5
| | | | | | | | | | | | | Naively computing the hash after the PDB data has been generated is in practice as fast as other approaches I tried. I also tried online-computing the hash as parts of the PDB were written out (https://reviews.llvm.org/D51887; that's also where all the measuring data is) and computing the hash in parallel (https://reviews.llvm.org/D51957). This approach here is simplest, without being slower. Differential Revision: https://reviews.llvm.org/D51956 llvm-svn: 342333
* [PDB] Write FPO Data to the PDB.Zachary Turner2018-09-111-0/+4
| | | | llvm-svn: 342003
* [PDB] Support pointer types in the native reader.Zachary Turner2018-09-071-0/+50
| | | | | | | | | | In order to start testing this, I've added a new mode to llvm-pdbutil which is only really useful for writing tests. It just dumps the value of raw fields in record format. This isn't really ideal and it won't allow us to test some important cases, but it's better than nothing for now. llvm-svn: 341729
* [PDB] Refactor the PDB symbol classes to fix a reuse bug.Zachary Turner2018-09-051-1/+3
| | | | | | | | | | | | | | | | | | | | The way DIA SDK works is that when you request a symbol, it gets assigned an internal identifier that is unique for the life of the session. You can then use this identifier to get back the same symbol, with all of the same internal state that it had before, even if you "destroyed" the original copy of the object you had. This didn't work properly in our native implementation, and if you destroyed an object for a particular symbol, then requested the same symbol again, it would get assigned a new ID and you'd get a fresh copy of the object. In order to fix this some refactoring had to happen to properly reuse cached objects. Some unittests are added to verify that symbol reuse is taking place, making use of the new unittest input feature. llvm-svn: 341503
* [DebugInfo] Common behavior for error typesAlexandre Ganea2018-08-311-2/+1
| | | | | | | | | | | | | | | Following D50807, and heading towards D50664, this intermediary change does the following: 1. Upgrade all custom Error types in llvm/trunk/lib/DebugInfo/ to use the new StringError behavior (D50807). 2. Implement std::is_error_code_enum and make_error_code() for DebugInfo error enumerations. 3. Rename GenericError -> PDBError (the file will be renamed in a subsequent commit) 4. Update custom error messages to follow the same formatting: (\w\s*)+\. 5. Keep generic "file not found" (ENOENT) errors as they are in PDB code. Previously, there used to be a custom enumeration for that purpose. 6. Remove a few extraneous LF in log() implementations. Printing LF is a responsability at a higher level, not at the error level. Differential Revision: https://reviews.llvm.org/D51499 llvm-svn: 341228
* [PDB] One more fix for hasing GSI records.Zachary Turner2018-07-061-0/+88
| | | | | | | | | | | | | | | | The reference implementation uses a case-insensitive string comparison for strings of equal length. This will cause the string "tEo" to compare less than "VUo". However we were using a case sensitive comparison, which would generate the opposite outcome. Switch to a case insensitive comparison. Also, when one of the strings contains non-ascii characters, fallback to a straight memcmp. The only way to really test this is with a DIA test. Before this patch, the test will fail (but succeed if link.exe is used instead of lld-link). After the patch, it succeeds even with lld-link. llvm-svn: 336464
* [llvm-pdbutil] Dump more info about globals.Zachary Turner2018-07-061-0/+4
| | | | | | | | | | | | | | | We add an option to dump the entire global / public symbol record stream. Previously we would dump globals or publics, but not both. And when we did dump them, we would always dump them in the order they were referenced by the corresponding hash streams, not in the order they were serialized in. This patch adds a lower level mode that just dumps the whole stream in serialization order. Additionally, when dumping global-extras, we now dump the hash bitmap as well as the record offset instead of dumping all zeros for the offsets. llvm-svn: 336407
* Define InitLLVM to do common initialization all at once.Rui Ueyama2018-04-131-14/+4
| | | | | | | | | | | We have a few functions that virtually all command wants to run on process startup/shutdown. This patch adds InitLLVM class to do that all at once, so that we don't need to copy-n-paste boilerplate code to each llvm command's main() function. Differential Revision: https://reviews.llvm.org/D45602 llvm-svn: 330046
* [llvm-pdbutil] Add the ability to explain binary files.Zachary Turner2018-04-041-3/+17
| | | | | | | | | | Using this, you can use llvm-pdbutil to export the contents of a stream to a binary file, then run explain on the binary file so that it treats the offset as an offset into the stream instead of an offset into a file. This makes it easy to compare the contents of the same stream from two different files. llvm-svn: 329207
* [llvm-pdbutil] Add an export subcommand.Zachary Turner2018-04-021-0/+64
| | | | | | | | | | | | | | | | | This command can dump the binary contents of a stream to a file. This is useful when you want to do side-by-side comparisons of a specific stream from two PDBs to examine the differences between them. You can export both of them to a file, then open them up side by side in a hex editor (for example), so as to eliminate any differences that might arise from the contents being on different blocks in the PDB. In subsequent patches I plan to improve the "explain" subcommand so that you can explain the contents of a binary file that isn't necessarily a full PDB, but one of these dumped streams, by telling the subcommand how to interpret the contents. llvm-svn: 329002
* [tools] Change std::sort to llvm::sort in response to r327219Mandeep Singh Grang2018-04-011-4/+4
| | | | | | | | | | | | | | | | | | | | | | Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused due to undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort. Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort. Refer the comments section in D44363 for a list of all the required patches. Reviewers: JDevlieghere, zturner, echristo, dberris, friss Reviewed By: echristo Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D45141 llvm-svn: 328943
* [llvm-pdbutil] Dig deeper into the PDB and DBI streams when explaining.Zachary Turner2018-03-301-5/+7
| | | | | | | | | This will show more detail when using `llvm-pdbutil explain` on an offset in the DBI or PDB streams. Specifically, it will dig into individual header fields and substreams to give a more precise description of what the byte represents. llvm-svn: 328878
* [PDB] Add an explain subcommand.Zachary Turner2018-03-291-0/+24
| | | | | | | | | | | | | | | | | | | When investigating various things, we often have a file offset and what to know what's in the PDB at that address. For example we may be doing a binary comparison of two LLD-generated PDBs to look for sources of non-determinism, or we may wish to compare an LLD-generated PDB with a Microsoft generated PDB for sources of byte-for-byte incompatibility. In these cases, we can do a binary diff of the two files, and once we find a mismatched byte we can use explain to figure out what that byte is, immediately honining in on the problem. This patch implements this by trying to narrow the meaning of a particular file offset down as much as possible. Differential Revision: https://reviews.llvm.org/D44959 llvm-svn: 328799
* Delete pdbutil diff mode.Zachary Turner2018-03-261-73/+0
| | | | | | | | This has been made obsolete by the fact that almost all of the things it previously checked for are no longer relevant since we can just compare bytes in a lot of places. llvm-svn: 328562
* [PDB] Make our PDBs look more like MS PDBs.Zachary Turner2018-03-231-0/+9
| | | | | | | | | | | | | | | | | | When investigating bugs in PDB generation, the first step is often to do the same link with link.exe and then compare PDBs. But comparing PDBs is hard because two completely different byte sequences can both be correct, so it hampers the investigation when you also have to spend time figuring out not just which bytes are different, but also if the difference is meaningful. This patch fixes a couple of cases related to string table emission, hash table emission, and the order in which we emit strings that makes more of our bytes the same as the bytes generated by MS PDBs. Differential Revision: https://reviews.llvm.org/D44810 llvm-svn: 328348
* Revert "Resubmit "Support embedding natvis files in PDBs.""Zachary Turner2018-03-201-4/+0
| | | | | | | | This is still failing on a different bot this time due to some issue related to hashing absolute paths. Reverting until I can figure it out. llvm-svn: 328014
* Resubmit "Support embedding natvis files in PDBs."Zachary Turner2018-03-201-0/+4
| | | | | | | | | | | The issue causing this to fail in certain configurations should be fixed. It was due to the fact that DIA apparently expects there to be a null string at ID 1 in the string table. I'm not sure why this is important but it seems to make a difference, so set it. llvm-svn: 328002
* Revert "Support embedding natvis files in PDBs."Zachary Turner2018-03-191-4/+0
| | | | | | | This is causing a test failure on a certain bot, so I'm removing this temporarily until we can figure out the source of the error. llvm-svn: 327903
* Support embedding natvis files in PDBs.Zachary Turner2018-03-191-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | Natvis is a debug language supported by Visual Studio for specifying custom visualizers. The /NATVIS option is an undocumented link.exe flag which will take a .natvis file and "inject" it into the PDB. This way, you can ship the debug visualizers for a program along with the PDB, which is very useful for postmortem debugging. This is implemented by adding a new "named stream" to the PDB with a special name of /src/files/<natvis file name> and simply copying the contents of the xml into this file. Additionally, we need to emit a single stream named /src/headerblock which contains a hash table of embedded files to records describing them. This patch adds this functionality, including the /NATVIS option to lld-link. Differential Revision: https://reviews.llvm.org/D44328 llvm-svn: 327895
* [PDB] Support dumping injected sources via the DIA reader.Zachary Turner2018-03-131-0/+78
| | | | | | | | | | | | | | | | | | Injected sources are basically a way to add actual source file content to your PDB. Presumably you could use this for shipping your source code with your debug information, but in practice I can only find this being used for embedding natvis files inside of PDBs. In order to effectively test LLVM's natvis file injection, we need a way to dump the injected sources of a PDB in a way that is authoritative (i.e. based on Microsoft's understanding of the PDB format, and not LLVM's). To this end, I've added support for dumping injected sources via DIA. I made a PDB file that used the /natvis option to generate a test case. Differential Revision: https://reviews.llvm.org/D44405 llvm-svn: 327428
* [llvm-pdbdump] Add guard for null pointers and remove unused codeAaron Smith2018-03-071-47/+55
| | | | | | | | | | | | | | Summary: This avoids crashing when a user tries to dump a pdb with the `-native` option. Reviewers: zturner, llvm-commits, rnk Reviewed By: zturner Subscribers: mgrang Differential Revision: https://reviews.llvm.org/D44117 llvm-svn: 326863
* Remove redundant includes from tools.Michael Zolotukhin2017-12-131-4/+0
| | | | llvm-svn: 320631
* Split TypeTableBuilder into two classes.Zachary Turner2017-11-301-4/+5
| | | | llvm-svn: 319456
* Make TypeTableBuilder inherit from TypeCollection.Zachary Turner2017-11-291-4/+4
| | | | | | | | | | | | | | A couple of places in LLD were passing references to TypeTableCollections around, which makes it hard to change the implementation at runtime. However, these cases only needed to iterate over the types in the collection, and TypeCollection already provides a handy abstract interface for this purpose. By implementing this interface, we can get rid of the need to pass TypeTableBuilder references around, which should allow us to swap the implementation at runtime in subsequent patches. llvm-svn: 319345
* Fix line endings in llvm-pdbutil.cppZachary Turner2017-11-291-11/+11
| | | | llvm-svn: 319340
* [CodeView] Refactor / Rewrite TypeSerializer and TypeTableBuilder.Zachary Turner2017-11-281-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The motivation behind this patch is that future directions require us to be able to compute the hash value of records independently of actually using them for de-duplication. The current structure of TypeSerializer / TypeTableBuilder being a single entry point that takes an unserialized type record, and then hashes and de-duplicates it is not flexible enough to allow this. At the same time, the existing TypeSerializer is already extremely complex for this very reason -- it tries to be too many things. In addition to serializing, hashing, and de-duplicating, ti also supports splitting up field list records and adding continuations. All of this functionality crammed into this one class makes it very complicated to work with and hard to maintain. To solve all of these problems, I've re-written everything from scratch and split the functionality into separate pieces that can easily be reused. The end result is that one class TypeSerializer is turned into 3 new classes SimpleTypeSerializer, ContinuationRecordBuilder, and TypeTableBuilder, each of which in isolation is simple and straightforward. A quick summary of these new classes and their responsibilities are: - SimpleTypeSerializer : Turns a non-FieldList leaf type into a series of bytes. Does not do any hashing. Every time you call it, it will re-serialize and return bytes again. The same instance can be re-used over and over to avoid re-allocations, and in exchange for this optimization the bytes returned by the serializer only live until the caller attempts to serialize a new record. - ContinuationRecordBuilder : Turns a FieldList-like record into a series of fragments. Does not do any hashing. Like SimpleTypeSerializer, returns references to privately owned bytes, so the storage is invalidated as soon as the caller tries to re-use the instance. Works equally well for LF_FIELDLIST as it does for LF_METHODLIST, solving a long-standing theoretical limitation of the previous implementation. - TypeTableBuilder : Accepts sequences of bytes that the user has already serialized, and inserts them by de-duplicating with a hash table. For the sake of convenience and efficiency, this class internally stores a SimpleTypeSerializer so that it can accept unserialized records. The same is not true of ContinuationRecordBuilder. The user is required to create their own instance of ContinuationRecordBuilder. Differential Revision: https://reviews.llvm.org/D40518 llvm-svn: 319198
* Add llvm::for_each as a range-based extensions to <algorithm> and make use ↵Aaron Ballman2017-11-031-14/+11
| | | | | | of it in some cases where it is a more clear alternative to std::for_each. llvm-svn: 317356
* COFF: PDB: Allow multiple modules with the same name.Peter Collingbourne2017-09-071-1/+1
| | | | | | | | | | It is possible for two modules to have the same name if they are archive members with the same name, or if we are doing LTO (in which case all modules will have the name "lto.tmp"). Differential Revision: https://reviews.llvm.org/D37589 llvm-svn: 312744
* [llvm-pdbutil] Support dumping CodeView from object files.Zachary Turner2017-09-011-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | We have llvm-readobj for dumping CodeView from object files, and llvm-pdbutil has always been more focused on PDB. However, llvm-pdbutil has a lot of useful options for summarizing debug information in aggregate and presenting high level statistical views. Furthermore, it's arguably better as a testing tool since we don't have to write tests to conform to a state-machine like structure where you match multiple lines in succession, each depending on a previous match. llvm-pdbutil dumps much more concisely, so it's possible to use single-line matches in many cases where as with readobj tests you have to use multi-line matches with an implicit state machine. Because of this, I'm adding object file support to llvm-pdbutil. In fact, this mirrors the cvdump tool from Microsoft, which also supports both object files and pdb files. In the future we could perhaps rename this tool llvm-cvutil. In the meantime, this allows us to deep dive into object files the same way we already can with PDB files. llvm-svn: 312358
* [llvm-pdbutil] Print detailed S_UDT stats.Zachary Turner2017-08-311-13/+19
| | | | | | | | | | | | | | This adds a new command line option, -udt-stats, which breaks down the stats of S_UDT records. These are one of the biggest contributors to the size of /DEBUG:FASTLINK PDBs, so they need some additional tools to be able to analyze their usage. This option will dig into each S_UDT record and determine what kind of record it points to, and then break down the statistics by the target type. The goal here is to identify how our object files differ from MSVC object files in S_UDT records, so that we can output fewer of them and reach size parity. llvm-svn: 312276
* [llvm-pdbutil] Add support for dumping detailed module stats.Zachary Turner2017-08-211-0/+12
| | | | | | | | | | | | | | | This adds support for dumping a summary of module symbols and CodeView debug chunks. This option prints a table for each module of all of the symbols that occurred in the module and the number of times it occurred and total byte size. Then at the end it prints the totals for the entire file. Additionally, this patch adds the -jmc (just my code) option, which suppresses modules which are from external libraries or linker imports, so that you can focus only on the object files and libraries that originate from your own source code. llvm-svn: 311338
* [llvm-pdbutil] Dump image section headers.Zachary Turner2017-08-041-0/+4
| | | | | | | | | | Image section headers are stored in the DBI stream, but we had no way to dump them. This patch adds dumping support, along with some tests that LLD actually dumps them correctly. Differential Revision: https://reviews.llvm.org/D36332 llvm-svn: 310107
* [llvm-pdbutil] Add an option to only dump specific module indices.Zachary Turner2017-08-031-0/+4
| | | | | | | | | Often something interesting (like a symbol) is in a particular module, and you don't want to dump symbols from all other 300 modules to see the one you want. This adds a -modi option so that we only dump the specified module. llvm-svn: 310000
* [llvm-pdbutil] Allow diff to force module equivalencies.Zachary Turner2017-08-031-0/+22
| | | | | | | | | | | Sometimes the normal module equivalence detection algorithm doesn't quite work. For example, you might build the same program with MSVC and clang-cl, outputting to different object files, exes, and PDBs, then compare them. If the object files have different names though, then they won't be treated as equivalent. This way we can force specific module indices to be treated as equivalent. llvm-svn: 309983
* [pdbutil] Add a command to dump the FPM.Zachary Turner2017-08-021-0/+2
| | | | | | | | | | | | | | Recently problems have been discovered in the way we write the FPM (free page map). In order to fix this, we first need to establish a baseline about what a correct FPM looks like using an MSVC generated PDB, so that we can then make our own generated PDBs match. And in order to do this, the dumper needs a mode where it can dump an FPM so that we can write tests for it. This patch adds a command to dump the FPM, as well as a test against a known-good PDB. llvm-svn: 309894
* [PDB] Improve GSI hash table dumping for publics and globalsReid Kleckner2017-07-261-0/+5
| | | | | | | | | | | | | | | The PDB "symbol stream" actually contains symbol records for the publics and the globals stream. The globals and publics streams are essentially hash tables that point into a single stream of records. In order to match cvdump's behavior, we need to only dump symbol records referenced from the hash table. This patch implements that, and then implements global stream dumping, since it's just a subset of public stream dumping. Now we shouldn't see S_PROCREF or S_GDATA32 records when dumping publics, and instead we should see those record in the globals stream. llvm-svn: 309066
* [PDB] Dump extra info about the publics streamReid Kleckner2017-07-211-0/+3
| | | | | | | | | | This includes the hash table, the address map, and the thunk table and section offset table. The last two are only used for incremental linking, which LLD doesn't support, so they are less interesting. The hash table is particularly important to get right, since this is the one of the streams that debuggers use to translate addresses to symbols. llvm-svn: 308764
* [codeview] Remove TypeServerHandler and PDBTypeServerHandlerReid Kleckner2017-07-171-2/+2
| | | | | | | | | | | | | | | | | Summary: Instead of wiring these through the CVTypeVisitor interface, clients should inspect the CVTypeArray before visiting it and potentially load up the type server's TPI stream if they need it. No tests relied on this functionality because LLD was the only client. Reviewers: ruiu Subscribers: mgorny, hiraditya, zturner, llvm-commits Differential Revision: https://reviews.llvm.org/D35394 llvm-svn: 308212
OpenPOWER on IntegriCloud