diff options
author | Zachary Turner <zturner@google.com> | 2019-04-05 22:09:30 +0000 |
---|---|---|
committer | Zachary Turner <zturner@google.com> | 2019-04-05 22:09:30 +0000 |
commit | cb70fe1c69a2eadda02682383e1f3e877409090d (patch) | |
tree | b537720cd9df93e313c67b9cd1a3de7d8eeb48ab /llvm/docs | |
parent | 91d6caf6ec1ffd9d0b6d1fd9c21bc68d5f06cf99 (diff) | |
download | bcm5719-llvm-cb70fe1c69a2eadda02682383e1f3e877409090d.tar.gz bcm5719-llvm-cb70fe1c69a2eadda02682383e1f3e877409090d.zip |
[PDB Docs] Add documentation for the hash table format.
llvm-svn: 357826
Diffstat (limited to 'llvm/docs')
-rw-r--r-- | llvm/docs/PDB/HashTable.rst | 101 |
1 files changed, 101 insertions, 0 deletions
diff --git a/llvm/docs/PDB/HashTable.rst b/llvm/docs/PDB/HashTable.rst index fbf4a1de528..c296f3efadb 100644 --- a/llvm/docs/PDB/HashTable.rst +++ b/llvm/docs/PDB/HashTable.rst @@ -1,2 +1,103 @@ The PDB Serialized Hash Table Format
====================================
+
+.. contents::
+ :local:
+
+.. _hash_intro:
+
+Introduction
+============
+
+One of the design goals of the PDB format is to provide accelerated access to
+debug information, and for this reason there are several occasions where hash
+tables are serialized and embedded directly to the file, rather than requiring
+a consumer to read a list of values and reconstruct the hash table on the fly.
+
+The serialization format supports hash tables of arbitrarily large size and
+capacity, as well as value types and hash functions. The only supported key
+value type is a uint32. The only requirement is that the producer and consumer
+agree on the hash function. As such, the hash function can is not discussed
+further in this document, it is assumed that for a particular instance of a PDB
+file hash table, the appropriate hash function is being used.
+
+On-Disk Format
+==============
+
+.. code-block:: none
+
+ .--------------------.-- +0
+ | Size |
+ .--------------------.-- +4
+ | Capacity |
+ .--------------------.-- +8
+ | Present Bit Vector |
+ .--------------------.-- +N
+ | Deleted Bit Vector |
+ .--------------------.-- +M ─╮
+ | Key | │
+ .--------------------.-- +M+4 │
+ | Value | │
+ .--------------------.-- +M+4+sizeof(Value) │
+ ... ├─ |Capacity| Bucket entries
+ .--------------------. │
+ | Key | │
+ .--------------------. │
+ | Value | │
+ .--------------------. ─╯
+
+- **Size** - The number of values contained in the hash table.
+
+- **Capacity** - The number of buckets in the hash table. Producers should
+ maintain a load factor of no greater than ``2/3*Capacity+1``.
+
+- **Present Bit Vector** - A serialized bit vector which contains information
+ about which buckets have valid values. If the bucket has a value, the
+ corresponding bit will be set, and if the bucket doesn't have a value (either
+ because the bucket is empty or because the value is a tombstone value) the bit
+ will be unset.
+
+- **Deleted Bit Vector** - A serialized bit vector which contains information
+ about which buckets have tombstone values. If the entry in this bucket is
+ deleted, the bit will be set, otherwise it will be unset.
+
+- **Keys and Values** - A list of ``Capacity`` hash buckets, where the first
+ entry is the key (always a uint32), and the second entry is the value. The
+ state of each bucket (valid, empty, deleted) can be determined by examining
+ the present and deleted bit vectors.
+
+
+.. _hash_bit_vectors:
+
+Present and Deleted Bit Vectors
+===============================
+
+The bit vectors indicating the status of each bucket are serialized as follows:
+
+.. code-block:: none
+
+ .--------------------.-- +0
+ | Word Count |
+ .--------------------.-- +4
+ | Word_0 | ─╮
+ .--------------------.-- +8 │
+ | Word_1 | │
+ .--------------------.-- +12 ├─ |Word Count| values
+ ... │
+ .--------------------. │
+ | Word_N | │
+ .--------------------. ─╯
+
+The words, when viewed as a contiguous block of bytes, represent a bit vector with
+the following layout:
+
+.. code-block:: none
+
+ .------------. .------------.------------.
+ | Word_N | ... | Word_1 | Word_0 |
+ .------------. .------------.------------.
+ | | | | |
+ +N*32 +(N-1)*32 +64 +32 +0
+
+where the k'th bit of this bit vector represents the status of the k'th bucket
+in the hash table.
|