author: Zachary Turner <zturner@google.com>, 2016-11-10 19:24:21 +0000
committer: Zachary Turner <zturner@google.com>, 2016-11-10 19:24:21 +0000
commit: 218ce83f0b1efaf6403ee580f499c5e07e515a05
tree: 8ceabf6236a735100bbfd65052b2e16711634758 /llvm/docs/PDB/MsfFile.rst
parent: 58ddb8d0415fe3e704943118a6e3b7cf02515739
[PDB] Begin adding documentation for the PDB file format.
Differential Revision: https://reviews.llvm.org/D26374
llvm-svn: 286491
Diffstat (limited to 'llvm/docs/PDB/MsfFile.rst')
-rw-r--r-- llvm/docs/PDB/MsfFile.rst | 121
1 files changed, 121 insertions, 0 deletions
diff --git a/llvm/docs/PDB/MsfFile.rst b/llvm/docs/PDB/MsfFile.rst
new file mode 100644
index 00000000000..bdceca3aeb3
--- /dev/null
+++ b/llvm/docs/PDB/MsfFile.rst
@@ -0,0 +1,121 @@
+=====================================
+The MSF File Format
+=====================================
+
+.. contents::
+ :local:
+
+.. _msf_superblock:
+
+The Superblock
+==============
+At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as
+follows:
+
+.. code-block:: c++
+
+  struct SuperBlock {
+    char FileMagic[sizeof(Magic)];
+    ulittle32_t BlockSize;
+    ulittle32_t FreeBlockMapBlock;
+    ulittle32_t NumBlocks;
+    ulittle32_t NumDirectoryBytes;
+    ulittle32_t Unknown;
+    ulittle32_t BlockMapAddr;
+  };
+
+- **FileMagic** - Must be equal to ``"Microsoft C/C++ MSF 7.00\r\n"``
+  followed by the bytes ``1A 44 53 00 00 00``, for a total of 32 bytes.
+- **BlockSize** - The block size of the internal file system. Valid values are
+  512, 1024, 2048, and 4096 bytes. Certain aspects of the MSF file layout vary
+  depending on the block size. For the purposes of LLVM, we handle only block
+  sizes of 4KiB, and all further discussion assumes a block size of 4KiB.
+- **FreeBlockMapBlock** - The index of a block within the file, at which begins
+  a bitfield representing the set of all blocks within the file which are "free"
+  (i.e. the data within that block is not used). This bitfield is spread across
+  the MSF file at ``BlockSize`` intervals.
+  **Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``! This field
+  is designed to support incremental and atomic updates of the underlying MSF
+  file. While writing to an MSF file, if the value of this field is ``1``, you
+  can write your new modified bitfield to block 2, and vice versa. Only when
+  you commit the file to disk do you need to swap the value in the SuperBlock
+  to point to the new ``FreeBlockMapBlock``.
+- **NumBlocks** - The total number of blocks in the file. ``NumBlocks * BlockSize``
+  should equal the size of the file on disk.
+- **NumDirectoryBytes** - The size of the stream directory, in bytes. The stream
+  directory contains information about each stream's size and the set of blocks
+  that it occupies. It will be described in more detail later.
+- **BlockMapAddr** - The index of a block within the MSF file. At this block is
+  an array of ``ulittle32_t``'s listing the blocks that the stream directory
+  resides on. For large MSF files, the stream directory (which describes the
+  block layout of each stream) may not fit entirely on a single block. As a
+  result, this extra layer of indirection is introduced, whereby this block
+  contains the list of blocks that the stream directory occupies, and the stream
+  directory itself can be stitched together accordingly. The number of
+  ``ulittle32_t``'s in this array is given by ``ceil(NumDirectoryBytes / BlockSize)``.
+
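The constraints listed above can be turned into a small sanity check. The following is a minimal sketch, not LLVM's actual implementation: ``SuperBlockFields``, ``numBlocksFor``, ``isValidBlockSize`` and ``looksValid`` are hypothetical names, and the sketch assumes the fields have already been read from disk and converted from little-endian. It does not check ``FileMagic``.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-order mirror of the numeric SuperBlock fields.
// Assumes the caller has already decoded them from little-endian.
struct SuperBlockFields {
  uint32_t BlockSize;
  uint32_t FreeBlockMapBlock;
  uint32_t NumBlocks;
  uint32_t NumDirectoryBytes;
  uint32_t Unknown;
  uint32_t BlockMapAddr;
};

// ceil(NumBytes / BlockSize), computed in integer arithmetic.
inline uint64_t numBlocksFor(uint64_t NumBytes, uint64_t BlockSize) {
  assert(BlockSize != 0 && "block size must be nonzero");
  return (NumBytes + BlockSize - 1) / BlockSize;
}

// The four block sizes the text lists as valid.
inline bool isValidBlockSize(uint32_t Size) {
  return Size == 512 || Size == 1024 || Size == 2048 || Size == 4096;
}

// Checks only the constraints stated in the field descriptions.
inline bool looksValid(const SuperBlockFields &SB, uint64_t FileSize) {
  if (!isValidBlockSize(SB.BlockSize))
    return false;
  // The free block map may only live at block 1 or block 2.
  if (SB.FreeBlockMapBlock != 1 && SB.FreeBlockMapBlock != 2)
    return false;
  // NumBlocks * BlockSize should equal the size of the file on disk.
  if (uint64_t(SB.NumBlocks) * SB.BlockSize != FileSize)
    return false;
  // BlockMapAddr is a block index, so it must name a block in the file.
  if (SB.BlockMapAddr >= SB.NumBlocks)
    return false;
  return true;
}
```

With these helpers, ``numBlocksFor(SB.NumDirectoryBytes, SB.BlockSize)`` gives the number of ``ulittle32_t`` entries to read from the block at ``BlockMapAddr``.
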
+The Stream Directory
+====================
+The Stream Directory is the root of all access to the other streams in an MSF
+file. Beginning at byte 0 of the stream directory is the following structure:
+
+.. code-block:: c++
+
+  struct StreamDirectory {
+    ulittle32_t NumStreams;
+    ulittle32_t StreamSizes[NumStreams];
+    ulittle32_t StreamBlocks[NumStreams][];
+  };
+
+This structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes.
+Note that both arrays are of variable length: ``StreamSizes`` has
+``NumStreams`` entries, and ``StreamBlocks`` is jagged, with row ``i`` holding
+``ceil(StreamSizes[i] / BlockSize)`` block indices.
+
+**Example:** Suppose a hypothetical PDB file with a 4KiB block size, and 4
+streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}.
+
+Stream 0: ceil(1000 / 4096) = 1 block
+
+Stream 1: ceil(8000 / 4096) = 2 blocks
+
+Stream 2: ceil(16000 / 4096) = 4 blocks
+
+Stream 3: ceil(9000 / 4096) = 3 blocks
+
+In total, 10 blocks are used. Let's see what the stream directory might look
+like:
+
+.. code-block:: c++
+
+  struct StreamDirectory {
+    ulittle32_t NumStreams = 4;
+    ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000};
+    ulittle32_t StreamBlocks[][] = {
+      {4},
+      {5, 6},
+      {11, 9, 7, 8},
+      {10, 15, 12}
+    };
+  };
+
+In total, this occupies ``15 * 4 = 60`` bytes (15 ``ulittle32_t`` values: 1 for
+``NumStreams``, 4 stream sizes, and 10 block indices), so
+``SuperBlock->NumDirectoryBytes`` would equal ``60``, and the block at
+``SuperBlock->BlockMapAddr`` would hold an array of one ``ulittle32_t``, since
+``60 <= SuperBlock->BlockSize``.
+
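The jagged layout in the example can be decoded with a single pass over the directory. The sketch below is illustrative only: ``ParsedDirectory`` and ``parseDirectory`` are hypothetical names, and it assumes the directory has already been stitched together from its blocks and decoded into host-order 32-bit words. It handles only the layout described here and rejects input that is too short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory form of the stream directory.
struct ParsedDirectory {
  std::vector<uint32_t> StreamSizes;
  std::vector<std::vector<uint32_t>> StreamBlocks;
};

// Words: the whole directory as host-order 32-bit values (an assumption;
// on disk these are little-endian and may span several blocks).
// Returns false if the directory is shorter than its own counts imply.
inline bool parseDirectory(const std::vector<uint32_t> &Words,
                           uint32_t BlockSize, ParsedDirectory &Out) {
  assert(BlockSize != 0 && "block size must be nonzero");
  if (Words.empty())
    return false;
  size_t NumStreams = Words[0];
  if (Words.size() - 1 < NumStreams)
    return false;
  size_t Pos = 1 + NumStreams;
  Out.StreamSizes.assign(Words.begin() + 1, Words.begin() + Pos);
  Out.StreamBlocks.clear();
  for (uint32_t Size : Out.StreamSizes) {
    // Row length is ceil(Size / BlockSize); it is not stored explicitly.
    size_t NumBlocks = (uint64_t(Size) + BlockSize - 1) / BlockSize;
    if (Words.size() - Pos < NumBlocks)
      return false;
    Out.StreamBlocks.emplace_back(Words.begin() + Pos,
                                  Words.begin() + Pos + NumBlocks);
    Pos += NumBlocks;
  }
  return true;
}
```

Feeding it the 15 words of the example (``4``, the four sizes, then the ten block indices) with a block size of 4096 yields the four rows shown above.
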
+Note also that the streams are discontiguous: block 10, which belongs to
+stream 3, lies between blocks 9 and 11 of stream 2, and stream 2's own blocks
+are not in ascending order. You cannot assume anything about the layout of the
+blocks!
+
+Alignment and Block Boundaries
+==============================
+As may be clear by now, it is possible for a single field (whether it be a high
+level record, a long string field, or even a single ``uint16``) to begin and
+end in separate blocks. For example, if the block size is 4096 bytes, and a
+``uint16`` field begins at the last byte of the current block, then it would
+need to end on the first byte of the next block. Since blocks are not
+necessarily contiguously laid out in the file, this means that both the consumer
+and the producer of an MSF file must be prepared to split data apart
+accordingly. In the aforementioned example, because the fields are
+little-endian, the low byte of the ``uint16`` would be written to the last byte
+of the current block, and the high byte would be written to the first byte of
+the stream's next block, which could be tens of thousands of bytes later (or
+even earlier!) in the file, depending on what the stream directory says.
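
A consumer can handle this by translating every stream offset to a file offset byte by byte. The following is a minimal sketch under stated assumptions: ``streamToFileOffset`` and ``readU16`` are hypothetical helpers, the whole file is assumed to be in memory, and ``Blocks`` is the stream's row from the stream directory. Since the fields are little-endian, the byte at the lower stream offset is taken as the least significant one.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Map a byte offset within a stream to a byte offset within the file,
// using the stream's block list from the stream directory.
inline uint64_t streamToFileOffset(const std::vector<uint32_t> &Blocks,
                                   uint32_t BlockSize, uint64_t StreamOffset) {
  assert(BlockSize != 0 && "block size must be nonzero");
  uint64_t Index = StreamOffset / BlockSize;
  assert(Index < Blocks.size() && "offset past end of stream");
  return uint64_t(Blocks[Index]) * BlockSize + StreamOffset % BlockSize;
}

// Read a little-endian 16-bit value that may straddle two blocks.
// Each byte is located independently, so a block boundary needs no
// special case.
inline uint16_t readU16(const std::vector<unsigned char> &File,
                        const std::vector<uint32_t> &Blocks,
                        uint32_t BlockSize, uint64_t StreamOffset) {
  uint64_t Lo = streamToFileOffset(Blocks, BlockSize, StreamOffset);
  uint64_t Hi = streamToFileOffset(Blocks, BlockSize, StreamOffset + 1);
  assert(Lo < File.size() && Hi < File.size() && "read past end of file");
  return uint16_t(File[Lo] | (File[Hi] << 8));
}
```

Locating bytes one at a time is slow for large reads; a practical reader would more likely copy whole runs within a block and only split at block boundaries.
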