========================================
Machine IR (MIR) Format Reference Manual
========================================

.. contents::
   :local:

.. warning::
  This is a work in progress.

Introduction
============

This document is a reference manual for the Machine IR (MIR) serialization
format. MIR is a human readable serialization format that is used to represent
LLVM's machine specific intermediate representation.

The MIR serialization format is designed to be used for testing the code
generation passes in LLVM.

Overview
========

The MIR serialization format uses a YAML container. YAML is a standard
data serialization language, and the full YAML language spec can be read at
`yaml.org <http://www.yaml.org>`_.

A MIR file is split up into a series of `YAML documents`_. The first document
can contain an optional embedded LLVM IR module, and the rest of the documents
contain the serialized machine functions.

.. _YAML documents: http://www.yaml.org/spec/1.2/spec.html#id2800132

High Level Structure
====================

Embedded Module
---------------

When the first YAML document contains a `YAML block literal string`_, the MIR
parser will treat this string as an LLVM assembly language string that
represents an embedded LLVM IR module.
Here is an example of a YAML document that contains an LLVM module:

.. code-block:: llvm

    --- |
      define i32 @inc(i32* %x) {
      entry:
        %0 = load i32, i32* %x
        %1 = add i32 %0, 1
        store i32 %1, i32* %x
        ret i32 %1
      }
    ...

.. _YAML block literal string: http://www.yaml.org/spec/1.2/spec.html#id2795688

Machine Functions
-----------------

The remaining YAML documents contain the machine functions. This is an example
of such a YAML document:

.. code-block:: llvm

    ---
    name:            inc
    tracksRegLiveness: true
    liveins:
      - { reg: '%rdi' }
    body: |
      bb.0.entry:
        liveins: %rdi

        %eax = MOV32rm %rdi, 1, _, 0, _
        %eax = INC32r killed %eax, implicit-def dead %eflags
        MOV32mr killed %rdi, 1, _, 0, _, %eax
        RETQ %eax
    ...

The document above consists of attributes that represent the various
properties and data structures in a machine function.

The attribute ``name`` is required, and its value should be identical to the
name of the function that this machine function is based on.

The attribute ``body`` is a `YAML block literal string`_. Its value represents
the function's machine basic blocks and their machine instructions.

Machine Instructions Format Reference
=====================================

The machine basic blocks and their instructions are represented using a custom,
human readable serialization language. This language is used in the
`YAML block literal string`_ that corresponds to the machine function's body.

A source string that uses this language contains a list of machine basic
blocks, which are described in the section below.

Machine Basic Blocks
--------------------

A machine basic block is defined in a single block definition source construct
that contains the block's ID.
The example below defines two blocks with IDs of zero and one:

.. code-block:: llvm

    bb.0:
    bb.1:

A machine basic block can also have a name. It should be specified after the ID
in the block's definition:

.. code-block:: llvm

    bb.0.entry:       ; This block's name is "entry"

The block's name should be identical to the name of the IR block that this
machine block is based on.

Block References
^^^^^^^^^^^^^^^^

The machine basic blocks are identified by their ID numbers. Individual
blocks are referenced using the following syntax, where the name is optional:

.. code-block:: llvm

    %bb.<id>[.<name>]

Examples:

.. code-block:: llvm

    %bb.0
    %bb.1.then

Successors
^^^^^^^^^^

The machine basic block's successors have to be specified before any of the
instructions:

.. code-block:: llvm

    bb.0.entry:
      successors: %bb.1.then, %bb.2.else
    bb.1.then:
    bb.2.else:

The branch weights can be specified in brackets after the successor blocks.
The example below defines a block that has two successors with branch weights
of 32 and 16:

.. code-block:: llvm

    bb.0.entry:
      successors: %bb.1.then(32), %bb.2.else(16)

Live In Registers
^^^^^^^^^^^^^^^^^

The machine basic block's live in registers have to be specified before any of
the instructions:

.. code-block:: llvm

    bb.0.entry:
      liveins: %edi, %esi

The lists of live in registers and successors can be empty. The language also
allows multiple live in register and successor lists; the parser combines them
into one list.

Miscellaneous Attributes
^^^^^^^^^^^^^^^^^^^^^^^^

The attributes ``IsAddressTaken``, ``IsLandingPad`` and ``Alignment`` can be
specified in brackets after the block's definition:

.. code-block:: llvm

    bb.0.entry (address-taken):
    bb.2.else (align 4):
    bb.3(landing-pad, align 4):

.. TODO: Describe the way the reference to an unnamed LLVM IR block can be
   preserved.

.. TODO: Describe the parser's default behaviour when optional YAML attributes
   are missing.

.. TODO: Describe the syntax of the machine instructions.

.. TODO: Describe the syntax of the immediate machine operands.

.. TODO: Describe the syntax of the register machine operands.

.. TODO: Describe the syntax of the virtual register operands and their YAML
   definitions.

.. TODO: Describe the syntax of the register operand flags and the subregisters.

.. TODO: Describe the machine function's YAML flag attributes.

.. TODO: Describe the syntax for the global value, external symbol and register
   mask machine operands.

.. TODO: Describe the frame information YAML mapping.

.. TODO: Describe the syntax of the stack object machine operands and their
   YAML definitions.

.. TODO: Describe the syntax of the constant pool machine operands and their
   YAML definitions.

.. TODO: Describe the syntax of the jump table machine operands and their
   YAML definitions.

.. TODO: Describe the syntax of the block address machine operands.

.. TODO: Describe the syntax of the CFI index machine operands.

.. TODO: Describe the syntax of the metadata machine operands, and the
   instruction's debug location attribute.

.. TODO: Describe the syntax of the target index machine operands.

.. TODO: Describe the syntax of the register live out machine operands.

.. TODO: Describe the syntax of the machine memory operands.
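
As a recap of the block-level constructs described under Machine Basic Blocks,
the sketch below combines a block attribute, a weighted successor list and
live in register lists in a single function body. It is illustrative only: the
block names and the x86 registers are reused from the examples in this
document, and the assumption that these constructs can be combined in exactly
this order has not been checked against the parser.

.. code-block:: llvm

    bb.0.entry (address-taken):    ; Assumed: attributes combine with the lists below
      successors: %bb.1.then(32), %bb.2.else(16)
      liveins: %edi, %esi

    bb.1.then:
      liveins: %edi

    bb.2.else (align 4):
      liveins: %esi

In this sketch each successor is named with the block reference syntax, and
both lists come before any instructions, which are left out here because
their syntax is not yet described in this manual.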