author Nick Kledzik <kledzik@apple.com> 2012-06-12 22:43:35 +0000
committer Nick Kledzik <kledzik@apple.com> 2012-06-12 22:43:35 +0000
commit b47d6ca3a5dfd874029625eed018fffcad2bb28c
tree 1bff39f8745c3a2376a9de4afd487c7b418d8d47
parent 79c39da1358feb547690915480aa5cd14ca51aba
Wrote initial doc on how to create a Reader
llvm-svn: 158374
-rw-r--r-- lld/docs/Readers.rst 163
-rw-r--r-- lld/docs/development.rst 10
2 files changed, 172 insertions, 1 deletions
diff --git a/lld/docs/Readers.rst b/lld/docs/Readers.rst
new file mode 100644
index 00000000000..a40a23c62b7
--- /dev/null
+++ b/lld/docs/Readers.rst
@@ -0,0 +1,163 @@
+.. _Readers:
+
+Developing lld Readers
+======================
+
+Introduction
+------------
+
+One goal of lld is to be file format independent. This is done
+through a plug-in model for reading object files. The lld::Reader is the base
+class for all object file readers. A Reader follows the factory method pattern.
+A Reader instantiates an lld::File object (which is a graph of Atoms) from a
+given object file (on disk or in-memory).
+
+Every Reader subclass defines its own "options" class (for instance the mach-o
+Reader defines the class ReaderOptionsMachO). This options class is the
+one-and-only way to control how the Reader operates when parsing an input file
+into an Atom graph. For instance, you may want the Reader to only accept
+certain architectures. The options class can be instantiated from command
+line options, or it can be subclassed and the ivars programmatically set.
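As a rough illustration of the options idea (not the actual lld declarations), an options class for a made-up "Foo" format might look like the sketch below. The names `FooArch`, `addAcceptedArch`, `acceptsArch`, and `ArmOnlyFooOptions` are invented for this example, and a stand-in `ReaderOptions` base class is declared only so the block is self-contained.

```cpp
#include <cassert>
#include <set>

// Stand-in for the real lld base class; declared here only so the sketch compiles.
class ReaderOptions {
public:
  virtual ~ReaderOptions() = default;
};

// Hypothetical architecture tags for the made-up "Foo" format.
enum class FooArch { x86_64, arm, ppc };

// Hypothetical options class: the only knob is which architectures to accept.
class ReaderOptionsFoo : public ReaderOptions {
public:
  // A driver translating command line options would call this.
  void addAcceptedArch(FooArch arch) { _acceptedArchs.insert(arch); }

  // An empty set is treated as "accept every architecture".
  bool acceptsArch(FooArch arch) const {
    return _acceptedArchs.empty() || _acceptedArchs.count(arch) != 0;
  }

protected:
  std::set<FooArch> _acceptedArchs; // ivar that a subclass may set directly
};

// Programmatic configuration by subclassing and setting the ivar.
class ArmOnlyFooOptions : public ReaderOptionsFoo {
public:
  ArmOnlyFooOptions() { _acceptedArchs.insert(FooArch::arm); }
};
```

A Reader built from `ArmOnlyFooOptions` would then reject any input for which `acceptsArch()` returns false.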
+
+
+Where to start
+--------------
+
+The lld project already has a skeleton of source code for Readers of ELF, COFF,
+mach-o, and the lld native object file format. If your file format is a
+variant of one of those, you should modify the existing Reader to support
+your variant. To do so, add new ivar(s) to the Options class for that Reader
+that specify which file format variant to expect, then modify the Reader to
+check those ivars and parse the object file accordingly.
+
+If your object file format is not a variant of any existing Reader, you'll need
+to create a new Reader subclass. If your file format is called "Foo", you'll
+need to create these files::
+
+ ./include/lld/ReaderWriter/ReaderFoo.h
+ ./lib/ReaderWriter/Foo/ReaderFoo.cpp
+
+The public interface for your reader is just the ReaderOptions subclass
+(e.g. ReaderOptionsFoo) and the function to create a Reader given the options::
+
+ Reader* createReaderFoo(const ReaderOptionsFoo &options);
+
+In the implementation, you can define a ReaderFoo class, but that class is
+private to your ReaderWriter directory.
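The split between public header and private implementation could be sketched as follows, assuming simplified stand-ins for `Reader` and `ReaderOptionsFoo`. The `formatName()` method and the `verbose` option are hypothetical and exist only to give the example something observable; both halves are shown in one block for brevity.

```cpp
#include <cassert>
#include <memory>
#include <string>

// ---- What would live in the public header (simplified stand-ins) ----
class Reader {
public:
  virtual ~Reader() = default;
  virtual std::string formatName() const = 0; // hypothetical, demo only
};

struct ReaderOptionsFoo {
  bool verbose = false; // hypothetical option
};

Reader *createReaderFoo(const ReaderOptionsFoo &options);

// ---- What would live in the private .cpp file ----
namespace {
// Invisible to clients; they only ever see the Reader base class.
class ReaderFoo : public Reader {
public:
  explicit ReaderFoo(const ReaderOptionsFoo &options) : _options(options) {}
  std::string formatName() const override {
    return _options.verbose ? "Foo (verbose)" : "Foo";
  }

private:
  ReaderOptionsFoo _options; // copied, so the caller's object may go away
};
} // namespace

Reader *createReaderFoo(const ReaderOptionsFoo &options) {
  return new ReaderFoo(options);
}
```

Because `ReaderFoo` sits in an anonymous namespace, the factory function is the only way to obtain one.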
+
+
+Readers are factories
+---------------------
+
+The linker will usually only instantiate your Reader once. That one Reader will
+have its parseFile() method called many times with different input files.
+To support multithreaded linking, the Reader may be parsing multiple input
+files in parallel. Therefore, there should be no parsing state in your Reader
+object. Any parsing state should be in ivars of your File subclass or in
+some temporary object.
+
+The key method to implement in a reader is::
+
+ virtual error_code parseFile(std::unique_ptr<MemoryBuffer> mb,
+ std::vector<std::unique_ptr<File>> &result);
+
+It takes a memory buffer (which contains the contents of the object file
+being read) and returns, through result, one or more instantiated lld::File
+objects, each a collection of Atoms. The result is a vector of File pointers
+(instead of simply a single File pointer) because some file formats allow
+multiple object "files" to be encoded in one file system file.
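A minimal sketch of a stateless parseFile() follows. It assumes stand-in `MemoryBuffer` and `File` types and uses `std::error_code` in place of the `error_code` type in the signature above. The "format" is invented: each non-empty line of the buffer is treated as one embedded object "file", purely to show several File objects coming from one buffer.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <system_error>
#include <vector>

// Stand-ins for llvm::MemoryBuffer and lld::File so the sketch is self-contained.
struct MemoryBuffer { std::string data; };
struct File { std::string name; };

class ReaderFoo {
public:
  // const: the Reader itself is never modified, so one instance can parse
  // many buffers in parallel.
  std::error_code parseFile(std::unique_ptr<MemoryBuffer> mb,
                            std::vector<std::unique_ptr<File>> &result) const {
    if (!mb || mb->data.empty())
      return std::make_error_code(std::errc::invalid_argument);
    std::string current; // parsing state is local to this call
    for (char c : mb->data) {
      if (c == '\n')
        emit(current, result);
      else
        current += c;
    }
    emit(current, result); // last entry may lack a trailing newline
    return std::error_code();
  }

private:
  // Append one File for a non-empty name, then reset the scratch string.
  static void emit(std::string &name,
                   std::vector<std::unique_ptr<File>> &result) {
    if (name.empty())
      return;
    result.push_back(std::unique_ptr<File>(new File{name}));
    name.clear();
  }
};
```

Here the buffer is simply dropped when parseFile() returns, since the stand-in File copies what it needs.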
+
+
+Memory Ownership
+----------------
+
+If parseFile() is successful, it either passes ownership of the MemoryBuffer
+to the File object, or it deletes the MemoryBuffer. The former is done if the
+Atoms contain pointers into the MemoryBuffer (e.g. StringRefs for symbols
+or ArrayRefs for section content). If parseFile() fails, the MemoryBuffer
+must be deleted by the Reader.
+
+Atom objects are always owned by their File object. During core linking
+when Atoms are coalesced or dead stripped away, core linking does not delete
+those Atoms. Core linking just removes those unused Atoms from its internal
+list. The destructor of a File object is responsible for deleting all Atoms
+it owns, and if ownership of the MemoryBuffer was passed to it, the File
+destructor needs to delete that too.
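The ownership rules above might be sketched like this, assuming stand-in `MemoryBuffer` and `Atom` types and a hypothetical `FileFoo` class. Smart pointers take the place of hand-written destructor code; that is a choice made for this example, not something the text prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct MemoryBuffer { std::string data; }; // stand-in for llvm::MemoryBuffer

// Stand-in Atom whose name points into the buffer instead of copying it.
struct Atom {
  const char *nameStart;
  std::size_t nameLength;
  std::string name() const { return std::string(nameStart, nameLength); }
};

class FileFoo {
public:
  // The File takes over the buffer because its Atoms point into it.
  explicit FileFoo(std::unique_ptr<MemoryBuffer> mb) : _buffer(std::move(mb)) {}

  Atom *addAtom(std::size_t offset, std::size_t length) {
    assert(offset + length <= _buffer->data.size());
    _atoms.push_back(std::unique_ptr<Atom>(
        new Atom{_buffer->data.data() + offset, length}));
    return _atoms.back().get();
  }

  std::size_t atomCount() const { return _atoms.size(); }

private:
  // Members are destroyed in reverse declaration order: the Atoms go first,
  // then the buffer they point into.
  std::unique_ptr<MemoryBuffer> _buffer;
  std::vector<std::unique_ptr<Atom>> _atoms;
};
```

A linker-side list of raw `Atom *` can drop entries freely; the File still owns every Atom until the File itself is destroyed.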
+
+
+Making Atoms
+------------
+
+The internal model of lld is purely Atom based. But most object files do not
+have an explicit concept of Atoms; instead, most have "sections". The way
+to think of this is that a section is just a list of Atoms with common
+attributes.
+
+The first step in parsing section based object files is to cleave each
+section into a list of Atoms. The technique may vary by section type. For
+code sections (e.g. .text), there are usually symbols at the start of each
+function. Those symbol addresses are the points at which the section is cleaved
+into discrete Atoms. Some file formats (like ELF) also include the
+length of each symbol in the symbol table. Otherwise, the length of each
+Atom is calculated to run to the start of the next symbol or the end of the
+section.
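The symbol-based cleaving just described can be sketched as below. `Symbol`, `AtomRange`, and `cleaveSection` are hypothetical names; a `size` of 0 stands for "the format did not record a length". The sketch assumes every symbol lies inside the section and that no two symbols share an address.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Symbol { std::string name; uint64_t address; uint64_t size; };
struct AtomRange { std::string name; uint64_t offset; uint64_t length; };

// Cleave one section into atom ranges at its symbol addresses.
std::vector<AtomRange> cleaveSection(std::vector<Symbol> symbols,
                                     uint64_t sectionStart,
                                     uint64_t sectionSize) {
  std::sort(symbols.begin(), symbols.end(),
            [](const Symbol &a, const Symbol &b) {
              return a.address < b.address;
            });
  const uint64_t sectionEnd = sectionStart + sectionSize;
  std::vector<AtomRange> atoms;
  atoms.reserve(symbols.size());
  for (std::size_t i = 0; i < symbols.size(); ++i) {
    const Symbol &sym = symbols[i];
    assert(sym.address >= sectionStart && sym.address < sectionEnd);
    uint64_t length = sym.size;
    if (length == 0) {
      // No recorded length: run to the next symbol or the end of the section.
      uint64_t next =
          (i + 1 < symbols.size()) ? symbols[i + 1].address : sectionEnd;
      length = next - sym.address;
    }
    atoms.push_back(AtomRange{sym.name, sym.address - sectionStart, length});
  }
  return atoms;
}
```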
+
+Other section types can be implicitly cleaved. For instance, c-string literals
+or unwind info (e.g. .eh_frame) can be cleaved by having the Reader look at
+the content of the section. It is important to cleave sections into Atoms
+to remove false dependencies. For instance the .eh_frame section often
+has no symbols, but contains "pointers" to the functions for which it
+has unwind info. If the .eh_frame section were not cleaved (but left as one
+big Atom), there would always be a reference (from the eh_frame Atom) to
+each function. So the linker would be unable to coalesce or dead strip
+away the function atoms.
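For the c-string case, implicit cleaving could look like the sketch below. `AtomRange` and `cleaveCStrings` are hypothetical names, and the policy of turning unterminated trailing bytes into a final atom is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct AtomRange { std::size_t offset; std::size_t length; };

// Split the raw bytes of a c-string literal section into one range per
// string. Each range includes its terminating NUL byte.
std::vector<AtomRange> cleaveCStrings(const std::string &content) {
  std::vector<AtomRange> atoms;
  std::size_t start = 0;
  while (start < content.size()) {
    std::size_t nul = content.find('\0', start);
    // Assumed policy: unterminated trailing bytes become one last atom.
    std::size_t end = (nul == std::string::npos) ? content.size() : nul + 1;
    atoms.push_back(AtomRange{start, end - start});
    start = end;
  }
  return atoms;
}
```

No symbol table is consulted; the section content alone determines where each Atom begins and ends.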
+
+The lld Atom model also requires that a reference to an undefined symbol be
+modeled as a Reference to an UndefinedAtom. So the Reader also needs to
+create an UndefinedAtom for each undefined symbol in the object file.
+
+Once all Atoms have been created, the second step is to create References
+(recall that Atoms are "nodes" and References are "edges"). Most References
+are created by looking at the "relocation records" in the object file. If
+a function contains a call to "malloc", there is usually a relocation record
+specifying the address in the section and the symbol table index. Your
+Reader will need to convert the address to an Atom and offset and the symbol
+table index into a target Atom. If "malloc" is not defined in the object file,
+the target Atom of the Reference will be an UndefinedAtom.
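Turning a relocation record into a Reference might be sketched as follows. All types here are simplified stand-ins, and `makeReference` is a hypothetical helper. It assumes the defined atoms are sorted by address and that a table mapping symbol table index to Atom was built earlier, with UndefinedAtom stand-ins for undefined symbols.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Atom {
  std::string name;
  uint64_t address; // start address within the section
  bool undefined;   // true for an UndefinedAtom stand-in
};
struct Relocation { uint64_t address; std::size_t symbolIndex; };
struct Reference {
  std::size_t sourceAtom; // index of the atom containing the fixup
  uint64_t offsetInAtom;  // where in that atom the fixup applies
  const Atom *target;     // atom the fixup points at
};

// definedAtoms must be sorted by address.
Reference makeReference(const std::vector<Atom> &definedAtoms,
                        const std::vector<const Atom *> &symbolToAtom,
                        const Relocation &reloc) {
  // Find the first atom starting after the address; the atom just before
  // it is the one containing the relocated location.
  auto it = std::upper_bound(
      definedAtoms.begin(), definedAtoms.end(), reloc.address,
      [](uint64_t addr, const Atom &atom) { return addr < atom.address; });
  assert(it != definedAtoms.begin() && "relocation precedes first atom");
  std::size_t source =
      static_cast<std::size_t>(it - definedAtoms.begin()) - 1;
  return Reference{source, reloc.address - definedAtoms[source].address,
                   symbolToAtom.at(reloc.symbolIndex)};
}
```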
+
+
+Performance
+-----------
+Once you have the above working to parse an object file into Atoms and
+References, you'll want to look at performance. Some techniques that can
+help performance are:
+
+* Use llvm::BumpPtrAllocator or pre-allocate one big vector<Reference> and then
+ just have each atom point to its subrange of References in that vector.
+ This can be faster than allocating each Reference as a separate object.
+* Pre-scan the symbol table and determine how many atoms are in each section,
+ then allocate space for all the Atom objects at once.
+* Don't copy symbol names or section content to each Atom; instead, use
+ StringRef and ArrayRef in each Atom to point to its name and content in the
+ MemoryBuffer.
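The first two techniques in the list above (one shared vector of References, and up-front allocation) could be combined roughly as follows. `AtomTable` and its methods are hypothetical. Atoms store an index and a count rather than pointers, so the subranges stay valid even if the vector has to grow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Reference { uint32_t offsetInAtom; uint32_t targetAtom; };
struct Atom { std::size_t firstRef; std::size_t refCount; };

class AtomTable {
public:
  // Counts would come from a pre-scan of the symbol and relocation tables.
  AtomTable(std::size_t atomCount, std::size_t refCount) {
    _atoms.reserve(atomCount);
    _refs.reserve(refCount);
  }

  // Start a new atom; its References begin at the current end of _refs.
  std::size_t beginAtom() {
    _atoms.push_back(Atom{_refs.size(), 0});
    return _atoms.size() - 1;
  }

  // Append a Reference to the most recently started atom.
  void addReference(uint32_t offsetInAtom, uint32_t targetAtom) {
    assert(!_atoms.empty() && "call beginAtom() first");
    _refs.push_back(Reference{offsetInAtom, targetAtom});
    ++_atoms.back().refCount;
  }

  const Reference *refsBegin(std::size_t atom) const {
    return _refs.data() + _atoms[atom].firstRef;
  }
  const Reference *refsEnd(std::size_t atom) const {
    return refsBegin(atom) + _atoms[atom].refCount;
  }

private:
  std::vector<Atom> _atoms;
  std::vector<Reference> _refs; // every Reference of the file, contiguous
};
```

All References of a file end up in one allocation, and each atom's References are adjacent in memory.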
+
+
+Testing
+-------
+
+We are still working on infrastructure to test Readers. The issue is that
+you don't want to check in binary files to the test suite. And the tools
+for creating your object file from assembly source may not be available on
+every OS.
+
+We are investigating a way to use yaml to describe the sections, symbols,
+and content of a file, and then to have some code that writes out an object
+file from that yaml description.
+
+Once that is in place, you can write test cases that contain section/symbols
+yaml and are run through the linker to produce Atom/References based yaml, which
+is then run through FileCheck to verify the Atoms and References are as
+expected.
+
+
+
diff --git a/lld/docs/development.rst b/lld/docs/development.rst
index c24e83b64a4..c32f8c9e357 100644
--- a/lld/docs/development.rst
+++ b/lld/docs/development.rst
@@ -5,7 +5,15 @@ Development
lld is developed as part of the `LLVM <http://llvm.org>`_ project.
-See the :ref:`getting started <getting_started>` guide.
+Creating a Reader
+-----------------
+
+See the :ref:`Creating a Reader <Readers>` guide.
+
+
+
+Documentation
+-------------
The project documentation is written in reStructuredText and generated using the
`Sphinx <http://sphinx.pocoo.org/>`_ documentation generator. For more