5 files changed, 503 insertions, 6 deletions
diff --git a/lld/docs/_templates/index.html b/lld/docs/_templates/index.html
index eda2b1d36f4..15744f5774f 100644
--- a/lld/docs/_templates/index.html
+++ b/lld/docs/_templates/index.html
@@ -1,23 +1,56 @@
 {% extends "layout.html" %}
 {% set title = 'lld' %}
 {% block body %}
-<h1>lld - The LLVM Linker</h1>
+<h1>lld: a linker for LLVM</h1>
 
 <p>
-  lld is LLVM's linker.
+  lld is a new set of modular code for creating linker tools.
 </p>
 
+<h2 id="goals">Features and Goals</h2>
+<p><b>End-User Features:</b></p>
+<ul>
+  <li>Compatible with existing linker options</li>
+  <li>Reads standard Object Files (e.g. ELF, mach-o, PE/COFF)</li>
+  <li>Writes standard Executable Files (e.g. ELF, mach-o, PE)</li>
+  <li>Fast link times</li>
+  <li>Minimal memory use</li>
+  <li>Remove clang's reliance on "the system linker"</li>
+  <li>Uses the LLVM 'BSD' License</li>
+</ul>
+
+<p><b>Applications:</b></p>
+<ul>
+  <li>Modular design</li>
+  <li>Support cross linking</li>
+  <li>Easy to add new CPU support</li>
+  <li>Can be built as static tool or library</li>
+</ul>
+
+<p><b>Design and Implementation:</b></p>
+<ul>
+  <li>Extensive unit tests</li>
+  <li>Internal linker model can be dumped/read to textual format</li>
+  <li>Internal linker model can be dumped/read to new native format</li>
+  <li>Native format designed to be fast to read and write</li>
+  <li>Additional linking features can be plugged in as "passes"</li>
+  <li>OS specific and CPU specific code factored out</li>
+</ul>
+
+<p>For more information, see the <a href="{{pathto('intro')}}">introduction</a>
+available as part of the <i>lld</i> documentation below.</p>
+
 <h2>Documentation</h2>
 <table class="contentstable" align="center" style="margin-left: 30px">
   <tr>
     <td width="50%">
-      <p class="biglink"><a class="biglink" href="{{ pathto("contents") }}">
+      <p class="biglink"><a class="biglink" href="{{ pathto('contents') }}">
           Contents</a><br/>
         <span class="linkdescr">for a complete overview</span></p>
-      <p class="biglink"><a class="biglink" href="{{ pathto("search") }}">
+      <p class="biglink"><a class="biglink" href="{{ pathto('search') }}">
           Search page</a><br/>
         <span class="linkdescr">search the documentation</span></p>
-      <p class="biglink"><a class="biglink" href="{{ pathto("genindex") }}">
+      <p class="biglink"><a class="biglink" href="{{ pathto('genindex') }}">
           General Index</a><br/>
         <span class="linkdescr">all functions, classes, terms</span></p>
   </td></tr>
diff --git a/lld/docs/contents.rst b/lld/docs/contents.rst
index 13806e8b37c..fa998dad8fb 100644
--- a/lld/docs/contents.rst
+++ b/lld/docs/contents.rst
@@ -7,6 +7,7 @@ Contents
    :maxdepth: 2
 
    intro
+   design
 
 Indices and tables
 ==================
diff --git a/lld/docs/design.rst b/lld/docs/design.rst
new file mode 100644
index 00000000000..22e33f9fc48
--- /dev/null
+++ b/lld/docs/design.rst
@@ -0,0 +1,405 @@
+.. _design:
+
+Linker Design
+=============
+
+Introduction
+------------
+
+lld is a new generation of linker.  It is not "section" based like traditional
+linkers which mostly just interlace sections from multiple object files into the
+output file.  Instead, lld is based on "Atoms".  Traditional section based
+linkers work well for simple linking, but their model makes advanced linking
+features difficult to implement.  Features like dead code stripping, reordering
+functions for locality, and C++ coalescing require the linker to work at a finer
+grain.
+
+An atom is an indivisible chunk of code or data.  An atom has a set of
+attributes, such as: name, scope, content-type, alignment, etc.  An atom also
+has a list of References.  A Reference contains: a kind, an optional offset, an
+optional addend, and an optional target atom.
+
+The Atom model allows the linker to use standard graph theory models for linking
+data structures.  Each atom is a node, and each Reference is an edge.  The
+feature of dead code stripping is implemented by following edges to mark all
+live atoms, and then deleting the non-live atoms.
+
+
+Atom Model
+----------
+
+An atom is an indivisible chunk of code or data.  Typically each user written
+function or global variable is an atom.  In addition, the compiler may emit
+other atoms, such as for literal c-strings or floating point constants, or for
+runtime data structures like dwarf unwind info or pointers to initializers.
+
+A simple "hello world" object file would be modeled like this:
+
+.. image:: hello.png
+
+There are three atoms: main, a proxy for printf, and an anonymous atom
+containing the c-string literal "hello world".  The Atom "main" has two
+references. One is the call site for the call to printf, and the other is a
+reference for the instruction that loads the address of the c-string literal.
+
+File Model
+----------
+
+The linker views the input files as basically containers of Atoms and
+References, and just a few attributes of their own.  The linker works with three
+kinds of files: object files, static libraries, and dynamic shared libraries.
+Each kind of file has a reader object which presents the file in the model
+expected by the linker.
+
+Object File
+~~~~~~~~~~~
+
+An object file is just a container of atoms.  When linking an object file, a
+reader is instantiated which parses the object file and instantiates a set of
+atoms representing all content in the .o file.  The linker adds all those atoms
+to a master graph.
+
+Static Library (Archive)
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+This is the traditional unix static archive which is just a collection of object
+files with a "table of contents". When linking with a static library, by default
+nothing is added to the master graph of atoms. Instead, after merging all
+atoms from object files into a master graph, if any "undefined" atoms are left
+remaining in the master graph, the linker reads the table of contents for each
+static library to see if any have the needed definitions. If so, the set of
+atoms from the specified object file in the static library is added to the
+master graph of atoms.
+
+Dynamic Library (Shared Object)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Dynamic libraries are different than object files and static libraries in that
+they don't directly add any content.  Their purpose is to check at build time
+that the remaining undefined references can be resolved at runtime, and provide
+a list of dynamic libraries (DT_NEEDED) that will be needed at runtime.  The way
+this is modeled in the linker is that a dynamic library contributes no atoms to
+the initial graph of atoms.  Instead, (like static libraries) if there are
+"undefined" atoms in the master graph of all atoms, then each dynamic library is
+checked to see if it exports the required symbol. If so, a "shared library" atom is
+instantiated by the reader which the linker uses to replace the
+"undefined" atom.
+
+Linking Steps
+-------------
+
+Through the use of abstract Atoms, the core of linking is architecture
+independent and file format independent.  All command line parsing is factored
+out into a separate "options" abstraction which enables the linker to be driven
+with different command line sets.
+
+The overall steps in linking are:
+
+  #. Command line processing
+
+  #. Parsing input files
+
+  #. Resolving
+
+  #. Passes/Optimizations
+
+  #. Generate output file
+
+The Resolving and Passes steps are done purely on the master graph of atoms, so
+they have no notion of file formats such as mach-o or ELF.
+
+Resolving
+~~~~~~~~~
+
+The resolving step takes all the atom graphs from each object file and combines
+them into one master object graph.  Unfortunately, it is not as simple as
+appending the atom list from each file into one big list.  There are many cases
+where atoms need to be coalesced.  That is, two or more atoms need to be
+coalesced into one atom.  This is necessary to support: C language "tentative
+definitions", C++ weak symbols for templates and inlines defined in headers,
+replacing undefined atoms with actual definition atoms, and for merging copies
+of constants like c-strings and floating point constants.
+
+The linker supports coalescing by-name and by-content. By-name is used for
+tentative definitions and weak symbols.  By-content is used for constant data
+that can be merged.
+
+The resolving process maintains some global linking "state", including a "symbol
+table" which is a map from llvm::StringRef to lld::Atom*.  With these data
+structures, the linker iterates all atoms in all input files. For each atom, it
+checks if the atom is named and has a global or hidden scope.  If so, the atom
+is added to the symbol table map.  If there already is a matching atom in that
+table, that means the current atom needs to be coalesced with the found atom, or
+it is a multiple definition error.
+
+When all initial input file atoms have been processed by the resolver, a scan is
+made to see if there are any undefined atoms in the graph.  If there are, the
+linker scans all libraries (both static and dynamic) looking for definitions to
+replace the undefined atoms.  It is an error if any undefined atoms are left
+remaining.
+
+Dead code stripping (if requested) is done at the end of resolving.  The linker
+does a simple mark-and-sweep. It starts with "root" atoms (like "main" in a main
+executable) and follows each reference and marks each Atom that it visits as
+"live".  When done, all atoms not marked "live" are removed.
+
+The result of the Resolving phase is the creation of an lld::File object.  The
+goal is that the lld::File model is **the** internal representation
+throughout the linker. The file readers parse (mach-o, ELF, COFF) into an
+lld::File.  The file writers (mach-o, ELF, COFF) take an lld::File and produce
+their file kind, and every Pass only operates on an lld::File.  This is not only
+a simpler, consistent model, but it enables the state of the linker to be dumped
+at any point in the link for testing purposes.
+
+
+Passes
+~~~~~~
+
+The Passes step is an open ended set of routines that each get a chance to
+modify or enhance the current lld::File object. Some example Passes are:
+
+  * stub (PLT) generation
+
+  * GOT instantiation
+
+  * order_file optimization
+
+  * branch island generation
+
+  * branch shim generation
+
+  * Objective-C optimizations (Darwin specific)
+
+  * TLV instantiation (Darwin specific)
+
+  * dtrace probe processing (Darwin specific)
+
+  * compact unwind encoding (Darwin specific)
+
+
+Some of these passes are specific to Darwin's runtime environments.  But many of
+the passes are applicable to any OS (such as generating branch islands for out of
+range branch instructions).
+
+The general structure of a pass is to iterate through the atoms in the current
+lld::File object, inspecting each atom and doing something.  For instance, the
+stub pass looks for call sites to shared library atoms (e.g. call to printf).
+It then instantiates a "stub" atom (PLT entry) and a "lazy pointer" atom for
+each proxy atom needed, and these new atoms are added to the current lld::File
+object.  Next, all the noted call sites to shared library atoms have their
+References altered to point to the stub atom instead of the shared library atom.
+
+Generate Output File
+~~~~~~~~~~~~~~~~~~~~
+
+Once the passes are done, the output file writer is given the current lld::File
+object.  The writer's job is to create the executable content file wrapper and
+place the content of the atoms into it.
+
+lld::File representations
+-------------------------
+
+Just as LLVM has three representations of its IR model, lld has three
+representations of its File/Atom/Reference model:
+
+ * In memory, abstract C++ classes (lld::Atom, lld::Reference, and lld::File).
+
+ * textual (in YAML)
+
+ * binary format ("native")
+
+Binary File Format
+~~~~~~~~~~~~~~~~~~
+
+In theory, lld::File objects could be written to disk in an existing Object File
+format standard (e.g. ELF).  Instead we choose to define a new binary file
+format. There are two main reasons for this: fidelity and performance.  In order
+for lld to work as a linker on all platforms, its internal model must be rich
+enough to model all CPU and OS linking features.  But if we choose an existing
+Object File format as the lld binary format, that means an on going need to
+retrofit each platform specific feature needed from alternate platforms into the
+existing Object File format.  Having our own "native" binary format side steps
+that issue.  We still need to be able to binary encode all the features, but
+once the in-memory model can represent the feature, it is straight forward to
+binary encode it.
+
+The reason to use a binary file format at all, instead of a textual file format,
+is speed.  You want the binary format to be as fast as possible to read into the
+in-memory model. Given that we control the in-memory model and the binary
+format, the obvious way to make reading super fast is to make the file format be
+basically just an array of atoms.  The reader just mmaps in the file and looks
+at the header to see how many atoms there are and instantiates that many atom
+objects with the atom attribute information coming from that array.  The trick
+is designing this in a way that can be extended as the Atom model evolves and new
+attributes are added.
+
+The native object file format starts with a header that lists how many "chunks"
+are in the file.  A chunk is an array of "ivar data".  The native file reader
+instantiates an array of Atom objects (with one large malloc call).  Each atom
+contains just a pointer to its vtable and a pointer to its ivar data.  All
+methods on lld::Atom are virtual, so all the method implementations return
+values based on the ivar data to which it has a pointer.  If a new linking
+feature is added which requires a change to the lld::Atom model, a new native
+reader class (e.g. version 2) is defined which knows how to read the new feature
+information from the new ivar data.  The old reader class (e.g. version 1) is
+updated to do its best to model (the lack of the new feature) given the old ivar
+data in existing native object files.
+
+With this model for the native file format, files can be read and turned
+into the in-memory graph of lld::Atoms with just a few memory allocations.
+And the format can easily adapt over time to new features.
+
+
+Textual representations in YAML
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In designing a textual format we want something easy for humans to read and easy
+for the linker to parse.  Since an atom has lots of attributes most of which are
+usually just the default, we should define default values for every attribute so
+that those can be omitted from the text representation.  Here are the atoms for a
+simple hello world program expressed in YAML::
+
+  target-triple:   x86_64-apple-darwin11
+  
+  atoms:
+      - name:    _main
+        scope:   global
+        type:    code
+        content: [ 55, 48, 89, e5, 48, 8d, 3d, 00, 00, 00, 00, 30, c0, e8, 00, 00,
+                   00, 00, 31, c0, 5d, c3 ]
+        fixups:
+        - offset: 07
+          kind:   pcrel32
+          target: 2
+        - offset: 0E
+          kind:   call32
+          target: _fprintf
+  
+      - type:    c-string
+        content: [ 73, 5A, 00 ]
+  
+  ...
+
+The biggest use for the textual format will be writing test cases.  Writing test
+cases in C is problematic because the compiler may vary its output over time for
+its own optimization reasons which may inadvertently disable or break the linker
+feature being tested. By writing test cases in the linker's own textual
+format, we can exactly specify every attribute of every atom and thus target
+specific linker logic.
+
+Testing
+~~~~~~~
+
+The lld project contains a test suite which is being built up as new code is
+added to lld.  All new lld functionality should have tests added to the test
+suite.  The test suite is `lit <http://llvm.org/cmds/lit.html/>`_ driven.  Each
+test is a text file with comments telling lit how to run the test and check the
+result.  To facilitate testing, the lld project builds a tool called lld-core.
+This tool reads a YAML file (default from stdin), parses it into one or more
+lld::File objects in memory and then feeds those lld::File objects to the
+resolver phase.  The output of the resolver is written as a native object file.
+It is then read back in using the native object file reader and then passed to the
+YAML writer.  This round-about path means that all three representations
+(in-memory, binary, and text) are exercised, and any new feature has to work in
+all the representations to pass the test.
+
+
+Resolver testing
+~~~~~~~~~~~~~~~~
+
+Basic testing covers the "core linking" or resolving phase.  That is where the
+linker merges object files.  All test cases are written in YAML.  One feature of
+YAML is that it allows multiple "documents" to be encoded in one YAML stream.
+That means one text file can appear to the linker as multiple .o files - the
+normal case for the linker.
+
+Here is a simple example of a core linking test case. It checks that an
+undefined atom from one file will be replaced by a definition from another
+file::
+
+  # RUN: lld-core %s | FileCheck %s
+  
+  #
+  # Test that undefined atoms are replaced with defined atoms.
+  #
+  
+  ---
+  atoms:
+      - name:              foo
+        definition:        undefined
+  ---
+  atoms:
+      - name:              foo
+        scope:             global
+        type:              code
+  ...
+  
+  # CHECK:       name:       foo
+  # CHECK:       scope:      global
+  # CHECK:       type:       code
+  # CHECK-NOT:   name:       foo
+  # CHECK:       ...
+
+
+Passes testing
+~~~~~~~~~~~~~~
+
+Since Passes just operate on an lld::File object, the lld-core tool has the
+option to run a particular pass (after resolving).  Thus, you can write a YAML
+test case with carefully crafted input to exercise areas of a Pass and then check
+the resulting lld::File object as represented in YAML.
+
+
+Design Issues
+-------------
+
+There are a number of open issues in the design of lld.  The plan is to wait and
+make these design decisions when we need to.
+
+
+Debug Info
+~~~~~~~~~~
+
+Currently, the lld model says nothing about debug info.  But the most popular
+debug format is DWARF and there is some impedance mismatch between the lld model
+and DWARF.  In lld there are just Atoms and only Atoms that need to be in a
+special section at runtime have an associated section.  Also, Atoms do not have
+addresses.  The way DWARF is spec'ed, different parts of DWARF are supposed to go
+into specially named sections and the DWARF references function code by address.
+
+CPU and OS specific functionality
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Currently, lld has an abstract "Platform" that deals with any CPU or OS specific
+differences in linking.  We just keep adding virtual methods to the base
+Platform class as we find linking areas that might need customization.  At some
+point we'll need to structure this better.
+
+
+File Attributes
+~~~~~~~~~~~~~~~
+
+Currently, lld::File just has a path and a way to iterate its atoms. We will
+need to add more attributes on a File.  For example, some equivalent to the
+target triple.  There are also a number of cached or computed attributes that
+could make various Passes more efficient.  For instance, on Darwin there are a
+number of Objective-C optimizations that can be done by a Pass.  But it would
+improve the plain C case if the Objective-C optimization Pass did not have to
+scan all atoms looking for any Objective-C data structures.  This could be done
+if the lld::File object had an attribute that said if the file had any
+Objective-C data in it. The Resolving phase would then be required to "merge"
+that attribute as object files are added.
+
+
+Command Line Processing
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Eventually, we may want this linker to be a drop-in replacement
+linker for existing linker tools.  That means being able to handle command line
+arguments for different platforms (e.g. darwin or linux).  Currently, there is
+no command line processing code in lld. If clang winds up incorporating the lld
+libraries into the clang binary, lld may be able to punt this work because clang
+will be responsible for setting up the state for lld.
+
+
+
diff --git a/lld/docs/hello.png b/lld/docs/hello.png
new file mode 100644
index 00000000000..70df111f1ab
--- /dev/null
+++ b/lld/docs/hello.png
diff --git a/lld/docs/intro.rst b/lld/docs/intro.rst
index 8611189bf73..32daf126101 100644
--- a/lld/docs/intro.rst
+++ b/lld/docs/intro.rst
@@ -3,4 +3,62 @@
 Introduction
 ============
 
-*lld* is the LLVM linker.
+lld is a new set of modular code for creating linker tools.
+
+ * End-User Features:
+
+   * Compatible with existing linker options
+
+   * Reads standard Object Files (e.g. ELF, mach-o, PE/COFF)
+
+   * Writes standard Executable Files (e.g. ELF, mach-o, PE)
+
+   * Fast link times
+
+   * Minimal memory use
+
+   * Remove clang's reliance on "the system linker"
+
+   * Uses the LLVM 'BSD' License
+
+ * Applications:
+
+   * Modular design
+
+   * Support cross linking
+
+   * Easy to add new CPU support
+
+   * Can be built as static tool or library
+
+ * Design and Implementation:
+
+   * Extensive unit tests
+
+   * Internal linker model can be dumped/read to textual format
+
+   * Internal linker model can be dumped/read to new native format
+
+   * Native format designed to be fast to read and write
+
+   * Additional linking features can be plugged in as "passes"
+
+   * OS specific and CPU specific code factored out
+
+
+Why a new linker?
+-----------------
+
+The fact that clang relies on whatever linker tool you happen to have installed
+means that clang has been very conservative in adopting features which require a
+recent linker.
+
+In the same way that the MC layer of LLVM has removed clang's reliance on the
+system assembler tool, the lld project will remove clang's reliance on the
+system linker tool.
+
+
+Current Status
+--------------
+
+lld is in its very early stages of development.