Diffstat (limited to 'lld/docs')
-rw-r--r-- lld/docs/AtomLLD.rst | 61
-rw-r--r-- lld/docs/NewLLD.rst | 309
-rw-r--r-- lld/docs/index.rst | 73
3 files changed, 374 insertions, 69 deletions
diff --git a/lld/docs/AtomLLD.rst b/lld/docs/AtomLLD.rst
new file mode 100644
index 00000000000..4d36ac60a7d
--- /dev/null
+++ b/lld/docs/AtomLLD.rst
@@ -0,0 +1,61 @@
+ATOM-based lld
+==============
+
+ATOM-based lld is a new set of modular code for creating linker tools.
+Currently it supports Mach-O.
+
+* End-User Features:
+
+  * Compatible with existing linker options
+  * Reads standard Object Files
+  * Writes standard Executable Files
+  * Remove clang's reliance on "the system linker"
+  * Uses the LLVM `"UIUC" BSD-Style license`__.
+
+* Applications:
+
+  * Modular design
+  * Support cross linking
+  * Easy to add new CPU support
+  * Can be built as static tool or library
+
+* Design and Implementation:
+
+  * Extensive unit tests
+  * Internal linker model can be dumped/read to textual format
+  * Additional linking features can be plugged in as "passes"
+  * OS specific and CPU specific code factored out
+
+Why a new linker?
+-----------------
+
+The fact that clang relies on whatever linker tool you happen to have installed
+means that clang has been very conservative adopting features which require a
+recent linker.
+
+In the same way that the MC layer of LLVM has removed clang's reliance on the
+system assembler tool, the lld project will remove clang's reliance on the
+system linker tool.
+
+
+Contents
+--------
+
+.. toctree::
+   :maxdepth: 2
+
+   design
+   getting_started
+   ReleaseNotes
+   development
+   windows_support
+   open_projects
+   sphinx_intro
+
+Indices and tables
+------------------
+
+* :ref:`genindex`
+* :ref:`search`
+
+__ http://llvm.org/docs/DeveloperPolicy.html#license
diff --git a/lld/docs/NewLLD.rst b/lld/docs/NewLLD.rst
new file mode 100644
index 00000000000..891160cded5
--- /dev/null
+++ b/lld/docs/NewLLD.rst
@@ -0,0 +1,309 @@
+The ELF and COFF Linkers
+========================
+
+We started rewriting the ELF (Unix) and COFF (Windows) linkers in May 2015.
+Since then, we have been making steady progress towards providing
+drop-in replacements for the system linkers.
+
+Currently, the Windows support is mostly complete and is about 2x faster
+than the linker that comes as a part of the Microsoft Visual Studio toolchain.
+
+The ELF support is in progress and is able to link large programs
+such as Clang or LLD itself. Unless your program depends on linker scripts,
+you can expect it to be linkable with LLD.
+It is currently about 1.2x to 2x faster than the GNU gold linker.
+We aim to make it a drop-in replacement for the GNU linker.
+
+We expect that FreeBSD is going to be the first large system
+to adopt LLD as the system linker.
+We are working on it in collaboration with the FreeBSD project.
+
+The linkers are notably small; as of March 2016,
+the COFF linker is under 7k LOC and the ELF linker is about 10k LOC.
+
+The linkers are designed to be as fast and simple as possible.
+Because they are simple, it is easy to extend them to support new features.
+There are a few key design choices that we made to achieve these goals.
+We will describe them in this document.
+
+The ELF Linker as a Library
+---------------------------
+
+You can embed LLD in your program by linking against it and calling the linker's
+entry point function lld::elf::link.
+
+The current policy is that it is your responsibility to give trustworthy object
+files. The function is guaranteed to return as long as you do not pass corrupted
+or malicious object files. A corrupted file could cause a fatal error or SEGV.
+That being said, you don't need to worry too much about it if you create object
+files in the usual way and give them to the linker. It is naturally expected to
+work, or otherwise it's a linker's bug.
+
+Design
+======
+
+We will describe the design of the linkers in the rest of the document.
+
+Key Concepts
+------------
+
+Linkers are fairly large pieces of software.
+There are many design choices you have to make to create a complete linker.
+
+This is a list of design choices we've made for ELF and COFF LLD.
+We believe that these high-level design choices achieve the right balance
+between speed, simplicity and extensibility.
+
+* Implement as native linkers
+
+  We implemented the linkers as native linkers for each file format.
+
+  The two linkers share the same design but do not share code.
+  Sharing code makes sense if the benefit is worth its cost.
+  In our case, ELF and COFF are different enough that we thought the layer to
+  abstract the differences wouldn't be worth its complexity and run-time cost.
+  Elimination of the abstract layer has greatly simplified the implementation.
+
+* Speed by design
+
+  One of the most important things in achieving high performance is to
+  do less rather than do it efficiently.
+  Therefore, the high-level design matters more than local optimizations.
+  Since we are trying to create a high-performance linker,
+  it is very important to keep the design as efficient as possible.
+
+  Broadly speaking, we do not do anything until we have to do it.
+  For example, we do not read section contents or relocations
+  until we need them to continue linking.
+  When we need to do some costly operation (such as looking up
+  a hash table for each symbol), we do it only once.
+  We obtain a handle (which is typically just a pointer to actual data)
+  on the first operation and use it throughout the process.
+
+* Efficient archive file handling
+
+  LLD's handling of archive files (the files with ".a" file extension) is different
+  from the traditional Unix linkers and pretty similar to Windows linkers.
+  We'll describe how the traditional Unix linker handles archive files,
+  what the problem is, and how LLD approached the problem.
+
+  The traditional Unix linker maintains a set of undefined symbols during linking.
+  The linker visits each file in the order they appear on the command line
+  until the set becomes empty. What the linker does depends on the file type.
+
+  - If the linker visits an object file, the linker links the object file into the result,
+    and undefined symbols in the object file are added to the set.
+
+  - If the linker visits an archive file, it checks the archive file's symbol table
+    and extracts all object files that have definitions for any symbols in the set.
+
+  This algorithm sometimes leads to counter-intuitive behavior.
+  If you give archive files before object files, nothing will happen
+  because when the linker visits archives, there are no undefined symbols in the set.
+  As a result, no files are extracted from the first archive file,
+  and the link is done at that point because the set is empty after it visits one file.
+
+  You can fix the problem by reordering the files,
+  but that cannot fix the issue of mutually-dependent archive files.
+
+  Linking mutually-dependent archive files is tricky.
+  You may specify the same archive file multiple times to
+  let the linker visit it more than once.
+  Or, you may use the special command line options, `-(` and `-)`,
+  to let the linker loop over the files between the options until
+  no new symbols are added to the set.
+
+  Visiting the same archive files multiple times makes the linker slower.
+
+  Here is how LLD approached the problem. Instead of memorizing only undefined symbols,
+  we program LLD so that it memorizes all symbols.
+  When it sees an undefined symbol that can be resolved by extracting an object file
+  from an archive file it previously visited, it immediately extracts the file and links it.
+  It is doable because LLD does not forget symbols it has seen in archive files.
+
+  We believe that LLD's way is efficient and easy to justify.
+
+  The semantics of LLD's archive handling differ from those of traditional Unix linkers.
+  You can observe it if you carefully craft archive files to exploit it.
+  However, in reality, we don't know of any program that cannot link
+  with our algorithm so far, so we are not too worried about the incompatibility.
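The difference between the two archive-handling strategies described above can be illustrated with a small model. This is only a sketch under simplifying assumptions, not LLD's actual code: an object file is modeled as a pair of (defined symbols, undefined symbols), an archive as a dict of member name to object file, and the function names `link_traditional` and `link_lld_style` are hypothetical. The traditional variant makes a single pass and does not model `-(`/`-)` groups or the early stop.

```python
def link_traditional(inputs):
    """Single pass in command-line order, remembering only undefined symbols."""
    defined, undefined, linked = set(), set(), []

    def add_object(name, obj):
        defs, undefs = obj
        linked.append(name)
        defined.update(defs)
        undefined.difference_update(defs)
        undefined.update(undefs - defined)

    for name, item in inputs:
        if isinstance(item, dict):  # archive: extract members that help right now
            progress = True
            while progress:
                progress = False
                for member, obj in item.items():
                    if member not in linked and obj[0] & undefined:
                        add_object(member, obj)
                        progress = True
        else:  # plain object file
            add_object(name, item)
    return linked, undefined


def link_lld_style(inputs):
    """Remember every archive symbol as 'lazy' and extract members on demand."""
    defined, undefined, linked = set(), set(), []
    lazy = {}  # symbol name -> (member name, object file) seen in an archive

    def add_object(name, obj):
        defs, undefs = obj
        linked.append(name)
        defined.update(defs)
        undefined.difference_update(defs)
        for sym in sorted(undefs):
            if sym in defined:
                continue
            if sym in lazy:
                # A previously visited archive can resolve it: extract now.
                add_object(*lazy.pop(sym))
            else:
                undefined.add(sym)

    for name, item in inputs:
        if isinstance(item, dict):
            for member, obj in item.items():
                if obj[0] & undefined:
                    add_object(member, obj)
                else:
                    for sym in obj[0]:
                        if sym not in defined:
                            lazy.setdefault(sym, (member, obj))
        else:
            add_object(name, item)
    return linked, undefined


# Archive given before the object file that needs it.
demo_inputs = [
    ("libfoo.a", {"foo.o": ({"foo"}, set())}),
    ("main.o", ({"main"}, {"foo"})),
]
```

With `demo_inputs`, the traditional model leaves `foo` undefined because the archive is visited while the undefined set is still empty, while the LLD-style model extracts `foo.o` as soon as `main.o` references `foo`.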
+
+Important Data Structures
+-------------------------
+
+We will describe the key data structures in LLD in this section.
+The linker can be understood as the interactions between them.
+Once you understand their functions, the code of the linker should look obvious to you.
+
+* SymbolBody
+
+  SymbolBody is a class to represent symbols.
+  SymbolBodies are created for symbols in object files or archive files.
+  The linker creates linker-defined symbols as well.
+
+  There are basically three types of SymbolBodies: Defined, Undefined, or Lazy.
+
+  - Defined symbols are for all symbols that are considered as "resolved",
+    including real defined symbols, COMDAT symbols, common symbols,
+    absolute symbols, linker-created symbols, etc.
+  - Undefined symbols represent undefined symbols, which need to be replaced by
+    Defined symbols by the resolver before the link is complete.
+  - Lazy symbols represent symbols we found in archive file headers
+    which can turn into Defined if we read archive members.
+
+* Symbol
+
+  Symbol is a pointer to a SymbolBody. There's only one Symbol for
+  each unique symbol name (this uniqueness is guaranteed by the symbol table).
+  Because SymbolBodies are created for each file independently,
+  there can be many SymbolBodies for the same name.
+  Thus, the relationship between Symbols and SymbolBodies is 1:N.
+  You can think of Symbols as handles for SymbolBodies.
+
+  The resolver keeps the Symbol's pointer to always point to the "best" SymbolBody.
+  Pointer mutation is the resolve operation of this linker.
+
+  SymbolBodies have pointers to their Symbols.
+  That means you can always find the best SymbolBody from
+  any SymbolBody by following pointers twice.
+  This structure makes it very easy and cheap to find replacements for symbols.
+  For example, if you have an Undefined SymbolBody, you can find a Defined
+  SymbolBody for that symbol just by going to its Symbol and then to SymbolBody,
+  assuming the resolver has successfully resolved all undefined symbols.
+
+* SymbolTable
+
+  SymbolTable is basically a hash table from strings to Symbols
+  with logic to resolve symbol conflicts. It resolves conflicts by symbol type.
+
+  - If we add Undefined and Defined symbols, the symbol table will keep the latter.
+  - If we add Defined and Lazy symbols, it will keep the former.
+  - If we add Lazy and Undefined, it will keep the former,
+    but it will also trigger the Lazy symbol to load the archive member
+    to actually resolve the symbol.
+
+* Chunk (COFF specific)
+
+  Chunk represents a chunk of data that will occupy space in an output.
+  Each regular section becomes a chunk.
+  Chunks created for common or BSS symbols are not backed by sections.
+  The linker may create chunks to append additional data to an output as well.
+
+  Chunks know about their size, how to copy their data to mmap'ed outputs,
+  and how to apply relocations to them.
+  Specifically, section-based chunks know how to read relocation tables
+  and how to apply them.
+
+* InputSection (ELF specific)
+
+  Since we have less synthesized data for ELF, we don't abstract slices of
+  input files as Chunks for ELF. Instead, we directly use the input section
+  as an internal data type.
+
+  InputSections know about their size and how to copy themselves to
+  mmap'ed outputs, just like COFF Chunks.
+
+* OutputSection
+
+  OutputSection is a container of InputSections (ELF) or Chunks (COFF).
+  An InputSection or Chunk belongs to at most one OutputSection.
+
+There are mainly three actors in this linker.
+
+* InputFile
+
+  InputFile is a superclass of file readers.
+  We have a different subclass for each input file type,
+  such as regular object file, archive file, etc.
+  They are responsible for creating and owning SymbolBodies and
+  InputSections/Chunks.
+
+* Writer
+
+  The writer is responsible for writing file headers and InputSections/Chunks to a file.
+  It creates OutputSections, puts all InputSections/Chunks into them,
+  assigns unique, non-overlapping addresses and file offsets to them,
+  and then writes them to a file.
+
+* Driver
+
+  The linking process is driven by the driver. The driver
+
+  - processes command line options,
+  - creates a symbol table,
+  - creates an InputFile for each input file and puts all symbols in it into the symbol table,
+  - checks that there are no remaining undefined symbols,
+  - creates a writer,
+  - and passes the symbol table to the writer to write the result to a file.
+
+Link-Time Optimization
+----------------------
+
+LTO is implemented by handling LLVM bitcode files as object files.
+The linker resolves symbols in bitcode files normally. If all symbols
+are successfully resolved, it then calls an LLVM libLTO function
+with all bitcode files to convert them to one big regular ELF/COFF file.
+Finally, the linker replaces bitcode symbols with ELF/COFF symbols,
+so that we link the input files as if they were in the native
+format from the beginning.
+
+The details are described in this document.
+http://llvm.org/docs/LinkTimeOptimization.html
+
+Glossary
+--------
+
+* RVA (COFF)
+
+  Short for Relative Virtual Address.
+
+  Windows executables or DLLs are not position-independent; they are
+  linked against a fixed address called an image base. RVAs are
+  offsets from an image base.
+
+  Default image bases are 0x140000000 for executables and 0x180000000
+  for DLLs. For example, when we are creating an executable, we assume
+  that the executable will be loaded at address 0x140000000 by the
+  loader, so we apply relocations accordingly. The resulting text and data
+  will contain raw absolute addresses.
+
+* VA
+
+  Short for Virtual Address. For COFF, it is equivalent to RVA + image base.
+
+* Base relocations (COFF)
+
+  Relocation information for the loader. If the loader decides to map
+  an executable or a DLL to a different address than its image
+  base, it fixes up the binary using information contained in the base
+  relocation table. A base relocation table consists of a list of
+  locations containing addresses. The loader adds the difference between
+  the actual load address and the image base to all locations listed there.
+
+  Note that this run-time relocation mechanism is much simpler than ELF.
+  There's no PLT or GOT. Images are relocated as a whole just
+  by shifting entire images in memory by some offsets. Although doing
+  this breaks text sharing, I think this mechanism is not actually bad
+  on today's computers.
+
+* ICF
+
+  Short for Identical COMDAT Folding (COFF) or Identical Code Folding (ELF).
+
+  ICF is an optimization to reduce output size by merging read-only sections
+  by not only their names but by their contents. If two read-only sections
+  happen to have the same metadata, actual contents and relocations,
+  they are merged by ICF. It is known as an effective technique,
+  and it usually reduces the size of C++ programs by a few percent or more.
+
+  Note that this is not an entirely sound optimization. C/C++ require
+  that different functions have different addresses. If a program depends on
+  that property, it would fail at runtime.
+
+  On Windows, that's not really an issue because MSVC link.exe enables
+  the optimization by default. As long as your program works
+  with the linker's default settings, your program should be safe with ICF.
+
+  On Unix, your program is generally not guaranteed to be safe with ICF,
+  although large programs happen to work correctly.
+  LLD works fine with ICF for example.
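The ICF entry above can be made concrete with a minimal sketch. It assumes a simplified section model invented here for illustration (the `Section` tuple and `fold_identical` are hypothetical names, not LLD types), and it performs a single pass keyed on metadata, contents and relocations; a real implementation would likely need to iterate, since relocation targets can themselves be folded.

```python
from typing import NamedTuple, Tuple


class Section(NamedTuple):
    name: str
    flags: str                           # stand-in for section metadata
    data: bytes                          # raw section contents
    relocs: Tuple[Tuple[int, str], ...]  # (offset, target symbol) pairs
    read_only: bool = True


def fold_identical(sections):
    """Map each section name to the name of the section that represents it."""
    canonical = {}  # (flags, data, relocs) -> first section seen with that key
    result = {}
    for sec in sections:
        if not sec.read_only:
            # Writable sections are never folded in this sketch.
            result[sec.name] = sec.name
            continue
        key = (sec.flags, sec.data, sec.relocs)
        result[sec.name] = canonical.setdefault(key, sec.name)
    return result


demo_sections = [
    Section("f", "rx", b"\x31\xc0\xc3", ()),
    Section("g", "rx", b"\x31\xc0\xc3", ()),
    Section("h", "rx", b"\xc3", ()),
]
folded = fold_identical(demo_sections)
```

In this model `f` and `g` end up represented by the same section, which is exactly the case where a program comparing the addresses of two distinct functions would observe them as equal.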
diff --git a/lld/docs/index.rst b/lld/docs/index.rst
index f109610e4d3..d019c4f9fd8 100644
--- a/lld/docs/index.rst
+++ b/lld/docs/index.rst
@@ -4,55 +4,12 @@ lld - The LLVM Linker
 =====================
 
 lld contains two linkers whose architectures are different from each other.
-One is a linker that implements native features directly.
-They are in `COFF` or `ELF` directories. Other directories contains the other
-implementation that is designed to be a set of modular code for creating
-linker tools. This document covers mainly the latter.
-For the former, please read README.md in `COFF` directory.
 
-* End-User Features:
-
-  * Compatible with existing linker options
-  * Reads standard Object Files (e.g. ELF, Mach-O, PE/COFF)
-  * Writes standard Executable Files (e.g. ELF, Mach-O, PE)
-  * Remove clang's reliance on "the system linker"
-  * Uses the LLVM `"UIUC" BSD-Style license`__.
-
-* Applications:
-
-  * Modular design
-  * Support cross linking
-  * Easy to add new CPU support
-  * Can be built as static tool or library
-
-* Design and Implementation:
-
-  * Extensive unit tests
-  * Internal linker model can be dumped/read to textual format
-  * Additional linking features can be plugged in as "passes"
-  * OS specific and CPU specific code factored out
-
-Why a new linker?
-------------------
-
-The fact that clang relies on whatever linker tool you happen to have installed
-means that clang has been very conservative adopting features which require a
-recent linker.
-
-In the same way that the MC layer of LLVM has removed clang's reliance on the
-system assembler tool, the lld project will remove clang's reliance on the
-system linker tool.
-
-
-Current Status
----------------
-
-lld can self host on x86-64 FreeBSD and Linux and x86 Windows.
-
-All SingleSource tests in test-suite pass on x86-64 Linux.
+.. toctree::
+   :maxdepth: 1
 
-All SingleSource and MultiSource tests in the LLVM test-suite
-pass on MIPS 32-bit little-endian Linux.
+   NewLLD
+   AtomLLD
 
 Source
 ------
@@ -66,25 +23,3 @@ lld is also available via the read-only git mirror::
   git clone http://llvm.org/git/lld.git
 
 Put it in llvm's tools/ directory, rerun cmake, then build target lld.
-
-Contents
----------
-
-.. toctree::
-   :maxdepth: 2
-
-   design
-   getting_started
-   ReleaseNotes
-   development
-   windows_support
-   open_projects
-   sphinx_intro
-
-Indices and tables
--------------------
-
-* :ref:`genindex`
-* :ref:`search`
-
-__ http://llvm.org/docs/DeveloperPolicy.html#license