author     Lang Hames <lhames@gmail.com>  2016-05-30 19:03:26 +0000
committer  Lang Hames <lhames@gmail.com>  2016-05-30 19:03:26 +0000
commit     db0551e3abb54f29d5955af57c81986c9dbae06d
tree       ce3fe5eed96a2f7a89502ca7fdcae5a1bf03ab48 /llvm/docs/tutorial
parent     24da61058a8ed87ef8956a6f958dbeb54dca9919
[Kaleidoscope][BuildingAJIT] Finish off Chapter 1.
* Various tidy-up and streamlining of existing discussion.
* Describes findSymbol and removeModule.
Chapter 1 is now rough but essentially complete in terms of content.
Feedback, patches etc. very welcome.
llvm-svn: 271225
Diffstat (limited to 'llvm/docs/tutorial')
-rw-r--r--  llvm/docs/tutorial/BuildingAJIT1.rst | 196
1 file changed, 105 insertions(+), 91 deletions(-)
diff --git a/llvm/docs/tutorial/BuildingAJIT1.rst b/llvm/docs/tutorial/BuildingAJIT1.rst
index 549432706aa..ebaea499bd7 100644
--- a/llvm/docs/tutorial/BuildingAJIT1.rst
+++ b/llvm/docs/tutorial/BuildingAJIT1.rst
@@ -5,10 +5,6 @@ Building a JIT: Starting out with KaleidoscopeJIT
 .. contents::
    :local:
 
-**This tutorial is under active development. It is incomplete and details may
-change frequently.** Nonetheless we invite you to try it out as it stands, and
-we welcome any feedback.
-
 Chapter 1 Introduction
 ======================
 
@@ -141,24 +137,25 @@ usual include guards and #includes [2]_, we get to the definition of our class:
 
 Our class begins with four members: A TargetMachine, TM, which will be used
 to build our LLVM compiler instance; A DataLayout, DL, which will be used for
-symbol mangling (more on that later), and two ORC *layers*: An
-ObjectLinkingLayer, and an IRCompileLayer. The ObjectLinkingLayer is the
-foundation of our JIT: it takes in-memory object files produced by a
-compiler and links them on the fly to make them executable. This
-JIT-on-top-of-a-linker design was introduced in MCJIT, where the linker was
-hidden inside the MCJIT class itself. In ORC we expose the linker as a visible,
-reusable component so that clients can access and configure it directly
-if they need to. In this tutorial our ObjectLinkingLayer will just be used to
-support the next layer in our stack: the IRCompileLayer, which will be
-responsible for taking LLVM IR, compiling it, and passing the resulting
-in-memory object files down to the object linking layer below.
-
-After our member variables comes typedef: ModuleHandle. This is the handle
-type that will be returned from our JIT's addModule method, and which can be
-used to remove a module again using the removeModule method. The IRCompileLayer
-class already provides a convenient handle type
-(IRCompileLayer::ModuleSetHandleT), so we will just provide a type-alias for
-this.
+symbol mangling (more on that later), and two ORC *layers*: an
+ObjectLinkingLayer and an IRCompileLayer. We'll be talking more about layers in
+the next chapter, but for now you can think of them as analogous to LLVM
+Passes: they wrap up useful JIT utilities behind an easy-to-compose interface.
+The first layer, ObjectLinkingLayer, is the foundation of our JIT: it takes
+in-memory object files produced by a compiler and links them on the fly to make
+them executable. This JIT-on-top-of-a-linker design was introduced in MCJIT,
+however the linker was hidden inside the MCJIT class. In ORC we expose the
+linker so that clients can access and configure it directly if they need to. In
+this tutorial our ObjectLinkingLayer will just be used to support the next layer
+in our stack: the IRCompileLayer, which will be responsible for taking LLVM IR,
+compiling it, and passing the resulting in-memory object files down to the
+object linking layer below.
+
+That's it for member variables; after that we have a single typedef:
+ModuleHandle. This is the handle type that will be returned from our JIT's
+addModule method, and can be passed to the removeModule method to remove a
+module. The IRCompileLayer class already provides a convenient handle type
+(IRCompileLayer::ModuleSetHandleT), so we just alias our ModuleHandle to this.
 
 .. code-block:: c++
 
@@ -176,8 +173,7 @@ the current process.
 Next we use our newly created TargetMachine to initialize DL, our DataLayout.
 Then we initialize our IRCompileLayer. Our IRCompile layer needs two things:
 (1) A reference to our object linking layer, and (2) a compiler instance to
 use to perform the actual compilation from IR to object
-files. We use the off-the-shelf SimpleCompiler instance for now, but in later
-chapters we will substitute our own configurable compiler classes. Finally, in
+files. We use the off-the-shelf SimpleCompiler instance for now. Finally, in
 the body of the constructor, we call the DynamicLibrary::LoadLibraryPermanently
 method with a nullptr argument. Normally the LoadLibraryPermanently method is
 called with the path of a dynamic library to load, but when passed a null
@@ -215,68 +211,62 @@ available for execution.
                                      std::move(Resolver));
   }
 
-Now we come to the first of our central JIT API methods: addModule. This method
-is responsible for adding IR to the JIT and making it available for execution.
-In this initial implementation of our JIT we will make our modules "available
-for execution" by compiling them immediately as they are added to the JIT. In
-later chapters we will teach our JIT to be lazier and instead add the Modules
-to a "pending" list to be compiled if and when they are first executed.
-
-To add our module to the IRCompileLayer we need to supply two auxiliary
-objects: a memory manager and a symbol resolver. The memory manager will be
-responsible for managing the memory allocated to JIT'd machine code, applying
-memory protection permissions, and registering JIT'd exception handling tables
-(if the JIT'd code uses exceptions). In our simple use-case we can just supply
-an off-the-shelf SectionMemoryManager instance. The memory, exception handling
-tables, etc. will be released when we remove the module from the JIT again
-(using removeModule) or, if removeModule is never called, when the JIT class
-itself is destructed.
-
-The second auxiliary class, the symbol resolver, is more interesting for us. It
-exists to tell the JIT where to look when it encounters an *external symbol* in
-the module we are adding. External symbols are any symbol not defined within the
-module itself, including calls to functions outside the JIT and calls to
-functions defined in other modules that have already been added to the JIT. It
-may seem as though modules added to the JIT should "know about one another" by
-default, but since we would still have to supply a symbol resolver for
-references to code outside the JIT it turns out to re-use this one mechanism
-for all symbol resolution. This has the added benefit that the user has full
-control over the symbol resolution process. Should we search for definitions
-within the JIT first, then fall back on external definitions? Or should we
-prefer external definitions where available and only JIT code if we don't
-already have an available implementation? By using a single symbol resolution
-scheme we are free to choose whatever makes the most sense for any given use
-case.
-
-Building a symbol resolver is made especially easy by the
-*createLambdaResolver* function. This function takes two lambdas (actually
-they don't have to be lambdas, any object with a call operator will do) and
-returns a RuntimeDyld::SymbolResolver instance. The first lambda is used as
-the implementation of the resolver's findSymbolInLogicalDylib method. This
-method searches for symbol definitions that should be thought of as being part
-of the same "logical" dynamic library as this Module. If you are familiar with
-static linking: this means that findSymbolInLogicalDylib should expose symbols
-with common linkage and hidden visibility. If all this sounds foreign you can
-ignore the details and just remember that this is the first method that the
-linker will use to try to find a symbol definition. If the
-findSymbolInLogicalDylib method returns a null result then the linker will
-call the second symbol resolver method, called findSymbol. This searches for
-symbols that should be thought of as external to (but visibile from) the module
-and its logical dylib.
-
-In this tutorial we will use the following simple breakdown: All modules added
-to the JIT will behave as if they were linked into a single, ever-growing
-logical dylib. To implement this our first lambda (the one defining
-findSymbolInLogicalDylib) will just search for JIT'd code by calling the
-CompileLayer's findSymbol method. If we don't find a symbol in the JIT itself
-we'll fall back to our second lambda, which implements findSymbol. This will
-use the RTDyldMemoyrManager::getSymbolAddressInProcess method to search for
-the symbol within the program itself. If we can't find a symbol definition
-via either of these paths the JIT will refuse to accept our moudle, returning
-a "symbol not found" error.
+Now we come to the first of our JIT API methods: addModule. This method is
+responsible for adding IR to the JIT and making it available for execution. In
+this initial implementation of our JIT we will make our modules "available for
+execution" by adding them straight to the IRCompileLayer, which will
+immediately compile them. In later chapters we will teach our JIT to be lazier
+and instead add the Modules to a "pending" list to be compiled if and when they
+are first executed.
+
+To add our module to the IRCompileLayer we need to supply two auxiliary objects
+(as well as the module itself): a memory manager and a symbol resolver. The
+memory manager will be responsible for managing the memory allocated to JIT'd
+machine code, setting memory permissions, and registering exception handling
+tables (if the JIT'd code uses exceptions). For our memory manager we will use
+the SectionMemoryManager class: another off-the-shelf utility that provides all
+the basic functionality we need. The second auxiliary class, the symbol
+resolver, is more interesting for us. It exists to tell the JIT where to look
+when it encounters an *external symbol* in the module we are adding. External
+symbols are any symbol not defined within the module itself, including calls to
+functions outside the JIT and calls to functions defined in other modules that
+have already been added to the JIT. It may seem as though modules added to the
+JIT should "know about one another" by default, but since we would still have to
+supply a symbol resolver for references to code outside the JIT it turns out to
+be easier to just re-use this one mechanism for all symbol resolution. This has
+the added benefit that the user has full control over the symbol resolution
+process. Should we search for definitions within the JIT first, then fall back
+on external definitions? Or should we prefer external definitions where
+available and only JIT code if we don't already have an available
+implementation? By using a single symbol resolution scheme we are free to choose
+whatever makes the most sense for any given use case.
+
+Building a symbol resolver is made especially easy by the *createLambdaResolver*
+function. This function takes two lambdas [3]_ and returns a
+RuntimeDyld::SymbolResolver instance. The first lambda is used as the
+implementation of the resolver's findSymbolInLogicalDylib method, which searches
+for symbol definitions that should be thought of as being part of the same
+"logical" dynamic library as this Module. If you are familiar with static
+linking: this means that findSymbolInLogicalDylib should expose symbols with
+common linkage and hidden visibility. If all this sounds foreign you can ignore
+the details and just remember that this is the first method that the linker will
+use to try to find a symbol definition. If the findSymbolInLogicalDylib method
+returns a null result then the linker will call the second symbol resolver
+method, called findSymbol, which searches for symbols that should be thought of
+as external to (but visible from) the module and its logical dylib. In this
+tutorial we will adopt the following simple scheme: All modules added to the JIT
+will behave as if they were linked into a single, ever-growing logical dylib. To
+implement this our first lambda (the one defining findSymbolInLogicalDylib) will
+just search for JIT'd code by calling the CompileLayer's findSymbol method. If
+we don't find a symbol in the JIT itself we'll fall back to our second lambda,
+which implements findSymbol. This will use the
+RTDyldMemoryManager::getSymbolAddressInProcess method to search for the symbol
+within the program itself. If we can't find a symbol definition via either of
+these paths the JIT will refuse to accept our module, returning a "symbol not
+found" error.
 
 Now that we've built our symbol resolver we're ready to add our module to the
-JIT. We do this by calling the CompileLayer's addModuleSet method [3]_. Since
+JIT. We do this by calling the CompileLayer's addModuleSet method [4]_. Since
 we only have a single Module and addModuleSet expects a collection, we will
 create a vector of modules and add our module as the only member. Since we
 have already typedef'd our ModuleHandle type to be the same as the
@@ -296,11 +286,34 @@ directly from our addModule method.
     CompileLayer.removeModuleSet(H);
   }
 
-*To be done: describe findSymbol and removeModule -- why do we mangle? what's
-the relationship between findSymbol and resolvers, why remove modules...*
+Now that we can add code to our JIT, we need a way to find the symbols we've
+added to it. To do that we call the findSymbol method on our IRCompileLayer,
+but with a twist: We have to *mangle* the name of the symbol we're searching
+for first. The reason for this is that the ORC JIT components use mangled
+symbols internally, the same way a static compiler and linker would, rather
+than using plain IR symbol names. The kind of mangling will depend on the
+DataLayout, which in turn depends on the target platform. To allow us to
+remain portable and search based on the un-mangled name, we just re-produce
+this mangling ourselves.
+
+We now come to the last method in our JIT API: removeModule. This method is
+responsible for destructing the MemoryManager and SymbolResolver that were
+added with a given module, freeing any resources they were using in the
+process. In our Kaleidoscope demo we rely on this method to remove the module
+representing the most recent top-level expression, preventing it from being
+treated as a duplicate definition when the next top-level expression is
+entered. It is generally good to free any module that you know you won't need
+to call further, just to free up the resources dedicated to it. However, you
+don't strictly need to do this: All resources will be cleaned up when your
+JIT class is destructed, if they haven't been freed before then.
+
+This brings us to the end of Chapter 1 of Building a JIT. You now have a basic
+but fully functioning JIT stack that you can use to take LLVM IR and make it
+executable within the context of your JIT process. In the next chapter we'll
+look at how to extend this JIT to produce better quality code, and in the
+process take a deeper look at the ORC layer concept.
 
-*To be done: Conclusion, exercises (maybe a utility for a standalone IR JIT,
-like a mini-LLI), feed to next chapter.*
+`Next: Extending the KaleidoscopeJIT <BuildingAJIT2.html>`_
 
 Full Code Listing
 =================
@@ -320,8 +333,6 @@ Here is the code:
 .. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter1/KaleidoscopeJIT.h
    :language: c++
 
-`Next: Extending the KaleidoscopeJIT <BuildingAJIT2.html>`_
-
 .. [1] Actually we use a cut-down version of KaleidoscopeJIT that makes a
        simplifying assumption: symbols cannot be re-defined. This will make it
        impossible to re-define symbols in the REPL, but will make our symbol
@@ -356,6 +367,9 @@ Here is the code:
 |                       | makes symbols in the host process searchable. |
 +-----------------------+-----------------------------------------------+
 
-.. [3] ORC layers accept sets of Modules, rather than individual ones, so that
+.. [3] Actually they don't have to be lambdas, any object with a call operator
+       will do, including plain old functions or std::functions.
+
+.. [4] ORC layers accept sets of Modules, rather than individual ones, so that
        all Modules in the set could be co-located by the memory manager, though
        this feature is not yet implemented.
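The layer stacking described in the patch above (an IRCompileLayer handing compiled objects down to an ObjectLinkingLayer) can be sketched in miniature with plain C++. ToyLinkingLayer and ToyCompileLayer are illustrative names invented here, not LLVM APIs; the strings stand in for IR modules and in-memory object files.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy "object linking layer": the bottom of the stack. It takes a finished
// "object file" (here just a string) and records it as linked, executable code.
struct ToyLinkingLayer {
  std::vector<std::string> Linked;
  std::size_t addObject(const std::string &Obj) {
    Linked.push_back(Obj);    // "link" the object in-memory
    return Linked.size() - 1; // handle for later lookup/removal
  }
};

// Toy "IR compile layer": stacked on top of a linking layer. It "compiles"
// the IR it is given and passes the result down the stack, just as the
// tutorial's IRCompileLayer hands objects to the ObjectLinkingLayer below.
struct ToyCompileLayer {
  ToyLinkingLayer &Base;
  explicit ToyCompileLayer(ToyLinkingLayer &B) : Base(B) {}
  std::size_t addIR(const std::string &IR) {
    std::string Obj = "compiled(" + IR + ")"; // stand-in for real codegen
    return Base.addObject(Obj);               // delegate to the layer below
  }
};
```

The point of the sketch is the composition: each layer exposes the same "add something, get a handle" shape, so layers can be stacked freely, which is the property the patch compares to LLVM Passes.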
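The two-callable resolver scheme in the patch above (findSymbolInLogicalDylib first, findSymbol as fallback) can likewise be sketched without LLVM. ToyResolver and makeToyLambdaResolver are hypothetical stand-ins for RuntimeDyld::SymbolResolver and createLambdaResolver; addresses are plain integers, with 0 playing the role of a null result.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

using ToyAddr = unsigned long; // 0 plays the role of a null JITSymbol

// Toy resolver: tries the "logical dylib" lookup first, then falls back to
// the external lookup, mirroring the order in which the JIT linker calls
// findSymbolInLogicalDylib and findSymbol.
struct ToyResolver {
  std::function<ToyAddr(const std::string &)> FindInLogicalDylib;
  std::function<ToyAddr(const std::string &)> FindExternal;

  ToyAddr resolve(const std::string &Name) const {
    if (ToyAddr A = FindInLogicalDylib(Name))
      return A;
    return FindExternal(Name);
  }
};

// Any callable works, not just lambdas -- the same flexibility the real
// createLambdaResolver offers.
template <typename DylibFn, typename ExternalFn>
ToyResolver makeToyLambdaResolver(DylibFn D, ExternalFn E) {
  return ToyResolver{std::move(D), std::move(E)};
}
```

Because the resolution order lives in one place, swapping the two lookups (prefer external definitions over JIT'd ones) is a one-line change, which is the flexibility argument the patch makes.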
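The mangling step the patch above describes for findSymbol can be illustrated with a toy helper. The real code derives the mangling from the DataLayout via LLVM's Mangler; the single-character "global prefix" parameter here (for example '_' on Darwin, none on most ELF platforms) is a deliberate simplification for illustration.

```cpp
#include <cassert>
#include <string>

// Toy version of the name-mangling step: prepend the platform's global
// symbol prefix, if any, before handing the name to findSymbol. The real
// JIT re-produces the mangling its DataLayout specifies so that users can
// keep searching by un-mangled IR names.
std::string toyMangle(const std::string &Name, char GlobalPrefix) {
  if (GlobalPrefix)
    return std::string(1, GlobalPrefix) + Name;
  return Name;
}
```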
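Finally, the addModule/removeModule handle lifecycle from the patch above can be sketched as a small registry. The map entries stand in for the per-module resources (memory manager, resolver, JIT'd memory) that removeModule frees; whatever is still registered is released when the object itself is destroyed, matching the patch's note that cleanup also happens at JIT destruction. ToyModuleRegistry is an invented name, not an LLVM class.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy handle-based registry mimicking addModule/removeModule. Each entry
// stands in for the resources the real JIT ties to a ModuleHandle.
class ToyModuleRegistry {
  std::map<int, std::string> Modules;
  int NextHandle = 0;

public:
  int addModule(const std::string &M) {
    Modules.emplace(NextHandle, M);
    return NextHandle++; // opaque handle, like ModuleHandle in the tutorial
  }
  void removeModule(int H) { Modules.erase(H); } // free per-module resources
  bool hasModule(int H) const { return Modules.count(H) != 0; }
  // Anything never removed is released by ~ToyModuleRegistry, as with the JIT.
};
```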