diff options
Diffstat (limited to 'llvm/docs/tutorial/LangImpl10.rst')
-rw-r--r-- | llvm/docs/tutorial/LangImpl10.rst | 257 |
1 files changed, 5 insertions, 252 deletions
diff --git a/llvm/docs/tutorial/LangImpl10.rst b/llvm/docs/tutorial/LangImpl10.rst index b1d19c2cdd8..1ff4dc8af44 100644 --- a/llvm/docs/tutorial/LangImpl10.rst +++ b/llvm/docs/tutorial/LangImpl10.rst @@ -1,254 +1,7 @@ -====================================================== -Kaleidoscope: Conclusion and other useful LLVM tidbits -====================================================== +:orphan: -.. contents:: - :local: - -Tutorial Conclusion -=================== - -Welcome to the final chapter of the "`Implementing a language with -LLVM <index.html>`_" tutorial. In the course of this tutorial, we have -grown our little Kaleidoscope language from being a useless toy, to -being a semi-interesting (but probably still useless) toy. :) - -It is interesting to see how far we've come, and how little code it has -taken. We built the entire lexer, parser, AST, code generator, an -interactive run-loop (with a JIT!), and emitted debug information in -standalone executables - all in under 1000 lines of (non-comment/non-blank) -code. - -Our little language supports a couple of interesting features: it -supports user defined binary and unary operators, it uses JIT -compilation for immediate evaluation, and it supports a few control flow -constructs with SSA construction. - -Part of the idea of this tutorial was to show you how easy and fun it -can be to define, build, and play with languages. Building a compiler -need not be a scary or mystical process! Now that you've seen some of -the basics, I strongly encourage you to take the code and hack on it. -For example, try adding: - -- **global variables** - While global variables have questional value - in modern software engineering, they are often useful when putting - together quick little hacks like the Kaleidoscope compiler itself. - Fortunately, our current setup makes it very easy to add global - variables: just have value lookup check to see if an unresolved - variable is in the global variable symbol table before rejecting it. - To create a new global variable, make an instance of the LLVM - ``GlobalVariable`` class. -- **typed variables** - Kaleidoscope currently only supports variables - of type double. This gives the language a very nice elegance, because - only supporting one type means that you never have to specify types. - Different languages have different ways of handling this. The easiest - way is to require the user to specify types for every variable - definition, and record the type of the variable in the symbol table - along with its Value\*. -- **arrays, structs, vectors, etc** - Once you add types, you can start - extending the type system in all sorts of interesting ways. Simple - arrays are very easy and are quite useful for many different - applications. Adding them is mostly an exercise in learning how the - LLVM `getelementptr <../LangRef.html#getelementptr-instruction>`_ instruction - works: it is so nifty/unconventional, it `has its own - FAQ <../GetElementPtr.html>`_! -- **standard runtime** - Our current language allows the user to access - arbitrary external functions, and we use it for things like "printd" - and "putchard". As you extend the language to add higher-level - constructs, often these constructs make the most sense if they are - lowered to calls into a language-supplied runtime. For example, if - you add hash tables to the language, it would probably make sense to - add the routines to a runtime, instead of inlining them all the way. -- **memory management** - Currently we can only access the stack in - Kaleidoscope. It would also be useful to be able to allocate heap - memory, either with calls to the standard libc malloc/free interface - or with a garbage collector. If you would like to use garbage - collection, note that LLVM fully supports `Accurate Garbage - Collection <../GarbageCollection.html>`_ including algorithms that - move objects and need to scan/update the stack. -- **exception handling support** - LLVM supports generation of `zero - cost exceptions <../ExceptionHandling.html>`_ which interoperate with - code compiled in other languages. You could also generate code by - implicitly making every function return an error value and checking - it. You could also make explicit use of setjmp/longjmp. There are - many different ways to go here. -- **object orientation, generics, database access, complex numbers, - geometric programming, ...** - Really, there is no end of crazy - features that you can add to the language. -- **unusual domains** - We've been talking about applying LLVM to a - domain that many people are interested in: building a compiler for a - specific language. However, there are many other domains that can use - compiler technology that are not typically considered. For example, - LLVM has been used to implement OpenGL graphics acceleration, - translate C++ code to ActionScript, and many other cute and clever - things. Maybe you will be the first to JIT compile a regular - expression interpreter into native code with LLVM? - -Have fun - try doing something crazy and unusual. Building a language -like everyone else always has, is much less fun than trying something a -little crazy or off the wall and seeing how it turns out. If you get -stuck or want to talk about it, feel free to email the `llvm-dev mailing -list <http://lists.llvm.org/mailman/listinfo/llvm-dev>`_: it has lots -of people who are interested in languages and are often willing to help -out. - -Before we end this tutorial, I want to talk about some "tips and tricks" -for generating LLVM IR. These are some of the more subtle things that -may not be obvious, but are very useful if you want to take advantage of -LLVM's capabilities. - -Properties of the LLVM IR -========================= - -We have a couple of common questions about code in the LLVM IR form - -let's just get these out of the way right now, shall we? - -Target Independence -------------------- - -Kaleidoscope is an example of a "portable language": any program written -in Kaleidoscope will work the same way on any target that it runs on. -Many other languages have this property, e.g. lisp, java, haskell, -javascript, python, etc (note that while these languages are portable, -not all their libraries are). - -One nice aspect of LLVM is that it is often capable of preserving target -independence in the IR: you can take the LLVM IR for a -Kaleidoscope-compiled program and run it on any target that LLVM -supports, even emitting C code and compiling that on targets that LLVM -doesn't support natively. You can trivially tell that the Kaleidoscope -compiler generates target-independent code because it never queries for -any target-specific information when generating code. - -The fact that LLVM provides a compact, target-independent, -representation for code gets a lot of people excited. Unfortunately, -these people are usually thinking about C or a language from the C -family when they are asking questions about language portability. I say -"unfortunately", because there is really no way to make (fully general) -C code portable, other than shipping the source code around (and of -course, C source code is not actually portable in general either - ever -port a really old application from 32- to 64-bits?). - -The problem with C (again, in its full generality) is that it is heavily -laden with target specific assumptions. As one simple example, the -preprocessor often destructively removes target-independence from the -code when it processes the input text: - -.. code-block:: c - - #ifdef __i386__ - int X = 1; - #else - int X = 42; - #endif - -While it is possible to engineer more and more complex solutions to -problems like this, it cannot be solved in full generality in a way that -is better than shipping the actual source code. - -That said, there are interesting subsets of C that can be made portable. -If you are willing to fix primitive types to a fixed size (say int = -32-bits, and long = 64-bits), don't care about ABI compatibility with -existing binaries, and are willing to give up some other minor features, -you can have portable code. This can make sense for specialized domains -such as an in-kernel language. - -Safety Guarantees ------------------ - -Many of the languages above are also "safe" languages: it is impossible -for a program written in Java to corrupt its address space and crash the -process (assuming the JVM has no bugs). Safety is an interesting -property that requires a combination of language design, runtime -support, and often operating system support. - -It is certainly possible to implement a safe language in LLVM, but LLVM -IR does not itself guarantee safety. The LLVM IR allows unsafe pointer -casts, use after free bugs, buffer over-runs, and a variety of other -problems. Safety needs to be implemented as a layer on top of LLVM and, -conveniently, several groups have investigated this. Ask on the `llvm-dev -mailing list <http://lists.llvm.org/mailman/listinfo/llvm-dev>`_ if -you are interested in more details. - -Language-Specific Optimizations -------------------------------- - -One thing about LLVM that turns off many people is that it does not -solve all the world's problems in one system. One specific -complaint is that people perceive LLVM as being incapable of performing -high-level language-specific optimization: LLVM "loses too much -information". Here are a few observations about this: - -First, you're right that LLVM does lose information. For example, as of -this writing, there is no way to distinguish in the LLVM IR whether an -SSA-value came from a C "int" or a C "long" on an ILP32 machine (other -than debug info). Both get compiled down to an 'i32' value and the -information about what it came from is lost. The more general issue -here, is that the LLVM type system uses "structural equivalence" instead -of "name equivalence". Another place this surprises people is if you -have two types in a high-level language that have the same structure -(e.g. two different structs that have a single int field): these types -will compile down into a single LLVM type and it will be impossible to -tell what it came from. - -Second, while LLVM does lose information, LLVM is not a fixed target: we -continue to enhance and improve it in many different ways. In addition -to adding new features (LLVM did not always support exceptions or debug -info), we also extend the IR to capture important information for -optimization (e.g. whether an argument is sign or zero extended, -information about pointers aliasing, etc). Many of the enhancements are -user-driven: people want LLVM to include some specific feature, so they -go ahead and extend it. - -Third, it is *possible and easy* to add language-specific optimizations, -and you have a number of choices in how to do it. As one trivial -example, it is easy to add language-specific optimization passes that -"know" things about code compiled for a language. In the case of the C -family, there is an optimization pass that "knows" about the standard C -library functions. If you call "exit(0)" in main(), it knows that it is -safe to optimize that into "return 0;" because C specifies what the -'exit' function does. - -In addition to simple library knowledge, it is possible to embed a -variety of other language-specific information into the LLVM IR. If you -have a specific need and run into a wall, please bring the topic up on -the llvm-dev list. At the very worst, you can always treat LLVM as if it -were a "dumb code generator" and implement the high-level optimizations -you desire in your front-end, on the language-specific AST. - -Tips and Tricks -=============== - -There is a variety of useful tips and tricks that you come to know after -working on/with LLVM that aren't obvious at first glance. Instead of -letting everyone rediscover them, this section talks about some of these -issues. - -Implementing portable offsetof/sizeof -------------------------------------- - -One interesting thing that comes up, if you are trying to keep the code -generated by your compiler "target independent", is that you often need -to know the size of some LLVM type or the offset of some field in an -llvm structure. For example, you might need to pass the size of a type -into a function that allocates memory. - -Unfortunately, this can vary widely across targets: for example the -width of a pointer is trivially target-specific. However, there is a -`clever way to use the getelementptr -instruction <http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt>`_ -that allows you to compute this in a portable way. - -Garbage Collected Stack Frames ------------------------------- - -Some languages want to explicitly manage their stack frames, often so -that they are garbage collected or to allow easy implementation of -closures. There are often better ways to implement these features than -explicit stack frames, but `LLVM does support -them, <http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt>`_ -if you want. It requires your front-end to convert the code into -`Continuation Passing -Style <http://en.wikipedia.org/wiki/Continuation-passing_style>`_ and -the use of tail calls (which LLVM also supports). +===================== +Kaleidoscope Tutorial +===================== +The Kaleidoscope Tutorial has `moved to another location <MyFirstLanguageFrontend/index>`_ . |