Diffstat (limited to 'llvm/docs/tutorial/LangImpl09.rst')
-rw-r--r-- | llvm/docs/tutorial/LangImpl09.rst | 468 |
1 files changed, 5 insertions, 463 deletions
diff --git a/llvm/docs/tutorial/LangImpl09.rst b/llvm/docs/tutorial/LangImpl09.rst
index d81f9fa0001..1ff4dc8af44 100644
--- a/llvm/docs/tutorial/LangImpl09.rst
+++ b/llvm/docs/tutorial/LangImpl09.rst
@@ -1,465 +1,7 @@
-======================================
-Kaleidoscope: Adding Debug Information
-======================================
+:orphan:
 
-.. contents::
-   :local:
-
-Chapter 9 Introduction
-======================
-
-Welcome to Chapter 9 of the "`Implementing a language with
-LLVM <index.html>`_" tutorial. In chapters 1 through 8, we've built a
-decent little programming language with functions and variables.
-What happens if something goes wrong though, how do you debug your
-program?
-
-Source level debugging uses formatted data that helps a debugger
-translate from binary and the state of the machine back to the
-source that the programmer wrote. In LLVM we generally use a format
-called `DWARF <http://dwarfstd.org>`_. DWARF is a compact encoding
-that represents types, source locations, and variable locations.
-
-The short summary of this chapter is that we'll go through the
-various things you have to add to a programming language to
-support debug info, and how you translate that into DWARF.
-
-Caveat: For now we can't debug via the JIT, so we'll need to compile
-our program down to something small and standalone. As part of this
-we'll make a few modifications to the running of the language and
-how programs are compiled. This means that we'll have a source file
-with a simple program written in Kaleidoscope rather than the
-interactive JIT. It does involve a limitation that we can only
-have one "top level" command at a time to reduce the number of
-changes necessary.
-
-Here's the sample program we'll be compiling:
-
-.. code-block:: python
-
-   def fib(x)
-     if x < 3 then
-       1
-     else
-       fib(x-1)+fib(x-2);
-
-   fib(10)
-
-
-Why is this a hard problem?
-===========================
-
-Debug information is a hard problem for a few different reasons - mostly
-centered around optimized code. First, optimization makes keeping source
-locations more difficult. In LLVM IR we keep the original source location
-for each IR level instruction on the instruction. Optimization passes
-should keep the source locations for newly created instructions, but merged
-instructions only get to keep a single location - this can cause jumping
-around when stepping through optimized programs. Secondly, optimization
-can move variables in ways that are either optimized out, shared in memory
-with other variables, or difficult to track. For the purposes of this
-tutorial we're going to avoid optimization (as you'll see with one of the
-next sets of patches).
-
-Ahead-of-Time Compilation Mode
-==============================
-
-To highlight only the aspects of adding debug information to a source
-language without needing to worry about the complexities of JIT debugging
-we're going to make a few changes to Kaleidoscope to support compiling
-the IR emitted by the front end into a simple standalone program that
-you can execute, debug, and see results.
-
-First we make our anonymous function that contains our top level
-statement be our "main":
-
-.. code-block:: udiff
-
-  -  auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>());
-  +  auto Proto = llvm::make_unique<PrototypeAST>("main", std::vector<std::string>());
-
-just with the simple change of giving it a name.
-
-Then we're going to remove the command line code wherever it exists:
-
-.. code-block:: udiff
-
-  @@ -1129,7 +1129,6 @@ static void HandleTopLevelExpression() {
-   /// top ::= definition | external | expression | ';'
-   static void MainLoop() {
-     while (1) {
-  -    fprintf(stderr, "ready> ");
-       switch (CurTok) {
-       case tok_eof:
-         return;
-  @@ -1184,7 +1183,6 @@ int main() {
-     BinopPrecedence['*'] = 40; // highest.
-
-     // Prime the first token.
-  -  fprintf(stderr, "ready> ");
-     getNextToken();
-
-Lastly we're going to disable all of the optimization passes and the JIT so
-that the only thing that happens after we're done parsing and generating
-code is that the LLVM IR goes to standard error:
-
-.. code-block:: udiff
-
-  @@ -1108,17 +1108,8 @@ static void HandleExtern() {
-   static void HandleTopLevelExpression() {
-     // Evaluate a top-level expression into an anonymous function.
-     if (auto FnAST = ParseTopLevelExpr()) {
-  -    if (auto *FnIR = FnAST->codegen()) {
-  -      // We're just doing this to make sure it executes.
-  -      TheExecutionEngine->finalizeObject();
-  -      // JIT the function, returning a function pointer.
-  -      void *FPtr = TheExecutionEngine->getPointerToFunction(FnIR);
-  -
-  -      // Cast it to the right type (takes no arguments, returns a double) so we
-  -      // can call it as a native function.
-  -      double (*FP)() = (double (*)())(intptr_t)FPtr;
-  -      // Ignore the return value for this.
-  -      (void)FP;
-  +    if (!F->codegen()) {
-  +      fprintf(stderr, "Error generating code for top level expr");
-       }
-     } else {
-       // Skip token for error recovery.
-  @@ -1439,11 +1459,11 @@ int main() {
-     // target lays out data structures.
-     TheModule->setDataLayout(TheExecutionEngine->getDataLayout());
-     OurFPM.add(new DataLayoutPass());
-  +#if 0
-     OurFPM.add(createBasicAliasAnalysisPass());
-     // Promote allocas to registers.
-     OurFPM.add(createPromoteMemoryToRegisterPass());
-  @@ -1218,7 +1210,7 @@ int main() {
-     OurFPM.add(createGVNPass());
-     // Simplify the control flow graph (deleting unreachable blocks, etc).
-     OurFPM.add(createCFGSimplificationPass());
-  -
-  +  #endif
-     OurFPM.doInitialization();
-
-     // Set the global so the code gen can use this.
-
-This relatively small set of changes get us to the point that we can compile
-our piece of Kaleidoscope language down to an executable program via this
-command line:
-
-.. code-block:: bash
-
-  Kaleidoscope-Ch9 < fib.ks | & clang -x ir -
-
-which gives an a.out/a.exe in the current working directory.
-
-Compile Unit
-============
-
-The top level container for a section of code in DWARF is a compile unit.
-This contains the type and function data for an individual translation unit
-(read: one file of source code). So the first thing we need to do is
-construct one for our fib.ks file.
-
-DWARF Emission Setup
-====================
-
-Similar to the ``IRBuilder`` class we have a
-`DIBuilder <http://llvm.org/doxygen/classllvm_1_1DIBuilder.html>`_ class
-that helps in constructing debug metadata for an LLVM IR file. It
-corresponds 1:1 similarly to ``IRBuilder`` and LLVM IR, but with nicer names.
-Using it does require that you be more familiar with DWARF terminology than
-you needed to be with ``IRBuilder`` and ``Instruction`` names, but if you
-read through the general documentation on the
-`Metadata Format <http://llvm.org/docs/SourceLevelDebugging.html>`_ it
-should be a little more clear. We'll be using this class to construct all
-of our IR level descriptions. Construction for it takes a module so we
-need to construct it shortly after we construct our module. We've left it
-as a global static variable to make it a bit easier to use.
-
-Next we're going to create a small container to cache some of our frequent
-data. The first will be our compile unit, but we'll also write a bit of
-code for our one type since we won't have to worry about multiple typed
-expressions:
-
-.. code-block:: c++
-
-  static DIBuilder *DBuilder;
-
-  struct DebugInfo {
-    DICompileUnit *TheCU;
-    DIType *DblTy;
-
-    DIType *getDoubleTy();
-  } KSDbgInfo;
-
-  DIType *DebugInfo::getDoubleTy() {
-    if (DblTy)
-      return DblTy;
-
-    DblTy = DBuilder->createBasicType("double", 64, dwarf::DW_ATE_float);
-    return DblTy;
-  }
-
-And then later on in ``main`` when we're constructing our module:
-
-.. code-block:: c++
-
-  DBuilder = new DIBuilder(*TheModule);
-
-  KSDbgInfo.TheCU = DBuilder->createCompileUnit(
-      dwarf::DW_LANG_C, DBuilder->createFile("fib.ks", "."),
-      "Kaleidoscope Compiler", 0, "", 0);
-
-There are a couple of things to note here. First, while we're producing a
-compile unit for a language called Kaleidoscope we used the language
-constant for C. This is because a debugger wouldn't necessarily understand
-the calling conventions or default ABI for a language it doesn't recognize
-and we follow the C ABI in our LLVM code generation so it's the closest
-thing to accurate. This ensures we can actually call functions from the
-debugger and have them execute. Secondly, you'll see the "fib.ks" in the
-call to ``createCompileUnit``. This is a default hard coded value since
-we're using shell redirection to put our source into the Kaleidoscope
-compiler. In a usual front end you'd have an input file name and it would
-go there.
-
-One last thing as part of emitting debug information via DIBuilder is that
-we need to "finalize" the debug information. The reasons are part of the
-underlying API for DIBuilder, but make sure you do this near the end of
-main:
-
-.. code-block:: c++
-
-  DBuilder->finalize();
-
-before you dump out the module.
-
-Functions
-=========
-
-Now that we have our ``Compile Unit`` and our source locations, we can add
-function definitions to the debug info. So in ``PrototypeAST::codegen()`` we
-add a few lines of code to describe a context for our subprogram, in this
-case the "File", and the actual definition of the function itself.
-
-So the context:
-
-.. code-block:: c++
-
-  DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU.getFilename(),
-                                      KSDbgInfo.TheCU.getDirectory());
-
-giving us an DIFile and asking the ``Compile Unit`` we created above for the
-directory and filename where we are currently. Then, for now, we use some
-source locations of 0 (since our AST doesn't currently have source location
-information) and construct our function definition:
-
-.. code-block:: c++
-
-  DIScope *FContext = Unit;
-  unsigned LineNo = 0;
-  unsigned ScopeLine = 0;
-  DISubprogram *SP = DBuilder->createFunction(
-      FContext, P.getName(), StringRef(), Unit, LineNo,
-      CreateFunctionType(TheFunction->arg_size(), Unit),
-      false /* internal linkage */, true /* definition */, ScopeLine,
-      DINode::FlagPrototyped, false);
-  TheFunction->setSubprogram(SP);
-
-and we now have an DISubprogram that contains a reference to all of our
-metadata for the function.
-
-Source Locations
-================
-
-The most important thing for debug information is accurate source location -
-this makes it possible to map your source code back. We have a problem though,
-Kaleidoscope really doesn't have any source location information in the lexer
-or parser so we'll need to add it.
-
-.. code-block:: c++
-
-   struct SourceLocation {
-     int Line;
-     int Col;
-   };
-   static SourceLocation CurLoc;
-   static SourceLocation LexLoc = {1, 0};
-
-   static int advance() {
-     int LastChar = getchar();
-
-     if (LastChar == '\n' || LastChar == '\r') {
-       LexLoc.Line++;
-       LexLoc.Col = 0;
-     } else
-       LexLoc.Col++;
-     return LastChar;
-   }
-
-In this set of code we've added some functionality on how to keep track of the
-line and column of the "source file". As we lex every token we set our current
-current "lexical location" to the assorted line and column for the beginning
-of the token. We do this by overriding all of the previous calls to
-``getchar()`` with our new ``advance()`` that keeps track of the information
-and then we have added to all of our AST classes a source location:
-
-.. code-block:: c++
-
-   class ExprAST {
-     SourceLocation Loc;
-
-     public:
-       ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {}
-       virtual ~ExprAST() {}
-       virtual Value* codegen() = 0;
-       int getLine() const { return Loc.Line; }
-       int getCol() const { return Loc.Col; }
-       virtual raw_ostream &dump(raw_ostream &out, int ind) {
-         return out << ':' << getLine() << ':' << getCol() << '\n';
-       }
-
-that we pass down through when we create a new expression:
-
-.. code-block:: c++
-
-   LHS = llvm::make_unique<BinaryExprAST>(BinLoc, BinOp, std::move(LHS),
-                                          std::move(RHS));
-
-giving us locations for each of our expressions and variables.
-
-To make sure that every instruction gets proper source location information,
-we have to tell ``Builder`` whenever we're at a new source location.
-We use a small helper function for this:
-
-.. code-block:: c++
-
-  void DebugInfo::emitLocation(ExprAST *AST) {
-    DIScope *Scope;
-    if (LexicalBlocks.empty())
-      Scope = TheCU;
-    else
-      Scope = LexicalBlocks.back();
-    Builder.SetCurrentDebugLocation(
-        DebugLoc::get(AST->getLine(), AST->getCol(), Scope));
-  }
-
-This both tells the main ``IRBuilder`` where we are, but also what scope
-we're in. The scope can either be on compile-unit level or be the nearest
-enclosing lexical block like the current function.
-To represent this we create a stack of scopes:
-
-.. code-block:: c++
-
-   std::vector<DIScope *> LexicalBlocks;
-
-and push the scope (function) to the top of the stack when we start
-generating the code for each function:
-
-.. code-block:: c++
-
-  KSDbgInfo.LexicalBlocks.push_back(SP);
-
-Also, we may not forget to pop the scope back off of the scope stack at the
-end of the code generation for the function:
-
-.. code-block:: c++
-
-  // Pop off the lexical block for the function since we added it
-  // unconditionally.
-  KSDbgInfo.LexicalBlocks.pop_back();
-
-Then we make sure to emit the location every time we start to generate code
-for a new AST object:
-
-.. code-block:: c++
-
-   KSDbgInfo.emitLocation(this);
-
-Variables
-=========
-
-Now that we have functions, we need to be able to print out the variables
-we have in scope. Let's get our function arguments set up so we can get
-decent backtraces and see how our functions are being called. It isn't
-a lot of code, and we generally handle it when we're creating the
-argument allocas in ``FunctionAST::codegen``.
-
-.. code-block:: c++
-
-    // Record the function arguments in the NamedValues map.
-    NamedValues.clear();
-    unsigned ArgIdx = 0;
-    for (auto &Arg : TheFunction->args()) {
-      // Create an alloca for this variable.
-      AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, Arg.getName());
-
-      // Create a debug descriptor for the variable.
-      DILocalVariable *D = DBuilder->createParameterVariable(
-          SP, Arg.getName(), ++ArgIdx, Unit, LineNo, KSDbgInfo.getDoubleTy(),
-          true);
-
-      DBuilder->insertDeclare(Alloca, D, DBuilder->createExpression(),
-                              DebugLoc::get(LineNo, 0, SP),
-                              Builder.GetInsertBlock());
-
-      // Store the initial value into the alloca.
-      Builder.CreateStore(&Arg, Alloca);
-
-      // Add arguments to variable symbol table.
-      NamedValues[Arg.getName()] = Alloca;
-    }
-
-
-Here we're first creating the variable, giving it the scope (``SP``),
-the name, source location, type, and since it's an argument, the argument
-index. Next, we create an ``lvm.dbg.declare`` call to indicate at the IR
-level that we've got a variable in an alloca (and it gives a starting
-location for the variable), and setting a source location for the
-beginning of the scope on the declare.
-
-One interesting thing to note at this point is that various debuggers have
-assumptions based on how code and debug information was generated for them
-in the past. In this case we need to do a little bit of a hack to avoid
-generating line information for the function prologue so that the debugger
-knows to skip over those instructions when setting a breakpoint. So in
-``FunctionAST::CodeGen`` we add some more lines:
-
-.. code-block:: c++
-
-  // Unset the location for the prologue emission (leading instructions with no
-  // location in a function are considered part of the prologue and the debugger
-  // will run past them when breaking on a function)
-  KSDbgInfo.emitLocation(nullptr);
-
-and then emit a new location when we actually start generating code for the
-body of the function:
-
-.. code-block:: c++
-
-  KSDbgInfo.emitLocation(Body.get());
-
-With this we have enough debug information to set breakpoints in functions,
-print out argument variables, and call functions. Not too bad for just a
-few simple lines of code!
-
-Full Code Listing
-=================
-
-Here is the complete code listing for our running example, enhanced with
-debug information. To build this example, use:
-
-.. code-block:: bash
-
-    # Compile
-    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
-    # Run
-    ./toy
-
-Here is the code:
-
-.. literalinclude:: ../../examples/Kaleidoscope/Chapter9/toy.cpp
-   :language: c++
-
-`Next: Conclusion and other useful LLVM tidbits <LangImpl10.html>`_
+=====================
+Kaleidoscope Tutorial
+=====================
 
+The Kaleidoscope Tutorial has `moved to another location <MyFirstLanguageFrontend/index>`_ .