diff options
Diffstat (limited to 'mlir/docs/MLIRForGraphAlgorithms.md')
-rw-r--r-- | mlir/docs/MLIRForGraphAlgorithms.md | 403 |
1 files changed, 403 insertions, 0 deletions
diff --git a/mlir/docs/MLIRForGraphAlgorithms.md b/mlir/docs/MLIRForGraphAlgorithms.md new file mode 100644 index 00000000000..ac26e5beb9b --- /dev/null +++ b/mlir/docs/MLIRForGraphAlgorithms.md @@ -0,0 +1,403 @@ +# MLIR: Incremental Application to Graph Algorithms in ML Frameworks + +The existing documentation about MLIR focuses on long term vision, how its +pieces fit together, and the benefits of modular and composable infrastructure +in the vast and distant future. While this viewpoint appeals to some, it causes +concern for others who are more concerned about the "here and now" - why does it +make sense to make a "revolutionary" change when any individual problem can be +fixed in place? + +This document explains that adoption of MLIR to solve graph based problems +_isn't_ a revolutionary change: it is an incremental series of steps which build +on each other, each of which delivers local value. This document also addresses +some points of confusion that keep coming up. + +One note: even though a major advantage of MLIR is that it can span the full +spectrum from graph algorithms down to low-level code generation, this document +focuses on the use of MLIR for **graph-level algorithms**. MLIR will also unlock +exciting code generation opportunities (particularly given its novel approach to +integrating state of the art polyhedral techniques), but issues that touch on +MLIR's relationship to XLA, Eigen, etc, are out of scope for this particular +doc. + +This document uses TensorFlow as the example given that it is the focus of our +immediate work, but we believe that the same viewpoint could be useful for +people working in the context of other ML frameworks that may consider adopting +MLIR in the future. + +### How is MLIR relevant? + +MLIR is an overloaded acronym which unpacks as "Multi-Level Intermediate +Representation". Its high-level purpose is to provide mechanics for describing +and transforming programs and computations in a flexible way. It provides common +compiler infrastructure for things like constant folding, dead code elimination, +graph rewriting, and others - which are independent of the representational +choices picked by a given dialect (e.g. its concurrency semantics). It was built +with a specific focus on compile time and memory efficiency, accurate +propagation of source location information (important for reporting high quality +errors and warnings) and is designed for testability. + +TensorFlow has numerous subsystems (some of which are proprietary, e.g. +Tensor-RT, nGraph, CoreML, etc) as well as translation layers between these +different subsystems, and these translation layers face similar challenges. ((As +an aside, the internals of each of these subsystems could often benefit from +MLIR infrastructure, but that isn't a focus of this doc.)) + +A key observation that MLIR makes is that these subsystems often have two things +going on: they are both particular data structures and encodings (e.g. HLO +graphs, TF-Lite's flat buffer format, TensorFlow's Graph format, the ONNX +abstraction, etc) as well as an abstraction of computation (a specific way of +modeling a convolution, a set of supported operations etc). + +MLIR uses a standard IR (i.e., a set of data structures) for representing these +computations - this allows a huge amount of shared infrastructure across these +problem domains. MLIR then allows the definition of domain-specific "dialects" +that describe the set of operations that are legal and supported for a given +application. This means that the actual translations between data structures are +kept as simple as possible - and are thus relatively easy to make "correct". +This allows the common compiler infrastructure to handle the mapping problems +and the other issues within the domain. + +MLIR's design is directly informed by the experience of building (and then +living with) intermediate representations like the LLVM IR, LLVM SelectionDAG, +the LLVM machine instruction representation, Swift SIL IR, and learns new +lessons from TensorFlow and XLA HLO, as well as learning from building countless +research and production systems on top of them. Our goal is to drag the state of +the art in compilers forward, not to merely apply a few well-known techniques to +the machine learning domain. + +### What does adoption mean? + +The point of this document is not to advocate for rewriting any particular +subsystem in TensorFlow - indeed, the burden required to justify a rewrite is +high, and often very specific to that subsystem. That said, there are several +subsystems that are about to get rewritten or substantially revised anyway, so +we use those as examples to concretely describe the benefits that MLIR provides +in these cases and what it will take. The subsystems discussed are: + +1. the TF Lite TOCO translator, which we need to improve error + reporting/reliability issues and generalize it to support more ops, and +1. the TF/XLA bridge which needs to improve usability by merging some of its + usage models, support dynamic shapes and generalize guest subsystem support + to Tensor-RT and nGraph. +1. Grappler is another subsystem that is likely to get substantial revisions in + the future, and would definitely benefit from the MLIR framework, but there + are no known plans to do that work at this point, so we don't discuss it + further. + +Adopting MLIR for these works the same way - and, in fact, the work to support +TF Lite is mostly a subset of the larger work to support the functionality of +the TF/XLA bridge. TF Lite and the TF/XLA bridge include several compiler passes +(things like encapsulate, functionalize control flow, lowering of ops, fusion, +constant folding, shape inference, etc). + +MLIR supports converting from TensorFlow Graphs to MLIR and back, which means +that we can start by putting in a no-op translation to MLIR and back into the +pipeline, and verify that nothing breaks. Then we can work on replacing the +compiler transformations one by one by reimplementing them (with the improved +algorithms that we're planning). + +This is a development plan, we wouldn't actually ship a TensorFlow that just +uses MLIR for a single pass. In practice, we'll have the MLIR flag gated under +an option, build out a replacement for an entire subsystem (e.g. the TOCO +translator) and when the time is right, we'll do A/B comparisons and eventually +make a switch and phase out the old code over time. + +## What benefit does MLIR provide? + +The adoption plan above might sound like it only makes things worse in the +immediate term - we have two implementations of the same functionality, we are +dividing our efforts, etc. In order for this to be worth it, we should have a +good sense that we are building towards an improved future that will make +customers and TensorFlow engineers happier when it lands. Here we describe a few +of the benefits that MLIR provides, in no particular order: + +### A Lossless Human Editable Textual Representation + +The MLIR in-memory data structure has a human readable and writable format, as +well as [a specification](LangRef.md) for that format - built just like any +other programming language. Important properties of this format are that it is +compact, easy to read, and lossless. You can dump an MLIR program out to disk +and munge around with it, then send it through a few more passes. + +If you haven't worked with a system that works this way, it is hard to overstate +how big of a deal this in practice: it means that you can call `foo->dump()` on +an IR object to see its full contents, it means you can diff the IR before and +after a change, delta reduce IR files, and many other things. + +### A Graph Verification Pass + +Like many other popular compiler infrastructures, MLIR provides infrastructure +and implementation for a "verifier" which checks that the IR is well formed. The +MLIR verifier is a simple framework that makes it easy to provide a single +source of truth for those correctness properties and is general across all +Dialects (e.g. TF Graph, TF Lite flat buffer, XLA HLO, etc). + +A verifier pass is sort of like a 'super assertion' that catches mistakes in +program transformations early, making you as an engineer more productive, making +the product more reliable, and making it easier to track down bugs when they +appear - because the verifier can be run at any time, either as a compiler pass +or with a single function call. + +While MLIR provides a well-considered infrastructure for IR verification, and +has simple checks for existing TensorFlow operations, there is a lot that should +be added here and lots of opportunity to get involved! + +### Designed for Testability + +There are many aspects of this in MLIR, but we'll focus on compiler +transformations since they are the easiest to understand. Compiler +transformations are modeled as subclasses of the `Pass` C++ class, which are +driven by an `mlir-opt` tool. When combined with a lossless textual +representation, it becomes really easy to write unit tests for compiler +transformations, for example, this is a simple test that shows "x-x" is being +turned into zero: + +```mlir + // RUN: mlir-opt %s -canonicalize | FileCheck %s + func @test_subi_zero_cfg(%arg0: i32) -> i32 { + %y = subi %arg0, %arg0 : i32 + return %y: i32 + } + // CHECK-LABEL: func @test_subi_zero_cfg(%arg0: i32) + // CHECK-NEXT: %c0_i32 = constant 0 : i32 + // CHECK-NEXT: return %c0 +``` + +The "CHECK" comments are interpreted by the +[LLVM FileCheck tool](https://llvm.org/docs/CommandGuide/FileCheck.html), which +is sort of like a really advanced grep. This test is fully self-contained: it +feeds the input into the [canonicalize pass](Canonicalization.md), and checks +that the output matches the CHECK lines. See the `test/Transforms` directory for +more examples. In contrast, standard unit testing exposes the API of the +underlying framework to lots and lots of tests (making it harder to refactor and +move the API), typically requires a lot more code, and exacerbates issues with +link time. For examples, see +[the TEST_F functions in TensorFlow's testsuite](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/grappler/optimizers/arithmetic_optimizer_test.cc). + +MLIR has been pervasively designed with this sort of design by testability, +allowing us to put in place a culture that expects every behavior changing +commit to include a test case, and for these test cases to be stable and +reliable over time, since they are testing exactly what they are supposed to. +End to end integration tests are still super useful for some things of course! + +### Infrastructure for Warnings and Error Diagnostics and Location Tracking + +MLIR benefits from the lessons learned from building other compilers - including +Clang which +[[set the standard](http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html)](http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html) +for quality of implementation in C/C++ compiler diagnostics. Drawing from this +experience (and fixing mistakes in LLVM), MLIR requires that operations and +functions carry abstract location information, that transformations propagate +this information, and provides standardized mechanisms to emit errors and +warnings, as well as for clients to hook into them to capture and report them in +custom ways. + +Why is this important? In practice, many graph-to-graph translators can fail +(e.g. TF Lite when an unsupported op is used) and it is important to be able to +report the error up through to the user in the most precise way possible, in +order for it to be actionable. This includes tracking rewrites through fusions +and fissions of ops, mapping back into language / API specific domains, etc. + +More selfishly for infrastructure hackers, this is a huge boon because it means +that it is easy to write good tests for this: the testing tools for MLIR capture +the diagnostics produced by passes (using the standard diagnostic hooks) and +check that they match the expected diagnostics in the testcase. For example, to +test the dependence analysis infra in the code generator, Andy Davis wrote a +simple pass that checks dependencies and emits them as "notes", allowing him to +write tests like this: + +```mlir + // RUN: mlir-opt %s -memref-dependence-check -verify-diagnostics + func @different_memrefs() { + %m.a = alloc() : memref<100xf32> + %m.b = alloc() : memref<100xf32> + %c0 = constant 0 : index + %c1 = constant 1.0 : f32 + store %c1, %m.a[%c0] : memref<100xf32> + // expected-note@-1 {{dependence from memref access 0 to access 1 = false}} + %v0 = load %m.b[%c0] : memref<100xf32> + return + } +``` + +Note that a major limitation of this is that MLIR suffers from a problem of +"garbage in, garbage out": if the input locations to MLIR are imprecise, then +there is nothing that it can do to recover them. There is work underway in +TensorFlow/Python to improve the situation, and Swift for TensorFlow already has +perfect location tracking due to its design. + +### Shape Information Captured in the IR + +In TensorFlow Graphs, each op takes and returns values using a very simple type +system (TF_DataType) in which each value is a tensor of unknown rank and +dimensions. At the same time, many graphs have static shapes easily knowable for +wide swaths of the computation, and even dynamically shaped operations often +have statically knowable dimensions. Many analyses and transformations benefit +and use this information when available, but because TensorFlow graphs don't +capture this (e.g. serialize it to proto), passes have to recompute it on demand +with ShapeRefiner. + +The [MLIR Tensor Type](LangRef.md#tensor-type) directly captures shape +information, so you can have things like: + +```mlir + %x = tf.Add %x, %y : tensor<128 x 8 x ? x f32> +``` + +Capturing this in the IR is expected to speed up transformations (avoiding +recomputing the same info over and over again) which therefore makes it +practical to apply stronger shape analysis algorithms. It also makes it easier +to work with the IR, because on-the-side representations can get out of date, +and the API is easier to work with from an ergonomics perspective. + +### Unified Graph Rewriting Infrastructure + +This is still a work in progress, but we have sightlines towards a +[general rewriting infrastructure](GenericDAGRewriter.md) for transforming DAG +tiles into other DAG tiles, using a declarative pattern format. DAG to DAG +rewriting is a generalized solution for many common compiler optimizations, +lowerings, and other rewrites and having an IR enables us to invest in building +a single high-quality implementation. + +Declarative pattern rules are preferable to imperative C++ code for a number of +reasons: they are more compact, easier to reason about, can have checkers +written against them, and new tools can be built that inspect and manipulate the +declarative patterns in interesting ways - e.g. applying theorem provers to +them. It will be exciting to see this ecosystem develop as the infrastructure +matures. + +### Clarified Semantics for TensorFlow Operations + +One of the challenging things about working with TensorFlow is that there are +many invariants and behaviors that need to be preserved and known about when +working with Graphs, and these can be difficult to reason about and lead to +bugs. Things like 'dead values', Switch and Merge nodes, concurrency semantics, +nodes that execute even when passed a dead value, multiple device program +representation - etc... all add complexities that can make it challenging to +reason about whether a transformation or analysis is correct in general. Even +something as simple as constant folding or transforming integer `x-x` into `0` +is non-trivial because you need to consider control dependence edges. + +One of our major goals for the TensorFlow dialect of MLIR is to sort out these +situations and upgrade existing TensorFlow graphs to semantics that are easier +to reason about. The solutions to these problems are all still being debated, +but those discussions have already yielded a lot of potential answers: +introducing a `tf_dead_or<x>` types for switch/merge, modeling of TF operations +using futures/async semantics etc. None of these particular battles are critical +or important for MLIR to succeed (because of its "meta" nature, the abstraction +decisions of any given dialect are up for it to decide), but each one that works +out will make it easier to work with and transform TensorFlow operations. We +expect these issues to get nailed down in the next couple of months when MLIR +effort moves beyond TF Lite / TOCO support. The discussions that are happening +now are super valuable and making progress. + +### Ergonomics + +A minor-in-theory, but important-in-practice point is that MLIR is designed to +make it easy, memory efficient, and less error prone to transform code than +other systems. `TensorFlow::Graph` has implementation issues where the same +information is stored redundantly in different places (which must be manually +kept up to date), has somewhat unusual representation of certain constructs +(e.g. the function library, which makes it very difficult to add or remove +functions, e.g. during interprocedural transformations), and stores information +in the graph that is used by the executor, but isn't necessary for program +transformation. + +TensorFlow has made a lot of progress in this area over the years, and there are +lots of ideas about further improvements in the future, we are happy that MLIR +addresses these needs (making it much easier to implement correct program +transformations) today, and are committed to pushing hard to make it better. + +### Compile Time Performance and Memory Use + +MLIR has been designed to be memory and compile-time efficient in its algorithms +and data structures, using immutable and uniqued structures, low level +bit-packing, and other well-known techniques to avoid unnecessary heap +allocations, and allow simple and safe multithreaded optimization of MLIR +programs. There are other reasons to believe that the MLIR implementations of +common transformations will be more efficient than the Python and C++ +TensorFlow::Graph implementations of the same things, given the current +implementation details of TensorFlow. + +That said, this is very much a theory at this point. When the new implementation +of various subsystems are available, we will see what happens in practice: there +will be no reason to speculate - we can measure. + +## Common Questions and Concerns + +Here we address some frequently asked questions and concerns. + +### Isn't MLIR a big dependency to take on? + +We've heard that at least some people are concerned that MLIR is a "big" +dependency to take on, and could result in large code size. Here are some key +points MLIR: + +1. The entire MLIR codebase is a pretty small C++ code base in absolute terms + compared to what goes into a modern ML framework. +1. Like LLVM, MLIR is designed as a set of libraries that clients can link in + or ignore as they wish. For example, the transformations in MLIR kept + separate from the core IR abstractions, and dialect specific code (e.g. + TensorFlow, TF-Lite, XLA, etc) is all independently selectable by the build + system. Clients that don't care about XLA don't link in that code, whether + they are a TF-Lite system or a client that is completely unrelated to + TensorFlow. +1. MLIR's only third party dependency is on LLVM, but it doesn't depend on LLVM + IR or any other heavy dependency - it just depends on LLVM's support library + which provides efficient hash tables and other + [memory efficient data structures that the STL does not](http://llvm.org/docs/ProgrammersManual.html#picking-the-right-data-structure-for-a-task). + There have been discussions about splitting this set of libraries out to its + own subproject in LLVM that the LLVM IR project depends on. This would be + great for MLIR as well as other LLVM subprojects. +1. TensorFlow and many other frameworks already use LLVM - if so, MLIR would + not be pulling in an additional dependency at all. + +### How does MLIR represent {control flow, concurrency, …} semantics in TensorFlow? + +MLIR provides a dialect that is an isomorphic 1-1 mapping between TensorFlow +graphs and MLIR, as well as a pretty complete translator back and forth (the +only known gap is that a few TF_DataType enums aren't handled yet). MLIR is a +"Multi-Level IR", which allows it to represent code with different abstraction +levels, so the ability to faithfully represent TensorFlow code in a completely +backwards compatible way (even if there are some historical warts!) is critical. + +In *addition* to the isomorphic mapping, we are actively working on efforts to +raise the abstraction level for working with TensorFlow graphs in MLIR. Doing so +would make it even easier to write TensorFlow transformations than it is today, +and would provide a path to migrating TF 1.x graphs forward into the TF 2.x +world. For example, because MLIR has an extensible type system, we can directly +model whether it is impossible for a Tensor value to be a "dead" value - similar +to the use of optional types in modern programming languages. + +These discussions occasionally cause confusion because there are several issues +being mixed up into one: + +* What are the current semantics of TensorFlow graphs, and what invariants can + we rely on? +* What should the semantics be in TensorFlow 2.0? +* What do programs rely on in practice, and if it is unfriendly, can we + migrate it? +* Can we find a way to make it so transforms don't have to worry about the + complexities of Switch/Merge, by using higher level control flow + representations? (tentative answer: yes) +* How should MLIR represent async vs sync operations, what invariants are + provided, how does this dovetail with control flow? +* When is it safe and beneficial to perform optimizations that might reduce + parallelism? + +All of these questions have a "conservative/safe fallback": we can continue +providing exactly the same abstractions that TensorFlow always has. That said, +we are trying hard to level-up the representation (taking advantage of the +"Multi-Level" part of MLIR) because doing so will make it much much easier to +write analyses and transformations than it currently is in TensorFlow. + +### Non Goals + +It is important to point out things that MLIR does not aim to do. For example, +there is no runtime component to MLIR: the TensorFlow executor, the TF Lite +FlatBuffer interpreter, or other existing runtime should be used as-is. + +Another non-goal is that MLIR currently doesn't support a stable binary +encoding. We will certainly add this at some point, but existing formats should +be used for serialization and distribution in the meantime. |