bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	Replace the walkOps/visitOperationInst variants from the InstWalkers with ↵	River Riddle	2019-03-29	13	-19/+19
\| \| \| \| \| \|	the Instruction variants. PiperOrigin-RevId: 232322030
*	Update dma-generate pass to (1) work on blocks of instructions (instead of just	Uday Bondhugula	2019-03-29	4	-108/+262
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	loops), (2) take into account fast memory space capacity and lower 'dmaDepth' to fit, (3) add location information for debug info / errors - change dma-generate pass to work on blocks of instructions (start/end iterators) instead of 'for' loops; complete TODOs - allows DMA generation for straightline blocks of operation instructions interspersed b/w loops - take into account fast memory capacity: check whether memory footprint fits in fastMemoryCapacity parameter, and recurse/lower the depth at which DMA generation is performed until it does fit in the provided memory - add location information to MemRefRegion; any insufficient fast memory capacity errors or debug info w.r.t dma generation shows location information - allow DMA generation pass to be instantiated with a fast memory capacity option (besides command line flag) - change getMemRefRegion to return unique_ptr's - change getMemRefFootprintBytes to work on a 'Block' instead of 'ForInst' - other helper methods; add postDomInstFilter option for replaceAllMemRefUsesWith; drop forInst->walkOps, add Block::walkOps methods Eg. output $ mlir-opt -dma-generate -dma-fast-mem-capacity=1 /tmp/single.mlir /tmp/single.mlir:9:13: error: Total size of all DMA buffers' for this block exceeds fast memory capacity for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) { ^ $ mlir-opt -debug-only=dma-generate -dma-generate -dma-fast-mem-capacity=400 /tmp/single.mlir /tmp/single.mlir:9:13: note: 8 KiB of DMA buffers in fast memory space for this block for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) { PiperOrigin-RevId: 232297044
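The capacity check described in the entry above can be sketched roughly as follows. This is a minimal Python illustration, not the pass's C++ code: `choose_dma_depth` and the per-depth footprint list are hypothetical, and it assumes that "lowering" the depth means moving to a deeper loop level whose DMA buffers are smaller.

```python
from typing import Optional, Sequence


def choose_dma_depth(footprints: Sequence[int], capacity_bytes: int) -> Optional[int]:
    """Return the first (outermost) depth whose buffer footprint fits.

    footprints[d] is the assumed total size in bytes of all DMA buffers
    if generation is performed at loop depth d (0 = outermost).
    Returns None when no depth fits, which is where the pass would
    report the "exceeds fast memory capacity" error.
    """
    for depth, footprint in enumerate(footprints):
        if footprint <= capacity_bytes:
            return depth
    return None


# Hypothetical footprints for a three-deep nest: 64 KiB, 8 KiB, 256 B.
footprints = [64 * 1024, 8 * 1024, 256]
print(choose_dma_depth(footprints, 400 * 1024))  # fits at the outermost depth
print(choose_dma_depth(footprints, 8 * 1024))    # must go one level deeper
print(choose_dma_depth(footprints, 1))           # nothing fits
```

Trying the outermost depth first reflects the usual preference for larger, less frequent transfers; whether the real pass orders its search this way is an assumption here.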
*	Fold the functionality of OperationInst into Instruction. OperationInst ↵	River Riddle	2019-03-29	3	-2/+3
\| \| \| \| \| \|	still exists as a forward declaration and will be removed incrementally in a set of followup cleanup patches. PiperOrigin-RevId: 232198540
*	Fix the handling of the resizable operands bit of OperationState in a few ↵	River Riddle	2019-03-29	1	-2/+2
\| \| \| \| \| \|	places. PiperOrigin-RevId: 232163738
*	Promote local buffers created post fusion to higher memory space	Uday Bondhugula	2019-03-29	1	-8/+54
\| \| \| \| \| \| \| \| \| \| \| \| \|	- fusion already includes the necessary analysis to create small/local buffers post fusion; allocate these buffers in a higher memory space if the necessary pass parameters are provided (threshold size, memory space id) - although there will be a separate utility at some point to directly detect and promote small local buffers to higher memory spaces, doing it during fusion when possible is much less expensive, comes free with fusion analysis, and covers a key common case. PiperOrigin-RevId: 232063894
*	Define the AffineForOp and replace ForInst with it. This patch is largely ↵	River Riddle	2019-03-29	14	-428/+473
\| \| \| \| \| \|	mechanical, i.e. changing usages of ForInst to OpPointer<AffineForOp>. An important difference is that upon construction an AffineForOp no longer automatically creates the body and induction variable. To generate the body/iv, 'createBody' can be called on an AffineForOp with no body. PiperOrigin-RevId: 232060516
*	Cleanup EDSCs and start a functional auto-generated library of custom Ops	Nicolas Vasilache	2019-03-29	1	-35/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This CL applies the following simplifications to EDSCs: 1. Rename Block to StmtList because an MLIR Block is a different, not yet supported, notion; 2. Rework Bindable to drop specific storage and just use it as a simple wrapper around Expr. The only value of Bindable is to force a static cast when used by the user to bind into the emitter. For all intended purposes, Bindable is just a lightweight check that an Expr is Unbound. This simplifies usage and reduces the API footprint. After playing with it for some time, it wasn't worth the API cognition overhead; 3. Replace makeExprs and makeBindables by makeNewExprs and copyExprs which is more explicit and less easy to misuse; 4. Add generally useful functionality to MLIREmitter: a. expose zero and one for the ubiquitous common lower bounds and step; b. add support to create already bound Exprs for all function arguments as well as shapes and views for Exprs bound to memrefs. 5. Delete Stmt::operator= and replace by a `Stmt::set` method which is more explicit. 6. Make Stmt::operator Expr() explicit. 7. Indexed.indices assertions are removed to pave the way for expressing slices and views as well as to work with 0-D memrefs. The CL plugs those simplifications with TableGen and allows emitting a full MLIR function for pointwise add. This "x.add" op is both type and rank-agnostic (by allowing ArrayRef of Expr passed to For loops) and opens the door to spinning up a composable library of existing and custom ops that should automate a lot of the tedious work in TF/XLA -> MLIR. Testing needs to be significantly improved but can be done in a separate CL. PiperOrigin-RevId: 231982325
*	Define an detail::OperandStorage class to handle managing instruction ↵	River Riddle	2019-03-29	2	-0/+2
\| \| \| \| \| \|	operands. This class stores operands in a similar way to SmallVector except for two key differences. The first is the inline storage, which is a trailing objects array. The second is that being able to dynamically resize the operand list is optional. This means that we can enable the cases where operations need to change the number of operands after construction without losing the spatial locality benefits of the common case (operation instructions / non-control flow instructions with a lifetime fixed number of operands). PiperOrigin-RevId: 231910497
*	Address Performance issue in NestedMatcher	Nicolas Vasilache	2019-03-29	3	-134/+140
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A performance issue was reported due to the usage of NestedMatcher in ComposeAffineMaps. The main culprit was the ubiquitous copies that were occurring when appending even a single element in `matchOne`. This CL generally simplifies the implementation and removes one level of indirection by getting rid of auxiliary storage as well as simplifying the API. The users of the API are updated accordingly. The implementation was tested on a heavily unrolled example with ComposeAffineMaps and is now close in performance with an implementation based on stateless InstWalker. As a reminder, the whole ComposeAffineMaps pass is slated to disappear but the bug report was very useful as a stress test for NestedMatchers. Lastly, the following cleanups reported by @aminim were addressed: 1. make NestedPatternContext scoped within runFunction rather than at the Pass level. This was caused by a previous misunderstanding of Pass lifetime; 2. use defensive assertions in the constructor of NestedPatternContext to make it clear a unique such locally scoped context is allowed to exist. PiperOrigin-RevId: 231781279
*	Fix ASAN issue: snapshot edge list before loop which can modify this list.	MLIR Team	2019-03-29	1	-3/+15
\| \| \| \|	PiperOrigin-RevId: 231686040
*	LoopFusion: insert the source loop nest slice at a depth in the destination ↵	MLIR Team	2019-03-29	1	-23/+90
\| \| \| \| \| \|	loop nest which preserves dependences (above any loop carried or other dependences). This is accomplished by updating the maximum destination loop depth based on dependence checks between source loop nest loads and stores which access the memref on which the source loop nest has a store op. In addition, prevent fusing in source loop nests which write to memrefs which escape or are live out. PiperOrigin-RevId: 231684492
*	3000x speed improvement on compose-affine-maps by dropping NestedMatcher for	Uday Bondhugula	2019-03-29	1	-28/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	a trivial inst walker :-) (reduces pass time from several minutes non-terminating to 120ms) - (fixes b/123541184) - use a simple 7-line inst walker to collect affine_apply op's instead of the nested matcher; -compose-affine-maps pass runs in 120ms now instead of 5 minutes + (non- terminating / out of memory) - on a realistic test case that is 20,000 lines 12-d loop nest - this CL is also pushing for simple existing/standard patterns unless there is a real efficiency issue (OTOH, fixing nested matcher to address this issue requires cl/231400521) - the improvement is from swapping out the nested walker as opposed to from a bug or anything else that this CL changes - update stale comment PiperOrigin-RevId: 231623619
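The "simple inst walker" idea in the entry above amounts to a plain recursive traversal that collects matching ops. A minimal Python sketch, assuming a toy instruction tree made of dicts (the `name`/`body`/`id` keys and `collect_ops` are hypothetical, not MLIR's API):

```python
def collect_ops(inst, op_name, found=None):
    """Pre-order walk over a nested instruction tree, collecting every
    instruction whose name equals op_name."""
    if found is None:
        found = []
    if inst.get("name") == op_name:
        found.append(inst)
    # Loops and functions hold their nested instructions under "body".
    for child in inst.get("body", ()):
        collect_ops(child, op_name, found)
    return found


# Toy function: one affine_apply at loop depth 1, another at depth 2.
func = {
    "name": "func",
    "body": [
        {"name": "for", "body": [
            {"name": "affine_apply", "id": "a"},
            {"name": "for", "body": [
                {"name": "affine_apply", "id": "b"},
                {"name": "load", "id": "c"},
            ]},
        ]},
    ],
}
print([op["id"] for op in collect_ops(func, "affine_apply")])
```

Each instruction is visited once and nothing is copied along the way, which is the property the commit message contrasts with the nested matcher.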
*	Standardize the spelling of debug info to "debuginfo" in opt flags.	River Riddle	2019-03-29	1	-2/+1
\| \| \| \|	PiperOrigin-RevId: 231610337
*	Fix getFullMemRefAsRegion() and FlatAffineConstraints::reset	Uday Bondhugula	2019-03-29	1	-9/+15
\| \| \| \|	PiperOrigin-RevId: 231426734
*	Support fusing loop nests which require insertion into a new instruction ↵	MLIR Team	2019-03-29	1	-89/+203
\| \| \| \| \| \| \| \|	Block position while preserving dependences, opening up additional fusion opportunities. - Adds SSA Value edges to the data dependence graph used in the loop fusion pass. PiperOrigin-RevId: 231417649
*	Recommit: Define a AffineOps dialect as well as an AffineIfOp operation. ↵	River Riddle	2019-03-29	6	-79/+67
\| \| \| \| \| \|	Replace all instances of IfInst with AffineIfOp and delete IfInst. PiperOrigin-RevId: 231342063
*	Automated rollback of changelist 231318632.	Nicolas Vasilache	2019-03-29	6	-67/+79
\| \| \| \|	PiperOrigin-RevId: 231327161
*	Define a AffineOps dialect as well as an AffineIfOp operation. Replace all ↵	River Riddle	2019-03-29	6	-79/+67
\| \| \| \| \| \|	instances of IfInst with AffineIfOp and delete IfInst. PiperOrigin-RevId: 231318632
*	Replace too obscure usage of functional::map by declare + reserve + loop.	Nicolas Vasilache	2019-03-29	1	-15/+21
\| \| \| \| \| \| \|	Cleanup a usage of functional::map that is deemed too obscure in `reindexAffineIndices`. Also fix a stale comment in `reindexAffineIndices`. PiperOrigin-RevId: 231211184
*	Change AffineApplyOp to produce a single result, simplifying the code that	Chris Lattner	2019-03-29	9	-42/+27
\| \| \| \| \| \|	works with it, and updating the g3docs. PiperOrigin-RevId: 231120927
*	Change the ForInst induction variable to be a block argument of the body ↵	River Riddle	2019-03-29	7	-27/+37
\| \| \| \| \| \|	instead of the ForInst itself. This is a necessary step in converting ForInst into an operation. PiperOrigin-RevId: 231064139
*	Drop AffineMap::Null and IntegerSet::Null	Nicolas Vasilache	2019-03-29	3	-13/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Addresses b/122486036. This CL addresses some leftover crumbs in AffineMap and IntegerSet by removing the Null method and cleaning up the constructors. As the ::Null uses were tracked down, opportunities appeared to untangle some of the Parsing logic and make it explicit where AffineMap/IntegerSet have ambiguous syntax. Previously, ambiguous cases were hidden behind the implicit pointer values of AffineMap* and IntegerSet* that were passed as function parameters. Depending on the values of those pointers, one of 3 behaviors could occur. This parsing logic convolution is one of the rare cases where I would advocate for code duplication. The more proper fix would be to make the syntax unambiguous or to allow some lookahead. PiperOrigin-RevId: 231058512
*	Cleanup resource management and rename recursive matchers	Nicolas Vasilache	2019-03-29	5	-27/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This CL follows up on a memory leak issue related to SmallVector growth that escapes the BumpPtrAllocator. The fix is to properly use ArrayRef and placement new to define away the issue. The following renaming is also applied: 1. MLFunctionMatcher -> NestedPattern 2. MLFunctionMatches -> NestedMatch As a consequence all allocations are now guaranteed to live on the BumpPtrAllocator. PiperOrigin-RevId: 231047766
*	Wrap cl::opt flags within passes in a category with the pass name. This ↵	River Riddle	2019-03-29	7	-25/+58
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	improves the help output of tools like mlir-opt. Example: dma-generate options: -dma-fast-mem-capacity - Set fast memory space ... -dma-fast-mem-space=<uint> - Set fast memory space ... loop-fusion options: -fusion-compute-tolerance=<number> - Fractional increase in ... -fusion-maximal - Enables maximal loop fusion loop-tile options: -tile-size=<uint> - Use this tile size for ... loop-unroll options: -unroll-factor=<uint> - Use this unroll factor ... -unroll-full - Fully unroll loops -unroll-full-threshold=<uint> - Unroll all loops with ... -unroll-num-reps=<uint> - Unroll innermost loops ... loop-unroll-jam options: -unroll-jam-factor=<uint> - Use this unroll jam factor ... PiperOrigin-RevId: 231019363
*	Update replaceAllMemRefUsesWith to generate single result affine_apply's for	Uday Bondhugula	2019-03-29	2	-16/+21
\| \| \| \| \| \| \| \| \| \| \|	index remapping - generate a sequence of single result affine_apply's for the index remapping (instead of one multi result affine_apply) - update dma-generate and loop-fusion test cases; while on this, change test cases to use single result affine apply ops - some fusion comment fix/cleanup PiperOrigin-RevId: 230985830
*	Update createAffineComputationSlice to generate single result affine maps	Uday Bondhugula	2019-03-29	2	-21/+31
\| \| \| \| \| \| \| \| \| \|	- Update createAffineComputationSlice to generate a sequence of single result affine apply ops instead of one multi-result affine apply - update pipeline-data-transfer test case; while on this, also update the test case to use only single result affine maps, and make it more robust to change. PiperOrigin-RevId: 230965478
*	Allow operations to hold a blocklist and add support for parsing/printing a ↵	River Riddle	2019-03-29	4	-11/+38
\| \| \| \| \| \|	block list for verbose printing. PiperOrigin-RevId: 230951462
*	Generic dialect conversion pass exercised by LLVM IR lowering	Alex Zinenko	2019-03-29	1	-0/+313
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit introduces a generic dialect conversion/lowering/legalization pass and illustrates it on StandardOps->LLVMIR conversion. It partially reuses the PatternRewriter infrastructure and adds the following functionality: - an actual pass; - non-default pattern constructors; - one-to-many rewrites; - rewriting terminators with successors; - not applying patterns iteratively (unlike the existing greedy rewrite driver); - ability to change function signature; - ability to change basic block argument types. Given the existing API, the latter two required creating new functions in the same module. Eventually, this should converge with the rest of PatternRewriter. However, we may want to keep two pass versions: "heavy" with function/block argument conversion and "light" that only touches operations. This pass creates new functions within a module as a means to change function signature, then creates new blocks with converted argument types in the new function. Then, it traverses the CFG in DFS-preorder to make sure defs are converted before uses in the dominated blocks. The generic pass has a minimal interface with two hooks: one to fill in the set of patterns, and another one to convert types for functions and blocks. The patterns are defined as separate classes that can be table-generated in the future. The LLVM IR lowering pass partially inherits from the existing LLVM IR translator, in particular for type conversion. It defines a conversion pattern template, instantiated for different operations, and is a good candidate for tablegen. The lowering does not yet support loads and stores and is not connected to the translator as it would have broken the existing flows. Future patches will add missing support before switching the translator in a single patch. PiperOrigin-RevId: 230951202
*	Fix return value logic / error reporting in -dma-generate	Uday Bondhugula	2019-03-29	1	-4/+6
\| \| \| \|	PiperOrigin-RevId: 230906158
*	Change the dependence check in the loop fusion pass to use the MLIR ↵	MLIR Team	2019-03-29	1	-13/+32
\| \| \| \| \| \|	instruction list ordering (instead of the dependence graph node id ordering). This breaks the overloading of dependence graph node ids as both edge endpoints and instruction list position. PiperOrigin-RevId: 230849232
*	Update dma-generate: update for multiple load/store op's per memref	Uday Bondhugula	2019-03-29	2	-18/+130
\| \| \| \| \| \| \| \| \| \|	- introduce a way to compute union using symbolic rectangular bounding boxes - handle multiple load/store op's to the same memref by taking a union of the regions - command-line argument to provide capacity of the fast memory space - minor change to replaceAllMemRefUsesWith to not generate affine_apply if the supplied index remap was identity PiperOrigin-RevId: 230848185
*	loop-fusion: debug info cleanup	Uday Bondhugula	2019-03-29	1	-27/+33
\| \| \| \|	PiperOrigin-RevId: 230817383
*	Introduce a new operation hook point for implementing simple local	Chris Lattner	2019-03-29	1	-7/+44
\| \| \| \| \| \| \| \| \| \| \|	canonicalizations of operations. The ultimate important user of this is going to be a funcBuilder->foldOrCreate<YourOp>(...) API, but for now it is just a more convenient way to write certain classes of canonicalizations (see the change in StandardOps.cpp). NFC. PiperOrigin-RevId: 230770021
*	Add cloning functionality to Block and Function, this also adds support for ↵	River Riddle	2019-03-29	3	-13/+12
\| \| \| \| \| \|	remapping successor block operands of terminator operations. We define a new BlockAndValueMapping class to simplify mapping between cloned values. PiperOrigin-RevId: 230768759
*	Minor updates + cleanup to dma-generate	Uday Bondhugula	2019-03-29	1	-10/+16
\| \| \| \| \| \| \| \| \| \| \|	- switch some debug info to emitError - use a single constant op for zero index to make it easier to write/update test cases; avoid creating new constant op's for common zero index cases - test case cleanup This is in preparation for an upcoming major update to this pass. PiperOrigin-RevId: 230728379
*	Add a function pass to strip debug info from functions and instructions.	River Riddle	2019-03-29	1	-0/+50
\| \| \| \|	PiperOrigin-RevId: 230654315
*	Migrate VectorOrTensorType/MemRefType shape api to use int64_t instead of int.	River Riddle	2019-03-29	7	-16/+17
\| \| \| \|	PiperOrigin-RevId: 230605756
*	Fix single producer check in loop fusion pass.	MLIR Team	2019-03-29	1	-3/+3
\| \| \| \|	PiperOrigin-RevId: 230565482
*	Update fusion cost model + some additional infrastructure and debug ↵	Uday Bondhugula	2019-03-29	2	-45/+203
\| \| \| \| \| \| \| \| \| \| \| \| \|	information for -loop-fusion - update fusion cost model to fuse while tolerating a certain amount of redundant computation; add cl option -fusion-compute-tolerance evaluate memory footprint and intermediate memory reduction - emit debug info from -loop-fusion showing what was fused and why - introduce function to compute memory footprint for a loop nest - getMemRefRegion readability update - NFC PiperOrigin-RevId: 230541857
*	loop unroll update: unroll factor one for a single iteration loop	Uday Bondhugula	2019-03-29	1	-1/+4
\| \| \| \| \| \| \| \|	- unrolling a single iteration loop by a factor of one should promote its body into its parent; this makes it consistent with the behavior/expectation that unrolling a loop by a factor equal to its trip count makes the loop go away. PiperOrigin-RevId: 230426499
*	Refactor -dma-generate walker - NFC	Uday Bondhugula	2019-03-29	1	-33/+27
\| \| \| \| \| \| \|	- ForInst::walkOps will also be used in an upcoming CL (cl/229438679); better to have this instead of deriving from the InstWalker PiperOrigin-RevId: 230413820
*	Allocate private/local buffers for slices accurately during fusion	Uday Bondhugula	2019-03-29	4	-44/+64
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- the size of the private memref created for the slice should be based on the memref region accessed at the depth at which the slice is being materialized, i.e., symbolic in the outer IVs up until that depth, as opposed to the region accessed based on the entire domain. - leads to a significant contraction of the temporary / intermediate memref whenever the memref isn't reduced to a single scalar (through store fwd'ing). Other changes - update to promoteIfSingleIteration - avoid introducing unnecessary identity map affine_apply from IV; makes it much easier to write and read test cases and pass output for all passes that use promoteIfSingleIteration; loop-fusion test cases become much simpler - fix replaceAllMemrefUsesWith bug that was exposed by the above update - 'domInstFilter' could be one of the ops erased due to a memref replacement in it. - fix getConstantBoundOnDimSize bug: a division by the coefficient of the identifier was missing (the latter need not always be 1); add lbFloorDivisors output argument - rename getBoundingConstantSizeAndShape -> getConstantBoundingSizeAndShape PiperOrigin-RevId: 230405218
*	Handle escaping memrefs in loop fusion pass:	MLIR Team	2019-03-29	1	-5/+40
\| \| \| \| \| \| \|	*) Do not remove loop nests which write to memrefs which escape the function. *) Do not remove memrefs which escape the function (e.g. are used in the return instruction). PiperOrigin-RevId: 230398630
*	Cleanup EDSCs	Nicolas Vasilache	2019-03-29	1	-2/+2
\| \| \| \| \| \| \| \|	This CL performs a bunch of cleanups related to EDSCs that are generally useful in the context of using them with a simple wrapping C API (not in this CL) and with simple language bindings to Python and Swift. PiperOrigin-RevId: 230066505
*	Mark (void)indexRemap to please compiler for unused variable check	Lei Zhang	2019-03-29	1	-0/+1
\| \| \| \|	PiperOrigin-RevId: 229957023
*	LoopFusion: Creates private MemRefs which are used only by operations in the ↵	MLIR Team	2019-03-29	1	-38/+193
\| \| \| \| \| \| \| \| \| \|	fused loop. *) Enables reduction of private memref size based on MemRef region accessed by fused slice. *) Enables maximal fusion by creating a private memref to break a fusion-preventing dependence. *) Adds maximal fusion flag to enable fusing as much as possible (though it still fuses the minimum cost computation slice). PiperOrigin-RevId: 229936698
*	Update comment in the constant folding pass as constant folding is supported ↵	Smit Hinsu	2019-03-29	1	-2/+2
\| \| \| \| \| \|	even when not all operands are constants PiperOrigin-RevId: 229670189
*	Fix improperly indexed DimOp in LowerVectorTransfers.cpp	Nicolas Vasilache	2019-03-29	1	-52/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This CL fixes a misunderstanding in how to build DimOp which triggered execution issues in the CPU path. The problem is that, given a `memref<?x4x?x8x?xf32>`, the expressions to construct the dynamic dimensions should be: `dim %arg, 0 : memref<?x4x?x8x?xf32>` `dim %arg, 2 : memref<?x4x?x8x?xf32>` and `dim %arg, 4 : memref<?x4x?x8x?xf32>` Before this CL, we would construct: `dim %arg, 0 : memref<?x4x?x8x?xf32>` `dim %arg, 1 : memref<?x4x?x8x?xf32>` `dim %arg, 2 : memref<?x4x?x8x?xf32>` and expect the other dimensions to be constants. This assumption seems consistent at first glance with the syntax of alloc: ``` %tensor = alloc(%M, %N, %O) : memref<?x4x?x8x?xf32> ``` But this was actually incorrect. This CL also makes the relevant functions available to EDSCs and removes duplication of the incorrect function. PiperOrigin-RevId: 229622766
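The indexing rule in the entry above can be illustrated with a small Python sketch. It is not the code from LowerVectorTransfers.cpp: the shape encoding (`None` for a `?` dimension) and both helper names are assumptions made for illustration.

```python
def dynamic_dim_indices(shape):
    """Positions in the full shape at which the dimension is dynamic.

    These are the indices that `dim` must be given, one per dynamic size.
    """
    return [i for i, d in enumerate(shape) if d is None]


def pre_fix_indices(shape):
    """The mistaken scheme: number the dynamic dimensions 0..k-1, as the
    operand list of `alloc(%M, %N, %O)` might suggest."""
    return list(range(sum(d is None for d in shape)))


shape = [None, 4, None, 8, None]   # models memref<?x4x?x8x?xf32>
print(dynamic_dim_indices(shape))  # [0, 2, 4]
print(pre_fix_indices(shape))      # [0, 1, 2]
```

The two lists agree only when every dynamic dimension precedes every static one, which explains how the mistaken scheme could go unnoticed on simpler shapes.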
*	Some loop fusion code cleanup/simplification post cl/229575126	Uday Bondhugula	2019-03-29	1	-33/+16
\| \| \| \| \| \|	- enforce the assumptions better / in a simpler way PiperOrigin-RevId: 229612424
*	LoopFusion improvements:	MLIR Team	2019-03-29	1	-164/+244
\| \| \| \| \| \| \| \|	*) Adds support for fusing into consumer loop nests with multiple loads from the same memref. *) Adds support for reducing slice loop trip count by projecting out destination loop IVs greater than destination loop depth. *) Removes dependence on src loop depth and simplifies cost model computation. PiperOrigin-RevId: 229575126