bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	Enable input-reuse fusion to search function arguments for fusion candidates ↵	MLIR Team	2019-03-29	1	-0/+70
\| \| \| \| \| \|	(takes care of a TODO, enables another tutorial test case). PiperOrigin-RevId: 240979894
*	Change the vectorizer test pass to output via diagnostics instead of ↵	River Riddle	2019-03-29	3	-6/+6
\| \| \| \| \| \|	llvm::outs. This allows for the output to be deterministic when multi-threading is enabled. PiperOrigin-RevId: 240905858
*	Change the multi-return syntax for operations. The name of the operation ↵	River Riddle	2019-03-29	2	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	result now contains the number of results that it refers to if the number of results is greater than 1. Example: %call:2 = call @multi_return() : () -> (f32, i32) use(%call#0, %call#1) This CL also adds parser support for uniquely named result values. This means that a test writer can now write something like: %foo, %bar = call @multi_return() : () -> (f32, i32) use(%foo, %bar) Note: The printer will still print the collapsed form. PiperOrigin-RevId: 240860058
*	Remove overly conservative check in LoopFusion pass (enables fusion in ↵	MLIR Team	2019-03-29	1	-9/+60
\| \| \| \| \| \|	tutorial example). PiperOrigin-RevId: 240859227
*	Replace remaining usages of the Instruction class with Operation.	River Riddle	2019-03-29	3	-3/+3
\| \| \| \|	PiperOrigin-RevId: 240777521
*	Cleanup vectorize_1d.mlir test - NFC	Nicolas Vasilache	2019-03-29	1	-82/+247
\| \| \| \| \| \|	This CL splits a large monolithic test function into smaller ones that are each CHECK-LABEL'd PiperOrigin-RevId: 240684979
*	Make vectorization aware of loop semantics	Nicolas Vasilache	2019-03-29	1	-0/+16
\| \| \| \| \| \|	Now that we have a dependence analysis, we can check that loops are indeed parallel and make vectorization correct. PiperOrigin-RevId: 240682727
*	NFC: Rename the 'for' operation in the AffineOps dialect to 'affine.for' and ↵	River Riddle	2019-03-29	28	-932/+932
\| \| \| \| \| \|	set the namespace of the AffineOps dialect to 'affine'. PiperOrigin-RevId: 240165792
*	NFC: Rename the 'if' operation in the AffineOps dialect to 'affine.if'.	River Riddle	2019-03-29	6	-46/+46
\| \| \| \|	PiperOrigin-RevId: 240071154
*	Support composition of symbols in AffineApplyOp	Nicolas Vasilache	2019-03-29	2	-19/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This CL revisits the composition of AffineApplyOp for the special case where a symbol itself comes from an AffineApplyOp. This is achieved by rewriting such symbols into dims to allow composition to occur mathematically. The implementation is also refactored to improve readability. Rationale for locally rewriting symbols as dims: ================================================ The mathematical composition of AffineMap must always concatenate symbols because it does not have enough information to do otherwise. For example, composing `(d0)[s0] -> (d0 + s0)` with itself must produce `(d0)[s0, s1] -> (d0 + s0 + s1)`. The result is only equivalent to `(d0)[s0] -> (d0 + 2 * s0)` when applied to the same mlir::Value* for both s0 and s1. As a consequence mathematical composition of AffineMap always concatenates symbols. When AffineMaps are used in AffineApplyOp however, they may specify composition via symbols, which is ambiguous mathematically. This corner case is handled by locally rewriting such symbols that come from AffineApplyOp into dims and composing through dims. PiperOrigin-RevId: 239791597
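The symbol-concatenation rule in the message above can be illustrated with a small Python sketch. `SumMap` and `compose` are hypothetical stand-ins, not MLIR's AffineMap API; they model only maps of the form `(d0)[s0, ...] -> (d0 + s0 + ...)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SumMap:
    """Hypothetical model of (d0)[s0, ..., s{n-1}] -> (d0 + s0 + ... + s{n-1})."""
    num_symbols: int

    def apply(self, dim, symbols):
        assert len(symbols) == self.num_symbols
        return dim + sum(symbols)


def compose(outer, inner):
    # Mathematical composition concatenates the symbols of both maps,
    # because the maps alone do not say whether two symbols are bound
    # to the same value.
    return SumMap(outer.num_symbols + inner.num_symbols)


m = SumMap(1)        # (d0)[s0] -> (d0 + s0)
mm = compose(m, m)   # (d0)[s0, s1] -> (d0 + s0 + s1)

# The composed map equals d0 + 2 * s0 only when both symbols carry the same value.
same = mm.apply(10, [3, 3])
different = mm.apply(10, [3, 4])
```

With both symbols bound to 3 the result matches `10 + 2 * 3`; with different bindings it does not, which is why the composition cannot merge the symbols on its own.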
*	Port LowerVectorTransfers from EDSC + AST to declarative builders	Nicolas Vasilache	2019-03-29	1	-43/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This CL removes the dependency of LowerVectorTransfers on the AST version of EDSCs which will be retired. This exhibited a pretty fundamental staging difference in AST-based vs declarative based emission. Since the delayed creation with an AST was staged, the loop order came into existence after the clipping expressions were computed. This now changes as the loops first need to be created declaratively in fixed order and then the clipping expressions are created. Also, due to lack of staging, coalescing cannot be done on the fly anymore and needs to be done either as a pre-pass (current implementation) or as a local transformation on the generated IR (future work). Tests are updated accordingly. PiperOrigin-RevId: 238971631
*	Change parallelism detection test pass to emit a note	Uday Bondhugula	2019-03-29	1	-4/+7
\| \| \| \| \| \| \|	- emit a note on the loop being parallel instead of setting a loop attribute - rename the pass -test-detect-parallel (from -detect-parallel) PiperOrigin-RevId: 238122847
*	Fix misc bugs / TODOs / other improvements to analysis utils	Uday Bondhugula	2019-03-29	2	-4/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	- fix for getConstantBoundOnDimSize: floordiv -> ceildiv for extent - make getConstantBoundOnDimSize also return the identifier upper bound - fix unionBoundingBox to correctly use the divisor and upper bound identified by getConstantBoundOnDimSize - deal with loop step correctly in addAffineForOpDomain (covers most cases now) - fully compose bound map / operands and simplify/canonicalize before adding dim/symbol to FlatAffineConstraints; fixes false positives in -memref-bound-check; add test case there - expose mlir::isTopLevelSymbol from AffineOps PiperOrigin-RevId: 238050395
*	Extend loop unrolling and unroll-jamming to non-matching bound operands and	Uday Bondhugula	2019-03-29	2	-253/+297
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	multi-result upper bounds, complete TODOs, fix/improve test cases. - complete TODOs for loop unroll/unroll-and-jam. Something as simple as "for %i = 0 to %N" wasn't being unrolled earlier (unless it had been written as "for %i = ()[s0] -> (0)()[%N] to %N"); addressed now. - update/replace getTripCountExpr with buildTripCountMapAndOperands; makes it more powerful as it composes inputs into it - getCleanupLowerBound and getUnrolledLoopUpperBound actually needed the same code; refactor and remove one. - reorganize test cases, write previous ones better; most of these changes are "label replacements". - fix wrongly labeled test cases in unroll-jam.mlir PiperOrigin-RevId: 238014653
*	Clean up some stray mlfunc/cfgfunc leftovers.	MLIR Team	2019-03-29	2	-42/+24
\| \| \| \|	PiperOrigin-RevId: 237936610
*	Add a basic model to set tile sizes + some cleanup	Uday Bondhugula	2019-03-29	1	-0/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- compute tile sizes based on a simple model that looks at memory footprints (instead of using the hardcoded default value) - adjust tile sizes to make them factors of trip counts based on an option - update loop fusion CL options to allow setting maximal fusion at pass creation - change an emitError to emitWarning (since it's not a hard error unless the client treats it that way, in which case, it can emit one) $ mlir-opt -debug-only=loop-tile -loop-tile test/Transforms/loop-tiling.mlir test/Transforms/loop-tiling.mlir:81:3: note: using tile sizes [4 4 5 ] for %i = 0 to 256 { for %i0 = 0 to 256 step 4 { for %i1 = 0 to 256 step 4 { for %i2 = 0 to 250 step 5 { for %i3 = #map4(%i0) to #map11(%i0) { for %i4 = #map4(%i1) to #map11(%i1) { for %i5 = #map4(%i2) to #map12(%i2) { %0 = load %arg0[%i3, %i5] : memref<8x8xvector<64xf32>> %1 = load %arg1[%i5, %i4] : memref<8x8xvector<64xf32>> %2 = load %arg2[%i3, %i4] : memref<8x8xvector<64xf32>> %3 = mulf %0, %1 : vector<64xf32> %4 = addf %2, %3 : vector<64xf32> store %4, %arg2[%i3, %i4] : memref<8x8xvector<64xf32>> } } } } } } PiperOrigin-RevId: 237461836
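The "adjust tile sizes to make them factors of trip counts" step can be sketched as below. This is an assumed rounding rule (largest divisor not exceeding the proposed size), not necessarily the one the pass uses; `adjust_to_factor` and the proposed size of 5 are hypothetical.

```python
def adjust_to_factor(tile_size, trip_count):
    """Return the largest divisor of trip_count that is <= tile_size.

    Assumption: rounding down to a divisor is one plausible way to make
    a tile size a factor of the trip count.
    """
    for candidate in range(min(tile_size, trip_count), 0, -1):
        if trip_count % candidate == 0:
            return candidate
    return 1


# Hypothetical model-proposed size of 5 for loops with trip counts 256, 256, 250.
adjusted = [adjust_to_factor(5, trips) for trips in (256, 256, 250)]
```

Under this assumed rule, a proposed size of 5 becomes 4 for a trip count of 256 and stays 5 for a trip count of 250, so every tile divides its loop evenly and no partial tiles remain.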
*	Use FlatAffineConstraints::unionBoundingBox to perform slice bounds union ↵	MLIR Team	2019-03-29	1	-0/+39
\| \| \| \| \| \| \| \| \|	for loop fusion pass (WIP). Adds utility to convert slice bounds to a FlatAffineConstraints representation. Adds utility to FlatAffineConstraints to promote loop IV symbol identifiers to dim identifiers. PiperOrigin-RevId: 236973261
*	DMA generation CL flag update	Uday Bondhugula	2019-03-29	1	-67/+67
\| \| \| \| \| \| \|	- allow mem capacity to be overridden by command-line flag - change default fast mem space to 2 PiperOrigin-RevId: 236951598
*	Add missing run command to fusion test cases - follow up to cl/236882988	Uday Bondhugula	2019-03-29	1	-1/+2
\| \| \| \|	PiperOrigin-RevId: 236947383
*	Fix and improve detectAsMod	Uday Bondhugula	2019-03-29	1	-0/+150
\| \| \| \| \| \| \| \|	- fix for the mod detection - simplify/avoid the mod at construction (if the dividend is already known to be less than the divisor), since the information is available at hand there PiperOrigin-RevId: 236882988
*	Make sure that fusion test cases don't have out of bounds accesses	Uday Bondhugula	2019-03-29	1	-7/+7
\| \| \| \| \| \| \| \|	- fix out of bounds test case - -memref-bound-check on the test/Transforms/loop-fusion.mlir no longer reports any errors, before or after -loop-fusion is run PiperOrigin-RevId: 236757658
*	Adds loop attribute as a temporary work around to prevent slice fusion of ↵	MLIR Team	2019-03-29	1	-0/+37
\| \| \| \| \| \|	loop nests containing instructions with side effects (the proper solution will be to use memref read/write regions in the future). PiperOrigin-RevId: 236733739
*	Bug fix for getConstantBoundOnDimSize	Uday Bondhugula	2019-03-29	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \|	- this was detected when memref-bound-check was run on the output of the loop-fusion pass - the addition (to represent ceildiv as a floordiv) had to be performed only for the constant term of the constraint - update test cases - memref-bound-check no longer returns an error on the output of this test case PiperOrigin-RevId: 236731137
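The "addition (to represent ceildiv as a floordiv)" mentioned above relies on a standard identity for a positive divisor. A minimal Python sketch, with hypothetical helper names:

```python
def floordiv(a, b):
    # Python's // already rounds toward negative infinity.
    return a // b


def ceildiv_via_floordiv(a, b):
    # For b > 0: ceil(a / b) == floor((a + b - 1) / b).
    # The offset (b - 1) is added to the dividend only.
    assert b > 0
    return floordiv(a + b - 1, b)


def ceildiv_reference(a, b):
    # Independent formulation used only to cross-check the identity.
    return -((-a) // b)
```

The two formulations agree for any integer dividend and positive divisor, including negative dividends.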
*	Set the namespace of the StandardOps dialect to "std", but add a special ↵	River Riddle	2019-03-29	3	-20/+20
\| \| \| \| \| \|	case to the parser to allow parsing standard operations without the "std" prefix. This will now allow for the standard dialect to be looked up dynamically by name. PiperOrigin-RevId: 236493865
*	A simple pass to detect and mark all parallel loops	Uday Bondhugula	2019-03-29	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- detect all parallel loops based on dep information and mark them with a "parallel" attribute - add mlir::isLoopParallel(OpPointer<AffineForOp> ...), and refactor an existing method to use that (reuse some code from @andydavis (cl/236007073) for this) - a simple/meaningful way to test memref dep test as well Ex: $ mlir-opt -detect-parallel test/Transforms/parallelism-detection.mlir #map1 = ()[s0] -> (s0) func @foo(%arg0: index) { %0 = alloc() : memref<1024x1024xvector<64xf32>> %1 = alloc() : memref<1024x1024xvector<64xf32>> %2 = alloc() : memref<1024x1024xvector<64xf32>> for %i0 = 0 to %arg0 { for %i1 = 0 to %arg0 { for %i2 = 0 to %arg0 { %3 = load %0[%i0, %i2] : memref<1024x1024xvector<64xf32>> %4 = load %1[%i2, %i1] : memref<1024x1024xvector<64xf32>> %5 = load %2[%i0, %i1] : memref<1024x1024xvector<64xf32>> %6 = mulf %3, %4 : vector<64xf32> %7 = addf %5, %6 : vector<64xf32> store %7, %2[%i0, %i1] : memref<1024x1024xvector<64xf32>> } {parallel: false} } {parallel: true} } {parallel: true} return } PiperOrigin-RevId: 236367368
*	Loop fusion for input reuse.	MLIR Team	2019-03-29	1	-2/+149
\| \| \| \| \| \| \| \| \| \| \| \| \|	*) Breaks fusion pass into multiple sub passes over nodes in data dependence graph: - first pass fuses single-use producers into their unique consumer. - second pass enables fusing for input-reuse by fusing sibling nodes which read from the same memref, but which do not share dependence edges. - third pass fuses remaining producers into their consumers (Note that the sibling fusion pass may have transformed a producer with multiple uses into a single-use producer). *) Fusion for input reuse is enabled by computing a sibling node slice using the load/load accesses to the same memref, and fusion safety is guaranteed by checking that the sibling node memref write region (to a different memref) is preserved. *) Enables output vector and output matrix computations from KFAC patches-second-moment operation to fuse into a single loop nest and reuse input from the image patches operation. *) Adds a generic loop utility for finding all sequential loops in a loop nest. *) Adds and updates unit tests. PiperOrigin-RevId: 236350987
*	Analysis support for floordiv/mod's in loop bounds	Uday Bondhugula	2019-03-29	1	-0/+46
\| \| \| \| \| \| \| \| \|	- handle floordiv/mod's in loop bounds for all analysis purposes - allows fusion slicing to be more powerful - add simple test cases based on -memref-bound-check - fusion based test cases in follow up CLs PiperOrigin-RevId: 236328551
*	Change some of the debug messages to use emitError / emitWarning / emitNote ↵	Uday Bondhugula	2019-03-29	1	-0/+1
\| \| \| \| \| \|	- NFC PiperOrigin-RevId: 236169676
*	Detect more trivially redundant constraints better	Uday Bondhugula	2019-03-29	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- detect more trivially redundant constraints in FlatAffineConstraints::removeTrivialRedundantConstraints. Redundancy due to constraints that only differ in the constant part (e.g., 32i + 64j - 3 >= 0, 32i + 64j - 8 >= 0) is now detected. The method is still linear-time and does a single scan over the FlatAffineConstraints buffer. This detection is useful and needed to eliminate redundant constraints generated after FM elimination. - update GCDTightenInequalities so that we also normalize by the GCD while at it. This way more constraints will show up as redundant (232i - 203 >= 0 becomes i - 1 >= 0 instead of 232i - 232 >= 0) without having to call normalizeConstraintsByGCD. - In FourierMotzkinEliminate, call GCDTightenInequalities and normalizeConstraintsByGCD before calling removeTrivialRedundantConstraints() - so that more redundant constraints are detected. As a result, redundancy due to constraints like i - 5 >= 0, i - 7 >= 0, 2i - 5 >= 0, 232i - 203 >= 0 is now detected (here only i >= 7 is non-redundant). As a result of these, a -memref-bound-check on the added test case runs in 16ms instead of 1.35s (opt build) and no longer returns a conservative result. PiperOrigin-RevId: 235983550
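The GCD tightening and single-scan redundancy detection described above can be sketched in Python. The function names and the `(coefficients, constant)` encoding of `sum(coeffs[k] * x[k]) + const >= 0` are assumptions for illustration, not the FlatAffineConstraints implementation.

```python
from functools import reduce
from math import gcd


def gcd_tighten(coeffs, const):
    """Tighten coeffs . x + const >= 0 over the integers and divide by the GCD."""
    g = reduce(gcd, (abs(c) for c in coeffs), 0)
    if g == 0:
        return tuple(coeffs), const  # no variables, nothing to normalize
    # Floor division of the constant keeps the integer solution set unchanged.
    return tuple(c // g for c in coeffs), const // g


def remove_trivially_redundant(constraints):
    """Single scan: among inequalities with identical coefficients,
    keep only the one with the smallest (tightest) constant."""
    tightest = {}
    for coeffs, const in constraints:
        key, c = gcd_tighten(coeffs, const)
        if key not in tightest or c < tightest[key]:
            tightest[key] = c
    return sorted(tightest.items())


# i - 5 >= 0, i - 7 >= 0, 2i - 5 >= 0, 232i - 203 >= 0
system = [((1,), -5), ((1,), -7), ((2,), -5), ((232,), -203)]
reduced = remove_trivially_redundant(system)
```

For the four constraints quoted in the message, the sketch leaves only `i - 7 >= 0`, and `232i - 203 >= 0` is tightened to `i - 1 >= 0` along the way.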
*	Fix bug in memref region computation with slice loop bounds. Adds loop IV ↵	MLIR Team	2019-03-29	1	-0/+47
\| \| \| \| \| \|	values to ComputationSliceState which are used in FlatAffineConstraints::addSliceBounds, to ensure that constraints are only added for loop IV values which are present in the constraint system. PiperOrigin-RevId: 235952912
*	Rewrite the dominance info classes to allow for operating on arbitrary ↵	River Riddle	2019-03-29	1	-1/+34
\| \| \| \| \| \|	control flow within operation regions. The CSE pass is also updated to properly handle nested dominance. PiperOrigin-RevId: 235742627
*	Extend/improve getSliceBounds() / complete TODO + update unionBoundingBox	Uday Bondhugula	2019-03-29	1	-0/+58
\| \| \| \| \| \| \| \| \|	- compute slices precisely where the destination iteration depends on multiple source iterations (instead of over-approximating to the whole source loop extent) - update unionBoundingBox to deal with input with non-matching symbols - reenable disabled backend test case PiperOrigin-RevId: 234714069
*	DMA placement update - hoist loop-invariant DMAs	Uday Bondhugula	2019-03-29	1	-0/+52
\| \| \| \| \| \| \|	- hoist DMAs past all loops immediately surrounding the region that the latter is invariant on - do this at DMA generation time itself PiperOrigin-RevId: 234628447
*	Misc. updates/fixes to analysis utils used for DMA generation; update DMA	Uday Bondhugula	2019-03-29	1	-7/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	generation pass to make it drop certain assumptions, complete TODOs. - multiple fixes for getMemoryFootprintBytes - pass loopDepth correctly from getMemoryFootprintBytes() - use union while computing memory footprints - bug fixes for addAffineForOpDomain - take into account loop step - add domains of other loop IVs in turn that might have been used in the bounds - dma-generate: drop assumption of "non-unit stride loops being tile space loops and skipping those and recursing to inner depths"; DMA generation is now purely based on available fast mem capacity and the calculated memory footprints - handle memory region compute failures/bailouts correctly from dma-generate - loop tiling cleanup/NFC - update some debug and error messages to use emitNote/emitError in pipeline-data-transfer pass - NFC PiperOrigin-RevId: 234245969
*	Support fusing producer loop nests which write to a memref which is live ↵	MLIR Team	2019-03-29	1	-0/+22
\| \| \| \| \| \|	out, provided that the write region of the consumer loop nest to the same memref is a super set of the producer's write region. PiperOrigin-RevId: 234240958
*	LoopFusion: perform a series of loop interchanges to increase the loop depth ↵	MLIR Team	2019-03-29	1	-1/+98
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	at which slices of producer loop nests can be fused into consumer loop nests. *) Adds utility to LoopUtils to perform loop interchange of two AffineForOps. *) Adds utility to LoopUtils to sink a loop to a specified depth within a loop nest, using a series of loop interchanges. *) Computes dependences between all loads and stores in the loop nest, and classifies each loop as parallel or sequential. *) Computes loop interchange permutation required to sink sequential loops (and raise parallel loop nests) while preserving relative order among them. *) Checks each dependence against the permutation to make sure that dependences would not be violated by the loop interchange transformation. *) Calls loop interchange in LoopFusion pass on consumer loop nests before fusing in producers, sinking loops with loop carried dependences deeper into the consumer loop nest. *) Adds and updates related unit tests. PiperOrigin-RevId: 234158370
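The permutation and legality check listed above can be sketched as a stable partition followed by a lexicographic test. Both helpers are hypothetical, and representing dependences as distance vectors is an assumption made for illustration; the pass itself may use a different representation.

```python
def sink_sequential_permutation(is_parallel):
    """perm[old_depth] = new_depth, with parallel loops raised and
    sequential loops sunk, preserving relative order within each group."""
    parallel = [i for i, p in enumerate(is_parallel) if p]
    sequential = [i for i, p in enumerate(is_parallel) if not p]
    perm = [0] * len(is_parallel)
    for new_depth, old_depth in enumerate(parallel + sequential):
        perm[old_depth] = new_depth
    return perm


def permutation_is_legal(dep_distances, perm):
    """A permuted distance vector must stay lexicographically non-negative."""
    for dist in dep_distances:
        permuted = [0] * len(perm)
        for old_depth, new_depth in enumerate(perm):
            permuted[new_depth] = dist[old_depth]
        for d in permuted:
            if d > 0:
                break          # first non-zero entry is positive: preserved
            if d < 0:
                return False   # first non-zero entry is negative: violated
    return True


# Outermost loop is sequential, the two inner loops are parallel.
perm = sink_sequential_permutation([False, True, True])
```

Here the sequential outer loop moves to the innermost position. A dependence carried only by that loop stays legal, while one with a negative component in a loop that gets raised above it does not.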
*	Update direction vector computation to use ↵	MLIR Team	2019-03-29	1	-17/+13
\| \| \| \| \| \| \| \|	FlatAffineConstraints::getLower/UpperBounds. Update FlatAffineConstraints::getLower/UpperBounds to project to the identifier for which bounds are being computed. This change enables computing bounds on an identifier which were previously dependent on the bounds of another identifier. PiperOrigin-RevId: 234017514
*	Generate dealloc's for alloc's of pipeline-data-transfer	Uday Bondhugula	2019-03-29	1	-7/+24
\| \| \| \| \| \| \| \| \| \| \| \|	- for the DMA transfers being pipelined through double buffering, generate deallocs for the double buffers being alloc'ed. This change is along the lines of cl/233502632. We initially wanted to experiment with scoped allocation - so the deallocations were usually not necessary; however, they are needed even with scoped allocations in some situations - e.g., when the enclosing loop gets unrolled. The dealloc serves as an end-of-lifetime marker. PiperOrigin-RevId: 233653463
*	Generate dealloc's for the alloc's of dma-generate.	Uday Bondhugula	2019-03-29	1	-2/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- for the DMA buffers being allocated (and their tags), generate corresponding deallocs - minor related update to replaceAllMemRefUsesWith and PipelineDataTransfer pass Code generation for DMA transfers was being done with the initial simplifying assumption that the alloc's would map to scoped allocations, and so no deallocations would be necessary. Drop this assumption to generalize. Note that even with scoped allocations, unrolling loops that have scoped allocations could create a series of allocations and exhaustion of fast memory. Having an end-of-lifetime marker like a dealloc in fact allows creating new scopes if necessary when lowering to a backend and still utilize scoped allocation. DMA buffers created by -dma-generate are guaranteed to have either non-overlapping lifetimes or nested lifetimes. PiperOrigin-RevId: 233502632
*	Fix + cleanup for getMemRefRegion()	Uday Bondhugula	2019-03-29	1	-0/+23
\| \| \| \| \| \| \| \| \| \|	- determine symbols for the memref region correctly - this wasn't exposed earlier since we didn't have any test cases where the portion of the nest being DMAed for was non-hyperrectangular (i.e., bounds of one IV depending on other IVs within that part) PiperOrigin-RevId: 233493872
*	Automated rollback of changelist 232728977.	Uday Bondhugula	2019-03-29	6	-46/+46
\| \| \| \|	PiperOrigin-RevId: 232944889
*	Add verification for AffineApply/AffineFor/AffineIf dimension and symbol ↵	River Riddle	2019-03-29	1	-1/+1
\| \| \| \| \| \|	operands. This also allows a DimOp to be a valid dimension identifier if its operand is a valid dimension identifier. PiperOrigin-RevId: 232923468
*	Modify the canonicalizations of select and muli to use the fold hook.	River Riddle	2019-03-29	2	-96/+94
\| \| \| \| \| \|	This also extends the greedy pattern rewrite driver to add the operands of folded operations back to the worklist. PiperOrigin-RevId: 232878959
*	Automated rollback of changelist 232717775.	Uday Bondhugula	2019-03-29	27	-759/+759
\| \| \| \|	PiperOrigin-RevId: 232807986
*	Rename the 'if' operation in the AffineOps dialect to 'affine.if' and namespace	River Riddle	2019-03-29	6	-46/+46
\| \| \| \| \| \|	the AffineOps dialect with 'affine'. PiperOrigin-RevId: 232728977
*	NFC: Rename the 'for' operation in the AffineOps dialect to 'affine.for'. ↵	River Riddle	2019-03-29	27	-759/+759
\| \| \| \| \| \|	This is the second step to adding a namespace to the AffineOps dialect. PiperOrigin-RevId: 232717775
*	NFC: Rename affine_apply to affine.apply. This is the first step to adding a ↵	River Riddle	2019-03-29	19	-614/+614
\| \| \| \| \| \|	namespace to the affine dialect. PiperOrigin-RevId: 232707862
*	Move the AffineFor loop bound folding to a canonicalization pattern on the ↵	River Riddle	2019-03-29	1	-33/+0
\| \| \| \| \| \|	AffineForOp. PiperOrigin-RevId: 232610715
*	Refactor the affine analysis by moving some functionality to IR and some to ↵	River Riddle	2019-03-29	2	-300/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	AffineOps. This is important for allowing the affine dialect to define canonicalizations directly on the operations instead of relying on transformation passes, e.g. ComposeAffineMaps. A summary of the refactoring: * AffineStructures has moved to IR. * simplifyAffineExpr/simplifyAffineMap/getFlattenedAffineExpr have moved to IR. * makeComposedAffineApply/fullyComposeAffineMapAndOperands have moved to AffineOps. * ComposeAffineMaps is replaced by AffineApplyOp::canonicalize and deleted. PiperOrigin-RevId: 232586468
*	Loop fusion improvements:	MLIR Team	2019-03-29	1	-15/+58
\| \| \| \| \| \| \|	*) After a private memref buffer is created for a fused loop nest, dependences on the old memref are reduced, which can open up fusion opportunities. In these cases, users of the old memref are added back to the worklist to be reconsidered for fusion. *) Fixed a bug in fusion insertion point dependence check where the memref being privatized was being skipped from the check. PiperOrigin-RevId: 232477853