bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	Refactor and share common code across addAffineForOpDomain / addSliceBounds	Uday Bondhugula	2019-03-29	1	-13/+1
	PiperOrigin-RevId: 237508755
*	Convert ambiguous bool returns in /Analysis to use Status instead.	River Riddle	2019-03-29	1	-39/+40
	PiperOrigin-RevId: 237390240
*	Add FlatAffineConstraints::containsId to avoid using findId when position isn't needed + other cleanup	Uday Bondhugula	2019-03-29	1	-5/+2
	- clean up unionBoundingBox (hoist SmallVector allocations out of loop)
	PiperOrigin-RevId: 237141668
*	Use FlatAffineConstraints::unionBoundingBox to perform slice bounds union for loop fusion pass (WIP).	MLIR Team	2019-03-29	1	-0/+49
	Adds a utility to convert slice bounds to a FlatAffineConstraints representation. Adds a utility to FlatAffineConstraints to promote loop IV symbol identifiers to dim identifiers.
	PiperOrigin-RevId: 236973261
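The bounding-box union used here can be illustrated with constant bounds. This is a minimal sketch, assuming one closed integer interval per dimension. The `Interval`/`Box` types and the `unionBoundingBox`/`numPoints` free functions are hypothetical stand-ins, not the FlatAffineConstraints API, which works on symbolic affine bounds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified model: one closed integer interval [lb, ub] per
// dimension, constant bounds only. FlatAffineConstraints::unionBoundingBox
// itself handles symbolic affine bounds; this shows the idea in miniature.
struct Interval {
  int64_t lb;
  int64_t ub;
};
using Box = std::vector<Interval>;

// Smallest box containing both 'a' and 'b': per dimension, the minimum of the
// lower bounds and the maximum of the upper bounds. The result may contain
// points in neither input, i.e. it over-approximates the set union.
Box unionBoundingBox(const Box &a, const Box &b) {
  assert(a.size() == b.size() && "boxes must have the same rank");
  Box result(a.size());
  for (std::size_t d = 0; d < a.size(); ++d) {
    result[d].lb = std::min(a[d].lb, b[d].lb);
    result[d].ub = std::max(a[d].ub, b[d].ub);
  }
  return result;
}

// Number of integer points in a non-empty box, e.g. to size a buffer.
int64_t numPoints(const Box &box) {
  int64_t n = 1;
  for (const Interval &iv : box)
    n *= iv.ub - iv.lb + 1;
  return n;
}
```

For two overlapping 2-d slices [0, 31] x [0, 7] and [16, 47] x [4, 11], the union box is [0, 47] x [0, 11], i.e. 576 points.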
*	Adds loop attribute as a temporary workaround to prevent slice fusion of loop nests containing instructions with side effects	MLIR Team	2019-03-29	1	-13/+18
	(The proper solution will be to use memref read/write regions in the future.)
	PiperOrigin-RevId: 236733739
*	Update addSliceBounds to deal with loops with floor's/mod's.	Uday Bondhugula	2019-03-29	1	-3/+6
	- This change only impacts the cost model for fusion, given the way addSliceBounds was being used. It so happens that the output is the same in spite of this CL's fix; however, the assertions added no longer fail (an invalid/inconsistent memref region was being used earlier).
	PiperOrigin-RevId: 236405030
*	NFC. Move all of the remaining operations left in BuiltinOps to StandardOps.	River Riddle	2019-03-29	1	-1/+0
	The only things left in BuiltinOps are the core MLIR types. The standard types can't be moved because they are referenced within the IR directory, e.g. in things like Builder.
	PiperOrigin-RevId: 236403665
*	Use consistent names for dialect op source files	Lei Zhang	2019-03-29	1	-1/+1
	This CL changes dialect op source files (.h, .cpp, .td) to follow this convention: <full-dialect-name>/<dialect-namespace>Ops.{h|cpp|td}
	Builtin and standard dialects are specially treated, though. Neither of them has a dialect namespace; the former is still named BuiltinOps.* and the latter is named Ops.*.
	Purely mechanical. NFC.
	PiperOrigin-RevId: 236371358
*	A simple pass to detect and mark all parallel loops	Uday Bondhugula	2019-03-29	1	-22/+21
	- detect all parallel loops based on dep information and mark them with a "parallel" attribute
	- add mlir::isLoopParallel(OpPointer<AffineForOp> ...), and refactor an existing method to use that (reuse some code from @andydavis (cl/236007073) for this)
	- a simple/meaningful way to test memref dep test as well
	Ex:
	$ mlir-opt -detect-parallel test/Transforms/parallelism-detection.mlir
	#map1 = ()[s0] -> (s0)
	func @foo(%arg0: index) {
	  %0 = alloc() : memref<1024x1024xvector<64xf32>>
	  %1 = alloc() : memref<1024x1024xvector<64xf32>>
	  %2 = alloc() : memref<1024x1024xvector<64xf32>>
	  for %i0 = 0 to %arg0 {
	    for %i1 = 0 to %arg0 {
	      for %i2 = 0 to %arg0 {
	        %3 = load %0[%i0, %i2] : memref<1024x1024xvector<64xf32>>
	        %4 = load %1[%i2, %i1] : memref<1024x1024xvector<64xf32>>
	        %5 = load %2[%i0, %i1] : memref<1024x1024xvector<64xf32>>
	        %6 = mulf %3, %4 : vector<64xf32>
	        %7 = addf %5, %6 : vector<64xf32>
	        store %7, %2[%i0, %i1] : memref<1024x1024xvector<64xf32>>
	      } {parallel: false}
	    } {parallel: true}
	  } {parallel: true}
	  return
	}
	PiperOrigin-RevId: 236367368
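The rule behind marking a loop parallel can be sketched with dependence distance vectors: a loop is parallel if it carries no dependence. This is a minimal sketch, assuming every dependence is summarized as a constant, lexicographically non-negative distance vector. `DistanceVector`, `isCarriedAt`, and `isLoopParallelAt` are hypothetical names, not the signature of mlir::isLoopParallel, which runs the memref dependence check directly.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical summary of one dependence: its distance along each loop of
// the nest, outermost loop first.
using DistanceVector = std::vector<int64_t>;

// A dependence is carried by the loop at 'depth' (0-based) if all outer
// components are zero and the component at 'depth' is non-zero.
bool isCarriedAt(const DistanceVector &dist, unsigned depth) {
  for (unsigned d = 0; d < depth; ++d)
    if (dist[d] != 0)
      return false; // already carried by an outer loop
  return dist[depth] != 0;
}

// The loop at 'depth' is parallel if it carries none of the dependences.
bool isLoopParallelAt(const std::vector<DistanceVector> &deps, unsigned depth) {
  for (const DistanceVector &dist : deps)
    if (isCarriedAt(dist, depth))
      return false;
  return true;
}
```

For the example above, the store to %2[%i0, %i1] and the load from it give distances (0, 0, 1) and (0, 0, 0). Only the %i2 loop carries a dependence, which matches the {parallel: false} annotation on the innermost loop.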
*	Loop fusion for input reuse.	MLIR Team	2019-03-29	1	-5/+65
	*) Breaks fusion pass into multiple sub passes over nodes in data dependence graph:
	   - first pass fuses single-use producers into their unique consumer.
	   - second pass enables fusing for input-reuse by fusing sibling nodes which read from the same memref, but which do not share dependence edges.
	   - third pass fuses remaining producers into their consumers (note that the sibling fusion pass may have transformed a producer with multiple uses into a single-use producer).
	*) Fusion for input reuse is enabled by computing a sibling node slice using the load/load accesses to the same memref, and fusion safety is guaranteed by checking that the sibling node memref write region (to a different memref) is preserved.
	*) Enables output vector and output matrix computations from KFAC patches-second-moment operation to fuse into a single loop nest and reuse input from the image patches operation.
	*) Adds a generic loop utility for finding all sequential loops in a loop nest.
	*) Adds and updates unit tests.
	PiperOrigin-RevId: 236350987
*	Method to align/merge dimensional/symbolic identifiers between two FlatAffineConstraints	Uday Bondhugula	2019-03-29	1	-6/+7
	- add a method to merge and align the spaces (identifiers) of two FlatAffineConstraints (both get dimension-wise and symbol-wise unique columns)
	- this completes several TODOs, gets rid of previous assumptions/restrictions in composeMap and unionBoundingBox, and reuses common code
	- remove previous workarounds / duplicated functionality in FlatAffineConstraints::composeMap and unionBoundingBox; use mergeAlignIds from both
	PiperOrigin-RevId: 236320581
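Merging and aligning identifier spaces can be sketched on plain identifier lists. This is a minimal sketch, assuming identifiers are known by name and only one identifier kind is aligned; the actual method aligns dims and symbols separately and works on SSA values. `IdList`, `mergeIds`, `columnMap`, and `widenRow` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical simplification: identifiers are named by strings and a single
// identifier kind is aligned.
using IdList = std::vector<std::string>;

// Merged space: all identifiers of 'a' in order, then those of 'b' that 'a'
// lacks. Every identifier ends up with exactly one (unique) column.
IdList mergeIds(const IdList &a, const IdList &b) {
  IdList merged = a;
  for (const std::string &id : b)
    if (std::find(merged.begin(), merged.end(), id) == merged.end())
      merged.push_back(id);
  return merged;
}

// For each identifier of 'ids', its column in the merged space.
std::vector<std::size_t> columnMap(const IdList &ids, const IdList &merged) {
  std::vector<std::size_t> colMap;
  for (const std::string &id : ids) {
    auto it = std::find(merged.begin(), merged.end(), id);
    assert(it != merged.end() && "identifier missing from merged space");
    colMap.push_back(static_cast<std::size_t>(it - merged.begin()));
  }
  return colMap;
}

// Rewrites a constraint row [c_0, ..., c_{n-1}, const] over the original
// identifiers into the merged space; new columns get zero coefficients.
std::vector<int64_t> widenRow(const std::vector<int64_t> &row,
                              const std::vector<std::size_t> &colMap,
                              std::size_t numMerged) {
  assert(row.size() == colMap.size() + 1 && "row = coefficients + constant");
  std::vector<int64_t> out(numMerged + 1, 0);
  for (std::size_t i = 0; i < colMap.size(); ++i)
    out[colMap[i]] = row[i];
  out[numMerged] = row.back(); // constant term stays last
  return out;
}
```

Once both systems are widened to the merged space, their rows can be combined column by column, which is what composing or unioning two constraint systems needs.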
*	Change some of the debug messages to use emitError / emitWarning / emitNote - NFC	Uday Bondhugula	2019-03-29	1	-1/+1
	PiperOrigin-RevId: 236169676
*	Fix bug in memref region computation with slice loop bounds.	MLIR Team	2019-03-29	1	-1/+4
	Adds loop IV values to ComputationSliceState which are used in FlatAffineConstraints::addSliceBounds, to ensure that constraints are only added for loop IV values which are present in the constraint system.
	PiperOrigin-RevId: 235952912
*	Refactor AffineExprFlattener and move FlatAffineConstraints out of IR into Analysis - NFC	Uday Bondhugula	2019-03-29	1	-2/+2
	- refactor AffineExprFlattener (-> SimpleAffineExprFlattener) so that it doesn't depend on FlatAffineConstraints, and so that FlatAffineConstraints could be moved out of IR/; the simplification that the IR needs for AffineExpr's doesn't depend on FlatAffineConstraints
	- have AffineExprFlattener derive from SimpleAffineExprFlattener to use for all Analysis/Transforms purposes; override addLocalFloorDivId in the derived class
	- turn addAffineForOpDomain into a method on FlatAffineConstraints
	- turn AffineForOp::getAsValueMap into an AffineValueMap ctor
	PiperOrigin-RevId: 235283610
*	Fix for getMemRefSizeInBytes: unsigned -> uint64_t	Uday Bondhugula	2019-03-29	1	-1/+1
	PiperOrigin-RevId: 234829637
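Why the return type matters: the byte size of a statically shaped memref is the element size times the product of its dimensions, which overflows 32 bits quickly. This is a minimal sketch with hypothetical free functions `memRefSizeInBytes`/`memRefSizeInBytes32` that take a shape and an element size; they do not reproduce the real getMemRefSizeInBytes signature.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: size in bytes of a static shape, in 64-bit arithmetic.
uint64_t memRefSizeInBytes(const std::vector<int64_t> &shape,
                           uint64_t elementSizeInBytes) {
  uint64_t size = elementSizeInBytes;
  for (int64_t dim : shape) {
    assert(dim >= 0 && "sketch only handles static, non-negative dims");
    size *= static_cast<uint64_t>(dim);
  }
  return size;
}

// The same computation with a 32-bit unsigned accumulator, to show the bug
// class: unsigned arithmetic wraps around modulo 2^32 without any error.
uint32_t memRefSizeInBytes32(const std::vector<int64_t> &shape,
                             uint32_t elementSizeInBytes) {
  uint32_t size = elementSizeInBytes;
  for (int64_t dim : shape)
    size *= static_cast<uint32_t>(dim);
  return size;
}
```

A memref<65536x65536xf32> occupies 2^34 bytes; the 32-bit version silently returns 0 for it, while memref<1024x1024xvector<64xf32>> (256 MiB) still fits either way.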
*	Misc. updates/fixes to analysis utils used for DMA generation; update DMA generation pass to make it drop certain assumptions, complete TODOs.	Uday Bondhugula	2019-03-29	1	-18/+38
	- multiple fixes for getMemoryFootprintBytes
	- pass loopDepth correctly from getMemoryFootprintBytes()
	- use union while computing memory footprints
	- bug fixes for addAffineForOpDomain
	- take into account loop step
	- add domains of other loop IVs in turn that might have been used in the bounds
	- dma-generate: drop assumption of "non-unit stride loops being tile space loops and skipping those and recursing to inner depths"; DMA generation is now purely based on available fast mem capacity and calculated memory footprints
	- handle memory region compute failures/bailouts correctly from dma-generate
	- loop tiling cleanup/NFC
	- update some debug and error messages to use emitNote/emitError in pipeline-data-transfer pass - NFC
	PiperOrigin-RevId: 234245969
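Why the loop step matters for a loop's domain: with step s, the IV only takes the values lb + k*s below ub. This is a minimal sketch for constant bounds; `tripCount`, `lastIvValue`, and `inDomain` are hypothetical helpers and do not show how addAffineForOpDomain actually encodes the step in the constraint system.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for 'for %i = lb to ub step s' with constant bounds
// (ub exclusive): the IV takes lb, lb + s, lb + 2s, ... while < ub.
int64_t tripCount(int64_t lb, int64_t ub, int64_t step) {
  assert(step > 0 && "affine.for steps are positive");
  return ub > lb ? (ub - lb + step - 1) / step : 0;
}

// Largest value the IV actually takes. Ignoring the step and using ub - 1
// over-approximates the domain (e.g. 9 instead of 8 for lb=0, ub=10, s=4).
int64_t lastIvValue(int64_t lb, int64_t ub, int64_t step) {
  int64_t n = tripCount(lb, ub, step);
  assert(n > 0 && "loop has no iterations");
  return lb + (n - 1) * step;
}

// Exact membership test: lb <= iv < ub and (iv - lb) is a multiple of step.
bool inDomain(int64_t iv, int64_t lb, int64_t ub, int64_t step) {
  return iv >= lb && iv < ub && (iv - lb) % step == 0;
}
```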
*	Fix + cleanup for getMemRefRegion()	Uday Bondhugula	2019-03-29	1	-13/+16
	- determine symbols for the memref region correctly
	- this wasn't exposed earlier since we didn't have any test cases where the portion of the nest being DMAed for was non-hyperrectangular (i.e., bounds of one IV depending on other IVs within that part)
	PiperOrigin-RevId: 233493872
*	Automated rollback of changelist 232728977.	Uday Bondhugula	2019-03-29	1	-1/+1
	PiperOrigin-RevId: 232944889
*	Automated rollback of changelist 232717775.	Uday Bondhugula	2019-03-29	1	-5/+5
	PiperOrigin-RevId: 232807986
*	Rename the 'if' operation in the AffineOps dialect to 'affine.if' and namespace the AffineOps dialect with 'affine'.	River Riddle	2019-03-29	1	-1/+1
	PiperOrigin-RevId: 232728977
*	NFC: Rename the 'for' operation in the AffineOps dialect to 'affine.for'.	River Riddle	2019-03-29	1	-6/+6
	This is the second step to adding a namespace to the AffineOps dialect.
	PiperOrigin-RevId: 232717775
*	Address post submit review comments for removing Block::findInstPositionInBlock.	River Riddle	2019-03-29	1	-1/+1
	PiperOrigin-RevId: 232713514
*	Adds the ability to compute the MemRefRegion of a sliced loop nest.	MLIR Team	2019-03-29	1	-37/+77
	Utilizes this feature during loop fusion cost computation, to compute what the write region of a fusion candidate loop nest slice would be (without having to materialize the slice or change the IR).
	*) Adds parameter to public API of MemRefRegion::compute for passing in the slice loop bounds to compute the memref region of the loop nest slice.
	*) Exposes public method MemRefRegion::getRegionSize for computing the size of the memref region in bytes.
	PiperOrigin-RevId: 232706165
*	Remove findInstPositionInBlock from the Block api.	River Riddle	2019-03-29	1	-2/+3
	PiperOrigin-RevId: 232704766
*	Refactor the affine analysis by moving some functionality to IR and some to AffineOps.	River Riddle	2019-03-29	1	-2/+2
	This is important for allowing the affine dialect to define canonicalizations directly on the operations instead of relying on transformation passes, e.g. ComposeAffineMaps. A summary of the refactoring:
	* AffineStructures has moved to IR.
	* simplifyAffineExpr/simplifyAffineMap/getFlattenedAffineExpr have moved to IR.
	* makeComposedAffineApply/fullyComposeAffineMapAndOperands have moved to AffineOps.
	* ComposeAffineMaps is replaced by AffineApplyOp::canonicalize and deleted.
	PiperOrigin-RevId: 232586468
*	NFC: Move AffineApplyOp to the AffineOps dialect.	River Riddle	2019-03-29	1	-1/+1
	This also moves the isValidDim/isValidSymbol methods from Value to the AffineOps dialect.
	PiperOrigin-RevId: 232386632
*	Refactor common code getting memref access in getMemRefRegion - NFC	Uday Bondhugula	2019-03-29	1	-65/+50
	- use getAccessMap() instead of repeating it
	- fold getMemRefRegion into MemRefRegion ctor (more natural, avoid heap allocation and unique_ptr where possible)
	- change extractForInductionVars - MutableArrayRef -> ArrayRef for the arguments. Since the method is just returning copies of 'Value *', the client can't mutate the pointers themselves; it's fine to mutate the 'Value's themselves, but that doesn't mutate the pointers to those.
	- change the way extractForInductionVars returns (see b/123437690)
	PiperOrigin-RevId: 232359277
*	Update dma-generate pass to (1) work on blocks of instructions (instead of just loops), (2) take into account fast memory space capacity and lower 'dmaDepth' to fit, (3) add location information for debug info / errors	Uday Bondhugula	2019-03-29	1	-21/+30
	- change dma-generate pass to work on blocks of instructions (start/end iterators) instead of 'for' loops; complete TODOs; this allows DMA generation for straightline blocks of operation instructions interspersed between loops
	- take into account fast memory capacity: check whether the memory footprint fits in the fastMemoryCapacity parameter, and recurse/lower the depth at which DMA generation is performed until it does fit in the provided memory
	- add location information to MemRefRegion; any insufficient fast memory capacity errors or debug info w.r.t. DMA generation show location information
	- allow DMA generation pass to be instantiated with a fast memory capacity option (besides command line flag)
	- change getMemRefRegion to return unique_ptr's
	- change getMemRefFootprintBytes to work on a 'Block' instead of 'ForInst'
	- other helper methods; add postDomInstFilter option for replaceAllMemRefUsesWith; drop forInst->walkOps, add Block::walkOps methods
	E.g. output:
	$ mlir-opt -dma-generate -dma-fast-mem-capacity=1 /tmp/single.mlir
	/tmp/single.mlir:9:13: error: Total size of all DMA buffers' for this block exceeds fast memory capacity
	    for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {
	    ^
	$ mlir-opt -debug-only=dma-generate -dma-generate -dma-fast-mem-capacity=400 /tmp/single.mlir
	/tmp/single.mlir:9:13: note: 8 KiB of DMA buffers in fast memory space for this block
	    for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {
	PiperOrigin-RevId: 232297044
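The capacity-driven depth selection described above can be sketched as a recursive walk: place the DMAs at a loop if its footprint fits in fast memory, otherwise try its inner loops. This is a minimal sketch, assuming each loop's memory footprint is already known in bytes. `LoopNode` and `placeDmas` are hypothetical; the real pass computes footprints from memref regions and works on instruction ranges within blocks.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of a loop nest: each node carries the memory footprint
// (in bytes) of all memref regions accessed inside it.
struct LoopNode {
  std::string name;
  uint64_t footprintBytes;
  std::vector<LoopNode> children;
};

// Records in 'placements' the loops at which DMAs would be generated.
// Returns false if some innermost loop still does not fit, which corresponds
// to the "exceeds fast memory capacity" error shown above.
bool placeDmas(const LoopNode &node, uint64_t capacityBytes,
               std::vector<std::string> &placements) {
  if (node.footprintBytes <= capacityBytes) {
    placements.push_back(node.name);
    return true;
  }
  if (node.children.empty())
    return false; // cannot go any deeper
  bool ok = true;
  for (const LoopNode &child : node.children)
    if (!placeDmas(child, capacityBytes, placements))
      ok = false;
  return ok;
}
```

With an outer loop needing 64 KiB and an inner loop needing 8 KiB, a 400 KiB capacity places the DMAs at the outer loop, 16 KiB pushes them to the inner loop, and 1 KiB fails.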
*	Begin the process of fully removing OperationInst.	River Riddle	2019-03-29	1	-15/+11
	This patch cleans up references to OperationInst in the /include, /AffineOps, and lib/Analysis.
	PiperOrigin-RevId: 232199262
*	Define the AffineForOp and replace ForInst with it.	River Riddle	2019-03-29	1	-34/+42
	This patch is largely mechanical, i.e. changing usages of ForInst to OpPointer<AffineForOp>. An important difference is that upon construction an AffineForOp no longer automatically creates the body and induction variable. To generate the body/iv, 'createBody' can be called on an AffineForOp with no body.
	PiperOrigin-RevId: 232060516
*	Recommit: Define an AffineOps dialect as well as an AffineIfOp operation.	River Riddle	2019-03-29	1	-15/+7
	Replace all instances of IfInst with AffineIfOp and delete IfInst.
	PiperOrigin-RevId: 231342063
*	Automated rollback of changelist 231318632.	Nicolas Vasilache	2019-03-29	1	-7/+15
	PiperOrigin-RevId: 231327161
*	Define an AffineOps dialect as well as an AffineIfOp operation.	River Riddle	2019-03-29	1	-15/+7
	Replace all instances of IfInst with AffineIfOp and delete IfInst.
	PiperOrigin-RevId: 231318632
*	Change the ForInst induction variable to be a block argument of the body instead of the ForInst itself.	River Riddle	2019-03-29	1	-3/+4
	This is a necessary step in converting ForInst into an operation.
	PiperOrigin-RevId: 231064139
*	Drop AffineMap::Null and IntegerSet::Null	Nicolas Vasilache	2019-03-29	1	-2/+2
	Addresses b/122486036
	This CL addresses some leftover crumbs in AffineMap and IntegerSet by removing the Null method and cleaning up the constructors. As the ::Null uses were tracked down, opportunities appeared to untangle some of the parsing logic and make it explicit where AffineMap/IntegerSet have ambiguous syntax. Previously, ambiguous cases were hidden behind the implicit pointer values of AffineMap* and IntegerSet* that were passed as function parameters. Depending on the values of those pointers, one of 3 behaviors could occur.
	This parsing logic convolution is one of the rare cases where I would advocate for code duplication. The more proper fix would be to make the syntax unambiguous or to allow some lookahead.
	PiperOrigin-RevId: 231058512
*	Allow operations to hold a blocklist and add support for parsing/printing a block list for verbose printing.	River Riddle	2019-03-29	1	-0/+8
	PiperOrigin-RevId: 230951462
*	Update dma-generate: update for multiple load/store op's per memref	Uday Bondhugula	2019-03-29	1	-0/+5
	- introduce a way to compute union using symbolic rectangular bounding boxes
	- handle multiple load/store op's to the same memref by taking a union of the regions
	- command-line argument to provide capacity of the fast memory space
	- minor change to replaceAllMemRefUsesWith to not generate affine_apply if the supplied index remap was identity
	PiperOrigin-RevId: 230848185
*	Add cloning functionality to Block and Function; this also adds support for remapping successor block operands of terminator operations.	River Riddle	2019-03-29	1	-2/+1
	We define a new BlockAndValueMapping class to simplify mapping between cloned values.
	PiperOrigin-RevId: 230768759
*	Migrate VectorOrTensorType/MemRefType shape api to use int64_t instead of int.	River Riddle	2019-03-29	1	-2/+2
	PiperOrigin-RevId: 230605756
*	Update fusion cost model + some additional infrastructure and debug information for -loop-fusion	Uday Bondhugula	2019-03-29	1	-14/+93
	- update fusion cost model to fuse while tolerating a certain amount of redundant computation; add cl option -fusion-compute-tolerance; evaluate memory footprint and intermediate memory reduction
	- emit debug info from -loop-fusion showing what was fused and why
	- introduce function to compute memory footprint for a loop nest
	- getMemRefRegion readability update - NFC
	PiperOrigin-RevId: 230541857
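A cost model that tolerates some redundant computation can be sketched as a ratio check against the unfused cost. This is a minimal sketch, assuming cost is counted as dynamic operation instances and the tolerance is a fraction such as 0.30. `nestCost` and `isFusionProfitable` are hypothetical, leave out the memory footprint terms the pass also evaluates, and the exact formula behind -fusion-compute-tolerance may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cost of a perfectly nested loop: dynamic operation instances,
// i.e. the product of trip counts times the number of ops in the body.
uint64_t nestCost(uint64_t tripCountProduct, uint64_t opsInBody) {
  return tripCountProduct * opsInBody;
}

// Fusion is accepted if the fused nest does at most 'tolerance' (0.30 means
// 30%) more work than the two unfused nests combined. Recomputing the source
// slice inside the destination nest is what makes the fused cost larger.
bool isFusionProfitable(uint64_t srcCost, uint64_t dstCost, uint64_t fusedCost,
                        double tolerance) {
  assert(srcCost + dstCost > 0 && "empty nests have no meaningful ratio");
  double unfused = static_cast<double>(srcCost + dstCost);
  double additional = (static_cast<double>(fusedCost) - unfused) / unfused;
  return additional <= tolerance;
}
```

With unfused costs of 200 and 300 operation instances, a fused cost of 550 (10% extra work) passes a 0.30 tolerance, while 700 (40% extra) does not.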
*	Allocate private/local buffers for slices accurately during fusion	Uday Bondhugula	2019-03-29	1	-4/+9
	- the size of the private memref created for the slice should be based on the memref region accessed at the depth at which the slice is being materialized, i.e., symbolic in the outer IVs up until that depth, as opposed to the region accessed based on the entire domain.
	- leads to a significant contraction of the temporary / intermediate memref whenever the memref isn't reduced to a single scalar (through store fwd'ing).
	Other changes
	- update to promoteIfSingleIteration - avoid introducing unnecessary identity map affine_apply from IV; makes it much easier to write and read test cases and pass output for all passes that use promoteIfSingleIteration; loop-fusion test cases become much simpler
	- fix replaceAllMemrefUsesWith bug that was exposed by the above update - 'domInstFilter' could be one of the ops erased due to a memref replacement in it.
	- fix getConstantBoundOnDimSize bug: a division by the coefficient of the identifier was missing (the latter need not always be 1); add lbFloorDivisors output argument
	- rename getBoundingConstantSizeAndShape -> getConstantBoundingSizeAndShape
	PiperOrigin-RevId: 230405218
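The getConstantBoundOnDimSize fix can be illustrated with one identifier whose coefficient is not 1. This is a minimal sketch, assuming constant bounds of the form lb <= coeff * x <= ub; the real function handles symbolic bounds whose difference is constant. `floorDiv`, `ceilDiv`, and `constantDimSize` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Floor and ceil division for a positive divisor, also correct for negative
// numerators (C++ '/' truncates toward zero).
int64_t floorDiv(int64_t a, int64_t b) {
  assert(b > 0);
  int64_t q = a / b;
  return (a % b != 0 && a < 0) ? q - 1 : q;
}

int64_t ceilDiv(int64_t a, int64_t b) {
  assert(b > 0);
  int64_t q = a / b;
  return (a % b != 0 && a > 0) ? q + 1 : q;
}

// Hypothetical helper: number of integers x with lb <= coeff * x <= ub
// (coeff > 0). Dividing by the coefficient is the essential step: for
// 2 * x in [0, 15] there are 8 solutions (x = 0..7), not 16.
int64_t constantDimSize(int64_t coeff, int64_t lb, int64_t ub) {
  int64_t lo = ceilDiv(lb, coeff);
  int64_t hi = floorDiv(ub, coeff);
  return hi >= lo ? hi - lo + 1 : 0;
}
```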
*	LoopFusion improvements:	MLIR Team	2019-03-29	1	-25/+31
	*) Adds support for fusing into consumer loop nests with multiple loads from the same memref.
	*) Adds support for reducing slice loop trip count by projecting out destination loop IVs greater than destination loop depth.
	*) Removes dependence on src loop depth and simplifies cost model computation.
	PiperOrigin-RevId: 229575126
*	Minor code cleanup - NFC.	Uday Bondhugula	2019-03-29	1	-6/+8
	- readability changes
	PiperOrigin-RevId: 229443430
*	LoopFusion: automate selection of source loop nest slice depth and destination loop nest insertion depth based on a simple cost model (cost model can be extended/replaced at a later time).	MLIR Team	2019-03-29	1	-30/+48
	*) LoopFusion: Adds fusion cost function which compares the cost of the fused loop nest with the cost of the two unfused loop nests to determine if it is profitable to fuse the candidate loop nests. The fusion cost function is run for various combinations of src/dst loop depths, attempting to find the minimum cost setting for src/dst loop depths which does not increase the computational cost when the loop nests are fused. Combinations of src/dst loop depth are evaluated attempting to maximize loop depth (i.e. take a bigger computation slice from the source loop nest, and insert it deeper in the destination loop nest for better locality).
	*) LoopFusion: Adds utility to compute op instance count for loop nests, sliced loop nests, and to compute the cost of a loop nest fused with another sliced loop nest.
	*) LoopFusion: canonicalizes slice bound AffineMaps (and updates related tests).
	*) Analysis::Utils: Splits getBackwardComputationSlice into two functions: one which calculates and returns the slice loop bounds for analysis by LoopFusion, and the other for insertion of the computation slice (once fusion has calculated the min-cost src/dst loop depths).
	*) Test: Adds multiple unit tests to test the new functionality.
	PiperOrigin-RevId: 229219757
*	Simplify compositions of AffineApply	Nicolas Vasilache	2019-03-29	1	-2/+4
	This CL is the 6th and last on the path to simplifying AffineMap composition. This removes `AffineValueMap::forwardSubstitutions` and replaces it with simple calls to `fullyComposeAffineMapAndOperands`.
	PiperOrigin-RevId: 228962580
*	Delete FuncBuilder::createChecked. It is perhaps still a good idea, but has no clients.	Chris Lattner	2019-03-29	1	-2/+1
	Let's re-add it in the future if there is ever a reason to. NFC.
	Unrelatedly, add a use of a variable to unbreak the non-assert build.
	PiperOrigin-RevId: 228284026
*	Fix 0-d memref corner case for getMemRefRegion()	Uday Bondhugula	2019-03-29	1	-0/+9
	- fix crash on test/Transforms/canonicalize.mlir with -memref-bound-check
	PiperOrigin-RevId: 228268486
*	Extend loop-fusion's slicing utility + other fixes / updates	Uday Bondhugula	2019-03-29	1	-64/+57
	- refactor toAffineFromEq and the code surrounding it; refactor code into FlatAffineConstraints::getSliceBounds
	- add FlatAffineConstraints methods to detect identifiers as mod's and div's of other identifiers
	- add FlatAffineConstraints::getConstantLower/UpperBound
	- Address b/122118218 (don't assert on invalid fusion depths cmdline flags - instead, don't do anything); change cmdline flags src-loop-depth -> fusion-src-loop-depth
	- AffineExpr/Map print method update: don't fail on null instances (since we have a wrapper around a pointer, it's avoidable); rationale: dump/print methods should never fail if possible.
	- Update memref-dataflow-opt to add an optimization to avoid an unnecessary call to IsRangeOneToOne when it's trivially going to be true.
	- Add additional test cases to exercise the new support
	- update a few existing test cases since the maps are now generated uniformly with all destination loop operands appearing for the backward slice
	- Fix projectOut - fix wrong range for getBestElimCandidate.
	- Fix for getConstantBoundOnDimSize() - didn't show up in any test cases since we didn't have any non-hyperrectangular ones.
	PiperOrigin-RevId: 228265152
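Detecting an identifier as a div of another relies on a fixed pattern of two inequalities. This is a minimal sketch of that pattern for q = x floordiv c with c > 0; `satisfiesFloorDivConstraints` and `modFromLocal` are hypothetical helpers, not FlatAffineConstraints methods.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the two inequalities a flattened constraint system can
// use to define a local identifier q = x floordiv c (c > 0) without a
// division operator:
//   x - c*q >= 0   and   c*q + (c - 1) - x >= 0
// For each x, exactly one integer q satisfies both.
bool satisfiesFloorDivConstraints(int64_t x, int64_t q, int64_t c) {
  assert(c > 0);
  return x - c * q >= 0 && c * q + (c - 1) - x >= 0;
}

// x mod c is then the affine expression x - c*q; no extra identifier needed.
int64_t modFromLocal(int64_t x, int64_t q, int64_t c) { return x - c * q; }
```

A detector therefore looks for an identifier q that appears in a pair of inequalities of exactly this shape; the same q then also yields x mod c.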
*	Misc readability and doc / code comment related improvements - NFC	Uday Bondhugula	2019-03-29	1	-16/+13
	- when SSAValue/MLValue existed, code at several places was forced to create additional aggregate temporaries of SmallVector<SSAValue/MLValue> to handle the conversion; get rid of such redundant code
	- use filling ctors instead of explicit loops
	- for smallvectors, change insert(list.end(), ...) -> append(...
	- improve comments at various places
	- turn getMemRefAccess into MemRefAccess ctor and drop duplicated getMemRefAccess. In the next CL, provide getAccess() accessors for load, store, DMA op's to return a MemRefAccess.
	PiperOrigin-RevId: 228243638
*	Complete TODOs / cleanup for loop-fusion utility	Uday Bondhugula	2019-03-29	1	-10/+1
	- this is CL 1/2 that does a cleanup and gets rid of one limitation in an underlying method - as a result, fusion works for more cases.
	- fix bugs/incomplete impl. in toAffineMapFromEq
	- fusing across rank changing reshapes, for example, now just works
	For example, given a rank 1 memref to rank 2 memref reshape (64 -> 8 x 8) like this, -loop-fusion -memref-dataflow-opt now completely fuses and inlines/store-forwards to get rid of the temporary:
	INPUT
	    // Rank 1 -> Rank 2 reshape
	    for %i0 = 0 to 64 {
	      %v = load %A[%i0]
	      store %v, %B[%i0 floordiv 8, %i0 mod 8]
	    }
	    for %i1 = 0 to 8 {
	      for %i2 = 0 to 8 {
	        %w = load %B[%i1, %i2]
	        "foo"(%w) : (f32) -> ()
	      }
	    }
	OUTPUT
	$ mlir-opt -loop-fusion -memref-dataflow-opt fuse_reshape.mlir
	#map0 = (d0, d1) -> (d0 * 8 + d1)
	mlfunc @fuse_reshape(%arg0: memref<64xf32>) {
	  for %i0 = 0 to 8 {
	    for %i1 = 0 to 8 {
	      %0 = affine_apply #map0(%i0, %i1)
	      %1 = load %arg0[%0] : memref<64xf32>
	      "foo"(%1) : (f32) -> ()
	    }
	  }
	}
	AFAIK, there is no polyhedral tool / compiler that can perform such fusion, because it's not really standard loop fusion, but it is possible through a generalized slicing-based approach such as ours.
	PiperOrigin-RevId: 227918338
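The index arithmetic behind this fusion can be checked directly with a plain C++ model of the two loop nests. This is a minimal sketch in which arrays stand in for the memrefs and `reshapeFusionIsExact` is a hypothetical function. Because all indices are non-negative, C++ `/` and `%` agree with floordiv and mod here.

```cpp
#include <array>
#include <cassert>

// Hypothetical check: after the 64 -> 8 x 8 reshape through %B, element
// %B[i1][i2] holds %A[i1 * 8 + i2], which is exactly the access map
// #map0 = (d0, d1) -> (d0 * 8 + d1) in the fused output.
bool reshapeFusionIsExact() {
  std::array<float, 64> A{};
  for (int i = 0; i < 64; ++i)
    A[i] = static_cast<float>(i) * 0.5f;

  // Producer loop: store %v, %B[%i0 floordiv 8, %i0 mod 8].
  float B[8][8];
  for (int i0 = 0; i0 < 64; ++i0)
    B[i0 / 8][i0 % 8] = A[i0];

  // Consumer loops: the load through %B must equal the direct, fused load.
  for (int i1 = 0; i1 < 8; ++i1)
    for (int i2 = 0; i2 < 8; ++i2)
      if (B[i1][i2] != A[i1 * 8 + i2])
        return false;
  return true;
}
```

Since every element read through the temporary equals the element read through #map0, the temporary %B can be dropped without changing what "foo" receives.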