| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
| |
PiperOrigin-RevId: 237508755
|
|
|
|
| |
PiperOrigin-RevId: 237390240
|
|
|
|
|
|
|
| |
needed + other cleanup
- clean up unionBoundingBox (hoist SmallVector allocations out of loop).
PiperOrigin-RevId: 237141668
|
|
|
|
|
|
|
|
|
| |
for loop fusion pass (WIP).
Adds utility to convert slice bounds to a FlatAffineConstraints representation.
Adds utility to FlatAffineConstraints to promote loop IV symbol identifiers to dim identifiers.
PiperOrigin-RevId: 236973261
|
|
|
|
|
|
| |
loop nests containing instructions with side effects (the proper solution will be to use memref read/write regions in the future).
PiperOrigin-RevId: 236733739
|
|
|
|
|
|
|
|
|
| |
- This change only impacts the cost model for fusion, given the way
addSliceBounds was being used. It so happens that the output in spite of this
CL's fix is the same; however, the assertions added no longer fail. (an
invalid/inconsistent memref region was being used earlier).
PiperOrigin-RevId: 236405030
|
|
|
|
|
|
| |
The only thing left in BuiltinOps are the core MLIR types. The standard types can't be moved because they are referenced within the IR directory, e.g. in things like Builder.
PiperOrigin-RevId: 236403665
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This CL changes dialect op source files (.h, .cpp, .td) to follow the following
convention:
<full-dialect-name>/<dialect-namespace>Ops.{h|cpp|td}
Builtin and standard dialects are specially treated, though. Both of them do
not have dialect namespace; the former is still named as BuiltinOps.* and the
latter is named as Ops.*.
Purely mechanical. NFC.
PiperOrigin-RevId: 236371358
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- detect all parallel loops based on dep information and mark them with a
"parallel" attribute
- add mlir::isLoopParallel(OpPointer<AffineForOp> ...), and refactor an existing method
to use that (reuse some code from @andydavis (cl/236007073) for this)
- a simple/meaningful way to test memref dep test as well
Ex:
$ mlir-opt -detect-parallel test/Transforms/parallelism-detection.mlir
#map1 = ()[s0] -> (s0)
func @foo(%arg0: index) {
  %0 = alloc() : memref<1024x1024xvector<64xf32>>
  %1 = alloc() : memref<1024x1024xvector<64xf32>>
  %2 = alloc() : memref<1024x1024xvector<64xf32>>
  for %i0 = 0 to %arg0 {
    for %i1 = 0 to %arg0 {
      for %i2 = 0 to %arg0 {
        %3 = load %0[%i0, %i2] : memref<1024x1024xvector<64xf32>>
        %4 = load %1[%i2, %i1] : memref<1024x1024xvector<64xf32>>
        %5 = load %2[%i0, %i1] : memref<1024x1024xvector<64xf32>>
        %6 = mulf %3, %4 : vector<64xf32>
        %7 = addf %5, %6 : vector<64xf32>
        store %7, %2[%i0, %i1] : memref<1024x1024xvector<64xf32>>
      } {parallel: false}
    } {parallel: true}
  } {parallel: true}
  return
}
PiperOrigin-RevId: 236367368
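
The parallel/non-parallel verdicts in the output above can be reproduced with a small brute-force check. This is only a sketch in Python, not the dependence analysis the pass uses: it assumes a loop is parallel when no two iterations that agree on all outer IVs but differ at that loop touch the same memref element with at least one write. The names `is_loop_parallel` and `matmul_accesses` are made up for illustration.

```python
from itertools import product


def is_loop_parallel(depth, accesses, trip_count, num_loops):
    """Return True if no dependence is carried by the loop at `depth`.

    `accesses` is a list of (memref_name, index_function, is_write) tuples.
    The check enumerates a small iteration space exhaustively.
    """
    points = list(product(range(trip_count), repeat=num_loops))
    for p in points:
        for q in points:
            # Carried at `depth`: same outer IVs, different IV at `depth`.
            if p[:depth] != q[:depth] or p[depth] == q[depth]:
                continue
            for mem_a, idx_a, write_a in accesses:
                for mem_b, idx_b, write_b in accesses:
                    # Only pairs on the same memref with a write can conflict.
                    if mem_a != mem_b or not (write_a or write_b):
                        continue
                    if idx_a(*p) == idx_b(*q):
                        return False
    return True


# Accesses of the loop body above, with (i0, i1, i2) as the loop IVs.
matmul_accesses = [
    ("%0", lambda i0, i1, i2: (i0, i2), False),  # load %0[%i0, %i2]
    ("%1", lambda i0, i1, i2: (i2, i1), False),  # load %1[%i2, %i1]
    ("%2", lambda i0, i1, i2: (i0, i1), False),  # load %2[%i0, %i1]
    ("%2", lambda i0, i1, i2: (i0, i1), True),   # store %7, %2[%i0, %i1]
]

flags = [is_loop_parallel(d, matmul_accesses, 3, 3) for d in range(3)]
print(flags)
```

The two outer loops come out parallel and the innermost one does not, because every `%i2` iteration reads and writes the same element of `%2`.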
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
*) Breaks fusion pass into multiple sub passes over nodes in data dependence graph:
- first pass fuses single-use producers into their unique consumer.
- second pass enables fusing for input-reuse by fusing sibling nodes which read from the same memref, but which do not share dependence edges.
- third pass fuses remaining producers into their consumers (Note that the sibling fusion pass may have transformed a producer with multiple uses into a single-use producer).
*) Fusion for input reuse is enabled by computing a sibling node slice using the load/load accesses to the same memref, and fusion safety is guaranteed by checking that the sibling node memref write region (to a different memref) is preserved.
*) Enables output vector and output matrix computations from KFAC patches-second-moment operation to fuse into a single loop nest and reuse input from the image patches operation.
*) Adds a generic loop utility for finding all sequential loops in a loop nest.
*) Adds and updates unit tests.
PiperOrigin-RevId: 236350987
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
FlatAffineConstraints
- add a method to merge and align the spaces (identifiers) of two
FlatAffineConstraints (both get dimension-wise and symbol-wise unique
columns)
- this completes several TODOs, gets rid of previous assumptions/restrictions
in composeMap, unionBoundingBox, and reuses common code
- remove previous workarounds / duplicated functionality in
FlatAffineConstraints::composeMap and unionBoundingBox, use mergeAlignIds
from both
PiperOrigin-RevId: 236320581
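
The idea of merging and aligning identifier spaces can be sketched outside MLIR as follows. This is a simplified Python illustration under stated assumptions: identifiers are plain strings, dimensions and symbols are not distinguished (the commit above keeps them unique separately), and each constraint row is a list of coefficients followed by a constant term. The function name `merge_and_align_ids` is hypothetical and is not the signature of mergeAlignIds.

```python
def merge_and_align_ids(ids_a, rows_a, ids_b, rows_b):
    """Give both systems the same identifier columns, in the same order.

    Each row holds one coefficient per identifier plus a trailing constant.
    """
    # Keep A's identifiers first, then append those that only B has.
    merged = list(ids_a) + [name for name in ids_b if name not in ids_a]

    def realign(ids, rows):
        out = []
        for row in rows:
            coeffs = dict(zip(ids, row[:-1]))
            # Identifiers a system did not have get a zero coefficient.
            out.append([coeffs.get(name, 0) for name in merged] + [row[-1]])
        return out

    return merged, realign(ids_a, rows_a), realign(ids_b, rows_b)


ids, a, b = merge_and_align_ids(
    ["i", "N"], [[1, -1, 0]],
    ["j", "N"], [[1, 0, -5], [-1, 1, 0]],
)
print(ids, a, b)
```

Once both systems share one column layout, operations such as a bounding-box union can compare them column by column without per-caller workarounds.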
|
|
|
|
|
|
| |
- NFC
PiperOrigin-RevId: 236169676
|
|
|
|
|
|
| |
values to ComputationSliceState which are used in FlatAffineConstraints::addSliceBounds, to ensure that constraints are only added for loop IV values which are present in the constraint system.
PiperOrigin-RevId: 235952912
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Analysis - NFC
- refactor AffineExprFlattener (-> SimpleAffineExprFlattener) so that it
doesn't depend on FlatAffineConstraints, and so that FlatAffineConstraints
could be moved out of IR/; the simplification that the IR needs for
AffineExpr's doesn't depend on FlatAffineConstraints
- have AffineExprFlattener derive from SimpleAffineExprFlattener to use for
all Analysis/Transforms purposes; override addLocalFloorDivId in the derived
class
- turn addAffineForOpDomain into a method on FlatAffineConstraints
- turn AffineForOp::getAsValueMap into an AffineValueMap ctor
PiperOrigin-RevId: 235283610
|
|
|
|
| |
PiperOrigin-RevId: 234829637
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
generation pass to make it drop certain assumptions, complete TODOs.
- multiple fixes for getMemoryFootprintBytes
- pass loopDepth correctly from getMemoryFootprintBytes()
- use union while computing memory footprints
- bug fixes for addAffineForOpDomain
- take into account loop step
- add domains of other loop IVs in turn that might have been used in the bounds
- dma-generate: drop assumption of "non-unit stride loops being tile space loops
and skipping those and recursing to inner depths"; DMA generation is now purely
based on available fast mem capacity and the memory footprints calculated
- handle memory region compute failures/bailouts correctly from dma-generate
- loop tiling cleanup/NFC
- update some debug and error messages to use emitNote/emitError in
pipeline-data-transfer pass - NFC
PiperOrigin-RevId: 234245969
|
|
|
|
|
|
|
|
|
|
| |
- determine symbols for the memref region correctly
- this wasn't exposed earlier since we didn't have any test cases where the
portion of the nest being DMAed for was non-hyperrectangular (i.e., bounds of
one IV depending on other IVs within that part)
PiperOrigin-RevId: 233493872
|
|
|
|
| |
PiperOrigin-RevId: 232944889
|
|
|
|
| |
PiperOrigin-RevId: 232807986
|
|
|
|
|
|
| |
the AffineOps dialect with 'affine'.
PiperOrigin-RevId: 232728977
|
|
|
|
|
|
| |
This is the second step to adding a namespace to the AffineOps dialect.
PiperOrigin-RevId: 232717775
|
|
|
|
| |
PiperOrigin-RevId: 232713514
|
|
|
|
|
|
|
|
|
| |
this feature during loop fusion cost computation, to compute what the write region of a fusion candidate loop nest slice would be (without having to materialize the slice or change the IR).
*) Adds parameter to public API of MemRefRegion::compute for passing in the slice loop bounds to compute the memref region of the loop nest slice.
*) Exposes public method MemRefRegion::getRegionSize for computing the size of the memref region in bytes.
PiperOrigin-RevId: 232706165
|
|
|
|
| |
PiperOrigin-RevId: 232704766
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
AffineOps. This is important for allowing the affine dialect to define canonicalizations directly on the operations instead of relying on transformation passes, e.g. ComposeAffineMaps. A summary of the refactoring:
* AffineStructures has moved to IR.
* simplifyAffineExpr/simplifyAffineMap/getFlattenedAffineExpr have moved to IR.
* makeComposedAffineApply/fullyComposeAffineMapAndOperands have moved to AffineOps.
* ComposeAffineMaps is replaced by AffineApplyOp::canonicalize and deleted.
PiperOrigin-RevId: 232586468
|
|
|
|
|
|
| |
isValidDim/isValidSymbol methods from Value to the AffineOps dialect.
PiperOrigin-RevId: 232386632
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- use getAccessMap() instead of repeating it
- fold getMemRefRegion into MemRefRegion ctor (more natural, avoid heap
allocation and unique_ptr where possible)
- change extractForInductionVars - MutableArrayRef -> ArrayRef for the
arguments. Since the method is just returning copies of 'Value *', the client
can't mutate the pointers themselves; it's fine to mutate the 'Value''s
themselves, but that doesn't mutate the pointers to those.
- change the way extractForInductionVars returns (see b/123437690)
PiperOrigin-RevId: 232359277
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
loops), (2) take into account fast memory space capacity and lower 'dmaDepth'
to fit, (3) add location information for debug info / errors
- change dma-generate pass to work on blocks of instructions (start/end
iterators) instead of 'for' loops; complete TODOs - allows DMA generation for
straightline blocks of operation instructions interspersed b/w loops
- take into account fast memory capacity: check whether memory footprint fits
in fastMemoryCapacity parameter, and recurse/lower the depth at which DMA
generation is performed until it does fit in the provided memory
- add location information to MemRefRegion; any insufficient fast memory
capacity errors or debug info w.r.t dma generation shows location information
- allow DMA generation pass to be instantiated with a fast memory capacity
option (besides command line flag)
- change getMemRefRegion to return unique_ptr's
- change getMemRefFootprintBytes to work on a 'Block' instead of 'ForInst'
- other helper methods; add postDomInstFilter option for
replaceAllMemRefUsesWith; drop forInst->walkOps, add Block::walkOps methods
E.g., output:
$ mlir-opt -dma-generate -dma-fast-mem-capacity=1 /tmp/single.mlir
/tmp/single.mlir:9:13: error: Total size of all DMA buffers' for this block exceeds fast memory capacity
for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {
^
$ mlir-opt -debug-only=dma-generate -dma-generate -dma-fast-mem-capacity=400 /tmp/single.mlir
/tmp/single.mlir:9:13: note: 8 KiB of DMA buffers in fast memory space for this block
for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {
PiperOrigin-RevId: 232297044
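
The capacity check described above (recurse to inner depths until the footprint fits) can be sketched as a simple search. This Python snippet is an illustration only: it assumes the footprint at each depth is already known and that footprints shrink with depth. The helper name `choose_dma_depth` and the byte values are hypothetical.

```python
def choose_dma_depth(footprint_bytes_by_depth, fast_mem_capacity_bytes):
    """Return the outermost depth whose footprint fits, or None if none does."""
    for depth, footprint in enumerate(footprint_bytes_by_depth):
        if footprint <= fast_mem_capacity_bytes:
            return depth
    return None


# Hypothetical footprints for depths 0, 1, 2 of some loop nest.
footprints = [4 * 1024 * 1024, 64 * 1024, 8 * 1024]
depth = choose_dma_depth(footprints, 32 * 1024)
print(depth)
```

A result of `None` corresponds to the case where the pass would report that the buffers exceed fast memory capacity.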
|
|
|
|
|
|
| |
references to OperationInst in the /include, /AffineOps, and lib/Analysis.
PiperOrigin-RevId: 232199262
|
|
|
|
|
|
| |
mechanical, i.e. changing usages of ForInst to OpPointer<AffineForOp>. An important difference is that upon construction an AffineForOp no longer automatically creates the body and induction variable. To generate the body/iv, 'createBody' can be called on an AffineForOp with no body.
PiperOrigin-RevId: 232060516
|
|
|
|
|
|
| |
Replace all instances of IfInst with AffineIfOp and delete IfInst.
PiperOrigin-RevId: 231342063
|
|
|
|
| |
PiperOrigin-RevId: 231327161
|
|
|
|
|
|
| |
instances of IfInst with AffineIfOp and delete IfInst.
PiperOrigin-RevId: 231318632
|
|
|
|
|
|
| |
instead of the ForInst itself. This is a necessary step in converting ForInst into an operation.
PiperOrigin-RevId: 231064139
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Addresses b/122486036
This CL addresses some leftover crumbs in AffineMap and IntegerSet by removing
the Null method and cleaning up the constructors.
As the ::Null uses were tracked down, opportunities appeared to untangle some
of the Parsing logic and make it explicit where AffineMap/IntegerSet have
ambiguous syntax. Previously, ambiguous cases were hidden behind the implicit
pointer values of AffineMap* and IntegerSet* that were passed as function
parameters. Depending on the values of those pointers one of 3 behaviors could
occur.
This parsing logic convolution is one of the rare cases where I would advocate
for code duplication. The more proper fix would be to make the syntax
unambiguous or to allow some lookahead.
PiperOrigin-RevId: 231058512
|
|
|
|
|
|
| |
block list for verbose printing.
PiperOrigin-RevId: 230951462
|
|
|
|
|
|
|
|
|
|
| |
- introduce a way to compute union using symbolic rectangular bounding boxes
- handle multiple load/store op's to the same memref by taking a union of the regions
- command-line argument to provide capacity of the fast memory space
- minor change to replaceAllMemRefUsesWith to not generate affine_apply if the
supplied index remap was identity
PiperOrigin-RevId: 230848185
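
A rectangular bounding-box union and the resulting footprint can be sketched as below. This is a Python illustration with constant bounds only, whereas the commit above works with symbolic bounds. The helpers `union_bounding_box` and `footprint_bytes` are hypothetical names, and bounds are taken as inclusive.

```python
from math import prod


def union_bounding_box(box_a, box_b):
    """Smallest box containing both; each box is a list of (lo, hi) pairs."""
    return [
        (min(lo_a, lo_b), max(hi_a, hi_b))
        for (lo_a, hi_a), (lo_b, hi_b) in zip(box_a, box_b)
    ]


def footprint_bytes(box, element_bytes):
    """Bytes needed to hold every element inside the box."""
    return element_bytes * prod(hi - lo + 1 for lo, hi in box)


# Two accesses to the same memref touching overlapping 32x32 tiles.
region_a = [(0, 31), (0, 31)]
region_b = [(16, 47), (0, 31)]
union = union_bounding_box(region_a, region_b)
size = footprint_bytes(union, 4)  # 4-byte elements, e.g. f32
print(union, size)
```

The union may over-approximate the set of touched elements, which is safe when sizing a buffer that has to hold both regions.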
|
|
|
|
|
|
| |
remapping successor block operands of terminator operations. We define a new BlockAndValueMapping class to simplify mapping between cloned values.
PiperOrigin-RevId: 230768759
|
|
|
|
| |
PiperOrigin-RevId: 230605756
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
information for -loop-fusion
- update fusion cost model to fuse while tolerating a certain amount of redundant
computation; add cl option -fusion-compute-tolerance
evaluate memory footprint and intermediate memory reduction
- emit debug info from -loop-fusion showing what was fused and why
- introduce function to compute memory footprint for a loop nest
- getMemRefRegion readability update - NFC
PiperOrigin-RevId: 230541857
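
One way to read "tolerating a certain amount of redundant computation" is as a bound on the extra compute that fusion introduces relative to the unfused nests. The Python sketch below assumes exactly that formulation. The formula, the function name `within_compute_tolerance`, and the cost numbers are assumptions for illustration, not taken from the commit.

```python
def within_compute_tolerance(src_cost, dst_cost, fused_cost, tolerance):
    """True if fusion adds at most `tolerance` (a fraction) of extra compute."""
    additional_fraction = fused_cost / (src_cost + dst_cost) - 1.0
    return additional_fraction <= tolerance


# Hypothetical op-instance counts for the source, destination, and fused nests.
accept = within_compute_tolerance(100, 100, 230, 0.30)  # 15% extra compute
reject = within_compute_tolerance(100, 100, 300, 0.30)  # 50% extra compute
print(accept, reject)
```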
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- the size of the private memref created for the slice should be based on
the memref region accessed at the depth at which the slice is being
materialized, i.e., symbolic in the outer IVs up until that depth, as opposed
to the region accessed based on the entire domain.
- leads to a significant contraction of the temporary / intermediate memref
whenever the memref isn't reduced to a single scalar (through store fwd'ing).
Other changes
- update to promoteIfSingleIteration - avoid introducing unnecessary identity
map affine_apply from IV; makes it much easier to write and read test cases
and pass output for all passes that use promoteIfSingleIteration; loop-fusion
test cases become much simpler
- fix replaceAllMemrefUsesWith bug that was exposed by the above update -
'domInstFilter' could be one of the ops erased due to a memref replacement in
it.
- fix getConstantBoundOnDimSize bug: a division by the coefficient of the identifier was
missing (the latter need not always be 1); add lbFloorDivisors output argument
- rename getBoundingConstantSizeAndShape -> getConstantBoundingSizeAndShape
PiperOrigin-RevId: 230405218
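
The getConstantBoundOnDimSize fix above is about dividing by the identifier's coefficient. A small Python sketch shows why that matters, assuming constraints of the form lower <= coeff * x <= upper with a positive coefficient. The function `constant_dim_size` is a hypothetical stand-in, not the MLIR method.

```python
def constant_dim_size(coeff, lower, upper):
    """Count the integers x with lower <= coeff * x <= upper, for coeff > 0."""
    if coeff <= 0:
        raise ValueError("this sketch only handles positive coefficients")
    lb = -(-lower // coeff)  # ceil(lower / coeff)
    ub = upper // coeff      # floor(upper / coeff)
    return max(0, ub - lb + 1)


# With 0 <= 2*x <= 15, x ranges over 0..7, so the size is 8.
# Skipping the division would give upper - lower + 1 = 16.
size = constant_dim_size(2, 0, 15)
print(size)
```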
|
|
|
|
|
|
|
|
| |
*) Adds support for fusing into consumer loop nests with multiple loads from the same memref.
*) Adds support for reducing slice loop trip count by projecting out destination loop IVs greater than destination loop depth.
*) Removes dependence on src loop depth and simplifies cost model computation.
PiperOrigin-RevId: 229575126
|
|
|
|
|
|
| |
- readability changes
PiperOrigin-RevId: 229443430
|
|
|
|
|
|
|
|
|
|
|
|
| |
destination loop nest insertion depth based on a simple cost model (cost model can be extended/replaced at a later time).
*) LoopFusion: Adds fusion cost function which compares the cost of the fused loop nest with the cost of the two unfused loop nests to determine if it is profitable to fuse the candidate loop nests. The fusion cost function is run for various combinations of src/dst loop depths, attempting to find the minimum cost setting for src/dst loop depths which does not increase the computational cost when the loop nests are fused. Combinations of src/dst loop depth are evaluated attempting to maximize loop depth (i.e. take a bigger computation slice from the source loop nest, and insert it deeper in the destination loop nest for better locality).
*) LoopFusion: Adds utility to compute op instance count for loop nests, sliced loop nests, and to compute the cost of a loop nest fused with another sliced loop nest.
*) LoopFusion: canonicalizes slice bound AffineMaps (and updates related tests).
*) Analysis::Utils: Splits getBackwardComputationSlice into two functions: one which calculates and returns the slice loop bounds for analysis by LoopFusion, and the other for insertion of the computation slice (once fusion has calculated the min-cost src/dst loop depths).
*) Test: Adds multiple unit tests to test the new functionality.
PiperOrigin-RevId: 229219757
|
|
|
|
|
|
|
|
| |
This CL is the 6th and last on the path to simplifying AffineMap composition.
This removes `AffineValueMap::forwardSubstitutions` and replaces it by simple
calls to `fullyComposeAffineMapAndOperands`.
PiperOrigin-RevId: 228962580
|
|
|
|
|
|
|
|
| |
clients. Let's re-add it in the future if there is ever a reason to. NFC.
Unrelatedly, add a use of a variable to unbreak the non-assert build.
PiperOrigin-RevId: 228284026
|
|
|
|
|
|
|
| |
- fix crash on test/Transforms/canonicalize.mlir with
-memref-bound-check
PiperOrigin-RevId: 228268486
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- refactor toAffineFromEq and the code surrounding it; refactor code into
FlatAffineConstraints::getSliceBounds
- add FlatAffineConstraints methods to detect identifiers as mod's and div's of other
identifiers
- add FlatAffineConstraints::getConstantLower/UpperBound
- Address b/122118218 (don't assert on invalid fusion depths cmdline flags -
instead, don't do anything; change cmdline flags
src-loop-depth -> fusion-src-loop-depth)
- AffineExpr/Map print method update: don't fail on null instances (since we have
a wrapper around a pointer, it's avoidable); rationale: dump/print methods should
never fail if possible.
- Update memref-dataflow-opt to add an optimization to avoid an unnecessary call to
IsRangeOneToOne when it's trivially going to be true.
- Add additional test cases to exercise the new support
- update a few existing test cases since the maps are now generated uniformly with
all destination loop operands appearing for the backward slice
- Fix projectOut - fix wrong range for getBestElimCandidate.
- Fix for getConstantBoundOnDimSize() - didn't show up in any test cases since
we didn't have any non-hyperrectangular ones.
PiperOrigin-RevId: 228265152
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- when SSAValue/MLValue existed, code at several places was forced to create additional
aggregate temporaries of SmallVector<SSAValue/MLValue> to handle the conversion; get
rid of such redundant code
- use filling ctors instead of explicit loops
- for smallvectors, change insert(list.end(), ...) -> append(...
- improve comments at various places
- turn getMemRefAccess into MemRefAccess ctor and drop duplicated
getMemRefAccess. In the next CL, provide getAccess() accessors for load,
store, DMA op's to return a MemRefAccess.
PiperOrigin-RevId: 228243638
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- this is CL 1/2 that does a clean up and gets rid of one limitation in an
underlying method - as a result, fusion works for more cases.
- fix bugs/incomplete impl. in toAffineMapFromEq
- fusing across rank changing reshapes for example now just works
For example, given a rank 1 memref to rank 2 memref reshape (64 -> 8 x 8) like this,
-loop-fusion -memref-dataflow-opt now completely fuses and inlines/store-forward
to get rid of the temporary:
INPUT
// Rank 1 -> Rank 2 reshape
for %i0 = 0 to 64 {
  %v = load %A[%i0]
  store %v, %B[%i0 floordiv 8, %i0 mod 8]
}
for %i1 = 0 to 8
  for %i2 = 0 to 8
    %w = load %B[%i1, %i2]
    "foo"(%w) : (f32) -> ()
OUTPUT
$ mlir-opt -loop-fusion -memref-dataflow-opt fuse_reshape.mlir
#map0 = (d0, d1) -> (d0 * 8 + d1)
mlfunc @fuse_reshape(%arg0: memref<64xf32>) {
  for %i0 = 0 to 8 {
    for %i1 = 0 to 8 {
      %0 = affine_apply #map0(%i0, %i1)
      %1 = load %arg0[%0] : memref<64xf32>
      "foo"(%1) : (f32) -> ()
    }
  }
}
AFAIK, there is no polyhedral tool / compiler that can perform such fusion -
because it's not really standard loop fusion, but possible through a
generalized slicing-based approach such as ours.
PiperOrigin-RevId: 227918338
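
The index arithmetic behind the fused output above can be checked directly. The Python sketch below is only a sanity check of the mapping, not of the fusion algorithm. It confirms that the producer iteration that stores to B[i1, i2] is i0 = i1 * 8 + i2, which is what #map0 computes. The function names are made up for illustration.

```python
def store_index(i0):
    """Where the producer loop stores element i0: B[i0 floordiv 8, i0 mod 8]."""
    return (i0 // 8, i0 % 8)


def fused_load_index(i1, i2):
    """Index into the rank-1 memref used by the fused nest: d0 * 8 + d1."""
    return i1 * 8 + i2


# Every consumer read B[i1, i2] maps back to exactly one producer iteration.
consistent = all(
    store_index(fused_load_index(i1, i2)) == (i1, i2)
    for i1 in range(8)
    for i2 in range(8)
)
covered = sorted(fused_load_index(i1, i2) for i1 in range(8) for i2 in range(8))
print(consistent, covered == list(range(64)))
```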
|