The check for whether the memref was used in a non-dereferencing context had to
be done inside, i.e., only for the op stmt's that the replacement was specified
to be performed on (by the domStmtFilter arg, if provided). As such, it is
completely fine, for example, for a function to return a memref while the
replacement is being performed only on a specific loop's body (as in the case
of DMA generation).
PiperOrigin-RevId: 223827753
The algorithm collects defining operations within a scoped hash table. The scopes within the hash table correspond to nodes within the dominance tree for a function. This CL only adds support for simple operations, i.e. non-side-effecting ones. Operations with side effects, e.g. load/store/call, will be handled in later patches.
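A minimal sketch of the scheme, using hypothetical stand-ins rather than MLIR's actual classes: one table scope is opened per dominance-tree node and undone on exit, so a lookup can only ever hit an equivalent computation whose result dominates the current operation.
```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a dominance-tree node; `opKeys` are hash keys
// (op name + operands + attributes) for the simple ops in the node's block.
struct DomNode {
  std::vector<std::string> opKeys;
  std::vector<DomNode *> children;
};

// One "scope" of definitions is opened per dominance-tree node and undone on
// exit, so a hit can only ever be a dominating, redundant computation.
void cse(DomNode *node, std::map<std::string, bool> &knownDefs,
         std::vector<std::string> &scopeStack) {
  size_t mark = scopeStack.size(); // begin this node's scope
  for (const std::string &key : node->opKeys) {
    if (knownDefs.count(key)) {
      // Redundant: replace this op's uses with the dominating definition.
    } else {
      knownDefs[key] = true;
      scopeStack.push_back(key);
    }
  }
  for (DomNode *child : node->children)
    cse(child, knownDefs, scopeStack); // the subtree sees our definitions
  while (scopeStack.size() > mark) {   // end scope: drop our definitions
    knownDefs.erase(scopeStack.back());
    scopeStack.pop_back();
  }
}
```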
PiperOrigin-RevId: 223811328
Fixes one of the FileCheck'ed tests, which was mistakenly disabled.
PiperOrigin-RevId: 223401978
Several things were suggested in post-submission reviews. In particular, use
pointers in function interfaces instead of references (still use references
internally). Clarify the behavior of the pass in the presence of MLFunctions.
PiperOrigin-RevId: 222556851
This CL adds an MLIR-MLIR pass which materializes super-vectors to
hardware-dependent sized vectors.
While the physical vector size is target-dependent, the pass is written in
a target-independent way: the target vector size is specified as a parameter
to the pass. This pass is thus a partial lowering that opens the "greybox"
that is the super-vector abstraction.
This first CL adds a materialization pass that iterates over vector_transfer_write operations and:
1. computes the program slice including the current vector_transfer_write;
2. computes the multi-dimensional ratio of super-vector shape to hardware
vector shape;
3. for each possible multi-dimensional value within the bounds of that ratio,
instantiates (i.e. clones and rewrites) a new slice so that all operations in
this instance operate on the hardware vector type (see the sketch below).
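A sketch of the enumeration in step 3, with hypothetical names (not the pass's actual API): visit every multi-dimensional offset within the bounds of the ratio, instantiating one rewritten slice per offset. In the example below the ratio is {4}, so four slice instances are emitted (at element offsets 0, 8, 16, 24).
```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Visits every multi-dimensional index within `bounds` (the shape ratio),
// i.e. one point per slice instance; e.g. bounds {4} visits {0},{1},{2},{3}.
void forEachOffset(const std::vector<int64_t> &bounds,
                   const std::function<void(const std::vector<int64_t> &)> &fn) {
  for (int64_t b : bounds)
    if (b <= 0) return; // empty iteration space
  std::vector<int64_t> idx(bounds.size(), 0);
  while (true) {
    fn(idx);
    int64_t d = static_cast<int64_t>(bounds.size()) - 1;
    while (d >= 0 && ++idx[d] == bounds[d]) // odometer-style increment
      idx[d--] = 0;
    if (d < 0) return; // wrapped around: all offsets visited
  }
}
```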
As a simple example, given:
```mlir
mlfunc @vector_add_2d(%M : index, %N : index) -> memref<?x?xf32> {
  %A = alloc (%M, %N) : memref<?x?xf32>
  %B = alloc (%M, %N) : memref<?x?xf32>
  %C = alloc (%M, %N) : memref<?x?xf32>
  for %i0 = 0 to %M {
    for %i1 = 0 to %N {
      %a1 = load %A[%i0, %i1] : memref<?x?xf32>
      %b1 = load %B[%i0, %i1] : memref<?x?xf32>
      %s1 = addf %a1, %b1 : f32
      store %s1, %C[%i0, %i1] : memref<?x?xf32>
    }
  }
  return %C : memref<?x?xf32>
}
```
and the following options:
```
-vectorize -virtual-vector-size 32 --test-fastest-varying=0 -materialize-vectors -vector-size=8
```
materialization emits:
```mlir
#map0 = (d0, d1) -> (d0, d1)
#map1 = (d0, d1) -> (d0, d1 + 8)
#map2 = (d0, d1) -> (d0, d1 + 16)
#map3 = (d0, d1) -> (d0, d1 + 24)
mlfunc @vector_add_2d(%arg0 : index, %arg1 : index) -> memref<?x?xf32> {
  %0 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %1 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %2 = alloc(%arg0, %arg1) : memref<?x?xf32>
  for %i0 = 0 to %arg0 {
    for %i1 = 0 to %arg1 step 32 {
      %3 = affine_apply #map0(%i0, %i1)
      %4 = "vector_transfer_read"(%0, %3#0, %3#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %5 = affine_apply #map1(%i0, %i1)
      %6 = "vector_transfer_read"(%0, %5#0, %5#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %7 = affine_apply #map2(%i0, %i1)
      %8 = "vector_transfer_read"(%0, %7#0, %7#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %9 = affine_apply #map3(%i0, %i1)
      %10 = "vector_transfer_read"(%0, %9#0, %9#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %11 = affine_apply #map0(%i0, %i1)
      %12 = "vector_transfer_read"(%1, %11#0, %11#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %13 = affine_apply #map1(%i0, %i1)
      %14 = "vector_transfer_read"(%1, %13#0, %13#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %15 = affine_apply #map2(%i0, %i1)
      %16 = "vector_transfer_read"(%1, %15#0, %15#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %17 = affine_apply #map3(%i0, %i1)
      %18 = "vector_transfer_read"(%1, %17#0, %17#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %19 = addf %4, %12 : vector<8xf32>
      %20 = addf %6, %14 : vector<8xf32>
      %21 = addf %8, %16 : vector<8xf32>
      %22 = addf %10, %18 : vector<8xf32>
      %23 = affine_apply #map0(%i0, %i1)
      "vector_transfer_write"(%19, %2, %23#0, %23#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> ()
      %24 = affine_apply #map1(%i0, %i1)
      "vector_transfer_write"(%20, %2, %24#0, %24#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> ()
      %25 = affine_apply #map2(%i0, %i1)
      "vector_transfer_write"(%21, %2, %25#0, %25#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> ()
      %26 = affine_apply #map3(%i0, %i1)
      "vector_transfer_write"(%22, %2, %26#0, %26#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> ()
    }
  }
  return %2 : memref<?x?xf32>
}
```
PiperOrigin-RevId: 222455351
This CL adds tooling for computing slices, split out as an independent CL.
The first consumer of this analysis will be super-vector materialization in a
followup CL.
In particular, this adds:
1. a getForwardStaticSlice function with documentation, example and a
standalone unit test;
2. a getBackwardStaticSlice function with documentation, example and a
standalone unit test;
3. a getStaticSlice function with documentation, example and a standalone unit
test;
4. a topologicalSort function that is exercised through the getStaticSlice
unit test.
The getXXXStaticSlice functions take an additional root (resp. terminators)
parameter which acts as a boundary that the transitive propagation algorithm
is not allowed to cross.
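A minimal sketch of the backward direction, on a toy def-use graph with hypothetical names (not the CL's actual signatures): propagate transitively through backward def-use edges, refusing to cross the supplied boundary. The forward variant walks use edges in the same way.
```cpp
#include <unordered_set>
#include <vector>

// Toy stand-in for an operation: edges point to the operations that define
// this operation's operands (backward def-use edges).
struct Node {
  std::vector<Node *> definingOps;
};

// Collects the transitive backward slice of `op`, never crossing `boundary`.
void getBackwardSliceImpl(const Node *op,
                          const std::unordered_set<const Node *> &boundary,
                          std::unordered_set<const Node *> &slice) {
  for (const Node *def : op->definingOps) {
    if (boundary.count(def) || !slice.insert(def).second)
      continue; // stop at the boundary; skip already-visited operations
    getBackwardSliceImpl(def, boundary, slice);
  }
}
```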
PiperOrigin-RevId: 222446208
cases.
- fix bug in calculating index expressions for DMA buffers in certain cases
(affected tiled loop nests); add more test cases for better coverage.
- introduce an additional optional argument to replaceAllMemRefUsesWith;
additional operands to the index remap AffineMap can now be supplied by the
client.
- FlatAffineConstraints::addBoundsForStmt - fix off-by-one upper bound,
::composeMap - fix position bug.
- Some cleanup and more comments
PiperOrigin-RevId: 222434628
This function pass replaces affine_apply operations in CFG functions with
sequences of primitive arithmetic instructions that form the affine map.
The actual replacement functionality is located in LoweringUtils as a
standalone function operating on an individual affine_apply operation and
inserting the result at the location of the original operation. It is expected
to be useful for other, target-specific lowering passes that may start at the
MLFunction level, which Deaffinator does not support.
PiperOrigin-RevId: 222406692
buffering in the auto DMA overlap pass.
This is done online in the pass.
PiperOrigin-RevId: 222313640
PiperOrigin-RevId: 222299120
This CL refactors a few things in Vectorize.cpp:
1. a clear distinction is made between:
a. the LoadOp, which are the roots of vectorization and must be vectorized
eagerly and propagate their values; and
b. the StoreOp, which are the terminals of vectorization and must be
vectorized late (i.e. they do not produce values that need to be
propagated).
2. the StoreOp must be vectorized late because in general it can store a value
that is not reachable from the subset of loads defined in the
current pattern. One trivial such case is storing a constant defined at the
top level of the MLFunction, which needs to be turned into a splat;
3. a description of the algorithm is given;
4. the implementation matches the algorithm;
5. the last example is made parametric; in practice it will fully rely on the
implementation of vector_transfer_read/write, which will handle boundary
conditions and padding. This will happen by lowering to a lower-level
abstraction, either:
a. directly in MLIR (whether DMA or just loops or any async tasks in the
future) (whiteboxing);
b. in LLO/LLVM-IR/whatever blackbox library call / search + swizzle inventor
one may want to use;
c. a partial mix of a. and b. (grey-boxing);
6. minor cleanups are applied;
7. mistakenly disabled unit tests are re-enabled (oopsie).
With this CL, this MLIR snippet:
```
mlfunc @vector_add_2d(%M : index, %N : index) -> memref<?x?xf32> {
  %A = alloc (%M, %N) : memref<?x?xf32>
  %B = alloc (%M, %N) : memref<?x?xf32>
  %C = alloc (%M, %N) : memref<?x?xf32>
  %f1 = constant 1.0 : f32
  %f2 = constant 2.0 : f32
  for %i0 = 0 to %M {
    for %i1 = 0 to %N {
      // non-scoped %f1
      store %f1, %A[%i0, %i1] : memref<?x?xf32>
    }
  }
  for %i4 = 0 to %M {
    for %i5 = 0 to %N {
      %a5 = load %A[%i4, %i5] : memref<?x?xf32>
      %b5 = load %B[%i4, %i5] : memref<?x?xf32>
      %s5 = addf %a5, %b5 : f32
      // non-scoped %f1
      %s6 = addf %s5, %f1 : f32
      store %s6, %C[%i4, %i5] : memref<?x?xf32>
    }
  }
  return %C : memref<?x?xf32>
}
```
vectorized with these arguments:
```
-vectorize -virtual-vector-size 256 --test-fastest-varying=0
```
vectorization produces this standard innermost-loop vectorized code:
```
mlfunc @vector_add_2d(%arg0 : index, %arg1 : index) -> memref<?x?xf32> {
  %0 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %1 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %2 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %cst = constant 1.000000e+00 : f32
  %cst_0 = constant 2.000000e+00 : f32
  for %i0 = 0 to %arg0 {
    for %i1 = 0 to %arg1 step 256 {
      %cst_1 = constant splat<vector<256xf32>, 1.000000e+00> : vector<256xf32>
      "vector_transfer_write"(%cst_1, %0, %i0, %i1) : (vector<256xf32>, memref<?x?xf32>, index, index) -> ()
    }
  }
  for %i2 = 0 to %arg0 {
    for %i3 = 0 to %arg1 step 256 {
      %3 = "vector_transfer_read"(%0, %i2, %i3) : (memref<?x?xf32>, index, index) -> vector<256xf32>
      %4 = "vector_transfer_read"(%1, %i2, %i3) : (memref<?x?xf32>, index, index) -> vector<256xf32>
      %5 = addf %3, %4 : vector<256xf32>
      %cst_2 = constant splat<vector<256xf32>, 1.000000e+00> : vector<256xf32>
      %6 = addf %5, %cst_2 : vector<256xf32>
      "vector_transfer_write"(%6, %2, %i2, %i3) : (vector<256xf32>, memref<?x?xf32>, index, index) -> ()
    }
  }
  return %2 : memref<?x?xf32>
}
```
Of course, much more intricate n-D imperfectly-nested patterns can be emitted too in a fully declarative fashion, but this is enough for now.
PiperOrigin-RevId: 222280209
In the general case, loop bounds can be expressed as affine maps of the outer
loop iterators and function arguments. Relax the check for loop bounds to be
known integer constants and also accept one-dimensional affine bounds in
ConvertToCFG ForStmt lowering. Emit affine_apply operations for both the upper
and the lower bound. The semantics of MLFunctions guarantees that both bounds
can be computed before the loop starts iterating. Constant bounds are merely a
shorthand notation for zero-dimensional affine maps and get supported
transparently.
Multidimensional affine bounds are not yet supported because the target IR
dialect lacks min/max operations necessary to implement the corresponding
semantics.
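A minimal sketch of the resulting control flow, with hypothetical helper names (the pass actually emits affine_apply operations and CFG branches): both bounds are evaluated once, before any iteration, which is valid because MLFunction semantics guarantee they are loop-invariant.
```cpp
#include <cstdint>
#include <functional>

// The callables stand for the emitted affine_apply results.
void lowerForStmt(const std::function<int64_t()> &applyLowerBoundMap,
                  const std::function<int64_t()> &applyUpperBoundMap,
                  int64_t step,
                  const std::function<void(int64_t)> &body) {
  int64_t lb = applyLowerBoundMap(); // affine_apply for the lower bound
  int64_t ub = applyUpperBoundMap(); // affine_apply for the upper bound
  for (int64_t iv = lb; iv < ub; iv += step) // exclusive upper bound
    body(iv);
}
```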
PiperOrigin-RevId: 222275801
This CL adds some vector support in anticipation of the upcoming vector
materialization pass. In particular this CL adds 2 functions to:
1. compute the multiplicity of a subvector shape in a supervector shape (see
the sketch below);
2. help match operations on strict super-vectors. This is defined, for a given
subvector shape, as an operation that manipulates a vector type that is an
integral multiple of the subtype, with multiplicity at least 2.
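A minimal sketch of the multiplicity computation, with hypothetical names (not the CL's actual signatures): divide trailing dimensions elementwise and keep the leading super dimensions, failing unless every trailing super dimension is an integral multiple of the matching sub dimension.
```cpp
#include <cstdint>
#include <optional>
#include <vector>

// shapeRatio({4, 8}, {8}) yields {4, 1}; shapeRatio({4, 9}, {8}) yields nullopt.
std::optional<std::vector<int64_t>>
shapeRatio(const std::vector<int64_t> &superShape,
           const std::vector<int64_t> &subShape) {
  if (subShape.size() > superShape.size())
    return std::nullopt;
  std::vector<int64_t> ratio(superShape); // leading super dims pass through
  size_t offset = superShape.size() - subShape.size();
  for (size_t i = 0; i < subShape.size(); ++i) {
    if (subShape[i] == 0 || superShape[offset + i] % subShape[i] != 0)
      return std::nullopt; // not an integral multiple
    ratio[offset + i] = superShape[offset + i] / subShape[i];
  }
  return ratio;
}
```
Under this reading, an operation on vector<4x8xf32> is a strict super-vector operation w.r.t. subvector shape {8}: the multiplicity {4, 1} has product 4, which is at least 2.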
This CL also adds a TestUtil pass where we can dump arbitrary tests of
functions and analyses that operate at a much smaller granularity than a pass
(e.g. an analysis for which it is convenient to write a bit of artificial MLIR
and a custom test). This is in order to keep using FileCheck for things that
essentially look and feel like C++ unit tests.
PiperOrigin-RevId: 222250910
and getMemRefRegion() to work with specified loop depths; add support for
outgoing DMAs, store op's.
- add support for getMemRefRegion symbolic in outer loops - hence support for
DMAs symbolic in outer surrounding loops.
- add DMA generation support for outgoing DMAs (store op's to lower memory
space); extend getMemoryRegion to store op's. -memref-bound-check now works
with store op's as well.
- fix dma-generate (references to the old memref in the dma_start op were also
being replaced with the new buffer); we need replaceAllMemRefUsesWith to work
only on a subset of the uses - so update replaceAllMemRefUsesWith to take an
optional 'operation' argument to serve as a filter; if provided, only those
uses that are dominated by the filter are replaced.
- Add missing print for attributes for dma_start, dma_wait op's.
- update the FlatAffineConstraints API
PiperOrigin-RevId: 221889223
PiperOrigin-RevId: 221863955
Array attributes can be nested, and function attributes can appear anywhere at
that level. They should be remapped to point to the generated CFGFunction
after ML-to-CFG conversion, similarly to plain function attributes. Extract the
nested attribute remapping functionality from the Parser into Utils. Extract
the remapping function for individual Functions out of the module remapping
function. Use these new functions in the ML-to-CFG conversion pass and in the
parser.
PiperOrigin-RevId: 221510997
This CL adds support for, and a vectorization test to perform, scalar 2-D addf.
The support extension notably comprises:
1. extend the vectorizable test to exclude vector_transfer operations and
expose them to LoopAnalysis where they are needed. This is a temporary
solution until a concrete MLIR Op exists;
2. add some more functional sugar: mapKeys, apply and ScopeGuard (which became
relevant again);
3. fix improper shifting during coarsening;
4. rename unaligned load/store to vector_transfer_read/write and simplify the
design, removing the unnecessary AllocOps that were introduced prematurely:
vector_transfer_read currently has the form:
(memref<?x?x?xf32>, index, index, index) -> vector<32x64x256xf32>
vector_transfer_write currently has the form:
(vector<32x64x256xf32>, memref<?x?x?xf32>, index, index, index) -> ()
5. adds vectorizeOperations which traverses the operations in a ForStmt and
rewrites them to their vector form;
6. add support for vector splat from a constant.
The relevant tests are also updated.
PiperOrigin-RevId: 221421426
Branch instruction arguments were defined and used inconsistently across
different instructions, in both the spec and the implementation. In
particular, conditional and unconditional branch instructions were using
different syntax in the implementation. This led to the IR we produce not
being accepted by the parser. Update the printer to use common syntax: `(`
list-of-SSA-uses `:` list-of-types `)`. The motivation for choosing this
syntax, as opposed to the one in the spec, `(` list-of-SSA-uses `)` `:`
list-of-types, is twofold. First, it is tricky to differentiate the label
of the false branch from the type while parsing conditional branches (which is
what apparently motivated the implementation to diverge from the spec in the
first place). Second, the ongoing convergence between terminator instructions
and other operations calls for consistency between their operand list syntaxes.
After this change, the only remaining difference between the two is the use of
parentheses. Update the comment of the parser that did not correspond to the
code. Remove the unused isParenthesized argument from parseSSAUseAndTypeList.
Update the spec accordingly. Note that the examples in the spec were _not_
using the EBNF defined a couple of lines above them, but were using the current
syntax. Add a supplementary example of a branch to a basic block with multiple
arguments.
PiperOrigin-RevId: 221162655
Implement a pass converting a subset of MLFunctions to CFGFunctions. Currently
supports arbitrarily complex imperfect loop nests with statically constant
(i.e., not affine map) bounds filled with operations. Does NOT support
branches and non-constant loop bounds.
Conversion is performed per-function and the function names are preserved to
avoid breaking any external references to the current module. In-memory IR is
updated to point to the right functions in direct calls and constant loads.
This behavior is tested via a really hidden flag that enables function
renaming.
Inside each function, the control flow conversion is based on single-entry
single-exit regions, i.e. subgraphs of the CFG that have exactly one incoming
and exactly one outgoing edge. Since an MLFunction must have a single "return"
statement as per the MLIR spec, it constitutes an SESE region. Individual
operations are appended to this region. Control flow statements are
recursively converted into such regions that are concatenated with the current
region. Bodies of compound statements also form SESE regions, which makes it
easy to nest control flow statements. Note that SESE regions are not
materialized in the code. It is sufficient to keep track of the end of the
region as the current instruction insertion point, as long as all recursive
calls update the insertion point in the end.
The converter maintains a mapping between SSA values in ML functions and their
CFG counterparts. The mapping is used to find the operands for each operation
and is updated to contain the results of each operation as the conversion
continues.
PiperOrigin-RevId: 221162602
accesses (distance/direction vectors).
Updates MemRefDependenceCheck to check and report on all memref access pairs at all loop nest depths.
Updates old and adds new memref dependence check tests.
Resolves multiple TODOs.
PiperOrigin-RevId: 220816515
- constant bounded memory regions, static shapes, no handling of
overlapping/duplicate regions (through union) for now; also, only load
op's are handled for now.
- add build methods for DmaStartOp, DmaWaitOp.
- move getMemoryRegion() into Analysis/Utils and expose it.
- fix addIndexSet, getMemoryRegion() post switch to exclusive upper bounds;
update test cases for memref-bound-check and memref-dependence-check for
exclusive bounds (missed in a previous CL)
PiperOrigin-RevId: 220729810
cl/220448963 had missed a part of the updates.
- while on this, clean up some of the test cases to use ops' custom forms.
PiperOrigin-RevId: 220675303
This CL implements exclusive upper bound behavior as per b/116854378; for
example, `for %i0 = 0 to 10` now iterates over 0..9.
A followup CL will update the semantics of the for loop.
PiperOrigin-RevId: 220448963
- simple perfectly nested band tiling with fixed tile sizes.
- only the hyper-rectangular case is handled, with other limitations of
getIndexSet applying (constant loop bounds, etc.); once
the latter utility is extended, tiled code generation should become more
general.
- Add FlatAffineConstraints::isHyperRectangular()
PiperOrigin-RevId: 220324933
Adds equality constraints to dependence constraint system for accesses using dims/symbols where the defining operation of the dim/symbol is a constant.
PiperOrigin-RevId: 219814740
variables from mod's and div's when converting to flat form.
- propagate mod, floordiv, ceildiv / local variables constraint information
when flattening affine expressions and converting them into flat affine
constraints; resolve multiple TODOs.
- enables memref bound checker to work with arbitrary affine expressions
- update FlatAffineConstraints API with several new methods
- test/exercise functionality mostly through -memref-bound-check
- other analyses such as dependence tests, etc. should now be able to work in the
presence of any affine composition of add, mul, floor, ceil, mod.
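For reference, the standard flattening rule (not quoted from the CL): each floordiv $q = \lfloor e / c \rfloor$ with $c > 0$ becomes a new local variable $q$ constrained by

$$0 \le e - c\,q \le c - 1,$$

after which $e \bmod c$ is simply $e - c\,q$, and $\lceil e / c \rceil$ reduces to $-\lfloor -e / c \rfloor$.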
PiperOrigin-RevId: 219711806
access the same element.
- Builds access functions and iterations domains for each access.
- Builds dependence polyhedron constraint system which has equality constraints for equated access functions and inequality constraints for iteration domain loop bounds.
- Runs elimination on the dependence polyhedron to test if no dependence exists between the accesses.
- Adds a trivial LoopFusion transformation pass with a simple test policy to test dependence between accesses to the same memref in adjacent loops.
- The LoopFusion pass will be extended in subsequent CLs.
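In symbols (a standard formulation, not quoted from the CL): for access functions $F_{src}, F_{dst}$ over iteration domains $D_{src}, D_{dst}$, the dependence polyhedron is

$$\{\, (\vec i, \vec j) : \vec i \in D_{src},\ \vec j \in D_{dst},\ F_{src}(\vec i) = F_{dst}(\vec j) \,\},$$

and the two accesses are independent iff elimination shows this system has no integer solution.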
PiperOrigin-RevId: 219630898
This CL adds support for vectorization using more interesting 2-D and 3-D
patterns. Note in particular the fact that we match some pretty complex
imperfectly nested 2-D patterns with a quite minimal change to the
implementation: we just add a bit of recursion to traverse the matched
patterns and actually vectorize the loops.
For instance, vectorizing the following loop by 128:
```
for %i3 = 0 to %0 {
  %7 = affine_apply (d0) -> (d0)(%i3)
  %8 = load %arg0[%c0_0, %7] : memref<?x?xf32>
}
```
Currently generates:
```
#map0 = ()[s0] -> (s0 + 127)
#map1 = (d0) -> (d0)
for %i3 = 0 to #map0()[%0] step 128 {
  %9 = affine_apply #map1(%i3)
  %10 = alloc() : memref<1xvector<128xf32>>
  %11 = "n_d_unaligned_load"(%arg0, %c0_0, %9, %10, %c0) :
    (memref<?x?xf32>, index, index, memref<1xvector<128xf32>>, index) ->
    (memref<?x?xf32>, index, index, memref<1xvector<128xf32>>, index)
  %12 = load %10[%c0] : memref<1xvector<128xf32>>
}
```
The above is subject to evolution.
PiperOrigin-RevId: 219629745
Introduce an analysis to check memref accesses (in MLFunctions) for
out-of-bound ones. It works as follows:
$ mlir-opt -memref-bound-check test/Transforms/memref-bound-check.mlir
/tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension #1
%x = load %A[%idx#0, %idx#1] : memref<9 x 9 x i32>
     ^
/tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension #1
%x = load %A[%idx#0, %idx#1] : memref<9 x 9 x i32>
     ^
/tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension #2
%x = load %A[%idx#0, %idx#1] : memref<9 x 9 x i32>
     ^
/tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension #2
%x = load %A[%idx#0, %idx#1] : memref<9 x 9 x i32>
     ^
/tmp/single.mlir:12:12: error: 'load' op memref out of upper bound access along dimension #1
%y = load %B[%idy] : memref<128 x i32>
     ^
/tmp/single.mlir:12:12: error: 'load' op memref out of lower bound access along dimension #1
%y = load %B[%idy] : memref<128 x i32>
     ^
#map0 = (d0, d1) -> (d0, d1)
#map1 = (d0, d1) -> (d0 * 128 - d1)
mlfunc @test() {
  %0 = alloc() : memref<9x9xi32>
  %1 = alloc() : memref<128xi32>
  for %i0 = -1 to 9 {
    for %i1 = -1 to 9 {
      %2 = affine_apply #map0(%i0, %i1)
      %3 = load %0[%2#0, %2#1] : memref<9x9xi32>
      %4 = affine_apply #map1(%i0, %i1)
      %5 = load %1[%4] : memref<128xi32>
    }
  }
  return
}
- Improves productivity while manually / semi-automatically developing MLIR for
testing / prototyping; also provides an indirect way to catch errors in
transformations.
- This pass is an easy way to test the underlying affine analysis
machinery including low level routines.
Some code (in getMemoryRegion()) borrowed from @andydavis cl/218263256.
While on this:
- create mlir/Analysis/Passes.h; move Pass.h up from mlir/Transforms/ to mlir/
- fix a bug in AffineAnalysis.cpp::toAffineExpr
TODO: extend to non-constant loop bounds (straightforward). Will transparently
work for all accesses once floordiv, mod, ceildiv are supported in the
AffineMap -> FlatAffineConstraints conversion.
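One way to formalize the check behind these diagnostics (a standard statement consistent with the errors above, not quoted from the CL): for a memref dimension of size $s_d$ accessed through affine function $a_d$ over iteration domain $D$, the access is in bounds along dimension $d$ iff both

$$D \wedge \{ a_d(\vec i) \ge s_d \} \quad \text{and} \quad D \wedge \{ a_d(\vec i) \le -1 \}$$

are empty; a non-empty first system yields the "upper bound" error, a non-empty second one the "lower bound" error.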
PiperOrigin-RevId: 219397961
This CL is a first in a series that implements early vectorization of
increasingly complex patterns. In particular, early vectorization will support
arbitrary loop nesting patterns (both perfectly and imperfectly nested), at
arbitrary depths in the loop tree.
This first CL builds the minimal support for applying 1-D patterns.
It relies on an unaligned load/store op abstraction that can be implemented
differently on different HW.
Future CLs will support higher dimensional patterns, but 1-D patterns already
exhibit interesting properties.
In particular, we want to separate pattern matching (i.e. legality, both
structural and dependency analysis based), from profitability analysis, from
application of the transformation.
As a consequence, patterns may intersect and we need to verify that a pattern
can still apply by the time we get to applying it.
A non-greedy analysis of profitability that takes pattern intersection into
account is left for future work.
Additionally the CL makes the following cleanups:
1. the matches method now returns a value, not a reference;
2. added comments about the MLFunctionMatcher and MLFunctionMatches usage by
value;
3. added size and empty methods to matches;
4. added a negative vectorization test with a conditional; this exhibited a
bug in the iterators. Iterators now return nullptr if the underlying storage
is null.
PiperOrigin-RevId: 219299489
- Added a mechanism for specifying pattern matching more concisely like LLVM.
- Added support for canonicalization of addi/muli over vector/tensor splat
- Added ValueType to Attribute class hierarchy
- Allowed creating constant splat
PiperOrigin-RevId: 219149621
PiperOrigin-RevId: 219148982
PiperOrigin-RevId: 218885877
As per the MLIR spec, the absence of affine maps in a MemRef type is interpreted
as an implicit identity affine map. Therefore, MemRef types declared with an
explicit or an implicit identity map should be considered equal at the
MemRefType level. During MemRefType construction, drop trivial identity affine
map compositions. A trivial identity composition consists of a single unbounded
identity map, e.g. `(d0, d1) -> (d0, d1)`. It is unclear whether affine maps
should be composed in place into a single map during MemRef type construction,
so non-trivial compositions that could have been simplified to an identity are
NOT removed. We chose to drop the trivial identity map rather than inject it in
places that assume its implicit presence because it makes the code simpler by
reducing boilerplate; identity mappings are obvious defaults.
Update tests that were checking for the presence of trivial identity map
compositions in the outputs.
PiperOrigin-RevId: 218862454
1) We incorrectly reassociated non-reassociative operations like subi, causing
miscompilations.
2) When constant folding, we didn't add users of the new constant back to the
worklist for reprocessing, causing us to miss some cases (pointed out by
Uday).
The code for #2 is gross, but I'll add the new APIs in a followup patch.
PiperOrigin-RevId: 218803984
PiperOrigin-RevId: 218786684
no integer solutions.
PiperOrigin-RevId: 218772332
- Introduce Fourier-Motzkin variable elimination to eliminate a dimension from
a system of linear equalities/inequalities. Update isEmpty to use this.
Since FM is only exact on rational/real spaces, an emptiness check based on
it is guaranteed to be exact whenever it says the underlying set is empty;
if it says the set is not empty, there may still be no integer points in it.
Also supports a version that computes "dark shadows".
- Test this by checking for "always false" conditionals in if statements.
- Unique IntegerSet's that are small (few constraints, few variables). This
basically means the canonical empty set and other small sets that are
likely commonly used get uniqued; this allows checking for the canonical empty
set by pointer. IntegerSet::kUniquingThreshold gives the threshold constraint
size for uniquing.
- rename simplify-affine-expr -> simplify-affine-structures
Other cleanup:
- IntegerSet::numConstraints, AffineMap::numResults are no longer needed;
remove them.
- add copy assignment operators for AffineMap, IntegerSet.
- rename Invalid() -> Null() on AffineExpr, AffineMap, IntegerSet
- Misc cleanup for the FlatAffineConstraints API
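For reference, the elimination step (textbook form, not quoted from the CL): to eliminate $x$, combine every lower bound with every upper bound,

$$\ell_i(\vec y) \le c_i\,x \ \text{ and } \ d_j\,x \le u_j(\vec y), \quad c_i, d_j > 0 \;\Longrightarrow\; d_j\,\ell_i(\vec y) \le c_i\,u_j(\vec y),$$

which is exact over the rationals. The dark-shadow variant instead requires $d_j\,\ell_i(\vec y) + (c_i - 1)(d_j - 1) \le c_i\,u_j(\vec y)$, which is sufficient for an integer point to exist.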
PiperOrigin-RevId: 218690456
- Adds a FlatAffineConstraints::isEmpty method to test if there are no solutions to the system.
- Adds a GCD test to check whether an equality constraint has no solution.
- Adds unit test cases.
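The underlying fact (standard number theory): the equality $a_1 x_1 + \cdots + a_n x_n = c$ has an integer solution iff

$$\gcd(a_1, \dots, a_n) \mid c,$$

so any equality whose coefficient gcd does not divide its constant term proves the whole system empty.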
PiperOrigin-RevId: 218546319
is a straightforward change, but required adding missing moveBefore() methods
on operations (requiring moving some traits around to make C++ happy). This
also fixes a constness issue with the getBlock/getFunction() methods on
Instruction, and adds a missing getFunction() method on MLFuncBuilder.
PiperOrigin-RevId: 218523905
- Add a few canonicalization patterns to fold memref_cast into
load/store/dealloc.
- Canonicalize alloc(constant) into an alloc with a constant shape followed by
a cast.
- Add a new PatternRewriter::updatedRootInPlace API to make this more convenient.
SimplifyAllocConst and the testcase are heavily based on Uday's implementation
work, just in a different framework.
PiperOrigin-RevId: 218361237
few TODOs,
and add some casting support to Operation.
PiperOrigin-RevId: 218219340
the pattern matcher / canonicalizer, and rename existing eraseFromBlock methods
to align with it.
PiperOrigin-RevId: 218104455
resolve multiple TODOs.
- replace the fake test pass (that worked on just the first loop in the
MLFunction) to perform DMA pipelining on all suitable loops.
- nested DMAs work now (DMAs in an outer loop, more DMAs in nested inner loops)
- fix bugs / assumptions: correctly copy memory space and elemental type of
source memref for double buffering.
- correctly identify matching start/finish statements, handle multiple DMAs per
loop.
- introduce dominates/properlyDominates utilities for MLFunction statements.
- move checkDominancePreservationOnShifts to LoopAnalysis.h; rename it
getShiftValidity
- refactor getContainingStmtPos -> findAncestorStmtInBlock - move into
Analysis/Utils.h; it has two users.
- other improvements / cleanup for related APIs/utilities
- add size argument to dma_wait - for nested DMAs or in general, it makes it
easy to obtain the size to use when lowering the dma_wait, since we wouldn't
want to have to identify the matching dma_start; more importantly, in general /
in the future, there may not always be a dma_start dominating the dma_wait.
- add debug information in the pass
PiperOrigin-RevId: 217734892
This CL implements a very simple loop vectorization **test** and the basic
infrastructure to support it.
The test simply consists of:
1. matching the loops in the MLFunction and all the Load/Store operations
nested under the loop;
2. testing whether all the Load/Store accesses are contiguous along the
innermost memory dimension along that particular loop. If any reference is
non-contiguous (i.e. the ForStmt SSAValue appears in the expression), then
the loop is not vectorizable.
The simple test above can gradually be extended with more interesting
behaviors to account for the fact that a layout permutation may exist that
enables contiguity, etc. All these will come in due time, but it is worthwhile
noting that the test already supports detection of outer-vectorizable loops.
In implementing this test, I also added a recursive MLFunctionMatcher and some
sugar that can capture patterns
such as `auto gemmLike = Doall(Doall(Red(LoadStore())))` and allows iterating
on the matched IR structures. For now it just uses in-order traversal, but
post-order DFS will be useful in the future once IR rewrites start occurring.
One may note that the memory management design decision follows a different
pattern from MLIR. After evaluating different designs and how they quickly
increase cognitive overhead, I decided to opt for the simplest solution in my
view: a class-wide (threadsafe) RAII context.
This way, a pass that needs MLFunctionMatcher can just have its own locally
scoped BumpPtrAllocator and everything is cleaned up when the pass is destroyed.
If passes are expected to have a longer lifetime, then the contexts can easily
be scoped inside the runOnMLFunction call and storage lifetime reduced.
Lastly, whatever the scope of threading (module, function, pass), this is
expected to also be future-proof wrt concurrency (but this is a detail atm).
PiperOrigin-RevId: 217622889
pass, build up the worklist infra in anticipation of improving the pattern
matcher to match more than one node.
PiperOrigin-RevId: 217330579
of its users which are also AffineApplyOps.
Updates ComposeAffineMaps test pass to use this method.
Updates affine map composition test cases to handle the new pass, which can be reused when this method is used in a future instruction combine pass.
PiperOrigin-RevId: 217163351
- add util to create a private / exclusive / single use affine
computation slice for an op stmt (see method doc comment); a single
multi-result affine_apply op is prepended to the op stmt to provide all
results needed for its operands as a function of loop iterators and symbols.
- use it for DMA pipelining (to create private slices for DMA start stmt's);
resolve TODOs/feature request (b/117159533)
- move createComposedAffineApplyOp to Transforms/Utils; free it from taking a
memref as input / generalize it.
PiperOrigin-RevId: 216926818
out canonicalization pass to drive it, and a simple (x-x) === 0 pattern match
as a test case.
There is a tremendous number of improvements that need to land, and the
matcher/rewriter and patterns will be split out of this file, but this is a
starting point.
PiperOrigin-RevId: 216788604
Add target independent standard DMA ops: dma.start, dma.wait. Update pipeline
data transfer to use these to detect DMA ops.
While on this
- return failure from mlir-opt::performActions if a pass generates invalid output
- improve error message for verify 'n' operand traits
PiperOrigin-RevId: 216429885