| author | Nicolas Vasilache <ntv@google.com> | 2018-12-06 11:37:25 -0800 |
|---|---|---|
| committer | jpienaar <jpienaar@google.com> | 2019-03-29 14:20:07 -0700 |
| commit | df0a25efeea1f63ea717f4fc80aabac88d94d189 (patch) | |
| tree | 4d84247c23eb3a82fa9b314426111cb4ec487ee3 /mlir/lib/Transforms/MaterializeVectors.cpp | |
| parent | 7c89a225cfafef3dcf5de4991846de2d0e9dda06 (diff) | |
[MLIR] Add support for permutation_map
This CL hooks up and uses permutation_map in vector_transfer ops.
In particular, when going into the nuts and bolts of the implementation, it
became clear that cases arose that required supporting broadcast semantics.
Broadcast semantics are thus added to the general permutation_map.
The verify methods and tests are updated accordingly.
Two examples of interest follow.
Example 1:
The following MLIR snippet:
```mlir
for %i3 = 0 to %M {
  for %i4 = 0 to %N {
    for %i5 = 0 to %P {
      %a5 = load %A[%i4, %i5, %i3] : memref<?x?x?xf32>
}}}
```
may vectorize with {permutation_map: (d0, d1, d2) -> (d2, d1)} into:
```mlir
for %i3 = 0 to %0 step 32 {
  for %i4 = 0 to %1 {
    for %i5 = 0 to %2 step 256 {
      %4 = vector_transfer_read %arg0, %i4, %i5, %i3
        {permutation_map: (d0, d1, d2) -> (d2, d1)} :
        (memref<?x?x?xf32>, index, index) -> vector<32x256xf32>
}}}
```
This means vector_transfer_read will be responsible for reading the 2-D slice
`%arg0[%i4, %i5:%i5+256, %i3:%i3+32]` into a vector<32x256xf32>, which will
require a transposition when vector_transfer_read is further lowered.
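For intuition, here is a minimal C++ sketch (not MLIR code; `transferRead2D`
and all parameters are hypothetical names) of the addressing a lowered read
with this permutation_map could perform: vector dim 0 walks memref dim 2,
vector dim 1 walks memref dim 1, and memref dim 0 stays pinned at its base
index because it is not a result of the map.
```cpp
#include <array>
#include <cstddef>

// Reads the 2-D slice %arg0[%i4, %i5:%i5+256, %i3:%i3+32] into a 32x256
// vector. The transposition arises because vector dim 0 is strided by the
// memref's dim-2 stride rather than following the innermost memref order.
template <typename T>
void transferRead2D(const T *memref, std::array<std::size_t, 3> strides,
                    std::array<std::size_t, 3> base, T (&vec)[32][256]) {
  for (std::size_t v0 = 0; v0 < 32; ++v0)      // vector dim 0 <- memref d2
    for (std::size_t v1 = 0; v1 < 256; ++v1) { // vector dim 1 <- memref d1
      std::size_t i0 = base[0];      // d0: not a map result, pinned at base
      std::size_t i1 = base[1] + v1; // d1: advances with vector dim 1
      std::size_t i2 = base[2] + v0; // d2: advances with vector dim 0
      vec[v0][v1] =
          memref[i0 * strides[0] + i1 * strides[1] + i2 * strides[2]];
    }
}
```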
Example 2:
The following MLIR snippet:
```mlir
%cst0 = constant 0 : index
for %i0 = 0 to %M {
  %a0 = load %A[%cst0, %cst0] : memref<?x?xf32>
}
```
may vectorize with {permutation_map: (d0, d1) -> (0)} into:
```mlir
for %i0 = 0 to %0 step 128 {
  %3 = vector_transfer_read %arg0, %c0_0, %c0_0
    {permutation_map: (d0, d1) -> (0)} :
    (memref<?x?xf32>, index, index) -> vector<128xf32>
}
```
This means vector_transfer_read will be responsible for reading the 0-D slice
`%arg0[%c0, %c0]` into a vector<128xf32>, which will require a 1-D vector
broadcast when vector_transfer_read is further lowered.
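Again as a hedged illustration (plain C++, hypothetical names, not the actual
lowering): a result expression that is the constant 0 depends on no memref
dimension, so every vector lane reads the same element and the lowered form
degenerates to a scalar load plus a splat.
```cpp
#include <array>
#include <cstddef>

// With permutation_map (d0, d1) -> (0), no memref dim feeds the vector dim:
// load one scalar at the base indices and broadcast it across all 128 lanes.
template <typename T>
void transferReadBroadcast(const T *memref, std::array<std::size_t, 2> strides,
                           std::array<std::size_t, 2> base, T (&vec)[128]) {
  T scalar = memref[base[0] * strides[0] + base[1] * strides[1]];
  for (std::size_t v = 0; v < 128; ++v)
    vec[v] = scalar; // 1-D broadcast (splat)
}
```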
Additionally, some minor cleanups and refactorings are performed.
One notable thing missing here is the composition with a projection map during
materialization. This is because I could not find an AffineMap composition
that operates on AffineMap directly: everything related to composition seems
to require going through SSAValue and only operates on AffineMap at a distance
via AffineValueMap. I have raised this concern a bunch of times already; the
follow-up CL will actually do something about it.
In the meantime, the projection is minimally hacked in to pass verification,
and the materialization tests are temporarily incorrect.
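To make the missing piece concrete, here is a hedged sketch of what composing
a projection with a permutation map amounts to when both are modeled as plain
lists of selected input dimensions. This helper does not exist in the codebase
at this commit (that is exactly the gap described above); the representation
is simplified and all names are hypothetical.
```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// A map such as (d0, d1, d2) -> (d2, d1) is modeled as the list of input
// dims its results select: {2, 1}. Broadcast results like `0` are omitted
// for simplicity.
using SimplePermutationMap = std::vector<unsigned>;

// compose(outer, inner): result i of the composition selects the input dim
// that inner's result outer[i] selects, i.e. the map outer(inner(d)).
SimplePermutationMap compose(const SimplePermutationMap &outer,
                             const SimplePermutationMap &inner) {
  SimplePermutationMap composed;
  composed.reserve(outer.size());
  for (unsigned dim : outer) {
    assert(dim < inner.size() && "outer result selects a missing inner result");
    composed.push_back(inner[dim]);
  }
  return composed;
}

int main() {
  SimplePermutationMap perm = {2, 1}; // (d0, d1, d2) -> (d2, d1)
  SimplePermutationMap proj = {1};    // keep only super-vector dim 1
  for (unsigned d : compose(proj, perm))
    std::printf("d%u\n", d); // prints d1: composed map is (d0, d1, d2) -> (d1)
  return 0;
}
```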
PiperOrigin-RevId: 224376828
Diffstat (limited to 'mlir/lib/Transforms/MaterializeVectors.cpp')
| mode | file | lines |
|---|---|---|
| -rw-r--r-- | mlir/lib/Transforms/MaterializeVectors.cpp | 124 |

1 file changed, 121 insertions(+), 3 deletions(-)
````diff
diff --git a/mlir/lib/Transforms/MaterializeVectors.cpp b/mlir/lib/Transforms/MaterializeVectors.cpp
index 27f157c9234..0d7d0db2b20 100644
--- a/mlir/lib/Transforms/MaterializeVectors.cpp
+++ b/mlir/lib/Transforms/MaterializeVectors.cpp
@@ -82,6 +82,73 @@
 /// operations and builds the slice scoped at the innermost loop enclosing the
 /// current vector_transfer_write. These assumptions and the implementation
 /// details are subject to revision in the future.
+///
+/// Example
+/// ========
+/// In the following, the single vector_transfer_write op operates on a
+/// vector<4x4x4xf32>. Let's assume the HW supports vector<4x4xf32>.
+/// Materialization is achieved by instantiating each occurrence of the leading
+/// dimension of vector<4x4x4xf32> into a vector<4x4xf32>.
+/// The program transformation that implements this instantiation is a
+/// multi-loop unroll-and-jam (it can be partial or full depending on the ratio
+/// of super-vector shape to HW-vector shape).
+///
+/// As a simple case, the following:
+/// ```mlir
+/// mlfunc @materialize(%M : index, %N : index, %O : index, %P : index) {
+///   %A = alloc (%M, %N, %O, %P) : memref<?x?x?x?xf32, 0>
+///   %f1 = constant splat<vector<4x4x4xf32>, 1.000000e+00> :
+///     vector<4x4x4xf32>
+///   for %i0 = 0 to %M step 4 {
+///     for %i1 = 0 to %N step 4 {
+///       for %i2 = 0 to %O {
+///         for %i3 = 0 to %P step 4 {
+///           vector_transfer_write %f1, %A, %i0, %i1, %i2, %i3
+///             {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d0)} :
+///             vector<4x4x4xf32>, memref<?x?x?x?xf32, 0>,
+///             index, index, index, index
+///   }}}}
+///   return
+/// }
+/// ```
+///
+/// is instantiated by unroll-and-jam (just unroll in this case) into:
+///
+/// ```mlir
+/// mlfunc @materialize(%M : index, %N : index, %O : index, %P : index) {
+///   %A = alloc (%M, %N, %O, %P) : memref<?x?x?x?xf32, 0>
+///   %f1 = constant splat<vector<4x4xf32>, 1.000000e+00> : vector<4x4x4xf32>
+///   for %i0 = 0 to %arg0 step 4 {
+///     for %i1 = 0 to %arg1 step 4 {
+///       for %i2 = 0 to %arg2 {
+///         for %i3 = 0 to %arg3 step 4 {
+///           %1 = affine_apply (d0, d1, d2, d3) -> (d0, d1, d2, d3)
+///             (%i0, %i1, %i2, %i3)
+///           vector_transfer_write f1, %0, %1#0, %1#1, %1#2, %1#3
+///             {permutation_map: (d0, d1, d2, d3) -> (d1, d0)} :
+///             vector<4x4xf32>, memref<?x?x?x?xf32>,
+///             index, index, index, index
+///           %2 = affine_apply (d0, d1, d2, d3) -> (d0, d1, d2, d3 + 1)
+///             (%i0, %i1, %i2, %i3)
+///           vector_transfer_write {{.*}}, %0, %2#0, %2#1, %2#2, %2#3
+///             {permutation_map: (d0, d1, d2, d3) -> (d1, d0)} :
+///             vector<4x4xf32>, memref<?x?x?x?xf32>,
+///             index, index, index, index
+///           %3 = affine_apply (d0, d1, d2, d3) -> (d0, d1, d2, d3 + 2)
+///             (%i0, %i1, %i2, %i3)
+///           vector_transfer_write {{.*}}, %0, %3#0, %3#1, %3#2, %3#3
+///             {permutation_map: (d0, d1, d2, d3) -> (d1, d0)} :
+///             vector<4x4xf32>, memref<?x?x?x?xf32>,
+///             index, index, index, index
+///           %4 = affine_apply (d0, d1, d2, d3) -> (d0, d1, d2, d3 + 3)
+///             (%i0, %i1, %i2, %i3)
+///           vector_transfer_write {{.*}}, %0, %4#0, %4#1, %4#2, %4#3
+///             {permutation_map: (d0, d1, d2, d3) -> (d1, d0)} :
+///             vector<4x4xf32>, memref<?x?x?x?xf32>,
+///             index, index, index, index
+///   }}}}
+///   return
+/// }
+/// ```
 
 using llvm::dbgs;
 using llvm::DenseSet;
@@ -333,6 +400,58 @@ instantiate(MLFuncBuilder *b, OperationStmt *opStmt, VectorType superVectorType,
       materializeAttributes(opStmt, superVectorType, hwVectorType));
 }
 
+/// Computes the permutationMap required for a VectorTransferOp from the memref
+/// to the `hwVectorType`.
+/// This is achieved by returning the projection of the permutationMap along
+/// the dimensions of the super-vector type that remain in the hwVectorType.
+/// In particular, if a dimension is fully instantiated (i.e. unrolled) then it
+/// is projected out in the final result.
+template <typename VectorTransferOpTy>
+static AffineMap projectedPermutationMap(VectorTransferOpTy *transfer,
+                                         VectorType hwVectorType) {
+  static_assert(
+      std::is_same<VectorTransferOpTy, VectorTransferReadOp>::value ||
+          std::is_same<VectorTransferOpTy, VectorTransferWriteOp>::value,
+      "Must be called on a VectorTransferOp");
+  auto superVectorType = transfer->getVectorType();
+  auto optionalRatio = shapeRatio(superVectorType, hwVectorType);
+  assert(optionalRatio &&
+         (optionalRatio->size() == superVectorType.getShape().size()) &&
+         "Shape and ratio not of the same size");
+  unsigned dim = 0;
+  SmallVector<AffineExpr, 4> keep;
+  MLIRContext *context = transfer->getOperation()->getContext();
+  functional::zipApply(
+      [&dim, &keep, context](int shape, int ratio) {
+        assert(shape >= ratio && "shape dim must be greater than ratio dim");
+        if (shape != ratio) {
+          // HW vector is not fully instantiated along this dim, keep it.
+          keep.push_back(getAffineDimExpr(dim, context));
+        }
+        ++dim;
+      },
+      superVectorType.getShape(), *optionalRatio);
+  auto projectionMap = AffineMap::get(optionalRatio->size(), 0, keep, {});
+  (void)projectionMap;
+  // No seemingly simple way to compose 2 AffineMaps except going through SSA
+  // values... Punting for now; this will be resolved in the next CL.
+  //
+  // return projectionMap.compose(transfer->getPermutationMap());
+
+  // Still, we may need to drop a few dims to pass verification, so hack this
+  // in for now.
+  auto map = transfer->getPermutationMap();
+  auto exprs = map.getResults();
+  assert(exprs.size() >= keep.size());
+  unsigned diff = exprs.size() - keep.size();
+  SmallVector<AffineExpr, 4> projectedExprs(exprs.begin() + diff, exprs.end());
+  auto res = AffineMap::get(map.getNumInputs(), 0, projectedExprs, {});
+  LLVM_DEBUG(projectionMap.print(dbgs() << "\nProjectionMap: "));
+  LLVM_DEBUG(map.print(dbgs() << "\nOriginal: "));
+  LLVM_DEBUG(res.print(dbgs() << "\nTemporarily hacked projection: "));
+  return res;
+}
+
 /// Creates an instantiated version of `read` for the instance of
 /// `hwVectorInstance` when lowering from a super-vector type to
 /// `hwVectorType`. `hwVectorInstance` represents one particular instance of
@@ -349,8 +468,7 @@ instantiate(MLFuncBuilder *b, VectorTransferReadOp *read,
       reindexAffineIndices(b, hwVectorType, hwVectorInstance, indices);
   auto cloned = b->create<VectorTransferReadOp>(
       read->getLoc(), hwVectorType, read->getMemRef(), affineIndices,
-      makePermutationMap(read->getMemRefType(), hwVectorType),
-      read->getPaddingValue());
+      projectedPermutationMap(read, hwVectorType), read->getPaddingValue());
   return cast<OperationStmt>(cloned->getOperation());
 }
 
@@ -371,7 +489,7 @@ instantiate(MLFuncBuilder *b, VectorTransferWriteOp *write,
   auto cloned = b->create<VectorTransferWriteOp>(
       write->getLoc(), substitute(write->getVector(), *substitutionsMap),
       write->getMemRef(), affineIndices,
-      makePermutationMap(write->getMemRefType(), hwVectorType));
+      projectedPermutationMap(write, hwVectorType));
   return cast<OperationStmt>(cloned->getOperation());
 }
````
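For readers skimming the patch, the projection logic in
projectedPermutationMap can be restated in a few lines of plain C++. This is a
hedged sketch with ordinary containers in place of MLIR types; `keptDims` and
`hackedProjection` are hypothetical names, not part of the codebase.
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// keptDims: indices of super-vector dims that survive materialization.
// A dim whose shape equals its unroll ratio maps to a HW size of 1, is fully
// instantiated by unroll-and-jam, and is therefore projected away.
std::vector<unsigned> keptDims(const std::vector<int> &superShape,
                               const std::vector<int> &ratio) {
  assert(superShape.size() == ratio.size() && "shape/ratio size mismatch");
  std::vector<unsigned> keep;
  for (unsigned dim = 0; dim < superShape.size(); ++dim) {
    assert(superShape[dim] >= ratio[dim]);
    if (superShape[dim] != ratio[dim])
      keep.push_back(dim); // not fully instantiated: this dim survives
  }
  return keep;
}

// hackedProjection: the interim workaround from the patch, which simply keeps
// the trailing results of the original permutation map, one per surviving dim.
std::vector<unsigned> hackedProjection(const std::vector<unsigned> &permResults,
                                       std::size_t numKept) {
  assert(permResults.size() >= numKept);
  return {permResults.end() - numKept, permResults.end()};
}
```
For the vector<4x4x4xf32> to vector<4x4xf32> case in the example above, the
ratio is {4, 1, 1}, so keptDims({4, 4, 4}, {4, 1, 1}) returns {1, 2} and the
leading super-vector dimension is projected out, matching the drop from
(d3, d1, d0) to (d1, d0) in the unrolled output.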

