| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
conflicts with function types.
Summary: The current syntax for AffineMapAttr and IntegerSetAttr conflict with function types, making it currently impossible to round-trip function types(and e.g. FuncOp) in the IR. This revision changes the syntax for the attributes by wrapping them in a keyword. AffineMapAttr is wrapped with `affine_map<>` and IntegerSetAttr is wrapped with `affine_set<>`.
Reviewed By: nicolasvasilache, ftynse
Differential Revision: https://reviews.llvm.org/D72429
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This diff adds missing GPU lowering ops to MLIR.
Reviewers: herhut, pifon2a, ftynse
Tags: #pre-merge_beta_testing, #llvm
Differential Revision: https://reviews.llvm.org/D72439
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This diff implements the progressive lowering of insert_strided_slice.
Two cases appear:
1. when the source and dest vectors have different ranks, extract the dest
subvector at the proper offset and reduce to case 2.
2. when they have the same rank N:
a. if the source and dest type are the same, the insertion is trivial:
just forward the source
b. otherwise, iterate over all N-1 D subvectors and create an
extract/insert_strided_slice/insert replacement, reducing the problem
to vecotrs of the same N-1 rank.
This combines properly with the other conversion patterns to lower all the way to LLVM.
Reviewers: ftynse, rriddle, AlexEichenberger, andydavis1, tetuante, nicolasvasilache
Reviewed By: andydavis1
Subscribers: merge_guards_bot, mehdi_amini, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72317
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This diff implements the progressive lowering of strided_slice to either:
1. extractelement + insertelement for the 1-D case
2. extract + optional strided_slice + insert for the n-D case.
This combines properly with the other conversion patterns to lower all the way to LLVM.
Appropriate tests are added.
Reviewers: ftynse, rriddle, AlexEichenberger, andydavis1, tetuante
Reviewed By: andydavis1
Subscribers: merge_guards_bot, mehdi_amini, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72310
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D72137
|
|
|
|
|
| |
This reverts commit 7e7f849a6d94f77f1a29630419acb7226051f4b6 because
it recorded the wrong commit author.
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D72296
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D72205
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D72137
|
|
|
|
|
|
|
|
|
|
| |
This commit fixes shader ABI attributes to use `spv.` as the prefix
so that they match the dialect's namespace. This enables us to add
verification hooks in the SPIR-V dialect to verify them.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D72062
|
|
|
|
|
| |
Leftover change from before the MLIR merge, reviewed at accepted at
https://github.com/tensorflow/mlir/pull/338.
|
|
|
|
|
|
|
|
|
| |
The conversion from std.and/std.or to spv.LogicalAnd/spv.LogicalOr is
only valid for boolean (i1) types. Modify BinaryOpPattern in
StandardToSPIRV.td to allow limiting the type of the operands for
which the pattern is applied.
Differential Revision: https://reviews.llvm.org/D71881
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rename the 'shlis' operation in the standard dialect to 'shift_left'. Add tests
for this operation (these have been missing so far) and add a lowering to the
'shl' operation in the LLVM dialect.
Add also 'shift_right_signed' (lowered to LLVM's 'ashr') and 'shift_right_unsigned'
(lowered to 'lshr').
The original plan was to name these operations 'shift.left', 'shift.right.signed'
and 'shift.right.unsigned'. This works if the operations are prefixed with 'std.'
in MLIR assembly. Unfortunately during import the short form is ambigous with
operations from a hypothetical 'shift' dialect. The best solution seems to omit
dots in standard operations for now.
Closes tensorflow/mlir#226
PiperOrigin-RevId: 286803388
|
|
|
|
| |
PiperOrigin-RevId: 286650682
|
|
|
|
|
|
|
|
| |
This will allow us to lower most of gpu.all_reduce (when all_reduce
doesn't exist in the target dialect) within the GPU dialect, and only do
target-specific lowering for the shuffle op.
PiperOrigin-RevId: 286548256
|
|
|
|
|
|
|
|
| |
Introduces some centralized methods to move towards
consistent use of i32 as vector subscripts.
Note: sizes/strides/offsets attributes are still i64
PiperOrigin-RevId: 286434133
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Examples:
vector.print %f : f32
vector.print %x : vector<4xf32>
vector.print %y : vector<3x4xf32>
vector.print %z : vector<2x3x4xf32>
LLVM lowering replaces these with fully unrolled calls
into a small runtime support library that provides some
basic printing operations (single value, opening closing
bracket, comma, newline).
PiperOrigin-RevId: 286230325
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce affine.prefetch: op to prefetch using a multi-dimensional
subscript on a memref; similar to affine.load but has no effect on
semantics, but only on performance.
Provide lowering through std.prefetch, llvm.prefetch and map to llvm's
prefetch instrinsic. All attributes reflected through the lowering -
locality hint, rw, and instr/data cache.
affine.prefetch %0[%i, %j + 5], false, 3, true : memref<400x400xi32>
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closes tensorflow/mlir#225
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/225 from bondhugula:prefetch 4c3b4e93bc64d9a5719504e6d6e1657818a2ead0
PiperOrigin-RevId: 286212997
|
|
|
|
|
|
|
|
|
|
| |
When memory attributions are present in `gpu.func`, require that they are of
memref type and live in memoryspaces 3 and 5 for workgroup and private memory
attributions, respectively. Adapt the conversion from the GPU dialect to the
NVVM dialect to drop the private memory space from attributions as NVVM is able
to model them as local `llvm.alloca`s in the default memory space.
PiperOrigin-RevId: 286161763
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This updates the lowering pipelines from the GPU dialect to lower-level
dialects (NVVM, SPIRV) to use the recently introduced gpu.func operation
instead of a standard function annotated with an attribute. In particular, the
kernel outlining is updated to produce gpu.func instead of std.func and the
individual conversions are updated to consume gpu.funcs and disallow standard
funcs after legalization, if necessary. The attribute "gpu.kernel" is preserved
in the generic syntax, but can also be used with the custom syntax on
gpu.funcs. The special kind of function for GPU allows one to use additional
features such as memory attribution.
PiperOrigin-RevId: 285822272
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LLVM
Similar to insert/extract vector instructions but
(1) work on 1-D vectors only
(2) allow for a dynamic index
%c3 = constant 3 : index
%0 = vector.insertelement %arg0, %arg1[%c : index] : vector<4xf32>
%1 = vector.extractelement %arg0[%c3 : index] : vector<4xf32>
PiperOrigin-RevId: 285792205
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During the conversion from the standard dialect to the LLVM dialect,
memref-typed arguments are promoted from registers to memory and passed into
functions by pointer. This had been introduced into the lowering to work around
the abesnce of calling convention modeling in MLIR to enable better
interoperability with LLVM IR generated from C, and has been exerciced for
several months. Make this promotion the default calling covention when
converting to the LLVM dialect. This adds the documentation, simplifies the
code and makes the conversion consistent across function operations and
function types used in other places, e.g. in high-order functions or
attributes, which would not follow the same rule previously.
PiperOrigin-RevId: 285751280
|
|
|
|
|
|
| |
This change allows for DialectConversion to attempt folding as a mechanism to legalize illegal operations. This also expands folding support in OpBuilder::createOrFold to generate new constants when folding, and also enables it to work in the context of a PatternRewriter.
PiperOrigin-RevId: 285448440
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For example, a shuffle
%1 = vector.shuffle %arg0, %arg1 [0 : i32, 1 : i32] : vector<2xf32>, vector<2xf32>
becomes a direct LLVM shuffle
0 = llvm.shufflevector %arg0, %arg1 [0 : i32, 1 : i32] : !llvm<"<2 x float>">, !llvm<"<2 x float>">
but
%1 = vector.shuffle %a, %b[1 : i32, 0 : i32, 2: i32] : vector<1x4xf32>, vector<2x4xf32>
becomes the more elaborate (note the index permutation that drives
argument selection for the extract operations)
%0 = llvm.mlir.undef : !llvm<"[3 x <4 x float>]">
%1 = llvm.extractvalue %arg1[0] : !llvm<"[2 x <4 x float>]">
%2 = llvm.insertvalue %1, %0[0] : !llvm<"[3 x <4 x float>]">
%3 = llvm.extractvalue %arg0[0] : !llvm<"[1 x <4 x float>]">
%4 = llvm.insertvalue %3, %2[1] : !llvm<"[3 x <4 x float>]">
%5 = llvm.extractvalue %arg1[1] : !llvm<"[2 x <4 x float>]">
%6 = llvm.insertvalue %5, %4[2] : !llvm<"[3 x <4 x float>]">
PiperOrigin-RevId: 285268164
|
|
|
|
|
|
| |
This type is not used anymore now that Linalg view and subview have graduated to std and that alignment is supported on alloc.
PiperOrigin-RevId: 285213424
|
|
|
|
|
|
|
| |
Closes tensorflow/mlir#312
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/312 from dfki-ehna:tanh 9e89b072ff91ff390ad739501745114feb3ac856
PiperOrigin-RevId: 285205674
|
|
|
|
| |
PiperOrigin-RevId: 285162061
|
|
|
|
|
|
|
| |
Both work for the current use case, but the latter allows implementing
prefix sums and is a little easier to understand for partial warps.
PiperOrigin-RevId: 285145287
|
|
|
|
|
|
|
| |
Closes tensorflow/mlir#313
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/313 from denis0x0D:sandbox/lowering_std_farith 41715070a74d13bfa9401957478978c1bb8006c0
PiperOrigin-RevId: 285023586
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For example, an insert
%0 = vector.insert %arg0, %arg1[3 : i32] : f32 into vector<4xf32>
becomes
%0 = llvm.mlir.constant(3 : i32) : !llvm.i32
%1 = llvm.insertelement %arg0, %arg1[%0 : !llvm.i32] : !llvm<"<4 x float>">
A more elaborate example, inserting an element in a higher dimension
vector
%0 = vector.insert %arg0, %arg1[3 : i32, 7 : i32, 15 : i32] : f32 into vector<4x8x16xf32>
becomes
%0 = llvm.extractvalue %arg1[3 : i32, 7 : i32] : !llvm<"[4 x [8 x <16 x float>]]">
%1 = llvm.mlir.constant(15 : i32) : !llvm.i32
%2 = llvm.insertelement %arg0, %0[%1 : !llvm.i32] : !llvm<"<16 x float>">
%3 = llvm.insertvalue %2, %arg1[3 : i32, 7 : i32] : !llvm<"[4 x [8 x <16 x float>]]">
PiperOrigin-RevId: 284882443
|
|
|
|
|
|
| |
This reorganizes the vector transformations to be more easily testable as patterns and more easily composable into fused passes in the future.
PiperOrigin-RevId: 284817474
|
|
|
|
|
|
|
|
|
|
| |
For example
%0 = vector.shuffle %x, %y [3 : i32, 2 : i32, 1 : i32, 0 : i32] : vector<2xf32>, vector<2xf32>
yields a vector<4xf32> result with a permutation of the elements of %x and %y
PiperOrigin-RevId: 284657191
|
|
|
|
|
|
|
|
|
|
|
| |
The existing GPU to SPIR-V lowering created a spv.module for every
function with gpu.kernel attribute. A better approach is to lower the
module that the function lives in (which has the attribute
gpu.kernel_module) to a spv.module operation. This better captures the
host-device separation modeled by GPU dialect and simplifies the
lowering as well.
PiperOrigin-RevId: 284574688
|
|
|
|
|
|
|
| |
Unifies vector op unrolling transformation, by using the same unrolling implementation for contraction and elementwise operations.
Removes fakefork/join operations which are non longer needed now that we have the InsertStridedSlice operation.
PiperOrigin-RevId: 284570784
|
|
|
|
|
|
| |
Closes tensorflow/mlir#304
PiperOrigin-RevId: 284568358
|
|
|
|
|
|
|
| |
Since these operations lower to [insert|extract][element|value] at LLVM
dialect level, neither element nor value would correctly reflect the meaning.
PiperOrigin-RevId: 284240727
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For example, a scalar broadcast
%0 = vector.broadcast %x : f32 to vector<2xf32>
return %0 : vector<2xf32>
which expands scalar x into vector [x,x] by lowering
to the following LLVM IR dialect to implement the
duplication over the leading dimension.
%0 = llvm.mlir.undef : !llvm<"<2 x float>">
%1 = llvm.mlir.constant(0 : index) : !llvm.i64
%2 = llvm.insertelement %x, %0[%1 : !llvm.i64] : !llvm<"<2 x float>">
%3 = llvm.shufflevector %2, %0 [0 : i32, 0 : i32] : !llvm<"<2 x float>">, !llvm<"<2 x float>">
return %3 : vector<2xf32>
In the trailing dimensions, the operand is simply
"passed through", unless a more elaborate "stretch"
is required.
For example
%0 = vector.broadcast %arg0 : vector<1xf32> to vector<4xf32>
return %0 : vector<4xf32>
becomes
%0 = llvm.mlir.undef : !llvm<"<4 x float>">
%1 = llvm.mlir.constant(0 : index) : !llvm.i64
%2 = llvm.extractelement %arg0[%1 : !llvm.i64] : !llvm<"<1 x float>">
%3 = llvm.mlir.constant(0 : index) : !llvm.i64
%4 = llvm.insertelement %2, %0[%3 : !llvm.i64] : !llvm<"<4 x float>">
%5 = llvm.shufflevector %4, %0 [0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm<"<4 x float>">, !llvm<"<4 x float>">
llvm.return %5 : !llvm<"<4 x float>">
PiperOrigin-RevId: 284219926
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GPU functions use memory attributions, a combination of Op attributes and
region arguments, to specify function-wide buffers placed in workgroup or
private memory spaces. Introduce a lowering pattern for GPU functions to be
converted to LLVM functions taking into account memory attributions. Workgroup
attributions get transformed into module-level globals with unique names
derived from function names. Private attributions get converted into
llvm.allocas inside the function body. In both cases, we inject at the
beginning of the function the IR that obtains the raw pointer to the data and
populates a MemRef descriptor based on the MemRef type of buffer, making
attributions compose with the rest of the MemRef lowering and transparent for
use with std.load and std.store. While using raw pointers instead of
descriptors might have been more efficient, it is better implemented as a
canonicalization or a separate transformation so that non-attribution memrefs
could also benefit from it.
PiperOrigin-RevId: 284208396
|
|
|
|
|
|
|
|
|
| |
Updates vector ContractionOp to use proper vector masks (produced by CreateMaskOp/ConstantMaskOp).
Leverages the following canonicalizations in unrolling unit test: CreateMaskOp -> ConstantMaskOp, StridedSliceOp(ConstantMaskOp) -> ConstantMaskOp
Removes IndexTupleOp (no longer needed now that we have vector mask ops).
Updates all unit tests.
PiperOrigin-RevId: 284182168
|
|
|
|
|
|
|
|
|
| |
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closes tensorflow/mlir#253
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/253 from bondhugula:dimop a4b464f24ae63fd259114558d87e11b8ee4dae86
PiperOrigin-RevId: 284169689
|
|
|
|
|
|
|
| |
Closes tensorflow/mlir#261
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/261 from nmostafa:nmostafa/unranked 96b6e918f6ed64496f7573b2db33c0b02658ca45
PiperOrigin-RevId: 284037040
|
|
|
|
|
|
|
|
|
|
| |
SPIR-V/Vulkan spec requires the workgroups size to be specified with
the spv.ExecutionMode operation. This was hard-wired to be set to a
particular value. It is now changed to be configurable by clients of
the pass or of the patterns that implement the lowering from GPU to
SPIRV.
PiperOrigin-RevId: 284017482
|
|
|
|
|
|
|
|
|
| |
malloc/free.
In the future, a more configurable malloc and free interface should be used and exposed via
extra parameters to the `createLowerToLLVMPass`. Until requirements are gathered, a simple CL flag allows generating code that runs successfully on hardware that cannot use the stdlib.
PiperOrigin-RevId: 283833424
|
|
|
|
|
|
| |
Now that we have unrolling as a declarative pattern, we can drop a full pass that has gone stale. In the future we may want to add specific unrolling patterns for VectorTransferReadOp.
PiperOrigin-RevId: 283806880
|
|
|
|
|
|
|
|
| |
type lists and indexing maps to a target vector size.
Adds unit tests for unrolling the vector ContractionOp with different iteration orders.
PiperOrigin-RevId: 283747503
|
|
|
|
|
|
|
|
|
| |
This CL refactors some of the MLIR vector dependencies to allow decoupling VectorOps, vector analysis, vector transformations and vector conversions from each other.
This makes the system more modular and allows extracting VectorToVector into VectorTransforms that do not depend on vector conversions.
This refactoring exhibited a bunch of cyclic library dependencies that have been cleaned up.
PiperOrigin-RevId: 283660308
|
|
|
|
|
|
|
|
|
|
|
|
| |
Not all StandardOps can be lowered to SPIR-V. For example, subview op
implementation requires use of pointer bitcasts which is not valid
according to SPIR-V spec (or at least is ambiguous about it). Such ops
need to be removed/transformed before lowering to SPIR-V. The
SPIRVLegalizationPass is added a place where such legalizations can be
added. Current implementation folds the subview ops with load/stores
so that the lowering itself does not have to convert a subview op.
PiperOrigin-RevId: 283642981
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The SPIR-V lowering used nested !spv.arrays to represented
multi-dimensional arrays, with the hope that in-conjunction with the
layout annotations, the shape and layout of memref can be represented
directly. It is unclear though how portable this representation will
end up being. It will rely on driver compilers implementing complex
index computations faithfully. A more portable approach is to use
linearized arrays to represent memrefs and explicitly instantiate all
the index computation in SPIR-V. This gives added benefit that we can
further optimize the generated code in MLIR before generating the
SPIR-V binary.
PiperOrigin-RevId: 283571167
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As described in the documentation, ViewOp is expected to take an optional
dynamic offset followed by a list of dynamic sizes. However, the ViewOp parser
did not include a check for the offset being a single value and accepeted a
list of values instead.
Furthermore, several tests have been exercising the wrong syntax of a ViewOp,
passing multiple values to the dyanmic stride list, which was not caught by the
parser. The trailing values could have been erronously interpreted as dynamic
sizes. This is likely due to resyntaxing of the ViewOp, with the previous
syntax taking the list of sizes before the offset. Update the tests to use the
syntax with the offset preceding the sizes.
Worse, the conversion of ViewOp to the LLVM dialect assumed the wrong order of
operands with offset in the trailing position, and erronously relied on the
permissive parsing that interpreted trailing dynamic offset values as leading
dynamic sizes. Fix the lowering to use the correct order of operands.
PiperOrigin-RevId: 283532506
|
|
|
|
|
|
|
|
|
|
|
|
| |
stride
are constant (i.e., there are no size and stride operands).
We recently added canonicalization that rewrites constant size and stride operands to
SubViewOp into static information in the type, so these patterns now occur during code
generation.
PiperOrigin-RevId: 283524688
|