| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As described in the documentation, ViewOp is expected to take an optional
dynamic offset followed by a list of dynamic sizes. However, the ViewOp parser
did not include a check for the offset being a single value and accepeted a
list of values instead.
Furthermore, several tests have been exercising the wrong syntax of a ViewOp,
passing multiple values to the dyanmic stride list, which was not caught by the
parser. The trailing values could have been erronously interpreted as dynamic
sizes. This is likely due to resyntaxing of the ViewOp, with the previous
syntax taking the list of sizes before the offset. Update the tests to use the
syntax with the offset preceding the sizes.
Worse, the conversion of ViewOp to the LLVM dialect assumed the wrong order of
operands with offset in the trailing position, and erronously relied on the
permissive parsing that interpreted trailing dynamic offset values as leading
dynamic sizes. Fix the lowering to use the correct order of operands.
PiperOrigin-RevId: 283532506
|
|
|
|
|
|
|
|
|
|
|
|
| |
stride
are constant (i.e., there are no size and stride operands).
We recently added canonicalization that rewrites constant size and stride operands to
SubViewOp into static information in the type, so these patterns now occur during code
generation.
PiperOrigin-RevId: 283524688
|
|
|
|
|
|
|
|
|
|
|
| |
LLVM IR supports linkage on global objects such as global variables and
functions. Introduce the Linkage attribute into the LLVM dialect, backed by an
integer storage. Use this attribute on LLVM::GlobalOp and make it mandatory.
Implement parsing/printing of the attribute and conversion to LLVM IR.
See tensorflow/mlir#277.
PiperOrigin-RevId: 283309328
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These changes to SPIR-V lowering while adding support for lowering
SUbViewOp, but are not directly related.
- Change the lowering of MemRefType to
!spv.ptr<!spv.struct<!spv.array<...>[offset]>, ..>
This is consistent with the Vulkan spec.
- To enable testing a simple pattern of lowering functions is added to
ConvertStandardToSPIRVPass. This is just used to convert the type of
the arguments of the function. The added function lowering itself is
not meant to be the way functions are eventually lowered into SPIR-V
dialect.
PiperOrigin-RevId: 282589644
|
|
|
|
|
|
|
|
| |
This CL uses the recently added op to finish the implementation of Vector -> Vector unrolling by replacing the "fake join op" by a series of InsertStridedSliceOp.
Test is updated accordingly
PiperOrigin-RevId: 282451126
|
|
|
|
|
|
| |
Also change the text format a bit, so that indices are braced by squares.
PiperOrigin-RevId: 282437095
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To simplify the lowering into SPIR-V, while still respecting the ABI
requirements of SPIR-V/Vulkan, split the process into two
1) While lowering a function to SPIR-V (when the function is an entry
point function), allow specifying attributes on arguments and
function itself that describe the ABI of the function.
2) Add a pass that materializes the ABI described in the function.
Two attributes are needed.
1) Attribute on arguments of the entry point function that describe
the descriptor_set, binding, storage class, etc, of the
spv.globalVariable this argument will be replaced by
2) Attribute on function that specifies workgroup size, etc. (for now
only workgroup size).
Add the pass -spirv-lower-abi-attrs to materialize the ABI described
by the attributes.
This change makes the SPIRVBasicTypeConverter class unnecessary and is
removed, further simplifying the SPIR-V lowering path.
PiperOrigin-RevId: 282387587
|
|
|
|
| |
PiperOrigin-RevId: 282132339
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current SubViewOp specification allows for either all offsets,
shape and stride to be dynamic or all of them to be static. There are
opportunities for more fine-grained canonicalization based on which of
these are static. For example, if the sizes are static, the result
memref is of static shape. The specification of SubViewOp is modified
to allow on or more of offsets, shapes and strides to be statically
specified. The verification is updated to ensure that the result type
of the subview op is consistent with which of these are static and
which are dynamic.
PiperOrigin-RevId: 281560457
|
|
|
|
|
|
|
|
|
| |
This CL uses the pattern rewrite infrastructure to implement a simple VectorOps -> VectorOps legalization strategy to unroll coarse-grained vector operations into finer grained ones.
The transformation is written using local pattern rewrites to allow composition with other rewrites. It proceeds by iteratively introducing fake cast ops and cleaning canonicalizing or lowering them away where appropriate.
This is an example of writing transformations as compositions of local pattern rewrites that should enable us to make them significantly more declarative.
PiperOrigin-RevId: 281555100
|
|
|
|
|
|
|
|
|
|
|
|
| |
The command-line flag name `lower-to-llvm` for the pass performing dialect
conversion from the Standard dialect to the LLVM dialect is misleading and
inconsistent with most of the conversion passses. It leads the user to believe
that there are no restrictions on what can be converted, while in fact only a
subset of the Standard dialect can be converted (with operations from other
dialects converted by separate passes). Use `convert-std-to-llvm` that better
reflects what the pass does and is consistent with most other conversions.
PiperOrigin-RevId: 281238797
|
|
|
|
|
|
|
| |
This makes the flags consistent with the naming scheme used elsewhere in the
codebase for dialect conversions.
PiperOrigin-RevId: 281027517
|
|
|
|
|
|
| |
This is step 1/n in refactoring infrastructure along the Vector dialect to make it ready for retargetability and composable progressive lowering.
PiperOrigin-RevId: 280529784
|
|
|
|
|
|
|
|
|
|
| |
Following up on the consolidation of MemRef descriptor conversion, update
Vector-to-LLVM conversion to use the helper class that abstracts away the
implementation details of the MemRef descriptor. This also makes the types of
the attributes in emitted llvm.insert/extractelement operations consistently
i64 instead of a mix of index and i64.
PiperOrigin-RevId: 280441451
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This CL moves VectorOps to Tablegen and cleans up the implementation.
This is almost NFC but 2 changes occur:
1. an interface change occurs in the padding value specification in vector_transfer_read:
the value becomes non-optional. As a shortcut we currently use %f0 for all paddings.
This should become an OpInterface for vectorization in the future.
2. the return type of vector.type_cast is trivial and simplified to `memref<vector<...>>`
Relevant roundtrip and invalid tests that used to sit in core are moved to the vector dialect.
The op documentation is moved to the .td file.
PiperOrigin-RevId: 280430869
|
|
|
|
|
|
|
|
|
|
|
| |
This CL uses the now standard std.subview in linalg.
Two shortcuts are currently taken to allow this port:
1. the type resulting from a view is currently degraded to fully dynamic to pass the SubViewOp verifier.
2. indexing into SubViewOp may access out of bounds since lowering to LLVM does not currently enforce it by construction.
These will be fixed in subsequent commits after discussions.
PiperOrigin-RevId: 280250129
|
|
|
|
|
|
|
|
| |
Lowering of CmpIOp, DivISOp, RemISOp, SubIOp and SelectOp to SPIR-V
dialect enables the lowering of operations generated by AffineExpr ->
StandardOps conversion into the SPIR-V dialect.
PiperOrigin-RevId: 280039204
|
|
|
|
|
|
|
|
|
| |
loop::ForOp can be lowered to the structured control flow represented
by spirv::LoopOp by making the continue block of the spirv::LoopOp the
loop latch and the merge block the exit block. The resulting
spirv::LoopOp has a single back edge from the continue to header
block, and a single exit from header to merge.
PiperOrigin-RevId: 280015614
|
|
|
|
|
|
| |
A followup CL will replace usage of linalg.subview by std.subview.
PiperOrigin-RevId: 279961981
|
|
|
|
|
|
|
|
|
|
|
|
| |
This CL adds an extra pointer to the memref descriptor to allow specifying alignment.
In a previous implementation, we used 2 types: `linalg.buffer` and `view` where the buffer type was the unit of allocation/deallocation/alignment and `view` was the unit of indexing.
After multiple discussions it was decided to use a single type, which conflates both, so the memref descriptor now needs to carry both pointers.
This is consistent with the [RFC-Proposed Changes to MemRef and Tensor MLIR Types](https://groups.google.com/a/tensorflow.org/forum/#!searchin/mlir/std.view%7Csort:date/mlir/-wKHANzDNTg/4K6nUAp8AAAJ).
PiperOrigin-RevId: 279959463
|
|
|
|
| |
PiperOrigin-RevId: 278959717
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This CL ports the lowering of linalg.view to the newly introduced std.view.
Differences in implementation relate to std.view having slightly different semantics:
1. a static or dynamic offset can be specified.
2. the size of the (contiguous) shape is passed instead of a range.
3. static size and stride information is extracted from the memref type rather than the range.
Besides these differences, lowering behaves the same.
A future CL will update Linalg to use this unified infrastructure.
PiperOrigin-RevId: 278948853
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current lowering of loops to GPU only supports lowering of loop
nests where the loops mapped to workgroups and workitems are perfectly
nested. Here a new lowering is added to handle lowering of imperfectly
nested loop body with the following properties
1) The loops partitioned to workgroups are perfectly nested.
2) The loop body of the inner most loop partitioned to workgroups can
contain one or more loop nests that are to be partitioned across
workitems. Each individual loops nests partitioned to workitems should
also be perfectly nested.
3) The number of workgroups and workitems are not deduced from the
loop bounds but are passed in by the caller of the lowering as values.
4) For statements within the perfectly nested loop nest partitioned
across workgroups that are not loops, it is valid to have all threads
execute that statement. This is NOT verified.
PiperOrigin-RevId: 277958868
|
|
|
|
|
|
| |
isn't held by a ModuleOp (NFC)
PiperOrigin-RevId: 277752004
|
|
|
|
| |
PiperOrigin-RevId: 276859463
|
|
|
|
|
|
|
|
|
|
| |
The type constraint had to be relaxed due to the order of lowering passes in
the examples, that since has been fixed. The relaxed version was still used by
the CUDA lowering for launch sizes of `index` type. This is not necessary since
the GPU dialect does not restrict the type of the launch size operands. Use an
LLVM type instead and restore the check in the LLVM_CallOp definition.
PiperOrigin-RevId: 275920109
|
|
|
|
|
|
|
|
|
|
| |
A VectorTypeCastOp can only be used to lower between statically sized contiguous memrefs of scalar and matching vector type. The sizes and strides are thus fully static and easy to determine.
A relevant test is added.
This is a step towards solving tensorflow/mlir#189.
PiperOrigin-RevId: 275538981
|
|
|
|
|
|
|
|
| |
Adding gen table for rewrite patterns from GPU to NVVM dialect.
Copy missing op documentation from GPUOps.td to GPU.md.
PiperOrigin-RevId: 275419588
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Makes the spv.module generated by the GPU to SPIR-V conversion SPIR-V
spec compliant (validated using spirv-val from Vulkan tools).
1) Separate out the VulkanLayoutUtils from
DecorateSPIRVCompositeTypeLayoutPass to make it reusable within the
Type converter in SPIR-V lowering infrastructure. This is used to
compute the layout of the !spv.struct used in global variable type
description.
2) Set the capabilities of the spv.module to Shader (needed for use of
Logical Memory Model, and the extensions to
SPV_KHR_storage_buffer_storage_class for use of Storage Buffer)
PiperOrigin-RevId: 275081486
|
|
|
|
|
|
|
|
|
|
| |
In addition to specifying the type of accumulation through the 'op' attribute, the accumulation can now also be specified as arbitrary code region.
Adds a gpu.yield op to specify the result of the accumulation.
Also support more types (integers) and accumulations (mul).
PiperOrigin-RevId: 275065447
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Originally, the lowering of `alloc` operations has been computing the number of
bytes to allocate when lowering based on the properties of MLIR type. This does
not take into account type legalization that happens when compiling LLVM IR
down to target assembly. This legalization can widen the type, potentially
leading to out-of-bounds accesses to `alloc`ed data due to mismatches between
address computation that takes the widening into account and allocation that
does not. Use the LLVM IR's equivalent of `sizeof` to compute the number of
bytes to be allocated:
%0 = getelementptr %type* null, %indexType 0
%1 = ptrtoint %type* %0 to %indexType
adapted from
http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt
PiperOrigin-RevId: 274159900
|
|
|
|
| |
PiperOrigin-RevId: 274152154
|
|
|
|
|
|
|
|
|
| |
This test was not updated in the original commit that switched to using LLVM
functions since it wasn't broken by that change. FileCheck was able to match
the `func` part of `llvm.func` to the expected pattern and continue as usual.
Make sure the `llvm.` dialect prefix is included in the expected output.
PiperOrigin-RevId: 274127281
|
|
|
|
|
|
|
|
|
|
|
|
| |
In Standard to LLVM dialect conversion, the binary op conversion pattern
implicitly assumed some operands were of LLVM IR dialect type. This is not
necessarily true, for example if the Ops that produce those operands did not
match the existing convresion patterns. Check if all operands are of LLVM IR
dialect type and if not, fail to patch the binary op pattern.
Closes tensorflow/mlir#168
PiperOrigin-RevId: 274063207
|
|
|
|
|
|
|
|
|
|
|
| |
The lowering is specified as a pattern and is done only if the result
is a SPIR-V scalar type or vector type.
Handling ConstantOp with index return type needs special handling
since SPIR-V dialect does not have index types. Based on the bitwidth
of the attribute value, either i32 or i64 is chosen.
Other constant lowerings are left as a TODO.
PiperOrigin-RevId: 274056805
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function-like operation allows one to define functions that have wrapped
LLVM IR function type, in particular variadic functions. The operation was
added in parallel to the existing lowering flow, this commit only switches the
flow to use it.
Using a custom function type makes the LLVM IR dialect type system more
consistent and avoids complex conversion rules for functions that previously
had to use the built-in function type instead of a wrapped LLVM IR dialect type
and perform conversions during the analysis.
PiperOrigin-RevId: 273910855
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The lowering infrastructure needs to be enhanced to lower into a
spv.Module that is consistent with the SPIR-V spec. The following
changes are needed
1) The Vulkan/SPIR-V validation rules dictates entry functions to have
signature of void(void). This requires changes to the function
signature conversion infrastructure within the dialect conversion
framework. When an argument is dropped from the original function
signature, a function can be specified that when invoked will return
the value to use as a replacement for the argument from the original
function.
2) Some changes to the type converter to make the converted type
consistent with the Vulkan/SPIR-V validation rules,
a) Add support for converting dynamically shaped tensors to
spv.rtarray type.
b) Make the global variable of type !spv.ptr<!spv.struct<...>>
3) Generate the entry point operation for the kernel functions and
automatically compute all the interface variables needed
PiperOrigin-RevId: 273784229
|
|
|
|
|
|
|
|
|
|
|
| |
Originally, we were attaching attributes containing CUBIN blobs to the kernel
function called by `gpu.launch_func`. This kernel is now contained in a nested
module that is used as a compilation unit. Attach compiled CUBIN blobs to the
module rather than to the function since we were compiling the module. This
also avoids duplication of the attribute on multiple kernels within the same
module.
PiperOrigin-RevId: 273497303
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Originally, the CUBIN getter function was introduced as a mechanism to
circumvent the absence of globals in the LLVM dialect. It would allocate memory
and populate it with the CUBIN data. LLVM dialect now supports globals and they
are already used to store CUBIN data, making the getter function a trivial
address computation of a global. Emit the address computation directly at the
place of `gpu.launch_func` instead of putting it in a function and calling it.
This simplifies the conversion flow and prepares it for using the
DialectConversion infrastructure.
PiperOrigin-RevId: 273496221
|
|
|
|
|
|
|
|
|
|
|
| |
Now that the accessor function is a trivial getter of the global variable, it
makes less sense to have the getter generation as a separate pass. Move the
getter generation into the lowering of `gpu.launch_func` to CUDA calls. This
change is mostly code motion, but the process can be simplified further by
generating the addressof inplace instead of using a call. This is will be done
in a follow-up.
PiperOrigin-RevId: 273492517
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The kernel function called by gpu.launch_func is now placed into an isolated
nested module during the outlining stage to simplify separate compilation.
Until recently, modules did not have names and could not be referenced. This
limitation was circumvented by introducing a stub kernel at the same name at
the same nesting level as the module containing the actual kernel. This
relation is only effective in one direction: from actual kernel function to its
launch_func "caller".
Leverage the recently introduced symbol name attributes on modules to refer to
a specific nested module from `gpu.launch_func`. This removes the implicit
connection between the identically named stub and kernel functions. It also
enables support for `gpu.launch_func`s to call different kernels located in the
same module.
PiperOrigin-RevId: 273491891
|
|
|
|
| |
PiperOrigin-RevId: 272768027
|
|
|
|
|
|
|
|
| |
This makes the name of the conversion pass more consistent with the naming
scheme, since it actually converts from the Loop dialect to the Standard
dialect rather than working with arbitrary control flow operations.
PiperOrigin-RevId: 272612112
|
|
|
|
|
|
|
|
| |
offset determination.
This also adds coverage with a missing test, which uncovered a bug in the conditional for testing whether an offset is dynamic or not.
PiperOrigin-RevId: 272505798
|
|
|
|
|
|
|
|
|
| |
This is a follow-up to the PRtensorflow/mlir#146 which introduced the ROCDL Dialect. This PR introduces a pass to lower GPU Dialect to the ROCDL Dialect. As with the previous PR, this one builds on the work done by @whchung, and addresses most of the review comments in the original PR.
Closes tensorflow/mlir#154
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/154 from deven-amd:deven-lower-gpu-to-rocdl 809893e08236da5ab6a38e3459692fa04247773d
PiperOrigin-RevId: 272390729
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A recent ABI compatibility change affected the conversion from standard
CallOp/CallIndirectOp to LLVM::CallOp by changing its signature. In order to
analyze the signature, the code was looking up the callee symbol in the module.
This is incorrect since, during the conversion, the module may contain both the
original and the converted function op that have the same symbol name. There is
no strict guarantee on which of the two symbols will be found by the lookup.
The conversion was not failing because the type legalizer converts the LLVM
types to themselves making the original and the converted function signatures
ultimately produce the same type.
Instead of looking up the function signature to get the list of result types,
use the types of the CallOp/CallIndirectOp results which must match those of
the function in valid IR. These types are guaranteed to be the original,
unconverted types when converting the operation. Furthermore, this avoids the
need to perform a lookup of a symbol name in the module which may be expensive.
Finally, propagate attributes as-is from the original op to the converted op
since they share the attribute name for the callee of direct calls and the rest
of attributes are not affected by the conversion. This removes the need for
additional contorsions between direct and indirect calls to extract the name of
the optional callee attribute only to insert it back. This also prevents the
conversion from unintentionally dropping the other attributes of the op.
PiperOrigin-RevId: 272218871
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This CL finishes the implementation of the lowering part of the [strided memref RFC](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).
Strided memrefs correspond conceptually to the following templated C++ struct:
```
template <typename Elem, size_t Rank>
struct {
Elem *ptr;
int64_t offset;
int64_t sizes[Rank];
int64_t strides[Rank];
};
```
The linearization procedure for address calculation for strided memrefs is the same as for linalg views:
`base_offset + SUM_i index_i * stride_i`.
The following CL will unify Linalg and Standard by removing !linalg.view in favor of strided memrefs.
PiperOrigin-RevId: 272033399
|
|
|
|
| |
PiperOrigin-RevId: 271624731
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
boundaries in LLVMLowering.
The strided MemRef RFC discusses a normalized descriptor and interaction with library calls (https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).
Lowering of nested LLVM structs as value types does not play nicely with externally compiled C/C++ functions due to ABI issues.
Solving the ABI problem generally is a very complex problem and most likely involves taking
a dependence on clang that we do not want atm.
A simple workaround is to pass pointers to memref descriptors at function boundaries, which this CL implement.
PiperOrigin-RevId: 271591708
|
|
|
|
|
|
|
|
| |
The reduction operation is currently fixed to "add", and the scope is fixed to "workgroup".
The implementation is currently limited to sizes that are multiple 32 (warp size) and no larger than 1024.
PiperOrigin-RevId: 271290265
|