| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
| |
PiperOrigin-RevId: 280470142
|
|
|
|
|
|
|
| |
This is essentially a dialect conversion and conceptually belongs to
conversions.
PiperOrigin-RevId: 280460034
|
|
|
|
|
|
|
|
| |
Previous commits removed all uses of LLVMTypeConverter::k*PosInMemRefDescriptor
outside of the MemRefDescriptor class. These numbers are an implementation
detail and can be hidden under a layer of more semantic APIs.
PiperOrigin-RevId: 280442444
|
|
|
|
|
|
|
|
|
|
| |
Following up on the consolidation of MemRef descriptor conversion, update
Vector-to-LLVM conversion to use the helper class that abstracts away the
implementation details of the MemRef descriptor. This also makes the types of
the attributes in emitted llvm.insert/extractelement operations consistently
i64 instead of a mix of index and i64.
PiperOrigin-RevId: 280441451
|
|
|
|
| |
PiperOrigin-RevId: 280431812
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This CL moves VectorOps to Tablegen and cleans up the implementation.
This is almost NFC but 2 changes occur:
1. an interface change occurs in the padding value specification in vector_transfer_read:
the value becomes non-optional. As a shortcut we currently use %f0 for all paddings.
This should become an OpInterface for vectorization in the future.
2. the return type of vector.type_cast is trivial and simplified to `memref<vector<...>>`
Relevant roundtrip and invalid tests that used to sit in core are moved to the vector dialect.
The op documentation is moved to the .td file.
PiperOrigin-RevId: 280430869
|
|
|
|
|
|
|
|
|
|
|
| |
Following up on the consolidation of MemRef descriptor conversion, update
Linalg-to-LLVM conversion to use the helper class that abstracts away the
implementation details of the MemRef descriptor. This required MemRefDescriptor
to become publicly visible. Since this conversion is heavily EDSC-based,
introduce locally an additional wrapper that uses builder and location pointed
to by the EDSC context while emitting descriptor manipulation operations.
PiperOrigin-RevId: 280429228
|
|
|
|
|
|
|
|
|
|
|
| |
Memref descriptor is becoming increasingly complex. Memrefs are manipulated by
multiple standard instructions, each of which has a non-trivial lowering to the
LLVM dialect. This leads to verbose code that manipulates the descriptors
exposing the internals of insert/extractelement opreations. Implement a wrapper
class that contains a memref descriptor and provides semantically named methods
that build the primitive IR operations instead.
PiperOrigin-RevId: 280371225
|
|
|
|
|
|
| |
This refactors the implementation of block signature(type) conversion to not insert fake cast operations to perform the type conversion, but to instead create a new block containing the proper signature. This has the benefit of enabling the use of pre-computed analyses that rely on mapping values. It also leads to a much cleaner implementation overall. The major user facing change is that applySignatureConversion will now replace the entry block of the region, meaning that blocks generally shouldn't be cached over calls to applySignatureConversion.
PiperOrigin-RevId: 280226936
|
|
|
|
|
|
|
|
| |
Lowering of CmpIOp, DivISOp, RemISOp, SubIOp and SelectOp to SPIR-V
dialect enables the lowering of operations generated by AffineExpr ->
StandardOps conversion into the SPIR-V dialect.
PiperOrigin-RevId: 280039204
|
|
|
|
|
|
|
|
|
|
| |
Existing check that sets FuncOp to be dynamically legal was just
checking that the types of the argument are SPIR-V compatible. Since
the current conversion from GPU to SPIR-V does not handle lowering
non-kernel functions, change the legality check to verify that the
FuncOp has the gpu.kernel attribute and has void(void) return type.
PiperOrigin-RevId: 280032782
|
|
|
|
|
|
|
|
|
| |
loop::ForOp can be lowered to the structured control flow represented
by spirv::LoopOp by making the continue block of the spirv::LoopOp the
loop latch and the merge block the exit block. The resulting
spirv::LoopOp has a single back edge from the continue to header
block, and a single exit from header to merge.
PiperOrigin-RevId: 280015614
|
|
|
|
|
|
| |
A followup CL will replace usage of linalg.subview by std.subview.
PiperOrigin-RevId: 279961981
|
|
|
|
|
|
|
|
|
|
|
|
| |
This CL adds an extra pointer to the memref descriptor to allow specifying alignment.
In a previous implementation, we used 2 types: `linalg.buffer` and `view` where the buffer type was the unit of allocation/deallocation/alignment and `view` was the unit of indexing.
After multiple discussions it was decided to use a single type, which conflates both, so the memref descriptor now needs to carry both pointers.
This is consistent with the [RFC-Proposed Changes to MemRef and Tensor MLIR Types](https://groups.google.com/a/tensorflow.org/forum/#!searchin/mlir/std.view%7Csort:date/mlir/-wKHANzDNTg/4K6nUAp8AAAJ).
PiperOrigin-RevId: 279959463
|
|
|
|
|
|
| |
Now that a view op has graduated to the std dialect, we can update Linalg to use it and remove ops that have become obsolete. As a byproduct, the linalg buffer and associated ops can also disappear.
PiperOrigin-RevId: 279073591
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This CL ports the lowering of linalg.view to the newly introduced std.view.
Differences in implementation relate to std.view having slightly different semantics:
1. a static or dynamic offset can be specified.
2. the size of the (contiguous) shape is passed instead of a range.
3. static size and stride information is extracted from the memref type rather than the range.
Besides these differences, lowering behaves the same.
A future CL will update Linalg to use this unified infrastructure.
PiperOrigin-RevId: 278948853
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current lowering of loops to GPU only supports lowering of loop
nests where the loops mapped to workgroups and workitems are perfectly
nested. Here a new lowering is added to handle lowering of imperfectly
nested loop body with the following properties
1) The loops partitioned to workgroups are perfectly nested.
2) The loop body of the inner most loop partitioned to workgroups can
contain one or more loop nests that are to be partitioned across
workitems. Each individual loops nests partitioned to workitems should
also be perfectly nested.
3) The number of workgroups and workitems are not deduced from the
loop bounds but are passed in by the caller of the lowering as values.
4) For statements within the perfectly nested loop nest partitioned
across workgroups that are not loops, it is valid to have all threads
execute that statement. This is NOT verified.
PiperOrigin-RevId: 277958868
|
|
|
|
|
|
|
|
|
|
| |
Upstream LLVM gained support for #ifndef with https://reviews.llvm.org/D61888
This is changed mechanically via the following command:
find . -name "*.td" -exec sed -i -e ':a' -e 'N' -e '$!ba' -e 's/#ifdef \([A-Z_]*\)\n#else/#ifndef \1/g' {} \;
PiperOrigin-RevId: 277789427
|
|
|
|
| |
PiperOrigin-RevId: 277033167
|
|
|
|
| |
PiperOrigin-RevId: 276859463
|
|
|
|
| |
PiperOrigin-RevId: 276440911
|
|
|
|
|
|
| |
Rename GPU op names from gpu_Foo to GPU_FooOp.
PiperOrigin-RevId: 275882232
|
|
|
|
|
|
| |
Closes tensorflow/mlir#177
PiperOrigin-RevId: 275692653
|
|
|
|
| |
PiperOrigin-RevId: 275631166
|
|
|
|
|
|
|
|
| |
nvvm.shfl.sync.bfly optionally returns a predicate whether source lane was active. Support for this was added to clang in https://reviews.llvm.org/D68892.
Add an optional 'pred' unit attribute to the instruction to return this predicate. Specify this attribute in the partial warp reduction so we don't need to manually compute the predicate.
PiperOrigin-RevId: 275616564
|
|
|
|
|
|
|
|
|
|
| |
A VectorTypeCastOp can only be used to lower between statically sized contiguous memrefs of scalar and matching vector type. The sizes and strides are thus fully static and easy to determine.
A relevant test is added.
This is a step towards solving tensorflow/mlir#189.
PiperOrigin-RevId: 275538981
|
|
|
|
|
|
|
| |
This unbreaks the `cmake -G Ninja ../llvm -DLLVM_BUILD_EXAMPLES=ON -DLLVM_TARGETS_TO_BUILD="host"`
in my local OSS build
PiperOrigin-RevId: 275452330
|
|
|
|
|
|
|
|
| |
Adding gen table for rewrite patterns from GPU to NVVM dialect.
Copy missing op documentation from GPUOps.td to GPU.md.
PiperOrigin-RevId: 275419588
|
|
|
|
|
|
| |
These don't add any value, and some are even more restrictive than the respective static 'get' method.
PiperOrigin-RevId: 275391240
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Makes the spv.module generated by the GPU to SPIR-V conversion SPIR-V
spec compliant (validated using spirv-val from Vulkan tools).
1) Separate out the VulkanLayoutUtils from
DecorateSPIRVCompositeTypeLayoutPass to make it reusable within the
Type converter in SPIR-V lowering infrastructure. This is used to
compute the layout of the !spv.struct used in global variable type
description.
2) Set the capabilities of the spv.module to Shader (needed for use of
Logical Memory Model, and the extensions to
SPV_KHR_storage_buffer_storage_class for use of Storage Buffer)
PiperOrigin-RevId: 275081486
|
|
|
|
|
|
|
|
|
|
| |
In addition to specifying the type of accumulation through the 'op' attribute, the accumulation can now also be specified as arbitrary code region.
Adds a gpu.yield op to specify the result of the accumulation.
Also support more types (integers) and accumulations (mul).
PiperOrigin-RevId: 275065447
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current SignatureConversion framework (part of DialectConversion)
allows remapping input arguments to a function from 1->0, 1->1 or
1->many arguments during conversion. Another case is where the
argument itself is dropped, but it's use are remapped to another
Value*.
An example of this is: The Vulkan/SPIR-V spec requires entry functions
to be of type void(void). The GPU -> SPIR-V conversion implemented
this without having the DialectConversion framework track the
remapping that lead to some undefined behavior. The changes here
addresses that.
PiperOrigin-RevId: 275059656
|
|
|
|
|
|
| |
This hook is useful when an operation is known to be dead, and no replacement values make sense.
PiperOrigin-RevId: 275052756
|
|
|
|
|
|
|
| |
This CL adds a missing lowering for splat of multi-dimensional vectors.
Additional support is also added to the runtime utils library to allow printing memrefs with such vectors.
PiperOrigin-RevId: 274794723
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Originally, the lowering of `alloc` operations has been computing the number of
bytes to allocate when lowering based on the properties of MLIR type. This does
not take into account type legalization that happens when compiling LLVM IR
down to target assembly. This legalization can widen the type, potentially
leading to out-of-bounds accesses to `alloc`ed data due to mismatches between
address computation that takes the widening into account and allocation that
does not. Use the LLVM IR's equivalent of `sizeof` to compute the number of
bytes to be allocated:
%0 = getelementptr %type* null, %indexType 0
%1 = ptrtoint %type* %0 to %indexType
adapted from
http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt
PiperOrigin-RevId: 274159900
|
|
|
|
|
|
|
|
|
|
|
|
| |
- dropping what looks like outdated code post some of the previous
updates
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closes tensorflow/mlir#179
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/179 from bondhugula:llfix 2a72ea441fe1b3924802273ffbe9870afeb90f91
PiperOrigin-RevId: 274158273
|
|
|
|
| |
PiperOrigin-RevId: 274154655
|
|
|
|
| |
PiperOrigin-RevId: 274152154
|
|
|
|
|
|
|
|
|
|
|
|
| |
In Standard to LLVM dialect conversion, the binary op conversion pattern
implicitly assumed some operands were of LLVM IR dialect type. This is not
necessarily true, for example if the Ops that produce those operands did not
match the existing convresion patterns. Check if all operands are of LLVM IR
dialect type and if not, fail to patch the binary op pattern.
Closes tensorflow/mlir#168
PiperOrigin-RevId: 274063207
|
|
|
|
|
|
|
|
|
|
|
| |
The lowering is specified as a pattern and is done only if the result
is a SPIR-V scalar type or vector type.
Handling ConstantOp with index return type needs special handling
since SPIR-V dialect does not have index types. Based on the bitwidth
of the attribute value, either i32 or i64 is chosen.
Other constant lowerings are left as a TODO.
PiperOrigin-RevId: 274056805
|
|
|
|
| |
PiperOrigin-RevId: 273948293
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function-like operation allows one to define functions that have wrapped
LLVM IR function type, in particular variadic functions. The operation was
added in parallel to the existing lowering flow, this commit only switches the
flow to use it.
Using a custom function type makes the LLVM IR dialect type system more
consistent and avoids complex conversion rules for functions that previously
had to use the built-in function type instead of a wrapped LLVM IR dialect type
and perform conversions during the analysis.
PiperOrigin-RevId: 273910855
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The lowering infrastructure needs to be enhanced to lower into a
spv.Module that is consistent with the SPIR-V spec. The following
changes are needed
1) The Vulkan/SPIR-V validation rules dictates entry functions to have
signature of void(void). This requires changes to the function
signature conversion infrastructure within the dialect conversion
framework. When an argument is dropped from the original function
signature, a function can be specified that when invoked will return
the value to use as a replacement for the argument from the original
function.
2) Some changes to the type converter to make the converted type
consistent with the Vulkan/SPIR-V validation rules,
a) Add support for converting dynamically shaped tensors to
spv.rtarray type.
b) Make the global variable of type !spv.ptr<!spv.struct<...>>
3) Generate the entry point operation for the kernel functions and
automatically compute all the interface variables needed
PiperOrigin-RevId: 273784229
|
|
|
|
| |
PiperOrigin-RevId: 273707610
|
|
|
|
|
|
|
|
|
|
|
| |
Originally, we were attaching attributes containing CUBIN blobs to the kernel
function called by `gpu.launch_func`. This kernel is now contained in a nested
module that is used as a compilation unit. Attach compiled CUBIN blobs to the
module rather than to the function since we were compiling the module. This
also avoids duplication of the attribute on multiple kernels within the same
module.
PiperOrigin-RevId: 273497303
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Originally, the CUBIN getter function was introduced as a mechanism to
circumvent the absence of globals in the LLVM dialect. It would allocate memory
and populate it with the CUBIN data. LLVM dialect now supports globals and they
are already used to store CUBIN data, making the getter function a trivial
address computation of a global. Emit the address computation directly at the
place of `gpu.launch_func` instead of putting it in a function and calling it.
This simplifies the conversion flow and prepares it for using the
DialectConversion infrastructure.
PiperOrigin-RevId: 273496221
|
|
|
|
|
|
|
|
|
|
|
| |
Now that the accessor function is a trivial getter of the global variable, it
makes less sense to have the getter generation as a separate pass. Move the
getter generation into the lowering of `gpu.launch_func` to CUDA calls. This
change is mostly code motion, but the process can be simplified further by
generating the addressof inplace instead of using a call. This is will be done
in a follow-up.
PiperOrigin-RevId: 273492517
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The kernel function called by gpu.launch_func is now placed into an isolated
nested module during the outlining stage to simplify separate compilation.
Until recently, modules did not have names and could not be referenced. This
limitation was circumvented by introducing a stub kernel at the same name at
the same nesting level as the module containing the actual kernel. This
relation is only effective in one direction: from actual kernel function to its
launch_func "caller".
Leverage the recently introduced symbol name attributes on modules to refer to
a specific nested module from `gpu.launch_func`. This removes the implicit
connection between the identically named stub and kernel functions. It also
enables support for `gpu.launch_func`s to call different kernels located in the
same module.
PiperOrigin-RevId: 273491891
|
|
|
|
|
|
| |
gpu.all_reduce now supports block sizes that are not multiple of 32.
PiperOrigin-RevId: 273255204
|
|
|
|
|
|
|
|
|
|
| |
MemRefType:;getDynamicStrideOrOffset() method - NFC
This fixes global ODR-use issues, some of which manifest in Parser.cpp.
Fixes tensorflow/mlir#167.
PiperOrigin-RevId: 272886347
|