This reverts commit 4624a1e8ac8a3f69cc887403b976f538f587744a, which was
causing problems downstream.
(cherry picked from commit 0133cc60e4e230ee2c176c23eff5aa2f4ee17a75)

Summary:
Instead of having code constantly check for an attribute on a module,
this represents the distinction with a dedicated op. The op can also be
used to provide better filtering.
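A hedged before/after sketch (D72336 introduced a dedicated gpu.module op; the exact syntax of that era may differ):

  // Before: a plain module tagged with an attribute, found by name lookup.
  module @kernels attributes {gpu.kernel_module} {}
  // After: a dedicated op that passes can match and filter on directly.
  gpu.module @kernels {}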
Reviewers: herhut, mravishankar, antiagainst, rriddle
Reviewed By: herhut, antiagainst, rriddle
Subscribers: liufengdb, aartbik, jholewinski, mgorny, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, csigg, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72336

Summary:
This diff adds missing GPU lowering ops to MLIR.
Reviewers: herhut, pifon2a, ftynse
Tags: #pre-merge_beta_testing, #llvm
Differential Revision: https://reviews.llvm.org/D72439

Remove Value::operator* and Value::operator-> now that Value is
properly value-typed.
Summary: These were temporary methods used to simplify the transition.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D72548

ValuePtr was a temporary typedef during the transition to a value-typed Value.
PiperOrigin-RevId: 286945714

PiperOrigin-RevId: 286906740

NFC: Introduce new ValuePtr/ValueRef typedefs to simplify the transition to
Value being value-typed.
This is an initial step to refactoring the representation of OpResult as proposed in: https://groups.google.com/a/tensorflow.org/g/mlir/c/XXzzKhqqF_0/m/v6bKb08WCgAJ
This change will make it much simpler to incrementally transition all of the existing code to use value-typed semantics.
PiperOrigin-RevId: 286844725

This will allow us to lower most of gpu.all_reduce (when all_reduce
doesn't exist in the target dialect) within the GPU dialect, and only do
target-specific lowering for the shuffle op.
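As a hedged illustration of the split, a GPU-dialect butterfly shuffle might look like the following (gpu.shuffle name and syntax assumed from the dialect of that era); only this op then needs a target-specific lowering such as nvvm.shfl.sync.bfly, while the rest of the reduction stays in the GPU dialect:

  %shfl, %valid = gpu.shuffle %val, %offset, %width xor : f32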
PiperOrigin-RevId: 286548256

in `mlir` namespace.
Aside from being cleaner, this also makes the codebase more consistent.
PiperOrigin-RevId: 286206974

When memory attributions are present in `gpu.func`, require that they are of
memref type and live in memory spaces 3 and 5 for workgroup and private memory
attributions, respectively. Adapt the conversion from the GPU dialect to the
NVVM dialect to drop the private memory space from attributions as NVVM is able
to model them as local `llvm.alloca`s in the default memory space.
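A hedged sketch of a gpu.func carrying both kinds of attributions under this rule (hypothetical kernel name; memory space 3 for workgroup, 5 for private):

  gpu.func @kernel(%arg0: f32)
      workgroup(%wg: memref<32xf32, 3>)
      private(%pv: memref<4xf32, 5>) kernel {
    gpu.return
  }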
PiperOrigin-RevId: 286161763

This updates the lowering pipelines from the GPU dialect to lower-level
dialects (NVVM, SPIRV) to use the recently introduced gpu.func operation
instead of a standard function annotated with an attribute. In particular, the
kernel outlining is updated to produce gpu.func instead of std.func and the
individual conversions are updated to consume gpu.funcs and disallow standard
funcs after legalization, if necessary. The attribute "gpu.kernel" is preserved
in the generic syntax, but can also be used with the custom syntax on
gpu.funcs. The special kind of function for GPU allows one to use additional
features such as memory attribution.
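A hedged sketch of what outlining now produces (hypothetical function name; previously the kernel was a standard function carrying the attribute):

  // Before: standard function annotated with an attribute.
  func @step(%arg0: f32) attributes {gpu.kernel} { return }
  // After: a gpu.func, with the kernel marker in the custom syntax.
  gpu.func @step(%arg0: f32) kernel { gpu.return }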
PiperOrigin-RevId: 285822272

The clamp value determines the returned predicate. Previously, the clamp value was fixed to 31, so the predicate was always true. This is incorrect for partial warp reductions, but went unnoticed because the returned value happened to be zero (though it could have been anything).
PiperOrigin-RevId: 285343160

PiperOrigin-RevId: 285162061

Both work for the current use case, but the latter allows implementing
prefix sums and is a little easier to understand for partial warps.
PiperOrigin-RevId: 285145287

Remove nested anonymous namespace.
PiperOrigin-RevId: 284987357

This enables combining the patterns with other patterns into larger lowerings.
PiperOrigin-RevId: 284979271

This allows users to provide operand_range and result_range in builder.create<> calls, instead of requiring an explicit copy into a separate data structure like SmallVector/std::vector.
PiperOrigin-RevId: 284360710

Accept the address space of the global as a builder argument when constructing
an LLVM::GlobalOp instance. This decreases the reliance of LLVM::GlobalOp users
on the internal name of the attribute used for this purpose. Update several
uses of the address space in GPU to NVVM conversion.
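A hedged sketch of the resulting IR (assuming the address space is printed as an addr_space attribute, which the builder argument now spares callers from naming; era type spelling):

  llvm.mlir.global internal @wg_buffer(1.0 : f32) {addr_space = 3 : i32} : !llvm.float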
PiperOrigin-RevId: 284233254

GPU functions use memory attributions, a combination of Op attributes and
region arguments, to specify function-wide buffers placed in workgroup or
private memory spaces. Introduce a lowering pattern for GPU functions to be
converted to LLVM functions taking into account memory attributions. Workgroup
attributions get transformed into module-level globals with unique names
derived from function names. Private attributions get converted into
llvm.allocas inside the function body. In both cases, we inject at the
beginning of the function the IR that obtains the raw pointer to the data and
populates a MemRef descriptor based on the MemRef type of the buffer, making
attributions compose with the rest of the MemRef lowering and transparent for
use with std.load and std.store. While using raw pointers instead of
descriptors might have been more efficient, it is better implemented as a
canonicalization or a separate transformation so that non-attribution memrefs
could also benefit from it.
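As a hedged sketch of the private-side rewrite (hypothetical names, era LLVM dialect syntax; the workgroup side instead becomes a uniquely named module-level llvm.mlir.global, and the MemRef descriptor plumbing is omitted here):

  // A private attribution becomes an llvm.alloca at function entry:
  %c1 = llvm.mlir.constant(1 : i64) : !llvm.i64
  %priv = llvm.alloca %c1 x !llvm<"[4 x float]"> : (!llvm.i64) -> !llvm<"[4 x float]*">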
PiperOrigin-RevId: 284208396

LLVM IR supports linkage on global objects such as global variables and
functions. Introduce the Linkage attribute into the LLVM dialect, backed by an
integer storage. Use this attribute on LLVM::GlobalOp and make it mandatory.
Implement parsing/printing of the attribute and conversion to LLVM IR.
See tensorflow/mlir#277.
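A hedged sketch of the custom syntax with the now-mandatory linkage keyword (era type spelling; linkage names follow LLVM IR):

  llvm.mlir.global internal @x(42 : i32) : !llvm.i32
  llvm.mlir.global linkonce @y(1.0 : f32) : !llvm.float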
PiperOrigin-RevId: 283309328

This makes the flags consistent with the naming scheme used elsewhere in the
codebase for dialect conversions.
PiperOrigin-RevId: 281027517

PiperOrigin-RevId: 276440911

Rename GPU op names from gpu_Foo to GPU_FooOp.
PiperOrigin-RevId: 275882232

nvvm.shfl.sync.bfly optionally returns a predicate indicating whether the source lane was active. Support for this was added to clang in https://reviews.llvm.org/D68892.
Add an optional 'pred' unit attribute to the instruction to return this predicate. Specify this attribute in the partial warp reduction so we don't need to manually compute the predicate.
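A hedged sketch (the 'pred' spelling comes from this message; the struct result type and the extractions are assumptions):

  %shfl = nvvm.shfl.sync.bfly %mask, %val, %offset, %clamp {pred} : !llvm<"{ float, i1 }">
  %value = llvm.extractvalue %shfl[0] : !llvm<"{ float, i1 }">
  %active = llvm.extractvalue %shfl[1] : !llvm<"{ float, i1 }">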
PiperOrigin-RevId: 275616564

Add a gen table for rewrite patterns from GPU to NVVM dialect.
Copy missing op documentation from GPUOps.td to GPU.md.
PiperOrigin-RevId: 275419588

In addition to specifying the type of accumulation through the 'op' attribute, the accumulation can now also be specified as an arbitrary code region.
Adds a gpu.yield op to specify the result of the accumulation.
Also support more types (integers) and accumulations (mul).
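A hedged sketch of the region form with a 'mul' accumulation (generic syntax; names assumed):

  %prod = "gpu.all_reduce"(%operand) ({
  ^bb0(%lhs: f32, %rhs: f32):
    %m = mulf %lhs, %rhs : f32
    "gpu.yield"(%m) : (f32) -> ()
  }) : (f32) -> f32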
PiperOrigin-RevId: 275065447

PiperOrigin-RevId: 273948293

This function-like operation allows one to define functions that have a wrapped
LLVM IR function type, in particular variadic functions. The operation was
added in parallel to the existing lowering flow; this commit only switches the
flow to use it.
Using a custom function type makes the LLVM IR dialect type system more
consistent and avoids complex conversion rules for functions that previously
had to use the built-in function type instead of a wrapped LLVM IR dialect type
and perform conversions during the analysis.
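A hedged example of the new op declaring a variadic external function, a classic case the built-in function type could not express (era type spelling):

  llvm.func @printf(!llvm<"i8*">, ...) -> !llvm.i32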
PiperOrigin-RevId: 273910855

PiperOrigin-RevId: 273707610

gpu.all_reduce now supports block sizes that are not a multiple of 32.
PiperOrigin-RevId: 273255204

The GPUIndexIntrinsicOpLowering template is currently used by the code in both the GPUToNVVM and GPUToROCDL dirs.
Move it to a common location to remove code duplication.
Closes tensorflow/mlir#163
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/163 from deven-amd:deven-refactor-gpu-index-ops-lowering b8dc2a5f5353df196039b6ff2ad42106028693ed
PiperOrigin-RevId: 272863297

PiperOrigin-RevId: 272851237

Perform the second reduce only with the first warp. This requires an additional __syncthreads(), but doesn't need special handling when the last warp is small. This simplifies support for block sizes that are not a multiple of 32.
Supporting partial warp reduce will be done in a separate CL.
PiperOrigin-RevId: 272168917

local workgroup -> block, subgroup -> warp, invocation -> thread.
PiperOrigin-RevId: 271946342

The reduction operation is currently fixed to "add", and the scope is fixed to "workgroup".
The implementation is currently limited to sizes that are multiples of 32 (the warp size) and no larger than 1024.
PiperOrigin-RevId: 271290265

Roll forward of commit 5684a12.
When outlining GPU kernels, put the kernel function inside a nested module. Then use a nested pipeline to generate the cubins, independently per kernel. In a final pass, move the cubins back to the parent module.
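A hedged sketch of the structure after outlining (hypothetical names; the nested pipeline attaches the generated cubin to each nested module before a final pass moves it back):

  module {
    // One nested module per kernel, compiled to a cubin independently.
    module @vecadd_kernel attributes {gpu.kernel_module} {
      func @vecadd_kernel() attributes {gpu.kernel} {
        return
      }
    }
  }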
PiperOrigin-RevId: 270639748

PiperOrigin-RevId: 270126672

When outlining GPU kernels, put the kernel function inside a nested module. Then use a nested pipeline to generate the cubins, independently per kernel. In a final pass, move the cubins back to the parent module.
PiperOrigin-RevId: 269987720

These directives were temporary during the generalization of FunctionPass/ModulePass to OpPass.
PiperOrigin-RevId: 268970259

Instead of lowering the program in two steps (Standard->LLVM followed
by GPU->NVVM), leading to invalid IR in between, the runner now uses
one pattern based rewrite step to go directly from Standard+GPU to
LLVM+NVVM.
PiperOrigin-RevId: 265861934

PiperOrigin-RevId: 264193915

Switch to the C++14 standard method as llvm::make_unique has been removed (
https://reviews.llvm.org/D66259). Also mark some targets as c++14 to ease next
integrates.
PiperOrigin-RevId: 263953918

Since raw pointers are always passed around for IR constructs without
implying any ownership transfer, it can be error prone to have implicit
ownership transferred the same way.
For example, this code can seem harmless:
  Pass *pass = ...;   // heap-allocated pass, held by raw pointer
  pm.addPass(pass);   // the pass manager takes ownership of `pass`
  pm.addPass(pass);   // ownership is taken a second time: double delete
  pm.run(module);
PiperOrigin-RevId: 263053082

Per tacit agreement, individual dialects should now live in lib/Dialect/Name
with headers in include/mlir/Dialect/Name and tests in test/Dialect/Name.
PiperOrigin-RevId: 259896851

GPU dialect operations (launch and launch_func) use `index` type for thread and
block index values inside the kernel, for compatibility with affine loops.
NVVM dialect operations, following the NVVM intrinsics, use `!llvm.i32` type,
which does not necessarily have the same bit width as the lowered `index` type.
Optionally sign-extend (indices are signed) or truncate the result of the NVVM
dialect operation to the bit width of the lowered `index` type before passing
it to other operations. This behavior is consistent with `std.index_cast`. We
cannot use the latter since we are targeting LLVM dialect types directly,
rather than standard integer types.
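A hedged sketch of the cast (assuming `index` lowers to !llvm.i64 here; era syntax):

  %tid = nvvm.read.ptx.sreg.tid.x : !llvm.i32
  // Sign-extend to the bit width of the lowered index type, as with std.index_cast.
  %tid_idx = llvm.sext %tid : !llvm.i32 to !llvm.i64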
PiperOrigin-RevId: 254980868

Rename `createLowerGpuOpsToNNVMOpsPass` to `createLowerGpuOpsToNVVMOpsPass`.
PiperOrigin-RevId: 253801577

PiperOrigin-RevId: 253767688

PiperOrigin-RevId: 253551452