- the [begin, end) range identified for copying could end in the middle of a
  block, which made hoisting invalid in some cases. Change the range
  identification to always end at the end of a block.
- add a test case to exercise these (with fast memory capacity set to the
  minimum so that single-element memref buffers are generated at the
  innermost loop)
- the begin/end of the block range for data copying was being conflated with
  the insertion points for the copy-in and copy-out code. In cases where we
  choose to hoist transfers, these are distinct.
- when copy loops are single-iteration, promote their bodies at the end of
  the pass.
- change the default fast memory space to 1 (setting it to zero made the pass
  generate DMA ops that won't verify in the default case, since the DMA ops
  check that the src/dest memref spaces differ).
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Co-Authored-By: Mehdi Amini <joker.eph@gmail.com>
Closes tensorflow/mlir#88
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/88 from bondhugula:datacopy 88697267c45e850c3ced87671e16e4a930c02a42
PiperOrigin-RevId: 266980911
This interface will allow for providing hooks to interop with operation folding. The first hook, 'shouldMaterializeInto', will allow for controlling which region materialized constants are inserted into. The folder will generally materialize constants into the top-level isolated region; this hook allows for materializing into a lower-level ancestor region when that is more profitable or correct.
PiperOrigin-RevId: 266702972
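A minimal sketch of how a dialect might opt in to this hook, assuming the interface base class is OpFolderDialectInterface as described above; `MyDialectFoldInterface` and the "my.launch" op name are purely illustrative:
```
// Sketch only: redirect constant materialization for a launch-like op.
struct MyDialectFoldInterface : public OpFolderDialectInterface {
  using OpFolderDialectInterface::OpFolderDialectInterface;

  /// Keep materialized constants inside `region` itself instead of hoisting
  /// them into the top-level isolated region.
  bool shouldMaterializeInto(Region *region) const override {
    return region->getParentOp()->getName().getStringRef() == "my.launch";
  }
};
```
The dialect would then register this interface (e.g. via addInterfaces<MyDialectFoldInterface>() in its constructor).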
pointer (NFC)
This is a convenient utility around the existing `getUsedValuesDefinedAbove()`
that takes two regions.
PiperOrigin-RevId: 266686854
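For reference, a usage sketch of the two-region form (Value was pointer-typed in MLIR at this time; variable names here are assumptions, and `region` is nested inside `limit`):
```
// Collect all values used within `region` but defined above (outside) `limit`.
llvm::SetVector<Value *> valuesDefinedAbove;
getUsedValuesDefinedAbove(region, limit, valuesDefinedAbove);
for (Value *v : valuesDefinedAbove)
  v->dump();  // inspect each captured value
```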
AffineForOps are themselves pure and can be removed if they contain no internal operations.
PiperOrigin-RevId: 266481293
This is done by providing a walk callback that returns a WalkResult. This result is either `advance` or `interrupt`. `advance` means that the walk should continue, whereas `interrupt` signals that the walk should stop immediately. An example is shown below:
```
auto result = op->walk([](Operation *op) {
  if (some_invariant)
    return WalkResult::interrupt();
  return WalkResult::advance();
});
if (result.wasInterrupted())
  ...;
```
PiperOrigin-RevId: 266436700
This change refactors and cleans up the implementation of the operation walk methods. After this refactoring, the explicit template parameter for the operation type is no longer needed for op-specific walks. For example:
```
op->walk<AffineForOp>([](AffineForOp op) { ... });
```
is now accomplished via:
```
op->walk([](AffineForOp op) { ... });
```
PiperOrigin-RevId: 266209552
- fix operand mapping while cloning sub-blocks to jam; it was incorrect
  for imperfect nests where a def/use crossed sub-blocks
- strengthen/generalize the first test case to cover the previously
  missed scenario
- clean up the other test cases while at it.
Previously, unroll-jamming the following nest
```
affine.for %arg0 = 0 to 2048 {
  %0 = alloc() : memref<512x10xf32>
  affine.for %arg1 = 0 to 10 {
    %1 = affine.load %0[%arg0, %arg1] : memref<512x10xf32>
  }
  dealloc %0 : memref<512x10xf32>
}
```
would yield
```
%0 = alloc() : memref<512x10xf32>
%1 = affine.apply #map0(%arg0)
%2 = alloc() : memref<512x10xf32>
affine.for %arg1 = 0 to 10 {
  %4 = affine.load %0[%arg0, %arg1] : memref<512x10xf32>
  %5 = affine.apply #map0(%arg0)
  %6 = affine.load %0[%5, %arg1] : memref<512x10xf32>
}
dealloc %0 : memref<512x10xf32>
%3 = affine.apply #map0(%arg0)
dealloc %0 : memref<512x10xf32>
```
instead of
```
module {
  affine.for %arg0 = 0 to 2048 step 2 {
    %0 = alloc() : memref<512x10xf32>
    %1 = affine.apply #map0(%arg0)
    %2 = alloc() : memref<512x10xf32>
    affine.for %arg1 = 0 to 10 {
      %4 = affine.load %0[%arg0, %arg1] : memref<512x10xf32>
      %5 = affine.apply #map0(%arg0)
      %6 = affine.load %2[%5, %arg1] : memref<512x10xf32>
    }
    dealloc %0 : memref<512x10xf32>
    %3 = affine.apply #map0(%arg0)
    dealloc %2 : memref<512x10xf32>
  }
}
```
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closes tensorflow/mlir#98
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/98 from bondhugula:ujam ddbc853f69b5608b3e8ff9b5ac1f6a5a0bb315a4
PiperOrigin-RevId: 266073460
Refactor replaceAllMemRefUsesWith to split it into two methods: the new
method does the replacement on a single op, and is used by the existing
one.
- make the methods return LogicalResult instead of bool
- Earlier, when replacement failed (due to non-dereferencing uses of the
  memref), the ops that had already been processed would have been
  replaced, leaving the IR in an inconsistent state. Now, a pass is made
  over all ops to first check for non-dereferencing uses, and only then is
  the replacement performed. No test cases were affected because all
  clients of this method were already checking for non-dereferencing uses
  before calling it (for other reasons). This isn't true for a use case in
  another upcoming PR (scalar replacement); clients can now bail out with
  consistent IR on failure of replaceAllMemRefUsesWith. Add a test case.
- multiple dereferencing uses of the same memref in a single op are
  possible (we have no such use cases/scenarios), and this has always
  remained unsupported. Add an assertion for this.
- minor fix to another pipeline-data-transfer test case.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closes tensorflow/mlir#87
PiperOrigin-RevId: 265808183
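A hedged sketch of the resulting calling convention (the full argument list is elided here; only the LogicalResult behavior described above is shown):
```
// If `oldMemRef` has non-dereferencing uses, the utility now fails without
// having rewritten anything, so the caller can safely give up with the IR
// left in a consistent state.
if (failed(replaceAllMemRefUsesWith(oldMemRef, newMemRef /*, ...*/)))
  return;
```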
The context can easily be recovered from the Location in these situations.
PiperOrigin-RevId: 265578574
This fixes a bug where ops with inner ops could be folded while the inner ops were still being visited.
PiperOrigin-RevId: 265475780
Fixes tensorflow/mlir#72
PiperOrigin-RevId: 265097597
PiperOrigin-RevId: 264482571
PiperOrigin-RevId: 264277760
PiperOrigin-RevId: 264193915
Switch to the C++14 standard method, as llvm::make_unique has been removed
(https://reviews.llvm.org/D66259). Also mark some targets as C++14 to ease
upcoming integrates.
PiperOrigin-RevId: 263953918
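The mechanical change is a drop-in replacement; for example (`MyPass` is illustrative):
```
#include <memory>

// Before (removed upstream): auto pass = llvm::make_unique<MyPass>();
// After (C++14 standard library):
auto pass = std::make_unique<MyPass>();
```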
moved.
Often we want to ensure that block arguments are converted before the operations that use them. This refactors the current implementation to be cleaner and to run conversion less frequently, by triggering it when a set of blocks is moved/inlined, or when legalization succeeds.
PiperOrigin-RevId: 263795005
Since raw pointers are always passed around for IR constructs without
implying any ownership transfer, it can be error-prone to have ownership
implicitly transferred the same way. For example this code can seem harmless:
```
Pass *pass = ....
pm.addPass(pass);
pm.addPass(pass);  // same pointer added twice: ownership is taken twice,
                   // leading to a double free
pm.run(module);
```
PiperOrigin-RevId: 263053082
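With a std::unique_ptr-based API, the transfer is explicit and the double-add above can no longer compile silently; a sketch of the resulting usage:
```
// Ownership is visibly handed to the pass manager exactly once.
pm.addPass(std::make_unique<MyPass>());
```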
reference.
The pattern list is not modified by any of these APIs and should thus be passed as const.
PiperOrigin-RevId: 262844002
There are currently several different terms used to refer to a parent IR unit in 'get' methods: getParent/getEnclosing/getContaining. This cl standardizes all of these methods to use 'getParent*'.
PiperOrigin-RevId: 262680287
This will allow for reusing the same pattern list, which may be costly to continually reconstruct, on multiple invocations.
PiperOrigin-RevId: 262664599
This CL is step 2/n towards building a simple, programmable and portable vector abstraction in MLIR that can go all the way down to generating assembly vector code via LLVM's opt and llc tools.
This CL adds the vector.extractelement operation to the MLIR vector dialect as well as the appropriate roundtrip test. Lowering to LLVM will occur in the following CL.
PiperOrigin-RevId: 262545089
These methods will allow replacing the uses of results either with an existing operation that has the same number of results, or with a range of values. This removes a number of hand-rolled result replacement loops and simplifies replacement for operations with multiple results.
PiperOrigin-RevId: 262206600
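A sketch of the two forms described above (assuming the methods live on Operation, as the description suggests; `newOp` and `newValues` are illustrative):
```
// Replace all result uses with the results of another op that has the same
// number of results.
op->replaceAllUsesWith(newOp);

// Or replace them with an arbitrary range of values, one per result.
op->replaceAllUsesWith(newValues);
```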
via GreedyPatternRewriteDriver::replaceOp.
This fixes a bug where ops inside the parent op are visited even though the parent op has been removed.
PiperOrigin-RevId: 261953580
This CL modifies the LowerLinalgToLoopsPass to use RewritePattern.
This will make it easier to inline Linalg generic functions and regions when emitting to loops in a subsequent CL.
PiperOrigin-RevId: 261894120
This allows for proper forward declaration, as opposed to leaking the internal implementation via a using directive. This also allows for all pattern building to go through 'insert' methods on the OwningRewritePatternList, replacing uses of 'push_back' and 'RewriteListBuilder'.
PiperOrigin-RevId: 261816316
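A sketch of the new insertion style (pattern class names are illustrative):
```
OwningRewritePatternList patterns;
// Construct and insert multiple patterns in one call; previously this
// required push_back or RewriteListBuilder.
patterns.insert<MyPatternA, MyPatternB>(context);
```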
When inlining the declaration for llvm::DenseSet/DenseMap in the mlir
namespace from a forward declaration, clang does not take the default
for the template parameters if they are declared later.
```
namespace llvm {
template <typename Foo>
class DenseMap;
}
namespace mlir {
// The alias binds to the declaration above, which has no defaults.
using llvm::DenseMap;
}
namespace llvm {
template <typename Foo = int>
class DenseMap {};
}
namespace mlir {
DenseMap<> map;  // error: the later-declared default is not picked up
}
```
PiperOrigin-RevId: 261495612
The AffineDataCopyGeneration pass relied on command-line flags for internal
logic in several places, which made it unusable in a library context (i.e.,
outside a standalone mlir-opt binary that does the command-line parsing).
Define configuration flags in the constructor instead, and initialize them
to the command-line-based defaults to maintain the original behavior.
PiperOrigin-RevId: 261322364
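A hypothetical shape of the resulting library usage (the factory and parameter names are illustrative, not the exact signature):
```
// Configure the pass in code; defaults mirror the command-line flags.
pm.addPass(createAffineDataCopyGenerationPass(
    /*slowMemorySpace=*/0, /*fastMemorySpace=*/1,
    /*minDmaTransferSize=*/1024,
    /*fastMemCapacityBytes=*/32 * 1024));
```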
Explicit copying to contiguous buffers is a standard technique to avoid
conflict misses and TLB misses, and to improve hardware prefetching
performance. When done in conjunction with cache tiling, it nearly
eliminates all cache conflict and TLB misses, and a single hardware
prefetch stream is needed per data tile.
- generalize/extend the DMA generation pass (renamed the data copying pass)
  to perform either point-wise explicit copies to fast memory buffers or
  DMAs (depending on a command-line option). All logic is the same as the
  erstwhile -dma-generate.
- -affine-dma-generate is now renamed -affine-data-copy; when the -dma flag
  is provided, DMAs are generated; otherwise explicit (point-wise) copy
  loops are generated by default.
- point-wise copying could be used for CPUs (or GPUs); some indicative
  performance numbers with a C version of the MLIR code compiled with and
  without this optimization (about a 2x improvement here).
  With a matmul on 4096^2 matrices on a single core of an Intel Core i7
  Skylake i7-8700K with clang 8.0.0:
  clang -O3:                                  518s
  clang -O3 with MLIR tiling (128x128):       24.5s
  clang -O3 with MLIR tiling + data copying:  12.4s
  (code equivalent to test/Transforms/data-copy.mlir func @matmul)
- fix some misleading comments.
- change the default fast-mem space to 0 (more intuitive now that the
  default copy generation uses point-wise copies instead of DMAs)
On a simple 3-d matmul loop nest, code generated with -affine-data-copy:
```
affine.for %arg3 = 0 to 4096 step 128 {
  affine.for %arg4 = 0 to 4096 step 128 {
    %0 = affine.apply #map0(%arg3, %arg4)
    %1 = affine.apply #map1(%arg3, %arg4)
    %2 = alloc() : memref<128x128xf32, 2>
    // Copy-in Out matrix.
    affine.for %arg5 = 0 to 128 {
      %5 = affine.apply #map2(%arg3, %arg5)
      affine.for %arg6 = 0 to 128 {
        %6 = affine.apply #map2(%arg4, %arg6)
        %7 = load %arg2[%5, %6] : memref<4096x4096xf32>
        affine.store %7, %2[%arg5, %arg6] : memref<128x128xf32, 2>
      }
    }
    affine.for %arg5 = 0 to 4096 step 128 {
      %5 = affine.apply #map0(%arg3, %arg5)
      %6 = affine.apply #map1(%arg3, %arg5)
      %7 = alloc() : memref<128x128xf32, 2>
      // Copy-in LHS.
      affine.for %arg6 = 0 to 128 {
        %11 = affine.apply #map2(%arg3, %arg6)
        affine.for %arg7 = 0 to 128 {
          %12 = affine.apply #map2(%arg5, %arg7)
          %13 = load %arg0[%11, %12] : memref<4096x4096xf32>
          affine.store %13, %7[%arg6, %arg7] : memref<128x128xf32, 2>
        }
      }
      %8 = affine.apply #map0(%arg5, %arg4)
      %9 = affine.apply #map1(%arg5, %arg4)
      %10 = alloc() : memref<128x128xf32, 2>
      // Copy-in RHS.
      affine.for %arg6 = 0 to 128 {
        %11 = affine.apply #map2(%arg5, %arg6)
        affine.for %arg7 = 0 to 128 {
          %12 = affine.apply #map2(%arg4, %arg7)
          %13 = load %arg1[%11, %12] : memref<4096x4096xf32>
          affine.store %13, %10[%arg6, %arg7] : memref<128x128xf32, 2>
        }
      }
      // Compute.
      affine.for %arg6 = #map7(%arg3) to #map8(%arg3) {
        affine.for %arg7 = #map7(%arg4) to #map8(%arg4) {
          affine.for %arg8 = #map7(%arg5) to #map8(%arg5) {
            %11 = affine.load %7[-%arg3 + %arg6, -%arg5 + %arg8] : memref<128x128xf32, 2>
            %12 = affine.load %10[-%arg5 + %arg8, -%arg4 + %arg7] : memref<128x128xf32, 2>
            %13 = affine.load %2[-%arg3 + %arg6, -%arg4 + %arg7] : memref<128x128xf32, 2>
            %14 = mulf %11, %12 : f32
            %15 = addf %13, %14 : f32
            affine.store %15, %2[-%arg3 + %arg6, -%arg4 + %arg7] : memref<128x128xf32, 2>
          }
        }
      }
      dealloc %10 : memref<128x128xf32, 2>
      dealloc %7 : memref<128x128xf32, 2>
    }
    %3 = affine.apply #map0(%arg3, %arg4)
    %4 = affine.apply #map1(%arg3, %arg4)
    // Copy out result matrix.
    affine.for %arg5 = 0 to 128 {
      %5 = affine.apply #map2(%arg3, %arg5)
      affine.for %arg6 = 0 to 128 {
        %6 = affine.apply #map2(%arg4, %arg6)
        %7 = affine.load %2[%arg5, %arg6] : memref<128x128xf32, 2>
        store %7, %arg2[%5, %6] : memref<4096x4096xf32>
      }
    }
    dealloc %2 : memref<128x128xf32, 2>
  }
}
```
With -affine-data-copy -dma:
```
affine.for %arg3 = 0 to 4096 step 128 {
  %0 = affine.apply #map3(%arg3)
  %1 = alloc() : memref<128xf32, 2>
  %2 = alloc() : memref<1xi32>
  affine.dma_start %arg2[%arg3], %1[%c0], %2[%c0], %c128_0 : memref<4096xf32>, memref<128xf32, 2>, memref<1xi32>
  affine.dma_wait %2[%c0], %c128_0 : memref<1xi32>
  %3 = alloc() : memref<1xi32>
  affine.for %arg4 = 0 to 4096 step 128 {
    %5 = affine.apply #map0(%arg3, %arg4)
    %6 = affine.apply #map1(%arg3, %arg4)
    %7 = alloc() : memref<128x128xf32, 2>
    %8 = alloc() : memref<1xi32>
    affine.dma_start %arg0[%arg3, %arg4], %7[%c0, %c0], %8[%c0], %c16384, %c4096, %c128_2 : memref<4096x4096xf32>, memref<128x128xf32, 2>, memref<1xi32>
    affine.dma_wait %8[%c0], %c16384 : memref<1xi32>
    %9 = affine.apply #map3(%arg4)
    %10 = alloc() : memref<128xf32, 2>
    %11 = alloc() : memref<1xi32>
    affine.dma_start %arg1[%arg4], %10[%c0], %11[%c0], %c128_1 : memref<4096xf32>, memref<128xf32, 2>, memref<1xi32>
    affine.dma_wait %11[%c0], %c128_1 : memref<1xi32>
    affine.for %arg5 = #map3(%arg3) to #map5(%arg3) {
      affine.for %arg6 = #map3(%arg4) to #map5(%arg4) {
        %12 = affine.load %7[-%arg3 + %arg5, -%arg4 + %arg6] : memref<128x128xf32, 2>
        %13 = affine.load %10[-%arg4 + %arg6] : memref<128xf32, 2>
        %14 = affine.load %1[-%arg3 + %arg5] : memref<128xf32, 2>
        %15 = mulf %12, %13 : f32
        %16 = addf %14, %15 : f32
        affine.store %16, %1[-%arg3 + %arg5] : memref<128xf32, 2>
      }
    }
    dealloc %11 : memref<1xi32>
    dealloc %10 : memref<128xf32, 2>
    dealloc %8 : memref<1xi32>
    dealloc %7 : memref<128x128xf32, 2>
  }
  %4 = affine.apply #map3(%arg3)
  affine.dma_start %1[%c0], %arg2[%arg3], %3[%c0], %c128 : memref<128xf32, 2>, memref<4096xf32>, memref<1xi32>
  affine.dma_wait %3[%c0], %c128 : memref<1xi32>
  dealloc %3 : memref<1xi32>
  dealloc %2 : memref<1xi32>
  dealloc %1 : memref<128xf32, 2>
}
```
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closes tensorflow/mlir#50
PiperOrigin-RevId: 261221903
Reported by clang-6.
PiperOrigin-RevId: 260311814
Clipping creates non-affine memory accesses, so use std_load and std_store instead of affine_load and affine_store.
In the future we may also want to fill with the neutral element rather than clip; this would make the accesses affine if we wanted more analyses and transformations to happen post-lowering to point-wise copies.
PiperOrigin-RevId: 260110503
This mode analyzes which operations are legalizable to the given target if a conversion were to be applied, i.e. no rewrites are ever performed even on success. This mode is useful for device partitioning or other utilities that may want to analyze the effect of conversion to different targets before performing it.
The analysis method currently just fills a provided set with the operations that were found to be legalizable. This can be extended in the future to capture more information as necessary.
PiperOrigin-RevId: 259987105
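A sketch of the described usage; the entry-point name and signature below are assumptions based on the description, not confirmed API:
```
// Analysis only: no rewrites are applied; `legalizableOps` is filled with
// the operations that could be legalized to `target`.
llvm::DenseSet<Operation *> legalizableOps;
if (succeeded(applyAnalysisConversion(op, target, patterns, legalizableOps)))
  llvm::errs() << legalizableOps.size() << " ops are legalizable\n";
```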
This CL moves LoopParametricTiling into test/lib as a pass for purely testing purposes.
PiperOrigin-RevId: 259300264
This cl enforces that the conversion of the type signatures for regions, and thus their entry blocks, is handled via ConversionPatterns. A new hook, 'applySignatureConversion', is added to the ConversionPatternRewriter to perform the desired conversion on a region. This also means that the handling of rewriting the signature of a FuncOp is moved to a pattern; a default implementation is provided via 'mlir::populateFuncOpTypeConversionPattern'. This removes the hacky implicit 'dynamically legal' status of FuncOp that was present previously, and leaves it up to the user to decide when/how to convert the signature of a function.
PiperOrigin-RevId: 259161999
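A sketch of opting in to the provided default pattern (the signature is assumed from the names in the message):
```
// FuncOp signatures are no longer converted implicitly; opt in explicitly.
OwningRewritePatternList patterns;
populateFuncOpTypeConversionPattern(patterns, context, typeConverter);
```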
More moving, less cloning.
PiperOrigin-RevId: 258947575
This CL introduces a simple loop utility function which rewrites the bounds and step of a loop so as to become mappable on a regular grid of processors whose identifiers are given by SSA values.
A corresponding unit test is added.
For example, using CUDA terminology, and assuming a 2-d grid with processorIds = [blockIdx.x, threadIdx.x] and numProcessors = [gridDim.x, blockDim.x], the loop:
```
loop.for %i = %lb to %ub step %step {
...
}
```
is rewritten into a version resembling the following pseudo-IR:
```
loop.for %i = %lb + threadIdx.x + blockIdx.x * blockDim.x to %ub
step %gridDim.x * blockDim.x {
...
}
```
PiperOrigin-RevId: 258945942
This CL adapts the recently introduced parametric tiling to have an API matching the tiling
of AffineForOp. The transformation using stripmineSink is more general and produces imperfectly nested loops.
Perfect nesting invariants of the tiled version are obtained by selectively applying hoisting of ops to isolate perfectly nested bands. Such hoisting may fail to produce a perfect loop nest in cases where a ForOp transitively depends on enclosing induction variables. In such cases, the API returns a LogicalResult, but the SimpleParametricLoopTilingPass does not currently use this result.
A new unit test is added with a triangular loop for which the perfect nesting property does not hold. For this example, the old behavior was to produce IR that did not verify (some use was not dominated by its def).
PiperOrigin-RevId: 258928309
DialectConversion.
This allows for providing specific handling for dynamically legal operations/dialects without overriding the general 'isDynamicallyLegal' hook. This also means that a derived ConversionTarget class need not always be defined when some operations are dynamically legal.
Example usage:
```
ConversionTarget target(...);
target.addDynamicallyLegalOp<ReturnOp>([](ReturnOp op) {
  return ...;
});
target.addDynamicallyLegalDialect<StandardOpsDialect>([](Operation *op) {
  return ...;
});
```
PiperOrigin-RevId: 258884753
This specific PatternRewriter will allow for exposing hooks in the future that are only useful for the conversion framework, e.g. type conversions.
PiperOrigin-RevId: 258818122
This cl begins a large refactoring of how signature types are converted in the DialectConversion infrastructure. The signatures of blocks are now converted on demand when an operation held by that block is being converted. This allows for handling the case where a region is created as part of a pattern, something that wasn't possible previously.
This cl also generalizes the region signature conversion used by FuncOp to work on any region of any operation. This generalization allows for removing the 'apply*Conversion' functions that were specific to FuncOp/ModuleOp. The implementation currently uses a new hook on TypeConverter, 'convertRegionSignature', but this should ideally be removed in favor of using Patterns. That depends on adding support to the PatternRewriter used by ConversionPattern to allow applying signature conversions to regions, which should be coming in a followup.
PiperOrigin-RevId: 258645733
This explicit tag is useful in several ways:
*) This simplifies how to mark sub sections of a dialect as explicitly unsupported, e.g. my target supports all operations in the foo dialect except for these select few. This is useful for partial lowerings between dialects.
*) Partial conversions will now verify that operations that were explicitly marked as illegal must be converted. This provides some guarantee that the operations that need to be lowered by a specific pass will be.
PiperOrigin-RevId: 258582879
As the move to ODS is made, body and region names across affine and loop dialects are uniformized.
PiperOrigin-RevId: 258416590
Users generally want several different modes of conversion. This cl refactors DialectConversion to provide two:
* Partial (applyPartialConversion)
- This mode allows for illegal operations to exist in the IR, and does not fail if an operation fails to be legalized.
* Full (applyFullConversion)
- This mode fails if any operation is not properly legalized to the conversion target. This allows for ensuring that the IR after a conversion only contains operations legal for the target.
PiperOrigin-RevId: 258412243
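Typical driver usage under the two modes (a sketch; `module`, `target`, and `patterns` are assumed to be set up by the pass, and exact signatures are assumptions):
```
// Partial: leftover illegal ops are tolerated.
if (failed(applyPartialConversion(module, target, patterns)))
  signalPassFailure();

// Full: any op not legalized to `target` is an error.
if (failed(applyFullConversion(module, target, patterns)))
  signalPassFailure();
```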
patterns.
These methods don't compose well with the rest of the conversion framework, and create artificial breaks in conversion. Replace these methods with two (populateAffineToStdConversionPatterns and populateLoopToStdConversionPatterns, respectively) that populate a list of patterns implementing the same behavior.
PiperOrigin-RevId: 258219277
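Since the entry points are now pattern populators, both lowerings can feed a single conversion run; a sketch (signatures assumed from the function names above):
```
OwningRewritePatternList patterns;
populateAffineToStdConversionPatterns(patterns, context);
populateLoopToStdConversionPatterns(patterns, context);
// ...then run one conversion (e.g. applyPartialConversion) over the
// combined list.
```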
'applyPatternsGreedily' is a useful utility outside of just function regions.
PiperOrigin-RevId: 258182937
This cl changes the way that operations/blocks to convert are collected/traversed so that parent region operations can be legalized before their bodies. Most RewritePatterns for region operations assume that the entry arguments to each region are yet to be converted. Given that the bodies are currently converted first, this makes it difficult to fit these patterns into the same run as one converting types.
The operations/blocks to convert are now collected before any legalization has run, which simplifies the conversion logic itself, as legalization may insert new operations, move blocks, etc.
PiperOrigin-RevId: 258170158
ConversionPattern should ideally only be used when the types of the operands are changing, which in this case they aren't. Using OpRewritePattern also lends to much simpler code.
PiperOrigin-RevId: 258158474
Multiple (perfectly) nested loops with independent bounds can be combined into
a single loop and then subdivided into blocks of arbitrary size for load
balancing or more efficient parallelism exploitation. However, MLIR wants to
preserve the multi-dimensional multi-loop structure at higher levels of
abstraction. Introduce a transformation that coalesces nested loops with
independent bounds so that they can be further subdivided by tiling.
PiperOrigin-RevId: 258151016
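Conceptually, coalescing linearizes the iteration space and recovers the original indices with division and remainder; a minimal sketch in C++ terms (not the pass's actual code):
```
#include <cstdint>

// Two perfectly nested loops with trip counts N0 and N1, coalesced into one
// loop of N0 * N1 iterations; i0/i1 are recovered by div/mod.
void coalesced(int64_t N0, int64_t N1) {
  for (int64_t iv = 0; iv < N0 * N1; ++iv) {
    int64_t i0 = iv / N1;  // original outer induction variable
    int64_t i1 = iv % N1;  // original inner induction variable
    // body(i0, i1);
  }
}
```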
These ops should not belong to the std dialect.
This CL extracts them in their own dialect and updates the corresponding conversions and tests.
PiperOrigin-RevId: 258123853
When using a RewritePattern and replacing an operation with an existing value, that value may have already been replaced by something else. This cl ensures that only the final value is used when applying rewrites.
PiperOrigin-RevId: 258058488
PiperOrigin-RevId: 257903383