- the [begin, end) range identified for copying could end in the middle of a
  block, which made hoisting invalid in some cases. Change the range
  identification to always end at the end of a block.
- add a test case to exercise these (with fast memory capacity set to the
  minimum so that single-element memref buffers are generated at the
  innermost loop)
- the begin/end of the block range for data copying was being conflated with
  the insertion points for the copy-in and copy-out code. In cases where we
  choose to hoist transfers, these are distinct.
- when copy loops are single-iteration, promote their bodies at the end of
  the pass.
- change the default fast memory space to 1 (setting it to zero made the pass
  generate DMA ops that won't verify in the default case, since the DMA ops
  check that the src/dest memref spaces differ).
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Co-Authored-By: Mehdi Amini <joker.eph@gmail.com>
Closes tensorflow/mlir#88
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/88 from bondhugula:datacopy 88697267c45e850c3ced87671e16e4a930c02a42
PiperOrigin-RevId: 266980911
This interface will allow for providing hooks to interop with operation folding. The first hook, 'shouldMaterializeInto', will allow for controlling which region materialized constants are inserted into. The folder will generally materialize constants into the top-level isolated region; this hook allows for materializing into a lower-level ancestor region when that is more profitable or correct.
PiperOrigin-RevId: 266702972
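A minimal sketch of how a dialect might opt in to this hook, assuming the interface base class is OpFolderDialectInterface as described above; `MyDialectFoldInterface` and the "my.launch" op name are purely illustrative:
```
// Sketch only: redirect constant materialization for a launch-like op.
struct MyDialectFoldInterface : public OpFolderDialectInterface {
  using OpFolderDialectInterface::OpFolderDialectInterface;

  /// Keep materialized constants inside `region` itself instead of hoisting
  /// them into the top-level isolated region.
  bool shouldMaterializeInto(Region *region) const override {
    return region->getParentOp()->getName().getStringRef() == "my.launch";
  }
};
```
The dialect would then register this interface (e.g. via addInterfaces<MyDialectFoldInterface>() in its constructor).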
pointer (NFC)
This is a convenient utility around the existing `getUsedValuesDefinedAbove()`
that takes two regions.
PiperOrigin-RevId: 266686854
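For reference, a usage sketch of the two-region form (Value was pointer-typed in MLIR at this time; variable names here are assumptions, and `region` is nested inside `limit`):
```
// Collect all values used within `region` but defined above (outside) `limit`.
llvm::SetVector<Value *> valuesDefinedAbove;
getUsedValuesDefinedAbove(region, limit, valuesDefinedAbove);
for (Value *v : valuesDefinedAbove)
  v->dump();  // inspect each captured value
```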
AffineForOps are themselves pure and can be removed if they contain no internal operations.
PiperOrigin-RevId: 266481293
This is done by providing a walk callback that returns a WalkResult. This result is either `advance` or `interrupt`. `advance` means that the walk should continue, whereas `interrupt` signals that the walk should stop immediately. An example is shown below:
```
auto result = op->walk([](Operation *op) {
  if (some_invariant)
    return WalkResult::interrupt();
  return WalkResult::advance();
});
if (result.wasInterrupted())
  ...;
```
PiperOrigin-RevId: 266436700
This change refactors and cleans up the implementation of the operation walk methods. After this refactoring, the explicit template parameter for the operation type is no longer needed for op-specific walks. For example:
```
op->walk<AffineForOp>([](AffineForOp op) { ... });
```
is now accomplished via:
```
op->walk([](AffineForOp op) { ... });
```
PiperOrigin-RevId: 266209552
- fix operand mapping while cloning sub-blocks to jam; it was incorrect
  for imperfect nests where a def/use crossed sub-blocks
- strengthen/generalize the first test case to cover the previously
  missed scenario
- clean up the other test cases while at it.
Previously, unroll-jamming the following nest
```
affine.for %arg0 = 0 to 2048 {
  %0 = alloc() : memref<512x10xf32>
  affine.for %arg1 = 0 to 10 {
    %1 = affine.load %0[%arg0, %arg1] : memref<512x10xf32>
  }
  dealloc %0 : memref<512x10xf32>
}
```
would yield
```
%0 = alloc() : memref<512x10xf32>
%1 = affine.apply #map0(%arg0)
%2 = alloc() : memref<512x10xf32>
affine.for %arg1 = 0 to 10 {
  %4 = affine.load %0[%arg0, %arg1] : memref<512x10xf32>
  %5 = affine.apply #map0(%arg0)
  %6 = affine.load %0[%5, %arg1] : memref<512x10xf32>
}
dealloc %0 : memref<512x10xf32>
%3 = affine.apply #map0(%arg0)
dealloc %0 : memref<512x10xf32>
```
instead of
```
module {
  affine.for %arg0 = 0 to 2048 step 2 {
    %0 = alloc() : memref<512x10xf32>
    %1 = affine.apply #map0(%arg0)
    %2 = alloc() : memref<512x10xf32>
    affine.for %arg1 = 0 to 10 {
      %4 = affine.load %0[%arg0, %arg1] : memref<512x10xf32>
      %5 = affine.apply #map0(%arg0)
      %6 = affine.load %2[%5, %arg1] : memref<512x10xf32>
    }
    dealloc %0 : memref<512x10xf32>
    %3 = affine.apply #map0(%arg0)
    dealloc %2 : memref<512x10xf32>
  }
}
```
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closes tensorflow/mlir#98
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/98 from bondhugula:ujam ddbc853f69b5608b3e8ff9b5ac1f6a5a0bb315a4
PiperOrigin-RevId: 266073460
Refactor replaceAllMemRefUsesWith to split it into two methods: the new
method does the replacement on a single op, and is used by the existing
one.
- make the methods return LogicalResult instead of bool
- Earlier, when replacement failed (due to non-dereferencing uses of the
  memref), the ops that had already been processed would have been
  replaced, leaving the IR in an inconsistent state. Now, a pass is made
  over all ops to first check for non-dereferencing uses, and only then is
  the replacement performed. No test cases were affected because all
  clients of this method were already checking for non-dereferencing uses
  before calling it (for other reasons). This isn't true for a use case in
  another upcoming PR (scalar replacement); clients can now bail out with
  consistent IR on failure of replaceAllMemRefUsesWith. Add a test case.
- multiple dereferencing uses of the same memref in a single op are
  possible (we have no such use cases/scenarios), and this has always
  remained unsupported. Add an assertion for this.
- minor fix to another pipeline-data-transfer test case.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closes tensorflow/mlir#87
PiperOrigin-RevId: 265808183
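A hedged sketch of the resulting calling convention (the full argument list is elided here; only the LogicalResult behavior described above is shown):
```
// If `oldMemRef` has non-dereferencing uses, the utility now fails without
// having rewritten anything, so the caller can safely give up with the IR
// left in a consistent state.
if (failed(replaceAllMemRefUsesWith(oldMemRef, newMemRef /*, ...*/)))
  return;
```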
The context can easily be recovered from the Location in these situations.
PiperOrigin-RevId: 265578574
This fixes a bug where ops with inner ops could be folded while the inner ops were still being visited.
PiperOrigin-RevId: 265475780
Fixes tensorflow/mlir#72
PiperOrigin-RevId: 265097597
PiperOrigin-RevId: 264482571
PiperOrigin-RevId: 264277760
PiperOrigin-RevId: 264193915
Switch to the C++14 standard method, as llvm::make_unique has been removed
(https://reviews.llvm.org/D66259). Also mark some targets as C++14 to ease
upcoming integrates.
PiperOrigin-RevId: 263953918
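The mechanical change is a drop-in replacement; for example (`MyPass` is illustrative):
```
#include <memory>

// Before (removed upstream): auto pass = llvm::make_unique<MyPass>();
// After (C++14 standard library):
auto pass = std::make_unique<MyPass>();
```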
moved.
Often we want to ensure that block arguments are converted before the operations that use them. This refactors the current implementation to be cleaner and to run conversion less frequently, by triggering it when a set of blocks is moved/inlined, or when legalization succeeds.
PiperOrigin-RevId: 263795005
Since raw pointers are always passed around for IR constructs without
implying any ownership transfer, it can be error-prone to have ownership
implicitly transferred the same way. For example this code can seem harmless:
```
Pass *pass = ....
pm.addPass(pass);
pm.addPass(pass);  // same pointer added twice: ownership is taken twice,
                   // leading to a double free
pm.run(module);
```
PiperOrigin-RevId: 263053082
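With a std::unique_ptr-based API, the transfer is explicit and the double-add above can no longer compile silently; a sketch of the resulting usage:
```
// Ownership is visibly handed to the pass manager exactly once.
pm.addPass(std::make_unique<MyPass>());
```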
reference.
The pattern list is not modified by any of these APIs and should thus be passed as const.
PiperOrigin-RevId: 262844002
There are currently several different terms used to refer to a parent IR unit in 'get' methods: getParent/getEnclosing/getContaining. This cl standardizes all of these methods to use 'getParent*'.
PiperOrigin-RevId: 262680287
This will allow for reusing the same pattern list, which may be costly to continually reconstruct, on multiple invocations.
PiperOrigin-RevId: 262664599
This CL is step 2/n towards building a simple, programmable and portable vector abstraction in MLIR that can go all the way down to generating assembly vector code via LLVM's opt and llc tools.
This CL adds the vector.extractelement operation to the MLIR vector dialect as well as the appropriate roundtrip test. Lowering to LLVM will occur in the following CL.
PiperOrigin-RevId: 262545089
These methods will allow replacing the uses of results either with an existing operation that has the same number of results, or with a range of values. This removes a number of hand-rolled result replacement loops and simplifies replacement for operations with multiple results.
PiperOrigin-RevId: 262206600
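A sketch of the two forms described above (assuming the methods live on Operation, as the description suggests; `newOp` and `newValues` are illustrative):
```
// Replace all result uses with the results of another op that has the same
// number of results.
op->replaceAllUsesWith(newOp);

// Or replace them with an arbitrary range of values, one per result.
op->replaceAllUsesWith(newValues);
```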
via GreedyPatternRewriteDriver::replaceOp.
This fixes a bug where ops inside the parent op are visited even though the parent op has been removed.
PiperOrigin-RevId: 261953580
This CL modifies the LowerLinalgToLoopsPass to use RewritePattern.
This will make it easier to inline Linalg generic functions and regions when emitting to loops in a subsequent CL.
PiperOrigin-RevId: 261894120
This allows for proper forward declaration, as opposed to leaking the internal implementation via a using directive. This also allows for all pattern building to go through 'insert' methods on the OwningRewritePatternList, replacing uses of 'push_back' and 'RewriteListBuilder'.
PiperOrigin-RevId: 261816316
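A sketch of the new insertion style (pattern class names are illustrative):
```
OwningRewritePatternList patterns;
// Construct and insert multiple patterns in one call; previously this
// required push_back or RewriteListBuilder.
patterns.insert<MyPatternA, MyPatternB>(context);
```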
When inlining the declaration for llvm::DenseSet/DenseMap in the mlir
namespace from a forward declaration, clang does not take the default
for the template parameters if they are declared later.
```
namespace llvm {
template <typename Foo>
class DenseMap;
}
namespace mlir {
// The alias binds to the declaration above, which has no defaults.
using llvm::DenseMap;
}
namespace llvm {
template <typename Foo = int>
class DenseMap {};
}
namespace mlir {
DenseMap<> map;  // error: the later-declared default is not picked up
}
```
PiperOrigin-RevId: 261495612
The AffineDataCopyGeneration pass relied on command-line flags for internal
logic in several places, which made it unusable in a library context (i.e.,
outside a standalone mlir-opt binary that does the command-line parsing).
Define configuration flags in the constructor instead, and initialize them
to the command-line-based defaults to maintain the original behavior.
PiperOrigin-RevId: 261322364
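A hypothetical shape of the resulting library usage (the factory and parameter names are illustrative, not the exact signature):
```
// Configure the pass in code; defaults mirror the command-line flags.
pm.addPass(createAffineDataCopyGenerationPass(
    /*slowMemorySpace=*/0, /*fastMemorySpace=*/1,
    /*minDmaTransferSize=*/1024,
    /*fastMemCapacityBytes=*/32 * 1024));
```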
Explicit copying to contiguous buffers is a standard technique to avoid
conflict misses and TLB misses, and to improve hardware prefetching
performance. When done in conjunction with cache tiling, it nearly
eliminates all cache conflict and TLB misses, and a single hardware
prefetch stream is needed per data tile.
- generalize/extend the DMA generation pass (renamed the data copying pass)
  to perform either point-wise explicit copies to fast memory buffers or
  DMAs (depending on a command-line option). All logic is the same as the
  erstwhile -dma-generate.
- -affine-dma-generate is now renamed -affine-data-copy; when the -dma flag
  is provided, DMAs are generated; otherwise explicit (point-wise) copy
  loops are generated by default.
- point-wise copying could be used for CPUs (or GPUs); some indicative
  performance numbers with a C version of the MLIR code compiled with and
  without this optimization (about a 2x improvement here).
  With a matmul on 4096^2 matrices on a single core of an Intel Core i7
  Skylake i7-8700K with clang 8.0.0:
  clang -O3:                                  518s
  clang -O3 with MLIR tiling (128x128):       24.5s
  clang -O3 with MLIR tiling + data copying:  12.4s
  (code equivalent to test/Transforms/data-copy.mlir func @matmul)
- fix some misleading comments.
- change the default fast-mem space to 0 (more intuitive now that the
  default copy generation uses point-wise copies instead of DMAs)
On a simple 3-d matmul loop nest, code generated with -affine-data-copy:
```
affine.for %arg3 = 0 to 4096 step 128 {
  affine.for %arg4 = 0 to 4096 step 128 {
    %0 = affine.apply #map0(%arg3, %arg4)
    %1 = affine.apply #map1(%arg3, %arg4)
    %2 = alloc() : memref<128x128xf32, 2>
    // Copy-in Out matrix.
    affine.for %arg5 = 0 to 128 {
      %5 = affine.apply #map2(%arg3, %arg5)
      affine.for %arg6 = 0 to 128 {
        %6 = affine.apply #map2(%arg4, %arg6)
        %7 = load %arg2[%5, %6] : memref<4096x4096xf32>
        affine.store %7, %2[%arg5, %arg6] : memref<128x128xf32, 2>
      }
    }
    affine.for %arg5 = 0 to 4096 step 128 {
      %5 = affine.apply #map0(%arg3, %arg5)
      %6 = affine.apply #map1(%arg3, %arg5)
      %7 = alloc() : memref<128x128xf32, 2>
      // Copy-in LHS.
      affine.for %arg6 = 0 to 128 {
        %11 = affine.apply #map2(%arg3, %arg6)
        affine.for %arg7 = 0 to 128 {
          %12 = affine.apply #map2(%arg5, %arg7)
          %13 = load %arg0[%11, %12] : memref<4096x4096xf32>
          affine.store %13, %7[%arg6, %arg7] : memref<128x128xf32, 2>
        }
      }
      %8 = affine.apply #map0(%arg5, %arg4)
      %9 = affine.apply #map1(%arg5, %arg4)
      %10 = alloc() : memref<128x128xf32, 2>
      // Copy-in RHS.
      affine.for %arg6 = 0 to 128 {
        %11 = affine.apply #map2(%arg5, %arg6)
        affine.for %arg7 = 0 to 128 {
          %12 = affine.apply #map2(%arg4, %arg7)
          %13 = load %arg1[%11, %12] : memref<4096x4096xf32>
          affine.store %13, %10[%arg6, %arg7] : memref<128x128xf32, 2>
        }
      }
      // Compute.
      affine.for %arg6 = #map7(%arg3) to #map8(%arg3) {
        affine.for %arg7 = #map7(%arg4) to #map8(%arg4) {
          affine.for %arg8 = #map7(%arg5) to #map8(%arg5) {
            %11 = affine.load %7[-%arg3 + %arg6, -%arg5 + %arg8] : memref<128x128xf32, 2>
            %12 = affine.load %10[-%arg5 + %arg8, -%arg4 + %arg7] : memref<128x128xf32, 2>
            %13 = affine.load %2[-%arg3 + %arg6, -%arg4 + %arg7] : memref<128x128xf32, 2>
            %14 = mulf %11, %12 : f32
            %15 = addf %13, %14 : f32
            affine.store %15, %2[-%arg3 + %arg6, -%arg4 + %arg7] : memref<128x128xf32, 2>
          }
        }
      }
      dealloc %10 : memref<128x128xf32, 2>
      dealloc %7 : memref<128x128xf32, 2>
    }
    %3 = affine.apply #map0(%arg3, %arg4)
    %4 = affine.apply #map1(%arg3, %arg4)
    // Copy out result matrix.
    affine.for %arg5 = 0 to 128 {
      %5 = affine.apply #map2(%arg3, %arg5)
      affine.for %arg6 = 0 to 128 {
        %6 = affine.apply #map2(%arg4, %arg6)
        %7 = affine.load %2[%arg5, %arg6] : memref<128x128xf32, 2>
        store %7, %arg2[%5, %6] : memref<4096x4096xf32>
      }
    }
    dealloc %2 : memref<128x128xf32, 2>
  }
}
```
With -affine-data-copy -dma:
```
affine.for %arg3 = 0 to 4096 step 128 {
  %0 = affine.apply #map3(%arg3)
  %1 = alloc() : memref<128xf32, 2>
  %2 = alloc() : memref<1xi32>
  affine.dma_start %arg2[%arg3], %1[%c0], %2[%c0], %c128_0 : memref<4096xf32>, memref<128xf32, 2>, memref<1xi32>
  affine.dma_wait %2[%c0], %c128_0 : memref<1xi32>
  %3 = alloc() : memref<1xi32>
  affine.for %arg4 = 0 to 4096 step 128 {
    %5 = affine.apply #map0(%arg3, %arg4)
    %6 = affine.apply #map1(%arg3, %arg4)
    %7 = alloc() : memref<128x128xf32, 2>
    %8 = alloc() : memref<1xi32>
    affine.dma_start %arg0[%arg3, %arg4], %7[%c0, %c0], %8[%c0], %c16384, %c4096, %c128_2 : memref<4096x4096xf32>, memref<128x128xf32, 2>, memref<1xi32>
    affine.dma_wait %8[%c0], %c16384 : memref<1xi32>
    %9 = affine.apply #map3(%arg4)
    %10 = alloc() : memref<128xf32, 2>
    %11 = alloc() : memref<1xi32>
    affine.dma_start %arg1[%arg4], %10[%c0], %11[%c0], %c128_1 : memref<4096xf32>, memref<128xf32, 2>, memref<1xi32>
    affine.dma_wait %11[%c0], %c128_1 : memref<1xi32>
    affine.for %arg5 = #map3(%arg3) to #map5(%arg3) {
      affine.for %arg6 = #map3(%arg4) to #map5(%arg4) {
        %12 = affine.load %7[-%arg3 + %arg5, -%arg4 + %arg6] : memref<128x128xf32, 2>
        %13 = affine.load %10[-%arg4 + %arg6] : memref<128xf32, 2>
        %14 = affine.load %1[-%arg3 + %arg5] : memref<128xf32, 2>
        %15 = mulf %12, %13 : f32
        %16 = addf %14, %15 : f32
        affine.store %16, %1[-%arg3 + %arg5] : memref<128xf32, 2>
      }
    }
    dealloc %11 : memref<1xi32>
    dealloc %10 : memref<128xf32, 2>
    dealloc %8 : memref<1xi32>
    dealloc %7 : memref<128x128xf32, 2>
  }
  %4 = affine.apply #map3(%arg3)
  affine.dma_start %1[%c0], %arg2[%arg3], %3[%c0], %c128 : memref<128xf32, 2>, memref<4096xf32>, memref<1xi32>
  affine.dma_wait %3[%c0], %c128 : memref<1xi32>
  dealloc %3 : memref<1xi32>
  dealloc %2 : memref<1xi32>
  dealloc %1 : memref<128xf32, 2>
}
```
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closes tensorflow/mlir#50
PiperOrigin-RevId: 261221903
Reported by clang-6.
PiperOrigin-RevId: 260311814
Clipping creates non-affine memory accesses, so use std_load and std_store instead of affine_load and affine_store.
In the future we may also want to fill with the neutral element rather than clip; this would make the accesses affine if we wanted more analyses and transformations to happen post-lowering to point-wise copies.
PiperOrigin-RevId: 260110503
This mode analyzes which operations are legalizable to the given target if a conversion were to be applied, i.e. no rewrites are ever performed even on success. This mode is useful for device partitioning or other utilities that may want to analyze the effect of conversion to different targets before performing it.
The analysis method currently just fills a provided set with the operations that were found to be legalizable. This can be extended in the future to capture more information as necessary.
PiperOrigin-RevId: 259987105
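A sketch of the described usage; the entry-point name and signature below are assumptions based on the description, not confirmed API:
```
// Analysis only: no rewrites are applied; `legalizableOps` is filled with
// the operations that could be legalized to `target`.
llvm::DenseSet<Operation *> legalizableOps;
if (succeeded(applyAnalysisConversion(op, target, patterns, legalizableOps)))
  llvm::errs() << legalizableOps.size() << " ops are legalizable\n";
```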
This CL moves LoopParametricTiling into test/lib as a pass for purely testing purposes.
PiperOrigin-RevId: 259300264
This cl enforces that the conversion of the type signatures for regions, and thus their entry blocks, is handled via ConversionPatterns. A new hook, 'applySignatureConversion', is added to the ConversionPatternRewriter to perform the desired conversion on a region. This also means that the handling of rewriting the signature of a FuncOp is moved to a pattern; a default implementation is provided via 'mlir::populateFuncOpTypeConversionPattern'. This removes the hacky implicit 'dynamically legal' status of FuncOp that was present previously, and leaves it up to the user to decide when/how to convert the signature of a function.
PiperOrigin-RevId: 259161999
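A sketch of opting in to the provided default pattern (the signature is assumed from the names in the message):
```
// FuncOp signatures are no longer converted implicitly; opt in explicitly.
OwningRewritePatternList patterns;
populateFuncOpTypeConversionPattern(patterns, context, typeConverter);
```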
More moving, less cloning.
PiperOrigin-RevId: 258947575
This CL introduces a simple loop utility function which rewrites the bounds and step of a loop so as to become mappable on a regular grid of processors whose identifiers are given by SSA values.
A corresponding unit test is added.
For example, using CUDA terminology, and assuming a 2-d grid with processorIds = [blockIdx.x, threadIdx.x] and numProcessors = [gridDim.x, blockDim.x], the loop:
```
loop.for %i = %lb to %ub step %step {
...
}
```
is rewritten into a version resembling the following pseudo-IR:
```
loop.for %i = %lb + threadIdx.x + blockIdx.x * blockDim.x to %ub
step %gridDim.x * blockDim.x {
...
}
```
PiperOrigin-RevId: 258945942
This CL adapts the recently introduced parametric tiling to have an API matching the tiling
of AffineForOp. The transformation using stripmineSink is more general and produces imperfectly nested loops.
Perfect nesting invariants of the tiled version are obtained by selectively applying hoisting of ops to isolate perfectly nested bands. Such hoisting may fail to produce a perfect loop nest in cases where a ForOp transitively depends on enclosing induction variables. In such cases, the API returns a LogicalResult, but the SimpleParametricLoopTilingPass does not currently use this result.
A new unit test is added with a triangular loop for which the perfect nesting property does not hold. For this example, the old behavior was to produce IR that did not verify (some use was not dominated by its def).
PiperOrigin-RevId: 258928309
DialectConversion.
This allows for providing specific handling for dynamically legal operations/dialects without overriding the general 'isDynamicallyLegal' hook. This also means that a derived ConversionTarget class need not always be defined when some operations are dynamically legal.
Example usage:
```
ConversionTarget target(...);
target.addDynamicallyLegalOp<ReturnOp>([](ReturnOp op) {
  return ...;
});
target.addDynamicallyLegalDialect<StandardOpsDialect>([](Operation *op) {
  return ...;
});
```
PiperOrigin-RevId: 258884753
This specific PatternRewriter will allow for exposing hooks in the future that are only useful for the conversion framework, e.g. type conversions.
PiperOrigin-RevId: 258818122
This cl begins a large refactoring of how signature types are converted in the DialectConversion infrastructure. The signatures of blocks are now converted on demand when an operation held by that block is being converted. This allows for handling the case where a region is created as part of a pattern, something that wasn't possible previously.
This cl also generalizes the region signature conversion used by FuncOp to work on any region of any operation. This generalization allows for removing the 'apply*Conversion' functions that were specific to FuncOp/ModuleOp. The implementation currently uses a new hook on TypeConverter, 'convertRegionSignature', but this should ideally be removed in favor of using Patterns. That depends on adding support to the PatternRewriter used by ConversionPattern to allow applying signature conversions to regions, which should be coming in a followup.
PiperOrigin-RevId: 258645733
This explicit tag is useful in several ways:
*) This simplifies how to mark sub sections of a dialect as explicitly unsupported, e.g. my target supports all operations in the foo dialect except for these select few. This is useful for partial lowerings between dialects.
*) Partial conversions will now verify that operations that were explicitly marked as illegal must be converted. This provides some guarantee that the operations that need to be lowered by a specific pass will be.
PiperOrigin-RevId: 258582879
As the move to ODS is made, body and region names across affine and loop dialects are uniformized.
PiperOrigin-RevId: 258416590
Users generally want several different modes of conversion. This cl refactors DialectConversion to provide two:
* Partial (applyPartialConversion)
- This mode allows for illegal operations to exist in the IR, and does not fail if an operation fails to be legalized.
* Full (applyFullConversion)
- This mode fails if any operation is not properly legalized to the conversion target. This allows for ensuring that the IR after a conversion only contains operations legal for the target.
PiperOrigin-RevId: 258412243
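Typical driver usage under the two modes (a sketch; `module`, `target`, and `patterns` are assumed to be set up by the pass, and exact signatures are assumptions):
```
// Partial: leftover illegal ops are tolerated.
if (failed(applyPartialConversion(module, target, patterns)))
  signalPassFailure();

// Full: any op not legalized to `target` is an error.
if (failed(applyFullConversion(module, target, patterns)))
  signalPassFailure();
```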
patterns.
These methods don't compose well with the rest of the conversion framework, and create artificial breaks in conversion. Replace these methods with two (populateAffineToStdConversionPatterns and populateLoopToStdConversionPatterns, respectively) that populate a list of patterns implementing the same behavior.
PiperOrigin-RevId: 258219277
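Since the entry points are now pattern populators, both lowerings can feed a single conversion run; a sketch (signatures assumed from the function names above):
```
OwningRewritePatternList patterns;
populateAffineToStdConversionPatterns(patterns, context);
populateLoopToStdConversionPatterns(patterns, context);
// ...then run one conversion (e.g. applyPartialConversion) over the
// combined list.
```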
'applyPatternsGreedily' is a useful utility outside of just function regions.
PiperOrigin-RevId: 258182937
This cl changes the way that operations/blocks to convert are collected/traversed so that parent region operations can be legalized before their bodies. Most RewritePatterns for region operations assume that the entry arguments to each region are yet to be converted. Given that the bodies are currently converted first, this makes it difficult to fit these patterns into the same run as one converting types.
The operations/blocks to convert are now collected before any legalization has run, which simplifies the conversion logic itself, as legalization may insert new operations, move blocks, etc.
PiperOrigin-RevId: 258170158
ConversionPattern should ideally only be used when the types of the operands are changing, which in this case they aren't. Using OpRewritePattern also lends to much simpler code.
PiperOrigin-RevId: 258158474
Multiple (perfectly) nested loops with independent bounds can be combined into
a single loop and then subdivided into blocks of arbitrary size for load
balancing or more efficient parallelism exploitation. However, MLIR wants to
preserve the multi-dimensional multi-loop structure at higher levels of
abstraction. Introduce a transformation that coalesces nested loops with
independent bounds so that they can be further subdivided by tiling.
PiperOrigin-RevId: 258151016
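Conceptually, coalescing linearizes the iteration space and recovers the original indices with division and remainder; a minimal sketch in C++ terms (not the pass's actual code):
```
#include <cstdint>

// Two perfectly nested loops with trip counts N0 and N1, coalesced into one
// loop of N0 * N1 iterations; i0/i1 are recovered by div/mod.
void coalesced(int64_t N0, int64_t N1) {
  for (int64_t iv = 0; iv < N0 * N1; ++iv) {
    int64_t i0 = iv / N1;  // original outer induction variable
    int64_t i1 = iv % N1;  // original inner induction variable
    // body(i0, i1);
  }
}
```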
These ops should not belong to the std dialect.
This CL extracts them in their own dialect and updates the corresponding conversions and tests.
PiperOrigin-RevId: 258123853
When using a RewritePattern and replacing an operation with an existing value, that value may have already been replaced by something else. This cl ensures that only the final value is used when applying rewrites.
PiperOrigin-RevId: 258058488
PiperOrigin-RevId: 257903383