bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	FlatAffineConstraints API cleanup; add normalizeConstraintsByGCD().	Uday Bondhugula	2019-03-29	2	-33/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	- add method normalizeConstraintsByGCD - call normalizeConstraintsByGCD() and GCDTightenInequalities() at the end of projectOut. - remove call to GCDTightenInequalities() from getMemRefRegion - change isEmpty() to check isEmptyByGCDTest() / hasInvalidConstraint() each time an identifier is eliminated (to detect emptiness early). - make FourierMotzkinEliminate, gaussianEliminateId(s), GCDTightenInequalities() private - improve / update stale comments PiperOrigin-RevId: 224866741
*	Generate strided DMAs from -dma-generate	Uday Bondhugula	2019-03-29	2	-28/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- generate DMAs correctly now using strided DMAs where needed - add support for multi-level/nested strides; op still supports one level of stride for now. Other things - add test case for symbolic lower/upper bound; cases where the DMA buffer size can't be bounded by a known constant - add test case for dynamic shapes where the DMA buffers are however bounded by constants - refactor some of the '-dma-generate' code PiperOrigin-RevId: 224584529
*	[MLIR] Add composeWithUnboundedMap	Nicolas Vasilache	2019-03-29	1	-8/+9
\| \| \| \| \| \| \| \|	This CL adds a finer grain composition function between AffineExpr and an unbounded map. This will be used in the next CL. Also cleans up some comments remaining from a previous CL. PiperOrigin-RevId: 224536314
*	Return bool from all emitError methods similar to Operation::emitOpError	Smit Hinsu	2019-03-29	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	This simplifies call-sites returning true after emitting an error. After the conversion, dropped braces around single statement blocks as that seems more common. Also, switched to emitError method instead of emitting Error kind using the emitDiagnostic method. TESTED with existing unit tests PiperOrigin-RevId: 224527868
*	[MLIR] Drop assert for NYI in VectorAnalysis	Nicolas Vasilache	2019-03-29	1	-6/+7
\| \| \| \| \| \| \|	This CLs adds proper error emission, removes NYI assertions and documents assumptions that are required in the relevant functions. PiperOrigin-RevId: 224377143
*	[MLIR] Drop unnecessary mention of NYI.	Nicolas Vasilache	2019-03-29	1	-2/+3
\| \| \| \| \| \| \|	This CL also documents the `substExpr` helper function assumptions. The assumptions are properly propagated up already. PiperOrigin-RevId: 224377072
*	[MLIR] Remove NYI assertions in LoopAnalysis.cpp	Nicolas Vasilache	2019-03-29	1	-9/+18
\| \| \| \| \| \| \|	This CL also cleans up some loose ends and returns conservative answers while emitting errors in the NYI cases. PiperOrigin-RevId: 224377004
*	[MLIR] Add AffineMap composition and use it in Materialization	Nicolas Vasilache	2019-03-29	1	-0/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This CL adds the following free functions: ``` /// Returns the AffineExpr e o m. AffineExpr compose(AffineExpr e, AffineMap m); /// Returns the AffineExpr f o g. AffineMap compose(AffineMap f, AffineMap g); ``` This addresses the issue that AffineMap composition is only available at a distance via AffineValueMap and is thus unusable on Attributes. This CL thus implements AffineMap composition in a more modular and composable way. This CL does not claim that it can be a good replacement for the implementation in AffineValueMap, in particular it does not support bounded maps atm. Standalone tests are added that replicate some of the logic of the AffineMap composition pass. Lastly, affine map composition is used properly inside MaterializeVectors and a standalone test is added that requires permutation_map composition with a projection map. PiperOrigin-RevId: 224376870
*	[MLIR] Add support for permutation_map	Nicolas Vasilache	2019-03-29	2	-49/+165
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This CL hooks up and uses permutation_map in vector_transfer ops. In particular, when going into the nuts and bolts of the implementation, it became clear that cases arose that required supporting broadcast semantics. Broadcast semantics are thus added to the general permutation_map. The verify methods and tests are updated accordingly. Examples of interest include. Example 1: The following MLIR snippet: ```mlir for %i3 = 0 to %M { for %i4 = 0 to %N { for %i5 = 0 to %P { %a5 = load %A[%i4, %i5, %i3] : memref<?x?x?xf32> }}} ``` may vectorize with {permutation_map: (d0, d1, d2) -> (d2, d1)} into: ```mlir for %i3 = 0 to %0 step 32 { for %i4 = 0 to %1 { for %i5 = 0 to %2 step 256 { %4 = vector_transfer_read %arg0, %i4, %i5, %i3 {permutation_map: (d0, d1, d2) -> (d2, d1)} : (memref<?x?x?xf32>, index, index) -> vector<32x256xf32> }}} ```` Meaning that vector_transfer_read will be responsible for reading the 2-D slice: `%arg0[%i4, %i5:%15+256, %i3:%i3+32]` into vector<32x256xf32>. This will require a transposition when vector_transfer_read is further lowered. Example 2: The following MLIR snippet: ```mlir %cst0 = constant 0 : index for %i0 = 0 to %M { %a0 = load %A[%cst0, %cst0] : memref<?x?xf32> } ``` may vectorize with {permutation_map: (d0) -> (0)} into: ```mlir for %i0 = 0 to %0 step 128 { %3 = vector_transfer_read %arg0, %c0_0, %c0_0 {permutation_map: (d0, d1) -> (0)} : (memref<?x?xf32>, index, index) -> vector<128xf32> } ```` Meaning that vector_transfer_read will be responsible of reading the 0-D slice `%arg0[%c0, %c0]` into vector<128xf32>. This will require a 1-D vector broadcast when vector_transfer_read is further lowered. Additionally, some minor cleanups and refactorings are performed. One notable thing missing here is the composition with a projection map during materialization. This is because I could not find an AffineMap composition that operates on AffineMap directly: everything related to composition seems to require going through SSAValue and only operates on AffinMap at a distance via AffineValueMap. I have raised this concern a bunch of times already, the followup CL will actually do something about it. In the meantime, the projection is hacked at a minimum to pass verification and materialiation tests are temporarily incorrect. PiperOrigin-RevId: 224376828
*	Fix cases where unsigned / signed arithmetic was being mixed (following up on	Uday Bondhugula	2019-03-29	2	-22/+23
\| \| \| \| \| \| \| \| \|	cl/224246657); eliminate repeated evaluation of exprs in loop upper bounds. - while on this, sweep through and fix potential repeated evaluation of expressions in loop upper bounds PiperOrigin-RevId: 224268918
*	Fix bug in GCD calculation when flattening AffineExpr (adds unit test which ↵	MLIR Team	2019-03-29	1	-1/+1
\| \| \| \| \| \|	triggers the bug and tests the fix). PiperOrigin-RevId: 224246657
*	Complete multiple unhandled cases for DmaGeneration / getMemRefRegion;	Uday Bondhugula	2019-03-29	3	-48/+189
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	update/improve/clean up API. - update FlatAffineConstraints::getConstBoundDifference; return constant differences between symbolic affine expressions, look at equalities as well. - fix buffer size computation when generating DMAs symbolic in outer loops, correctly handle symbols at various places (affine access maps, loop bounds, loop IVs outer to the depth at which DMA generation is being done) - bug fixes / complete some TODOs for getMemRefRegion - refactor common code b/w memref dependence check and getMemRefRegion - FlatAffineConstraints API update; added methods employ trivial checks / detection - sufficient to handle hyper-rectangular cases in a precise way while being fast / low complexity. Hyper-rectangular cases fall out as trivial cases for these methods while other cases still do not cause failure (either return conservative or return failure that is handled by the caller). PiperOrigin-RevId: 224229879
*	Remove duplicate FlatAffineConstraints::removeId - refactor to use	Uday Bondhugula	2019-03-29	1	-34/+14
\| \| \| \| \| \| \| \| \| \| \|	removeColumnRange - remove functionally duplicate code in removeId. - rename removeColumnRange -> removeIdRange - restrict valid input to just the identifier columns (not the constant term column). PiperOrigin-RevId: 224054064
*	FlatAffineConstraints::removeId() fix.	Uday Bondhugula	2019-03-29	1	-15/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is an obvious bug, but none of the test cases exposed it since numIds was correctly updated, and the dimensional identifiers were always eliminated before the symbolic identifiers in all cases that removeId was getting called from. However, other work in progress exercises the other scenarios and exposes this bug. Add an hasConsistentState() private method to move common assertion checks, and call it from several base methods. Make hasInvalidConstraint() a private method as well (from a file static one). PiperOrigin-RevId: 224032721
*	During forward substitution, merge symbols from input AffineMap with the ↵	MLIR Team	2019-03-29	1	-6/+9
\| \| \| \| \| \| \| \| \| \| \|	symbol list of the target AffineMap. Symbols can be used as dim identifiers and symbolic identifiers, and so we must preserve the symbolic identifies from the input AffineMap during forward substitution, even if that same identifier is used as a dimension identifier in the target AffineMap. Test case added. Going forward, we may want to explore solutions where we do not maintain this split between dimensions and symbols, and instead verify the validity of each use of each AffineMap operand AffineMap in the context where the AffineMap operand usage is required to be a symbol: in the denominator of floordiv/ceildiv/mod for semi-affine maps, and in instructions that can capture symbols (i.e. alloc) PiperOrigin-RevId: 224017364
*	[MLIR] Add VectorTransferOps	Nicolas Vasilache	2019-03-29	2	-17/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This CL implements and uses VectorTransferOps in lieu of the former custom call op. Tests are updated accordingly. VectorTransferOps come in 2 flavors: VectorTransferReadOp and VectorTransferWriteOp. VectorTransferOps can be thought of as a backend-independent pseudo op/library call that needs to be legalized to MLIR (whiteboxed) before it can be lowered to backend-dependent IR. Note that the current implementation does not yet support a real permutation map. Proper support will come in a followup CL. VectorTransferReadOp ==================== VectorTransferReadOp performs a blocking read from a scalar memref location into a super-vector of the same elemental type. This operation is called 'read' by opposition to 'load' because the super-vector granularity is generally not representable with a single hardware register. As a consequence, memory transfers will generally be required when lowering VectorTransferReadOp. A VectorTransferReadOp is thus a mid-level abstraction that supports super-vectorization with non-effecting padding for full-tile only code. A vector transfer read has semantics similar to a vector load, with additional support for: 1. an optional value of the elemental type of the MemRef. This value supports non-effecting padding and is inserted in places where the vector read exceeds the MemRef bounds. If the value is not specified, the access is statically guaranteed to be within bounds; 2. an attribute of type AffineMap to specify a slice of the original MemRef access and its transposition into the super-vector shape. The permutation_map is an unbounded AffineMap that must represent a permutation from the MemRef dim space projected onto the vector dim space. Example: ```mlir %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32> ... %val = `ssa-value` : f32 // let %i, %j, %k, %l be ssa-values of type index %v0 = vector_transfer_read %src, %i, %j, %k, %l {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} : (memref<?x?x?x?xf32>, index, index, index, index) -> vector<16x32x64xf32> %v1 = vector_transfer_read %src, %i, %j, %k, %l, %val {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} : (memref<?x?x?x?xf32>, index, index, index, index, f32) -> vector<16x32x64xf32> ``` VectorTransferWriteOp ===================== VectorTransferWriteOp performs a blocking write from a super-vector to a scalar memref of the same elemental type. This operation is called 'write' by opposition to 'store' because the super-vector granularity is generally not representable with a single hardware register. As a consequence, memory transfers will generally be required when lowering VectorTransferWriteOp. A VectorTransferWriteOp is thus a mid-level abstraction that supports super-vectorization with non-effecting padding for full-tile only code. A vector transfer write has semantics similar to a vector store, with additional support for handling out-of-bounds situations. Example: ```mlir %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>. %val = `ssa-value` : vector<16x32x64xf32> // let %i, %j, %k, %l be ssa-values of type index vector_transfer_write %val, %src, %i, %j, %k, %l {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} : (vector<16x32x64xf32>, memref<?x?x?x?xf32>, index, index, index, index) ``` PiperOrigin-RevId: 223873234
*	FlatAffineConstraints::composeMap: return failure instead of asserting on ↵	Uday Bondhugula	2019-03-29	2	-4/+11
\| \| \| \| \| \| \| \| \|	semi-affine maps FlatAffineConstraints::composeMap: should return false instead of asserting on a semi-affine map. Make getMemRefRegion just propagate false when encountering semi-affine maps (instead of crashing!) PiperOrigin-RevId: 223828743
*	Debug output / logging memref sizes in DMA generation + related changes	Uday Bondhugula	2019-03-29	1	-0/+12
\| \| \| \| \| \| \|	- Add method to get a memref's size in bytes - clean up a loop tiling pass helper (NFC) PiperOrigin-RevId: 223422077
*	[MLIR][MaterializeVectors] Add a MaterializeVector pass via unrolling.	Nicolas Vasilache	2019-03-29	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This CL adds an MLIR-MLIR pass which materializes super-vectors to hardware-dependent sized vectors. While the physical vector size is target-dependent, the pass is written in a target-independent way: the target vector size is specified as a parameter to the pass. This pass is thus a partial lowering that opens the "greybox" that is the super-vector abstraction. This first CL adds a first materilization pass iterates over vector_transfer_write operations and: 1. computes the program slice including the current vector_transfer_write; 2. computes the multi-dimensional ratio of super-vector shape to hardware vector shape; 3. for each possible multi-dimensional value within the bounds of ratio, a new slice is instantiated (i.e. cloned and rewritten) so that all operations in this instance operate on the hardware vector type. As a simple example, given: ```mlir mlfunc @vector_add_2d(%M : index, %N : index) -> memref<?x?xf32> { %A = alloc (%M, %N) : memref<?x?xf32> %B = alloc (%M, %N) : memref<?x?xf32> %C = alloc (%M, %N) : memref<?x?xf32> for %i0 = 0 to %M { for %i1 = 0 to %N { %a1 = load %A[%i0, %i1] : memref<?x?xf32> %b1 = load %B[%i0, %i1] : memref<?x?xf32> %s1 = addf %a1, %b1 : f32 store %s1, %C[%i0, %i1] : memref<?x?xf32> } } return %C : memref<?x?xf32> } ``` and the following options: ``` -vectorize -virtual-vector-size 32 --test-fastest-varying=0 -materialize-vectors -vector-size=8 ``` materialization emits: ```mlir #map0 = (d0, d1) -> (d0, d1) #map1 = (d0, d1) -> (d0, d1 + 8) #map2 = (d0, d1) -> (d0, d1 + 16) #map3 = (d0, d1) -> (d0, d1 + 24) mlfunc @vector_add_2d(%arg0 : index, %arg1 : index) -> memref<?x?xf32> { %0 = alloc(%arg0, %arg1) : memref<?x?xf32> %1 = alloc(%arg0, %arg1) : memref<?x?xf32> %2 = alloc(%arg0, %arg1) : memref<?x?xf32> for %i0 = 0 to %arg0 { for %i1 = 0 to %arg1 step 32 { %3 = affine_apply #map0(%i0, %i1) %4 = "vector_transfer_read"(%0, %3tensorflow/mlir#0, %3tensorflow/mlir#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %5 = affine_apply #map1(%i0, %i1) %6 = "vector_transfer_read"(%0, %5tensorflow/mlir#0, %5tensorflow/mlir#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %7 = affine_apply #map2(%i0, %i1) %8 = "vector_transfer_read"(%0, %7tensorflow/mlir#0, %7tensorflow/mlir#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %9 = affine_apply #map3(%i0, %i1) %10 = "vector_transfer_read"(%0, %9tensorflow/mlir#0, %9tensorflow/mlir#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %11 = affine_apply #map0(%i0, %i1) %12 = "vector_transfer_read"(%1, %11tensorflow/mlir#0, %11tensorflow/mlir#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %13 = affine_apply #map1(%i0, %i1) %14 = "vector_transfer_read"(%1, %13tensorflow/mlir#0, %13tensorflow/mlir#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %15 = affine_apply #map2(%i0, %i1) %16 = "vector_transfer_read"(%1, %15tensorflow/mlir#0, %15tensorflow/mlir#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %17 = affine_apply #map3(%i0, %i1) %18 = "vector_transfer_read"(%1, %17tensorflow/mlir#0, %17tensorflow/mlir#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %19 = addf %4, %12 : vector<8xf32> %20 = addf %6, %14 : vector<8xf32> %21 = addf %8, %16 : vector<8xf32> %22 = addf %10, %18 : vector<8xf32> %23 = affine_apply #map0(%i0, %i1) "vector_transfer_write"(%19, %2, %23tensorflow/mlir#0, %23tensorflow/mlir#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> () %24 = affine_apply #map1(%i0, %i1) "vector_transfer_write"(%20, %2, %24tensorflow/mlir#0, %24tensorflow/mlir#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> () %25 = affine_apply #map2(%i0, %i1) "vector_transfer_write"(%21, %2, %25tensorflow/mlir#0, %25tensorflow/mlir#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> () %26 = affine_apply #map3(%i0, %i1) "vector_transfer_write"(%22, %2, %26tensorflow/mlir#0, %26tensorflow/mlir#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> () } } return %2 : memref<?x?xf32> } ``` PiperOrigin-RevId: 222455351
*	[MLIR][Slicing] Apply cleanups	Nicolas Vasilache	2019-03-29	1	-87/+43
\| \| \| \| \| \| \|	This CL applies a few last cleanups from a previous CL that have been missed during the previous submit. PiperOrigin-RevId: 222454774
*	[MLIR][Slicing] Add utils for computing slices.	Nicolas Vasilache	2019-03-29	2	-27/+258
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This CL adds tooling for computing slices as an independent CL. The first consumer of this analysis will be super-vector materialization in a followup CL. In particular, this adds: 1. a getForwardStaticSlice function with documentation, example and a standalone unit test; 2. a getBackwardStaticSlice function with documentation, example and a standalone unit test; 3. a getStaticSlice function with documentation, example and a standalone unit test; 4. a topologicalSort function that is exercised through the getStaticSlice unit test. The getXXXStaticSlice functions take an additional root (resp. terminators) parameter which acts as a boundary that the transitive propagation algorithm is not allowed to cross. PiperOrigin-RevId: 222446208
*	Fix bugs in DMA generation and FlatAffineConstraints; add more test	Uday Bondhugula	2019-03-29	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	cases. - fix bug in calculating index expressions for DMA buffers in certain cases (affected tiled loop nests); add more test cases for better coverage. - introduce an additional optional argument to replaceAllMemRefUsesWith; additional operands to the index remap AffineMap can now be supplied by the client. - FlatAffineConstraints::addBoundsForStmt - fix off by one upper bound, ::composeMap - fix position bug. - Some clean up and more comments PiperOrigin-RevId: 222434628
*	[MLIR][VectorAnalysis] Add a VectorAnalysis and standalone tests	Nicolas Vasilache	2019-03-29	2	-3/+168
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This CL adds some vector support in prevision of the upcoming vector materialization pass. In particular this CL adds 2 functions to: 1. compute the multiplicity of a subvector shape in a supervector shape; 2. help match operations on strict super-vectors. This is defined for a given subvector shape as an operation that manipulates a vector type that is an integral multiple of the subtype, with multiplicity at least 2. This CL also adds a TestUtil pass where we can dump arbitrary testing of functions and analysis that operate at a much smaller granularity than a pass (e.g. an analysis for which it is convenient to write a bit of artificial MLIR and write some custom test). This is in order to keep using Filecheck for things that essentially look and feel like C++ unit tests. PiperOrigin-RevId: 222250910
*	Updates to transformation/analysis passes/utilities. Update DMA generation pass	Uday Bondhugula	2019-03-29	4	-81/+384
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	and getMemRefRegion() to work with specified loop depths; add support for outgoing DMAs, store op's. - add support for getMemRefRegion symbolic in outer loops - hence support for DMAs symbolic in outer surrounding loops. - add DMA generation support for outgoing DMAs (store op's to lower memory space); extend getMemoryRegion to store op's. -memref-bound-check now works with store op's as well. - fix dma-generate (references to the old memref in the dma_start op were also being replaced with the new buffer); we need replace all memref uses to work only on a subset of the uses - add a new optional argument for replaceAllMemRefUsesWith. update replaceAllMemRefUsesWith to take an optional 'operation' argument to serve as a filter - if provided, only those uses that are dominated by the filter are replaced. - Add missing print for attributes for dma_start, dma_wait op's. - update the FlatAffineConstraints API PiperOrigin-RevId: 221889223
*	Merge OperationInst functionality into Instruction.	River Riddle	2019-03-29	1	-6/+4
\| \| \| \| \| \|	We do some limited renaming here but define an alias for OperationInst so that a follow up cl can solely perform the large scale renaming. PiperOrigin-RevId: 221726963
*	[MLIR] Merge terminator and uses into BasicBlock operations list handling.	River Riddle	2019-03-29	1	-17/+1
\| \| \| \|	PiperOrigin-RevId: 221700132
*	Replace TerminatorInst with builtin terminator operations.	River Riddle	2019-03-29	2	-108/+9
\| \| \| \| \|	Note: Terminators will be merged into the operations list in a follow up patch. PiperOrigin-RevId: 221670037
*	[MLIR] Support for vectorizing operations.	Nicolas Vasilache	2019-03-29	2	-2/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This CL adds support for and a vectorization test to perform scalar 2-D addf. The support extension notably comprises: 1. extend vectorizable test to exclude vector_transfer operations and expose them to LoopAnalysis where they are needed. This is a temporary solution a concrete MLIR Op exists; 2. add some more functional sugar mapKeys, apply and ScopeGuard (which became relevant again); 3. fix improper shifting during coarsening; 4. rename unaligned load/store to vector_transfer_read/write and simplify the design removing the unnecessary AllocOp that were introduced prematurely: vector_transfer_read currently has the form: (memref<?x?x?xf32>, index, index, index) -> vector<32x64x256xf32> vector_transfer_write currently has the form: (vector<32x64x256xf32>, memref<?x?x?xf32>, index, index, index) -> () 5. adds vectorizeOperations which traverses the operations in a ForStmt and rewrites them to their vector form; 6. add support for vector splat from a constant. The relevant tests are also updated. PiperOrigin-RevId: 221421426
*	Adds support for returning the direction of the dependence between memref ↵	MLIR Team	2019-03-29	2	-181/+472
\| \| \| \| \| \| \| \| \| \|	accesses (distance/direction vectors). Updates MemRefDependenceCheck to check and report on all memref access pairs at all loop nest depths. Updates old and adds new memref dependence check tests. Resolves multiple TODOs. PiperOrigin-RevId: 220816515
*	Automatic DMA generation for simple cases.	Uday Bondhugula	2019-03-29	3	-64/+69
\| \| \| \| \| \| \| \| \| \| \| \| \|	- constant bounded memory regions, static shapes, no handling of overlapping/duplicate regions (through union) for now; also only, load memory op's. - add build methods for DmaStartOp, DmaWaitOp. - move getMemoryRegion() into Analysis/Utils and expose it. - fix addIndexSet, getMemoryRegion() post switch to exclusive upper bounds; update test cases for memref-bound-check and memref-dependence-check for exclusive bounds (missed in a previous CL) PiperOrigin-RevId: 220729810
*	Add lookupPassInfo to enable querying the pass info for a pass.	Jacques Pienaar	2019-03-29	1	-0/+8
\| \| \| \| \| \|	The short term use would be in querying the pass name when reporting errors. PiperOrigin-RevId: 220665532
*	Bug fixes in FlatAffineConstraints. Tests cases that discovered these in ↵	MLIR Team	2019-03-29	1	-4/+3
\| \| \| \| \| \|	follow up CL on memref dependence checks. PiperOrigin-RevId: 220632386
*	Initialize Pass with PassID.	Jacques Pienaar	2019-03-29	2	-2/+3
\| \| \| \| \| \|	The passID is not currently stored in Pass but this avoids the unused variable warning. The passID is used to uniquely identify passes, currently this is only stored/used in PassInfo. PiperOrigin-RevId: 220485662
*	[MLIR] Make upper bound implementation exclusive	Nicolas Vasilache	2019-03-29	1	-4/+4
\| \| \| \| \| \| \|	This CL implement exclusive upper bound behavior as per b/116854378. A followup CL will update the semantics of the for loop. PiperOrigin-RevId: 220448963
*	Add static pass registration	Jacques Pienaar	2019-03-29	3	-0/+52
\| \| \| \| \| \| \| \|	Add static pass registration and change mlir-opt to use it. Future work is needed to refactor the registration for PassManager usage. Change build targets to alwayslink to enforce registration. PiperOrigin-RevId: 220390178
*	Introduce loop tiling code generation (hyper-rectangular case)	Uday Bondhugula	2019-03-29	1	-2/+29
\| \| \| \| \| \| \| \| \| \| \|	- simple perfectly nested band tiling with fixed tile sizes. - only the hyper-rectangular case is handled, with other limitations of getIndexSet applying (constant loop bounds, etc.); once the latter utility is extended, tiled code generation should become more general. - Add FlatAffineConstraints::isHyperRectangular() PiperOrigin-RevId: 220324933
*	Clean up memref dep check utilities; update FlatAffineConstraints API, add	Uday Bondhugula	2019-03-29	2	-29/+94
\| \| \| \| \| \| \| \| \| \|	simple utility methods. - clean up some of the analysis utilities used by memref dep checking - add additional asserts / comments at places in analysis utilities - add additional simple methods to the FlatAffineConstraints API. PiperOrigin-RevId: 220124523
*	Adds MemRefDependenceCheck analysis pass, plus multiple dependence check tests.	MLIR Team	2019-03-29	2	-30/+209
\| \| \| \| \| \|	Adds equality constraints to dependence constraint system for accesses using dims/symbols where the defining operation of the dim/symbol is a constant. PiperOrigin-RevId: 219814740
*	Complete memref bound checker for arbitrary affine expressions. Handle local	Uday Bondhugula	2019-03-29	3	-65/+280
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	variables from mod's and div's when converting to flat form. - propagate mod, floordiv, ceildiv / local variables constraint information when flattening affine expressions and converting them into flat affine constraints; resolve multiple TODOs. - enables memref bound checker to work with arbitrary affine expressions - update FlatAffineConstraints API with several new methods - test/exercise functionality mostly through -memref-bound-check - other analyses such as dependence tests, etc. should now be able to work in the presence of any affine composition of add, mul, floor, ceil, mod. PiperOrigin-RevId: 219711806
*	Adds a dependence check to test whether two accesses to the same memref ↵	MLIR Team	2019-03-29	3	-17/+497
\| \| \| \| \| \| \| \| \| \| \| \|	access the same element. - Builds access functions and iterations domains for each access. - Builds dependence polyhedron constraint system which has equality constraints for equated access functions and inequality constraints for iteration domain loop bounds. - Runs elimination on the dependence polyhedron to test if no dependence exists between the accesses. - Adds a trivial LoopFusion transformation pass with a simple test policy to test dependence between accesses to the same memref in adjacent loops. - The LoopFusion pass will be extended in subsequent CLs. PiperOrigin-RevId: 219630898
*	[MLIR] Extend vectorization to 2+-D patterns	Nicolas Vasilache	2019-03-29	1	-8/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This CL adds support for vectorization using more interesting 2-D and 3-D patterns. Note in particular the fact that we match some pretty complex imperfectly nested 2-D patterns with a quite minimal change to the implementation: we just add a bit of recursion to traverse the matched patterns and actually vectorize the loops. For instance, vectorizing the following loop by 128: ``` for %i3 = 0 to %0 { %7 = affine_apply (d0) -> (d0)(%i3) %8 = load %arg0[%c0_0, %7] : memref<?x?xf32> } ``` Currently generates: ``` #map0 = ()[s0] -> (s0 + 127) #map1 = (d0) -> (d0) for %i3 = 0 to #map0()[%0] step 128 { %9 = affine_apply #map1(%i3) %10 = alloc() : memref<1xvector<128xf32>> %11 = "n_d_unaligned_load"(%arg0, %c0_0, %9, %10, %c0) : (memref<?x?xf32>, index, index, memref<1xvector<128xf32>>, index) -> (memref<?x?xf32>, index, index, memref<1xvector<128xf32>>, index) %12 = load %10[%c0] : memref<1xvector<128xf32>> } ``` The above is subject to evolution. PiperOrigin-RevId: 219629745
*	Introduce memref bound checking.	Uday Bondhugula	2019-03-29	4	-2/+246
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce analysis to check memref accesses (in MLFunctions) for out of bound ones. It works as follows: $ mlir-opt -memref-bound-check test/Transforms/memref-bound-check.mlir /tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#1 %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32> ^ /tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#1 %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32> ^ /tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#2 %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32> ^ /tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#2 %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32> ^ /tmp/single.mlir:12:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#1 %y = load %B[%idy] : memref<128 x i32> ^ /tmp/single.mlir:12:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#1 %y = load %B[%idy] : memref<128 x i32> ^ #map0 = (d0, d1) -> (d0, d1) #map1 = (d0, d1) -> (d0 * 128 - d1) mlfunc @test() { %0 = alloc() : memref<9x9xi32> %1 = alloc() : memref<128xi32> for %i0 = -1 to 9 { for %i1 = -1 to 9 { %2 = affine_apply #map0(%i0, %i1) %3 = load %0[%2tensorflow/mlir#0, %2tensorflow/mlir#1] : memref<9x9xi32> %4 = affine_apply #map1(%i0, %i1) %5 = load %1[%4] : memref<128xi32> } } return } - Improves productivity while manually / semi-automatically developing MLIR for testing / prototyping; also provides an indirect way to catch errors in transformations. - This pass is an easy way to test the underlying affine analysis machinery including low level routines. Some code (in getMemoryRegion()) borrowed from @andydavis cl/218263256. While on this: - create mlir/Analysis/Passes.h; move Pass.h up from mlir/Transforms/ to mlir/ - fix a bug in AffineAnalysis.cpp::toAffineExpr TODO: extend to non-constant loop bounds (straightforward). Will transparently work for all accesses once floordiv, mod, ceildiv are supported in the AffineMap -> FlatAffineConstraints conversion. PiperOrigin-RevId: 219397961
*	Implement value type abstraction for types.	River Riddle	2019-03-29	2	-10/+10
\| \| \| \| \| \|	This is done by changing Type to be a POD interface around an underlying pointer storage and adding in-class support for isa/dyn_cast/cast. PiperOrigin-RevId: 219372163
*	FlatAffineConstraints API update - additional methods	Uday Bondhugula	2019-03-29	1	-66/+167
\| \| \| \| \| \| \| \| \| \| \| \| \|	- add methods addConstantLowerBound, addConstantUpperBound, setIdToConstant, addDimsForMap - update coefficient storage to use numReservedCols * rows instead of numCols * rows (makes the code simpler/natural; reduces movement of data when new columns are added, eliminates movement of data when columns are added to the end). (addDimsForMap is tested in the child CL on memref bound checking: cl/219000460) PiperOrigin-RevId: 219358376
*	[MLIR] Implement 1-D vectorization for fastest varying load/stores	Nicolas Vasilache	2019-03-29	2	-92/+113
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This CL is a first in a series that implements early vectorization of increasingly complex patterns. In particular, early vectorization will support arbitrary loop nesting patterns (both perfectly and imperfectly nested), at arbitrary depths in the loop tree. This first CL builds the minimal support for applying 1-D patterns. It relies on an unaligned load/store op abstraction that can be inplemented differently on different HW. Future CLs will support higher dimensional patterns, but 1-D patterns already exhibit interesting properties. In particular, we want to separate pattern matching (i.e. legality both structural and dependency analysis based), from profitability analysis, from application of the transformation. As a consequence patterns may intersect and we need to verify that a pattern can still apply by the time we get to applying it. A non-greedy analysis on profitability that takes into account pattern intersection is left for future work. Additionally the CL makes the following cleanups: 1. the matches method now returns a value, not a reference; 2. added comments about the MLFunctionMatcher and MLFunctionMatches usage by value; 3. added size and empty methods to matches; 4. added a negative vectorization test with a conditional, this exhibited a but in the iterators. Iterators now return nullptr if the underlying storage is nullpt. PiperOrigin-RevId: 219299489
*	Add getMemRefType() accessors to LoadOp/StoreOp.	Uday Bondhugula	2019-03-29	1	-1/+1
\| \| \| \| \| \| \| \| \|	- There are several places where we are casting the type of the memref obtained from the load/store op to a memref type, and this will become even more common (some upcoming CLs this week). Add a getMemRefType and use it at several places where the cast was being used. PiperOrigin-RevId: 219164326
*	FourierMotzkinEliminate trivial bug fix	Uday Bondhugula	2019-03-29	1	-1/+1
\| \| \| \|	PiperOrigin-RevId: 219148982
*	Move AffineMap.h/IntegerSet.h from Attributes.h to AttributeDetail.h where ↵	River Riddle	2019-03-29	1	-1/+1
\| \| \| \| \| \|	they belong. PiperOrigin-RevId: 218806426
*	Run GCD test before elimination. Adds test case with rational solutions, but ↵	MLIR Team	2019-03-29	1	-33/+32
\| \| \| \| \| \|	no integer solutions. PiperOrigin-RevId: 218772332
*	Implement value type abstraction for attributes.	River Riddle	2019-03-29	1	-8/+8
\| \| \| \| \| \|	This is done by changing Attribute to be a POD interface around an underlying pointer storage and adding in-class support for isa/dyn_cast/cast. PiperOrigin-RevId: 218764173