path: root/mlir/lib/Analysis/LoopAnalysis.cpp
Commit message | Author | Age | Files | Lines
...
* Begin the process of fully removing OperationInst. — River Riddle, 2019-03-29 (1 file, -26/+19)
  This patch cleans up references to OperationInst in the /include, /AffineOps, and lib/Analysis.
  PiperOrigin-RevId: 232199262
* Fold the functionality of OperationInst into Instruction. — River Riddle, 2019-03-29 (1 file, -1/+1)
  OperationInst still exists as a forward declaration and will be removed incrementally in a set of followup cleanup patches.
  PiperOrigin-RevId: 232198540
* Define the AffineForOp and replace ForInst with it. — River Riddle, 2019-03-29 (1 file, -35/+40)
  This patch is largely mechanical, i.e. changing usages of ForInst to OpPointer<AffineForOp>. An important difference is that upon construction an AffineForOp no longer automatically creates the body and induction variable. To generate the body/iv, 'createBody' can be called on an AffineForOp with no body.
  PiperOrigin-RevId: 232060516
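  As context, a hedged sketch of the loop form AffineForOp models, in the custom syntax used elsewhere in this log (the bound %N and the body are illustrative, not taken from the commit):
  ```mlir
  // An affine loop over [0, %N). After this change, constructing the op no
  // longer creates the body block or the induction variable; a builder must
  // call 'createBody' before inserting operations such as the one below.
  for %i = 0 to %N {
    %0 = affine_apply (d0) -> (d0 + 1) (%i)
  }
  ```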
* Address performance issue in NestedMatcher — Nicolas Vasilache, 2019-03-29 (1 file, -5/+9)
  A performance issue was reported due to the usage of NestedMatcher in ComposeAffineMaps. The main culprit was the ubiquitous copies that were occurring when appending even a single element in `matchOne`.
  This CL generally simplifies the implementation and removes one level of indirection by getting rid of auxiliary storage as well as simplifying the API. The users of the API are updated accordingly.
  The implementation was tested on a heavily unrolled example with ComposeAffineMaps and is now close in performance to an implementation based on a stateless InstWalker.
  As a reminder, the whole ComposeAffineMaps pass is slated to disappear, but the bug report was very useful as a stress test for NestedMatchers.
  Lastly, the following cleanups reported by @aminim were addressed:
  1. make NestedPatternContext scoped within runFunction rather than at the Pass level; this was caused by a previous misunderstanding of Pass lifetime;
  2. use defensive assertions in the constructor of NestedPatternContext to make it clear that a unique such locally scoped context is allowed to exist.
  PiperOrigin-RevId: 231781279
* Recommit: Define an AffineOps dialect as well as an AffineIfOp operation. — River Riddle, 2019-03-29 (1 file, -0/+11)
  Replace all instances of IfInst with AffineIfOp and delete IfInst.
  PiperOrigin-RevId: 231342063
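  As context, a hedged sketch of the affine if construct that AffineIfOp models, guarded by an integer set (the set, bounds, and body are illustrative, not from the commit):
  ```mlir
  #set0 = (d0) : (d0 - 10 >= 0)
  for %i = 0 to 100 {
    // Executes its body only when %i - 10 >= 0 holds.
    if #set0(%i) {
      %v = "foo"(%i) : (index) -> i32
    }
  }
  ```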
* Automated rollback of changelist 231318632. — Nicolas Vasilache, 2019-03-29 (1 file, -11/+0)
  PiperOrigin-RevId: 231327161
* Define an AffineOps dialect as well as an AffineIfOp operation. — River Riddle, 2019-03-29 (1 file, -0/+11)
  Replace all instances of IfInst with AffineIfOp and delete IfInst.
  PiperOrigin-RevId: 231318632
* Change AffineApplyOp to produce a single result, simplifying the code that works with it, and updating the g3docs. — Chris Lattner, 2019-03-29 (1 file, -11/+1)
  PiperOrigin-RevId: 231120927
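  For illustration, a hedged before/after sketch (the maps and names are made up, not from the commit):
  ```mlir
  // Before this change, a single affine_apply could yield several results:
  //   affine_apply (d0, d1) -> (d0 + 1, d1 * 2) (%a, %b)
  // After it, each affine_apply produces exactly one value, so multi-result
  // maps are split into one op per result:
  %x = affine_apply (d0) -> (d0 + 1) (%a)
  %y = affine_apply (d0) -> (d0 * 2) (%b)
  ```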
* Change the ForInst induction variable to be a block argument of the body instead of the ForInst itself. — River Riddle, 2019-03-29 (1 file, -3/+5)
  This is a necessary step in converting ForInst into an operation.
  PiperOrigin-RevId: 231064139
* Cleanup resource management and rename recursive matchers — Nicolas Vasilache, 2019-03-29 (1 file, -1/+1)
  This CL follows up on a memory leak issue related to SmallVector growth that escapes the BumpPtrAllocator. The fix is to properly use ArrayRef and placement new to define away the issue.
  The following renaming is also applied:
  1. MLFunctionMatcher -> NestedPattern
  2. MLFunctionMatches -> NestedMatch
  As a consequence, all allocations are now guaranteed to live on the BumpPtrAllocator.
  PiperOrigin-RevId: 231047766
* Standardize naming of statements -> instructions, revisiting the code base to be consistent and moving the using declarations over. — Chris Lattner, 2019-03-29 (1 file, -51/+50)
  Hopefully this is the last truly massive patch in this refactoring. This is step 21/n towards merging instructions and statements, NFC.
  PiperOrigin-RevId: 227178245
* Rename BasicBlock and StmtBlock to Block, and make a pass cleaning it up. — Chris Lattner, 2019-03-29 (1 file, -3/+3)
  I did not make an effort to rename all of the 'bb' names in the codebase, since they are still correct and any specific missed ones can be fixed up on demand. The last major renaming is Statement -> Instruction, which is why Statement and Stmt still appear in various places. This is step 19/n towards merging instructions and statements, NFC.
  PiperOrigin-RevId: 227163082
* Rename BBArgument -> BlockArgument, Op::getOperation -> Op::getInst(), StmtResult -> InstResult, StmtOperand -> InstOperand, and remove the old names. — Chris Lattner, 2019-03-29 (1 file, -2/+2)
  This is step 17/n towards merging instructions and statements, NFC.
  PiperOrigin-RevId: 227121537
* Merge Operation into OperationInst and standardize nomenclature around OperationInst. — Chris Lattner, 2019-03-29 (1 file, -7/+7)
  This is a big mechanical patch. This is step 16/n towards merging instructions and statements, NFC.
  PiperOrigin-RevId: 227093712
* Merge SSAValue, CFGValue, and MLValue together into a single Value class, which is the new base of the SSA value hierarchy. — Chris Lattner, 2019-03-29 (1 file, -11/+11)
  This CL also standardizes all the nomenclature and comments to use 'Value' where appropriate. This also eliminates a large number of cast<MLValue>(x)'s, which is very soothing. This is step 11/n towards merging instructions and statements, NFC.
  PiperOrigin-RevId: 227064624
* LoopAnalysis: isContiguousAccess fail gracefully — Alex Zinenko, 2019-03-29 (1 file, -3/+6)
  The existing implementation of isContiguousAccess asserts that one of the function arguments is within a certain range, depending on another parameter. However, the value of this argument may come from outside; in particular, in the loop vectorization pass it may come from command line arguments. This leads to 'mlir-opt' crashing on an assertion depending on flags. Handle the error gracefully by returning a negative result instead of asserting. This negative result prevents any further transformation by the vectorizer, so the IR remains valid.
  PiperOrigin-RevId: 227029496
* Refactor ForStmt: have it contain a StmtBlock instead of subclassing StmtBlock. — Chris Lattner, 2019-03-29 (1 file, -4/+5)
  This is more consistent with IfStmt and also conceptually makes more sense - a ForStmt "isn't" its body, it contains its body. This is step 1/N towards merging BasicBlock and StmtBlock. This is required because in the new regime StmtBlock will have a use list (just like BasicBlock does) of operands, and ForStmt already has a use list for its induction variable. This is a mechanical patch, NFC.
  PiperOrigin-RevId: 226684158
* Extract vector_transfer_* Ops into a SuperVectorDialect. — Alex Zinenko, 2019-03-29 (1 file, -1/+1)
  From the beginning, the vector_transfer_read and vector_transfer_write operations were intended as a mid-level vectorization abstraction. In particular, they are lowered to the StandardOps dialect before further processing. As such, it does not make sense to keep them at the same level as StandardOps. Introduce the new SuperVectorOps dialect and move the vector_transfer_* operations there. This will be used as a testbed for the generic lowering/legalization pass.
  PiperOrigin-RevId: 225554492
* [MLIR] Remove NYI assertions in LoopAnalysis.cpp — Nicolas Vasilache, 2019-03-29 (1 file, -9/+18)
  This CL also cleans up some loose ends and returns conservative answers while emitting errors in the NYI cases.
  PiperOrigin-RevId: 224377004
* [MLIR] Add support for permutation_map — Nicolas Vasilache, 2019-03-29 (1 file, -38/+67)
  This CL hooks up and uses permutation_map in vector_transfer ops. In particular, when going into the nuts and bolts of the implementation, it became clear that cases arose that required supporting broadcast semantics. Broadcast semantics are thus added to the general permutation_map. The verify methods and tests are updated accordingly.
  Examples of interest include:
  Example 1: The following MLIR snippet:
  ```mlir
  for %i3 = 0 to %M {
    for %i4 = 0 to %N {
      for %i5 = 0 to %P {
        %a5 = load %A[%i4, %i5, %i3] : memref<?x?x?xf32>
  }}}
  ```
  may vectorize with {permutation_map: (d0, d1, d2) -> (d2, d1)} into:
  ```mlir
  for %i3 = 0 to %0 step 32 {
    for %i4 = 0 to %1 {
      for %i5 = 0 to %2 step 256 {
        %4 = vector_transfer_read %arg0, %i4, %i5, %i3
             {permutation_map: (d0, d1, d2) -> (d2, d1)} :
             (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
  }}}
  ```
  Meaning that vector_transfer_read will be responsible for reading the 2-D slice `%arg0[%i4, %i5:%i5+256, %i3:%i3+32]` into vector<32x256xf32>. This will require a transposition when vector_transfer_read is further lowered.
  Example 2: The following MLIR snippet:
  ```mlir
  %cst0 = constant 0 : index
  for %i0 = 0 to %M {
    %a0 = load %A[%cst0, %cst0] : memref<?x?xf32>
  }
  ```
  may vectorize with {permutation_map: (d0) -> (0)} into:
  ```mlir
  for %i0 = 0 to %0 step 128 {
    %3 = vector_transfer_read %arg0, %c0_0, %c0_0
         {permutation_map: (d0, d1) -> (0)} :
         (memref<?x?xf32>, index, index) -> vector<128xf32>
  }
  ```
  Meaning that vector_transfer_read will be responsible for reading the 0-D slice `%arg0[%c0, %c0]` into vector<128xf32>. This will require a 1-D vector broadcast when vector_transfer_read is further lowered.
  Additionally, some minor cleanups and refactorings are performed.
  One notable thing missing here is the composition with a projection map during materialization. This is because I could not find an AffineMap composition that operates on AffineMap directly: everything related to composition seems to require going through SSAValue and only operates on AffineMap at a distance via AffineValueMap. I have raised this concern a bunch of times already; the followup CL will actually do something about it. In the meantime, the projection is hacked at a minimum to pass verification, and materialization tests are temporarily incorrect.
  PiperOrigin-RevId: 224376828
* [MLIR] Add VectorTransferOps — Nicolas Vasilache, 2019-03-29 (1 file, -1/+2)
  This CL implements and uses VectorTransferOps in lieu of the former custom call op. Tests are updated accordingly.
  VectorTransferOps come in 2 flavors: VectorTransferReadOp and VectorTransferWriteOp.
  VectorTransferOps can be thought of as a backend-independent pseudo op/library call that needs to be legalized to MLIR (whiteboxed) before it can be lowered to backend-dependent IR.
  Note that the current implementation does not yet support a real permutation map. Proper support will come in a followup CL.
  VectorTransferReadOp
  ====================
  VectorTransferReadOp performs a blocking read from a scalar memref location into a super-vector of the same elemental type. This operation is called 'read' as opposed to 'load' because the super-vector granularity is generally not representable with a single hardware register. As a consequence, memory transfers will generally be required when lowering VectorTransferReadOp. A VectorTransferReadOp is thus a mid-level abstraction that supports super-vectorization with non-effecting padding for full-tile-only code.
  A vector transfer read has semantics similar to a vector load, with additional support for:
  1. an optional value of the elemental type of the MemRef. This value supports non-effecting padding and is inserted in places where the vector read exceeds the MemRef bounds. If the value is not specified, the access is statically guaranteed to be within bounds;
  2. an attribute of type AffineMap to specify a slice of the original MemRef access and its transposition into the super-vector shape. The permutation_map is an unbounded AffineMap that must represent a permutation from the MemRef dim space projected onto the vector dim space.
  Example:
  ```mlir
  %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>
  ...
  %val = `ssa-value` : f32
  // let %i, %j, %k, %l be ssa-values of type index
  %v0 = vector_transfer_read %src, %i, %j, %k, %l
        {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
        (memref<?x?x?x?xf32>, index, index, index, index) -> vector<16x32x64xf32>
  %v1 = vector_transfer_read %src, %i, %j, %k, %l, %val
        {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
        (memref<?x?x?x?xf32>, index, index, index, index, f32) -> vector<16x32x64xf32>
  ```
  VectorTransferWriteOp
  =====================
  VectorTransferWriteOp performs a blocking write from a super-vector to a scalar memref of the same elemental type. This operation is called 'write' as opposed to 'store' because the super-vector granularity is generally not representable with a single hardware register. As a consequence, memory transfers will generally be required when lowering VectorTransferWriteOp. A VectorTransferWriteOp is thus a mid-level abstraction that supports super-vectorization with non-effecting padding for full-tile-only code.
  A vector transfer write has semantics similar to a vector store, with additional support for handling out-of-bounds situations.
  Example:
  ```mlir
  %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>
  %val = `ssa-value` : vector<16x32x64xf32>
  // let %i, %j, %k, %l be ssa-values of type index
  vector_transfer_write %val, %src, %i, %j, %k, %l
    {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
    (vector<16x32x64xf32>, memref<?x?x?x?xf32>, index, index, index, index)
  ```
  PiperOrigin-RevId: 223873234
* [MLIR][VectorAnalysis] Add a VectorAnalysis and standalone tests — Nicolas Vasilache, 2019-03-29 (1 file, -3/+2)
  This CL adds some vector support in anticipation of the upcoming vector materialization pass. In particular, this CL adds 2 functions to:
  1. compute the multiplicity of a subvector shape in a supervector shape;
  2. help match operations on strict super-vectors. This is defined for a given subvector shape as an operation that manipulates a vector type that is an integral multiple of the subtype, with multiplicity at least 2.
  This CL also adds a TestUtil pass where we can dump arbitrary testing of functions and analyses that operate at a much smaller granularity than a pass (e.g. an analysis for which it is convenient to write a bit of artificial MLIR and some custom tests). This is in order to keep using FileCheck for things that essentially look and feel like C++ unit tests.
  PiperOrigin-RevId: 222250910
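  To make the multiplicity notion concrete, a made-up example (mine, not from the commit): the supervector type vector<4x8x128xf32> contains the subvector shape vector<8x128xf32> with multiplicity 4, so an operation producing the supervector type qualifies as a strict super-vector operation w.r.t. that subshape, since the multiplicity is at least 2.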
* [MLIR] Support for vectorizing operations. — Nicolas Vasilache, 2019-03-29 (1 file, -0/+15)
  This CL adds support for, and a vectorization test to perform, scalar 2-D addf.
  The support extension notably comprises:
  1. extend the vectorizable test to exclude vector_transfer operations and expose them to LoopAnalysis where they are needed. This is a temporary solution until a concrete MLIR Op exists;
  2. add some more functional sugar: mapKeys, apply and ScopeGuard (which became relevant again);
  3. fix improper shifting during coarsening;
  4. rename unaligned load/store to vector_transfer_read/write and simplify the design, removing the unnecessary AllocOps that were introduced prematurely:
     vector_transfer_read currently has the form:
       (memref<?x?x?xf32>, index, index, index) -> vector<32x64x256xf32>
     vector_transfer_write currently has the form:
       (vector<32x64x256xf32>, memref<?x?x?xf32>, index, index, index) -> ()
  5. add vectorizeOperations, which traverses the operations in a ForStmt and rewrites them to their vector form;
  6. add support for vector splat from a constant.
  The relevant tests are also updated.
  PiperOrigin-RevId: 221421426
* [MLIR] Make upper bound implementation exclusive — Nicolas Vasilache, 2019-03-29 (1 file, -4/+4)
  This CL implements exclusive upper bound behavior as per b/116854378. A followup CL will update the semantics of the for loop.
  PiperOrigin-RevId: 220448963
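  For illustration, a hedged sketch of what an exclusive upper bound means (values are made up, not from the commit):
  ```mlir
  // With an exclusive upper bound, this loop runs for %i = 0, 1, ..., 99:
  // the trip count is 100 and %i never takes the value of the bound itself.
  for %i = 0 to 100 {
    %v = "foo"(%i) : (index) -> i32
  }
  ```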
* [MLIR] Extend vectorization to 2+-D patterns — Nicolas Vasilache, 2019-03-29 (1 file, -8/+30)
  This CL adds support for vectorization using more interesting 2-D and 3-D patterns. Note in particular the fact that we match some pretty complex imperfectly nested 2-D patterns with a quite minimal change to the implementation: we just add a bit of recursion to traverse the matched patterns and actually vectorize the loops.
  For instance, vectorizing the following loop by 128:
  ```
  for %i3 = 0 to %0 {
    %7 = affine_apply (d0) -> (d0)(%i3)
    %8 = load %arg0[%c0_0, %7] : memref<?x?xf32>
  }
  ```
  currently generates:
  ```
  #map0 = ()[s0] -> (s0 + 127)
  #map1 = (d0) -> (d0)
  for %i3 = 0 to #map0()[%0] step 128 {
    %9 = affine_apply #map1(%i3)
    %10 = alloc() : memref<1xvector<128xf32>>
    %11 = "n_d_unaligned_load"(%arg0, %c0_0, %9, %10, %c0)
      : (memref<?x?xf32>, index, index, memref<1xvector<128xf32>>, index)
      -> (memref<?x?xf32>, index, index, memref<1xvector<128xf32>>, index)
    %12 = load %10[%c0] : memref<1xvector<128xf32>>
  }
  ```
  The above is subject to evolution.
  PiperOrigin-RevId: 219629745
* Implement value type abstraction for types. — River Riddle, 2019-03-29 (1 file, -8/+8)
  This is done by changing Type to be a POD interface around an underlying pointer storage and adding in-class support for isa/dyn_cast/cast.
  PiperOrigin-RevId: 219372163
* [MLIR] Implement 1-D vectorization for fastest varying load/stores — Nicolas Vasilache, 2019-03-29 (1 file, -32/+58)
  This CL is the first in a series that implements early vectorization of increasingly complex patterns. In particular, early vectorization will support arbitrary loop nesting patterns (both perfectly and imperfectly nested), at arbitrary depths in the loop tree.
  This first CL builds the minimal support for applying 1-D patterns. It relies on an unaligned load/store op abstraction that can be implemented differently on different HW.
  Future CLs will support higher dimensional patterns, but 1-D patterns already exhibit interesting properties. In particular, we want to separate pattern matching (i.e. legality, both structural and dependency-analysis based), from profitability analysis, from application of the transformation. As a consequence, patterns may intersect and we need to verify that a pattern can still apply by the time we get to applying it. A non-greedy analysis on profitability that takes into account pattern intersection is left for future work.
  Additionally, the CL makes the following cleanups:
  1. the matches method now returns a value, not a reference;
  2. added comments about MLFunctionMatcher and MLFunctionMatches usage by value;
  3. added size and empty methods to matches;
  4. added a negative vectorization test with a conditional; this exhibited a bug in the iterators. Iterators now return nullptr if the underlying storage is nullptr.
  PiperOrigin-RevId: 219299489
* Add getMemRefType() accessors to LoadOp/StoreOp. — Uday Bondhugula, 2019-03-29 (1 file, -1/+1)
  There are several places where we are casting the type of the memref obtained from the load/store op to a memref type, and this will become even more common (some upcoming CLs this week). Add a getMemRefType and use it at several places where the cast was being used.
  PiperOrigin-RevId: 219164326
* Rename Operation::getAs to Operation::dyn_cast — Feng Liu, 2019-03-29 (1 file, -3/+3)
  Also rename Operation::is to Operation::isa and introduce Operation::cast. All of these are for consistency with the global dyn_cast/cast/isa operators.
  PiperOrigin-RevId: 217878786
* Generalize / improve DMA transfer overlap; nested and multiple DMA support; resolve multiple TODOs. — Uday Bondhugula, 2019-03-29 (1 file, -3/+34)
  - replace the fake test pass (that worked on just the first loop in the MLFunction) to perform DMA pipelining on all suitable loops.
  - nested DMAs work now (DMAs in an outer loop, more DMAs in nested inner loops).
  - fix bugs / assumptions: correctly copy the memory space and elemental type of the source memref for double buffering.
  - correctly identify matching start/finish statements, handle multiple DMAs per loop.
  - introduce dominates/properlyDominates utilities for MLFunction statements.
  - move checkDominancePreservationOnShifts to LoopAnalysis.h; rename it getShiftValidity.
  - refactor getContainingStmtPos -> findAncestorStmtInBlock; move it into Analysis/Utils.h (it has two users).
  - other improvements / cleanup for related API/utilities.
  - add a size argument to dma_wait: for nested DMAs or in general, it makes it easy to obtain the size to use when lowering the dma_wait, since we wouldn't want to identify the matching dma_start; more importantly, in general/in the future, there may not always be a dma_start dominating the dma_wait.
  - add debug information in the pass.
  PiperOrigin-RevId: 217734892
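  As context, a hedged sketch of the dma_start/dma_wait pairing this pass pipelines, including the new size argument on dma_wait (shapes, memory spaces, and operand names are illustrative, not from the commit):
  ```mlir
  // Kick off a non-blocking copy of %num elements from slow to fast memory,
  // then block on the tag memref until that transfer completes.
  %tag = alloc() : memref<1 x i32, 2>
  dma_start %src[%i], %dst[%j], %num, %tag[%zero]
    : memref<256 x f32>, memref<32 x f32, 1>, memref<1 x i32, 2>
  dma_wait %tag[%zero], %num : memref<1 x i32, 2>
  ```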
* [MLIR] Basic infrastructure for vectorization test — Nicolas Vasilache, 2019-03-29 (1 file, -0/+84)
  This CL implements a very simple loop vectorization **test** and the basic infrastructure to support it.
  The test simply consists in:
  1. matching the loops in the MLFunction and all the Load/Store operations nested under the loop;
  2. testing whether all the Load/Store are contiguous along the innermost memory dimension along that particular loop. If any reference is non-contiguous (i.e. the ForStmt SSAValue appears in the expression), then the loop is not vectorizable.
  The simple test above can gradually be extended with more interesting behaviors to account for the fact that a layout permutation may exist that enables contiguity, etc. All these will come in due time, but it is worthwhile noting that the test already supports detection of outer-vectorizable loops.
  In implementing this test, I also added a recursive MLFunctionMatcher and some sugar that can capture patterns such as `auto gemmLike = Doall(Doall(Red(LoadStore())))` and allows iterating on the matched IR structures. For now it just uses in-order traversal, but post-order DFS will be useful in the future once IR rewrites start occurring.
  One may note that the memory management design decision follows a different pattern from MLIR. After evaluating different designs and how they quickly increase cognitive overhead, I decided to opt for the simplest solution in my view: a class-wide (threadsafe) RAII context. This way, a pass that needs MLFunctionMatcher can just have its own locally scoped BumpPtrAllocator and everything is cleaned up when the pass is destroyed. If passes are expected to have a longer lifetime, then the contexts can easily be scoped inside the runOnMLFunction call and storage lifetime reduced. Lastly, whatever the scope of threading (module, function, pass), this is expected to also be future-proof wrt concurrency (but this is a detail atm).
  PiperOrigin-RevId: 217622889
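  For intuition, a hedged sketch of the contiguity criterion (illustrative IR, not from the commit): an access is contiguous along a loop when its induction variable appears only in the innermost (fastest varying) memref dimension.
  ```mlir
  for %i = 0 to %N {
    %a = load %A[%c0, %i] : memref<?x?xf32>  // contiguous along %i
    %b = load %A[%i, %c0] : memref<?x?xf32>  // %i in the outer dim: not vectorizable along %i
  }
  ```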
* [MLIR] AffineMap value type — Nicolas Vasilache, 2019-03-29 (1 file, -7/+7)
  This CL applies the same pattern as AffineExpr to AffineMap: a simple struct that acts as the storage is allocated in the bump pointer. The AffineMap is immutable and accessed everywhere by value.
  PiperOrigin-RevId: 216445930
* [MLIR] Sketch AffineExpr value type — Nicolas Vasilache, 2019-03-29 (1 file, -4/+4)
  This CL sketches what it takes for AffineExpr to fully have by-value semantics and not be a not-so-smart pointer anymore. This essentially makes the underlying class a simple storage struct and implements the operations on the value type directly. Since there is no forwarding of operations anymore, we can fully isolate the storage class and make a hard visibility barrier by moving detail::AffineExpr into AffineExprDetail.h. AffineExprDetail.h is only included where storage-related information is needed.
  PiperOrigin-RevId: 216385459
* [MLIR] AffineExpr final cleanups — Nicolas Vasilache, 2019-03-29 (1 file, -6/+6)
  This CL:
  1. performs the global codemod AffineXExpr -> AffineXExprClass and AffineXExprRef -> AffineXExpr;
  2. simplifies function calls by removing the redundant MLIRContext parameter;
  3. adds missing binary operator versions of scalar op AffineExpr where it makes sense.
  PiperOrigin-RevId: 216242674
* [MLIR] Cleanup AffineExpr — Nicolas Vasilache, 2019-03-29 (1 file, -3/+3)
  This CL introduces a series of cleanups for AffineExpr value types:
  1. to make it clear that the value types should be used, the pointer AffineExpr types are put in the detail namespace. Unfortunately, since the value type operator-> only forwards to the underlying pointer type, we still need to expose this in the include file for now;
  2. AffineExprKind is ok to use; it thus comes out of detail and thus of AffineExpr;
  3. getAffineDimExpr, getAffineSymbolExpr, getAffineConstantExpr are similarly extracted as free functions and their naming is made consistent across Builder, MLContext and AffineExpr;
  4. AffineBinaryOpEx::simplify functions are made into static free functions. In particular, this moves them away from AffineMap.cpp where they do not belong;
  5. operator AffineExprType is made explicit;
  6. uses the binary operators everywhere possible;
  7. drops the pointer usage everywhere outside of AffineExpr.cpp, MLIRContext.cpp and AsmPrinter.cpp.
  PiperOrigin-RevId: 216207212
* [MLIR] Value types for AffineXXXExpr — Nicolas Vasilache, 2019-03-29 (1 file, -3/+6)
  This CL makes AffineExprRef into a value type. Notably, it:
  1. drops llvm isa, cast, dyn_cast on the pointer type and uses member functions on the value type. It may be possible to still use classof (in a followup CL);
  2. makes AffineBaseExprRef aggressively cast constness away: if we mean the type is immutable, then let's jump in with both feet;
  3. drops implicit casts to the underlying pointer type, because that always results in surprising behavior and is not needed in practice once enough cleanup has been applied.
  The remaining negative I see is that we still need to mix operator. and operator->. There is an ugly solution that forwards the methods, but that ends up duplicating the class hierarchy, which I tried to avoid as much as possible. But maybe it's not that bad anymore since AffineExpr.h would still contain a single class hierarchy (the duplication would be an impl detail in .cpp).
  PiperOrigin-RevId: 216188003
* [MLIR] Templated AffineExprBaseRef — Nicolas Vasilache, 2019-03-29 (1 file, -8/+7)
  This CL implements AffineExprBaseRef as a templated type to allow LLVM-style casts to work properly. This also allows making AffineExprBaseRef::expr private.
  To achieve this, it is necessary to use llvm::simplify_type and make AffineConstExpr derive from both AffineExpr and llvm::simplify<AffineExprRef>. Note that llvm::simplify_type is just an interface to enable the proper template resolution of isa/cast/dyn_cast; it otherwise holds no value.
  Lastly, note that certain dyn_cast operations wanted the const AffineExpr* form of AffineExprBaseRef, so I made the implicit constructor take that by default and documented the immutable behavior. I think this is consistent with the decision to make unique'd types immutable by convention and never use const on them.
  PiperOrigin-RevId: 215642247
* [MLIR] Remove uses of AffineExpr* outside of IR — Nicolas Vasilache, 2019-03-29 (1 file, -17/+14)
  This CL uniformizes the uses of AffineExprWrap outside of IR. The public API of the AffineExpr builder is modified to only use AffineExprWrap. A few places access AffineExprWrap.expr; this is only while the API is in transition, to easily keep track (i.e. make expr private and let the compiler track the errors).
  Parser.cpp exhibits patterns that are dependent on nullptr values, so converting it is left for another CL.
  PiperOrigin-RevId: 215642005
* Fix MLIR's floordiv, ceildiv, and mod for constant inputs (for negative lhs's) — Uday Bondhugula, 2019-03-29 (1 file, -4/+3)
  - introduce mlir::{floorDiv, ceilDiv, mod} for constant inputs in mlir/Support/MathExtras.h
  - consistently use these everywhere in IR, Analysis, and Transforms.
  PiperOrigin-RevId: 215580677
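  As a worked example of the intended semantics (mine, not from the commit): mlir::floorDiv rounds toward negative infinity, mlir::ceilDiv toward positive infinity, and mlir::mod stays non-negative, so
  $\lfloor -7/4 \rfloor = -2,\quad \lceil -7/4 \rceil = -1,\quad -7 \bmod 4 = -7 - 4\lfloor -7/4 \rfloor = 1,$
  whereas C's truncating semantics would give -7/4 = -1 and -7 % 4 = -3.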
* Change loop step to be a positive integral constant — Uday Bondhugula, 2019-03-29 (1 file, -3/+2)
  Changing this per discussion on mlir-team. Spec updated.
  PiperOrigin-RevId: 214868483
* Extend loop unroll/unroll-and-jam to affine bounds + refactor related code. — Uday Bondhugula, 2019-03-29 (1 file, -20/+10)
  - extend loop unroll-and-jam, similar to loop unroll, to affine bounds
  - extend both loop unroll/unroll-and-jam to deal with the cleanup loop when the trip count is not a multiple of the unroll factor
  - extend promotion of single iteration loops to work with affine bounds
  - fix typo bugs in loop unroll
  - refactor common code b/w loop unroll and loop unroll-and-jam
  - move prototypes of non-pass transforms to LoopUtils.h
  - add additional builder methods
  - introduce loopUnrollUpTo(factor) to unroll by either the factor or the trip count, whichever is less
  - remove Statement::isInnermost (not used for now - will come back at the right place/in right form later)
  PiperOrigin-RevId: 213471227
* Extend getConstantTripCount to deal with a larger subset of loop bounds; make loop unroll/unroll-and-jam more powerful; add additional affine expr builder methods. — Uday Bondhugula, 2019-03-29 (1 file, -0/+128)
  - use previously added analysis/simplification to infer multiple-of-unroll-factor trip counts, making loop unroll/unroll-and-jam more general.
  - for loop unroll, support bounds that are single-result affine maps with the same set of operands. For unknown loop bounds, loop unroll will now work as long as the trip count can be determined to be a multiple of the unroll factor.
  - extend getConstantTripCount to deal with single-result affine maps with the same operands; move it to mlir/Analysis/LoopAnalysis.cpp.
  - add additional builder utility methods for affine expr arithmetic (difference, mod/floordiv/ceildiv w.r.t. a positive constant); simplify code to use the utility methods.
  - move affine analysis routines to AffineAnalysis.cpp/.h from AffineStructures.cpp/.h.
  - rename LoopUnrollJam to LoopUnrollAndJam to match the class name.
  - add an additional simplification for simplifyFloorDiv, simplifyCeilDiv.
  - rename AffineMap::getNumOperands() to getNumInputs(): an affine map by itself does not have operands. Operands are passed to it through affine_apply, from loop bounds/if conditions, etc.; operands are stored in the latter.
  This should be sufficiently powerful for now as far as unroll/unroll-and-jam go for TPU code generation, and can move to other analyses/transformations. Loop nests like these are now unrolled without any cleanup loop being generated:
  ```mlir
  for %i = 1 to 100 {
    // unroll factor 4: no cleanup loop will be generated.
    for %j = (d0) -> (d0) (%i) to (d0) -> (5*d0 + 3) (%i) {
      %x = "foo"(%j) : (affineint) -> i32
    }
  }
  for %i = 1 to 100 {
    // unroll factor 4: no cleanup loop will be generated.
    for %j = (d0) -> (d0) (%i) to (d0) -> (d0 - d0 mod 4 - 1) (%i) {
      %y = "foo"(%j) : (affineint) -> i32
    }
  }
  for %i = 1 to 100 {
    for %j = (d0) -> (d0) (%i) to (d0) -> (d0 + 128) (%i) {
      %x = "foo"() : () -> i32
    }
  }
  ```
  TODO(bondhugula): extend this to LoopUnrollAndJam as well in the next CL (with minor changes).
  PiperOrigin-RevId: 212661212