path: root/mlir/lib/Transforms/Vectorize.cpp
Commit message (Author, Age, Files, Lines -/+)
...
* Replace usages of "Op::operator->" with ".".River Riddle2019-03-291-10/+9
| | | | | | This is step 2/N of removing the temporary operator-> method as part of the de-const transition. PiperOrigin-RevId: 240200792
* Replace usages of "operator->" with "." for the AffineOps.River Riddle2019-03-291-9/+8
| | | | | Note: The "operator->" method is a temporary helper for the de-const transition and is gradually being phased out. PiperOrigin-RevId: 240179439
* NFC: Rename the 'for' operation in the AffineOps dialect to 'affine.for' and ↵River Riddle2019-03-291-25/+25
| | | | | | set the namespace of the AffineOps dialect to 'affine'. PiperOrigin-RevId: 240165792
* Remove OpPointer, cleaning up a ton of code. This also moves Ops to usingChris Lattner2019-03-291-5/+5
| | | | | | | | | inherited constructors, which is cleaner and means you can now use DimOp() to get a null op, instead of having to use Instruction::getNull<DimOp>(). This removes another 200 lines of code. PiperOrigin-RevId: 240068113
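  A rough before/after sketch of the null-op idiom this enables; the helper function is hypothetical, and only the DimOp() construction is stated by the commit itself:
```cpp
// Sketch only; assumes DimOp is visible. With inherited constructors,
// default construction yields a null op handle.
mlir::DimOp getDimOrNull(bool found, mlir::DimOp candidate) {
  if (!found)
    return mlir::DimOp(); // previously: Instruction::getNull<DimOp>()
  return candidate;
}
```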
* Push a bunch of 'consts' out of the *Op structure, in prep for removing OpPointer. (Chris Lattner, 2019-03-29, 1 file, -1/+1)
  PiperOrigin-RevId: 240044712
* Remove const from Value, Instruction, Argument, and the various methods on the *Op classes. This is a net reduction of almost 400 LOC. (Chris Lattner, 2019-03-29, 1 file, -6/+5)
  PiperOrigin-RevId: 239972443
* Cleanups in Vectorize and SliceAnalysis - NFC (Nicolas Vasilache, 2019-03-29, 1 file, -188/+155)
  This CL cleans up and refactors super-vectorization and slice analysis. PiperOrigin-RevId: 238986866
* Rename BlockList into Region (Alex Zinenko, 2019-03-29, 1 file, -2/+2)
  NFC. This is step 1/n towards specifying regions as parts of any operation. PiperOrigin-RevId: 238472370
* Change Pass::getFunction() to return a pointer instead of a reference - NFC (Uday Bondhugula, 2019-03-29, 1 file, -1/+1)
  - change this for consistency - everything else similar takes/returns a Function pointer: the FuncBuilder ctor, Block/Value/Instruction::getFunction(), etc.
  - saves a whole bunch of &s everywhere
  PiperOrigin-RevId: 236928761
* NFC. Move all of the remaining operations left in BuiltinOps to StandardOps. (River Riddle, 2019-03-29, 1 file, -1/+0)
  The only things left in BuiltinOps are the core MLIR types. The standard types can't be moved because they are referenced within the IR directory, e.g. in things like Builder. PiperOrigin-RevId: 236403665
* Use consistent names for dialect op source files (Lei Zhang, 2019-03-29, 1 file, -1/+1)
  This CL changes dialect op source files (.h, .cpp, .td) to follow the convention:
    <full-dialect-name>/<dialect-namespace>Ops.{h|cpp|td}
  The builtin and standard dialects are treated specially, though: neither has a dialect namespace, so the former is still named BuiltinOps.* and the latter Ops.*. Purely mechanical. NFC. PiperOrigin-RevId: 236371358
* Provide a Builder::getNamedAttr and (Instruction|Function)::setAttr(StringRef, Attribute) to simplify attribute manipulation. (River Riddle, 2019-03-29, 1 file, -4/+3)
  PiperOrigin-RevId: 236222504
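  A minimal sketch of the two helpers named above; the attribute name and value are illustrative assumptions, while the signatures are the ones quoted in the commit:
```cpp
// Sketch: attach a named attribute without hand-building the
// Identifier/Attribute pair.
void tagWithUnrollHint(mlir::FuncBuilder &b, mlir::Instruction *inst) {
  // Builder::getNamedAttr builds the named attribute in one call.
  auto hint = b.getNamedAttr("unroll_hint", b.getI32IntegerAttr(4));
  (void)hint;
  // setAttr(StringRef, Attribute) sets or overwrites the attribute by name.
  inst->setAttr("unroll_hint", b.getI32IntegerAttr(4));
}
```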
* Remove PassResult and have the runOnFunction/runOnModule functions return void instead. (River Riddle, 2019-03-29, 1 file, -3/+2)
  To signal a pass failure, passes should now invoke the 'signalPassFailure' method. This provides the equivalent functionality when needed, but isn't an intrusive part of the API like PassResult. PiperOrigin-RevId: 236202029
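  A minimal sketch of the new failure protocol; the pass class, its CRTP base spelling, and the failure condition are hypothetical stand-ins:
```cpp
struct ToyVectorizePass : public mlir::FunctionPass<ToyVectorizePass> {
  void runOnFunction() override {
    // runOnFunction now returns void instead of PassResult.
    bool unsupportedInput = false; // stand-in for a real analysis result
    if (unsupportedInput)
      signalPassFailure(); // replaces returning a failure PassResult
  }
};
```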
* Port all of the existing passes over to the new pass manager infrastructure. This is largely NFC. (River Riddle, 2019-03-29, 1 file, -8/+5)
  PiperOrigin-RevId: 235952357
* Internal change (MLIR Team, 2019-03-29, 1 file, -1/+1)
  PiperOrigin-RevId: 235191129
* Define a PassID class to use when defining a pass. (River Riddle, 2019-03-29, 1 file, -3/+1)
  This allows the type used for the ID field to be self-documenting. It also lets the compiler know the alignment of the ID object, which is useful for storing pointer identifiers within LLVM data structures. PiperOrigin-RevId: 235107957
* NFC: Refactor the files related to passes. (River Riddle, 2019-03-29, 1 file, -1/+1)
  * PassRegistry is split into its own source file.
  * Pass-related files are moved to a new library, 'Pass'.
  PiperOrigin-RevId: 234705771
* Automated rollback of changelist 232717775. (Uday Bondhugula, 2019-03-29, 1 file, -25/+25)
  PiperOrigin-RevId: 232807986
* NFC: Rename the 'for' operation in the AffineOps dialect to 'affine.for'. This is the second step towards adding a namespace to the AffineOps dialect. (River Riddle, 2019-03-29, 1 file, -25/+25)
  PiperOrigin-RevId: 232717775
* Remove remaining usages of OperationInst in lib/Transforms. (River Riddle, 2019-03-29, 1 file, -49/+42)
  PiperOrigin-RevId: 232323671
* Fix the handling of the resizable operands bit of OperationState in a few places. (River Riddle, 2019-03-29, 1 file, -2/+2)
  PiperOrigin-RevId: 232163738
* Define the AffineForOp and replace ForInst with it. (River Riddle, 2019-03-29, 1 file, -28/+35)
  This patch is largely mechanical, i.e. changing usages of ForInst to OpPointer<AffineForOp>. An important difference is that upon construction an AffineForOp no longer automatically creates the body and induction variable. To generate the body/iv, 'createBody' can be called on an AffineForOp with no body. PiperOrigin-RevId: 232060516
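  In caller terms, the construction change looks roughly like this; the surrounding function is hypothetical, and only OpPointer<AffineForOp> and createBody are named by the commit:
```cpp
// Sketch: a freshly constructed AffineForOp is now body-less.
void populateLoop(OpPointer<AffineForOp> forOp) {
  // Create the body block and induction variable explicitly before
  // inserting any instructions into the loop.
  forOp->createBody();
  // ... build the loop body here ...
}
```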
* Define a detail::OperandStorage class to handle managing instruction operands. (River Riddle, 2019-03-29, 1 file, -0/+1)
  This class stores operands in a similar way to SmallVector, except for two key differences. The first is the inline storage, which is a trailing-objects array. The second is that the ability to dynamically resize the operand list is optional. This means we can enable the cases where operations need to change their number of operands after construction without losing the spatial-locality benefits of the common case (operation instructions / non-control-flow instructions with a fixed number of operands for their lifetime). PiperOrigin-RevId: 231910497
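  A toy model of the two storage modes described above, not the real detail::OperandStorage (which uses an llvm::TrailingObjects-style inline array); all names here are ours:
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy: inline storage for a fixed, construction-time operand count, plus
// an optional growable buffer for operations that opt into resizability.
struct ToyOperandStorage {
  explicit ToyOperandStorage(bool resizable) : resizable(resizable) {}

  void append(void *operand) {
    if (resizable) {
      dynamic.push_back(operand); // may reallocate, like SmallVector growth
      return;
    }
    assert(numInline < kCapacity && "fixed operand storage exhausted");
    inlineOps[numInline++] = operand; // keeps the common case contiguous
  }

  static constexpr size_t kCapacity = 8;
  bool resizable;
  size_t numInline = 0;
  void *inlineOps[kCapacity] = {};
  std::vector<void *> dynamic; // engaged only when 'resizable' is true
};
```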
* Address Performance issue in NestedMatcher (Nicolas Vasilache, 2019-03-29, 1 file, -96/+96)
  A performance issue was reported due to the usage of NestedMatcher in ComposeAffineMaps. The main culprit was the ubiquitous copies that were occurring when appending even a single element in `matchOne`.
  This CL generally simplifies the implementation and removes one level of indirection by getting rid of auxiliary storage, as well as simplifying the API. The users of the API are updated accordingly.
  The implementation was tested on a heavily unrolled example with ComposeAffineMaps and is now close in performance to an implementation based on a stateless InstWalker.
  As a reminder, the whole ComposeAffineMaps pass is slated to disappear, but the bug report was very useful as a stress test for NestedMatchers.
  Lastly, the following cleanups reported by @aminim were addressed:
  1. make NestedPatternContext scoped within runFunction rather than at the Pass level. This was caused by a previous misunderstanding of Pass lifetime;
  2. use defensive assertions in the constructor of NestedPatternContext to make it clear that a unique such locally scoped context is allowed to exist.
  PiperOrigin-RevId: 231781279
* Change the ForInst induction variable to be a block argument of the body instead of the ForInst itself. (River Riddle, 2019-03-29, 1 file, -3/+5)
  This is a necessary step in converting ForInst into an operation. PiperOrigin-RevId: 231064139
* Cleanup resource management and rename recursive matchers (Nicolas Vasilache, 2019-03-29, 1 file, -17/+15)
  This CL follows up on a memory leak issue related to SmallVector growth that escapes the BumpPtrAllocator. The fix is to properly use ArrayRef and placement new to define away the issue.
  The following renaming is also applied:
  1. MLFunctionMatcher -> NestedPattern
  2. MLFunctionMatches -> NestedMatch
  As a consequence, all allocations are now guaranteed to live on the BumpPtrAllocator. PiperOrigin-RevId: 231047766
* Wrap cl::opt flags within passes in a category with the pass name. (River Riddle, 2019-03-29, 1 file, -2/+4)
  This improves the help output of tools like mlir-opt. Example:

    dma-generate options:
      -dma-fast-mem-capacity             - Set fast memory space ...
      -dma-fast-mem-space=<uint>         - Set fast memory space ...
    loop-fusion options:
      -fusion-compute-tolerance=<number> - Fractional increase in ...
      -fusion-maximal                    - Enables maximal loop fusion
    loop-tile options:
      -tile-size=<uint>                  - Use this tile size for ...
    loop-unroll options:
      -unroll-factor=<uint>              - Use this unroll factor ...
      -unroll-full                       - Fully unroll loops
      -unroll-full-threshold=<uint>      - Unroll all loops with ...
      -unroll-num-reps=<uint>            - Unroll innermost loops ...
    loop-unroll-jam options:
      -unroll-jam-factor=<uint>          - Use this unroll jam factor ...

  PiperOrigin-RevId: 231019363
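  The underlying mechanism is LLVM's cl::OptionCategory; a minimal sketch, with the flag name and category string taken from the example output above:
```cpp
#include "llvm/Support/CommandLine.h"

// Group a pass's flags under one category so -help prints them together
// beneath a "loop-unroll options:" heading.
static llvm::cl::OptionCategory clOptionsCategory("loop-unroll options");

static llvm::cl::opt<unsigned> clUnrollFactor(
    "unroll-factor", llvm::cl::desc("Use this unroll factor ..."),
    llvm::cl::cat(clOptionsCategory));
```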
* Allow operations to hold a blocklist and add support for parsing/printing a block list for verbose printing. (River Riddle, 2019-03-29, 1 file, -0/+2)
  PiperOrigin-RevId: 230951462
* Add cloning functionality to Block and Function. (River Riddle, 2019-03-29, 1 file, -2/+1)
  This also adds support for remapping successor block operands of terminator operations. We define a new BlockAndValueMapping class to simplify mapping between cloned values. PiperOrigin-RevId: 230768759
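  A rough sketch of the mapper's role during cloning; beyond the class name given in the commit, the accessor names here are assumptions:
```cpp
// Sketch: record old->new value correspondences while cloning, then
// resolve operands of cloned instructions through the mapper.
void remapOperand(mlir::BlockAndValueMapping &mapper,
                  mlir::Value *oldVal, mlir::Value *newVal) {
  mapper.map(oldVal, newVal);                          // assumed accessor
  mlir::Value *remapped = mapper.lookupOrNull(oldVal); // assumed accessor
  (void)remapped; // == newVal; null would mean "not cloned yet"
}
```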
* Migrate the VectorOrTensorType/MemRefType shape API to use int64_t instead of int. (River Riddle, 2019-03-29, 1 file, -2/+3)
  PiperOrigin-RevId: 230605756
* Extend InstVisitor and Walker to handle arbitrary CFG functions, and expand the Function::walk functionality into f->walkInsts/Ops, which allows visiting all instructions, not just ops. (Chris Lattner, 2019-03-29, 1 file, -2/+2)
  Eliminate the Function::getBody() and Function::getReturn() helpers, which crash in CFG functions and were only kept around as a bridge.
  This is step 25/n towards merging instructions and statements. PiperOrigin-RevId: 227243966
* Standardize naming of statements -> instructions, revisiting the code base to be consistent and moving the using declarations over. (Chris Lattner, 2019-03-29, 1 file, -103/+104)
  Hopefully this is the last truly massive patch in this refactoring. This is step 21/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227178245
* Eliminate the using decls for MLFunction and CFGFunction, standardizing on Function. (Chris Lattner, 2019-03-29, 1 file, -6/+6)
  This is step 18/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227139399
* Rename BBArgument -> BlockArgument, Op::getOperation -> Op::getInst(), StmtResult -> InstResult, StmtOperand -> InstOperand, and remove the old names. (Chris Lattner, 2019-03-29, 1 file, -5/+4)
  This is step 17/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227121537
* Merge Operation into OperationInst and standardize nomenclature around OperationInst. This is a big mechanical patch. (Chris Lattner, 2019-03-29, 1 file, -44/+44)
  This is step 16/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227093712
* Merge CFGFuncBuilder/MLFuncBuilder/FuncBuilder together into a single new FuncBuilder class. Also rename SSAValue.cpp to Value.cpp. (Chris Lattner, 2019-03-29, 1 file, -7/+7)
  This is step 12/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227067644
* Merge SSAValue, CFGValue, and MLValue together into a single Value class, which is the new base of the SSA value hierarchy. (Chris Lattner, 2019-03-29, 1 file, -30/+24)
  This CL also standardizes all the nomenclature and comments to use 'Value' where appropriate. This also eliminates a large number of cast<MLValue>(x)'s, which is very soothing.
  This is step 11/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227064624
* SuperVectorization: fix 'isa' assertion (Alex Zinenko, 2019-03-29, 1 file, -1/+1)
  Supervectorization uses null pointers to SSA values as a means of communicating the failure to vectorize. In operation vectorization, all operations producing the values of operation arguments must be vectorized for the given operation to be vectorized. The existing check instead verified whether any of the value "def" statements was vectorized, sometimes leading to assertions inside `isa` called on a null pointer. Fix this to check that all "def" statements were vectorized. PiperOrigin-RevId: 226941552
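  The essence of the any-vs-all fix, in a self-contained sketch; the names are ours, with a null pointer standing in for an unvectorized def as described above:
```cpp
#include <algorithm>
#include <vector>

struct Value; // stand-in for the IR value type

// Before: proceeding when *any* def was vectorized let null pointers
// through to isa<>, firing the assertion.
bool anyDefVectorized(const std::vector<Value *> &defs) {
  return std::any_of(defs.begin(), defs.end(),
                     [](Value *v) { return v != nullptr; });
}

// After: vectorize the operation only when *all* defs were vectorized.
bool allDefsVectorized(const std::vector<Value *> &defs) {
  return std::all_of(defs.begin(), defs.end(),
                     [](Value *v) { return v != nullptr; });
}
```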
* Per review on the previous CL, drop MLFuncBuilder::createOperation, changing clients to use OperationState instead. (Chris Lattner, 2019-03-29, 1 file, -7/+12)
  This makes MLFuncBuilder more similar to CFGFuncBuilder. This whole area will get tidied up more when the cfg and ml worlds get unified. This patch is just gardening, NFC. PiperOrigin-RevId: 226701959
* Extract vector_transfer_* Ops into a SuperVectorDialect. (Alex Zinenko, 2019-03-29, 1 file, -0/+1)
  From the beginning, the vector_transfer_read and vector_transfer_write operations were intended as a mid-level vectorization abstraction. In particular, they are lowered to the StandardOps dialect before further processing. As such, it does not make sense to keep them at the same level as StandardOps. Introduce the new SuperVectorOps dialect and move the vector_transfer_* operations there. This will be used as a testbed for the generic lowering/legalization pass. PiperOrigin-RevId: 225554492
* [MLIR] Drop assert for NYI in Vectorize.cpp (Nicolas Vasilache, 2019-03-29, 1 file, -10/+16)
  This CL adds proper error emission, removes NYI assertions, and documents the assumptions that are required in the relevant functions. PiperOrigin-RevId: 224377207
* [MLIR] Add support for permutation_map (Nicolas Vasilache, 2019-03-29, 1 file, -111/+73)
  This CL hooks up and uses permutation_map in vector_transfer ops. In particular, when going into the nuts and bolts of the implementation, it became clear that cases arose that required supporting broadcast semantics. Broadcast semantics are thus added to the general permutation_map. The verify methods and tests are updated accordingly. Examples of interest include:

  Example 1. The following MLIR snippet:
```mlir
for %i3 = 0 to %M {
  for %i4 = 0 to %N {
    for %i5 = 0 to %P {
      %a5 = load %A[%i4, %i5, %i3] : memref<?x?x?xf32>
}}}
```
  may vectorize with {permutation_map: (d0, d1, d2) -> (d2, d1)} into:
```mlir
for %i3 = 0 to %0 step 32 {
  for %i4 = 0 to %1 {
    for %i5 = 0 to %2 step 256 {
      %4 = vector_transfer_read %arg0, %i4, %i5, %i3 {permutation_map: (d0, d1, d2) -> (d2, d1)} : (memref<?x?x?xf32>, index, index) -> vector<32x256xf32>
}}}
```
  This means vector_transfer_read will be responsible for reading the 2-D slice `%arg0[%i4, %i5:%i5+256, %i3:%i3+32]` into vector<32x256xf32>, which will require a transposition when vector_transfer_read is further lowered.

  Example 2. The following MLIR snippet:
```mlir
%cst0 = constant 0 : index
for %i0 = 0 to %M {
  %a0 = load %A[%cst0, %cst0] : memref<?x?xf32>
}
```
  may vectorize with {permutation_map: (d0) -> (0)} into:
```mlir
for %i0 = 0 to %0 step 128 {
  %3 = vector_transfer_read %arg0, %c0_0, %c0_0 {permutation_map: (d0, d1) -> (0)} : (memref<?x?xf32>, index, index) -> vector<128xf32>
}
```
  This means vector_transfer_read will be responsible for reading the 0-D slice `%arg0[%c0, %c0]` into vector<128xf32>, which will require a 1-D vector broadcast when vector_transfer_read is further lowered.

  Additionally, some minor cleanups and refactorings are performed. One notable thing missing here is the composition with a projection map during materialization. This is because I could not find an AffineMap composition that operates on AffineMap directly: everything related to composition seems to require going through SSAValue and only operates on AffineMap at a distance via AffineValueMap. I have raised this concern a bunch of times already; the followup CL will actually do something about it. In the meantime, the projection is hacked at a minimum to pass verification, and materialization tests are temporarily incorrect.
  PiperOrigin-RevId: 224376828
* [MLIR] Add VectorTransferOps (Nicolas Vasilache, 2019-03-29, 1 file, -31/+23)
  This CL implements and uses VectorTransferOps in lieu of the former custom call op. Tests are updated accordingly. VectorTransferOps come in 2 flavors: VectorTransferReadOp and VectorTransferWriteOp.
  VectorTransferOps can be thought of as a backend-independent pseudo op/library call that needs to be legalized to MLIR (whiteboxed) before it can be lowered to backend-dependent IR.
  Note that the current implementation does not yet support a real permutation map. Proper support will come in a followup CL.

  VectorTransferReadOp
  ====================
  VectorTransferReadOp performs a blocking read from a scalar memref location into a super-vector of the same elemental type. This operation is called 'read' as opposed to 'load' because the super-vector granularity is generally not representable with a single hardware register. As a consequence, memory transfers will generally be required when lowering VectorTransferReadOp. A VectorTransferReadOp is thus a mid-level abstraction that supports super-vectorization with non-effecting padding for full-tile-only code.
  A vector transfer read has semantics similar to a vector load, with additional support for:
  1. an optional value of the elemental type of the MemRef. This value supports non-effecting padding and is inserted in places where the vector read exceeds the MemRef bounds. If the value is not specified, the access is statically guaranteed to be within bounds;
  2. an attribute of type AffineMap to specify a slice of the original MemRef access and its transposition into the super-vector shape. The permutation_map is an unbounded AffineMap that must represent a permutation from the MemRef dim space projected onto the vector dim space.
  Example:
```mlir
%A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>
...
%val = `ssa-value` : f32
// let %i, %j, %k, %l be ssa-values of type index
%v0 = vector_transfer_read %src, %i, %j, %k, %l {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} : (memref<?x?x?x?xf32>, index, index, index, index) -> vector<16x32x64xf32>
%v1 = vector_transfer_read %src, %i, %j, %k, %l, %val {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} : (memref<?x?x?x?xf32>, index, index, index, index, f32) -> vector<16x32x64xf32>
```

  VectorTransferWriteOp
  =====================
  VectorTransferWriteOp performs a blocking write from a super-vector to a scalar memref of the same elemental type. This operation is called 'write' as opposed to 'store' because the super-vector granularity is generally not representable with a single hardware register. As a consequence, memory transfers will generally be required when lowering VectorTransferWriteOp. A VectorTransferWriteOp is thus a mid-level abstraction that supports super-vectorization with non-effecting padding for full-tile-only code.
  A vector transfer write has semantics similar to a vector store, with additional support for handling out-of-bounds situations.
  Example:
```mlir
%A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>
%val = `ssa-value` : vector<16x32x64xf32>
// let %i, %j, %k, %l be ssa-values of type index
vector_transfer_write %val, %src, %i, %j, %k, %l {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} : (vector<16x32x64xf32>, memref<?x?x?x?xf32>, index, index, index, index)
```
  PiperOrigin-RevId: 223873234
* [MLIR][Vectorize] Refactor Vectorize use-def propagation. (Nicolas Vasilache, 2019-03-29, 1 file, -282/+817)
  This CL refactors a few things in Vectorize.cpp:
  1. a clear distinction is made between:
     a. the LoadOps, which are the roots of vectorization and must be vectorized eagerly and propagate their value; and
     b. the StoreOps, which are the terminals of vectorization and must be vectorized late (i.e. they do not produce values that need to be propagated);
  2. the StoreOp must be vectorized late because in general it can store a value that is not reachable from the subset of loads defined in the current pattern. One trivial such case is storing a constant defined at the top level of the MLFunction, which needs to be turned into a splat;
  3. a description of the algorithm is given;
  4. the implementation matches the algorithm;
  5. the last example is made parametric; in practice it will fully rely on the implementation of vector_transfer_read/write, which will handle boundary conditions and padding. This will happen by lowering to a lower-level abstraction either:
     a. directly in MLIR (whether DMA or just loops or any async tasks in the future) (whiteboxing);
     b. in LLO/LLVM-IR/whatever blackbox library call/search + swizzle inventor one may want to use;
     c. a partial mix of a. and b. (grey-boxing);
  6. minor cleanups are applied;
  7. mistakenly disabled unit tests are re-enabled (oopsie).
  With this CL, this MLIR snippet:
```mlir
mlfunc @vector_add_2d(%M : index, %N : index) -> memref<?x?xf32> {
  %A = alloc (%M, %N) : memref<?x?xf32>
  %B = alloc (%M, %N) : memref<?x?xf32>
  %C = alloc (%M, %N) : memref<?x?xf32>
  %f1 = constant 1.0 : f32
  %f2 = constant 2.0 : f32
  for %i0 = 0 to %M {
    for %i1 = 0 to %N {
      // non-scoped %f1
      store %f1, %A[%i0, %i1] : memref<?x?xf32>
    }
  }
  for %i4 = 0 to %M {
    for %i5 = 0 to %N {
      %a5 = load %A[%i4, %i5] : memref<?x?xf32>
      %b5 = load %B[%i4, %i5] : memref<?x?xf32>
      %s5 = addf %a5, %b5 : f32
      // non-scoped %f1
      %s6 = addf %s5, %f1 : f32
      store %s6, %C[%i4, %i5] : memref<?x?xf32>
    }
  }
  return %C : memref<?x?xf32>
}
```
  vectorized with these arguments:
```
-vectorize -virtual-vector-size 256 --test-fastest-varying=0
```
  produces this standard innermost-loop vectorized code:
```mlir
mlfunc @vector_add_2d(%arg0 : index, %arg1 : index) -> memref<?x?xf32> {
  %0 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %1 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %2 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %cst = constant 1.000000e+00 : f32
  %cst_0 = constant 2.000000e+00 : f32
  for %i0 = 0 to %arg0 {
    for %i1 = 0 to %arg1 step 256 {
      %cst_1 = constant splat<vector<256xf32>, 1.000000e+00> : vector<256xf32>
      "vector_transfer_write"(%cst_1, %0, %i0, %i1) : (vector<256xf32>, memref<?x?xf32>, index, index) -> ()
    }
  }
  for %i2 = 0 to %arg0 {
    for %i3 = 0 to %arg1 step 256 {
      %3 = "vector_transfer_read"(%0, %i2, %i3) : (memref<?x?xf32>, index, index) -> vector<256xf32>
      %4 = "vector_transfer_read"(%1, %i2, %i3) : (memref<?x?xf32>, index, index) -> vector<256xf32>
      %5 = addf %3, %4 : vector<256xf32>
      %cst_2 = constant splat<vector<256xf32>, 1.000000e+00> : vector<256xf32>
      %6 = addf %5, %cst_2 : vector<256xf32>
      "vector_transfer_write"(%6, %2, %i2, %i3) : (vector<256xf32>, memref<?x?xf32>, index, index) -> ()
    }
  }
  return %2 : memref<?x?xf32>
}
```
  Of course, much more intricate n-D imperfectly-nested patterns can be emitted too in a fully declarative fashion, but this is enough for now.
  PiperOrigin-RevId: 222280209
* [MLIR][VectorAnalysis] Add a VectorAnalysis and standalone tests (Nicolas Vasilache, 2019-03-29, 1 file, -0/+1)
  This CL adds some vector support in anticipation of the upcoming vector materialization pass. In particular, this CL adds 2 functions to:
  1. compute the multiplicity of a subvector shape in a supervector shape (a sketch of this computation follows the entry);
  2. help match operations on strict super-vectors. This is defined, for a given subvector shape, as an operation that manipulates a vector type that is an integral multiple of the subtype, with multiplicity at least 2.
  This CL also adds a TestUtil pass where we can dump arbitrary testing of functions and analyses that operate at a much smaller granularity than a pass (e.g. an analysis for which it is convenient to write a bit of artificial MLIR and write some custom test). This is in order to keep using FileCheck for things that essentially look and feel like C++ unit tests. PiperOrigin-RevId: 222250910
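  The multiplicity computation amounts to a per-dimension shape ratio over trailing dimensions; a self-contained sketch, with names and suffix alignment as our assumptions:
```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Multiplicity of a subvector shape within a supervector shape: the
// per-dimension ratio over the trailing dims, if each divides evenly.
std::optional<std::vector<int64_t>>
shapeRatio(const std::vector<int64_t> &superShape,
           const std::vector<int64_t> &subShape) {
  if (subShape.size() > superShape.size())
    return std::nullopt;
  std::vector<int64_t> ratio;
  size_t offset = superShape.size() - subShape.size();
  for (size_t i = 0, e = subShape.size(); i < e; ++i) {
    int64_t sup = superShape[offset + i], sub = subShape[i];
    if (sub == 0 || sup % sub != 0)
      return std::nullopt; // not an integral multiple
    ratio.push_back(sup / sub);
  }
  return ratio; // e.g. 32x256 within 64x512 -> {2, 2}
}
```
  Under this reading, a strict super-vector is one whose ratio exists and whose product is at least 2.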
* [MLIR] Support for vectorizing operations. (Nicolas Vasilache, 2019-03-29, 1 file, -255/+383)
  This CL adds support for, and a vectorization test to perform, scalar 2-D addf.
  The support extension notably comprises:
  1. extend the vectorizable test to exclude vector_transfer operations and expose them to LoopAnalysis where they are needed. This is a temporary solution until a concrete MLIR Op exists;
  2. add some more functional sugar: mapKeys, apply, and ScopeGuard (which became relevant again);
  3. fix improper shifting during coarsening;
  4. rename unaligned load/store to vector_transfer_read/write and simplify the design, removing the unnecessary AllocOps that were introduced prematurely. vector_transfer_read currently has the form (memref<?x?x?xf32>, index, index, index) -> vector<32x64x256xf32>; vector_transfer_write currently has the form (vector<32x64x256xf32>, memref<?x?x?xf32>, index, index, index) -> ();
  5. add vectorizeOperations, which traverses the operations in a ForStmt and rewrites them to their vector form;
  6. add support for vector splat from a constant.
  The relevant tests are also updated. PiperOrigin-RevId: 221421426
* Implement value type abstraction for locations. (River Riddle, 2019-03-29, 1 file, -4/+4)
  Value type abstraction for locations differs from the others in that a Location can NOT be null. NOTE: dyn_cast returns an Optional<T>. PiperOrigin-RevId: 220682078
* Initialize Pass with PassID. (Jacques Pienaar, 2019-03-29, 1 file, -0/+2)
  The passID is not currently stored in Pass, but this avoids the unused variable warning. The passID is used to uniquely identify passes; currently this is only stored/used in PassInfo. PiperOrigin-RevId: 220485662
* Add static pass registration (Jacques Pienaar, 2019-03-29, 1 file, -0/+8)
  Add static pass registration and change mlir-opt to use it. Future work is needed to refactor the registration for PassManager usage.
  Change build targets to alwayslink to enforce registration. PiperOrigin-RevId: 220390178
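  A minimal sketch of the registration idiom, assuming the PassRegistration helper of this era; the pass class and flag name are hypothetical:
```cpp
// A file-scope registration object runs at static-initialization time and
// makes the pass available to mlir-opt under the given flag. The
// 'alwayslink' build setting keeps the linker from dropping this object.
static mlir::PassRegistration<ToyVectorizePass>
    reg("toy-vectorize", "Illustrative vectorization pass (hypothetical)");
```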
* [MLIR] Extend vectorization to 2+-D patterns (Nicolas Vasilache, 2019-03-29, 1 file, -61/+168)
  This CL adds support for vectorization using more interesting 2-D and 3-D patterns. Note in particular the fact that we match some pretty complex imperfectly nested 2-D patterns with a quite minimal change to the implementation: we just add a bit of recursion to traverse the matched patterns and actually vectorize the loops.
  For instance, vectorizing the following loop by 128:
```mlir
for %i3 = 0 to %0 {
  %7 = affine_apply (d0) -> (d0)(%i3)
  %8 = load %arg0[%c0_0, %7] : memref<?x?xf32>
}
```
  currently generates:
```mlir
#map0 = ()[s0] -> (s0 + 127)
#map1 = (d0) -> (d0)
for %i3 = 0 to #map0()[%0] step 128 {
  %9 = affine_apply #map1(%i3)
  %10 = alloc() : memref<1xvector<128xf32>>
  %11 = "n_d_unaligned_load"(%arg0, %c0_0, %9, %10, %c0) : (memref<?x?xf32>, index, index, memref<1xvector<128xf32>>, index) -> (memref<?x?xf32>, index, index, memref<1xvector<128xf32>>, index)
  %12 = load %10[%c0] : memref<1xvector<128xf32>>
}
```
  The above is subject to evolution. PiperOrigin-RevId: 219629745