bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	Introduce splat op + provide its LLVM lowering	Uday Bondhugula	2019-09-24	1	-0/+15
	- introduce splat op in standard dialect (currently for int/float/index input type; output type can be vector or statically shaped tensor)
	- implement LLVM lowering (when result type is 1-d vector)
	- add constant folding hook for it
	- while on Ops.cpp, fix some stale names
	Signed-off-by: Uday Bondhugula <uday@polymagelabs.com> Closes tensorflow/mlir#141 COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/141 from bondhugula:splat 48976a6aa0a75be6d91187db6418de989e03eb51 PiperOrigin-RevId: 270965304
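As a rough model of the op's semantics (not the MLIR implementation), the sketch below builds the folded constant for a splat of a scalar into a statically shaped result, and mimics a common LLVM idiom for a 1-D splat: insert the scalar into lane 0 of an undefined vector, then shuffle with an all-zero mask. The helper names and the use of Python lists for vectors are assumptions for illustration; whether this commit's lowering emits exactly these LLVM operations is not stated above.

```python
from typing import List, Sequence, Union

Nested = Union[float, int, List["Nested"]]

def fold_splat(value, shape: Sequence[int]) -> Nested:
    """Constant-fold splat: every element of the statically shaped result equals `value`."""
    if not shape:
        return value
    return [fold_splat(value, shape[1:]) for _ in range(shape[0])]

def insert_element(vec: List, value, index: int) -> List:
    """Model of LLVM insertelement: a copy of `vec` with lane `index` replaced."""
    out = list(vec)
    out[index] = value
    return out

def shuffle_vector(v1: List, v2: List, mask: Sequence[int]) -> List:
    """Model of LLVM shufflevector: lanes of v1 followed by v2, selected by `mask`."""
    both = list(v1) + list(v2)
    return [both[m] for m in mask]

def lower_splat_1d(value, n: int) -> List:
    """1-D splat: insertelement into lane 0, then broadcast with an all-zero mask."""
    undef = [None] * n  # stands in for an LLVM `undef` vector
    with_lane0 = insert_element(undef, value, 0)
    return shuffle_vector(with_lane0, undef, [0] * n)

print(fold_splat(1.5, [2, 3]))  # [[1.5, 1.5, 1.5], [1.5, 1.5, 1.5]]
print(lower_splat_1d(7, 4))     # [7, 7, 7, 7]
```

The insert-then-shuffle pair keeps the broadcast to two vector operations regardless of the vector width.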
*	Normalize lowering of MemRef types	Nicolas Vasilache	2019-09-24	4	-91/+79
	The RFC for unifying Linalg and Affine compilation passes into an end-to-end flow with a predictable ABI and linkage to external function calls raised the question of why we have variable-sized descriptors for memrefs depending on whether they have static or dynamic dimensions (https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio). This CL standardizes the ABI on the rank of the memrefs. The LLVM struct for a memref becomes equivalent to:
```
template <typename Elem, size_t Rank>
struct MemRefDescriptor {  // name added for illustration; the original struct is unnamed
  Elem *ptr;
  int64_t sizes[Rank];
};
```
	PiperOrigin-RevId: 270947276
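To make the rank-only ABI concrete, here is a small Python model of the descriptor: one pointer plus exactly `Rank` sizes, with static and dynamic dimensions stored the same way. The class name, the flat Python list standing in for `Elem *ptr`, and the row-major (identity layout) indexing are assumptions for illustration, not part of the commit.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class MemRefDescriptor:
    """Hypothetical model of `struct { Elem *ptr; int64_t sizes[Rank]; }`."""
    ptr: List[float]   # stands in for the element pointer (flat buffer)
    sizes: List[int]   # one entry per dimension, static or dynamic alike

    @property
    def rank(self) -> int:
        return len(self.sizes)

    def linear_index(self, indices: Sequence[int]) -> int:
        """Row-major offset, assuming an identity layout with no padding."""
        if len(indices) != self.rank:
            raise ValueError("expected one index per dimension")
        offset = 0
        for idx, size in zip(indices, self.sizes):
            if not 0 <= idx < size:
                raise IndexError("index out of bounds")
            offset = offset * size + idx
        return offset

    def load(self, indices: Sequence[int]) -> float:
        return self.ptr[self.linear_index(indices)]

# A memref<2x?xf32> whose dynamic dimension is 3 at run time:
desc = MemRefDescriptor(ptr=[float(i) for i in range(6)], sizes=[2, 3])
print(desc.rank, desc.linear_index([1, 2]), desc.load([1, 0]))  # 2 5 3.0
```

Because the static size 2 is stored alongside the dynamic size 3, the descriptor layout depends only on the rank.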
*	Outline GPU kernel function into a nested module.	Christian Sigg	2019-09-23	3	-38/+63
	Roll forward of commit 5684a12. When outlining GPU kernels, put the kernel function inside a nested module. Then use a nested pipeline to generate the cubins, independently per kernel. In a final pass, move the cubins back to the parent module. PiperOrigin-RevId: 270639748
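The three-step flow described above can be modeled as a toy pipeline over plain dictionaries: outline each kernel into its own nested module, run a per-kernel step independently, then move the results back to the parent. The data layout, the function names and the placeholder `compile_to_cubin` (which just encodes the kernel body as bytes) are hypothetical; no real GPU compilation happens here.

```python
from typing import Dict

def outline_kernels(module: Dict) -> Dict:
    """Step 1: move each kernel function into its own nested module."""
    nested = [{"name": k["name"], "functions": [k]} for k in module["kernels"]]
    return {"host": module["host"], "nested_modules": nested}

def compile_to_cubin(kernel_module: Dict) -> bytes:
    """Step 2 (placeholder): per-kernel 'compilation'; real code would emit a CUBIN."""
    body = kernel_module["functions"][0]["body"]
    return body.encode("utf-8")

def run_nested_pipeline(module: Dict) -> Dict:
    """Run the per-kernel step on each nested module independently."""
    for nested in module["nested_modules"]:
        nested["cubin"] = compile_to_cubin(nested)
    return module

def move_cubins_to_parent(module: Dict) -> Dict:
    """Step 3: attach each cubin to the parent module and drop the nested modules."""
    cubins = {n["name"]: n["cubin"] for n in module["nested_modules"]}
    return {"host": module["host"], "cubins": cubins}

m = {"host": ["main"], "kernels": [{"name": "k0", "body": "add"}, {"name": "k1", "body": "mul"}]}
result = move_cubins_to_parent(run_nested_pipeline(outline_kernels(m)))
print(result["cubins"])  # {'k0': b'add', 'k1': b'mul'}
```

Keeping each kernel in its own module is what lets step 2 run per kernel without seeing host code or other kernels.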
*	Add integer sign- and zero-extension and truncation to standard.	Manuel Freiberger	2019-09-21	1	-0/+14
	This adds sign- and zero-extension and truncation of integer types to the standard dialect. This makes it possible to perform integer type conversions without going to the LLVM dialect and introducing custom type casts (between standard and LLVM integer types). Closes tensorflow/mlir#134 COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/134 from ombre5733:sext-zext-trunc-in-std c7657bc84c0ca66b304e53ec03797e09152e4d31 PiperOrigin-RevId: 270479722
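The semantics of the three conversions on two's-complement bit patterns can be sketched in Python as follows. The function names, and representing an N-bit integer as a non-negative Python int holding its bit pattern, are assumptions for illustration; the op names used in the standard dialect are not given above.

```python
def to_signed(bits: int, width: int) -> int:
    """Interpret a width-bit pattern as a two's-complement integer."""
    sign_bit = 1 << (width - 1)
    return bits - (1 << width) if bits & sign_bit else bits

def sext(bits: int, from_width: int, to_width: int) -> int:
    """Sign-extend: replicate the sign bit into the new high bits."""
    assert to_width > from_width
    return to_signed(bits, from_width) & ((1 << to_width) - 1)

def zext(bits: int, from_width: int, to_width: int) -> int:
    """Zero-extend: the new high bits are zero, so the bit pattern is unchanged."""
    assert to_width > from_width
    return bits

def trunc(bits: int, from_width: int, to_width: int) -> int:
    """Truncate: keep only the low to_width bits."""
    assert to_width < from_width
    return bits & ((1 << to_width) - 1)

x = 0xF0  # the i8 value -16
print(hex(sext(x, 8, 16)), hex(zext(x, 8, 16)), hex(trunc(0x1234, 16, 8)))
# 0xfff0 0xf0 0x34
```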
*	Automated rollback of commit 5684a12434f923d03b6870f2aa16226bfb0b38b6	George Karpenkov	2019-09-19	3	-63/+38
	PiperOrigin-RevId: 270126672
*	Outline GPU kernel function into a nested module.	MLIR Team	2019-09-19	3	-38/+63
	When outlining GPU kernels, put the kernel function inside a nested module. Then use a nested pipeline to generate the cubins, independently per kernel. In a final pass, move the cubins back to the parent module. PiperOrigin-RevId: 269987720
*	Update SPIR-V symbols and use GLSL450 instead of VulkanKHR	Lei Zhang	2019-09-13	3	-8/+8
	SPIR-V recently published v1.5, which brings a number of symbols into core, so the "KHR"/"EXT"/etc. suffixes are removed from those symbols. We use a script to pull information from the spec directly. This also changes the conversion and tests to use the GLSL450 memory model instead of VulkanKHR. GLSL450 is still the main memory model supported by Vulkan shaders, and it does not require an extra capability to enable. PiperOrigin-RevId: 268992661
*	Retain address space during MLIR > LLVM conversion.	MLIR Team	2019-09-04	1	-0/+11
	PiperOrigin-RevId: 267206460
*	Move LLVMIR dialect tests from test/LLVMIR to test/Dialect and test/Conversion	Alex Zinenko	2019-09-04	4	-0/+896
	This follows up on the recent restructuring that moved the dialects under lib/Dialect and inter-dialect conversions to lib/Conversion. Originally, the tests for both the LLVMIR dialect itself and the conversion from Standard to LLVMIR dialect lived under test/LLVMIR. This no longer reflects the code structure. Move the tests to either test/Dialect/LLVMIR or test/Conversion/StandardToLLVM depending on the features they exercise. PiperOrigin-RevId: 267159219
*	LLVM dialect: prefix auxiliary operations with "mlir."	Alex Zinenko	2019-09-03	3	-7/+7
	Some of the operations in the LLVM dialect are required to model the LLVM IR in MLIR; for example, "constant" operations are needed to declare a constant value since MLIR, unlike LLVM, does not support immediate values as operands. To avoid confusion with actual LLVM operations, we prefix such auxiliary operations with "mlir.". PiperOrigin-RevId: 266942838
*	Enhance GPU To SPIR-V conversion to support builtins and load/store ops.	Mahesh Ravishankar	2019-08-27	2	-0/+165
	To support a conversion of a simple load-compute-store kernel from the GPU dialect to the SPIR-V dialect, the conversion of operations like "gpu.block_dim" and "gpu.thread_id", which allow threads to query the launch configuration, is needed. In SPIR-V these are specified as global variables with builtin attributes. This CL adds support to specify builtin variables in the SPIR-V conversion framework. This is used to convert the relevant operations from the GPU dialect to the SPIR-V dialect. It also adds support for converting load/store operations in the Standard dialect to the SPIR-V dialect. To simplify the conversion, a method is added to build a spv.AccessChain operation that automatically determines the return type based on the base pointer type and the indices provided. PiperOrigin-RevId: 265718525
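The commit mentions a helper that derives the `spv.AccessChain` result type from the base pointer type and the indices. A minimal Python model of that idea: walk the pointee type one index at a time, stepping into array elements or struct members, and wrap the final element type in a pointer with the base pointer's storage class. The type classes and the function name are hypothetical, and the sketch assumes struct indices are constants.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple, Union

@dataclass(frozen=True)
class Scalar:
    name: str  # e.g. "f32"

@dataclass(frozen=True)
class Array:
    element: "Type"
    count: int

@dataclass(frozen=True)
class Struct:
    members: Tuple["Type", ...]

@dataclass(frozen=True)
class Pointer:
    pointee: "Type"
    storage_class: str

Type = Union[Scalar, Array, Struct, Pointer]

def access_chain_result_type(base: Pointer, indices: Sequence[int]) -> Pointer:
    """Pointer type produced by indexing into `base` with `indices`."""
    current: Type = base.pointee
    for idx in indices:
        if isinstance(current, Array):
            current = current.element       # any index selects an element
        elif isinstance(current, Struct):
            current = current.members[idx]  # struct indices must be constants
        else:
            raise TypeError(f"cannot index into {current}")
    return Pointer(current, base.storage_class)

f32 = Scalar("f32")
buf = Pointer(Struct((Array(f32, 16), Scalar("i32"))), "StorageBuffer")
print(access_chain_result_type(buf, [0, 5]))
# Pointer(pointee=Scalar(name='f32'), storage_class='StorageBuffer')
```

Deriving the type from the operands means callers only supply the base pointer and indices, which is the simplification the commit describes.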
*	ConvertLaunchFuncToCudaCalls: use LLVM dialect globals	Alex Zinenko	2019-08-20	1	-8/+10
	This conversion has been using a stack-allocated array of i8 to store the null-terminated kernel name in order to pass it to the CUDA wrappers expecting a C string, because the LLVM dialect was missing support for globals. Now that support has been introduced, use a global instead. Refactor global string construction from GenerateCubinAccessors into a common utility function living in the LLVM namespace. PiperOrigin-RevId: 264382489
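A sketch of what a shared "create a global string" utility does: store the bytes of the string plus a terminating NUL in a module-level global (so it outlives any single function call, unlike a stack allocation) and reuse an existing global when the same name is requested again. The function name, the dictionary standing in for the module's globals, and the reuse policy are assumptions; the actual LLVM dialect helper is not shown above.

```python
from typing import Dict

def get_or_create_global_string(globals_: Dict[str, bytes], name: str, value: str) -> bytes:
    """Return the NUL-terminated global `name`, creating it on first use."""
    data = value.encode("utf-8") + b"\x00"  # C strings end with a NUL byte
    existing = globals_.get(name)
    if existing is not None:
        if existing != data:
            raise ValueError(f"global {name!r} already holds a different string")
        return existing
    globals_[name] = data
    return data

module_globals: Dict[str, bytes] = {}
get_or_create_global_string(module_globals, "kernel_name", "my_kernel")
get_or_create_global_string(module_globals, "kernel_name", "my_kernel")  # reused
print(module_globals)  # {'kernel_name': b'my_kernel\x00'}
```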
*	LLVM dialect: prefix operations that correspond to intrinsics with "intr."	Alex Zinenko	2019-08-20	1	-2/+2
	LLVM intrinsics have an open name space and their names can potentially overlap with names of LLVM instructions (LLVM intrinsics are functions, not instructions). In MLIR, LLVM intrinsics are modeled as operations, so the dialect needs to make sure their names cannot clash with the instructions. Use the "intr." prefix for intrinsics in the LLVM dialect. PiperOrigin-RevId: 264372173
*	Fix parsing/printing of spv.globalVariable and spv._address_of	Mahesh Ravishankar	2019-08-19	1	-2/+2
	Change the printing/parsing of spv.globalVariable to print the type of the variable after the ':' to be consistent with MLIR convention. The spv._address_of should print the variable type after the ':'. It was mistakenly printing the address of the return value. Add a (missing) test that should have caught that. Also move spv.globalVariable and spv._address_of tests to structure-ops.mlir. PiperOrigin-RevId: 264204686
*	Add spirv::GlobalVariableOp that allows module level definition of variables	Mahesh Ravishankar	2019-08-17	1	-2/+2
	FuncOps in MLIR use explicit capture, so global variables defined in module scope need to have a symbol name, and this name should be used to refer to the variable within the function. This deviates from the SPIR-V spec, which assigns an SSA value to variables at all scopes that can be used to refer to the variable, which requires SPIR-V functions to allow implicit capture. To handle this, add a new op, spirv::GlobalVariableOp, that can be used to define module scope variables. Since instructions need an SSA value, a new spirv::AddressOfOp is added to convert a symbol reference to an SSA value for use with other instructions. This also means the spirv::EntryPointOp instruction needs to change to allow initializers to be specified using a symbol reference instead of an SSA value. The current spirv::VariableOp, which returns an SSA value (as defined by the SPIR-V spec), can still be used to define function-scope variables. PiperOrigin-RevId: 263951109
*	Extend vector.outerproduct with an optional 3rd argument	Nicolas Vasilache	2019-08-16	1	-26/+42
	This CL adds an optional third argument to the vector.outerproduct instruction. When such a third argument is specified, it is added to the result of the outerproduct and is lowered to an FMA intrinsic when the lowering supports it. In the future, we can add an attribute on the `vector.outerproduct` instruction to modify the operations for which to emit code (e.g. "+/*", "max/+", "min/+", "log/exp" ...). This CL additionally performs minor cleanups in the vector lowering and adds tests to improve coverage. This has been independently verified to result in proper FMA instructions for Haswell as follows.
	Input:
```
func @outerproduct_add(%arg0: vector<17xf32>, %arg1: vector<8xf32>, %arg2: vector<17x8xf32>) -> vector<17x8xf32> {
  %2 = vector.outerproduct %arg0, %arg1, %arg2 : vector<17xf32>, vector<8xf32>
  return %2 : vector<17x8xf32>
}
```
	Command:
```
mlir-opt vector-to-llvm.mlir -vector-lower-to-llvm-dialect --disable-pass-threading | mlir-opt -lower-to-cfg -lower-to-llvm | mlir-translate --mlir-to-llvmir | opt -O3 | llc -O3 -march=x86-64 -mcpu=haswell -mattr=fma,avx2
```
	Output:
```
outerproduct_add:                       # @outerproduct_add
# %bb.0:
        ...
        vmovaps         112(%rbp), %ymm8
        vbroadcastss    %xmm0, %ymm0
        ...
        vbroadcastss    64(%rbp), %ymm15
        vfmadd213ps     144(%rbp), %ymm8, %ymm0 # ymm0 = (ymm8 * ymm0) + mem
        ...
        vfmadd213ps     400(%rbp), %ymm8, %ymm9 # ymm9 = (ymm8 * ymm9) + mem
        ...
```
	PiperOrigin-RevId: 263743359
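The element-wise semantics of the extended op can be written down directly: with an accumulator, each result element is `lhs[i] * rhs[j] + acc[i][j]`, which is the multiply-add shape that maps onto an FMA instruction. This Python reference model (the function name is hypothetical) only pins down the math; it does not perform a fused, single-rounding FMA.

```python
from typing import List, Optional, Sequence

def outerproduct(lhs: Sequence[float], rhs: Sequence[float],
                 acc: Optional[Sequence[Sequence[float]]] = None) -> List[List[float]]:
    """Reference semantics: result[i][j] = lhs[i] * rhs[j] (+ acc[i][j] if given)."""
    result = []
    for i, a in enumerate(lhs):
        row = []
        for j, b in enumerate(rhs):
            value = a * b
            if acc is not None:
                value += acc[i][j]  # the optional third operand
            row.append(value)
        result.append(row)
    return result

print(outerproduct([1.0, 2.0], [3.0, 4.0, 5.0]))
# [[3.0, 4.0, 5.0], [6.0, 8.0, 10.0]]
print(outerproduct([1.0, 2.0], [3.0, 4.0, 5.0], [[1.0] * 3, [0.5] * 3]))
# [[4.0, 5.0, 6.0], [6.5, 8.5, 10.5]]
```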
*	GenerateCubinAccessors: use LLVM dialect constants	Alex Zinenko	2019-08-13	1	-26/+9
	The GenerateCubinAccessors pass was generating functions that fill dynamically-allocated memory with the binary constant of a CUBIN attached as a string attribute to the GPU kernel. This approach was taken to circumvent the missing support for global constants in the LLVM dialect (and MLIR in general). Global constants were recently added to the LLVM dialect. Change the GenerateCubinAccessors pass to emit a global constant array of characters and a function that returns a pointer to the first character in the array. PiperOrigin-RevId: 263092052
*	Add lowering of vector dialect to LLVM dialect.	Nicolas Vasilache	2019-08-12	1	-0/+33
	This CL is step 3/n towards building a simple, programmable and portable vector abstraction in MLIR that can go all the way down to generating assembly vector code via LLVM's opt and llc tools. This CL adds support for converting MLIR n-D vector types to (n-1)-D arrays of 1-D LLVM vectors and a conversion VectorToLLVM that lowers the `vector.extractelement` and `vector.outerproduct` instructions to the proper mix of `llvm.vectorshuffle`, `llvm.extractelement` and `llvm.mulf`. This has been independently verified to produce proper AVX2 code.
	Input:
```
func @vec_1d(%arg0: vector<4xf32>, %arg1: vector<8xf32>) -> vector<8xf32> {
  %2 = vector.outerproduct %arg0, %arg1 : vector<4xf32>, vector<8xf32>
  %3 = vector.extractelement %2[0 : i32]: vector<4x8xf32>
  return %3 : vector<8xf32>
}
```
	Command:
```
mlir-opt vector-to-llvm.mlir -vector-lower-to-llvm-dialect --disable-pass-threading | mlir-opt -lower-to-cfg -lower-to-llvm | mlir-translate --mlir-to-llvmir | opt -O3 | llc -O3 -march=x86-64 -mcpu=haswell -mattr=fma,avx2
```
	Output:
```
vec_1d:                                 # @vec_1d
# %bb.0:
        vbroadcastss    %xmm0, %ymm0
        vmulps          %ymm1, %ymm0, %ymm0
        retq
```
	PiperOrigin-RevId: 262895929
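The type mapping and the lowering pattern can be modeled in Python: an n-D vector becomes a nested (n-1)-D array whose innermost items are 1-D vectors, and an outer product is produced row by row by broadcasting one lane of the LHS (the role a splat shuffle plays) and multiplying it with the whole RHS vector. The helper names are hypothetical and lists stand in for LLVM vectors; this shows the structure of the lowering, not the code the pass emits.

```python
from typing import List, Sequence

def llvm_type_for_vector(shape: Sequence[int], elem: str) -> str:
    """Map an n-D vector shape to nested LLVM arrays of a 1-D vector (textual sketch)."""
    ty = f"<{shape[-1]} x {elem}>"    # innermost dimension stays a 1-D vector
    for dim in reversed(shape[:-1]):  # outer dimensions become arrays
        ty = f"[{dim} x {ty}]"
    return ty

def broadcast_lane(vec: Sequence[float], lane: int, width: int) -> List[float]:
    """Like a shufflevector with mask [lane] * width."""
    return [vec[lane]] * width

def lower_outerproduct(lhs: Sequence[float], rhs: Sequence[float]) -> List[List[float]]:
    """Row i of the result = broadcast(lhs[i]) * rhs, one 1-D multiply per row."""
    rows = []
    for i in range(len(lhs)):
        splat = broadcast_lane(lhs, i, len(rhs))
        rows.append([a * b for a, b in zip(splat, rhs)])  # element-wise multiply
    return rows

print(llvm_type_for_vector([4, 8], "float"))  # [4 x <8 x float>]
result = lower_outerproduct([1.0, 2.0, 3.0, 4.0], [float(j) for j in range(8)])
print(result[0])  # row 0: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
```

Extracting row 0 of a `vector<4x8xf32>` then reduces to picking one array element, which matches the single broadcast plus multiply in the assembly above.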
*	Initial implementation to translate kernel fn in GPU Dialect to SPIR-V Dialect	Mahesh Ravishankar	2019-07-30	1	-0/+21
	This CL adds an initial implementation for translating a kernel function in the GPU dialect (used with a gpu.launch_kernel op) to a spv.Module. The original function is translated into an entry function. Most of the heavy lifting is done by adding TypeConversion and other utility functions/classes that provide most of the functionality to translate from the Standard dialect to the SPIR-V dialect. These are intended to be reusable in implementations of different dialect conversion pipelines. Note: some of the files have been renamed to be consistent with the norm used by the other Conversion frameworks. PiperOrigin-RevId: 260759165
*	Replace linalg.for by loop.for	Nicolas Vasilache	2019-07-16	1	-2/+2
	With the introduction of the Loop dialect, uses of the `linalg.for` operation can now be subsumed 1-to-1 by `loop.for`. This CL performs the replacement and tests are updated accordingly. PiperOrigin-RevId: 258322565
*	Extract std.for, std.if and std.terminator into their own dialect	Nicolas Vasilache	2019-07-16	1	-10/+10
	These ops should not belong to the std dialect. This CL extracts them into their own dialect and updates the corresponding conversions and tests. PiperOrigin-RevId: 258123853
*	Lower affine control flow to std control flow to LLVM dialect	Nicolas Vasilache	2019-07-12	1	-0/+149
	This CL splits the lowering of affine to LLVM into 2 parts: 1. affine -> std; 2. std -> LLVM. The conversions mostly consist of splitting concerns between the affine and non-affine worlds from existing conversions. Short-circuiting of affine `if` conditions was never tested or exercised and is removed in the process; it can be reintroduced later if needed. LoopParametricTiling.cpp is updated to reflect the newly added ForOp::build. PiperOrigin-RevId: 257794436
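An affine `if` condition is an integer set: a conjunction of affine constraints, each of the form `expr >= 0` or `expr == 0`. Without short-circuiting, the lowered code evaluates every constraint and then combines the results with "and". The Python sketch below models that; representing each constraint as a coefficient list plus a constant term is an assumption for illustration.

```python
from typing import List, Sequence, Tuple

# Each constraint: (coefficients over the operands, constant term, is_equality).
Constraint = Tuple[Sequence[int], int, bool]

def eval_affine(coeffs: Sequence[int], const: int, operands: Sequence[int]) -> int:
    return sum(c * v for c, v in zip(coeffs, operands)) + const

def lower_if_condition(constraints: List[Constraint], operands: Sequence[int]) -> bool:
    """Evaluate all constraints (no short-circuit), then AND the results."""
    results = []
    for coeffs, const, is_eq in constraints:
        value = eval_affine(coeffs, const, operands)
        results.append(value == 0 if is_eq else value >= 0)
    return all(results)  # every constraint was already evaluated above

# Integer set (d0, d1) : (d0 - 1 >= 0, -d1 + 10 >= 0)
in_bounds = [([1, 0], -1, False), ([0, -1], 10, False)]
print(lower_if_condition(in_bounds, [3, 7]), lower_if_condition(in_bounds, [0, 7]))  # True False
```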
*	Standardize the value numbering in the AsmPrinter.	River Riddle	2019-07-09	3	-61/+61
	Change the AsmPrinter to number values breadth-first so that values in adjacent regions can have the same name. This allows for ModuleOp to contain operations that produce results. This also standardizes the special name of region entry arguments to "arg[0-9]+" now that Functions are also operations. PiperOrigin-RevId: 257225069
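A simplified model of breadth-first numbering: name all values defined directly in a region first, then number each nested region starting from the same counter, so sibling regions can reuse names. The `Region` class, the use of unique strings as value identities, and the exact restart rule are assumptions for illustration rather than the AsmPrinter's code.

```python
from typing import Dict, List, Optional

class Region:
    def __init__(self, values: List[str], children: Optional[List["Region"]] = None):
        self.values = values            # SSA values defined directly in this region
        self.children = children or []  # regions nested inside this region's ops

def number_values(region: Region, next_id: int = 0,
                  names: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Number this region's values first, then each nested region from the same id."""
    if names is None:
        names = {}
    for value in region.values:
        names[value] = f"%{next_id}"
        next_id += 1
    for child in region.children:
        # Each sibling restarts at next_id, so adjacent regions may reuse names.
        number_values(child, next_id, names)
    return names

module = Region(["a", "b"], [Region(["x"]), Region(["y", "z"])])
print(number_values(module))  # {'a': '%0', 'b': '%1', 'x': '%2', 'y': '%2', 'z': '%3'}
```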
*	Extend AffineToGPU to support Linalg loops	Alex Zinenko	2019-07-09	3	-24/+54
	Extend the utility that converts affine loop nests to support other types of loops by abstracting away common behavior through templates. This also slightly simplifies the existing Affine to GPU conversion by always passing in the loop step as an additional kernel argument even though it is a known constant. If it is used, it will be propagated into the loop body by the existing canonicalization pattern and can be further constant-folded; otherwise it will be dropped by canonicalization. This prepares for the common loop abstraction that will be used for converting to GPU kernels, which is conceptually close to Linalg loops, while keeping the existing conversion operational. PiperOrigin-RevId: 257172216
*	Add an mlir-cuda-runner tool.	Stephan Herhut	2019-07-04	1	-3/+3
	This tool allows executing MLIR IR snippets written in the GPU dialect on a CUDA-capable GPU. For this to work, a working CUDA install is required and the build has to be configured with MLIR_CUDA_RUNNER_ENABLED set to 1. PiperOrigin-RevId: 256551415
*	Add a pass that inserts getters for all cubins found via nvvm.cubin	Stephan Herhut	2019-06-26	1	-0/+31
	Add a pass that inserts getters for all cubins found via nvvm.cubin annotations. Getters are required because there are currently no global constants in MLIR, and this is an easy way to unblock CUDA execution while waiting for those. PiperOrigin-RevId: 255169002
*	Make GPU to CUDA transformations independent of CUDA runtime.	Stephan Herhut	2019-06-26	3	-0/+35
	The actual transformation from PTX source to a CUDA binary is now factored out, enabling compiling and testing the transformations independently of a CUDA runtime. MLIR still has to be built with NVPTX target support for the conversions to be built and tested. PiperOrigin-RevId: 255167139
*	Change the attribute dictionary syntax to separate name and value with '='.	River Riddle	2019-06-25	1	-12/+12
	The current syntax separates the name and value with ':', but ':' is already overloaded by several other things (e.g. trailing types). This makes the syntax difficult to parse in some situations: Old: "foo: 10 : i32" New: "foo = 10 : i32" PiperOrigin-RevId: 255097928
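With `=` as the separator, a simple parser can split each entry at the first `=` to get the name, and only then look for a trailing `: type` in the value. The sketch below handles only flat entries that all have a value, like `foo = 10 : i32` (no nested attributes, unit attributes, or quoted commas and colons), and is not MLIR's parser.

```python
from typing import Dict, Optional, Tuple

def parse_attr_dict(text: str) -> Dict[str, Tuple[str, Optional[str]]]:
    """Parse '{name = value [: type], ...}' into {name: (value, type)}; flat entries only."""
    body = text.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("attribute dictionary must be enclosed in braces")
    result = {}
    for entry in body[1:-1].split(","):
        if not entry.strip():
            continue
        name, sep, rest = entry.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in entry {entry!r}")
        value, colon, ty = rest.partition(":")  # ':' now only introduces a type
        result[name.strip()] = (value.strip(), ty.strip() if colon else None)
    return result

print(parse_attr_dict("{foo = 10 : i32, bar = true}"))
# {'foo': ('10', 'i32'), 'bar': ('true', None)}
```

With the old syntax, `foo: 10 : i32` would leave the parser guessing which ':' ends the name; here the first '=' settles it.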
*	GPUtoNVVM: adjust integer bitwidth when lowering special register ops	Alex Zinenko	2019-06-25	1	-12/+12
	GPU dialect operations (launch and launch_func) use `index` type for thread and block index values inside the kernel, for compatibility with affine loops. NVVM dialect operations, following the NVVM intrinsics, use `!llvm.i32` type, which does not necessarily have the same bit width as the lowered `index` type. Optionally sign-extend (indices are signed) or truncate the result of the NVVM dialect operation to the bit width of the lowered `index` type before passing it to other operations. This behavior is consistent with `std.index_cast`. We cannot use the latter since we are targeting LLVM dialect types directly, rather than standard integer types. PiperOrigin-RevId: 254980868
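The adjustment reduces to a three-way choice on bit widths. Assuming the NVVM result is a 32-bit two's-complement pattern and the lowered `index` width is a parameter (64 bits is typical on 64-bit targets, but that is an assumption here), a Python model looks like this; the function name is hypothetical.

```python
def adjust_index_bitwidth(bits: int, from_width: int, index_width: int) -> int:
    """Sign-extend (indices are signed), truncate, or pass through unchanged."""
    if index_width > from_width:
        sign_bit = 1 << (from_width - 1)
        signed = bits - (1 << from_width) if bits & sign_bit else bits
        return signed & ((1 << index_width) - 1)  # sign extension
    if index_width < from_width:
        return bits & ((1 << index_width) - 1)    # truncation
    return bits                                    # same width: nothing to do

print(hex(adjust_index_bitwidth(0xFFFFFFFF, 32, 64)))  # 0xffffffffffffffff
print(adjust_index_bitwidth(7, 32, 64), adjust_index_bitwidth(0x12345678, 32, 16) == 0x5678)
# 7 True
```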
*	Add lowering pass from GPU dialect operations to LLVM/NVVM intrinsics.	Stephan Herhut	2019-06-19	1	-0/+35
	PiperOrigin-RevId: 253551452
*	Convert a nest of affine loops to a GPU kernel	Alex Zinenko	2019-06-19	2	-0/+112
	This converts entire loops into threads/blocks. No check is performed on the size of the block or grid, or on the validity of parallelization; it is the caller's responsibility to strip-mine the loops and to perform the dependence analysis before calling the conversion. PiperOrigin-RevId: 253189268
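The mapping from a 2-D loop nest to blocks and threads can be simulated sequentially: each (block, thread) pair computes its induction variables as `lb + id * step`, with the step passed in like any other kernel argument, and runs the loop body once. The function names, and mapping the outer loop to blocks and the inner loop to threads, are assumptions for illustration. As the commit says, no legality or size checks are performed; the caller must ensure the iterations are independent.

```python
from typing import Callable, List, Tuple

def trip_count(lb: int, ub: int, step: int) -> int:
    """Number of iterations of `for i = lb to ub step step` (ub exclusive, step > 0)."""
    return max(0, (ub - lb + step - 1) // step)

def launch_as_gpu_kernel(outer: Tuple[int, int, int], inner: Tuple[int, int, int],
                         body: Callable[[int, int], None]) -> Tuple[int, int]:
    """Run `body(i, j)` once per (block, thread); return the (grid, block) sizes used."""
    grid = trip_count(*outer)   # one block per outer iteration
    block = trip_count(*inner)  # one thread per inner iteration
    for block_id in range(grid):
        for thread_id in range(block):
            i = outer[0] + block_id * outer[2]   # lb + id * step
            j = inner[0] + thread_id * inner[2]
            body(i, j)
    return grid, block

visited: List[Tuple[int, int]] = []
sizes = launch_as_gpu_kernel((0, 6, 2), (1, 4, 1), lambda i, j: visited.append((i, j)))
expected = [(i, j) for i in range(0, 6, 2) for j in range(1, 4, 1)]
print(sizes, visited == expected)  # (3, 3) True
```

On a real GPU the (block, thread) pairs run concurrently, which is why the dependence analysis has to happen before the conversion.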