path: root/mlir/test/Conversion
Commit message (Author, Date; Files, Lines changed)
...
* Fix ViewOp to have at most one offset operand (Alex Zinenko, 2019-12-03; 1 file, -5/+5)

  As described in the documentation, ViewOp is expected to take an optional dynamic offset followed by a list of dynamic sizes. However, the ViewOp parser did not include a check for the offset being a single value and accepted a list of values instead. Furthermore, several tests have been exercising the wrong syntax of a ViewOp, passing multiple values to the dynamic stride list, which was not caught by the parser. The trailing values could have been erroneously interpreted as dynamic sizes. This is likely due to resyntaxing of the ViewOp, with the previous syntax taking the list of sizes before the offset. Update the tests to use the syntax with the offset preceding the sizes.

  Worse, the conversion of ViewOp to the LLVM dialect assumed the wrong order of operands, with the offset in the trailing position, and erroneously relied on the permissive parsing that interpreted trailing dynamic offset values as leading dynamic sizes. Fix the lowering to use the correct order of operands.

  PiperOrigin-RevId: 283532506
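  As an illustrative sketch of the corrected syntax (SSA names and the layout map #map are hypothetical, not taken from the commit):

  ```
  // One optional dynamic offset in the first bracket group, then the list
  // of dynamic sizes.
  %v = view %buf[%offset][%size0, %size1]
      : memref<2048xi8> to memref<?x?xf32, #map>
  ```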
* Extend conversion of SubViewOp to llvm to also support cases where size and stride are constant (i.e., there are no size and stride operands) (Stephan Herhut, 2019-12-03; 1 file, -0/+54)

  We recently added canonicalization that rewrites constant size and stride operands to SubViewOp into static information in the type, so these patterns now occur during code generation.

  PiperOrigin-RevId: 283524688
* Introduce Linkage attribute to the LLVM dialect (Alex Zinenko, 2019-12-02; 1 file, -2/+2)

  LLVM IR supports linkage on global objects such as global variables and functions. Introduce the Linkage attribute into the LLVM dialect, backed by an integer storage. Use this attribute on LLVM::GlobalOp and make it mandatory. Implement parsing/printing of the attribute and conversion to LLVM IR.

  See tensorflow/mlir#277.

  PiperOrigin-RevId: 283309328
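  A hedged sketch of what globals with an explicit linkage keyword might look like in the LLVM dialect of that era (symbol names, values, and the exact type spelling are assumptions):

  ```
  // Linkage is now a mandatory keyword on llvm.mlir.global.
  llvm.mlir.global internal @counter(0 : i32) : !llvm.i32
  llvm.mlir.global external @flag(1 : i8) : !llvm.i8
  ```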
* Misc changes to lowering to SPIR-V (Mahesh Ravishankar, 2019-11-26; 3 files, -12/+15)

  These changes were made while adding support for lowering SubViewOp to SPIR-V, but are not directly related to it.
  - Change the lowering of MemRefType to !spv.ptr<!spv.struct<!spv.array<...>[offset]>, ..>. This is consistent with the Vulkan spec.
  - To enable testing, a simple pattern for lowering functions is added to ConvertStandardToSPIRVPass. This is just used to convert the type of the arguments of the function. The added function lowering itself is not meant to be the way functions are eventually lowered into the SPIR-V dialect.

  PiperOrigin-RevId: 282589644
* Use vector.InsertStridedSlice in Vector -> Vector unrolling (Nicolas Vasilache, 2019-11-25; 1 file, -8/+15)

  This CL uses the recently added op to finish the implementation of Vector -> Vector unrolling by replacing the "fake join op" by a series of InsertStridedSliceOp. The test is updated accordingly.

  PiperOrigin-RevId: 282451126
* Allow LLVM::ExtractElementOp to have non-i32 indices (MLIR Team, 2019-11-25; 2 files, -9/+9)

  Also change the text format a bit, so that indices are enclosed in square brackets.

  PiperOrigin-RevId: 282437095
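  A hedged sketch of the resulting form (operand names and the exact LLVM dialect type spellings are assumptions):

  ```
  // The index is now bracketed and may be any integer type, e.g. i64.
  %e = llvm.extractelement %vec[%idx : !llvm.i64] : !llvm<"<4 x float>">
  ```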
* Introduce attributes that specify the final ABI for a spirv::ModuleOp (Mahesh Ravishankar, 2019-11-25; 2 files, -46/+12)

  To simplify the lowering into SPIR-V, while still respecting the ABI requirements of SPIR-V/Vulkan, split the process into two steps:
  1) While lowering a function to SPIR-V (when the function is an entry point function), allow specifying attributes on arguments and the function itself that describe the ABI of the function.
  2) Add a pass that materializes the ABI described in the function.

  Two attributes are needed:
  1) An attribute on arguments of the entry point function that describes the descriptor_set, binding, storage class, etc., of the spv.globalVariable this argument will be replaced by.
  2) An attribute on the function that specifies workgroup size, etc. (for now only workgroup size).

  Add the pass -spirv-lower-abi-attrs to materialize the ABI described by the attributes.

  This change makes the SPIRVBasicTypeConverter class unnecessary, so it is removed, further simplifying the SPIR-V lowering path.

  PiperOrigin-RevId: 282387587
* [spirv] NFC: rename test files and sort tests inside (Lei Zhang, 2019-11-23; 2 files, -33/+49)

  PiperOrigin-RevId: 282132339
* Changes to SubViewOp to make it more amenable to canonicalization (Mahesh Ravishankar, 2019-11-20; 1 file, -1/+1)

  The current SubViewOp specification allows either all offsets, shapes and strides to be dynamic, or all of them to be static. There are opportunities for more fine-grained canonicalization based on which of these are static. For example, if the sizes are static, the result memref is of static shape. The specification of SubViewOp is modified to allow one or more of the offsets, shapes and strides to be statically specified. The verification is updated to ensure that the result type of the subview op is consistent with which of these are static and which are dynamic.

  PiperOrigin-RevId: 281560457
* Implement unrolling of vector ops to finer-grained vector ops as a pattern (Nicolas Vasilache, 2019-11-20; 1 file, -0/+42)

  This CL uses the pattern rewrite infrastructure to implement a simple VectorOps -> VectorOps legalization strategy that unrolls coarse-grained vector operations into finer-grained ones. The transformation is written using local pattern rewrites to allow composition with other rewrites. It proceeds by iteratively introducing fake cast ops and then canonicalizing or lowering them away where appropriate.

  This is an example of writing transformations as compositions of local pattern rewrites that should enable us to make them significantly more declarative.

  PiperOrigin-RevId: 281555100
* Change conversion CLI flag from -lower-to-llvm to -convert-std-to-llvm (Alex Zinenko, 2019-11-19; 5 files, -5/+5)

  The command-line flag name `lower-to-llvm` for the pass performing dialect conversion from the Standard dialect to the LLVM dialect is misleading and inconsistent with most of the conversion passes. It leads the user to believe that there are no restrictions on what can be converted, while in fact only a subset of the Standard dialect can be converted (with operations from other dialects converted by separate passes). Use `convert-std-to-llvm` instead, which better reflects what the pass does and is consistent with most other conversions.

  PiperOrigin-RevId: 281238797
* Rename CLI flags -lower-gpu-ops-to-*-ops to -convert-gpu-to-* (Alex Zinenko, 2019-11-18; 2 files, -2/+2)

  This makes the flags consistent with the naming scheme used elsewhere in the codebase for dialect conversions.

  PiperOrigin-RevId: 281027517
* Refactor the LowerVectorTransfers pass to use the RewritePattern infra - NFC (Nicolas Vasilache, 2019-11-14; 2 files, -0/+213)

  This is step 1/n in refactoring infrastructure along the Vector dialect to make it ready for retargetability and composable progressive lowering.

  PiperOrigin-RevId: 280529784
* Use MemRefDescriptor in Vector-to-LLVM conversion (Alex Zinenko, 2019-11-14; 1 file, -6/+5)

  Following up on the consolidation of MemRef descriptor conversion, update Vector-to-LLVM conversion to use the helper class that abstracts away the implementation details of the MemRef descriptor. This also makes the types of the attributes in emitted llvm.insert/extractelement operations consistently i64 instead of a mix of index and i64.

  PiperOrigin-RevId: 280441451
* Move VectorOps to Tablegen - (almost) NFC (Nicolas Vasilache, 2019-11-14; 1 file, -11/+7)

  This CL moves VectorOps to Tablegen and cleans up the implementation.

  This is almost NFC, but 2 changes occur:
  1. An interface change occurs in the padding value specification in vector_transfer_read: the value becomes non-optional. As a shortcut we currently use %f0 for all paddings. This should become an OpInterface for vectorization in the future.
  2. The return type of vector.type_cast is trivial and simplified to `memref<vector<...>>`.

  Relevant roundtrip and invalid tests that used to sit in core are moved to the vector dialect. The op documentation is moved to the .td file.

  PiperOrigin-RevId: 280430869
* Deprecate linalg.subview in favor of std.subview (Nicolas Vasilache, 2019-11-13; 1 file, -17/+9)

  This CL uses the now-standard std.subview in linalg. Two shortcuts are currently taken to allow this port:
  1. The type resulting from a view is currently degraded to fully dynamic to pass the SubViewOp verifier.
  2. Indexing into SubViewOp may access out of bounds, since lowering to LLVM does not currently enforce it by construction.

  These will be fixed in subsequent commits after discussions.

  PiperOrigin-RevId: 280250129
* Add operations needed to support lowering of AffineExpr to SPIR-V (Mahesh Ravishankar, 2019-11-12; 1 file, -1/+44)

  Lowering of CmpIOp, DivISOp, RemISOp, SubIOp and SelectOp to the SPIR-V dialect enables the lowering of operations generated by the AffineExpr -> StandardOps conversion into the SPIR-V dialect.

  PiperOrigin-RevId: 280039204
* Add Conversion to lower loop::ForOp to spirv::LoopOp (Mahesh Ravishankar, 2019-11-12; 1 file, -0/+39)

  loop::ForOp can be lowered to the structured control flow represented by spirv::LoopOp by making the continue block of the spirv::LoopOp the loop latch and the merge block the exit block. The resulting spirv::LoopOp has a single back edge from the continue block to the header block, and a single exit from the header to the merge block.

  PiperOrigin-RevId: 280015614
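  For readers unfamiliar with the structured form, a hedged sketch of the block layout this lowering targets (block names and the merge terminator spelling are assumptions):

  ```
  spv.loop {
    spv.Branch ^header
  ^header:
    // evaluate the loop condition, e.g. induction variable < upper bound
    spv.BranchConditional %cond, ^body, ^merge
  ^body:
    // loop body
    spv.Branch ^continue
  ^continue:
    // the loop latch: the single back edge to the header
    spv.Branch ^header
  ^merge:
    spv._merge
  }
  ```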
* Add LLVM lowering of std.subview (Nicolas Vasilache, 2019-11-12; 1 file, -0/+26)

  A follow-up CL will replace usage of linalg.subview by std.subview.

  PiperOrigin-RevId: 279961981
* Add support for alignment attribute in std.alloc (Nicolas Vasilache, 2019-11-12; 5 files, -174/+208)

  This CL adds an extra pointer to the memref descriptor to allow specifying alignment.

  In a previous implementation, we used 2 types: `linalg.buffer` and `view`, where the buffer type was the unit of allocation/deallocation/alignment and `view` was the unit of indexing. After multiple discussions it was decided to use a single type, which conflates both, so the memref descriptor now needs to carry both pointers.

  This is consistent with the [RFC-Proposed Changes to MemRef and Tensor MLIR Types](https://groups.google.com/a/tensorflow.org/forum/#!searchin/mlir/std.view%7Csort:date/mlir/-wKHANzDNTg/4K6nUAp8AAAJ).

  PiperOrigin-RevId: 279959463
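  Conceptually, the augmented descriptor corresponds to something like the following C++ struct, in the spirit of the strided-memref struct shown in the 2019-09-30 entry below (field names are assumptions, not the actual lowering code):

  ```
  template <typename Elem, size_t Rank>
  struct MemRefDescriptor {
    Elem *allocatedPtr;   // what the allocator returned; the unit of deallocation
    Elem *alignedPtr;     // allocatedPtr rounded up to the alignment; used for indexing
    int64_t offset;       // offset from the aligned pointer, in elements
    int64_t sizes[Rank];
    int64_t strides[Rank];
  };
  ```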
* Drop spurious test file (Nicolas Vasilache, 2019-11-06; 1 file, -25/+0)

  PiperOrigin-RevId: 278959717
* Add lowering of std.view to LLVM (Nicolas Vasilache, 2019-11-06; 2 files, -0/+131)

  This CL ports the lowering of linalg.view to the newly introduced std.view. Differences in implementation relate to std.view having slightly different semantics:
  1. A static or dynamic offset can be specified.
  2. The size of the (contiguous) shape is passed instead of a range.
  3. Static size and stride information is extracted from the memref type rather than the range.

  Besides these differences, lowering behaves the same. A future CL will update Linalg to use this unified infrastructure.

  PiperOrigin-RevId: 278948853
* Support lowering of imperfectly nested loops into GPU dialect (Mahesh Ravishankar, 2019-11-01; 5 files, -0/+326)

  The current lowering of loops to GPU only supports lowering of loop nests where the loops mapped to workgroups and workitems are perfectly nested. Here a new lowering is added to handle lowering of imperfectly nested loop bodies with the following properties:
  1) The loops partitioned to workgroups are perfectly nested.
  2) The loop body of the innermost loop partitioned to workgroups can contain one or more loop nests that are to be partitioned across workitems. Each individual loop nest partitioned to workitems should also be perfectly nested.
  3) The number of workgroups and workitems is not deduced from the loop bounds but is passed in by the caller of the lowering as values.
  4) For statements within the perfectly nested loop nest partitioned across workgroups that are not loops, it is valid to have all threads execute that statement. This is NOT verified.

  PiperOrigin-RevId: 277958868
* Add a test for lowering GPU ops that covers cases where the symbol table isn't held by a ModuleOp (NFC) (Mehdi Amini, 2019-10-31; 2 files, -0/+45)

  PiperOrigin-RevId: 277752004
* Fix include guards and add tests for OpToFuncCallLowering (Alexander Belyaev, 2019-10-26; 2 files, -1/+35)

  PiperOrigin-RevId: 276859463
* Use LLVM_Type instead of AnyType in the definition of LLVM_CallOp (Alex Zinenko, 2019-10-21; 1 file, -2/+2)

  The type constraint had to be relaxed due to the order of lowering passes in the examples, which has since been fixed. The relaxed version was still used by the CUDA lowering for launch sizes of `index` type. This is not necessary since the GPU dialect does not restrict the type of the launch size operands. Use an LLVM type instead and restore the check in the LLVM_CallOp definition.

  PiperOrigin-RevId: 275920109
* Implement lowering of VectorTypeCastOp to LLVM (Nicolas Vasilache, 2019-10-18; 1 file, -2/+19)

  A VectorTypeCastOp can only be used to lower between statically sized contiguous memrefs of scalar and matching vector type. The sizes and strides are thus fully static and easy to determine. A relevant test is added.

  This is a step towards solving tensorflow/mlir#189.

  PiperOrigin-RevId: 275538981
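  As a hedged illustration, using the simplified `memref<vector<...>>` result type noted in the 2019-11-14 Tablegen entry above (shapes and names are hypothetical):

  ```
  %A = alloc() : memref<8x8xf32>
  // Reinterpret the statically sized, contiguous memref as a memref of one vector.
  %V = vector.type_cast %A : memref<8x8xf32> to memref<vector<8x8xf32>>
  ```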
* Add gpu.barrier op to synchronize invocations of a local workgroup (Christian Sigg, 2019-10-18; 1 file, -0/+11)

  Adds a gen table for rewrite patterns from the GPU to the NVVM dialect.

  Copy missing op documentation from GPUOps.td to GPU.md.

  PiperOrigin-RevId: 275419588
* Makes spv.module generated by GPU->SPIRV conversion spec compliant (Mahesh Ravishankar, 2019-10-16; 2 files, -9/+9)

  Makes the spv.module generated by the GPU to SPIR-V conversion SPIR-V spec compliant (validated using spirv-val from Vulkan tools):
  1) Separate out the VulkanLayoutUtils from DecorateSPIRVCompositeTypeLayoutPass to make it reusable within the type converter in the SPIR-V lowering infrastructure. This is used to compute the layout of the !spv.struct used in the global variable type description.
  2) Set the capabilities of the spv.module to Shader (needed for use of the Logical memory model) and the extensions to SPV_KHR_storage_buffer_storage_class (for use of the StorageBuffer storage class).

  PiperOrigin-RevId: 275081486
* Support custom accumulator provided as region to gpu.all_reduce (Christian Sigg, 2019-10-16; 1 file, -3/+33)

  In addition to specifying the type of accumulation through the 'op' attribute, the accumulation can now also be specified as an arbitrary code region. Adds a gpu.yield op to specify the result of the accumulation. Also supports more types (integers) and accumulations (mul).

  PiperOrigin-RevId: 275065447
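  A hedged sketch of the region form (generic op syntax; operand names are hypothetical):

  ```
  // Custom accumulator: the region receives two partial values and combines
  // them, yielding the result via gpu.yield.
  %sum = "gpu.all_reduce"(%operand) ({
  ^bb(%lhs : f32, %rhs : f32):
    %result = addf %lhs, %rhs : f32
    "gpu.yield"(%result) : (f32) -> ()
  }) : (f32) -> (f32)
  ```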
* Emit LLVM IR equivalent of sizeof when lowering alloc operations (Alex Zinenko, 2019-10-11; 1 file, -13/+25)

  Originally, the lowering of `alloc` operations computed the number of bytes to allocate based on the properties of the MLIR type. This does not take into account type legalization that happens when compiling LLVM IR down to target assembly. This legalization can widen the type, potentially leading to out-of-bounds accesses to `alloc`ed data due to mismatches between address computation that takes the widening into account and allocation that does not. Use the LLVM IR's equivalent of `sizeof` to compute the number of bytes to be allocated:

    %0 = getelementptr %type* null, %indexType 0
    %1 = ptrtoint %type* %0 to %indexType

  adapted from http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt

  PiperOrigin-RevId: 274159900
* Add unary ops and ExpOp to Standard Dialect (Alexander Belyaev, 2019-10-11; 1 file, -0/+2)

  PiperOrigin-RevId: 274152154
* LLVM conversion: harden a test to check for LLVM funcs rather than any funcs (Alex Zinenko, 2019-10-11; 1 file, -10/+10)

  This test was not updated in the original commit that switched to using LLVM functions, since it wasn't broken by that change: FileCheck was able to match the `func` part of `llvm.func` to the expected pattern and continue as usual. Make sure the `llvm.` dialect prefix is included in the expected output.

  PiperOrigin-RevId: 274127281
* Standard-to-LLVM conversion: check that operands have LLVM types (Alex Zinenko, 2019-10-10; 1 file, -1/+12)

  In the Standard to LLVM dialect conversion, the binary op conversion pattern implicitly assumed some operands were of LLVM IR dialect type. This is not necessarily true, for example if the ops that produce those operands did not match the existing conversion patterns. Check if all operands are of LLVM IR dialect type and, if not, fail to match the binary op pattern.

  Closes tensorflow/mlir#168

  PiperOrigin-RevId: 274063207
* Add lowering of constant ops to SPIR-V (Mahesh Ravishankar, 2019-10-10; 1 file, -0/+60)

  The lowering is specified as a pattern and is done only if the result is a SPIR-V scalar type or vector type. ConstantOp with index return type needs special handling, since the SPIR-V dialect does not have index types. Based on the bitwidth of the attribute value, either i32 or i64 is chosen. Other constant lowerings are left as a TODO.

  PiperOrigin-RevId: 274056805
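  A hedged before/after sketch of the index special case (assuming the value's bitwidth leads to i32 being chosen):

  ```
  // Before (Standard dialect): index-typed constant.
  %c = constant 42 : index
  // After (SPIR-V dialect): index has no SPIR-V counterpart, so an integer
  // type is picked from the attribute's bitwidth.
  %c = spv.constant 42 : i32
  ```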
* Use llvm.func to define functions with wrapped LLVM IR function type (Alex Zinenko, 2019-10-10; 2 files, -7/+8)

  This function-like operation allows one to define functions that have a wrapped LLVM IR function type, in particular variadic functions. The operation was added in parallel to the existing lowering flow; this commit only switches the flow to use it.

  Using a custom function type makes the LLVM IR dialect type system more consistent and avoids complex conversion rules for functions that previously had to use the built-in function type instead of a wrapped LLVM IR dialect type and perform conversions during the analysis.

  PiperOrigin-RevId: 273910855
* Make SPIR-V lowering infrastructure follow Vulkan SPIR-V validation (Mahesh Ravishankar, 2019-10-09; 2 files, -13/+59)

  The lowering infrastructure needs to be enhanced to lower into a spv.Module that is consistent with the SPIR-V spec. The following changes are needed:
  1) The Vulkan/SPIR-V validation rules dictate that entry functions have a signature of void(void). This requires changes to the function signature conversion infrastructure within the dialect conversion framework. When an argument is dropped from the original function signature, a function can be specified that, when invoked, will return the value to use as a replacement for the argument from the original function.
  2) Some changes to the type converter to make the converted type consistent with the Vulkan/SPIR-V validation rules:
     a) Add support for converting dynamically shaped tensors to the spv.rtarray type.
     b) Make the global variable of type !spv.ptr<!spv.struct<...>>.
  3) Generate the entry point operation for the kernel functions and automatically compute all the interface variables needed.

  PiperOrigin-RevId: 273784229
* GPUToCUDA: attach CUBIN to the nested module rather than to the function (Alex Zinenko, 2019-10-08; 2 files, -5/+4)

  Originally, we were attaching attributes containing CUBIN blobs to the kernel function called by `gpu.launch_func`. This kernel is now contained in a nested module that is used as a compilation unit. Attach compiled CUBIN blobs to the module rather than to the function, since we were compiling the module. This also avoids duplication of the attribute on multiple kernels within the same module.

  PiperOrigin-RevId: 273497303
* GPUToCUDA: emit addressof directly instead of wrapping it into a getter function (Alex Zinenko, 2019-10-08; 1 file, -12/+8)

  Originally, the CUBIN getter function was introduced as a mechanism to circumvent the absence of globals in the LLVM dialect. It would allocate memory and populate it with the CUBIN data. The LLVM dialect now supports globals, and they are already used to store CUBIN data, making the getter function a trivial address computation of a global. Emit the address computation directly at the place of `gpu.launch_func` instead of putting it in a function and calling it. This simplifies the conversion flow and prepares it for using the DialectConversion infrastructure.

  PiperOrigin-RevId: 273496221
* Fuse GenerateCubinAccessors pass into LaunchFunctToCuda (Alex Zinenko, 2019-10-08; 2 files, -24/+10)

  Now that the accessor function is a trivial getter of the global variable, it makes less sense to have the getter generation as a separate pass. Move the getter generation into the lowering of `gpu.launch_func` to CUDA calls. This change is mostly code motion, but the process can be simplified further by generating the addressof in place instead of using a call. This will be done in a follow-up.

  PiperOrigin-RevId: 273492517
* Use named modules for gpu.launch_func (Alex Zinenko, 2019-10-08; 5 files, -183/+224)

  The kernel function called by gpu.launch_func is now placed into an isolated nested module during the outlining stage to simplify separate compilation. Until recently, modules did not have names and could not be referenced. This limitation was circumvented by introducing a stub kernel with the same name at the same nesting level as the module containing the actual kernel. This relation is only effective in one direction: from the actual kernel function to its launch_func "caller".

  Leverage the recently introduced symbol name attributes on modules to refer to a specific nested module from `gpu.launch_func`. This removes the implicit connection between the identically named stub and kernel functions. It also enables support for `gpu.launch_func`s calling different kernels located in the same module.

  PiperOrigin-RevId: 273491891
* Add fpext and fptrunc to the Standard dialect and include conversion to LLVM (MLIR Team, 2019-10-03; 1 file, -0/+24)

  PiperOrigin-RevId: 272768027
* NFC: rename Conversion/ControlFlowToCFG to Conversion/LoopToStandard (Alex Zinenko, 2019-10-03; 1 file, -1/+1)

  This makes the name of the conversion pass more consistent with the naming scheme, since it actually converts from the Loop dialect to the Standard dialect rather than working with arbitrary control flow operations.

  PiperOrigin-RevId: 272612112
* Extract MemRefType::getStridesAndOffset as a free function and fix dynamic offset determination (Nicolas Vasilache, 2019-10-02; 1 file, -0/+6)

  This also adds coverage with a previously missing test, which uncovered a bug in the conditional for testing whether an offset is dynamic or not.

  PiperOrigin-RevId: 272505798
* [ROCm] Adding pass to lower GPU Dialect to ROCDL Dialect (Deven Desai, 2019-10-02; 1 file, -0/+37)

  This is a follow-up to PR tensorflow/mlir#146, which introduced the ROCDL dialect. This PR introduces a pass to lower the GPU dialect to the ROCDL dialect. As with the previous PR, this one builds on the work done by @whchung, and addresses most of the review comments in the original PR.

  Closes tensorflow/mlir#154

  COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/154 from deven-amd:deven-lower-gpu-to-rocdl 809893e08236da5ab6a38e3459692fa04247773d

  PiperOrigin-RevId: 272390729
* Fix and simplify CallOp/CallIndirectOp to LLVM::CallOp conversion (Alex Zinenko, 2019-10-01; 1 file, -0/+8)

  A recent ABI compatibility change affected the conversion from standard CallOp/CallIndirectOp to LLVM::CallOp by changing its signature. In order to analyze the signature, the code was looking up the callee symbol in the module. This is incorrect since, during the conversion, the module may contain both the original and the converted function op that have the same symbol name. There is no strict guarantee on which of the two symbols will be found by the lookup. The conversion was not failing because the type legalizer converts the LLVM types to themselves, making the original and the converted function signatures ultimately produce the same type.

  Instead of looking up the function signature to get the list of result types, use the types of the CallOp/CallIndirectOp results, which must match those of the function in valid IR. These types are guaranteed to be the original, unconverted types when converting the operation. Furthermore, this avoids the need to perform a lookup of a symbol name in the module, which may be expensive.

  Finally, propagate attributes as-is from the original op to the converted op, since they share the attribute name for the callee of direct calls and the rest of the attributes are not affected by the conversion. This removes the need for additional contortions between direct and indirect calls to extract the name of the optional callee attribute only to insert it back. It also prevents the conversion from unintentionally dropping the other attributes of the op.

  PiperOrigin-RevId: 272218871
* Normalize MemRefType lowering to LLVM as strided MemRef descriptor (Nicolas Vasilache, 2019-09-30; 4 files, -146/+199)

  This CL finishes the implementation of the lowering part of the [strided memref RFC](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).

  Strided memrefs correspond conceptually to the following templated C++ struct:

  ```
  template <typename Elem, size_t Rank>
  struct {
    Elem *ptr;
    int64_t offset;
    int64_t sizes[Rank];
    int64_t strides[Rank];
  };
  ```

  The linearization procedure for address calculation for strided memrefs is the same as for linalg views: `base_offset + SUM_i index_i * stride_i`.

  The following CL will unify Linalg and Standard by removing !linalg.view in favor of strided memrefs.

  PiperOrigin-RevId: 272033399
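  To make the linearization formula concrete, a minimal C++ sketch for a rank-2 descriptor (the struct is as above but given a name; the helper itself is hypothetical, not part of the CL):

  ```
  #include <cstddef>
  #include <cstdint>

  template <typename Elem, size_t Rank>
  struct MemRefDescriptor {
    Elem *ptr;
    int64_t offset;
    int64_t sizes[Rank];
    int64_t strides[Rank];
  };

  // Address of element (i, j): base_offset + SUM_i index_i * stride_i.
  template <typename Elem>
  Elem *elementAddress(const MemRefDescriptor<Elem, 2> &d, int64_t i, int64_t j) {
    return d.ptr + d.offset + i * d.strides[0] + j * d.strides[1];
  }
  ```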
* Remove spurious debug spew in tests (Nicolas Vasilache, 2019-09-27; 3 files, -3/+0)

  PiperOrigin-RevId: 271624731
* Promote MemRefDescriptor to a pointer to struct when passing function boundaries in LLVMLowering (Nicolas Vasilache, 2019-09-27; 4 files, -88/+130)

  The strided MemRef RFC discusses a normalized descriptor and interaction with library calls (https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio). Lowering of nested LLVM structs as value types does not play nicely with externally compiled C/C++ functions due to ABI issues. Solving the ABI problem in general is very complex and most likely involves taking a dependence on clang, which we do not want at the moment. A simple workaround is to pass pointers to memref descriptors at function boundaries, which this CL implements.

  PiperOrigin-RevId: 271591708
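  In C++ terms, the workaround amounts to the following change of external function signatures (a sketch; the function names are hypothetical and MemRefDescriptor is the struct from the 2019-09-30 entry above):

  ```
  // Passing the descriptor by value exposes the struct-passing ABI, which
  // externally compiled C/C++ may lay out differently.
  extern "C" void kernel_by_value(MemRefDescriptor<float, 2> desc);    // fragile
  // Workaround implemented in this CL: pass a pointer at function boundaries.
  extern "C" void kernel_by_pointer(MemRefDescriptor<float, 2> *desc); // stable
  ```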
* Add AllReduceOp to GPU dialect with lowering to NVVM (Christian Sigg, 2019-09-26; 1 file, -0/+7)

  The reduction operation is currently fixed to "add", and the scope is fixed to "workgroup". The implementation is currently limited to sizes that are multiples of 32 (the warp size) and no larger than 1024.

  PiperOrigin-RevId: 271290265