Commit message (Collapse) | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Support lowering of imperfectly nested loops into GPU dialect. | Mahesh Ravishankar | 2019-11-01 | 1 | -6/+14 |
| | | | | | | | | | | | | | | | | | | | The current lowering of loops to GPU only supports lowering of loop nests where the loops mapped to workgroups and workitems are perfectly nested. Here a new lowering is added to handle lowering of imperfectly nested loop body with the following properties 1) The loops partitioned to workgroups are perfectly nested. 2) The loop body of the inner most loop partitioned to workgroups can contain one or more loop nests that are to be partitioned across workitems. Each individual loops nests partitioned to workitems should also be perfectly nested. 3) The number of workgroups and workitems are not deduced from the loop bounds but are passed in by the caller of the lowering as values. 4) For statements within the perfectly nested loop nest partitioned across workgroups that are not loops, it is valid to have all threads execute that statement. This is NOT verified. PiperOrigin-RevId: 277958868 | ||||
* | Utility function to map a loop on a parametric grid of virtual processors | Nicolas Vasilache | 2019-07-19 | 1 | -0/+39 |
This CL introduces a simple loop utility function which rewrites the bounds and step of a loop so as to become mappable on a regular grid of processors whose identifiers are given by SSA values. A corresponding unit test is added. For example, using CUDA terminology, and assuming a 2-d grid with processorIds = [blockIdx.x, threadIdx.x] and numProcessors = [gridDim.x, blockDim.x], the loop: ``` loop.for %i = %lb to %ub step %step { ... } ``` is rewritten into a version resembling the following pseudo-IR: ``` loop.for %i = %lb + threadIdx.x + blockIdx.x * blockDim.x to %ub step %gridDim.x * blockDim.x { ... } ``` PiperOrigin-RevId: 258945942 |