author Gor Nishanov <GorNishanov@gmail.com> 2016-08-10 16:40:39 +0000
committer Gor Nishanov <GorNishanov@gmail.com> 2016-08-10 16:40:39 +0000
commit b2a9c0252179d9334967266182bad77c1d3e6579
tree bdb018dd1e57587914081302fa50b38020038a64 /llvm/docs/Coroutines.rst
parent 17586582e7cd22cbfe343dc7321c746046912423
[Coroutines] Part 6: Elide dynamic allocation of a coroutine frame when possible
Summary:
A particular coroutine usage pattern, where a coroutine is created, manipulated and destroyed by the same calling function, is common for coroutines implementing the RAII idiom and is suitable for the allocation elision optimization, which avoids dynamic allocation by storing the coroutine frame as a static `alloca` in its caller.

The coro.free and coro.alloc intrinsics are used to indicate which code needs to be suppressed when dynamic allocation elision happens:

```
entry:
  %elide = call i8* @llvm.coro.alloc()
  %need.dyn.alloc = icmp ne i8* %elide, null
  br i1 %need.dyn.alloc, label %coro.begin, label %dyn.alloc
dyn.alloc:
  %alloc = call i8* @CustomAlloc(i32 4)
  br label %coro.begin
coro.begin:
  %phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
  %hdl = call i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null, i8* bitcast ([2 x void (%f.frame*)*]* @f.resumers to i8*))
```

and

```
  %mem = call i8* @llvm.coro.free(i8* %hdl)
  %need.dyn.free = icmp ne i8* %mem, null
  br i1 %need.dyn.free, label %dyn.free, label %if.end
dyn.free:
  call void @CustomFree(i8* %mem)
  br label %if.end
if.end:
  ...
```

If heap allocation elision is performed, we replace coro.alloc with a static alloca on the caller frame and coro.free with a null constant. Also, if there are any tail calls referencing the coroutine frame, we need to remove the tail call attribute, since the coroutine frame now lives on the stack.

Documentation and overview are here: http://llvm.org/docs/Coroutines.html.

Upstreaming sequence (rough plan)
1. Add documentation. (https://reviews.llvm.org/D22603)
2. Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
3. Add empty coroutine passes. (https://reviews.llvm.org/D22847)
4. Add coroutine devirtualization + tests.
   ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998)
   c) Do devirtualization (https://reviews.llvm.org/D23229)
5. Add CGSCC restart trigger + tests. (https://reviews.llvm.org/D23234)
6. Add coroutine heap elision + tests. <= we are here
7. Add the rest of the logic (split into more patches)

Reviewers: mehdi_amini, majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D23245

llvm-svn: 278242
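The control flow of the two IR fragments in the summary can be modelled in a few lines of Python. This is only a sketch under stated assumptions: `Heap`, `custom_alloc`, `custom_free` and `run_coroutine` are hypothetical names standing in for @CustomAlloc, @CustomFree and the IR, a "caller alloca" is represented by a tagged tuple, and the model says nothing about how CoroElide itself is implemented.

```python
# Toy model of coroutine frame allocation elision (hypothetical names).
# coro.alloc / coro.free are modelled inline; custom_alloc / custom_free
# play the roles of @CustomAlloc / @CustomFree from the IR above.


class Heap:
    """Records heap blocks so we can check that nothing leaks."""

    def __init__(self):
        self.live = set()   # blocks allocated but not yet freed
        self.next_id = 0    # number of heap allocations performed so far

    def custom_alloc(self, size):
        self.next_id += 1
        block = ("heap", self.next_id, size)
        self.live.add(block)
        return block

    def custom_free(self, mem):
        self.live.remove(mem)


def run_coroutine(heap, elided, frame_size=4):
    # llvm.coro.alloc: null (None) when dynamic allocation is required,
    # otherwise the address of an alloca in the caller's frame.
    elide = ("caller-alloca", frame_size) if elided else None

    # entry / dyn.alloc: only call the allocator when there is no alloca.
    if elide is not None:
        mem = elide
    else:
        mem = heap.custom_alloc(frame_size)
    frame = mem  # coro.begin: %phi selects whichever memory was chosen

    # ... the coroutine body would use `frame` here ...

    # llvm.coro.free: null when elided, otherwise the memory to release.
    to_free = None if elided else frame
    if to_free is not None:  # dyn.free block
        heap.custom_free(to_free)
    return frame


heap = Heap()
run_coroutine(heap, elided=False)  # one custom_alloc / custom_free pair
run_coroutine(heap, elided=True)   # frame lives in the "caller alloca"
```

The point the model makes is that the allocation and deallocation code itself is left in place: only the values produced by coro.alloc and coro.free change, after which the dyn.alloc and dyn.free branches become dead and can be folded away.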
Diffstat (limited to 'llvm/docs/Coroutines.rst')
-rw-r--r-- llvm/docs/Coroutines.rst 73
1 files changed, 42 insertions, 31 deletions
diff --git a/llvm/docs/Coroutines.rst b/llvm/docs/Coroutines.rst
index a54c18a3537..7a12641babf 100644
--- a/llvm/docs/Coroutines.rst
+++ b/llvm/docs/Coroutines.rst
@@ -95,7 +95,8 @@ The LLVM IR for this coroutine looks like this:
entry:
%size = call i32 @llvm.coro.size.i32()
%alloc = call i8* @malloc(i32 %size)
- %hdl = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* null, i8* null)
+ %beg = call token @llvm.coro.begin(i8* %alloc, i8* null, i32 0, i8* null, i8* null)
+ %hdl = call noalias i8* @llvm.coro.frame(token %beg)
br label %loop
loop:
%n.val = phi i32 [ %n, %entry ], [ %inc, %loop ]
@@ -115,9 +116,10 @@ The LLVM IR for this coroutine looks like this:
The `entry` block establishes the coroutine frame. The `coro.size`_ intrinsic is
lowered to a constant representing the size required for the coroutine frame.
-The `coro.begin`_ intrinsic initializes the coroutine frame and returns the
-coroutine handle. The first parameter of `coro.begin` is given a block of memory
-to be used if the coroutine frame needs to be allocated dynamically.
+The `coro.begin`_ intrinsic initializes the coroutine frame and returns a
+token that is used to obtain the coroutine handle via the `coro.frame` intrinsic.
+The first parameter of `coro.begin` is given a block of memory to be used if the
+coroutine frame needs to be allocated dynamically.
The `cleanup` block destroys the coroutine frame. The `coro.free`_ intrinsic,
given the coroutine handle, returns a pointer of the memory block to be freed or
@@ -160,12 +162,13 @@ After resume and destroy parts are outlined, function `f` will contain only the
code responsible for creation and initialization of the coroutine frame and
execution of the coroutine until a suspend point is reached:
-.. code-block:: llvm
+.. code-block:: none
define i8* @f(i32 %n) {
entry:
%alloc = call noalias i8* @malloc(i32 24)
- %0 = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* null, i8* null)
+ %beg = call token @llvm.coro.begin(i8* %alloc, i8* null, i32 0, i8* null, i8* null)
+ %0 = call i8* @llvm.coro.frame(token %beg)
%frame = bitcast i8* %0 to %f.frame*
%1 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 0
store void (%f.frame*)* @f.resume, void (%f.frame*)** %1
@@ -219,7 +222,7 @@ In the entry block, we will call `coro.alloc`_ intrinsic that will return `null`
when dynamic allocation is required, and an address of an alloca on the caller's
frame where coroutine frame can be stored if dynamic allocation is elided.
-.. code-block:: llvm
+.. code-block:: none
entry:
%elide = call i8* @llvm.coro.alloc()
@@ -231,7 +234,7 @@ frame where coroutine frame can be stored if dynamic allocation is elided.
br label %coro.begin
coro.begin:
%phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
- %hdl = call noalias i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null, i8* null)
+ %beg = call token @llvm.coro.begin(i8* %phi, i8* null, i32 0, i8* null, i8* null)
In the cleanup block, we will make freeing the coroutine frame conditional on
`coro.free`_ intrinsic. If allocation is elided, `coro.free`_ returns `null`
@@ -421,7 +424,8 @@ store the current value produced by a coroutine.
br label %coro.begin
coro.begin:
%phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
- %hdl = call noalias i8* @llvm.coro.begin(i8* %phi, i32 0, i8* %pv, i8* null)
+ %beg = call token @llvm.coro.begin(i8* %phi, i8* %elide, i32 0, i8* %pv, i8* null)
+ %hdl = call i8* @llvm.coro.frame(token %beg)
br label %loop
loop:
%n.val = phi i32 [ %n, %coro.begin ], [ %inc, %loop ]
@@ -687,15 +691,16 @@ a coroutine user are responsible to makes sure there is no data races.
Example:
""""""""
-.. code-block:: llvm
+.. code-block:: text
define i8* @f(i32 %n) {
entry:
%promise = alloca i32
%pv = bitcast i32* %promise to i8*
...
- ; the third argument to coro.begin points to the coroutine promise.
- %hdl = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* %pv, i8* null)
+ ; the fourth argument to coro.begin points to the coroutine promise.
+ %beg = call token @llvm.coro.begin(i8* %alloc, i8* null, i32 0, i8* %pv, i8* null)
+ %hdl = call noalias i8* @llvm.coro.frame(token %beg)
...
store i32 42, i32* %promise ; store something into the promise
...
@@ -752,12 +757,14 @@ the coroutine frame.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
::
- declare i8* @llvm.coro.begin(i8* <mem>, i32 <align>, i8* <promise>, i8* <fnaddr>)
+ declare token @llvm.coro.begin(i8* <mem>, i8* <elide>, i32 <align>, i8* <promise>, i8* <fnaddr>)
Overview:
"""""""""
-The '``llvm.coro.begin``' intrinsic returns an address of the coroutine frame.
+The '``llvm.coro.begin``' intrinsic captures coroutine initialization
+information and returns a token that can be used by the `coro.frame` intrinsic to
+obtain the address of the coroutine frame.
Arguments:
""""""""""
@@ -765,15 +772,17 @@ Arguments:
The first argument is a pointer to a block of memory where coroutine frame
will be stored.
-The second argument provides information on the alignment of the memory returned
+The second argument is either `null` or the SSA value returned by the `coro.alloc` intrinsic.
+
+The third argument provides information on the alignment of the memory returned
by the allocation function and given to `coro.begin` by the first argument. If
this argument is 0, the memory is assumed to be aligned to 2 * sizeof(i8*).
This argument only accepts constants.
-The third argument, if not `null`, designates a particular alloca instruction to
+The fourth argument, if not `null`, designates a particular alloca instruction to
be a `coroutine promise`_.
-The fourth argument is `null` before coroutine is split, and later is replaced
+The fifth argument is `null` before coroutine is split, and later is replaced
to point to a private global constant array containing function pointers to
outlined resume and destroy parts of the coroutine.
@@ -781,10 +790,10 @@ Semantics:
""""""""""
Depending on the alignment requirements of the objects in the coroutine frame
-and/or on the codegen compactness reasons the pointer returned from `coro.begin`
-may be at offset to the `%mem` argument. (This could be beneficial if
-instructions that express relative access to data can be more compactly encoded
-with small positive and negative offsets).
+and/or for codegen compactness reasons, the pointer returned from `coro.frame`
+associated with a particular `coro.begin` may be at an offset from the `%mem`
+argument. (This could be beneficial if instructions that express relative access
+to data can be more compactly encoded with small positive and negative offsets).
A frontend should emit exactly one `coro.begin` intrinsic per coroutine.
@@ -807,7 +816,7 @@ Arguments:
""""""""""
A pointer to the coroutine frame. This should be the same pointer that was
-returned by prior `coro.begin` call.
+returned by a prior `coro.frame` call.
Example (custom deallocation function):
"""""""""""""""""""""""""""""""""""""""
@@ -862,10 +871,13 @@ alloca storing the coroutine frame. Otherwise, it is lowered to constant `null`.
A frontend should emit at most one `coro.alloc` intrinsic per coroutine.
+If `coro.alloc` is present, the second parameter to `coro.begin` should refer
+to it.
+
Example:
""""""""
-.. code-block:: llvm
+.. code-block:: text
entry:
%elide = call i8* @llvm.coro.alloc()
@@ -879,7 +891,8 @@ Example:
coro.begin:
%phi = phi i8* [ %elide, %entry ], [ %alloc, %coro.alloc ]
- %frame = call i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null, i8* null)
+ %beg = call token @llvm.coro.begin(i8* %phi, i8* %elide, i32 0, i8* null, i8* null)
+ %frame = call i8* @llvm.coro.frame(token %beg)
.. _coro.frame:
@@ -898,14 +911,12 @@ the enclosing coroutine.
Arguments:
""""""""""
-None
+A token that refers to a `coro.begin` instruction.
Semantics:
""""""""""
-This intrinsic is lowered to refer to the `coro.begin`_ instruction. This is
-a frontend convenience intrinsic that makes it easier to refer to the
-coroutine frame.
+This intrinsic is lowered to refer to the address of the coroutine frame.
.. _coro.end:
@@ -1164,7 +1175,7 @@ CoroElide
---------
The pass CoroElide examines if the inlined coroutine is eligible for heap
allocation elision optimization. If so, it replaces `coro.alloc` and
-`coro.begin` intrinsic with an address of a coroutine frame placed on its caller
+`coro.frame` intrinsics with the address of a coroutine frame placed on its caller
and replaces `coro.free` intrinsics with `null` to remove the deallocation code.
This pass also replaces `coro.resume` and `coro.destroy` intrinsics with direct
calls to resume and destroy functions for a particular coroutine where possible.
@@ -1178,11 +1189,11 @@ Upstreaming sequence (rough plan)
=================================
#. Add documentation.
#. Add coroutine intrinsics.
-#. Add empty coroutine passes. <== we are here
+#. Add empty coroutine passes.
#. Add coroutine devirtualization + tests.
#. Add CGSCC restart trigger + tests.
#. Add coroutine heap elision + tests.
-#. Add custom allocation heap elision + tests.
+#. Add custom allocation heap elision + tests. <== we are here
#. Add coroutine splitting logic + tests.
#. Add simple coroutine frame builder + tests.
#. Add the rest of the logic + tests. (Maybe split further as needed).