author | Gor Nishanov <GorNishanov@gmail.com> | 2016-08-10 16:40:39 +0000 |
---|---|---|
committer | Gor Nishanov <GorNishanov@gmail.com> | 2016-08-10 16:40:39 +0000 |
commit | b2a9c0252179d9334967266182bad77c1d3e6579 (patch) | |
tree | bdb018dd1e57587914081302fa50b38020038a64 /llvm/docs/Coroutines.rst | |
parent | 17586582e7cd22cbfe343dc7321c746046912423 (diff) | |
[Coroutines] Part 6: Elide dynamic allocation of a coroutine frame when possible
Summary:
A particular coroutine usage pattern, where a coroutine is created, manipulated and
destroyed by the same calling function, is common for coroutines implementing
the RAII idiom and is suitable for the allocation elision optimization, which avoids
dynamic allocation by storing the coroutine frame as a static `alloca` in its
caller.
coro.free and coro.alloc intrinsics are used to indicate which code needs to be suppressed
when dynamic allocation elision happens:
```
entry:
  %elide = call i8* @llvm.coro.alloc()
  %need.dyn.alloc = icmp ne i8* %elide, null
  br i1 %need.dyn.alloc, label %coro.begin, label %dyn.alloc
dyn.alloc:
  %alloc = call i8* @CustomAlloc(i32 4)
  br label %coro.begin
coro.begin:
  %phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
  %hdl = call i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null,
                                   i8* bitcast ([2 x void (%f.frame*)*]* @f.resumers to i8*))
```
and
```
  %mem = call i8* @llvm.coro.free(i8* %hdl)
  %need.dyn.free = icmp ne i8* %mem, null
  br i1 %need.dyn.free, label %dyn.free, label %if.end
dyn.free:
  call void @CustomFree(i8* %mem)
  br label %if.end
if.end:
  ...
```
If heap allocation elision is performed, we replace coro.alloc with a static alloca on the caller frame and coro.free with a null constant.
Also, if any tail calls reference the coroutine frame, we need to remove their tail call attribute, since the coroutine frame now lives on the stack.
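As an illustration of the replacement described above, here is a minimal sketch of what a caller might look like once elision has been applied. It is not output from the pass: `@caller`, the layout of `%f.frame`, and the placement of the `@f.resume` call are assumptions made for the example, and it uses the typed-pointer IR syntax of the snippets above.

```llvm
; Hypothetical frame layout and callees, assumed for illustration only.
%f.frame = type { void (%f.frame*)*, void (%f.frame*)*, i32 }

declare void @f.resume(%f.frame*)
declare void @CustomFree(i8*)

define void @caller() {
entry:
  ; coro.alloc has been replaced by a static alloca on the caller's frame,
  ; so the dyn.alloc path that called @CustomAlloc is dead and not shown.
  %frame = alloca %f.frame

  ; This call takes the address of the caller's stack, so it must be a plain
  ; call and must not carry the tail marker.
  call void @f.resume(%f.frame* %frame)

  ; coro.free has been replaced by null: the comparison is always false and
  ; the dyn.free block becomes unreachable.
  %need.dyn.free = icmp ne i8* null, null
  br i1 %need.dyn.free, label %dyn.free, label %if.end

dyn.free:
  call void @CustomFree(i8* null)
  br label %if.end

if.end:
  ret void
}
```

Later cleanup passes would be expected to fold the constant comparison and delete the dead `dyn.free` block, leaving no allocation or deallocation calls in the caller.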
Documentation and an overview are here: http://llvm.org/docs/Coroutines.html.
Upstreaming sequence (rough plan)
1. Add documentation. (https://reviews.llvm.org/D22603)
2. Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
3. Add empty coroutine passes. (https://reviews.llvm.org/D22847)
4. Add coroutine devirtualization + tests.
   ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998)
   c) Do devirtualization (https://reviews.llvm.org/D23229)
5. Add CGSCC restart trigger + tests. (https://reviews.llvm.org/D23234)
6. Add coroutine heap elision + tests. <= we are here
7. Add the rest of the logic (split into more patches)
Reviewers: mehdi_amini, majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23245
llvm-svn: 278242
Diffstat (limited to 'llvm/docs/Coroutines.rst')
-rw-r--r-- | llvm/docs/Coroutines.rst | 73 |
1 files changed, 42 insertions, 31 deletions
diff --git a/llvm/docs/Coroutines.rst b/llvm/docs/Coroutines.rst
index a54c18a3537..7a12641babf 100644
--- a/llvm/docs/Coroutines.rst
+++ b/llvm/docs/Coroutines.rst
@@ -95,7 +95,8 @@ The LLVM IR for this coroutine looks like this:
   entry:
     %size = call i32 @llvm.coro.size.i32()
     %alloc = call i8* @malloc(i32 %size)
-    %hdl = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* null, i8* null)
+    %beg = call token @llvm.coro.begin(i8* %alloc, i8* null, i32 0, i8* null, i8* null)
+    %hdl = call noalias i8* @llvm.coro.frame(token %beg)
     br label %loop
   loop:
     %n.val = phi i32 [ %n, %entry ], [ %inc, %loop ]
@@ -115,9 +116,10 @@ The LLVM IR for this coroutine looks like this:

 The `entry` block establishes the coroutine frame. The `coro.size`_ intrinsic is
 lowered to a constant representing the size required for the coroutine frame.
-The `coro.begin`_ intrinsic initializes the coroutine frame and returns the
-coroutine handle. The first parameter of `coro.begin` is given a block of memory
-to be used if the coroutine frame needs to be allocated dynamically.
+The `coro.begin`_ intrinsic initializes the coroutine frame and returns the a
+token that is used to obtain the coroutine handle via `coro.frame` intrinsic.
+The first parameter of `coro.begin` is given a block of memory to be used if the
+coroutine frame needs to be allocated dynamically.

 The `cleanup` block destroys the coroutine frame. The `coro.free`_ intrinsic,
 given the coroutine handle, returns a pointer of the memory block to be freed or
@@ -160,12 +162,13 @@ After resume and destroy parts are outlined, function `f` will contain only the
 code responsible for creation and initialization of the coroutine frame and
 execution of the coroutine until a suspend point is reached:

-.. code-block:: llvm
+.. code-block:: none

   define i8* @f(i32 %n) {
   entry:
     %alloc = call noalias i8* @malloc(i32 24)
-    %0 = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* null, i8* null)
+    %beg = call token @llvm.coro.begin(i8* %alloc, i8* null, i32 0, i8* null, i8* null)
+    %0 = call i8* @llvm.coro.frame(token %beg)
     %frame = bitcast i8* %0 to %f.frame*
     %1 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 0
     store void (%f.frame*)* @f.resume, void (%f.frame*)** %1
@@ -219,7 +222,7 @@ In the entry block, we will call `coro.alloc`_ intrinsic that will return `null`
 when dynamic allocation is required, and an address of an alloca on the caller's
 frame where coroutine frame can be stored if dynamic allocation is elided.

-.. code-block:: llvm
+.. code-block:: none

   entry:
     %elide = call i8* @llvm.coro.alloc()
@@ -231,7 +234,7 @@ frame where coroutine frame can be stored if dynamic allocation is elided.
     br label %coro.begin
   coro.begin:
     %phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
-    %hdl = call noalias i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null, i8* null)
+    %beg = call token @llvm.coro.begin(i8* %phi, i8* null, i32 0, i8* null, i8* null)

 In the cleanup block, we will make freeing the coroutine frame conditional on
 `coro.free`_ intrinsic. If allocation is elided, `coro.free`_ returns `null`
@@ -421,7 +424,8 @@ store the current value produced by a coroutine.
     br label %coro.begin
   coro.begin:
     %phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
-    %hdl = call noalias i8* @llvm.coro.begin(i8* %phi, i32 0, i8* %pv, i8* null)
+    %beg = call token @llvm.coro.begin(i8* %phi, i8* %elide, i32 0, i8* %pv, i8* null)
+    %hdl = call i8* @llvm.coro.frame(token %beg)
     br label %loop
   loop:
     %n.val = phi i32 [ %n, %coro.begin ], [ %inc, %loop ]
@@ -687,15 +691,16 @@ a coroutine user are responsible to makes sure there is no data races.

 Example:
 """"""""
-.. code-block:: llvm
+.. code-block:: text

   define i8* @f(i32 %n) {
   entry:
     %promise = alloca i32
     %pv = bitcast i32* %promise to i8*
     ...
-    ; the third argument to coro.begin points to the coroutine promise.
-    %hdl = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* %pv, i8* null)
+    ; the fourth argument to coro.begin points to the coroutine promise.
+    %beg = call token @llvm.coro.begin(i8* %alloc, i8* null, i32 0, i8* %pv, i8* null)
+    %hdl = call noalias i8* @llvm.coro.frame(token %beg)
     ...
     store i32 42, i32* %promise ; store something into the promise
     ...
@@ -752,12 +757,14 @@ the coroutine frame.
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ::

-  declare i8* @llvm.coro.begin(i8* <mem>, i32 <align>, i8* <promise>, i8* <fnaddr>)
+  declare i8* @llvm.coro.begin(i8* <mem>, i8* <elide>, i32 <align>, i8* <promise>, i8* <fnaddr>)

 Overview:
 """""""""

-The '``llvm.coro.begin``' intrinsic returns an address of the coroutine frame.
+The '``llvm.coro.begin``' intrinsic captures coroutine initialization
+information and returns a token that can be used by `coro.frame` intrinsic to
+return an address of the coroutine frame.

 Arguments:
 """"""""""
@@ -765,15 +772,17 @@ Arguments:
 The first argument is a pointer to a block of memory where coroutine frame will
 be stored.

-The second argument provides information on the alignment of the memory returned
+The second argument is either null or an SSA value of `coro.alloc` intrinsic.
+
+The third argument provides information on the alignment of the memory returned
 by the allocation function and given to `coro.begin` by the first argument.
 If this argument is 0, the memory is assumed to be aligned to 2 * sizeof(i8*).
 This argument only accepts constants.

-The third argument, if not `null`, designates a particular alloca instruction to
+The fourth argument, if not `null`, designates a particular alloca instruction to
 be a `coroutine promise`_.

-The fourth argument is `null` before coroutine is split, and later is replaced
+The fifth argument is `null` before coroutine is split, and later is replaced
 to point to a private global constant array containing function pointers to
 outlined resume and destroy parts of the coroutine.

@@ -781,10 +790,10 @@ Semantics:
 """"""""""

 Depending on the alignment requirements of the objects in the coroutine frame
-and/or on the codegen compactness reasons the pointer returned from `coro.begin`
-may be at offset to the `%mem` argument. (This could be beneficial if
-instructions that express relative access to data can be more compactly encoded
-with small positive and negative offsets).
+and/or on the codegen compactness reasons the pointer returned from `coro.frame`
+associated with a particular `coro.begin` may be at offset to the `%mem`
+argument. (This could be beneficial if instructions that express relative access
+to data can be more compactly encoded with small positive and negative offsets).

 A frontend should emit exactly one `coro.begin` intrinsic per coroutine.

@@ -807,7 +816,7 @@ Arguments:
 """"""""""

 A pointer to the coroutine frame. This should be the same pointer that was
-returned by prior `coro.begin` call.
+returned by prior `coro.frame` call.

 Example (custom deallocation function):
 """""""""""""""""""""""""""""""""""""""
@@ -862,10 +871,13 @@ alloca storing the coroutine frame. Otherwise, it is lowered to constant `null`.

 A frontend should emit at most one `coro.alloc` intrinsic per coroutine.

+If `coro.alloc` is present, the second parameter to `coro.begin` should refer
+to it.
+
 Example:
 """"""""

-.. code-block:: llvm
+.. code-block:: text

   entry:
     %elide = call i8* @llvm.coro.alloc()
@@ -879,7 +891,8 @@ Example:

   coro.begin:
     %phi = phi i8* [ %elide, %entry ], [ %alloc, %coro.alloc ]
-    %frame = call i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null, i8* null)
+    %beg = call token @llvm.coro.begin(i8* %phi, i8* %elide, i32 0, i8* null, i8* null)
+    %frame = call i8* @llvm.coro.frame(token %beg)

 .. _coro.frame:

@@ -898,14 +911,12 @@ the enclosing coroutine.
 Arguments:
 """"""""""

-None
+A token that refers to `coro.begin` instruction.

 Semantics:
 """"""""""

-This intrinsic is lowered to refer to the `coro.begin`_ instruction. This is
-a frontend convenience intrinsic that makes it easier to refer to the
-coroutine frame.
+This intrinsic is lowered to refer to address of the coroutine frame.

 .. _coro.end:

@@ -1164,7 +1175,7 @@ CoroElide
 ---------
 The pass CoroElide examines if the inlined coroutine is eligible for heap
 allocation elision optimization. If so, it replaces `coro.alloc` and
-`coro.begin` intrinsic with an address of a coroutine frame placed on its caller
+`coro.frame` intrinsic with an address of a coroutine frame placed on its caller
 and replaces `coro.free` intrinsics with `null` to remove the deallocation code.
 This pass also replaces `coro.resume` and `coro.destroy` intrinsics with direct
 calls to resume and destroy functions for a particular coroutine where possible.
@@ -1178,11 +1189,11 @@ Upstreaming sequence (rough plan)
 =================================
 #. Add documentation.
 #. Add coroutine intrinsics.
-#. Add empty coroutine passes. <== we are here
+#. Add empty coroutine passes.
 #. Add coroutine devirtualization + tests.
 #. Add CGSCC restart trigger + tests.
 #. Add coroutine heap elision + tests.
-#. Add custom allocation heap elision + tests.
+#. Add custom allocation heap elision + tests. <== we are here
 #. Add coroutine splitting logic + tests.
 #. Add simple coroutine frame builder + tests.
 #. Add the rest of the logic + tests. (Maybe split further as needed).