Diffstat (limited to 'Documentation/RCU')
-rw-r--r--  Documentation/RCU/00-INDEX                                                     |   4
-rw-r--r--  Documentation/RCU/Design/Data-Structures/Data-Structures.html                  | 233
-rw-r--r--  Documentation/RCU/Design/Data-Structures/nxtlist.svg                           |  34
-rw-r--r--  Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html  |  47
-rw-r--r--  Documentation/RCU/Design/Requirements/Requirements.html                        | 229
-rw-r--r--  Documentation/RCU/checklist.txt                                                |   8
-rw-r--r--  Documentation/RCU/rcu_dereference.txt                                          |   9
-rw-r--r--  Documentation/RCU/rculist_nulls.txt                                            |   6
-rw-r--r--  Documentation/RCU/stallwarn.txt                                                | 190
-rw-r--r--  Documentation/RCU/trace.txt                                                    | 535
-rw-r--r--  Documentation/RCU/whatisRCU.txt                                                |  32
11 files changed, 519 insertions(+), 808 deletions(-)
diff --git a/Documentation/RCU/00-INDEX b/Documentation/RCU/00-INDEX index f773a264ae02..f46980c060aa 100644 --- a/Documentation/RCU/00-INDEX +++ b/Documentation/RCU/00-INDEX @@ -17,7 +17,7 @@ rcu_dereference.txt rcubarrier.txt - RCU and Unloadable Modules rculist_nulls.txt - - RCU list primitives for use with SLAB_DESTROY_BY_RCU + - RCU list primitives for use with SLAB_TYPESAFE_BY_RCU rcuref.txt - Reference-count design for elements of lists/arrays protected by RCU rcu.txt @@ -28,8 +28,6 @@ stallwarn.txt - RCU CPU stall warnings (module parameter rcu_cpu_stall_suppress) torture.txt - RCU Torture Test Operation (CONFIG_RCU_TORTURE_TEST) -trace.txt - - CONFIG_RCU_TRACE debugfs files and formats UP.txt - RCU on Uniprocessor Systems whatisRCU.txt diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.html b/Documentation/RCU/Design/Data-Structures/Data-Structures.html index d583c653a703..38d6d800761f 100644 --- a/Documentation/RCU/Design/Data-Structures/Data-Structures.html +++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.html @@ -19,6 +19,8 @@ to each other. The <tt>rcu_state</tt> Structure</a> <li> <a href="#The rcu_node Structure"> The <tt>rcu_node</tt> Structure</a> +<li> <a href="#The rcu_segcblist Structure"> + The <tt>rcu_segcblist</tt> Structure</a> <li> <a href="#The rcu_data Structure"> The <tt>rcu_data</tt> Structure</a> <li> <a href="#The rcu_dynticks Structure"> @@ -841,6 +843,134 @@ for lockdep lock-class names. Finally, lines 64-66 produce an error if the maximum number of CPUs is too large for the specified fanout. +<h3><a name="The rcu_segcblist Structure"> +The <tt>rcu_segcblist</tt> Structure</a></h3> + +The <tt>rcu_segcblist</tt> structure maintains a segmented list of +callbacks as follows: + +<pre> + 1 #define RCU_DONE_TAIL 0 + 2 #define RCU_WAIT_TAIL 1 + 3 #define RCU_NEXT_READY_TAIL 2 + 4 #define RCU_NEXT_TAIL 3 + 5 #define RCU_CBLIST_NSEGS 4 + 6 + 7 struct rcu_segcblist { + 8 struct rcu_head *head; + 9 struct rcu_head **tails[RCU_CBLIST_NSEGS]; +10 unsigned long gp_seq[RCU_CBLIST_NSEGS]; +11 long len; +12 long len_lazy; +13 }; +</pre> + +<p> +The segments are as follows: + +<ol> +<li> <tt>RCU_DONE_TAIL</tt>: Callbacks whose grace periods have elapsed. + These callbacks are ready to be invoked. +<li> <tt>RCU_WAIT_TAIL</tt>: Callbacks that are waiting for the + current grace period. + Note that different CPUs can have different ideas about which + grace period is current, hence the <tt>->gp_seq</tt> field. +<li> <tt>RCU_NEXT_READY_TAIL</tt>: Callbacks waiting for the next + grace period to start. +<li> <tt>RCU_NEXT_TAIL</tt>: Callbacks that have not yet been + associated with a grace period. +</ol> + +<p> +The <tt>->head</tt> pointer references the first callback or +is <tt>NULL</tt> if the list contains no callbacks (which is +<i>not</i> the same as being empty). +Each element of the <tt>->tails[]</tt> array references the +<tt>->next</tt> pointer of the last callback in the corresponding +segment of the list, or the list's <tt>->head</tt> pointer if +that segment and all previous segments are empty. +If the corresponding segment is empty but some previous segment is +not empty, then the array element is identical to its predecessor. +Older callbacks are closer to the head of the list, and new callbacks +are added at the tail. 
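To make the ->tails[] invariants concrete, here is a minimal sketch of callback
enqueue at the RCU_NEXT_TAIL segment. This is an illustration only, not the
kernel's rcu_segcblist code: it omits the READ_ONCE()/WRITE_ONCE() annotations,
memory ordering, and disabled-list (->tails[RCU_NEXT_TAIL] == NULL) checks that
the real implementation needs.

	/*
	 * Sketch: append rhp at RCU_NEXT_TAIL.  The store through
	 * *rsclp->tails[RCU_NEXT_TAIL] writes either the last callback's
	 * ->next pointer or ->head (if the list is empty), so the
	 * invariant that ->tails[RCU_NEXT_TAIL] references the last
	 * callback's ->next pointer is preserved.
	 */
	static void sketch_segcblist_enqueue(struct rcu_segcblist *rsclp,
					     struct rcu_head *rhp, bool lazy)
	{
		rsclp->len++;
		if (lazy)
			rsclp->len_lazy++;
		rhp->next = NULL;
		*rsclp->tails[RCU_NEXT_TAIL] = rhp;
		rsclp->tails[RCU_NEXT_TAIL] = &rhp->next;
	}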
+This relationship between the <tt>->head</tt> pointer, the +<tt>->tails[]</tt> array, and the callbacks is shown in this +diagram: + +</p><p><img src="nxtlist.svg" alt="nxtlist.svg" width="40%"> + +</p><p>In this figure, the <tt>->head</tt> pointer references the +first +RCU callback in the list. +The <tt>->tails[RCU_DONE_TAIL]</tt> array element references +the <tt>->head</tt> pointer itself, indicating that none +of the callbacks is ready to invoke. +The <tt>->tails[RCU_WAIT_TAIL]</tt> array element references callback +CB 2's <tt>->next</tt> pointer, which indicates that +CB 1 and CB 2 are both waiting on the current grace period, +give or take possible disagreements about exactly which grace period +is the current one. +The <tt>->tails[RCU_NEXT_READY_TAIL]</tt> array element +references the same RCU callback that <tt>->tails[RCU_WAIT_TAIL]</tt> +does, which indicates that there are no callbacks waiting on the next +RCU grace period. +The <tt>->tails[RCU_NEXT_TAIL]</tt> array element references +CB 4's <tt>->next</tt> pointer, indicating that all the +remaining RCU callbacks have not yet been assigned to an RCU grace +period. +Note that the <tt>->tails[RCU_NEXT_TAIL]</tt> array element +always references the last RCU callback's <tt>->next</tt> pointer +unless the callback list is empty, in which case it references +the <tt>->head</tt> pointer. + +<p> +There is one additional important special case for the +<tt>->tails[RCU_NEXT_TAIL]</tt> array element: It can be <tt>NULL</tt> +when this list is <i>disabled</i>. +Lists are disabled when the corresponding CPU is offline or when +the corresponding CPU's callbacks are offloaded to a kthread, +both of which are described elsewhere. + +</p><p>CPUs advance their callbacks from the +<tt>RCU_NEXT_TAIL</tt> to the <tt>RCU_NEXT_READY_TAIL</tt> to the +<tt>RCU_WAIT_TAIL</tt> to the <tt>RCU_DONE_TAIL</tt> list segments +as grace periods advance. + +</p><p>The <tt>->gp_seq[]</tt> array records grace-period +numbers corresponding to the list segments. +This is what allows different CPUs to have different ideas as to +which is the current grace period while still avoiding premature +invocation of their callbacks. +In particular, this allows CPUs that go idle for extended periods +to determine which of their callbacks are ready to be invoked after +reawakening. + +</p><p>The <tt>->len</tt> counter contains the number of +callbacks in <tt>->head</tt>, and the +<tt>->len_lazy</tt> contains the number of those callbacks that +are known to only free memory, and whose invocation can therefore +be safely deferred. + +<p><b>Important note</b>: It is the <tt>->len</tt> field that +determines whether or not there are callbacks associated with +this <tt>rcu_segcblist</tt> structure, <i>not</i> the <tt>->head</tt> +pointer. +The reason for this is that all the ready-to-invoke callbacks +(that is, those in the <tt>RCU_DONE_TAIL</tt> segment) are extracted +all at once at callback-invocation time. +If callback invocation must be postponed, for example, because a +high-priority process just woke up on this CPU, then the remaining +callbacks are placed back on the <tt>RCU_DONE_TAIL</tt> segment. +Either way, the <tt>->len</tt> and <tt>->len_lazy</tt> counts +are adjusted after the corresponding callbacks have been invoked, and so +again it is the <tt>->len</tt> count that accurately reflects whether +or not there are callbacks associated with this <tt>rcu_segcblist</tt> +structure. 
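A hedged sketch of the resulting emptiness check follows; the helper name is
invented, but it captures the rule just stated, namely that ->len rather than
->head says whether callbacks are associated with the structure.

	/*
	 * Sketch: the list has associated callbacks exactly when ->len is
	 * nonzero.  ->head can be NULL even so, because ready-to-invoke
	 * callbacks are extracted wholesale at invocation time.
	 */
	static bool sketch_segcblist_has_cbs(struct rcu_segcblist *rsclp)
	{
		return READ_ONCE(rsclp->len) != 0;
	}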
+Of course, off-CPU sampling of the <tt>->len</tt> count requires +the use of appropriate synchronization, for example, memory barriers. +This synchronization can be a bit subtle, particularly in the case +of <tt>rcu_barrier()</tt>. + <h3><a name="The rcu_data Structure"> The <tt>rcu_data</tt> Structure</a></h3> @@ -983,62 +1113,18 @@ choice. as follows: <pre> - 1 struct rcu_head *nxtlist; - 2 struct rcu_head **nxttail[RCU_NEXT_SIZE]; - 3 unsigned long nxtcompleted[RCU_NEXT_SIZE]; - 4 long qlen_lazy; - 5 long qlen; - 6 long qlen_last_fqs_check; + 1 struct rcu_segcblist cblist; + 2 long qlen_last_fqs_check; + 3 unsigned long n_cbs_invoked; + 4 unsigned long n_nocbs_invoked; + 5 unsigned long n_cbs_orphaned; + 6 unsigned long n_cbs_adopted; 7 unsigned long n_force_qs_snap; - 8 unsigned long n_cbs_invoked; - 9 unsigned long n_cbs_orphaned; -10 unsigned long n_cbs_adopted; -11 long blimit; + 8 long blimit; </pre> -<p>The <tt>->nxtlist</tt> pointer and the -<tt>->nxttail[]</tt> array form a four-segment list with -older callbacks near the head and newer ones near the tail. -Each segment contains callbacks with the corresponding relationship -to the current grace period. -The pointer out of the end of each of the four segments is referenced -by the element of the <tt>->nxttail[]</tt> array indexed by -<tt>RCU_DONE_TAIL</tt> (for callbacks handled by a prior grace period), -<tt>RCU_WAIT_TAIL</tt> (for callbacks waiting on the current grace period), -<tt>RCU_NEXT_READY_TAIL</tt> (for callbacks that will wait on the next -grace period), and -<tt>RCU_NEXT_TAIL</tt> (for callbacks that are not yet associated -with a specific grace period) -respectively, as shown in the following figure. - -</p><p><img src="nxtlist.svg" alt="nxtlist.svg" width="40%"> - -</p><p>In this figure, the <tt>->nxtlist</tt> pointer references the -first -RCU callback in the list. -The <tt>->nxttail[RCU_DONE_TAIL]</tt> array element references -the <tt>->nxtlist</tt> pointer itself, indicating that none -of the callbacks is ready to invoke. -The <tt>->nxttail[RCU_WAIT_TAIL]</tt> array element references callback -CB 2's <tt>->next</tt> pointer, which indicates that -CB 1 and CB 2 are both waiting on the current grace period. -The <tt>->nxttail[RCU_NEXT_READY_TAIL]</tt> array element -references the same RCU callback that <tt>->nxttail[RCU_WAIT_TAIL]</tt> -does, which indicates that there are no callbacks waiting on the next -RCU grace period. -The <tt>->nxttail[RCU_NEXT_TAIL]</tt> array element references -CB 4's <tt>->next</tt> pointer, indicating that all the -remaining RCU callbacks have not yet been assigned to an RCU grace -period. -Note that the <tt>->nxttail[RCU_NEXT_TAIL]</tt> array element -always references the last RCU callback's <tt>->next</tt> pointer -unless the callback list is empty, in which case it references -the <tt>->nxtlist</tt> pointer. - -</p><p>CPUs advance their callbacks from the -<tt>RCU_NEXT_TAIL</tt> to the <tt>RCU_NEXT_READY_TAIL</tt> to the -<tt>RCU_WAIT_TAIL</tt> to the <tt>RCU_DONE_TAIL</tt> list segments -as grace periods advance. +<p>The <tt>->cblist</tt> structure is the segmented callback list +described earlier. The CPU advances the callbacks in its <tt>rcu_data</tt> structure whenever it notices that another RCU grace period has completed. The CPU detects the completion of an RCU grace period by noticing @@ -1049,16 +1135,7 @@ Recall that each <tt>rcu_node</tt> structure's <tt>->completed</tt> field is updated at the end of each grace period. 
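For illustration, a sketch of that detection step, using the ->completed
fields named in the surrounding text (locking and counter-wrap handling,
which the in-kernel code needs, are omitted):

	/*
	 * Sketch: from this CPU's viewpoint, a grace period has completed
	 * when the rcu_node structure's ->completed value differs from
	 * the snapshot recorded in this CPU's rcu_data structure.
	 */
	static bool sketch_gp_completed(struct rcu_node *rnp,
					struct rcu_data *rdp)
	{
		return READ_ONCE(rnp->completed) != rdp->completed;
	}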
-</p><p>The <tt>->nxtcompleted[]</tt> array records grace-period -numbers corresponding to the list segments. -This allows CPUs that go idle for extended periods to determine -which of their callbacks are ready to be invoked after reawakening. - -</p><p>The <tt>->qlen</tt> counter contains the number of -callbacks in <tt>->nxtlist</tt>, and the -<tt>->qlen_lazy</tt> contains the number of those callbacks that -are known to only free memory, and whose invocation can therefore -be safely deferred. +<p> The <tt>->qlen_last_fqs_check</tt> and <tt>->n_force_qs_snap</tt> coordinate the forcing of quiescent states from <tt>call_rcu()</tt> and friends when callback @@ -1069,6 +1146,10 @@ lists grow excessively long. fields count the number of callbacks invoked, sent to other CPUs when this CPU goes offline, and received from other CPUs when those other CPUs go offline. +The <tt>->n_nocbs_invoked</tt> is used when the CPU's callbacks +are offloaded to a kthread. + +<p> Finally, the <tt>->blimit</tt> counter is the maximum number of RCU callbacks that may be invoked at a given time. @@ -1104,6 +1185,9 @@ Its fields are as follows: 1 int dynticks_nesting; 2 int dynticks_nmi_nesting; 3 atomic_t dynticks; + 4 bool rcu_need_heavy_qs; + 5 unsigned long rcu_qs_ctr; + 6 bool rcu_urgent_qs; </pre> <p>The <tt>->dynticks_nesting</tt> field counts the @@ -1117,11 +1201,32 @@ NMIs are counted by the <tt>->dynticks_nmi_nesting</tt> field, except that NMIs that interrupt non-dyntick-idle execution are not counted. -</p><p>Finally, the <tt>->dynticks</tt> field counts the corresponding +</p><p>The <tt>->dynticks</tt> field counts the corresponding CPU's transitions to and from dyntick-idle mode, so that this counter has an even value when the CPU is in dyntick-idle mode and an odd value otherwise. +</p><p>The <tt>->rcu_need_heavy_qs</tt> field is used +to record the fact that the RCU core code would really like to +see a quiescent state from the corresponding CPU, so much so that +it is willing to call for heavy-weight dyntick-counter operations. +This flag is checked by RCU's context-switch and <tt>cond_resched()</tt> +code, which provide a momentary idle sojourn in response. + +</p><p>The <tt>->rcu_qs_ctr</tt> field is used to record +quiescent states from <tt>cond_resched()</tt>. +Because <tt>cond_resched()</tt> can execute quite frequently, this +must be quite lightweight, as in a non-atomic increment of this +per-CPU field. + +</p><p>Finally, the <tt>->rcu_urgent_qs</tt> field is used to record +the fact that the RCU core code would really like to see a quiescent +state from the corresponding CPU, with the various other fields indicating +just how badly RCU wants this quiescent state. +This flag is checked by RCU's context-switch and <tt>cond_resched()</tt> +code, which, if nothing else, non-atomically increment <tt>->rcu_qs_ctr</tt> +in response. 
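A sketch tying these three fields together; the function and the heavy-weight
helper are invented for illustration, and the real context-switch code is
considerably more involved:

	/* Hypothetical stand-in for the heavy-weight dyntick-counter work. */
	static void sketch_do_heavy_dyntick_qs(struct rcu_dynticks *rdtp) { }

	/*
	 * Sketch: respond to RCU's requests for a quiescent state at
	 * context-switch time.  ->rcu_urgent_qs gates the checks, the
	 * non-atomic ->rcu_qs_ctr increment reports a lightweight
	 * quiescent state, and ->rcu_need_heavy_qs asks for the
	 * heavy-weight dyntick-counter treatment.
	 */
	static void sketch_note_context_switch(struct rcu_dynticks *rdtp)
	{
		if (!READ_ONCE(rdtp->rcu_urgent_qs))
			return;			/* Nothing urgent needed. */
		rdtp->rcu_qs_ctr++;		/* Lightweight quiescent state. */
		if (READ_ONCE(rdtp->rcu_need_heavy_qs))
			sketch_do_heavy_dyntick_qs(rdtp);
	}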
+ <table> <tr><th> </th></tr> <tr><th align="left">Quick Quiz:</th></tr> diff --git a/Documentation/RCU/Design/Data-Structures/nxtlist.svg b/Documentation/RCU/Design/Data-Structures/nxtlist.svg index abc4cc73a097..0223e79c38e0 100644 --- a/Documentation/RCU/Design/Data-Structures/nxtlist.svg +++ b/Documentation/RCU/Design/Data-Structures/nxtlist.svg @@ -19,7 +19,7 @@ id="svg2" version="1.1" inkscape:version="0.48.4 r9939" - sodipodi:docname="nxtlist.fig"> + sodipodi:docname="segcblist.svg"> <metadata id="metadata94"> <rdf:RDF> @@ -28,7 +28,7 @@ <dc:format>image/svg+xml</dc:format> <dc:type rdf:resource="http://purl.org/dc/dcmitype/StillImage" /> - <dc:title></dc:title> + <dc:title /> </cc:Work> </rdf:RDF> </metadata> @@ -241,61 +241,51 @@ xml:space="preserve" x="225" y="675" - fill="#000000" - font-family="Courier" font-style="normal" font-weight="bold" font-size="324" - text-anchor="start" - id="text64">nxtlist</text> + id="text64" + style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->head</text> <!-- Text --> <text xml:space="preserve" x="225" y="1800" - fill="#000000" - font-family="Courier" font-style="normal" font-weight="bold" font-size="324" - text-anchor="start" - id="text66">nxttail[RCU_DONE_TAIL]</text> + id="text66" + style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->tails[RCU_DONE_TAIL]</text> <!-- Text --> <text xml:space="preserve" x="225" y="2925" - fill="#000000" - font-family="Courier" font-style="normal" font-weight="bold" font-size="324" - text-anchor="start" - id="text68">nxttail[RCU_WAIT_TAIL]</text> + id="text68" + style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->tails[RCU_WAIT_TAIL]</text> <!-- Text --> <text xml:space="preserve" x="225" y="4050" - fill="#000000" - font-family="Courier" font-style="normal" font-weight="bold" font-size="324" - text-anchor="start" - id="text70">nxttail[RCU_NEXT_READY_TAIL]</text> + id="text70" + style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->tails[RCU_NEXT_READY_TAIL]</text> <!-- Text --> <text xml:space="preserve" x="225" y="5175" - fill="#000000" - font-family="Courier" font-style="normal" font-weight="bold" font-size="324" - text-anchor="start" - id="text72">nxttail[RCU_NEXT_TAIL]</text> + id="text72" + style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->tails[RCU_NEXT_TAIL]</text> <!-- Text --> <text xml:space="preserve" diff --git a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html index 7a3194c5559a..e5d0bbd0230b 100644 --- a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html +++ b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html @@ -284,6 +284,7 @@ Expedited Grace Period Refinements</a></h2> Funnel locking and wait/wakeup</a>. <li> <a href="#Use of Workqueues">Use of Workqueues</a>. <li> <a href="#Stall Warnings">Stall warnings</a>. +<li> <a href="#Mid-Boot Operation">Mid-boot operation</a>. </ol> <h3><a name="Idle-CPU Checks">Idle-CPU Checks</a></h3> @@ -524,7 +525,7 @@ their grace periods and carrying out their wakeups. In earlier implementations, the task requesting the expedited grace period also drove it to completion. 
This straightforward approach had the disadvantage of needing to
-account for signals sent to user tasks,
+account for POSIX signals sent to user tasks,
so more recent implementations use the Linux kernel's
<a href="https://www.kernel.org/doc/Documentation/workqueue.txt">workqueues</a>.
@@ -533,8 +534,8 @@ The requesting task still does counter snapshotting and funnel-lock processing, but the task reaching the top of the funnel lock does a <tt>schedule_work()</tt> (from <tt>_synchronize_rcu_expedited()</tt>) so that a workqueue kthread does the actual grace-period processing.
-Because workqueue kthreads do not accept signals, grace-period-wait
-processing need not allow for signals.
+Because workqueue kthreads do not accept POSIX signals, grace-period-wait
+processing need not allow for POSIX signals.
In addition, this approach allows wakeups for the previous expedited grace period to be overlapped with processing for the next expedited
@@ -586,6 +587,46 @@ blocking the current grace period are printed. Each stall warning results in another pass through the loop, but the second and subsequent passes use longer stall times.
+<h3><a name="Mid-Boot Operation">Mid-boot operation</a></h3>
+
+<p>
+The use of workqueues has the advantage that the expedited
+grace-period code need not worry about POSIX signals.
+Unfortunately, it has the corresponding disadvantage that workqueues
+cannot be used until they are initialized, which does not happen until
+some time after the scheduler spawns the first task.
+Given that there are parts of the kernel that really do want to
+execute grace periods during this mid-boot “dead zone”,
+expedited grace periods must do something else during this time.
+
+<p>
+What they do is to fall back to the old practice of requiring that the
+requesting task drive the expedited grace period, as was the case
+before the use of workqueues.
+However, the requesting task is only required to drive the grace period
+during the mid-boot dead zone.
+Before mid-boot, a synchronous grace period is a no-op.
+Some time after mid-boot, workqueues are used.
+
+<p>
+Non-expedited non-SRCU synchronous grace periods must also operate
+normally during mid-boot.
+This is handled by causing non-expedited grace periods to take the
+expedited code path during mid-boot.
+
+<p>
+The current code assumes that there are no POSIX signals during
+the mid-boot dead zone.
+However, if an overwhelming need for POSIX signals somehow arises,
+appropriate adjustments can be made to the expedited stall-warning code.
+One such adjustment would reinstate the pre-workqueue stall-warning
+checks, but only during the mid-boot dead zone.
+
+<p>
+With this refinement, synchronous grace periods can now be used from
+task context pretty much any time during the life of the kernel.
+
<h3><a name="Summary"> Summary</a></h3>
diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html
index 21593496aca6..95b30fa25d56 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.html
+++ b/Documentation/RCU/Design/Requirements/Requirements.html
@@ -559,9 +559,7 @@ The <tt>rcu_access_pointer()</tt> on line 6 is similar to For <tt>remove_gp_synchronous()</tt>, as long as all modifications to <tt>gp</tt> are carried out while holding <tt>gp_lock</tt>, the above optimizations are harmless.
- However, - with <tt>CONFIG_SPARSE_RCU_POINTER=y</tt>, - <tt>sparse</tt> will complain if you + However, <tt>sparse</tt> will complain if you define <tt>gp</tt> with <tt>__rcu</tt> and then access it without using either <tt>rcu_access_pointer()</tt> or <tt>rcu_dereference()</tt>. @@ -659,8 +657,9 @@ systems with more than one CPU: In other words, a given instance of <tt>synchronize_rcu()</tt> can avoid waiting on a given RCU read-side critical section only if it can prove that <tt>synchronize_rcu()</tt> started first. + </font> - <p> + <p><font color="ffffff"> A related question is “When <tt>rcu_read_lock()</tt> doesn't generate any code, why does it matter how it relates to a grace period?” @@ -675,8 +674,9 @@ systems with more than one CPU: within the critical section, in which case none of the accesses within the critical section may observe the effects of any access following the grace period. + </font> - <p> + <p><font color="ffffff"> As of late 2016, mathematical models of RCU take this viewpoint, for example, see slides 62 and 63 of the @@ -1616,8 +1616,8 @@ CPUs should at least make reasonable forward progress. In return for its shorter latencies, <tt>synchronize_rcu_expedited()</tt> is permitted to impose modest degradation of real-time latency on non-idle online CPUs. -That said, it will likely be necessary to take further steps to reduce this -degradation, hopefully to roughly that of a scheduling-clock interrupt. +Here, “modest” means roughly the same latency +degradation as a scheduling-clock interrupt. <p> There are a number of situations where even @@ -1847,7 +1847,8 @@ mass storage, or user patience, whichever comes first. If the nesting is not visible to the compiler, as is the case with mutually recursive functions each in its own translation unit, stack overflow will result. -If the nesting takes the form of loops, either the control variable +If the nesting takes the form of loops, perhaps in the guise of tail +recursion, either the control variable will overflow or (in the Linux kernel) you will get an RCU CPU stall warning. Nevertheless, this class of RCU implementations is one of the most composable constructs in existence. @@ -1913,12 +1914,9 @@ This requirement is another factor driving batching of grace periods, but it is also the driving force behind the checks for large numbers of queued RCU callbacks in the <tt>call_rcu()</tt> code path. Finally, high update rates should not delay RCU read-side critical -sections, although some read-side delays can occur when using +sections, although some small read-side delays can occur when using <tt>synchronize_rcu_expedited()</tt>, courtesy of this function's use -of <tt>try_stop_cpus()</tt>. -(In the future, <tt>synchronize_rcu_expedited()</tt> will be -converted to use lighter-weight inter-processor interrupts (IPIs), -but this will still disturb readers, though to a much smaller degree.) +of <tt>smp_call_function_single()</tt>. <p> Although all three of these corner cases were understood in the early @@ -1978,9 +1976,8 @@ guard against mishaps and misuse: and <tt>rcu_dereference()</tt>, perhaps (incorrectly) substituting a simple assignment. To catch this sort of error, a given RCU-protected pointer may be - tagged with <tt>__rcu</tt>, after which running sparse - with <tt>CONFIG_SPARSE_RCU_POINTER=y</tt> will complain - about simple-assignment accesses to that pointer. + tagged with <tt>__rcu</tt>, after which sparse + will complain about simple-assignment accesses to that pointer. 
Arnd Bergmann made me aware of this requirement, and also supplied the needed <a href="https://lwn.net/Articles/376011/">patch series</a>. @@ -2037,7 +2034,7 @@ guard against mishaps and misuse: some other synchronization mechanism, for example, reference counting. <li> In kernels built with <tt>CONFIG_RCU_TRACE=y</tt>, RCU-related - information is provided via both debugfs and event tracing. + information is provided via event tracing. <li> Open-coded use of <tt>rcu_assign_pointer()</tt> and <tt>rcu_dereference()</tt> to create typical linked data structures can be surprisingly error-prone. @@ -2154,7 +2151,8 @@ as will <tt>rcu_assign_pointer()</tt>. <p> Although <tt>call_rcu()</tt> may be invoked at any time during boot, callbacks are not guaranteed to be invoked until after -the scheduler is fully up and running. +all of RCU's kthreads have been spawned, which occurs at +<tt>early_initcall()</tt> time. This delay in callback invocation is due to the fact that RCU does not invoke callbacks until it is fully initialized, and this full initialization cannot occur until after the scheduler has initialized itself to the @@ -2167,8 +2165,10 @@ on what operations those callbacks could invoke. Perhaps surprisingly, <tt>synchronize_rcu()</tt>, <a href="#Bottom-Half Flavor"><tt>synchronize_rcu_bh()</tt></a> (<a href="#Bottom-Half Flavor">discussed below</a>), -and -<a href="#Sched Flavor"><tt>synchronize_sched()</tt></a> +<a href="#Sched Flavor"><tt>synchronize_sched()</tt></a>, +<tt>synchronize_rcu_expedited()</tt>, +<tt>synchronize_rcu_bh_expedited()</tt>, and +<tt>synchronize_sched_expedited()</tt> will all operate normally during very early boot, the reason being that there is only one CPU and preemption is disabled. @@ -2178,45 +2178,59 @@ state and thus a grace period, so the early-boot implementation can be a no-op. <p> -Both <tt>synchronize_rcu_bh()</tt> and <tt>synchronize_sched()</tt> -continue to operate normally through the remainder of boot, courtesy -of the fact that preemption is disabled across their RCU read-side -critical sections and also courtesy of the fact that there is still -only one CPU. -However, once the scheduler starts initializing, preemption is enabled. -There is still only a single CPU, but the fact that preemption is enabled -means that the no-op implementation of <tt>synchronize_rcu()</tt> no -longer works in <tt>CONFIG_PREEMPT=y</tt> kernels. -Therefore, as soon as the scheduler starts initializing, the early-boot -fastpath is disabled. -This means that <tt>synchronize_rcu()</tt> switches to its runtime -mode of operation where it posts callbacks, which in turn means that -any call to <tt>synchronize_rcu()</tt> will block until the corresponding -callback is invoked. -Unfortunately, the callback cannot be invoked until RCU's runtime -grace-period machinery is up and running, which cannot happen until -the scheduler has initialized itself sufficiently to allow RCU's -kthreads to be spawned. -Therefore, invoking <tt>synchronize_rcu()</tt> during scheduler -initialization can result in deadlock. +However, once the scheduler has spawned its first kthread, this early +boot trick fails for <tt>synchronize_rcu()</tt> (as well as for +<tt>synchronize_rcu_expedited()</tt>) in <tt>CONFIG_PREEMPT=y</tt> +kernels. +The reason is that an RCU read-side critical section might be preempted, +which means that a subsequent <tt>synchronize_rcu()</tt> really does have +to wait for something, as opposed to simply returning immediately. 
+Unfortunately, <tt>synchronize_rcu()</tt> can't do this until all of
+its kthreads are spawned, which doesn't happen until
+<tt>early_initcall()</tt> time.
+But this is no excuse: RCU is nevertheless required to correctly handle
+synchronous grace periods during this time period.
+Once all of its kthreads are up and running, RCU starts running
+normally.
<table> <tr><th> </th></tr> <tr><th align="left">Quick Quiz:</th></tr> <tr><td>
-	So what happens with <tt>synchronize_rcu()</tt> during
-	scheduler initialization for <tt>CONFIG_PREEMPT=n</tt>
-	kernels?
+	How can RCU possibly handle grace periods before all of its
+	kthreads have been spawned???
</td></tr> <tr><th align="left">Answer:</th></tr> <tr><td bgcolor="#ffffff"><font color="ffffff">
-	In <tt>CONFIG_PREEMPT=n</tt> kernel, <tt>synchronize_rcu()</tt>
-	maps directly to <tt>synchronize_sched()</tt>.
-	Therefore, <tt>synchronize_rcu()</tt> works normally throughout
-	boot in <tt>CONFIG_PREEMPT=n</tt> kernels.
-	However, your code must also work in <tt>CONFIG_PREEMPT=y</tt> kernels,
-	so it is still necessary to avoid invoking <tt>synchronize_rcu()</tt>
-	during scheduler initialization.
+	Very carefully!
+	</font>
+
+	<p><font color="ffffff">
+	During the “dead zone” between the time that the
+	scheduler spawns the first task and the time that all of RCU's
+	kthreads have been spawned, all synchronous grace periods are
+	handled by the expedited grace-period mechanism.
+	At runtime, this expedited mechanism relies on workqueues, but
+	during the dead zone the requesting task itself drives the
+	desired expedited grace period.
+	Because dead-zone execution takes place within task context,
+	everything works.
+	Once the dead zone ends, expedited grace periods go back to
+	using workqueues, as is required to avoid problems that would
+	otherwise occur when a user task received a POSIX signal while
+	driving an expedited grace period.
+	</font>
+
+	<p><font color="ffffff">
+	And yes, this does mean that it is unhelpful to send POSIX
+	signals to random tasks between the time that the scheduler
+	spawns its first kthread and the time that RCU's kthreads
+	have all been spawned.
+	If there ever turns out to be a good reason for sending POSIX
+	signals during that time, appropriate adjustments will be made.
+	(If it turns out that POSIX signals are sent during this time for
+	no good reason, other adjustments will be made, appropriate
+	or otherwise.)
</font></td></tr> <tr><td> </td></tr> </table>
@@ -2295,12 +2309,61 @@ situation, and Dipankar Sarma incorporated <tt>rcu_barrier()</tt> into RCU. The need for <tt>rcu_barrier()</tt> for module unloading became apparent later.
+<p>
+<b>Important note</b>: The <tt>rcu_barrier()</tt> function is not,
+repeat, <i>not</i>, obligated to wait for a grace period.
+It is instead only required to wait for RCU callbacks that have
+already been posted.
+Therefore, if there are no RCU callbacks posted anywhere in the system,
+<tt>rcu_barrier()</tt> is within its rights to return immediately.
+Even if there are callbacks posted, <tt>rcu_barrier()</tt> does not
+necessarily need to wait for a grace period.
+
+<table>
+<tr><th> </th></tr>
+<tr><th align="left">Quick Quiz:</th></tr>
+<tr><td>
+	Wait a minute!
+	Each RCU callback must wait for a grace period to complete,
+	and <tt>rcu_barrier()</tt> must wait for each pre-existing
+	callback to be invoked.
+	Doesn't <tt>rcu_barrier()</tt> therefore need to wait for
+	a full grace period if there is even one callback posted anywhere
+	in the system?
+</td></tr>
+<tr><th align="left">Answer:</th></tr>
+<tr><td bgcolor="#ffffff"><font color="ffffff">
+	Absolutely not!!!
+	</font>
+
+	<p><font color="ffffff">
+	Yes, each RCU callback must wait for a grace period to complete,
+	but it might well be partly (or even completely) finished waiting
+	by the time <tt>rcu_barrier()</tt> is invoked.
+	In that case, <tt>rcu_barrier()</tt> need only wait for the
+	remaining portion of the grace period to elapse.
+	So even if there are quite a few callbacks posted,
+	<tt>rcu_barrier()</tt> might well return quite quickly.
+	</font>
+
+	<p><font color="ffffff">
+	So if you need to wait for a grace period as well as for all
+	pre-existing callbacks, you will need to invoke both
+	<tt>synchronize_rcu()</tt> and <tt>rcu_barrier()</tt>.
+	If latency is a concern, you can always use workqueues
+	to invoke them concurrently.
+</font></td></tr>
+<tr><td> </td></tr>
+</table>
+
<h3><a name="Hotplug CPU">Hotplug CPU</a></h3> <p> The Linux kernel supports CPU hotplug, which means that CPUs can come and go.
-It is of course illegal to use any RCU API member from an offline CPU.
+It is of course illegal to use any RCU API member from an offline CPU,
+with the exception of <a href="#Sleepable RCU">SRCU</a> read-side
+critical sections.
This requirement was present from day one in DYNIX/ptx, but on the other hand, the Linux kernel's CPU-hotplug implementation is “interesting.” <p> The Linux-kernel CPU-hotplug implementation has notifiers that are used to allow the various kernel subsystems (including RCU) to respond appropriately to a given CPU-hotplug operation. Most RCU operations may be invoked from CPU-hotplug notifiers,
-including even normal synchronous grace-period operations
-such as <tt>synchronize_rcu()</tt>.
-However, expedited grace-period operations such as
-<tt>synchronize_rcu_expedited()</tt> are not supported,
-due to the fact that current implementations block CPU-hotplug
-operations, which could result in deadlock.
+including even synchronous grace-period operations such as
+<tt>synchronize_rcu()</tt> and <tt>synchronize_rcu_expedited()</tt>.
<p>
-In addition, all-callback-wait operations such as
+However, all-callback-wait operations such as
<tt>rcu_barrier()</tt> are not supported, due to the fact that there are phases of CPU-hotplug operations where the outgoing CPU's callbacks will not be invoked until after the CPU-hotplug operation ends, which could also result in deadlock.
+Furthermore, <tt>rcu_barrier()</tt> blocks CPU-hotplug operations
+during its execution, which results in another type of deadlock
+when invoked from a CPU-hotplug notifier.
<h3><a name="Scheduler and RCU">Scheduler and RCU</a></h3>
@@ -2455,11 +2517,7 @@ It is similarly socially unacceptable to interrupt an <tt>nohz_full</tt> CPU running in userspace. RCU must therefore track <tt>nohz_full</tt> userspace execution.
-And in
-<a href="https://lwn.net/Articles/558284/"><tt>CONFIG_NO_HZ_FULL_SYSIDLE=y</tt></a>
-kernels, RCU must separately track idle CPUs on the one hand and
-CPUs that are either idle or executing in userspace on the other.
-In both cases, RCU must be able to sample state at two points in
+RCU must therefore be able to sample state at two points in
time, and be able to determine whether or not some other CPU spent any time idle and/or executing in userspace.
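A sketch of this two-point sampling, based on the even/odd ->dynticks
convention described in the data-structures document (the helper name is
invented, and the real code must also handle counter wrap and ordering):

	/*
	 * Sketch: given a snapshot of a CPU's ->dynticks counter taken
	 * at the start of a grace period, report whether that CPU is in
	 * or has passed through an extended quiescent state.  An even
	 * snapshot means the CPU was idle (or in nohz_full userspace)
	 * when sampled; any subsequent change in the counter means it
	 * has been through idle/userspace since.
	 */
	static bool sketch_in_eqs_since(atomic_t *dynticks, int snap)
	{
		return !(snap & 0x1) || atomic_read(dynticks) != snap;
	}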
@@ -2864,6 +2922,41 @@ API, which, in combination with <tt>srcu_read_unlock()</tt>, guarantees a full memory barrier. <p>
+Also unlike other RCU flavors, SRCU's callbacks-wait function
+<tt>srcu_barrier()</tt> may be invoked from CPU-hotplug notifiers,
+though this is not necessarily a good idea.
+The reason that this is possible is that SRCU is insensitive
+to whether or not a CPU is online, which means that <tt>srcu_barrier()</tt>
+need not exclude CPU-hotplug operations.
+
+<p>
+SRCU also differs from other RCU flavors in that SRCU's expedited and
+non-expedited grace periods are implemented by the same mechanism.
+This means that in the current SRCU implementation, expediting a
+future grace period has the side effect of expediting all prior
+grace periods that have not yet completed.
+(But please note that this is a property of the current implementation,
+not necessarily of future implementations.)
+In addition, if SRCU has been idle for longer than the interval
+specified by the <tt>srcutree.exp_holdoff</tt> kernel boot parameter
+(25 microseconds by default),
+and if a <tt>synchronize_srcu()</tt> invocation ends this idle period,
+that invocation will be automatically expedited.
+
+<p>
+As of v4.12, SRCU's callbacks are maintained per-CPU, eliminating
+a locking bottleneck present in prior kernel versions.
+Although this will allow users to put much heavier stress on
+<tt>call_srcu()</tt>, it is important to note that SRCU does not
+yet take any special steps to deal with callback flooding.
+So if you are posting (say) 10,000 SRCU callbacks per second per CPU,
+you are probably totally OK, but if you intend to post (say) 1,000,000
+SRCU callbacks per second per CPU, please run some tests first.
+SRCU just might need a few adjustments to deal with that sort of load.
+Of course, your mileage may vary based on the speed of your CPUs and
+the size of your memory.
+
+<p>
The <a href="https://lwn.net/Articles/609973/#RCU Per-Flavor API Table">SRCU API</a> includes
@@ -3021,8 +3114,8 @@ to do some redesign to avoid this scalability problem. <p> RCU disables CPU hotplug in a few places, perhaps most notably in the
-expedited grace-period and <tt>rcu_barrier()</tt> operations.
-If there is a strong reason to use expedited grace periods in CPU-hotplug
+<tt>rcu_barrier()</tt> operations.
+If there is a strong reason to use <tt>rcu_barrier()</tt> in CPU-hotplug
notifiers, it will be necessary to avoid disabling CPU hotplug. This would introduce some complexity, so there had better be a <i>very</i> good reason.
@@ -3096,9 +3189,5 @@ Andy Lutomirski for their help in rendering this article human readable, and to Michelle Rankin for her support of this effort. Other contributions are acknowledged in the Linux kernel's git archive.
-The cartoon is copyright (c) 2013 by Melissa Broussard,
-and is provided
-under the terms of the Creative Commons Attribution-Share Alike 3.0
-United States license.
</body></html>
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index 877947130ebe..6beda556faf3 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -413,11 +413,11 @@ over a rather long period of time, but improvements are always welcome! read-side critical sections. It is the responsibility of the RCU update-side primitives to deal with this.
-17.	Use CONFIG_PROVE_RCU, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and the
-	__rcu sparse checks (enabled by CONFIG_SPARSE_RCU_POINTER) to
-	validate your RCU code.  These can help find problems as follows:
+17.
Use CONFIG_PROVE_LOCKING, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and the
+	__rcu sparse checks to validate your RCU code.  These can help
+	find problems as follows:
-	CONFIG_PROVE_RCU: check that accesses to RCU-protected data
+	CONFIG_PROVE_LOCKING: check that accesses to RCU-protected data
structures are carried out under the proper RCU read-side critical section, while holding the right combination of locks, or whatever other conditions
diff --git a/Documentation/RCU/rcu_dereference.txt b/Documentation/RCU/rcu_dereference.txt
index c0bf2441a2ba..b2a613f16d74 100644
--- a/Documentation/RCU/rcu_dereference.txt
+++ b/Documentation/RCU/rcu_dereference.txt
@@ -138,6 +138,15 @@ o Be very careful about comparing pointers obtained from This sort of comparison occurs frequently when scanning RCU-protected circular linked lists.
+
+	Note that if checks for being within an RCU read-side
+	critical section are not required and the pointer is never
+	dereferenced, rcu_access_pointer() should be used in place
+	of rcu_dereference().  The rcu_access_pointer() primitive
+	does not require an enclosing read-side critical section,
+	and also omits the smp_read_barrier_depends() included in
+	rcu_dereference(), which in turn should provide a small
+	performance gain on some CPUs (e.g., the DEC Alpha).
+
o The comparison is against a pointer that references memory that was initialized "a long time ago."  The reason this is safe is that even if misordering occurs, the
diff --git a/Documentation/RCU/rculist_nulls.txt b/Documentation/RCU/rculist_nulls.txt
index 18f9651ff23d..8151f0195f76 100644
--- a/Documentation/RCU/rculist_nulls.txt
+++ b/Documentation/RCU/rculist_nulls.txt
@@ -1,5 +1,5 @@ Using hlist_nulls to protect read-mostly linked lists and
-objects using SLAB_DESTROY_BY_RCU allocations.
+objects using SLAB_TYPESAFE_BY_RCU allocations.
Please read the basics in Documentation/RCU/listRCU.txt
@@ -7,7 +7,7 @@ Using special markers (called 'nulls') is a convenient way to solve the following problem: A typical RCU linked list managing objects which are
-allocated with SLAB_DESTROY_BY_RCU kmem_cache can
+allocated with SLAB_TYPESAFE_BY_RCU kmem_cache can
use the following algorithms: 1) Lookup algo
@@ -96,7 +96,7 @@ unlock_chain(); // typically a spin_unlock() 3) Remove algo -------------- Nothing special here, we can use a standard RCU hlist deletion.
-But thanks to SLAB_DESTROY_BY_RCU, beware a deleted object can be reused
+But thanks to SLAB_TYPESAFE_BY_RCU, beware that a deleted object can be reused
very, very fast (before the end of the RCU grace period) if (put_last_reference_on(obj)) {
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index e93d04133fe7..96a3d81837e1 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -1,9 +1,102 @@ Using RCU's CPU Stall Detector
-The rcu_cpu_stall_suppress module parameter enables RCU's CPU stall
-detector, which detects conditions that unduly delay RCU grace periods.
-This module parameter enables CPU stall detection by default, but
-may be overridden via boot-time parameter or at runtime via sysfs.
+This document first discusses what sorts of issues RCU's CPU stall
+detector can locate, and then discusses kernel parameters and Kconfig
+options that can be used to fine-tune the detector's operation.  Finally,
+this document explains the stall detector's "splat" format.
+
+
+What Causes RCU CPU Stall Warnings?
+
+So your kernel printed an RCU CPU stall warning.  The next question is
+"What caused it?"
The following problems can result in RCU CPU stall
+warnings:
+
+o	A CPU looping in an RCU read-side critical section.
+
+o	A CPU looping with interrupts disabled.
+
+o	A CPU looping with preemption disabled.  This condition can
+	result in RCU-sched stalls and, if ksoftirqd is in use, RCU-bh
+	stalls.
+
+o	A CPU looping with bottom halves disabled.  This condition can
+	result in RCU-sched and RCU-bh stalls.
+
+o	For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the
+	kernel without invoking schedule().  Note that cond_resched()
+	does not necessarily prevent RCU CPU stall warnings.  Therefore,
+	if the looping in the kernel is really expected and desirable
+	behavior, you might need to replace some of the cond_resched()
+	calls with calls to cond_resched_rcu_qs().
+
+o	Booting Linux using a console connection that is too slow to
+	keep up with the boot-time console-message rate.  For example,
+	a 115Kbaud serial console can be -way- too slow to keep up
+	with boot-time message rates, and will frequently result in
+	RCU CPU stall warning messages, especially if you have added
+	debug printk()s.
+
+o	Anything that prevents RCU's grace-period kthreads from running.
+	This can result in the "All QSes seen" console-log message.
+	This message will include information on when the kthread last
+	ran and how often it should be expected to run.
+
+o	A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
+	happen to preempt a low-priority task in the middle of an RCU
+	read-side critical section.   This is especially damaging if
+	that low-priority task is not permitted to run on any other CPU,
+	in which case the next RCU grace period can never complete, which
+	will eventually cause the system to run out of memory and hang.
+	While the system is in the process of running itself out of
+	memory, you might see stall-warning messages.
+
+o	A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
+	is running at a higher priority than the RCU softirq threads.
+	This will prevent RCU callbacks from ever being invoked,
+	and in a CONFIG_PREEMPT_RCU kernel will further prevent
+	RCU grace periods from ever completing.  Either way, the
+	system will eventually run out of memory and hang.  In the
+	CONFIG_PREEMPT_RCU case, you might see stall-warning
+	messages.
+
+o	A hardware or software issue shuts off the scheduler-clock
+	interrupt on a CPU that is not in dyntick-idle mode.  This
+	problem really has happened, and seems to be most likely to
+	result in RCU CPU stall warnings for CONFIG_NO_HZ_COMMON=n kernels.
+
+o	A bug in the RCU implementation.
+
+o	A hardware failure.  This is quite unlikely, but has occurred
+	at least once in real life.  A CPU failed in a running system,
+	becoming unresponsive, but not causing an immediate crash.
+	This resulted in a series of RCU CPU stall warnings, eventually
+	leading to the realization that the CPU had failed.
+
+The RCU, RCU-sched, RCU-bh, and RCU-tasks implementations have CPU stall
+warnings.  Note that SRCU does -not- have CPU stall warnings.  Please note
+that RCU only detects CPU stalls when there is a grace period in progress.
+No grace period, no CPU stall warnings.
+
+To diagnose the cause of the stall, inspect the stack traces.
+The offending function will usually be near the top of the stack.
+If you have a series of stall warnings from a single extended stall,
+comparing the stack traces can often help determine where the stall
+is occurring, which will usually be in the function nearest the top of
+that portion of the stack which remains the same from trace to trace.
+If you can reliably trigger the stall, ftrace can be quite helpful.
+
+RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE
+and with RCU's event tracing.  For information on RCU's event tracing,
+see include/trace/events/rcu.h.
+
+
+Fine-Tuning the RCU CPU Stall Detector
+
+The rcupdate.rcu_cpu_stall_suppress module parameter disables RCU's
+CPU stall detector, which detects conditions that unduly delay RCU grace
+periods.  CPU stall detection is enabled by default, but it may be
+suppressed via this parameter at boot time or at runtime via sysfs.
The stall detector's idea of what constitutes "unduly delayed" is controlled by a set of kernel configuration variables and cpp macros:
@@ -56,6 +149,9 @@ rcupdate.rcu_task_stall_timeout And continues with the output of sched_show_task() for each task stalling the current RCU-tasks grace period.
+
+Interpreting RCU's CPU Stall-Detector "Splats"
+
For non-RCU-tasks flavors of RCU, when a CPU detects that it is stalling, it will print a message similar to the following:
@@ -178,89 +274,3 @@ grace period is in flight. It is entirely possible to see stall warnings from normal and from expedited grace periods at about the same time from the same run.
-
-
-What Causes RCU CPU Stall Warnings?
-
-So your kernel printed an RCU CPU stall warning.  The next question is
-"What caused it?"  The following problems can result in RCU CPU stall
-warnings:
-
-o	A CPU looping in an RCU read-side critical section.
-
-o	A CPU looping with interrupts disabled.  This condition can
-	result in RCU-sched and RCU-bh stalls.
-
-o	A CPU looping with preemption disabled.  This condition can
-	result in RCU-sched stalls and, if ksoftirqd is in use, RCU-bh
-	stalls.
-
-o	A CPU looping with bottom halves disabled.  This condition can
-	result in RCU-sched and RCU-bh stalls.
-
-o	For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the
-	kernel without invoking schedule().  Note that cond_resched()
-	does not necessarily prevent RCU CPU stall warnings.  Therefore,
-	if the looping in the kernel is really expected and desirable
-	behavior, you might need to replace some of the cond_resched()
-	calls with calls to cond_resched_rcu_qs().
-
-o	Booting Linux using a console connection that is too slow to
-	keep up with the boot-time console-message rate.  For example,
-	a 115Kbaud serial console can be -way- too slow to keep up
-	with boot-time message rates, and will frequently result in
-	RCU CPU stall warning messages.  Especially if you have added
-	debug printk()s.
-
-o	Anything that prevents RCU's grace-period kthreads from running.
-	This can result in the "All QSes seen" console-log message.
-	This message will include information on when the kthread last
-	ran and how often it should be expected to run.
-
-o	A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
-	happen to preempt a low-priority task in the middle of an RCU
-	read-side critical section.   This is especially damaging if
-	that low-priority task is not permitted to run on any other CPU,
-	in which case the next RCU grace period can never complete, which
-	will eventually cause the system to run out of memory and hang.
- While the system is in the process of running itself out of - memory, you might see stall-warning messages. - -o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that - is running at a higher priority than the RCU softirq threads. - This will prevent RCU callbacks from ever being invoked, - and in a CONFIG_PREEMPT_RCU kernel will further prevent - RCU grace periods from ever completing. Either way, the - system will eventually run out of memory and hang. In the - CONFIG_PREEMPT_RCU case, you might see stall-warning - messages. - -o A hardware or software issue shuts off the scheduler-clock - interrupt on a CPU that is not in dyntick-idle mode. This - problem really has happened, and seems to be most likely to - result in RCU CPU stall warnings for CONFIG_NO_HZ_COMMON=n kernels. - -o A bug in the RCU implementation. - -o A hardware failure. This is quite unlikely, but has occurred - at least once in real life. A CPU failed in a running system, - becoming unresponsive, but not causing an immediate crash. - This resulted in a series of RCU CPU stall warnings, eventually - leading the realization that the CPU had failed. - -The RCU, RCU-sched, RCU-bh, and RCU-tasks implementations have CPU stall -warning. Note that SRCU does -not- have CPU stall warnings. Please note -that RCU only detects CPU stalls when there is a grace period in progress. -No grace period, no CPU stall warnings. - -To diagnose the cause of the stall, inspect the stack traces. -The offending function will usually be near the top of the stack. -If you have a series of stall warnings from a single extended stall, -comparing the stack traces can often help determine where the stall -is occurring, which will usually be in the function nearest the top of -that portion of the stack which remains the same from trace to trace. -If you can reliably trigger the stall, ftrace can be quite helpful. - -RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE -and with RCU's event tracing. For information on RCU's event tracing, -see include/trace/events/rcu.h. diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt deleted file mode 100644 index 6549012033f9..000000000000 --- a/Documentation/RCU/trace.txt +++ /dev/null @@ -1,535 +0,0 @@ -CONFIG_RCU_TRACE debugfs Files and Formats - - -The rcutree and rcutiny implementations of RCU provide debugfs trace -output that summarizes counters and state. This information is useful for -debugging RCU itself, and can sometimes also help to debug abuses of RCU. -The following sections describe the debugfs files and formats, first -for rcutree and next for rcutiny. - - -CONFIG_TREE_RCU and CONFIG_PREEMPT_RCU debugfs Files and Formats - -These implementations of RCU provide several debugfs directories under the -top-level directory "rcu": - -rcu/rcu_bh -rcu/rcu_preempt -rcu/rcu_sched - -Each directory contains files for the corresponding flavor of RCU. -Note that rcu/rcu_preempt is only present for CONFIG_PREEMPT_RCU. -For CONFIG_TREE_RCU, the RCU flavor maps onto the RCU-sched flavor, -so that activity for both appears in rcu/rcu_sched. - -In addition, the following file appears in the top-level directory: -rcu/rcutorture. This file displays rcutorture test progress. The output -of "cat rcu/rcutorture" looks as follows: - -rcutorture test sequence: 0 (test in progress) -rcutorture update version number: 615 - -The first line shows the number of rcutorture tests that have completed -since boot. 
If a test is currently running, the "(test in progress)" -string will appear as shown above. The second line shows the number of -update cycles that the current test has started, or zero if there is -no test in progress. - - -Within each flavor directory (rcu/rcu_bh, rcu/rcu_sched, and possibly -also rcu/rcu_preempt) the following files will be present: - -rcudata: - Displays fields in struct rcu_data. -rcuexp: - Displays statistics for expedited grace periods. -rcugp: - Displays grace-period counters. -rcuhier: - Displays the struct rcu_node hierarchy. -rcu_pending: - Displays counts of the reasons rcu_pending() decided that RCU had - work to do. -rcuboost: - Displays RCU boosting statistics. Only present if - CONFIG_RCU_BOOST=y. - -The output of "cat rcu/rcu_preempt/rcudata" looks as follows: - - 0!c=30455 g=30456 cnq=1/0:1 dt=126535/140000000000000/0 df=2002 of=4 ql=0/0 qs=N... b=10 ci=74572 nci=0 co=1131 ca=716 - 1!c=30719 g=30720 cnq=1/0:0 dt=132007/140000000000000/0 df=1874 of=10 ql=0/0 qs=N... b=10 ci=123209 nci=0 co=685 ca=982 - 2!c=30150 g=30151 cnq=1/1:1 dt=138537/140000000000000/0 df=1707 of=8 ql=0/0 qs=N... b=10 ci=80132 nci=0 co=1328 ca=1458 - 3 c=31249 g=31250 cnq=1/1:0 dt=107255/140000000000000/0 df=1749 of=6 ql=0/450 qs=NRW. b=10 ci=151700 nci=0 co=509 ca=622 - 4!c=29502 g=29503 cnq=1/0:1 dt=83647/140000000000000/0 df=965 of=5 ql=0/0 qs=N... b=10 ci=65643 nci=0 co=1373 ca=1521 - 5 c=31201 g=31202 cnq=1/0:1 dt=70422/0/0 df=535 of=7 ql=0/0 qs=.... b=10 ci=58500 nci=0 co=764 ca=698 - 6!c=30253 g=30254 cnq=1/0:1 dt=95363/140000000000000/0 df=780 of=5 ql=0/0 qs=N... b=10 ci=100607 nci=0 co=1414 ca=1353 - 7 c=31178 g=31178 cnq=1/0:0 dt=91536/0/0 df=547 of=4 ql=0/0 qs=.... b=10 ci=109819 nci=0 co=1115 ca=969 - -This file has one line per CPU, or eight for this 8-CPU system. -The fields are as follows: - -o The number at the beginning of each line is the CPU number. - CPUs numbers followed by an exclamation mark are offline, - but have been online at least once since boot. There will be - no output for CPUs that have never been online, which can be - a good thing in the surprisingly common case where NR_CPUS is - substantially larger than the number of actual CPUs. - -o "c" is the count of grace periods that this CPU believes have - completed. Offlined CPUs and CPUs in dynticks idle mode may lag - quite a ways behind, for example, CPU 4 under "rcu_sched" above, - which has been offline through 16 RCU grace periods. It is not - unusual to see offline CPUs lagging by thousands of grace periods. - Note that although the grace-period number is an unsigned long, - it is printed out as a signed long to allow more human-friendly - representation near boot time. - -o "g" is the count of grace periods that this CPU believes have - started. Again, offlined CPUs and CPUs in dynticks idle mode - may lag behind. If the "c" and "g" values are equal, this CPU - has already reported a quiescent state for the last RCU grace - period that it is aware of, otherwise, the CPU believes that it - owes RCU a quiescent state. - -o "pq" indicates that this CPU has passed through a quiescent state - for the current grace period. It is possible for "pq" to be - "1" and "c" different than "g", which indicates that although - the CPU has passed through a quiescent state, either (1) this - CPU has not yet reported that fact, (2) some other CPU has not - yet reported for this grace period, or (3) both. - -o "qp" indicates that RCU still expects a quiescent state from - this CPU. 
Offlined CPUs and CPUs in dyntick idle mode might - well have qp=1, which is OK: RCU is still ignoring them. - -o "dt" is the current value of the dyntick counter that is incremented - when entering or leaving idle, either due to a context switch or - due to an interrupt. This number is even if the CPU is in idle - from RCU's viewpoint and odd otherwise. The number after the - first "/" is the interrupt nesting depth when in idle state, - or a large number added to the interrupt-nesting depth when - running a non-idle task. Some architectures do not accurately - count interrupt nesting when running in non-idle kernel context, - which can result in interesting anomalies such as negative - interrupt-nesting levels. The number after the second "/" - is the NMI nesting depth. - -o "df" is the number of times that some other CPU has forced a - quiescent state on behalf of this CPU due to this CPU being in - idle state. - -o "of" is the number of times that some other CPU has forced a - quiescent state on behalf of this CPU due to this CPU being - offline. In a perfect world, this might never happen, but it - turns out that offlining and onlining a CPU can take several grace - periods, and so there is likely to be an extended period of time - when RCU believes that the CPU is online when it really is not. - Please note that erring in the other direction (RCU believing a - CPU is offline when it is really alive and kicking) is a fatal - error, so it makes sense to err conservatively. - -o "ql" is the number of RCU callbacks currently residing on - this CPU. The first number is the number of "lazy" callbacks - that are known to RCU to only be freeing memory, and the number - after the "/" is the total number of callbacks, lazy or not. - These counters count callbacks regardless of what phase of - grace-period processing that they are in (new, waiting for - grace period to start, waiting for grace period to end, ready - to invoke). - -o "qs" gives an indication of the state of the callback queue - with four characters: - - "N" Indicates that there are callbacks queued that are not - ready to be handled by the next grace period, and thus - will be handled by the grace period following the next - one. - - "R" Indicates that there are callbacks queued that are - ready to be handled by the next grace period. - - "W" Indicates that there are callbacks queued that are - waiting on the current grace period. - - "D" Indicates that there are callbacks queued that have - already been handled by a prior grace period, and are - thus waiting to be invoked. Note that callbacks in - the process of being invoked are not counted here. - Callbacks in the process of being invoked are those - that have been removed from the rcu_data structures - queues by rcu_do_batch(), but which have not yet been - invoked. - - If there are no callbacks in a given one of the above states, - the corresponding character is replaced by ".". - -o "b" is the batch limit for this CPU. If more than this number - of RCU callbacks is ready to invoke, then the remainder will - be deferred. - -o "ci" is the number of RCU callbacks that have been invoked for - this CPU. Note that ci+nci+ql is the number of callbacks that have - been registered in absence of CPU-hotplug activity. - -o "nci" is the number of RCU callbacks that have been offloaded from - this CPU. This will always be zero unless the kernel was built - with CONFIG_RCU_NOCB_CPU=y and the "rcu_nocbs=" kernel boot - parameter was specified. 
-
-
-Kernels compiled with CONFIG_RCU_BOOST=y display the following from
-/debug/rcu/rcu_preempt/rcudata:
-
- 0!c=12865 g=12866 cnq=1/0:1 dt=83113/140000000000000/0 df=288 of=11 ql=0/0 qs=N... kt=0/O ktl=944 b=10 ci=60709 nci=0 co=748 ca=871
- 1 c=14407 g=14408 cnq=1/0:0 dt=100679/140000000000000/0 df=378 of=7 ql=0/119 qs=NRW. kt=0/W ktl=9b6 b=10 ci=109740 nci=0 co=589 ca=485
- 2 c=14407 g=14408 cnq=1/0:0 dt=105486/0/0 df=90 of=9 ql=0/89 qs=NRW. kt=0/W ktl=c0c b=10 ci=83113 nci=0 co=533 ca=490
- 3 c=14407 g=14408 cnq=1/0:0 dt=107138/0/0 df=142 of=8 ql=0/188 qs=NRW. kt=0/W ktl=b96 b=10 ci=121114 nci=0 co=426 ca=290
- 4 c=14405 g=14406 cnq=1/0:1 dt=50238/0/0 df=706 of=7 ql=0/0 qs=.... kt=0/W ktl=812 b=10 ci=34929 nci=0 co=643 ca=114
- 5!c=14168 g=14169 cnq=1/0:0 dt=45465/140000000000000/0 df=161 of=11 ql=0/0 qs=N... kt=0/O ktl=b4d b=10 ci=47712 nci=0 co=677 ca=722
- 6 c=14404 g=14405 cnq=1/0:0 dt=59454/0/0 df=94 of=6 ql=0/0 qs=.... kt=0/W ktl=e57 b=10 ci=55597 nci=0 co=701 ca=811
- 7 c=14407 g=14408 cnq=1/0:1 dt=68850/0/0 df=31 of=8 ql=0/0 qs=.... kt=0/W ktl=14bd b=10 ci=77475 nci=0 co=508 ca=1042
-
-This is similar to the output discussed above, but contains the following
-additional fields:
-
-o "kt" is the per-CPU kernel-thread state. The digit preceding
-	the first slash is 0 if there is no work pending and 1
-	otherwise. The character after the first slash is
-	as follows:
-
-	"S"	The kernel thread is stopped, in other words, all
-		CPUs corresponding to this rcu_node structure are
-		offline.
-
-	"R"	The kernel thread is running.
-
-	"W"	The kernel thread is waiting because there is no work
-		for it to do.
-
-	"O"	The kernel thread is waiting because it has been
-		forced off of its designated CPU or because its
-		->cpus_allowed mask permits it to run on other than
-		its designated CPU.
-
-	"Y"	The kernel thread is yielding to avoid hogging CPU.
-
-	"?"	Unknown value, indicates a bug.
-
-	The number after the final slash, if present, is the CPU that
-	the kthread is actually running on.
-
-	This field is displayed only for CONFIG_RCU_BOOST kernels.
-
-o "ktl" is the low-order 16 bits (in hexadecimal) of the number of
-	times that this CPU's per-CPU kthread has gone through its loop
-	servicing invoke_rcu_cpu_kthread() requests.
-
-	This field is displayed only for CONFIG_RCU_BOOST kernels.
-
-
-The output of "cat rcu/rcu_preempt/rcuexp" looks as follows:
-
-s=21872 wd1=0 wd2=0 wd3=5 enq=0 sc=21872
-
-These fields are as follows:
-
-o "s" is the sequence number, with an odd number indicating that
-	an expedited grace period is in progress. A sketch of this
-	even/odd convention appears after this list.
-
-o "wd1", "wd2", and "wd3" are the number of times that an attempt
-	to start an expedited grace period found that someone else had
-	completed an expedited grace period that satisfies the attempted
-	request. "Our work is done."
-
-o "enq" is the number of quiescent states still outstanding.
-
-o "sc" is the number of times that the attempt to start a
-	new expedited grace period succeeded.
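
The even/odd convention behind "s" (and the "our work is done" test
behind the "wd" counters) is the classic sequence-counter idiom. The
sketch below is illustrative rather than the kernel's implementation;
it assumes only what the field descriptions above state, namely that
the low bit is set while an expedited grace period is running.

	/* Odd value: an expedited grace period is in progress. */
	static inline int exp_gp_in_progress(unsigned long s)
	{
		return s & 1;
	}

	/*
	 * Snapshot the counter value that will signify one full
	 * grace period after this moment, whether or not one is
	 * already running.
	 */
	static inline unsigned long exp_gp_snap(unsigned long s)
	{
		return (s + 3) & ~1UL;
	}

	/*
	 * Once the counter reaches the snapshot, someone else did
	 * our work for us, which is what "wd1", "wd2", and "wd3"
	 * are tallying.
	 */
	static inline int exp_gp_done(unsigned long s, unsigned long snap)
	{
		return (long)(s - snap) >= 0;
	}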
-
-
-The output of "cat rcu/rcu_preempt/rcugp" looks as follows:
-
-completed=31249 gpnum=31250 age=1 max=18
-
-These fields are taken from the rcu_state structure, and are as follows:
-
-o "completed" is the number of grace periods that have completed.
-	It is comparable to the "c" field from rcu/rcudata in that a
-	CPU whose "c" field matches the value of "completed" is aware
-	that the corresponding RCU grace period has completed.
-
-o "gpnum" is the number of grace periods that have started. It is
-	similarly comparable to the "g" field from rcu/rcudata in that
-	a CPU whose "g" field matches the value of "gpnum" is aware that
-	the corresponding RCU grace period has started.
-
-	If these two fields are equal, then there is no grace period
-	in progress, in other words, RCU is idle. On the other hand,
-	if the two fields differ (as they do above), then an RCU grace
-	period is in progress.
-
-o "age" is the number of jiffies that the current grace period
-	has been in progress, or zero if there is no grace period
-	currently in effect.
-
-o "max" is the age in jiffies of the longest-duration grace period
-	thus far.
-
-The output of "cat rcu/rcu_preempt/rcuhier" looks as follows:
-
-c=14407 g=14408 s=0 jfq=2 j=c863 nfqs=12040/nfqsng=0(12040) fqlh=1051 oqlen=0/0
-3/3 ..>. 0:7 ^0
-e/e ..>. 0:3 ^0    d/d ..>. 4:7 ^1
-
-The fields are as follows:
-
-o "c" is exactly the same as "completed" under rcu/rcu_preempt/rcugp.
-
-o "g" is exactly the same as "gpnum" under rcu/rcu_preempt/rcugp.
-
-o "s" is the current state of the force_quiescent_state()
-	state machine.
-
-o "jfq" is the number of jiffies remaining for this grace period
-	before force_quiescent_state() is invoked to help push things
-	along. Note that CPUs in idle mode throughout the grace period
-	will not report on their own, but rather must be checked by some
-	other CPU via force_quiescent_state().
-
-o "j" is the low-order four hex digits of the jiffies counter.
-	Yes, Paul did run into a number of problems that turned out to
-	be due to the jiffies counter no longer counting. Why do you ask?
-
-o "nfqs" is the number of calls to force_quiescent_state() since
-	boot.
-
-o "nfqsng" is the number of useless calls to force_quiescent_state(),
-	where there wasn't actually a grace period active. This can
-	no longer happen due to grace-period processing being pushed
-	into a kthread. The number in parentheses is the difference
-	between "nfqs" and "nfqsng", or the number of times that
-	force_quiescent_state() actually did some real work.
-
-o "fqlh" is the number of calls to force_quiescent_state() that
-	exited immediately (without even being counted in nfqs above)
-	due to contention on ->fqslock.
-
-o Each element of the form "3/3 ..>. 0:7 ^0" represents one rcu_node
-	structure. Each line represents one level of the hierarchy,
-	from root to leaves. It is best to think of the rcu_data
-	structures as forming yet another level after the leaves.
-	Note that there might be either one, two, three, or even four
-	levels of rcu_node structures, depending on the relationship
-	between CONFIG_RCU_FANOUT, CONFIG_RCU_FANOUT_LEAF (possibly
-	adjusted using the rcu_fanout_leaf kernel boot parameter), and
-	CONFIG_NR_CPUS (possibly adjusted using the nr_cpu_ids count of
-	possible CPUs for the booting hardware).
-
-	o The numbers separated by the "/" are the qsmask followed
-	  by the qsmaskinit. The qsmask will have one bit
-	  set for each entity in the next lower level that has
-	  not yet checked in for the current grace period ("e"
-	  indicating CPUs 1, 2, and 3 in the example above).
-	  The qsmaskinit will have one bit for each entity that is
-	  currently expected to check in during each grace period.
-	  The value of qsmaskinit is assigned to that of qsmask
-	  at the beginning of each grace period. A toy sketch of
-	  this check-in protocol appears after this list.
-
-	o The characters separated by the ">" indicate the state
-	  of the blocked-tasks lists. A "G" preceding the ">"
-	  indicates that at least one task blocked in an RCU
-	  read-side critical section blocks the current grace
-	  period, while an "E" preceding the ">" indicates that
-	  at least one task blocked in an RCU read-side critical
-	  section blocks the current expedited grace period.
-	  A "T" character following the ">" indicates that at
-	  least one task is blocked within an RCU read-side
-	  critical section, regardless of whether any current
-	  grace period (expedited or normal) is inconvenienced.
-	  A "." character appears if the corresponding condition
-	  does not hold, so that "..>." indicates that no tasks
-	  are blocked. In contrast, "GE>T" indicates maximal
-	  inconvenience from blocked tasks. CONFIG_TREE_RCU
-	  builds of the kernel will always show "..>.".
-
-	o The numbers separated by the ":" are the range of CPUs
-	  served by this struct rcu_node. This can be helpful
-	  in working out how the hierarchy is wired together.
-
-	  For example, the example rcu_node structure shown above
-	  has "0:7", indicating that it covers CPUs 0 through 7.
-
-	o The number after the "^" indicates the bit in the
-	  next higher level rcu_node structure that this rcu_node
-	  structure corresponds to. For example, the "d/d ..>. 4:7
-	  ^1" has a "1" in this position, indicating that it
-	  corresponds to the "1" bit in the "3" shown in the
-	  "3/3 ..>. 0:7 ^0" entry on the next level up.
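
The qsmask/qsmaskinit bookkeeping just described is, at heart, a
bitmask check-in protocol repeated at each level of the tree. Here
is a toy sketch; the names are illustrative rather than the kernel's,
and the locking and the upward propagation through the "^" bit are
omitted.

	struct toy_node {
		unsigned long qsmask;		/* not yet checked in */
		unsigned long qsmaskinit;	/* expected each GP */
	};

	/* At grace-period start, qsmask is reloaded from qsmaskinit. */
	static void toy_gp_start(struct toy_node *np)
	{
		np->qsmask = np->qsmaskinit;
	}

	/*
	 * A CPU or child rcu_node structure reports a quiescent
	 * state by clearing its bit.  A nonzero return value says
	 * that this was the last holdout, so the report would then
	 * continue one level up.
	 */
	static int toy_report_qs(struct toy_node *np, int bit)
	{
		np->qsmask &= ~(1UL << bit);
		return !np->qsmask;
	}

In the "e/e ..>. 0:3 ^0" entry above, bits 1-3 of qsmask are still
set, so CPUs 1, 2, and 3 have yet to check in for the current grace
period.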
-
-
-The output of "cat rcu/rcu_sched/rcu_pending" looks as follows:
-
- 0!np=26111 qsp=29 rpq=5386 cbr=1 cng=570 gpc=3674 gps=577 nn=15903 ndw=0
- 1!np=28913 qsp=35 rpq=6097 cbr=1 cng=448 gpc=3700 gps=554 nn=18113 ndw=0
- 2!np=32740 qsp=37 rpq=6202 cbr=0 cng=476 gpc=4627 gps=546 nn=20889 ndw=0
- 3 np=23679 qsp=22 rpq=5044 cbr=1 cng=415 gpc=3403 gps=347 nn=14469 ndw=0
- 4!np=30714 qsp=4 rpq=5574 cbr=0 cng=528 gpc=3931 gps=639 nn=20042 ndw=0
- 5 np=28910 qsp=2 rpq=5246 cbr=0 cng=428 gpc=4105 gps=709 nn=18422 ndw=0
- 6!np=38648 qsp=5 rpq=7076 cbr=0 cng=840 gpc=4072 gps=961 nn=25699 ndw=0
- 7 np=37275 qsp=2 rpq=6873 cbr=0 cng=868 gpc=3416 gps=971 nn=25147 ndw=0
-
-The fields are as follows; a schematic of the decision chain that
-these counters instrument appears after the list:
-
-o The leading number is the CPU number, with "!" indicating
-	an offline CPU.
-
-o "np" is the number of times that __rcu_pending() has been invoked
-	for the corresponding flavor of RCU.
-
-o "qsp" is the number of times that RCU was waiting for a
-	quiescent state from this CPU.
-
-o "rpq" is the number of times that the CPU had passed through
-	a quiescent state, but not yet reported it to RCU.
-
-o "cbr" is the number of times that this CPU had RCU callbacks
-	that had passed through a grace period, and were thus ready
-	to be invoked.
-
-o "cng" is the number of times that this CPU needed another
-	grace period while RCU was idle.
-
-o "gpc" is the number of times that an old grace period had
-	completed, but this CPU was not yet aware of it.
-
-o "gps" is the number of times that a new grace period had started,
-	but this CPU was not yet aware of it.
-
-o "ndw" is the number of times that a wakeup of an rcuo
-	callback-offload kthread had to be deferred in order to avoid
-	deadlock.
-
-o "nn" is the number of times that this CPU needed nothing.
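
As promised, here is a schematic of the decision chain that these
counters instrument. It is only a sketch: struct toy_data and the
rdp_* predicates are illustrative placeholders rather than kernel
APIs, and several of the real tests are elided.

	struct toy_data {
		unsigned long np, qsp, rpq, cbr, nn;
	};

	/* Illustrative stubs standing in for the real per-CPU tests. */
	static int rdp_qs_needed(struct toy_data *rdp)     { return 0; }
	static int rdp_qs_unreported(struct toy_data *rdp) { return 0; }
	static int rdp_cbs_ready(struct toy_data *rdp)     { return 0; }

	static int toy_rcu_pending(struct toy_data *rdp)
	{
		rdp->np++;			/* "np": every call */
		if (rdp_qs_needed(rdp)) {	/* "qsp" */
			rdp->qsp++;
			return 1;
		}
		if (rdp_qs_unreported(rdp)) {	/* "rpq" */
			rdp->rpq++;
			return 1;
		}
		if (rdp_cbs_ready(rdp)) {	/* "cbr" */
			rdp->cbr++;
			return 1;
		}
		/* "cng", "gpc", "gps", and "ndw" follow the same shape. */
		rdp->nn++;			/* "nn": nothing to do */
		return 0;
	}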
-
-
-The output of "cat rcu/rcuboost" looks as follows:
-
-0:3 tasks=.... kt=W ntb=0 neb=0 nnb=0 j=c864 bt=c894
-	balk: nt=0 egt=4695 bt=0 nb=0 ny=56 nos=0
-4:7 tasks=.... kt=W ntb=0 neb=0 nnb=0 j=c864 bt=c894
-	balk: nt=0 egt=6541 bt=0 nb=0 ny=126 nos=0
-
-This information is output only for rcu_preempt. Each two-line entry
-corresponds to a leaf rcu_node structure. The fields are as follows:
-
-o "n:m" is the CPU-number range for the corresponding two-line
-	entry. In the sample output above, the first entry covers
-	CPUs zero through three and the second entry covers CPUs four
-	through seven.
-
-o "tasks=TNEB" gives the state of the various segments of the
-	rnp->blocked_tasks list:
-
-	"T"	This indicates that there are some tasks that blocked
-		while running on one of the corresponding CPUs while
-		in an RCU read-side critical section.
-
-	"N"	This indicates that some of the blocked tasks are preventing
-		the current normal (non-expedited) grace period from
-		completing.
-
-	"E"	This indicates that some of the blocked tasks are preventing
-		the current expedited grace period from completing.
-
-	"B"	This indicates that some of the blocked tasks are in
-		need of RCU priority boosting.
-
-	Each character is replaced with "." if the corresponding
-	condition does not hold.
-
-o "kt" is the state of the RCU priority-boosting kernel
-	thread associated with the corresponding rcu_node structure.
-	The state can be one of the following:
-
-	"S"	The kernel thread is stopped, in other words, all
-		CPUs corresponding to this rcu_node structure are
-		offline.
-
-	"R"	The kernel thread is running.
-
-	"W"	The kernel thread is waiting because there is no work
-		for it to do.
-
-	"Y"	The kernel thread is yielding to avoid hogging CPU.
-
-	"?"	Unknown value, indicates a bug.
-
-o "ntb" is the number of tasks boosted.
-
-o "neb" is the number of tasks boosted in order to complete an
-	expedited grace period.
-
-o "nnb" is the number of tasks boosted in order to complete a
-	normal (non-expedited) grace period. When boosting a task
-	that was blocking both an expedited and a normal grace period,
-	it is counted against the expedited total above.
-
-o "j" is the low-order 16 bits of the jiffies counter in
-	hexadecimal.
-
-o "bt" is the low-order 16 bits of the value that the jiffies
-	counter will have when we next start boosting, assuming that
-	the current grace period does not end beforehand. This is
-	also in hexadecimal.
-
-o "balk: nt" counts the number of times we balked (that is, declined
-	to boost) because, even though it was time to boost, there
-	were no blocked tasks to boost. This situation occurs when
-	there is one blocked task on one rcu_node structure and none
-	on some other rcu_node structure. A toy sketch of these balk
-	tests appears after this list.
-
-o "egt" counts the number of times we balked because although
-	there were blocked tasks, none of them were blocking the
-	current grace period, whether expedited or otherwise.
-
-o "bt" counts the number of times we balked because boosting
-	had already been initiated for the current grace period.
-
-o "nb" counts the number of times we balked because there
-	was at least one task blocking the current non-expedited grace
-	period that had never blocked. If a task is already running,
-	boosting its priority just won't help!
-
-o "ny" counts the number of times we balked because it was
-	not yet time to start boosting.
-
-o "nos" counts the number of times we balked for other
-	reasons, e.g., the grace period ended first.
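
As noted in the "balk: nt" entry, most of the balk counters fall out
of a short series of tests made when boosting is considered. The
sketch below is illustrative only: struct toy_boost and
toy_time_before() stand in for the real rcu_node fields and the
kernel's time_before(), and the already-boosting and other-reasons
cases are elided.

	#define toy_time_before(a, b)	((long)((a) - (b)) < 0)

	struct toy_boost {
		int blocked_tasks;		/* any blocked readers? */
		int gp_tasks;			/* any blocking a GP? */
		unsigned long boost_time;	/* "bt", in jiffies */
		unsigned long balk_nt, balk_egt, balk_ny;
	};

	static int toy_should_boost(struct toy_boost *rbp, unsigned long j)
	{
		if (!rbp->blocked_tasks) {
			rbp->balk_nt++;		/* no one to boost */
			return 0;
		}
		if (!rbp->gp_tasks) {
			rbp->balk_egt++;	/* no one blocking a GP */
			return 0;
		}
		if (toy_time_before(j, rbp->boost_time)) {
			rbp->balk_ny++;		/* "bt" not yet reached */
			return 0;
		}
		return 1;			/* time to boost */
	}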
-
-
-CONFIG_TINY_RCU debugfs Files and Formats
-
-These implementations of RCU provide a single debugfs file under the
-top-level directory RCU, namely rcu/rcudata, which displays fields in
-rcu_bh_ctrlblk and rcu_sched_ctrlblk.
-
-The output of "cat rcu/rcudata" is as follows:
-
-rcu_sched: qlen: 0
-rcu_bh: qlen: 0
-
-This is split into rcu_sched and rcu_bh sections. The field is as
-follows:
-
-o "qlen" is the number of RCU callbacks currently waiting either
-	for an RCU grace period or waiting to be invoked. This is the
-	only field present for rcu_sched and rcu_bh, due to the
-	short-circuiting of grace periods in those two cases.
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 5cbd8b2395b8..8ed6c9f6133c 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -562,7 +562,9 @@ This section presents a "toy" RCU implementation that is based on
 familiar locking primitives. Its overhead makes it a non-starter for
 real-life use, as does its lack of scalability. It is also unsuitable
 for realtime use, since it allows scheduling latency to "bleed" from
-one read-side critical section to another.
+one read-side critical section to another. It also assumes recursive
+reader-writer locks: If you try this with non-recursive locks, and
+you allow nested rcu_read_lock() calls, you can deadlock.
 
 However, it is probably the easiest implementation to relate to, so is
 a good starting point.
@@ -587,20 +589,21 @@ It is extremely simple:
 		write_unlock(&rcu_gp_mutex);
 	}
 
-[You can ignore rcu_assign_pointer() and rcu_dereference() without
-missing much. But here they are anyway. And whatever you do, don't
-forget about them when submitting patches making use of RCU!]
+[You can ignore rcu_assign_pointer() and rcu_dereference() without missing
+much. But here are simplified versions anyway. And whatever you do,
+don't forget about them when submitting patches making use of RCU!]
 
-	#define rcu_assign_pointer(p, v)	({ \
-						smp_wmb(); \
-						(p) = (v); \
-					})
+	#define rcu_assign_pointer(p, v) \
+	({ \
+		smp_store_release(&(p), (v)); \
+	})
 
-	#define rcu_dereference(p)	({ \
-				typeof(p) _________p1 = p; \
-				smp_read_barrier_depends(); \
-				(_________p1); \
-				})
+	#define rcu_dereference(p) \
+	({ \
+		typeof(p) _________p1 = p; \
+		smp_read_barrier_depends(); \
+		(_________p1); \
+	})
 
 
 The rcu_read_lock() and rcu_read_unlock() primitive read-acquire
@@ -925,7 +928,8 @@ d. Do you need RCU grace periods to complete even in the face
 e. Is your workload too update-intensive for normal use of
    RCU, but inappropriate for other synchronization mechanisms?
-   If so, consider SLAB_TYPESAFE_BY_RCU (which was originally
-   named SLAB_DESTROY_BY_RCU). But please be careful!
+   If so, consider SLAB_TYPESAFE_BY_RCU (which was originally
+   named SLAB_DESTROY_BY_RCU). But please be careful!
 
 f. Do you need read-side critical sections that are respected
    even though they are in the middle of the idle loop, during